%%{init: { 'logLevel': 'debug', 'theme': 'base' } }%% graph LR waveform[Get Data] --> correlate(Correlation) correlate:::active -->|save| corrdb[(CorrDB/hdf5)] corrdb --> monitor monitor[Measure dv] -->|save| dv{{DV}} click waveform "../trace_data.html" "trace_data" click correlate "../correlate.html" "correlate" click monitor "../monitor.html" "monitor" click corrdb "../corrdb.html" "CorrDB" click dv "../monitor/dv.html" "DV" classDef active fill:#f666, stroke-width:4px, stroke:#f06;
Computing your first Noise Correlations#
After having downloaded your data as shown here (or retrieved seismic data any other way),
we are ready to compute our first noise correlations!
In SeisMIC, parameters for the preprocessing, the correlation, and, subsequently,
measuring the change in seismic velocity are provided as a .yaml file or as a python dictionary.
An example for such a .yaml file is shown below (and provided in the repository as params_example.yaml
).
Setting the parameters#
1#### Project wide parameters
2# lowest level project directory
3proj_dir : '/path/to/project/root/'
4# Save the component combinations in separate hdf5 files
5# This is faster for multi-core if True
6# Set to False for compatibility with SeisMIC version < 0.5.0
7save_comps_separately: True
8# directory for logging information
9log_subdir : 'log'
10# levels:
11# 'DEBUG', 'INFO', 'WARNING', 'ERROR', or 'CRITICAL'
12log_level: 'WARNING'
13# folder for figures
14fig_subdir : 'figures'
15
16
17#### parameters that are network specific
18net:
19 # list of stations used in the project
20 # type: list of strings or string, wildcards allowed
21 network : 'D0'
22 station : ['BZG', 'ESO', 'KBG', 'KIR']
23 # For "component" only strings are allowed (no lists!)
24 component: 'Z' # Use * or ? or None to allow all components
25
26#### parameters for correlation (emperical Green's function creation)
27co:
28 # subdirectory of 'proj_dir' to store correlation
29 # type: string
30 subdir : 'corr'
31 # times sequences to read for cliping or muting on stream basis
32 # These should be long enough for the reference (e.g. the standard
33 # deviation) to be rather independent of the parts to remove
34 # type: string
35 read_start : '2015-08-1 00:00:01.0'
36 read_end : '2015-08-06 00:00:00.0'
37 # type: float [seconds]
38 # The longer the faster, but higher RAM usage.
39 # Note that this is also the length of the correlation batches
40 # that will be written (i.e., length that will be
41 # kept in memory before writing to disk)
42 # If you are unsure, keep defaults
43 read_len : 86398
44 read_inc : 86400
45
46 # New sampling rate in Hz. Note that it will try to decimate
47 # if possible (i.e., there is an integer factor from the
48 # native sampling_rate). Pay attention to Nyquist frequency for
49 # later steps
50 sampling_rate : 25
51 # Remove the instrument response, will take substantially more time
52 remove_response : False
53
54 # Method to combine different traces
55 # Options are: 'betweenStations', 'betweenComponents', 'autoComponents', 'allSimpleCombinations', or 'allCombinations'
56 combination_method : 'betweenStations'
57
58 # If you want only specific combinations to be computed enter them here
59 # In the form [Net0-Net1.Stat0-Stat1] or [Net0-Net1.Stat0-Stat1.Cha0-Cha1]
60 # This option will only be consider if combination_method == 'betweenStations'
61 # Comment or set == None if not in use
62 xcombinations : None
63
64 # preprocessing of the original length time series
65 # these function work on an obspy.Stream object given as first argument
66 # and return an obspy.Stream object.
67 # available preimplemented functions are in seismic.correlate.preprocessing_stream
68 preProcessing : [
69 # This is intended as a "technical" bandpass filter to remove unphysical signals, i.e., frequencies you would not expect in the data
70 {'function':'seismic.correlate.preprocessing_stream.stream_filter',
71 'args':{'ftype':'bandpass',
72 'filter_option':{'freqmin':0.01, #0.01
73 'freqmax':12}}}
74 # Mute data, when the amplitude is above a certain threshold
75 #{'function':'seismic.correlate.preprocessing_stream.stream_mute',
76 # 'args':{'taper_len':100,
77 # 'mute_method':'std_factor',
78 # 'mute_value':3}}
79 ]
80 # subdivision of the read sequences for correlation
81 # if this is set the stream processing will happen on the hourly subdivision. This has the
82 # advantage that data that already exists will not need to be preprocessed again
83 # On the other hand, computing a whole new database might be slower
84 # Recommended to be set to True if:
85 # a) You update your database and a lot of the data is already available (up to a magnitude faster)
86 # b) corr_len is close to read_len
87 # Is automatically set to False if no existing correlations are found
88 preprocess_subdiv: True
89 # type: presence of this key
90 subdivision:
91 # type: float [seconds]
92 corr_inc : 3600
93 corr_len : 3600
94 # recombine these subdivisions
95 # unused at the time
96 # type: boolean
97 recombine_subdivision : True
98 # delete
99 # type: booblean
100 delete_subdivision : False
101
102 # parameters for correlation preprocessing
103 # Standard functions reside in seismic.correlate.preprocessing_td and preprocessing_fd, respectively
104 corr_args : {'TDpreProcessing':[
105 {'function':'seismic.correlate.preprocessing_td.taper',
106 'args':{'type':'cosine_taper', 'p': 0.02}},
107 {'function':'seismic.correlate.preprocessing_td.detrend',
108 'args':{'type':'linear'}},
109 {'function':'seismic.correlate.preprocessing_td.TDfilter',
110 'args':{'type':'bandpass','freqmin':2,'freqmax':8}},
111 #{'function':'seismic.correlate.preprocessing_td.mute',
112 # 'args':{'taper_len':100.,
113 # 'threshold':1000, absolute threshold
114 # 'std_factor':3,
115 # 'filter':{'type':'bandpass','freqmin':2,'freqmax':4},
116 # 'extend_gaps':True}},
117 {'function':'seismic.correlate.preprocessing_td.clip',
118 'args':{'std_factor':2.5}},
119 {'function':'seismic.correlate.preprocessing_td.signBitNormalization',
120 'args': {}}
121 ],
122 # Standard functions reside in seismic.correlate.preprocessing_fd
123 'FDpreProcessing':[
124 {'function':'seismic.correlate.preprocessing_fd.spectralWhitening',
125 'args':{'joint_norm':False}},
126 {'function':'seismic.correlate.preprocessing_fd.FDfilter',
127 'args':{'flimit':[0.01,0.02,9,10]}}
128 # {'function':seismic.correlate.preprocessing_fd.FDsignBitNormalization,
129 # 'args':{}}
130 ],
131 'lengthToSave':100,
132 'center_correlation':True, # make sure zero correlation time is in the center
133 'normalize_correlation':True,
134 'combinations':[]
135 }
136
137
138#### parameters for the estimation of time differences
139dv:
140 # subfolder for storage of time difference results
141 subdir : 'vel_change'
142
143 # Plotting
144 plot_vel_change : True
145
146 ### Definition of calender time windows for the time difference measurements
147 start_date : '2015-05-01 00:00:00.0' # %Y-%m-%dT%H:%M:%S.%fZ'
148 end_date : '2016-01-01 00:00:00.0'
149 win_len : 86400 # length of window in which EGFs are stacked
150 date_inc : 86400 # increment of measurements
151
152 ### Frequencies
153 freq_min : 4
154 freq_max : 8
155
156 ### Definition of lapse time window, i.e. time window in the coda that is used for the dv/v estimate
157 tw_start : 20 # lapse time of first sample [s]
158 tw_len : 60 # length of window [s] Can be None if the whole (rest) of the coda should be used
159 sides : 'both' # options are left (for acausal), right (causal), both, or single (for active source experiments where the first sample is the trigger time)
160 compute_tt : True # Computes the travel time and adds it to tw_start (tt is 0 if not xstations). If true a rayleigh wave velocity has to be provided
161 rayleigh_wave_velocity : 1 # rayleigh wave velocity in km/s, will be ignored if compute_tt=False
162
163
164 ### Range to try stretching
165 # maximum stretch factor
166 stretch_range : 0.03
167 # number of stretching increments
168 stretch_steps : 1001
169 # Return the similarity matrix for each dv/v estimate
170 # required for post processing. If False, saves some RAM and disk space
171 return_sim_mat : True
172
173 #### Reference trace extraction
174 # win_inc : Length in days of each reference time window to be used for trace extraction
175 # If == 0, only one trace will be extracted
176 # If > 0, mutliple reference traces will be used for the dv calculation
177 # Can also be a list if the length of the windows should vary
178 # See seismic.correlate.stream.CorrBulk.extract_multi_trace for details on arguments
179 dt_ref : {'win_inc' : 0, 'method': 'mean', 'percentile': 50}
180
181 # preprocessing on the correlation bulk or corr_stream before stretch estimation
182 preprocessing: [
183 #{'function': 'pop_at_utcs', 'args': {'utcs': np.array([UTCDateTime()])}, # Used to remove correlations from certain times
184 #{'function': 'select_time', 'args': {'start': (0, 0, 0), 'end': (23, 0, 0), exclude=False}, # include only certain time of day (h, minute, second)
185 {'function': 'smooth', 'args': {'wsize': 48, 'wtype': 'hanning', 'axis': 1}}
186 ]
187
188 # postprocessing of dv objects before saving and plotting
189 postprocessing: [
190 {'function': 'smooth_sim_mat', 'args': {'win_len': 7, exclude_corr_below: 0}}
191 ]
192
193#### parameters to compute the waveform coherence
194wfc:
195# subfolder for storage of time difference results
196 subdir : 'wfc'
197
198 ### Definition of calender time windows for the time difference measurements
199 start_date : '2015-05-01 00:00:00.0' # %Y-%m-%dT%H:%M:%S.%fZ'
200 end_date : '2016-01-01 00:00:00.0'
201 win_len : 86400 # length of window in which EGFs are stacked
202 date_inc : 86400 # increment of measurements
203
204 ### Frequencies
205 # can be lists of same length
206 freq_min : [0.0625, 0.09, 0.125, 0.25, 0.375, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5]
207 freq_max : [0.125, 0.18, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 8, 10]
208
209 ### Definition of lapse time window
210 # can be lists of same length or tw_start: List and tw_len: single value (will be applied to all)
211 tw_start : [0, 1.25, 2.5, 3.75, 5, 6.25, 7.5, 8.75, 10, 11.25, 12.5, 13.75, 15, 16.25, 17.5, 17.75, 20] # lapse time of first sample [s]
212 tw_len : 5 # length of window [s]
213
214 #### Reference trace extraction
215 # win_inc : Length in days of each reference time window to be used for trace extraction
216 # If == 0, only one trace will be extracted
217 # If > 0, mutliple reference traces will be used for the dv calculation
218 # Can also be a list if the length of the windows should vary
219 # See seismic.correlate.stream.CorrBulk.extract_multi_trace for details on arguments
220 dt_ref : {'win_inc' : 0, 'method': 'mean', 'percentile': 50}
221
222 # preprocessing on the correlation bulk before stretch estimation
223 preprocessing: [
224 #{'function': 'select_time', 'args': {'start': (0, 0, 0), 'end': (23, 0, 0), exclude=False}, # include only certain time of day (h, minute, second)
225 {'function': 'smooth', 'args': {'wsize': 4, 'wtype': 'hanning', 'axis': 1}}
226 ]
227
228 ### SAVING
229 # save components separately or only their average?
230 save_comps: False
This might look a little intimidating at first glancec, but it is actually quite straight-forward. To achieve a better understanding of what each of the parameters do, let’s have a close look at them individually.
Project Wide Parameters#
1#### Project wide parameters
2# lowest level project directory
3proj_dir : '/path/to/project/root/'
4# Save the component combinations in separate hdf5 files
5# This is faster for multi-core if True
6# Set to False for compatibility with SeisMIC version < 0.5.0
7save_comps_separately: True
8# directory for logging information
9log_subdir : 'log'
10# levels:
11# 'DEBUG', 'INFO', 'WARNING', 'ERROR', or 'CRITICAL'
12log_level: 'WARNING'
13# folder for figures
14fig_subdir : 'figures'
Those are parameters that govern the logging and the file-structure. proj_dir
is the root directory, we have chosen when initialising our
Store_Client
as shown here .
fig_dir
and log_dir
are just subdirectories for figures and logs, respectively, and the log level decides how much will actually be logged.
In most cases, WARNING is the appropriate choice - everything below will start spitting out a lot of information.
Network Specific Parameters#
1#### parameters that are network specific
2net:
3 # list of stations used in the project
4 # type: list of strings or string, wildcards allowed
5 network : 'D0'
6 station : ['BZG', 'ESO', 'KBG', 'KIR']
7 # For "component" only strings are allowed (no lists!)
8 component: 'Z' # Use * or ? or None to allow all components
Here, we decide which data to use (i.e., which data the correlator will look for and read in). All parameters accept wildcards. Network and Station can be strings or lists.
Note
If both network
and station
are lists, they have to have the same length.
The corresponding logic is: [net0, net1, net2, net3] and [stat0, stat1, stat2, stat3]
and so on.
Correlation Arguments#
This is the really juicy stuff and probably the part that will have the strongest influence on your results. Let’s start by getting the most obvious parameters out of the way:
1#### parameters for correlation (emperical Green's function creation)
2co:
3 # subdirectory of 'proj_dir' to store correlation
4 # type: string
5 subdir : 'corr'
6 # times sequences to read for cliping or muting on stream basis
7 # These should be long enough for the reference (e.g. the standard
8 # deviation) to be rather independent of the parts to remove
9 # type: string
10 read_start : '2015-08-1 00:00:01.0'
11 read_end : '2015-08-06 00:00:00.0'
12 # type: float [seconds]
13 # The longer the faster, but higher RAM usage.
14 # Note that this is also the length of the correlation batches
15 # that will be written (i.e., length that will be
16 # kept in memory before writing to disk)
17 read_len : 86398
18 read_inc : 86400
subdir
The directory to save the correlations in (correlations are generally saved in hdf5 format).read_start
andread_end
are the earliest and latest dates that you want to readread_len
the length that will be read in. Usually, you have one mseed file per day. To avoid having to read several files, you will want that to be a bit less than a dayread_inc
is the increment between each reading interval
Note
Neither read_len
nor read_inc
are deciding about the correlation length.
1 # New sampling rate in Hz. Note that it will try to decimate
2 # if possible (i.e., there is an integer factor from the
3 # native sampling_rate)
4 sampling_rate: 25
5 # Remove the instrument response, will take substantially more time
6 remove_response: False
7
8 # Method to combine different traces
9 combination_method : 'betweenStations'
10
11 # If you want only specific combinations to be computed enter them here
12 # In the form [Net0-Net0.Stat0-Stat1]
13 # This option will only be consider if combination_method == 'betweenStations'
14 # Comment or set == None if not in use
15 xcombinations : None
Sampling_rate
is the new sampling rate you will want your data to have. SeisMIC will take care of anti-alias filtering and determine whether data can be decimated.remove_response
if you want the data to be corrected for the instrument response, set this toTrue
. Instrument response removal in obspy is unfortunately quite expensive..combination_method
decides which components you will want to correlate. Seecalc_cross_combis()
for allowed options.xcombinations
If you want to save some computational resources and only compute specific combinations, use this function. If you want to limit the maximum distance between stations to cross-correlate you can useseismic.correlate.correlate.Correlator.find_interstat_dist()
to compute this parameter
Preprocessing Arguments#
SeisMIC is coded in a manner that makes it easy for the user to pass custom preprocessing functions. Custom functions can be defined in the three parameters preProcessing
, TDpreProcessing
, and FDpreprocessing
.
All these parameters expect a list
of dictionaries
as input. Each dictionary must have the keys function
and args
. The value for function is a string describing the complete import path of the preprocessing function in the form ‘package.module.sobmodule.function’.
args
is simply a keyword argument dictionary that will be passed to the function.
SeisMIC comes with a number of preprocessing functions. If you are creating a custom preprocessing function, it is probably a good idea to have a look at these first in order to understand the required syntax. Preprocecssing is generally done in three steps:
Preprocessing “on per stream basis”
All functions here take an obspy stream as input and return the processed stream.
An over view of available stream preprocessing functions can be found in preprocessing_stream
.
1 # preprocessing of the original length time series
2 # these function work on an obspy.Stream object given as first argument
3 # and return an obspy.Stream object.
4 preProcessing : [
5 {'function':'seismic.correlate.preprocessing_stream.detrend_st',
6 'args':{'type':'linear'}},
7 {'function':'seismic.correlate.preprocessing_stream.cos_taper_st',
8 'args':{'taper_len': 100, # seconds
9 'lossless': True}}, # lossless tapering stitches additional data to trace, tapers, and removes the tapered ends after preprocessing
10 # This is intended as a "technical" bandpass filter to remove unphysical signals, i.e., frequencies you would not expect in the data
11 {'function':'seismic.correlate.preprocessing_stream.stream_filter',
12 'args':{'ftype':'bandpass',
13 'filter_option':{'freqmin':0.01, #0.01
14 'freqmax':12}}}
15 # Mute data, when the amplitude is above a certain threshold
16 #{'function':'seismic.correlate.preprocessing_stream.stream_mute',
17 # 'args':{'taper_len':100,
18 # 'mute_method':'std_factor',
19 # 'mute_value':3}}
20 ]
Preprocessing on arrays in time and frequency domain
The functions to use have to be provided in corr_args['TDpreProcecssing']
and corr_args['FDpreProcecssing']
.
A custom function would need to take a matrix as input, where each column is one waveform in time or frequency domain.
Additionally, the args
dictionary and a params
dictionary will be passed.
1# parameters for correlation preprocessing
2# Standard functions reside in seismic.correlate.preprocessing_td
3corr_args : {'TDpreProcessing':[
4 # detrend not recommended. Use preProcessing detrend_st instead (faster)
5 # {'function':'seismic.correlate.preprocessing_td.detrend',
6 # 'args':{'type':'linear'}},
7 {'function':'seismic.correlate.preprocessing_td.TDfilter',
8 'args':{'type':'bandpass','freqmin':2,'freqmax':8}},
9 #{'function':'seismic.correlate.preprocessing_td.mute',
10 # 'args':{'taper_len':100.,
11 # 'threshold':1000, absolute threshold
12 # 'std_factor':3,
13 # 'filter':{'type':'bandpass','freqmin':2,'freqmax':4},
14 # 'extend_gaps':True}},
15 {'function':'seismic.correlate.preprocessing_td.clip',
16 'args':{'std_factor':2.5}},
17 {'function':'seismic.correlate.preprocessing_td.signBitNormalization',
18 'args': {}}
19 ],
20 # Standard functions reside in seismic.correlate.preprocessing_fd
21 'FDpreProcessing':[
22 {'function':'seismic.correlate.preprocessing_fd.spectralWhitening',
23 'args':{'joint_norm':False}},
24 {'function':'seismic.correlate.preprocessing_fd.FDfilter',
25 'args':{'flimit':[0.01,0.02,9,10]}}
26 # {'function':seismic.correlate.preprocessing_fd.FDsignBitNormalization,
27 # 'args':{}}
28 ],
29 ...
30 }
Arguments for the actual correlation#
Subdivision
is the parameter that decides about the length and increment of the noise recordings to be preprocessed and correlated.
If recombine_subdivision=True
, the correlations will be stacked to read_len
.
LengthToSave
is the length of each correlation function in secondsCenter_Correlation
If True, zero-lag will always be in the middle of the function.normalize_correlation
: Normalise the correlation by the absolute maximum?
1 # subdivision of the read sequences for correlation
2 # type: presence of this key
3 subdivision:
4 # type: float [seconds]
5 corr_inc : 3600
6 corr_len : 3600
7 # recombine these subdivisions
8 # unused at the time
9 # type: boolean
10 recombine_subdivision : True
11 # delete
12 # type: booblean
13 delete_subdivision : False
14
15 # parameters for correlation preprocessing
16 # Standard functions reside in seismic.correlate.preprocessing_td
17 corr_args : {'lengthToSave':100,
18 'center_correlation':True, # make sure zero correlation time is in the center
19 'normalize_correlation':True,
20 'combinations':[]
21 }
The rest of the yaml file will be discussed at a later points. Now, let’s actually start the computation!