[Workflow diagram: Get Data -> Correlation -> CorrDB/hdf5 -> Measure dv -> DV; this tutorial covers the highlighted Correlation step.]

Computing your first Noise Correlations#

After having downloaded your data as shown here (or retrieved seismic data any other way), we are ready to compute our first noise correlations! In SeisMIC, parameters for the preprocessing, the correlation, and, subsequently, the measurement of the change in seismic velocity are provided as a .yaml file or as a python dictionary. An example of such a .yaml file is shown below (and provided in the repository as params_example.yaml).
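
If you prefer to work with a dictionary, you can simply load the yaml file first, e.g. with pyyaml (a minimal sketch):

import yaml

# load the parameter file into a plain dictionary; the resulting dict
# can be passed to SeisMIC instead of the file path
with open('params_example.yaml') as f:
    params = yaml.safe_load(f)

# parameters can then be adjusted programmatically
params['co']['sampling_rate'] = 25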

Setting the parameters#

#### Project wide parameters
# lowest level project directory
proj_dir : '/path/to/project/root/'
# Save the component combinations in separate hdf5 files
# This is faster for multi-core if True
# Set to False for compatibility with SeisMIC version < 0.5.0
save_comps_separately: True
# directory for logging information
log_subdir : 'log'
# levels:
# 'DEBUG', 'INFO', 'WARNING', 'ERROR', or 'CRITICAL'
log_level: 'WARNING'
# folder for figures
fig_subdir : 'figures'


#### parameters that are network specific
net:
    # list of stations used in the project
    # type: list of strings or string, wildcards allowed
    network : 'D0'
    station : ['BZG', 'ESO', 'KBG', 'KIR']
    # For "component" only strings are allowed (no lists!)
    component: 'Z'  # Use * or ? or None to allow all components

#### parameters for correlation (empirical Green's function creation)
co:
    # subdirectory of 'proj_dir' to store correlation
    # type: string
    subdir : 'corr'
    # time sequences to read for clipping or muting on stream basis
    # These should be long enough for the reference (e.g., the standard
    # deviation) to be rather independent of the parts to remove
    # type: string
    read_start : '2015-08-01 00:00:01.0'
    read_end : '2015-08-06 00:00:00.0'
    # type: float [seconds]
    # The longer, the faster, but higher RAM usage.
    # Note that this is also the length of the correlation batches
    # that will be written (i.e., length that will be
    # kept in memory before writing to disk)
    # If you are unsure, keep defaults
    read_len : 86398
    read_inc : 86400

    # New sampling rate in Hz. Note that it will try to decimate
    # if possible (i.e., there is an integer factor from the
    # native sampling_rate). Pay attention to the Nyquist frequency for
    # later steps
    sampling_rate : 25
    # Remove the instrument response, will take substantially more time
    remove_response : False

    # Method to combine different traces
    # Options are: 'betweenStations', 'betweenComponents', 'autoComponents', 'allSimpleCombinations', or 'allCombinations'
    combination_method : 'betweenStations'

    # If you want only specific combinations to be computed enter them here
    # In the form [Net0-Net1.Stat0-Stat1] or [Net0-Net1.Stat0-Stat1.Cha0-Cha1]
    # This option will only be considered if combination_method == 'betweenStations'
    # Comment or set == None if not in use
    xcombinations : None

    # preprocessing of the original length time series
    # these functions work on an obspy.Stream object given as first argument
    # and return an obspy.Stream object.
    # available preimplemented functions are in seismic.correlate.preprocessing_stream
    preProcessing : [
                    # This is intended as a "technical" bandpass filter to remove unphysical signals, i.e., frequencies you would not expect in the data
                    {'function':'seismic.correlate.preprocessing_stream.stream_filter',
                    'args':{'ftype':'bandpass',
                            'filter_option':{'freqmin':0.01, #0.01
                                            'freqmax':12}}}
                    # Mute data, when the amplitude is above a certain threshold
                    #{'function':'seismic.correlate.preprocessing_stream.stream_mute',
                    # 'args':{'taper_len':100,
                    #         'mute_method':'std_factor',
                    #         'mute_value':3}}
                    ]
    # subdivision of the read sequences for correlation
    # if this is set, the stream processing will happen on the hourly subdivision. This has the
    # advantage that data that already exists will not need to be preprocessed again
    # On the other hand, computing a whole new database might be slower
    # Recommended to be set to True if:
    # a) You update your database and a lot of the data is already available (up to an order of magnitude faster)
    # b) corr_len is close to read_len
    # Is automatically set to False if no existing correlations are found
    preprocess_subdiv: True
    # type: presence of this key
    subdivision:
        # type: float [seconds]
        corr_inc : 3600
        corr_len : 3600
        # recombine these subdivisions
        # currently unused
        # type: boolean
        recombine_subdivision : True
        # delete
        # type: boolean
        delete_subdivision : False

    # parameters for correlation preprocessing
    # Standard functions reside in seismic.correlate.preprocessing_td and preprocessing_fd, respectively
    corr_args : {'TDpreProcessing':[
                                {'function':'seismic.correlate.preprocessing_td.taper',
                                 'args':{'type':'cosine_taper', 'p': 0.02}},
                                {'function':'seismic.correlate.preprocessing_td.detrend',
                                 'args':{'type':'linear'}},
                                {'function':'seismic.correlate.preprocessing_td.TDfilter',
                                 'args':{'type':'bandpass','freqmin':2,'freqmax':8}},
                                #{'function':'seismic.correlate.preprocessing_td.mute',
                                # 'args':{'taper_len':100.,
                                #         'threshold':1000, # absolute threshold
                                #         'std_factor':3,
                                #         'filter':{'type':'bandpass','freqmin':2,'freqmax':4},
                                #         'extend_gaps':True}},
                                {'function':'seismic.correlate.preprocessing_td.clip',
                                 'args':{'std_factor':2.5}},
                                {'function':'seismic.correlate.preprocessing_td.signBitNormalization',
                                 'args': {}}
                                ],
                # Standard functions reside in seismic.correlate.preprocessing_fd
                'FDpreProcessing':[
                                {'function':'seismic.correlate.preprocessing_fd.spectralWhitening',
                                 'args':{'joint_norm':False}},
                                {'function':'seismic.correlate.preprocessing_fd.FDfilter',
                                 'args':{'flimit':[0.01,0.02,9,10]}}
                                #{'function':seismic.correlate.preprocessing_fd.FDsignBitNormalization,
                                # 'args':{}}
                                ],
                'lengthToSave':100,
                'center_correlation':True,      # make sure zero correlation time is in the center
                'normalize_correlation':True,
                'combinations':[]
                }


#### parameters for the estimation of time differences
dv:
    # subfolder for storage of time difference results
    subdir : 'vel_change'

    # Plotting
    plot_vel_change : True

    ### Definition of calendar time windows for the time difference measurements
    start_date : '2015-05-01 00:00:00.0'   # %Y-%m-%dT%H:%M:%S.%fZ'
    end_date : '2016-01-01 00:00:00.0'
    win_len : 86400                         # length of window in which EGFs are stacked
    date_inc : 86400                        # increment of measurements

    ### Frequencies
    freq_min : 4
    freq_max : 8

    ### Definition of lapse time window, i.e., the time window in the coda that is used for the dv/v estimate
    tw_start : 20     # lapse time of first sample [s]
    tw_len : 60       # length of window [s]. Can be None if the whole (rest) of the coda should be used
    sides : 'both'   # options are left (for acausal), right (causal), both, or single (for active source experiments where the first sample is the trigger time)
    compute_tt : True  # Computes the travel time and adds it to tw_start (tt is 0 if not xstations). If True, a Rayleigh wave velocity has to be provided
    rayleigh_wave_velocity : 1  # Rayleigh wave velocity in km/s, will be ignored if compute_tt=False


    ### Range to try stretching
    # maximum stretch factor
    stretch_range : 0.03
    # number of stretching increments
    stretch_steps : 1001
    # Return the similarity matrix for each dv/v estimate
    # required for post processing. If False, saves some RAM and disk space
    return_sim_mat : True

    #### Reference trace extraction
    # win_inc : Length in days of each reference time window to be used for trace extraction
    # If == 0, only one trace will be extracted
    # If > 0, multiple reference traces will be used for the dv calculation
    # Can also be a list if the length of the windows should vary
    # See seismic.correlate.stream.CorrBulk.extract_multi_trace for details on arguments
    dt_ref : {'win_inc' : 0, 'method': 'mean', 'percentile': 50}

    # preprocessing on the correlation bulk or corr_stream before stretch estimation
    preprocessing: [
                    #{'function': 'pop_at_utcs', 'args': {'utcs': np.array([UTCDateTime()])}},  # Used to remove correlations from certain times
                    #{'function': 'select_time', 'args': {'start': (0, 0, 0), 'end': (23, 0, 0), 'exclude': False}},  # include only certain time of day (h, minute, second)
                    {'function': 'smooth', 'args': {'wsize': 48, 'wtype': 'hanning', 'axis': 1}}
    ]

    # postprocessing of dv objects before saving and plotting
    postprocessing: [
                    {'function': 'smooth_sim_mat', 'args': {'win_len': 7, 'exclude_corr_below': 0}}
    ]

#### parameters to compute the waveform coherence
wfc:
    # subfolder for storage of waveform coherence results
    subdir : 'wfc'

    ### Definition of calendar time windows for the waveform coherence measurements
    start_date : '2015-05-01 00:00:00.0'   # %Y-%m-%dT%H:%M:%S.%fZ'
    end_date : '2016-01-01 00:00:00.0'
    win_len : 86400                         # length of window in which EGFs are stacked
    date_inc : 86400                        # increment of measurements

    ### Frequencies
    # can be lists of same length
    freq_min : [0.0625, 0.09, 0.125, 0.25, 0.375, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5]
    freq_max : [0.125, 0.18, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 8, 10]

    ### Definition of lapse time window
    # can be lists of same length or tw_start: list and tw_len: single value (will be applied to all)
    tw_start : [0, 1.25, 2.5, 3.75, 5, 6.25, 7.5, 8.75, 10, 11.25, 12.5, 13.75, 15, 16.25, 17.5, 17.75, 20]     # lapse time of first sample [s]
    tw_len : 5       # length of window [s]

    #### Reference trace extraction
    # win_inc : Length in days of each reference time window to be used for trace extraction
    # If == 0, only one trace will be extracted
    # If > 0, multiple reference traces will be used for the dv calculation
    # Can also be a list if the length of the windows should vary
    # See seismic.correlate.stream.CorrBulk.extract_multi_trace for details on arguments
    dt_ref : {'win_inc' : 0, 'method': 'mean', 'percentile': 50}

    # preprocessing on the correlation bulk before stretch estimation
    preprocessing: [
                    #{'function': 'select_time', 'args': {'start': (0, 0, 0), 'end': (23, 0, 0), 'exclude': False}},  # include only certain time of day (h, minute, second)
                    {'function': 'smooth', 'args': {'wsize': 4, 'wtype': 'hanning', 'axis': 1}}
    ]

    ### SAVING
    # save components separately or only their average?
    save_comps: False

This might look a little intimidating at first glance, but it is actually quite straightforward. To get a better understanding of what each of the parameters does, let's have a closer look at them individually.

Project Wide Parameters#

#### Project wide parameters
# lowest level project directory
proj_dir : '/path/to/project/root/'
# Save the component combinations in separate hdf5 files
# This is faster for multi-core if True
# Set to False for compatibility with SeisMIC version < 0.5.0
save_comps_separately: True
# directory for logging information
log_subdir : 'log'
# levels:
# 'DEBUG', 'INFO', 'WARNING', 'ERROR', or 'CRITICAL'
log_level: 'WARNING'
# folder for figures
fig_subdir : 'figures'

These parameters govern the logging and the file structure. proj_dir is the root directory that we chose when initialising our Store_Client as shown here. fig_subdir and log_subdir are just subdirectories for figures and logs, respectively, and log_level decides how much will actually be logged. In most cases, WARNING is the appropriate choice - everything below will start spitting out a lot of information.
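
The subdirectories are simply resolved against the project root, roughly like this (a sketch, not SeisMIC's actual code):

import os

proj_dir = '/path/to/project/root/'
log_dir = os.path.join(proj_dir, 'log')      # log_subdir
fig_dir = os.path.join(proj_dir, 'figures')  # fig_subdir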

Network Specific Parameters#

#### parameters that are network specific
net:
    # list of stations used in the project
    # type: list of strings or string, wildcards allowed
    network : 'D0'
    station : ['BZG', 'ESO', 'KBG', 'KIR']
    # For "component" only strings are allowed (no lists!)
    component: 'Z'  # Use * or ? or None to allow all components

Here, we decide which data to use (i.e., which data the correlator will look for and read in). All parameters accept wildcards. network and station can be strings or lists; component has to be a single string.

Note

If both network and station are lists, they have to have the same length. They are paired element-wise: net0 with stat0, net1 with stat1, and so on (see the sketch below).
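
A minimal sketch of that pairing logic in python (hypothetical station names):

networks = ['D0', 'D0', 'XX']
stations = ['BZG', 'ESO', 'ABC']

# element-wise pairing, like zip(): D0.BZG, D0.ESO, XX.ABC
pairs = [f'{n}.{s}' for n, s in zip(networks, stations)]
print(pairs)  # ['D0.BZG', 'D0.ESO', 'XX.ABC']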

Correlation Arguments#

This is the really juicy stuff and probably the part that will have the strongest influence on your results. Let’s start by getting the most obvious parameters out of the way:

#### parameters for correlation (empirical Green's function creation)
co:
    # subdirectory of 'proj_dir' to store correlation
    # type: string
    subdir : 'corr'
    # time sequences to read for clipping or muting on stream basis
    # These should be long enough for the reference (e.g., the standard
    # deviation) to be rather independent of the parts to remove
    # type: string
    read_start : '2015-08-01 00:00:01.0'
    read_end : '2015-08-06 00:00:00.0'
    # type: float [seconds]
    # The longer, the faster, but higher RAM usage.
    # Note that this is also the length of the correlation batches
    # that will be written (i.e., length that will be
    # kept in memory before writing to disk)
    read_len : 86398
    read_inc : 86400
  • subdir The directory to save the correlations in (correlations are generally saved in hdf5 format).

  • read_start and read_end are the earliest and latest dates that you want to read

  • read_len is the length (in seconds) of each data batch that will be read in. Usually, you have one mseed file per day; to avoid having to read several files at once, you will want this to be slightly less than a day

  • read_inc is the increment between each reading interval

Note

Neither read_len nor read_inc determines the correlation length; that is governed by the subdivision parameters discussed below.
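
To make the reading scheme concrete: batches advance by read_inc while each batch spans read_len seconds (a sketch using obspy's UTCDateTime with the values from above):

from obspy import UTCDateTime

read_start = UTCDateTime('2015-08-01T00:00:01')
read_end = UTCDateTime('2015-08-06T00:00:00')
read_len, read_inc = 86398, 86400

# one batch per day, each just short of 24 h, so that only a single
# daily mseed file has to be read per batch
t = read_start
while t + read_len <= read_end:
    print(t, '->', t + read_len)
    t += read_inc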

    # New sampling rate in Hz. Note that it will try to decimate
    # if possible (i.e., there is an integer factor from the
    # native sampling_rate)
    sampling_rate: 25
    # Remove the instrument response, will take substantially more time
    remove_response: False

    # Method to combine different traces
    combination_method : 'betweenStations'

    # If you want only specific combinations to be computed enter them here
    # In the form [Net0-Net1.Stat0-Stat1] or [Net0-Net1.Stat0-Stat1.Cha0-Cha1]
    # This option will only be considered if combination_method == 'betweenStations'
    # Comment or set == None if not in use
    xcombinations : None
  • sampling_rate is the new sampling rate you want your data to have. SeisMIC will take care of anti-alias filtering and determine whether the data can be decimated.

  • remove_response: if you want the data to be corrected for the instrument response, set this to True. Instrument response removal in obspy is unfortunately quite computationally expensive.

  • combination_method decides which components you will want to correlate. See calc_cross_combis() for allowed options.

  • xcombinations If you want to save computational resources and only compute specific combinations, set them here. If you want to limit the maximum distance between stations to cross-correlate, you can use seismic.correlate.correlate.Correlator.find_interstat_dist() to compute this parameter, as sketched below.
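
A possible workflow (a sketch; the FDSN client, project path, and the 35 km threshold are placeholders, and the exact call signature should be checked against the find_interstat_dist() documentation):

from obspy.clients.fdsn import Client
from seismic.correlate.correlate import Correlator
from seismic.trace_data.waveform import Store_Client

sc = Store_Client(Client('GEOFON'), '/path/to/project/root', read_only=True)
c = Correlator(sc, options='params.yaml')
# fill xcombinations with station pairs closer than 35 km (distance assumed in m)
c.find_interstat_dist(35000)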

Preprocessing Arguments#

SeisMIC is coded in a manner that makes it easy for the user to pass custom preprocessing functions. Custom functions can be defined in the three parameters preProcessing, TDpreProcessing, and FDpreProcessing. All these parameters expect a list of dictionaries as input. Each dictionary must have the keys function and args. The value for function is a string describing the complete import path of the preprocessing function in the form 'package.module.submodule.function'. args is simply a keyword argument dictionary that will be passed to the function.
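
For illustration, a custom stream-preprocessing function (hypothetical; a simple demean) then only needs to accept the Stream as its first argument and return a Stream, with the entries of args passed as keyword arguments:

import obspy


# referenced in the yaml as
# {'function': 'mymodule.demean_st', 'args': {}}
def demean_st(st: obspy.Stream) -> obspy.Stream:
    for tr in st:
        tr.detrend('demean')  # remove the mean of each trace
    return st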

SeisMIC comes with a number of preprocessing functions. If you are creating a custom preprocessing function, it is probably a good idea to have a look at these first in order to understand the required syntax. Preprocessing is generally done in three steps:

Preprocessing “on a per-stream basis” All functions here take an obspy stream as input and return the processed stream. An overview of available stream preprocessing functions can be found in preprocessing_stream.

    # preprocessing of the original length time series
    # these functions work on an obspy.Stream object given as first argument
    # and return an obspy.Stream object.
    preProcessing : [
                    {'function':'seismic.correlate.preprocessing_stream.detrend_st',
                    'args':{'type':'linear'}},
                    {'function':'seismic.correlate.preprocessing_stream.cos_taper_st',
                    'args':{'taper_len': 100, # seconds
                            'lossless': True}}, # lossless tapering stitches additional data to trace, tapers, and removes the tapered ends after preprocessing
                    # This is intended as a "technical" bandpass filter to remove unphysical signals, i.e., frequencies you would not expect in the data
                    {'function':'seismic.correlate.preprocessing_stream.stream_filter',
                    'args':{'ftype':'bandpass',
                            'filter_option':{'freqmin':0.01, #0.01
                                            'freqmax':12}}}
                    # Mute data, when the amplitude is above a certain threshold
                    #{'function':'seismic.correlate.preprocessing_stream.stream_mute',
                    # 'args':{'taper_len':100,
                    #         'mute_method':'std_factor',
                    #         'mute_value':3}}
                    ]

Preprocessing on arrays in time and frequency domain The functions to use have to be provided in corr_args['TDpreProcessing'] and corr_args['FDpreProcessing']. A custom function needs to take a matrix as input, where each column is one waveform in time or frequency domain. Additionally, the args dictionary and a params dictionary will be passed.
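
A custom time-domain function could then look like this (a sketch following the signature described above; the RMS normalisation itself is a hypothetical example):

import numpy as np


def rms_norm(matrix: np.ndarray, args: dict, params: dict) -> np.ndarray:
    # matrix holds one waveform per column
    rms = np.sqrt(np.mean(matrix**2, axis=0, keepdims=True))
    rms[rms == 0] = 1  # avoid division by zero on dead traces
    return matrix / rms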

# parameters for correlation preprocessing
# Standard functions reside in seismic.correlate.preprocessing_td
corr_args : {'TDpreProcessing':[
                                # detrend not recommended. Use preProcessing detrend_st instead (faster)
                                # {'function':'seismic.correlate.preprocessing_td.detrend',
                                # 'args':{'type':'linear'}},
                                {'function':'seismic.correlate.preprocessing_td.TDfilter',
                                 'args':{'type':'bandpass','freqmin':2,'freqmax':8}},
                                #{'function':'seismic.correlate.preprocessing_td.mute',
                                # 'args':{'taper_len':100.,
                                #         'threshold':1000, # absolute threshold
                                #         'std_factor':3,
                                #         'filter':{'type':'bandpass','freqmin':2,'freqmax':4},
                                #         'extend_gaps':True}},
                                {'function':'seismic.correlate.preprocessing_td.clip',
                                 'args':{'std_factor':2.5}},
                                {'function':'seismic.correlate.preprocessing_td.signBitNormalization',
                                 'args': {}}
                                ],
              # Standard functions reside in seismic.correlate.preprocessing_fd
             'FDpreProcessing':[
                                {'function':'seismic.correlate.preprocessing_fd.spectralWhitening',
                                 'args':{'joint_norm':False}},
                                {'function':'seismic.correlate.preprocessing_fd.FDfilter',
                                 'args':{'flimit':[0.01,0.02,9,10]}}
                                #{'function':seismic.correlate.preprocessing_fd.FDsignBitNormalization,
                                # 'args':{}}
                                ],
            ...
            }

Arguments for the actual correlation#

subdivision is the parameter that determines the length and increment of the noise windows to be preprocessed and correlated. If recombine_subdivision=True, the correlations will be stacked to read_len.

  • lengthToSave is the length of each correlation function in seconds

  • center_correlation If True, zero lag will always be in the middle of the function.

  • normalize_correlation If True, the correlation is normalised by its absolute maximum.

    # subdivision of the read sequences for correlation
    # type: presence of this key
    subdivision:
        # type: float [seconds]
        corr_inc : 3600
        corr_len : 3600
        # recombine these subdivisions
        # currently unused
        # type: boolean
        recombine_subdivision : True
        # delete
        # type: boolean
        delete_subdivision : False

    # parameters for correlation preprocessing
    # Standard functions reside in seismic.correlate.preprocessing_td
    corr_args : {'lengthToSave':100,
                'center_correlation':True,      # make sure zero correlation time is in the center
                'normalize_correlation':True,
                'combinations':[]
                }
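
With the values above, each read batch is divided into hourly windows that are later stacked back to one daily correlation; the bookkeeping is simple arithmetic:

read_len, corr_len, corr_inc = 86398, 3600, 3600

# number of full correlation windows per read batch
n_windows = (read_len - corr_len) // corr_inc + 1
print(n_windows)  # 23 hourly correlations, stacked to one daily function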

The rest of the yaml file will be discussed at a later point. Now, let's actually start the computation!
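
In a python script, this could look roughly as follows (a sketch; it assumes the Store_Client from the download step and the parameter file shown above):

from obspy.clients.fdsn import Client
from seismic.correlate.correlate import Correlator
from seismic.trace_data.waveform import Store_Client

# read-only access to the raw-data store created in the download step
sc = Store_Client(Client('GEOFON'), '/path/to/project/root', read_only=True)

c = Correlator(sc, options='params.yaml')  # or pass the params dictionary
c.pxcorr()  # compute and save all correlations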