Workflow overview: Get Data → Correlation → (save) CorrDB/HDF5 → Measure dv → (save) DV

Computing your first Noise Correlations#

After having downloaded your data as shown here (or retrieved seismic data any other way), we are ready to compute our first noise correlations! In SeisMIC, parameters for the preprocessing, the correlation, and, subsequently, measuring the change in seismic velocity are provided as a .yaml file or as a Python dictionary. An example of such a .yaml file is shown below (and provided in the repository as params_example.yaml).
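For illustration, the same parameters can equally be passed as a nested Python dictionary whose keys mirror the sections of the .yaml file. The values below are a small excerpt, not a complete parameter set:

```python
# Excerpt of the parameter set as a Python dictionary; SeisMIC accepts
# either such a dictionary or the path to the .yaml file.
# Keys mirror the sections of the yaml file.
params = {
    'proj_dir': '/path/to/project/root/',
    'log_level': 'WARNING',
    'net': {'network': 'D0', 'station': ['BZG', 'ESO'], 'component': 'Z'},
    'co': {'sampling_rate': 25, 'remove_response': False},
}
print(params['co']['sampling_rate'])  # 25
```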

Setting the parameters#

#### Project wide parameters
# lowest level project directory
proj_dir : '/path/to/project/root/'
# Save the component combinations in separate hdf5 files
# This is faster for multi-core if True
# Set to False for compatibility with SeisMIC version < 0.5.0
save_comps_separately: True
# directory for logging information
log_subdir : 'log'
# levels:
# 'DEBUG', 'INFO', 'WARNING', 'ERROR', or 'CRITICAL'
log_level: 'WARNING'
# folder for figures
fig_subdir : 'figures'
# Path to stationxml files, default is "inventory/*.xml"
stationxml_file : 'inventory/*.xml'
# sds root directory, default is "mseed"
sds_dir : 'mseed'
# sds format string of waveform file names,
# change if your filenames deviate from standard pattern
sds_fmtstr : '{year}/{network}/{station}/{channel}.{sds_type}/{network}.{station}.{location}.{channel}.{sds_type}.{year}.{doy:03d}'



#### parameters that are network specific
net:
    # list of stations used in the project
    # type: list of strings or string, wildcards allowed
    network : 'D0'
    station : ['BZG', 'ESO', 'KBG', 'KIR']
    # For "component" only strings are allowed (no lists!)
    component: 'Z'  # Use * or ? or None to allow all components

#### parameters for correlation (empirical Green's function creation)
co:
    # subdirectory of 'proj_dir' to store correlation
    # type: string
    subdir : 'corr'
    # time sequences to read for clipping or muting on stream basis
    # These should be long enough for the reference (e.g. the standard
    # deviation) to be rather independent of the parts to remove
    # type: string
    read_start : '2015-08-01 00:00:01.0'
    read_end : '2015-08-06 00:00:00.0'
    # type: float [seconds]
    # The longer the faster, but higher RAM usage.
    # Note that this is also the length of the correlation batches
    # that will be written (i.e., length that will be
    # kept in memory before writing to disk)
    # If you are unsure, keep defaults
    read_len : 86398
    read_inc : 86400

    # New sampling rate in Hz. Note that it will try to decimate
    # if possible (i.e., there is an integer factor from the
    # native sampling_rate). Pay attention to Nyquist frequency for
    # later steps
    sampling_rate : 25
    # Remove the instrument response, will take substantially more time
    remove_response : False

    # Method to combine different traces
    # Options are: 'betweenStations', 'betweenComponents', 'autoComponents', 'allSimpleCombinations', or 'allCombinations'
    combination_method : 'betweenStations'

    # If you want only specific combinations to be computed enter them here
    # In the form [Net0-Net1.Stat0-Stat1] or [Net0-Net1.Stat0-Stat1.Cha0-Cha1]
    # This option will only be considered if combination_method == 'betweenStations'
    # Comment or set == None if not in use
    xcombinations : None

    # preprocessing of the original length time series
    # these functions work on an obspy.Stream object given as first argument
    # and return an obspy.Stream object.
    # available pre-implemented functions are in seismic.correlate.preprocessing_stream
    preProcessing : [
                    # This is intended as a "technical" bandpass filter to remove unphysical signals, i.e., frequencies you would not expect in the data
                    {'function':'seismic.correlate.preprocessing_stream.stream_filter',
                    'args':{'ftype':'bandpass',
                            'filter_option':{'freqmin':0.01, #0.01
                                            'freqmax':12}}}
                    # Mute data, when the amplitude is above a certain threshold
                    #{'function':'seismic.correlate.preprocessing_stream.stream_mute',
                    # 'args':{'taper_len':100,
                    #         'mute_method':'std_factor',
                    #         'mute_value':3}}
                    ]
    # subdivision of the read sequences for correlation
    # if this is set the stream processing will happen on the hourly subdivision. This has the
    # advantage that data that already exists will not need to be preprocessed again
    # On the other hand, computing a whole new database might be slower
    # Recommended to be set to True if:
    # a) You update your database and a lot of the data is already available (up to a magnitude faster)
    # b) corr_len is close to read_len
    # Is automatically set to False if no existing correlations are found
    preprocess_subdiv: True
    # type: presence of this key
    subdivision:
        # type: float [seconds]
        corr_inc : 3600
        corr_len : 3600
        # recombine these subdivisions
        # unused at the time
        # type: boolean
        recombine_subdivision : True
        # delete
        # type: boolean
        delete_subdivision : False

    # parameters for correlation preprocessing
    # Standard functions reside in seismic.correlate.preprocessing_td and preprocessing_fd, respectively
    corr_args : {'TDpreProcessing':[
                                {'function':'seismic.correlate.preprocessing_td.taper',
                                 'args':{'type':'cosine_taper', 'p': 0.02}},
                                {'function':'seismic.correlate.preprocessing_td.detrend',
                                 'args':{'type':'linear'}},
                                {'function':'seismic.correlate.preprocessing_td.TDfilter',
                                 'args':{'type':'bandpass','freqmin':2,'freqmax':8}},
                                #{'function':'seismic.correlate.preprocessing_td.mute',
                                # 'args':{'taper_len':100.,
                                #         'threshold':1000, # absolute threshold
                                #         'std_factor':3,
                                #         'filter':{'type':'bandpass','freqmin':2,'freqmax':4},
                                #         'extend_gaps':True}},
                                {'function':'seismic.correlate.preprocessing_td.clip',
                                 'args':{'std_factor':2.5}},
                                {'function':'seismic.correlate.preprocessing_td.signBitNormalization',
                                 'args': {}}
                                ],
                # Standard functions reside in seismic.correlate.preprocessing_fd
                'FDpreProcessing':[
                                {'function':'seismic.correlate.preprocessing_fd.spectralWhitening',
                                 'args':{'joint_norm':False}},
                                {'function':'seismic.correlate.preprocessing_fd.FDfilter',
                                 'args':{'flimit':[0.01,0.02,9,10]}}
                                #{'function':'seismic.correlate.preprocessing_fd.FDsignBitNormalization',
                                # 'args':{}}
                                ],
                'lengthToSave':100,
                'center_correlation':True,      # make sure zero correlation time is in the center
                'normalize_correlation':True,
                'combinations':[]
                }


#### parameters for the estimation of time differences
dv:
    # subfolder for storage of time difference results
    subdir : 'vel_change'

    # Plotting
    plot_vel_change : True

    ### Definition of calendar time windows for the time difference measurements
    start_date : '2015-05-01 00:00:00.0'   # %Y-%m-%dT%H:%M:%S.%fZ'
    end_date : '2016-01-01 00:00:00.0'
    win_len : 86400                         # length of window in which EGFs are stacked
    date_inc : 86400                        # increment of measurements

    ### Frequencies
    freq_min : 4
    freq_max : 8

    ### Definition of lapse time window, i.e. time window in the coda that is used for the dv/v estimate
    tw_start : 20     # lapse time of first sample [s]
    tw_len : 60       # length of window [s] Can be None if the whole (rest) of the coda should be used
    sides : 'both'    # options are left (for acausal), right (causal), both, or single (for active source experiments where the first sample is the trigger time)
    compute_tt : True  # Computes the travel time and adds it to tw_start (tt is 0 if not xstations). If true a rayleigh wave velocity has to be provided
    rayleigh_wave_velocity : 1  # rayleigh wave velocity in km/s, will be ignored if compute_tt=False


    ### Range to try stretching
    # maximum stretch factor
    stretch_range : 0.03
    # number of stretching increments
    stretch_steps : 1001
    # Return the similarity matrix for each dv/v estimate
    # required for post processing. If False, saves some RAM and disk space
    return_sim_mat : True

    #### Reference trace extraction
    # win_inc : Length in days of each reference time window to be used for trace extraction
    # If == 0, only one trace will be extracted
    # If > 0, multiple reference traces will be used for the dv calculation
    # Can also be a list if the length of the windows should vary
    # See seismic.correlate.stream.CorrBulk.extract_multi_trace for details on arguments
    dt_ref : {'win_inc' : 0, 'method': 'mean', 'percentile': 50}

    # preprocessing on the correlation bulk or corr_stream before stretch estimation
    preprocessing: [
                    #{'function': 'pop_at_utcs', 'args': {'utcs': np.array([UTCDateTime()])}},  # Used to remove correlations from certain times
                    #{'function': 'select_time', 'args': {'start': (0, 0, 0), 'end': (23, 0, 0), 'exclude': False}},  # include only certain time of day (h, minute, second)
                    {'function': 'smooth', 'args': {'wsize': 48, 'wtype': 'hanning', 'axis': 1}}
    ]

    # postprocessing of dv objects before saving and plotting
    postprocessing: [
                    {'function': 'smooth_sim_mat', 'args': {'win_len': 7, 'exclude_corr_below': 0}}
    ]

#### parameters to compute the waveform coherence
wfc:
    # subfolder for storage of waveform coherence results
    subdir : 'wfc'

    ### Definition of calendar time windows for the waveform coherence measurements
    start_date : '2015-05-01 00:00:00.0'   # %Y-%m-%dT%H:%M:%S.%fZ'
    end_date : '2016-01-01 00:00:00.0'
    win_len : 86400                         # length of window in which EGFs are stacked
    date_inc : 86400                        # increment of measurements

    ### Frequencies
    # can be lists of same length
    freq_min : [0.0625, 0.09, 0.125, 0.25, 0.375, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5]
    freq_max : [0.125, 0.18, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 8, 10]

    ### Definition of lapse time window
    # can be lists of same length or tw_start: List and tw_len: single value (will be applied to all)
    tw_start : [0, 1.25, 2.5, 3.75, 5, 6.25, 7.5, 8.75, 10, 11.25, 12.5, 13.75, 15, 16.25, 17.5, 17.75, 20]     # lapse time of first sample [s]
    tw_len : 5       # length of window [s]

    #### Reference trace extraction
    # win_inc : Length in days of each reference time window to be used for trace extraction
    # If == 0, only one trace will be extracted
    # If > 0, multiple reference traces will be used for the dv calculation
    # Can also be a list if the length of the windows should vary
    # See seismic.correlate.stream.CorrBulk.extract_multi_trace for details on arguments
    dt_ref : {'win_inc' : 0, 'method': 'mean', 'percentile': 50}

    # preprocessing on the correlation bulk before stretch estimation
    preprocessing: [
                    #{'function': 'select_time', 'args': {'start': (0, 0, 0), 'end': (23, 0, 0), 'exclude': False}},  # include only certain time of day (h, minute, second)
                    {'function': 'smooth', 'args': {'wsize': 4, 'wtype': 'hanning', 'axis': 1}}
    ]

    ### SAVING
    # save components separately or only their average?
    save_comps: False

This might look a little intimidating at first glance, but it is actually quite straightforward. To better understand what each parameter does, let’s have a closer look at them individually.

Project Wide Parameters#

#### Project wide parameters
# lowest level project directory
proj_dir : '/path/to/project/root/'
# Save the component combinations in separate hdf5 files
# This is faster for multi-core if True
# Set to False for compatibility with SeisMIC version < 0.5.0
save_comps_separately: True
# directory for logging information
log_subdir : 'log'
# levels:
# 'DEBUG', 'INFO', 'WARNING', 'ERROR', or 'CRITICAL'
log_level: 'WARNING'
# folder for figures
fig_subdir : 'figures'
# Path to stationxml files, default is "inventory/*.xml"
stationxml_file : 'inventory/*.xml'
# sds root directory, default is "mseed"
sds_dir : 'mseed'
# sds format string of waveform file names,
# change if your filenames deviate from standard SDS pattern
sds_fmtstr : '{year}/{network}/{station}/{channel}.{sds_type}/{network}.{station}.{location}.{channel}.{sds_type}.{year}.{doy:03d}'

These parameters govern the logging and the file structure. proj_dir is the root directory we chose when initialising our Store_Client as shown here. fig_subdir and log_subdir are just subdirectories for figures and logs, respectively, and log_level decides how much will actually be logged. In most cases, WARNING is the appropriate choice - everything below will start spitting out a lot of information. stationxml_file is the path to the stationxml files, and sds_dir is the directory where the mseed files are stored. sds_fmtstr is the format string for the mseed file names; the default is the standard SeisComP Data Structure (SDS) pattern. Leave these parameters at their default values if you used SeisMIC to download the data. However, if you want to use your own local database, you will have to adjust them accordingly. Wildcards are permitted.

The sds_fmtstr defines the naming pattern of the mseed files. The following placeholders (everything in {}) can be used: year, network, station, channel, location, sds_type, doy. The standard pattern for SeisComP3 is:

sds_fmtstr : '{year}/{network}/{station}/{channel}.{sds_type}/{network}.{station}.{location}.{channel}.{sds_type}.{year}.{doy:03d}'
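To see how the placeholders expand, here is a quick sketch using Python’s str.format with hypothetical example values (empty location code and sds_type 'D', as is typical for SDS archives):

```python
# The SDS format string from above, expanded with hypothetical values
sds_fmtstr = ('{year}/{network}/{station}/{channel}.{sds_type}/'
              '{network}.{station}.{location}.{channel}.{sds_type}'
              '.{year}.{doy:03d}')
path = sds_fmtstr.format(
    year=2015, network='D0', station='BZG', location='',
    channel='HHZ', sds_type='D', doy=7)
print(path)  # 2015/D0/BZG/HHZ.D/D0.BZG..HHZ.D.2015.007
```

Note how `{doy:03d}` zero-pads the day of the year to three digits, matching the on-disk SDS naming.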

Network Specific Parameters#

#### parameters that are network specific
net:
    # list of stations used in the project
    # type: list of strings or string, wildcards allowed
    network : 'D0'
    station : ['BZG', 'ESO', 'KBG', 'KIR']
    # For "component" only strings are allowed (no lists!)
    component: 'Z'  # Use * or ? or None to allow all components

Here, we decide which data to use (i.e., which data the correlator will look for and read in). All parameters accept wildcards. Network and Station can be strings or lists.

Note

If both network and station are lists, they have to have the same length. The corresponding logic is: [net0, net1, net2, net3] and [stat0, stat1, stat2, stat3] and so on.
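In other words, equal-length lists are paired index by index, which can be sketched like this (the codes here are just placeholders):

```python
# Equal-length network and station lists are combined element-wise
network = ['net0', 'net1', 'net2', 'net3']
station = ['stat0', 'stat1', 'stat2', 'stat3']
pairs = list(zip(network, station))
print(pairs[0])  # ('net0', 'stat0')
```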

Correlation Arguments#

This is the really juicy stuff and probably the part that will have the strongest influence on your results. Let’s start by getting the most obvious parameters out of the way:

#### parameters for correlation (empirical Green's function creation)
co:
    # subdirectory of 'proj_dir' to store correlation
    # type: string
    subdir : 'corr'
    # time sequences to read for clipping or muting on stream basis
    # These should be long enough for the reference (e.g. the standard
    # deviation) to be rather independent of the parts to remove
    # type: string
    read_start : '2015-08-01 00:00:01.0'
    read_end : '2015-08-06 00:00:00.0'
    # type: float [seconds]
    # The longer the faster, but higher RAM usage.
    # Note that this is also the length of the correlation batches
    # that will be written (i.e., length that will be
    # kept in memory before writing to disk)
    read_len : 86398
    read_inc : 86400
  • subdir The directory to save the correlations in (correlations are generally saved in hdf5 format).

  • read_start and read_end are the earliest and latest dates that you want to read

  • read_len the length that will be read in. Usually, you have one mseed file per day. To avoid having to read several files, you will want that to be a bit less than a day

  • read_inc is the increment between each reading interval

Note

Neither read_len nor read_inc determines the correlation length.
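A rough sketch of how the reading batches are laid out with the example values (SeisMIC’s internal handling of edges may differ slightly):

```python
from datetime import datetime, timedelta

read_start = datetime(2015, 8, 1, 0, 0, 1)
read_end = datetime(2015, 8, 6)
read_len = 86398   # seconds of data read per batch
read_inc = 86400   # seconds between batch starts

# Each batch reads read_len seconds, starting read_inc seconds apart
windows = []
t = read_start
while t < read_end:
    windows.append((t, t + timedelta(seconds=read_len)))
    t += timedelta(seconds=read_inc)
print(len(windows))  # 5 daily batches
```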

    # New sampling rate in Hz. Note that it will try to decimate
    # if possible (i.e., there is an integer factor from the
    # native sampling_rate)
    sampling_rate: 25
    # Remove the instrument response, will take substantially more time
    remove_response: False

    # Method to combine different traces
    combination_method : 'betweenStations'

    # If you want only specific combinations to be computed enter them here
    # In the form [Net0-Net0.Stat0-Stat1]
    # This option will only be considered if combination_method == 'betweenStations'
    # Comment or set == None if not in use
    xcombinations : None
  • sampling_rate is the new sampling rate your data will be resampled to. SeisMIC takes care of anti-alias filtering and determines whether the data can be decimated.

  • remove_response set this to True if you want the data to be corrected for the instrument response. Instrument response removal in obspy is unfortunately quite expensive.

  • combination_method decides which components you will want to correlate. See calc_cross_combis() for allowed options.

  • xcombinations If you want to save some computational resources and only compute specific combinations, use this parameter. If you want to limit the maximum distance between stations to cross-correlate, you can use seismic.correlate.correlate.Correlator.find_interstat_dist() to compute this parameter
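For instance, all station pairs of the example network could be listed explicitly with itertools; the strings below simply follow the Net0-Net1.Stat0-Stat1 form described above:

```python
from itertools import combinations

# All inter-station pairs for the four example stations of network D0
stations = ['BZG', 'ESO', 'KBG', 'KIR']
xcombinations = [f'D0-D0.{a}-{b}' for a, b in combinations(stations, 2)]
print(xcombinations[0])  # D0-D0.BZG-ESO
```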

Preprocessing Arguments#

SeisMIC is coded in a manner that makes it easy for the user to pass custom preprocessing functions. Custom functions can be defined in the three parameters preProcessing, TDpreProcessing, and FDpreProcessing. All these parameters expect a list of dictionaries as input. Each dictionary must have the keys function and args. The value for function is a string describing the complete import path of the preprocessing function in the form ‘package.module.submodule.function’. args is simply a keyword argument dictionary that will be passed to the function.
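Internally, such an import-path string has to be resolved to a callable. A minimal sketch of that mechanism (the helper below is illustrative, not SeisMIC’s actual implementation):

```python
from importlib import import_module
import os.path

def resolve_function(import_path):
    """Resolve a 'package.module.function' string to the callable."""
    module_path, func_name = import_path.rsplit('.', 1)
    return getattr(import_module(module_path), func_name)

# os.path.join stands in here for a preprocessing function
func = resolve_function('os.path.join')
print(func is os.path.join)  # True
```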

SeisMIC comes with a number of preprocessing functions. If you are creating a custom preprocessing function, it is probably a good idea to have a look at these first in order to understand the required syntax. Preprocessing is generally done in three steps:

Preprocessing on a per-stream basis: all functions here take an obspy Stream as input and return the processed stream. An overview of available stream preprocessing functions can be found in preprocessing_stream.

    # preprocessing of the original length time series
    # these functions work on an obspy.Stream object given as first argument
    # and return an obspy.Stream object.
    preProcessing : [
                    {'function':'seismic.correlate.preprocessing_stream.detrend_st',
                    'args':{'type':'linear'}},
                    {'function':'seismic.correlate.preprocessing_stream.cos_taper_st',
                    'args':{'taper_len': 100, # seconds
                            'lossless': True}}, # lossless tapering stitches additional data to trace, tapers, and removes the tapered ends after preprocessing
                    # This is intended as a "technical" bandpass filter to remove unphysical signals, i.e., frequencies you would not expect in the data
                    {'function':'seismic.correlate.preprocessing_stream.stream_filter',
                    'args':{'ftype':'bandpass',
                            'filter_option':{'freqmin':0.01, #0.01
                                            'freqmax':12}}}
                    # Mute data, when the amplitude is above a certain threshold
                    #{'function':'seismic.correlate.preprocessing_stream.stream_mute',
                    # 'args':{'taper_len':100,
                    #         'mute_method':'std_factor',
                    #         'mute_value':3}}
                    ]

Preprocessing on arrays in the time and frequency domain. The functions to use have to be provided in corr_args['TDpreProcessing'] and corr_args['FDpreProcessing']. A custom function would need to take a matrix as input, where each column is one waveform in the time or frequency domain. Additionally, the args dictionary and a params dictionary will be passed.

# parameters for correlation preprocessing
# Standard functions reside in seismic.correlate.preprocessing_td
corr_args : {'TDpreProcessing':[
                                # detrend not recommended. Use preProcessing detrend_st instead (faster)
                                # {'function':'seismic.correlate.preprocessing_td.detrend',
                                # 'args':{'type':'linear'}},
                               {'function':'seismic.correlate.preprocessing_td.TDfilter',
                               'args':{'type':'bandpass','freqmin':2,'freqmax':8}},
                                #{'function':'seismic.correlate.preprocessing_td.mute',
                                # 'args':{'taper_len':100.,
                                #         'threshold':1000, # absolute threshold
                                #         'std_factor':3,
                                #         'filter':{'type':'bandpass','freqmin':2,'freqmax':4},
                                #         'extend_gaps':True}},
                               {'function':'seismic.correlate.preprocessing_td.clip',
                                'args':{'std_factor':2.5}},
                               {'function':'seismic.correlate.preprocessing_td.signBitNormalization',
                                'args': {}}
                               ],
              # Standard functions reside in seismic.correlate.preprocessing_fd
             'FDpreProcessing':[
                                {'function':'seismic.correlate.preprocessing_fd.spectralWhitening',
                                 'args':{'joint_norm':False}},
                                {'function':'seismic.correlate.preprocessing_fd.FDfilter',
                                 'args':{'flimit':[0.01,0.02,9,10]}}
                                #{'function':'seismic.correlate.preprocessing_fd.FDsignBitNormalization',
                                # 'args':{}}
                                ],
            ...
            }
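A custom time-domain function following this convention could look like the sketch below. demean_columns is a hypothetical example, not part of SeisMIC; it takes the waveform matrix, the args dict from the yaml entry, and the global params dict:

```python
import numpy as np

def demean_columns(matrix, args, params):
    """Remove the mean of each waveform (one waveform per column)."""
    # args and params are accepted to match the expected call signature
    return matrix - matrix.mean(axis=0, keepdims=True)

# two waveforms of three samples each, one per column
data = np.array([[1.0, 4.0],
                 [2.0, 5.0],
                 [3.0, 6.0]])
out = demean_columns(data, {}, {})
print(out[:, 0])  # [-1.  0.  1.]
```

Such a function would then be referenced by its import path in the TDpreProcessing list, exactly like the built-in functions above.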

Arguments for the actual correlation#

subdivision is the parameter that determines the length and increment of the noise windows to be preprocessed and correlated. If recombine_subdivision is True, the correlations will be stacked to read_len.

  • lengthToSave is the length of each correlation function in seconds

  • center_correlation If True, zero lag will always be in the middle of the function.

  • normalize_correlation: Normalise the correlation by the absolute maximum?

    # subdivision of the read sequences for correlation
    # type: presence of this key
    subdivision:
        # type: float [seconds]
        corr_inc : 3600
        corr_len : 3600
        # recombine these subdivisions
        # unused at the time
        # type: boolean
        recombine_subdivision : True
        # delete
        # type: boolean
        delete_subdivision : False

    # parameters for correlation preprocessing
    # Standard functions reside in seismic.correlate.preprocessing_td
    corr_args : {'lengthToSave':100,
                'center_correlation':True,      # make sure zero correlation time is in the center
                'normalize_correlation':True,
                'combinations':[]
                }
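With the example values, the relation between read_len and the subdivision can be sketched with simple arithmetic (edge handling inside SeisMIC may differ slightly):

```python
read_len = 86398   # seconds read per batch
corr_len = 3600    # each correlation window is one hour long
corr_inc = 3600    # windows start one hour apart

# number of full hourly windows fitting into one read batch
n_windows = (read_len - corr_len) // corr_inc + 1
print(n_windows)  # 23
```

With recombine_subdivision set to True, these hourly correlations are then stacked back to one function per read batch.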

The rest of the yaml file will be discussed at a later point. Now, let’s actually start the computation!