Workflow overview: Get Data → Correlation → (save) CorrDB/HDF5 → Measure dv → (save) DV

Computing your first Noise Correlations#

After having downloaded your data as shown here (or retrieved seismic data any other way), we are ready to compute our first noise correlations! In SeisMIC, parameters for the preprocessing, the correlation, and, subsequently, measuring the change in seismic velocity are provided as a .yaml file or as a Python dictionary. An example of such a .yaml file is shown below (and provided in the repository as params_example.yaml).
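For illustration, the same parameters can equally be passed as a nested Python dictionary whose keys mirror the sections of the .yaml file. The values below are a small excerpt, not a complete parameter set:

```python
# Excerpt of the parameter set as a Python dictionary; SeisMIC accepts
# either such a dictionary or the path to the .yaml file.
# Keys mirror the sections of the yaml file.
params = {
    'proj_dir': '/path/to/project/root/',
    'log_level': 'WARNING',
    'net': {'network': 'D0', 'station': ['BZG', 'ESO'], 'component': 'Z'},
    'co': {'sampling_rate': 25, 'remove_response': False},
}
print(params['co']['sampling_rate'])  # 25
```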

Setting the parameters#

#### Project wide parameters
# lowest level project directory
proj_dir : '/path/to/project/root/'
# Save the component combinations in separate hdf5 files
# This is faster for multi-core if True
# Set to False for compatibility with SeisMIC version < 0.5.0
save_comps_separately: True
# directory for logging information
log_subdir : 'log'
# levels:
# 'DEBUG', 'INFO', 'WARNING', 'ERROR', or 'CRITICAL'
log_level: 'WARNING'
# folder for figures
fig_subdir : 'figures'
# Path to stationxml files, default is "inventory/*.xml"
stationxml_file : 'inventory/*.xml'
# sds root directory, default is "mseed"
sds_dir : 'mseed'
# sds format string of waveform file names,
# change if your filenames deviate from standard pattern
sds_fmtstr : '{year}/{network}/{station}/{channel}.{sds_type}/{network}.{station}.{location}.{channel}.{sds_type}.{year}.{doy:03d}'



#### parameters that are network specific
net:
    # list of stations used in the project
    # type: list of strings or string, wildcards allowed
    network : 'D0'
    station : ['BZG', 'ESO', 'KBG', 'KIR']
    # For "component" only strings are allowed (no lists!)
    component: 'Z'  # Use * or ? or None to allow all components

#### parameters for correlation (empirical Green's function creation)
co:
    # subdirectory of 'proj_dir' to store correlation
    # type: string
    subdir : 'corr'
    # time sequences to read for clipping or muting on stream basis
    # These should be long enough for the reference (e.g. the standard
    # deviation) to be rather independent of the parts to remove
    # type: string
    read_start : '2015-08-01 00:00:01.0'
    read_end : '2015-08-06 00:00:00.0'
    # type: float [seconds]
    # The longer the faster, but higher RAM usage.
    # Note that this is also the length of the correlation batches
    # that will be written (i.e., length that will be
    # kept in memory before writing to disk)
    # If you are unsure, keep defaults
    read_len : 86398
    read_inc : 86400

    # New sampling rate in Hz. Note that it will try to decimate
    # if possible (i.e., there is an integer factor from the
    # native sampling_rate). Pay attention to Nyquist frequency for
    # later steps
    sampling_rate : 25
    # Remove the instrument response, will take substantially more time
    remove_response : False

    # Method to combine different traces
    # Options are: 'betweenStations', 'betweenComponents', 'autoComponents', 'allSimpleCombinations', or 'allCombinations'
    combination_method : 'betweenStations'

    # If you want only specific combinations to be computed enter them here
    # In the form [Net0-Net1.Stat0-Stat1] or [Net0-Net1.Stat0-Stat1.Cha0-Cha1]
    # This option will only be considered if combination_method == 'betweenStations'
    # Comment or set == None if not in use
    xcombinations : None

    # preprocessing of the original length time series
    # these functions work on an obspy.Stream object given as first argument
    # and return an obspy.Stream object.
    # available pre-implemented functions are in seismic.correlate.preprocessing_stream
    preProcessing : [
                    # This is intended as a "technical" bandpass filter to remove unphysical signals, i.e., frequencies you would not expect in the data
                    {'function':'seismic.correlate.preprocessing_stream.stream_filter',
                    'args':{'ftype':'bandpass',
                            'filter_option':{'freqmin':0.01, #0.01
                                            'freqmax':12}}}
                    # Mute data, when the amplitude is above a certain threshold
                    #{'function':'seismic.correlate.preprocessing_stream.stream_mute',
                    # 'args':{'taper_len':100,
                    #         'mute_method':'std_factor',
                    #         'mute_value':3}}
                    ]
    # subdivision of the read sequences for correlation
    # if this is set the stream processing will happen on the hourly subdivision. This has the
    # advantage that data that already exists will not need to be preprocessed again
    # On the other hand, computing a whole new database might be slower
    # Recommended to be set to True if:
    # a) You update your database and a lot of the data is already available (up to a magnitude faster)
    # b) corr_len is close to read_len
    # Is automatically set to False if no existing correlations are found
    preprocess_subdiv: True
    # type: presence of this key
    subdivision:
        # type: float [seconds]
        corr_inc : 3600
        corr_len : 3600
        # recombine these subdivisions
        # unused at the time
        # type: boolean
        recombine_subdivision : True
        # delete
        # type: boolean
        delete_subdivision : False

    # parameters for correlation preprocessing
    # Standard functions reside in seismic.correlate.preprocessing_td and preprocessing_fd, respectively
    corr_args : {'TDpreProcessing':[
                                {'function':'seismic.correlate.preprocessing_td.taper',
                                 'args':{'type':'cosine_taper', 'p': 0.02}},
                                {'function':'seismic.correlate.preprocessing_td.detrend',
                                 'args':{'type':'linear'}},
                                {'function':'seismic.correlate.preprocessing_td.TDfilter',
                                 'args':{'type':'bandpass','freqmin':2,'freqmax':8}},
                                #{'function':'seismic.correlate.preprocessing_td.mute',
                                # 'args':{'taper_len':100.,
                                #         'threshold':1000, # absolute threshold
                                #         'std_factor':3,
                                #         'filter':{'type':'bandpass','freqmin':2,'freqmax':4},
                                #         'extend_gaps':True}},
                                {'function':'seismic.correlate.preprocessing_td.clip',
                                 'args':{'std_factor':2.5}},
                                {'function':'seismic.correlate.preprocessing_td.signBitNormalization',
                                 'args': {}}
                                ],
                # Standard functions reside in seismic.correlate.preprocessing_fd
                'FDpreProcessing':[
                                {'function':'seismic.correlate.preprocessing_fd.spectralWhitening',
                                 'args':{'joint_norm':False}},
                                {'function':'seismic.correlate.preprocessing_fd.FDfilter',
                                 'args':{'flimit':[0.01,0.02,9,10]}}
                                #{'function':'seismic.correlate.preprocessing_fd.FDsignBitNormalization',
                                # 'args':{}}
                                ],
                'lengthToSave':100,
                'center_correlation':True,      # make sure zero correlation time is in the center
                'normalize_correlation':True,
                'combinations':[]
                }


#### parameters for the estimation of time differences
dv:
    # subfolder for storage of time difference results
    subdir : 'vel_change'

    # Plotting
    plot_vel_change : True

    ### Definition of calendar time windows for the time difference measurements
    start_date : '2015-05-01 00:00:00.0'   # %Y-%m-%dT%H:%M:%S.%fZ'
    end_date : '2016-01-01 00:00:00.0'
    win_len : 86400                         # length of window in which EGFs are stacked
    date_inc : 86400                        # increment of measurements

    ### Frequencies
    freq_min : 4
    freq_max : 8

    ### Definition of lapse time window, i.e. time window in the coda that is used for the dv/v estimate
    tw_start : 20     # lapse time of first sample [s]
    tw_len : 60       # length of window [s] Can be None if the whole (rest) of the coda should be used
    sides : 'both'    # options are left (for acausal), right (causal), both, or single (for active source experiments where the first sample is the trigger time)
    compute_tt : True  # Computes the travel time and adds it to tw_start (tt is 0 if not xstations). If true a rayleigh wave velocity has to be provided
    rayleigh_wave_velocity : 1  # rayleigh wave velocity in km/s, will be ignored if compute_tt=False


    ### Range to try stretching
    # maximum stretch factor
    stretch_range : 0.03
    # number of stretching increments
    stretch_steps : 1001
    # Return the similarity matrix for each dv/v estimate
    # required for post processing. If False, saves some RAM and disk space
    return_sim_mat : True

    #### Reference trace extraction
    # win_inc : Length in days of each reference time window to be used for trace extraction
    # If == 0, only one trace will be extracted
    # If > 0, multiple reference traces will be used for the dv calculation
    # Can also be a list if the length of the windows should vary
    # See seismic.correlate.stream.CorrBulk.extract_multi_trace for details on arguments
    dt_ref : {'win_inc' : 0, 'method': 'mean', 'percentile': 50}

    # preprocessing on the correlation bulk or corr_stream before stretch estimation
    preprocessing: [
                    #{'function': 'pop_at_utcs', 'args': {'utcs': np.array([UTCDateTime()])}},  # Used to remove correlations from certain times
                    #{'function': 'select_time', 'args': {'start': (0, 0, 0), 'end': (23, 0, 0), 'exclude': False}},  # include only certain time of day (h, minute, second)
                    {'function': 'smooth', 'args': {'wsize': 48, 'wtype': 'hanning', 'axis': 1}}
    ]

    # postprocessing of dv objects before saving and plotting
    postprocessing: [
                    {'function': 'smooth_sim_mat', 'args': {'win_len': 7, 'exclude_corr_below': 0}}
    ]

#### parameters to compute the waveform coherence
wfc:
    # subfolder for storage of waveform coherence results
    subdir : 'wfc'

    ### Definition of calendar time windows for the waveform coherence measurements
    start_date : '2015-05-01 00:00:00.0'   # %Y-%m-%dT%H:%M:%S.%fZ'
    end_date : '2016-01-01 00:00:00.0'
    win_len : 86400                         # length of window in which EGFs are stacked
    date_inc : 86400                        # increment of measurements

    ### Frequencies
    # can be lists of same length
    freq_min : [0.0625, 0.09, 0.125, 0.25, 0.375, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5]
    freq_max : [0.125, 0.18, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 8, 10]

    ### Definition of lapse time window
    # can be lists of same length or tw_start: List and tw_len: single value (will be applied to all)
    tw_start : [0, 1.25, 2.5, 3.75, 5, 6.25, 7.5, 8.75, 10, 11.25, 12.5, 13.75, 15, 16.25, 17.5, 17.75, 20]     # lapse time of first sample [s]
    tw_len : 5       # length of window [s]

    #### Reference trace extraction
    # win_inc : Length in days of each reference time window to be used for trace extraction
    # If == 0, only one trace will be extracted
    # If > 0, multiple reference traces will be used for the dv calculation
    # Can also be a list if the length of the windows should vary
    # See seismic.correlate.stream.CorrBulk.extract_multi_trace for details on arguments
    dt_ref : {'win_inc' : 0, 'method': 'mean', 'percentile': 50}

    # preprocessing on the correlation bulk before stretch estimation
    preprocessing: [
                    #{'function': 'select_time', 'args': {'start': (0, 0, 0), 'end': (23, 0, 0), 'exclude': False}},  # include only certain time of day (h, minute, second)
                    {'function': 'smooth', 'args': {'wsize': 4, 'wtype': 'hanning', 'axis': 1}}
    ]

    ### SAVING
    # save components separately or only their average?
    save_comps: False

This might look a little intimidating at first glance, but it is actually quite straightforward. To better understand what each parameter does, let’s have a closer look at them individually.

Project Wide Parameters#

#### Project wide parameters
# lowest level project directory
proj_dir : '/path/to/project/root/'
# Save the component combinations in separate hdf5 files
# This is faster for multi-core if True
# Set to False for compatibility with SeisMIC version < 0.5.0
save_comps_separately: True
# directory for logging information
log_subdir : 'log'
# levels:
# 'DEBUG', 'INFO', 'WARNING', 'ERROR', or 'CRITICAL'
log_level: 'WARNING'
# folder for figures
fig_subdir : 'figures'
# Path to stationxml files, default is "inventory/*.xml"
stationxml_file : 'inventory/*.xml'
# sds root directory, default is "mseed"
sds_dir : 'mseed'
# sds format string of waveform file names,
# change if your filenames deviate from standard SDS pattern
sds_fmtstr : '{year}/{network}/{station}/{channel}.{sds_type}/{network}.{station}.{location}.{channel}.{sds_type}.{year}.{doy:03d}'

These parameters govern the logging and the file structure. proj_dir is the root directory we chose when initialising our Store_Client as shown here. fig_subdir and log_subdir are just subdirectories for figures and logs, respectively, and log_level decides how much will actually be logged. In most cases, WARNING is the appropriate choice - everything below will start spitting out a lot of information. stationxml_file is the path to the stationxml files, and sds_dir is the directory where the mseed files are stored. sds_fmtstr is the format string for the mseed file names; the default is the standard SeisComP Data Structure (SDS) pattern. Leave these parameters at their default values if you used SeisMIC to download the data. However, if you want to use your own local database, you will have to adjust them accordingly. Wildcards are permitted.

The sds_fmtstr defines the naming pattern of the mseed files. The following placeholders (everything in {}) can be used: year, network, station, channel, location, sds_type, doy. The standard pattern for SeisComP3 is:

sds_fmtstr : '{year}/{network}/{station}/{channel}.{sds_type}/{network}.{station}.{location}.{channel}.{sds_type}.{year}.{doy:03d}'
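To see how the placeholders expand, here is a quick sketch using Python’s str.format with hypothetical example values (empty location code and sds_type 'D', as is typical for SDS archives):

```python
# The SDS format string from above, expanded with hypothetical values
sds_fmtstr = ('{year}/{network}/{station}/{channel}.{sds_type}/'
              '{network}.{station}.{location}.{channel}.{sds_type}'
              '.{year}.{doy:03d}')
path = sds_fmtstr.format(
    year=2015, network='D0', station='BZG', location='',
    channel='HHZ', sds_type='D', doy=7)
print(path)  # 2015/D0/BZG/HHZ.D/D0.BZG..HHZ.D.2015.007
```

Note how `{doy:03d}` zero-pads the day of the year to three digits, matching the on-disk SDS naming.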

Network Specific Parameters#

#### parameters that are network specific
net:
    # list of stations used in the project
    # type: list of strings or string, wildcards allowed
    network : 'D0'
    station : ['BZG', 'ESO', 'KBG', 'KIR']
    # For "component" only strings are allowed (no lists!)
    component: 'Z'  # Use * or ? or None to allow all components

Here, we decide which data to use (i.e., which data the correlator will look for and read in). All parameters accept wildcards. Network and Station can be strings or lists.

Note

If both network and station are lists, they have to have the same length. The corresponding logic is: [net0, net1, net2, net3] and [stat0, stat1, stat2, stat3] and so on.
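In other words, equal-length lists are paired index by index, which can be sketched like this (the codes here are just placeholders):

```python
# Equal-length network and station lists are combined element-wise
network = ['net0', 'net1', 'net2', 'net3']
station = ['stat0', 'stat1', 'stat2', 'stat3']
pairs = list(zip(network, station))
print(pairs[0])  # ('net0', 'stat0')
```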

Correlation Arguments#

This is the really juicy stuff and probably the part that will have the strongest influence on your results. Let’s start by getting the most obvious parameters out of the way:

#### parameters for correlation (empirical Green's function creation)
co:
    # subdirectory of 'proj_dir' to store correlation
    # type: string
    subdir : 'corr'
    # time sequences to read for clipping or muting on stream basis
    # These should be long enough for the reference (e.g. the standard
    # deviation) to be rather independent of the parts to remove
    # type: string
    read_start : '2015-08-01 00:00:01.0'
    read_end : '2015-08-06 00:00:00.0'
    # type: float [seconds]
    # The longer the faster, but higher RAM usage.
    # Note that this is also the length of the correlation batches
    # that will be written (i.e., length that will be
    # kept in memory before writing to disk)
    read_len : 86398
    read_inc : 86400
  • subdir The directory to save the correlations in (correlations are generally saved in hdf5 format).

  • read_start and read_end are the earliest and latest dates that you want to read

  • read_len the length that will be read in. Usually, you have one mseed file per day. To avoid having to read several files, you will want that to be a bit less than a day

  • read_inc is the increment between each reading interval

Note

Neither read_len nor read_inc determines the correlation length.
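A rough sketch of how the reading batches are laid out with the example values (SeisMIC’s internal handling of edges may differ slightly):

```python
from datetime import datetime, timedelta

read_start = datetime(2015, 8, 1, 0, 0, 1)
read_end = datetime(2015, 8, 6)
read_len = 86398   # seconds of data read per batch
read_inc = 86400   # seconds between batch starts

# Each batch reads read_len seconds, starting read_inc seconds apart
windows = []
t = read_start
while t < read_end:
    windows.append((t, t + timedelta(seconds=read_len)))
    t += timedelta(seconds=read_inc)
print(len(windows))  # 5 daily batches
```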

    # New sampling rate in Hz. Note that it will try to decimate
    # if possible (i.e., there is an integer factor from the
    # native sampling_rate)
    sampling_rate: 25
    # Remove the instrument response, will take substantially more time
    remove_response: False

    # Method to combine different traces
    combination_method : 'betweenStations'

    # If you want only specific combinations to be computed enter them here
    # In the form [Net0-Net0.Stat0-Stat1]
    # This option will only be considered if combination_method == 'betweenStations'
    # Comment or set == None if not in use
    xcombinations : None
  • sampling_rate is the new sampling rate your data will be resampled to. SeisMIC takes care of anti-alias filtering and determines whether the data can be decimated.

  • remove_response set this to True if you want the data to be corrected for the instrument response. Instrument response removal in obspy is unfortunately quite expensive.

  • combination_method decides which components you will want to correlate. See calc_cross_combis() for allowed options.

  • xcombinations If you want to save some computational resources and only compute specific combinations, use this parameter. If you want to limit the maximum distance between stations to cross-correlate, you can use seismic.correlate.correlate.Correlator.find_interstat_dist() to compute this parameter
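For instance, all station pairs of the example network could be listed explicitly with itertools; the strings below simply follow the Net0-Net1.Stat0-Stat1 form described above:

```python
from itertools import combinations

# All inter-station pairs for the four example stations of network D0
stations = ['BZG', 'ESO', 'KBG', 'KIR']
xcombinations = [f'D0-D0.{a}-{b}' for a, b in combinations(stations, 2)]
print(xcombinations[0])  # D0-D0.BZG-ESO
```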

Preprocessing Arguments#

SeisMIC is coded in a manner that makes it easy for the user to pass custom preprocessing functions. Custom functions can be defined in the three parameters preProcessing, TDpreProcessing, and FDpreProcessing. All these parameters expect a list of dictionaries as input. Each dictionary must have the keys function and args. The value for function is a string describing the complete import path of the preprocessing function in the form ‘package.module.submodule.function’. args is simply a keyword argument dictionary that will be passed to the function.
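Internally, such an import-path string has to be resolved to a callable. A minimal sketch of that mechanism (the helper below is illustrative, not SeisMIC’s actual implementation):

```python
from importlib import import_module
import os.path

def resolve_function(import_path):
    """Resolve a 'package.module.function' string to the callable."""
    module_path, func_name = import_path.rsplit('.', 1)
    return getattr(import_module(module_path), func_name)

# os.path.join stands in here for a preprocessing function
func = resolve_function('os.path.join')
print(func is os.path.join)  # True
```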

SeisMIC comes with a number of preprocessing functions. If you are creating a custom preprocessing function, it is probably a good idea to have a look at these first in order to understand the required syntax. Preprocessing is generally done in three steps:

Preprocessing on a per-stream basis: all functions here take an obspy Stream as input and return the processed stream. An overview of available stream preprocessing functions can be found in preprocessing_stream.

    # preprocessing of the original length time series
    # these functions work on an obspy.Stream object given as first argument
    # and return an obspy.Stream object.
    preProcessing : [
                    {'function':'seismic.correlate.preprocessing_stream.detrend_st',
                    'args':{'type':'linear'}},
                    {'function':'seismic.correlate.preprocessing_stream.cos_taper_st',
                    'args':{'taper_len': 100, # seconds
                            'lossless': True}}, # lossless tapering stitches additional data to trace, tapers, and removes the tapered ends after preprocessing
                    # This is intended as a "technical" bandpass filter to remove unphysical signals, i.e., frequencies you would not expect in the data
                    {'function':'seismic.correlate.preprocessing_stream.stream_filter',
                    'args':{'ftype':'bandpass',
                            'filter_option':{'freqmin':0.01, #0.01
                                            'freqmax':12}}}
                    # Mute data, when the amplitude is above a certain threshold
                    #{'function':'seismic.correlate.preprocessing_stream.stream_mute',
                    # 'args':{'taper_len':100,
                    #         'mute_method':'std_factor',
                    #         'mute_value':3}}
                    ]

Preprocessing on arrays in the time and frequency domain. The functions to use have to be provided in corr_args['TDpreProcessing'] and corr_args['FDpreProcessing']. A custom function would need to take a matrix as input, where each column is one waveform in the time or frequency domain. Additionally, the args dictionary and a params dictionary will be passed.

# parameters for correlation preprocessing
# Standard functions reside in seismic.correlate.preprocessing_td
corr_args : {'TDpreProcessing':[
                                # detrend not recommended. Use preProcessing detrend_st instead (faster)
                                # {'function':'seismic.correlate.preprocessing_td.detrend',
                                # 'args':{'type':'linear'}},
                               {'function':'seismic.correlate.preprocessing_td.TDfilter',
                               'args':{'type':'bandpass','freqmin':2,'freqmax':8}},
                                #{'function':'seismic.correlate.preprocessing_td.mute',
                                # 'args':{'taper_len':100.,
                                #         'threshold':1000, # absolute threshold
                                #         'std_factor':3,
                                #         'filter':{'type':'bandpass','freqmin':2,'freqmax':4},
                                #         'extend_gaps':True}},
                               {'function':'seismic.correlate.preprocessing_td.clip',
                                'args':{'std_factor':2.5}},
                               {'function':'seismic.correlate.preprocessing_td.signBitNormalization',
                                'args': {}}
                               ],
              # Standard functions reside in seismic.correlate.preprocessing_fd
             'FDpreProcessing':[
                                {'function':'seismic.correlate.preprocessing_fd.spectralWhitening',
                                 'args':{'joint_norm':False}},
                                {'function':'seismic.correlate.preprocessing_fd.FDfilter',
                                 'args':{'flimit':[0.01,0.02,9,10]}}
                                #{'function':'seismic.correlate.preprocessing_fd.FDsignBitNormalization',
                                # 'args':{}}
                                ],
            ...
            }
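A custom time-domain function following this convention could look like the sketch below. demean_columns is a hypothetical example, not part of SeisMIC; it takes the waveform matrix, the args dict from the yaml entry, and the global params dict:

```python
import numpy as np

def demean_columns(matrix, args, params):
    """Remove the mean of each waveform (one waveform per column)."""
    # args and params are accepted to match the expected call signature
    return matrix - matrix.mean(axis=0, keepdims=True)

# two waveforms of three samples each, one per column
data = np.array([[1.0, 4.0],
                 [2.0, 5.0],
                 [3.0, 6.0]])
out = demean_columns(data, {}, {})
print(out[:, 0])  # [-1.  0.  1.]
```

Such a function would then be referenced by its import path in the TDpreProcessing list, exactly like the built-in functions above.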

Arguments for the actual correlation#

subdivision is the parameter that determines the length and increment of the noise windows to be preprocessed and correlated. If recombine_subdivision is True, the correlations will be stacked to read_len.

  • lengthToSave is the length of each correlation function in seconds

  • center_correlation If True, zero lag will always be in the middle of the function.

  • normalize_correlation: Normalise the correlation by the absolute maximum?

    # subdivision of the read sequences for correlation
    # type: presence of this key
    subdivision:
        # type: float [seconds]
        corr_inc : 3600
        corr_len : 3600
        # recombine these subdivisions
        # unused at the time
        # type: boolean
        recombine_subdivision : True
        # delete
        # type: boolean
        delete_subdivision : False

    # parameters for correlation preprocessing
    # Standard functions reside in seismic.correlate.preprocessing_td
    corr_args : {'lengthToSave':100,
                'center_correlation':True,      # make sure zero correlation time is in the center
                'normalize_correlation':True,
                'combinations':[]
                }
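With the example values, the relation between read_len and the subdivision can be sketched with simple arithmetic (edge handling inside SeisMIC may differ slightly):

```python
read_len = 86398   # seconds read per batch
corr_len = 3600    # each correlation window is one hour long
corr_inc = 3600    # windows start one hour apart

# number of full hourly windows fitting into one read batch
n_windows = (read_len - corr_len) // corr_inc + 1
print(n_windows)  # 23
```

With recombine_subdivision set to True, these hourly correlations are then stacked back to one function per read batch.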

The rest of the yaml file will be discussed at a later point. Now, let’s actually start the computation!