%%{init: { 'logLevel': 'debug', 'theme': 'base' } }%%
graph LR
    waveform[Get Data] --> correlate(Correlation)
    correlate -->|save| corrdb[(CorrDB/hdf5)]
    corrdb:::active --> monitor
    monitor[Measure dv] -->|save| dv{{DV}}
    click waveform "../trace_data.html" "trace_data"
    click correlate "../correlate.html" "correlate"
    click monitor "../monitor.html" "monitor"
    click corrdb "../corrdb.html" "CorrDB"
    click dv "../monitor/dv.html" "DV"
    classDef active fill:#f666, stroke-width:4px, stroke:#f06;

CorrelationDataBase and DBHandler objects#

In SeisMIC, correlations are stored in HDF5 container files. This has the advantage of avoiding potential overhead caused by large amounts of correlation files. SeisMIC’s implementation relies on modified h5py classes. After computing your correlations as shown in the earlier steps, they will be saved in one file per component-combination (default since SeisMIC 0.5.0) or one file per station combination (for older databanks). You can alter this setting in your params.yaml.

Example for one file per component-combination: The file IU-TA.HRV-M58A.00-01.HHZ-HHZ holds the correlations of the two vertical components of the stations (location 00 and 01, respectively).

Example for one file per station-combination: The file IU-TA.HRV-M58A.00-01 holds the correlations of all components and locations of the two stations with each other, whereas the file IU-IU.HRV-HRV.00-00 holds all autocorrelations and intercomponent correlations of the station IU.HRV.

Note

SeisMIC makes use of SEED-like station codes. The general logic of those codes is: net0-net1.stat0-stat1.loc0-loc1.ch0-ch1. Here, 0 is the first station and 1 the second station. Correlations will always only be computed alphabetically. That is, SeisMIC will not compute a correlation for TA-IU.M58A-HRV but only for IU-TA.HRV-M58A.

As a user, you will only ever be calling the CorrelationDataBase class. The only function of this class is to return a DBHandler, which hold all the “useful” functions. To call CorrelationDataBase, use a context manager like so:

>>> from seismic.db.corr_hdf5 import CorrelationDataBase
>>> with CorrelationDataBase('/path/to/myfile.h5') as cdb:
>>>     type(cdb)  # This is a DBHandler
<class 'seismic.db.corr_hdf5.DBHandler'>

Warning

Do not call DBHandler directly! This might lead to unexpected behaviour or even dataloss due to corrupted hdf5 files.

Warning

If you should for some reason decide to not use the context manager, you will have to close the hdf5 file with seismic.db.corr_hdf5.DBHandler._close() to avoid corrupting your files!

Reading Correlations#

The most common usecase is probably that you will want to access correlations that SeisMIC computed for you (as shown earlier). To do so, you can use the get_data() method:

>>> from seismic.db.corr_hdf5 import CorrelationDataBase
>>> with CorrelationDataBase('/path/to/myfile.h5', mode='r') as cdb:
>>>     cst = cdb.get_data(
>>>         tag='subdivision', network='IU-IU', station='*', channel='??Z-??Z', location='00-00', corr_start=None, corr_end=None)
>>> # cst is a CorrStream object on that we can use our known methods
>>> print(type(cst))
<class 'seismic.correltea.stream.CorrStream'>
>>> #cst.count()
289

As you can see, we use station combination codes (described in the note box above) to identify data. All arguments accept wildcards. The data we are loading are correlations from waveforms recorded between corr_start and corr_end. Setting those arguments to None is similar to using a wildcard (i.e., load from earliest to latest available data).

Tags#

SeisMIC uses the following standard tags:

subdivision: Are the unstacked correlations with the the correlation length of corr_len as defined in our yaml file.
stack_$stacklen$ : Is the standard tag for correlation stacks, where $stacklen$ should be replaced by the stack length in seconds.

Obtain correlation parameters#

You might want to get the dictionary that you used to produce the correlations in the file. You can do that by using seismic.corr_hdf5.corrdb.DBHandler.get_corr_options().

Getting an overview over available data#

Once you have a sufficiently large dataset, you might be confused about which data you have already produced. In this case, SeisMIC offers several methods to make your life a little easier:

seismic.corr_hdf5.corrdb.DBHandler.get_available_starttimes(): Returns a dictionary of available starttimes for your chosen network, station, and channel combinations (wildcards are allowed).
seismic.corr_hdf5.corrdb.DBHandler.get_available_channels(): Returns the available channel combinations for a given station combination.
Access the DBHandler like a dictionary: Just like in h5py, it is possible to access the DBHandler like a dictionary. The logic works as follows: dbh[tag][netcomb][statcomb][chacomb][corr_start][corr_end]

Following the logic of the structure above, we can get a list of all available tags as follows:

>>> print(list(dbh.keys()))
['stack_34798', 'subdivision']

Writing Correlations#

If you postprocess your correlations (e.g., stacking), you might want to save the data afterwards. When writing to the correlation hdf5 files, you will have to pay attention to a couple of particularities:

You need to provide a corr_options dictionary to be able to open the file with mode!=r. If you don’t provide a dictionary or your dictionary is different from the one used to produce the data, the code will raise an error. This is meant to prevent mixing of differently processed data.
You should consider using a sensible convention for your tags (if saving stacks, it’s best to stick to the standard convention as discussed above).

import yaml

from seismic.db.corr_hdf5 import CorrelationDataBase
from seismic.correlate.stream import CorrStream

# For this example, we are just gonna create an empty CorrStream
# Of course, this will not really add any data to the file
cst = CorrStream()

# Get your correlation dictionary
with open('/path/to/my/params.yaml') as file:
    co = yaml.load(file, Loader=yaml.FullLoader)
with CorrelationDataBase('/path/to/myfile.h5', mode='w', corr_options=co) as cdb:
    cdb.add_correlation(cst, tag='my_sensible_tag')

If there had been any data in our CorrStream, we could retrieve it as shown above. Network, station, and channel information are determined automatically from the CorrTrace header.

Why won’t SeisMIC allow me to write into an old file again?#

Sometimes when re-executing your correlation workflow, you might encounter a PermissionError. This happens whenever you change your processing parameters in your params.yaml file under corr. Essentially, SeisMIC prevents you from mixing correlations that have been processed in different ways. So if you want to recompute your correlations with a new set of processing parameters, you need to write them to a different folder!