Quality Assurance

Preamble and the importance of user feedback

Quality assurance (QA) of seismological datasets is a complex topic that concerns all stages of the data cycle of observational seismology, from waveform data collection to the delivery of data products and scientific results. Summarising QA procedures for the seismological data services and products coordinated by ORFEUS is further complicated by the distributed and federated structure of the European Integrated Data Archive (EIDA), where the individual EIDA nodes and the seismological observatories associated with them are primarily in charge of selecting and implementing their own data/metadata quality control procedures, while following international best practices and standards. This document is therefore meant to provide a general overview of typical QA strategies across the data services and products coordinated by ORFEUS, without going into the details of local workflows. Note that the data services promoted by ORFEUS are described in detail in peer-reviewed publications (e.g., https://doi.org/10.1785/0220200413 & https://doi.org/10.1785/0220200398). We focus herein on permanent seismic networks and the associated data services and products (e.g., https://pubs.geoscienceworld.org/srl/issue/92/3); an overview of best practices for temporary deployments can be found at https://polybox.ethz.ch/index.php/s/EreWWnfm2gQoLdD.

Among the governing bodies of ORFEUS is a User Advisory Group (UAG) in charge of reviewing ORFEUS services and products and suggesting improvements. The UAG delivers annual reports to the ORFEUS Executive Committee and Service Management Committees that are chiefly focused on improving data/metadata quality and availability. ORFEUS and its UAG promote community workshops on these topics (see, e.g., https://polybox.ethz.ch/index.php/s/7ZUfDsFzDQPyYcW & https://polybox.ethz.ch/index.php/s/pOqflrkKfwP9MJD). In 2019, prompted by the AlpArray user community, the EIDA developers opened an EIDA User Feedback Repository hosted on GitHub (https://github.com/EIDA/userfeedback). This has become the preferred channel for reporting a wide variety of issues to the EIDA maintenance team, including technical difficulties, questions, and suggestions. Extensive user documentation, including specific documentation for data center operators, is also provided and kept up to date alongside the service access pages on the ORFEUS portal.

QA is relevant to, and performed at, all steps of seismological data management, as summarised in the following sections.

1 – Instrumentation

High quality is ensured by relying on trusted manufacturers worldwide.

In Europe and worldwide, modern seismic networks, such as those contributing data/metadata to ORFEUS programs, rely on state-of-the-art seismological instrumentation and carefully plan the resources needed for maintenance, renewal, and upgrades of the monitoring infrastructure. Current permanent seismic monitoring means dense deployments of modern broadband / short-period velocity and acceleration sensors, often co-located, connected to 24- or 26-bit digitisers (dataloggers) with continuous real-time streaming to data centers. The makes and models of seismic instrumentation used across seismic networks are heterogeneous because several trusted manufacturers exist (see, e.g., https://ds.iris.edu/NRL/). The use of low-cost seismic instruments (e.g., MEMS accelerometers), sometimes located at low-quality (high-noise) sites, is becoming increasingly popular in regions of high seismicity where seismic networks must operate under significant budget constraints. This type of instrumentation features reduced dynamic range compared to current state-of-the-art permanent seismic monitoring devices, is chiefly meant for strong ground motion monitoring, and is not yet widespread across ORFEUS associated data centers. Alongside digital strong-motion data coming from on-line sources such as the EIDA infrastructure, the Engineering Strong-Motion (ESM; https://orfeus-eu.org/esm) database also contains important earthquake recordings from analogue instruments and lower-resolution (12-, 16-, 19-bit) accelerographs: the quality of these records from legacy and off-line resources is checked and assessed by expert strong-motion waveform analysts before they are added to the ESM (see https://link.springer.com/article/10.1007/s10518-017-0299-z). An overview of the ESM workflow, including data provenance, is available at https://esm-db.eu/#/about/overview.
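
As a back-of-the-envelope illustration of these dynamic-range differences, the theoretical dynamic range of an N-bit digitiser is roughly 20*log10(2^N), i.e., about 6 dB per bit. The short Python sketch below (the function name is ours, for illustration only) compares the accelerograph resolutions mentioned above with modern 24- and 26-bit dataloggers:

    import math

    def dynamic_range_db(bits: int) -> float:
        """Theoretical dynamic range of an N-bit digitiser, in dB.

        Uses the textbook approximation 20*log10(2**N) ~ 6.02*N dB;
        real instruments achieve less because of self-noise.
        """
        return 20.0 * math.log10(2 ** bits)

    for bits in (12, 16, 19, 24, 26):
        print(f"{bits:2d}-bit digitiser: ~{dynamic_range_db(bits):5.1f} dB")

A 24-bit datalogger thus offers roughly 144 dB of theoretical dynamic range, against about 72 dB for a 12-bit accelerograph.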

2 - Site scouting

High quality is ensured by monitoring background noise levels at candidate sites.

Candidate sites for the installation of permanent seismological instrumentation are carefully chosen based on the local and regional seismicity and on best practices for earthquake monitoring (e.g., https://bib.telegrafenberg.de/publizieren/bibliotheksverlag/nmsop). Permanent seismic networks are denser in areas of higher seismicity, and stations are spatially distributed to attempt homogeneous coverage of the monitored regions. Candidate sites are typically equipped with temporary seismological instrumentation for a few days or weeks, and the final locations are chosen based on the observed background noise levels at the candidate sites. Broadband velocity instrumentation is normally installed at sites with low noise levels across a very broad frequency range to allow detection and on-scale recording of weak motions and teleseisms; the USGS low- and high-noise models (and subsequent regional updates) are used as references (https://pubs.er.usgs.gov/publication/ofr93322). Conversely, acceleration sensors, designed to record strong ground shaking, are often installed in urban (risk-prone) areas or at locations where significant site amplification is expected. When high-quality strong-motion sensors are installed or co-located at sites of seismological quality, these stations can reliably record weak motions across a broad frequency range (e.g., https://doi.org/10.1193/1.4000107).
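
As a minimal sketch of such a noise survey, assuming the temporary-deployment waveforms and matching StationXML metadata are available as local files (the file names and channel code below are placeholders), ObsPy's PPSD class accumulates probabilistic power spectral densities and, by default, overlays the reference noise-model curves in its plot:

    from obspy import read, read_inventory
    from obspy.signal.spectral_estimation import PPSD

    # Waveforms recorded by the temporary test installation and the
    # corresponding instrument metadata (placeholder file names).
    st = read("candidate_site.mseed")
    inv = read_inventory("candidate_site.xml")

    # Build the probabilistic PSD for the vertical broadband channel.
    tr = st.select(channel="HHZ")[0]
    ppsd = PPSD(tr.stats, metadata=inv)
    ppsd.add(st)

    # The resulting plot includes the low/high noise model reference
    # curves, supporting the decision on whether the candidate site
    # is quiet enough for a broadband installation.
    ppsd.plot("candidate_site_ppsd.png")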

3 - Station set-up

High quality is ensured by following community best practices.

Seismic sensors are normally installed with optimal coupling to the ground or support (typically in shallow boreholes or vaults) and isolated from external sources of noise (e.g., temperature, humidity, wind). Digitisers are preferably located away from the seismic sensor to mitigate the amplitude of the mains hum. Broadband velocity sensors are often located in underground facilities: this information is crucial for users to correctly interpret the amplitudes of the recordings. Velocity and acceleration sensors are often co-located to allow recording of the full dynamic range of Earth signals.

4 - Telecommunication and real-time data streaming

High quality is ensured by relying on the best available telemetry solutions.

Permanent seismic networks rely on the best available telemetry options for earthquake monitoring, including government-secured internet connections in some cases. Data streaming to data centers happens in real time, typically via the SeedLink protocol. Dataloggers have local storage that allows retrieval of recorded data in case the real-time data flow is temporarily interrupted. Dataloggers can also stream State-of-Health (SOH) channels that include information on timing quality. Stations are further equipped with custom-made monitoring devices that allow remote control of all connected equipment and automatic telemetry watchdogs.
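
A minimal sketch of a SeedLink consumer, written with ObsPy's EasySeedLink client, is shown below; the server address and the network/station/channel selection are illustrative examples, not prescribed ORFEUS endpoints:

    from obspy.clients.seedlink.easyseedlink import create_client

    def handle_data(trace):
        # In an operational setting this callback would feed the
        # archiving and quality-control pipeline; here it only
        # reports each arriving packet.
        print(f"Received {trace.id}: {trace.stats.npts} samples "
              f"starting at {trace.stats.starttime}")

    # Illustrative SeedLink server and stream selection.
    client = create_client("geofon.gfz-potsdam.de:18000", on_data=handle_data)
    client.select_stream("GE", "APE", "HHZ")
    client.run()  # blocks and invokes handle_data for each packet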

5 - Data and Metadata

High quality is ensured by following community standards and best practices.

Different data formats exist and are used by the geophysical community. The most common data formats in observational seismology are presently SEED (https://www.fdsn.org/pdf/SEEDManual_V2.4.pdf), StationXML (https://www.fdsn.org/xml/station/) and QuakeML (https://quake.ethz.ch/quakeml/), all standards promoted by the International Federation of Digital Seismograph Networks (FDSN; https://www.fdsn.org/); the formats of seismological waveform data and associated station metadata distributed via EIDA are miniSEED and StationXML, respectively. For the dissemination of large datasets, as well as of heterogeneous datasets combining waveform data with station/earthquake metadata, the PH5 (https://ds.iris.edu/ds/nodes/dmc/data/types/ph5/) and ASDF (Adaptable Seismic Data Format; https://seismic-data.org/) formats are becoming increasingly popular in the seismological community; ASDF is used along with ASCII and miniSEED for disseminating data of the ESM dataset. Overall, all waveform data and station/event metadata distributed via EIDA and the European strong-motion services are generated and formatted according to state-of-the-art international standards.

Data and metadata quality is routinely monitored by seismological observatories and data centers using a variety of procedures and software tools that range from manual/visual interaction with the recorded waveforms (e.g., during manual earthquake location and characterisation) to seamless automated processing of continuous data streams. A popular software tool for data and metadata quality control is PQLX (https://pubs.er.usgs.gov/publication/ofr20101292; or its commercial version SQLX), which allows continuous quality monitoring of seismic stations through probability density functions of the power spectral densities of the recorded signals (see, e.g., http://www.seismo.ethz.ch/en/research-and-teaching/products-software/station-information/psd-intro/). Both IRIS and ORFEUS (ODC and other data centers) have developed and routinely use software tools to monitor data and metadata quality and availability: WFCatalog (https://www.orfeus-eu.org/data/eida/webservices/wfcatalog/), EIDAQC (https://eidaqc.readthedocs.io/en/latest/eidaqc.html), MUSTANG (https://services.iris.edu/mustang/), AutoStatsQ (https://github.com/gesape/AutoStatsQ), etc. Community projects are ongoing within ORFEUS focused on documenting and improving data/metadata quality (e.g., https://polybox.ethz.ch/index.php/s/AFqgbE9m7Q3FU4F).

The Rapid Raw Strong-Motion (RRSM, https://orfeus-eu.org/rrsm) database and associated services are downstream products of EIDA. Event information on the RRSM is based on earthquake alerts of the Euro-Mediterranean Seismological Centre (EMSC; https://www.emsc-csem.org/), hence the quality control/assurance procedures of the EMSC apply (https://www.emsc-csem.org/Earthquake/seismicity/real_time.php). The ESM integrates data and metadata from EIDA as well as from other data providers. The station and event metadata provided by the ESM are based on the best available information from the data providers and network operators, or on peer-reviewed journal publications. Presently, many waveforms distributed by the ESM are manually processed by experts in strong-motion data analysis using documented methods and algorithms.
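
As one concrete example of a routine consistency check between waveform data and station metadata, the sketch below (file names are placeholders) pairs a miniSEED file with its StationXML metadata and deconvolves the instrument response with ObsPy; a failure at this step frequently points to a metadata problem, such as a wrong epoch or a missing response stage:

    from obspy import read, read_inventory

    # Placeholder file names for a waveform day file and its metadata.
    st = read("NET.STA..HHZ.mseed")
    inv = read_inventory("NET.STA.xml")

    # Standard pre-processing before deconvolution.
    st.detrend("demean")
    st.taper(max_percentage=0.05)

    # Deconvolve to ground velocity; pre_filt stabilises the
    # deconvolution at the edges of the usable frequency band.
    st.remove_response(inventory=inv, output="VEL",
                       pre_filt=(0.005, 0.01, 45.0, 50.0))
    print(st)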

6 - Data & metadata archival and dissemination, data products

High quality is ensured by following community standards and best practices, and by publication in peer-reviewed scientific journals.

Seismological waveform data/metadata are archived and backed up for all services and products promoted by ORFEUS following standard data management procedures and community best practices. Waveform data are typically stored on multiple NAS (network-attached storage) devices and regularly backed up; station metadata are also safely archived and backed up, and changes are monitored with version control systems. Dissemination is chiefly based on webservices, including those promoted by the FDSN as well as EIDA- and ESM-specific webservices (see https://www.orfeus-eu.org/data/eida/webservices/ & https://esm-db.eu/#/data_and_services/web_services). Information on the availability of the EIDA webservices is continuously updated and provided in real time on the ORFEUS website and on the webpages of a dedicated project at EIDA@BGR (https://www.szgrf.bgr.de/eida_availability_report.html). IRIS also provides a simple monitoring page (http://ds.iris.edu/files/stats/datacenters/). The functionality of the web interfaces promoted by ORFEUS to access data from EIDA, the RRSM, and the ESM is also regularly monitored, and any issues are promptly fixed as they arise. The quality of derived data products is ensured by publication in peer-reviewed scientific journals (e.g., https://link.springer.com/article/10.1007/s10518-018-0480-z).
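
For illustration, the sketch below retrieves station metadata and waveforms through such FDSN webservices using ObsPy's FDSN client ("ORFEUS" is one of ObsPy's built-in data-center shortcuts; the network and station codes are illustrative):

    from obspy import UTCDateTime
    from obspy.clients.fdsn import Client

    client = Client("ORFEUS")  # ObsPy shortcut for the ORFEUS Data Centre
    t0 = UTCDateTime("2023-01-01T00:00:00")

    # Illustrative network/station codes; level="response" also
    # retrieves the full instrument response metadata.
    inv = client.get_stations(network="NL", station="HGN",
                              level="response")
    st = client.get_waveforms(network="NL", station="HGN",
                              location="*", channel="HH?",
                              starttime=t0, endtime=t0 + 600)
    print(inv)
    print(st)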

Concluding remark

Overall, the highest possible quality of the data, metadata, services, and derived products distributed by the infrastructures coordinated by ORFEUS is ensured at every stage of the data management process through the adoption of international community standards and best practices, including publication in peer-reviewed journals. Issues are tracked and promptly resolved.