Policies

DesignSafe Data Depot Repository Mission and History

Mission

The Data Depot Repository (DDR) is the platform for curation and publication of datasets generated in the course of natural hazards research. The DDR is an open access data repository that enables data producers to safely store, share, organize, and describe research data, towards permanent publication, distribution and impact evaluation. The DDR allows data consumers to discover, search for, access, and reuse published data in an effort to accelerate research discovery. The DDR is one component of the DesignSafe cyberinfrastructure, which represents a comprehensive research environment that provides cloud-based tools to manage, analyze, understand, and publish critical data for research to understand the impacts of natural hazards. DesignSafe is part of the NSF-supported Natural Hazards Engineering Research Infrastructure (NHERI), and aligns with its mission to provide the natural hazards research community with open access, shared-use scholarship, education, and community resources aimed at supporting civil infrastructure prior to, during, and following natural disasters. However, DesignSafe also supports the broader natural hazards research community that extends beyond the NHERI network.

History

The DDR has been in operation since 2016 and is currently supported by NSF through 2025. The DDR preserves natural hazards research data published since its inception in 2016, and also provides access to legacy data dating from about 2005. These legacy data were generated as part of the NSF-supported Network for Earthquake Engineering Simulation (NEES), a predecessor to NHERI. Legacy data and metadata belonging to NEES were transferred to the DDR for continuous preservation and access. View the published NEES data here.

Community

The DDR serves the broader natural hazards research community both as data producers and consumers. This research community includes, but is not limited to, the facilities that make up the NHERI network, for which we are the designated data repository. We work with each component of the NHERI network to meet the requirements and commitments of their distinct research focus and functions. As the only repository for open natural hazards research data, we welcome data produced in other national and international facilities and organizations. More broadly, the repository serves a national and international audience of natural hazard researchers, students, practitioners, and policy makers, as well as the general public.

Governance

Policies for the DDR are driven by the Natural Hazards (NH) scientific community and informed by best practices in library and information sciences. The DDR operates under the leadership of the DesignSafe Management Team (DSMT), which establishes and updates policies, evaluates and recommends best practices, oversees the repository's technical development, and prioritizes activities. The broad organizational structure under which the DDR operates is here.

Repository Team

An interdisciplinary repository team (RT) carries out ongoing design, development and day-to-day operations, gathering requirements and discussing solutions through formal monthly and bi-weekly meetings with the NHERI community and maintaining regular communications with members of the network, including monthly meetings with the Experimental Facilities, RAPID, and CONVERGE staff. Based on these fluid communications, the RT designs functionalities, researches and develops best-practices, and implements agreed-upon solutions. The figure below shows the current formation of the RT, including their expertise.

Formal mechanisms are in place for external evaluators to gather feedback and conduct structured assessments, in the form of usability studies and yearly user surveys, to ensure that the repository is meeting the community’s expectations and needs. To track development the DDR curator meets every other week with the DesignSafe PI and with the head of the development team. All DDR activities are reported to the National Science Foundation on a quarterly and annual basis in terms of quantitative and qualitative progress.

Community Norms for DDR

Within the broader conditions of use for DesignSafe we have established a set of Community Norms specific for DDR which have to be agreed upon at the point of registering an account on the platform. These norms, highlighting our existing policies, are the following:

Users who either publish or use data in DDR must abide by both the TACC Acceptable Use Policy and the DesignSafe Terms of Use.

For users curating and publishing data in DDR:
  • Users understand that their data submissions to the DDR should follow our Data Policies and our Curation and Publication Best Practices to the best of their ability.
  • Users agree to use DDR to publish only open access data, which they must document in a manner that does not hinder the ability of other users to reuse or reproduce it.
  • Users publishing or reusing data of others in their data publications must properly cite the datasets in accordance with the Joint Declaration of Data Citation Principles using the fields provided in the DDR interface.
  • Users agree to provide all the needed licenses and permissions to make data available for archiving and for reuse by others.
  • Users publishing human subjects data should abide by our Protected Data Best Practices.
  • Using DDR to publish data is entirely voluntary. None of these terms supersede any prior contractual obligations to confidentiality or proprietary information the user may have with third parties; thus, the user is entirely responsible for what they upload or share with DDR.
  • Publications that do not fall within these norms may be removed.
For users using data published in DDR:
  • Users accessing and using DDR data agree to the following Data Use Agreement.
  • Users agree to use DDR resources in accessing and reusing open access data in ways that respect the licenses established in the publications.
  • Users agree to properly cite the datasets they use in their works in accordance with the Joint Declaration of Data Citation Principles using the citations provided in the published datasets landing pages.
  • We reserve the right to ask users to suspend their use of DDR should we receive complaints or note violations of these Community Norms.

Data Collections Development

Data Types

We accept engineering datasets, as well as social and behavioural sciences datasets, derived from research conducted in the context of natural hazards. In the area of engineering the primary focus is on data generated through simulation, hybrid simulation, experimental and field research methods regarding the impacts of wind, earthquake, and storm surge hazards. We also accept data reports, publications of Jupyter notebooks, code, scripts, lectures, and learning materials. In social and behavioural sciences (SBE), accepted datasets encompass the study of the human dimensions of hazards and disasters. As the field and the expertise of the community evolves we have expanded our focus to include datasets related to COVID-19, Fire Hazards, and Sustainable Material Management.

Users that deposit data that does not correspond to the accepted types will be alerted when possible prior to publication so they can remove their data from DDR, and will not be allowed to publish it. If a dataset non-compliant with the Collections Development policy gets published with a corresponding DOI, we will contact the authors and work with them to remove the data and leave a tombstone explaining why the data is not available. Curators review both in process and published data on a monthly basis. In both cases we will work with the authors to find an adequate repository for their dataset.

Data Size

Researchers in the natural hazards community generate very large datasets during large-scale experiments, simulations, and field research projects. At the moment the DDR does not limit the amount of data to be published, but we do recommend being selective and publishing data that is relevant to a project's completeness and is adequately described for reuse. Our Data Curation Best Practices include recommendations to achieve a quality data publication. We are observing trends in relation to sizes and subsequent data reuse of these products, which will inform whether and how we implement data size publication limit policies.

Data Formats

We do not impose file format restrictions. The natural hazards research community utilizes diverse research methods to generate and record data in both open and proprietary formats, and the equipment used in the field is continually updated. We do encourage our users to convert to open formats when possible. The DDR follows the Library of Congress Recommended Format Statement and has guidance in place to convert proprietary formats to open formats for long term preservation; see our Accepted and Recommended Data Formats for more information. However, conversion can present challenges; Matlab, for example, allows saving complex data structures, yet not all of the files stored can be converted to a csv or a text file without losing some clarity and simplicity for handling and reusing the data. In addition, some proprietary formats such as JPEG and Excel have been considered standards for research and teaching for the last two decades. For these reasons, we allow users to publish the data in both proprietary and open formats. Through our Fedora repository we keep file format identification information of all the datasets stored in DDR.

Data Curation

Data curation involves the organization, description, representation, permanent publication, and preservation of datasets in compliance with community best practices and FAIR data principles. In the DDR, data curation is a joint responsibility between the researchers that generate data and the DDR team. Researchers understand better the logic and functions of the datasets they create, and our team's role is to help them make these datasets FAIR-compliant.

Our goal is to enable researchers to curate their data from the beginning of a research project and turn it into publications through interactive pipelines and consultation with data curators. The DDR has invested, and continues to invest, in developing and testing curation and publication pipelines based on data models designed with input from the NHERI community.

Data Management Plan

For natural hazards researchers submitting proposals to the NSF using any of the NHERI network facilities/resources, or alternative facilities/resources, we developed Data Management guidelines that explain how to use the DDR for data curation and publication. See Data Management Plan at: https://www.designsafe-ci.org/rw/user-guides/ and https://converge.colorado.edu/data/data-management

Data Models

To facilitate data curation of the diverse and large datasets generated in the fields associated with natural hazards, we worked with experts in natural hazards research to develop five data models that encompass the following types of datasets: experimental, simulation, field research, hybrid simulation, and other data products (See: 10.3390/publications7030051; 10.2218/ijdc.v13i1.661) as well as lists of specialized vocabulary. Based on the Core Scientific Metadata Model, these data models were designed considering the community's research practices and workflows, the need for documenting these processes (provenance), and using terms common to the field. The models highlight the structure and components of natural hazards research projects across time, tests, geographical locations, provenance, and instrumentation. Researchers in our community have presented on the design, implementation and use of these models broadly.

In the DDR web interface the data models are implemented as interactive functions with instructions that guide the researchers through the curation and publication tasks. As researchers move through the curation and publication pipelines, the interactive features reinforce data and metadata completeness and thus the quality of the publication. The process will not move forward if requirements for metadata are not in place (See Metadata in Best Practices), or if key files are missing.

Metadata

To date, there is no standard metadata schema to describe natural hazards data. In DDR we follow a combination of standard metadata schemas and expert-contributed vocabularies to help users describe and find data.

Embedded in the DDR data models are categories and terms as metadata elements that experts in the NHERI network contributed and deemed important for data explainability and reuse. Categories reflect the structure and components of the research dataset, and the terms describe these components. The structure and components of the published datasets are represented on the dataset landing pages and through the Data Diagram presented for each dataset.

Due to variations in their research methods, researchers may not need all the categories and terms available to describe and represent their datasets. However, we have identified a core set of metadata that allows proper data representation, explainability, and citation. These sets of core metadata are shown for each data model in our Metadata Requirements in Best Practices.

To further describe datasets, the curation interface offers the possibility to add both predefined and custom file tags. Predefined file tags are specialized terms provided by the natural hazard community; their use is optional, but highly recommended. The lists of tags are evolving for each data model, continuing to be expanded, updated, and corrected as we gather feedback and observe how researchers use them in their publications.

For purposes of metadata exchange and interoperability, the elements and tags in the data models are mapped to widely-used, standardized metadata schemas. These are: Dublin Core for description of the dataset project, DDI (Data Documentation Initiative) for social science data, DataCite for DOI assignment and citation, and PROV-O to record the structure of the dataset. Metadata mapping is substantiated as the data is ingested into Fedora. Users can download the standardized metadata in the publications landing page.
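The mapping idea described above can be illustrated with a minimal sketch. It is not the DDR's actual implementation: the project-side field names (project_title, authors, and so on), the function name, and the sample record are hypothetical and chosen only for illustration, while the element names on the right are standard Dublin Core elements.

```python
# Hypothetical project-side field names (left) mapped to Dublin Core
# elements (right). The Dublin Core element names are real; the
# project-side names are assumptions made for this sketch.
DUBLIN_CORE_MAP = {
    "project_title": "dc:title",
    "authors": "dc:creator",
    "description": "dc:description",
    "keywords": "dc:subject",
    "publication_date": "dc:date",
    "doi": "dc:identifier",
    "license": "dc:rights",
}


def to_dublin_core(record: dict) -> dict:
    """Translate a project record into Dublin Core element/value pairs.

    Fields without a mapping and empty values are skipped. List values are
    kept as lists because Dublin Core elements are repeatable.
    """
    mapped = {}
    for field_name, value in record.items():
        element = DUBLIN_CORE_MAP.get(field_name)
        if element is None or value in (None, "", []):
            continue
        mapped[element] = value
    return mapped


# Illustrative record only; not taken from a real publication.
example = {
    "project_title": "Example wind tunnel experiment",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "keywords": ["wind", "experimental"],
    "internal_notes": "not exported",
}
dc_record = to_dublin_core(example)
```

Keeping the mapping in a plain dictionary means that a second target schema, such as DataCite, could be supported by adding another table rather than changing the translation logic.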

Metadata and Data Quality

The diversity and quantity of research methods, instruments, and data formats in natural hazards research is vast and highly specialized. For this reason, we conceive of data quality as a collaboration between the researchers and the DDR. In consultation with the larger NHERI network we are continuously observing and defining best practices that emerge from our community's understanding and standards.

We address data quality from a variety of perspectives:

Metadata quality: Metadata is fundamental to data explainability and reuse. To support metadata quality we provide onboarding descriptions of all metadata elements, indicate which ones are required, and suggest how to complete them. Requirements for core metadata elements are automatically reinforced within the publication pipeline and the dataset will not be published if those are not fulfilled. Metadata can be accessed by users in standardized formats on the projects’ landing pages.

Data content quality: Different groups in the NHERI network have developed benchmarks and guidelines for data quality assurance, including StEER, CONVERGE and RAPID. In turn, each NHERI Experimental Facility has methods and criteria in place for ensuring and assessing data quality during and after experiments are conducted. Most of the data curated and published along NHERI guidelines in the DDR are related to peer-reviewed research projects and papers, speaking to the relevance and standards of their design and outputs. Still, the community acknowledges that for very large datasets the opportunity for detailed quality assessment emerges after publication, as data are analyzed and turned into knowledge. Because work in many projects continues after publication, both for the data producers and reusers, the community has the opportunity to version datasets.

Data completeness and representation: We understand data completeness as the presence of all relevant files that enable reproducibility, understandability, and therefore reuse. This may include readme files, data dictionaries and data reports, as well as data files. The DDR supports data completeness by recommending and requesting that users include the data needed to fulfill the data model's required categories, which are indispensable for a publication's understandability and reuse. During the publication process the system verifies that those categories have data assigned to them. The Data Diagram on the landing page reflects which relevant data categories are present in each publication. A similar process happens for metadata during the publication pipeline; metadata is automatically vetted against the research community’s Metadata Requirements before moving on to receive a DOI for persistent identification.
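The two automated checks described above (required metadata present, required categories populated) can be sketched as follows. This is a minimal illustration under stated assumptions, not the DDR's code: the Publication class, the function names, and the specific required fields and category names are hypothetical, and the authoritative lists are those in the Metadata Requirements.

```python
from dataclasses import dataclass, field

# Hypothetical requirements for an illustrative data model. The real
# required fields and categories are defined in the Metadata Requirements.
REQUIRED_METADATA = ("title", "description", "authors")
REQUIRED_CATEGORIES = ("model_configuration", "sensor_information", "event")


@dataclass
class Publication:
    # metadata field name -> value
    metadata: dict = field(default_factory=dict)
    # category name -> list of file paths assigned to that category
    categories: dict = field(default_factory=dict)


def find_blocking_issues(pub: Publication) -> list:
    """List every missing requirement that would stop the pipeline."""
    issues = []
    for name in REQUIRED_METADATA:
        if not pub.metadata.get(name):
            issues.append(f"missing metadata: {name}")
    for category in REQUIRED_CATEGORIES:
        if not pub.categories.get(category):
            issues.append(f"no files assigned to category: {category}")
    return issues


def can_receive_doi(pub: Publication) -> bool:
    """A publication proceeds to DOI assignment only with no open issues."""
    return not find_blocking_issues(pub)


# Illustrative draft with an empty description and two unpopulated categories.
draft = Publication(
    metadata={"title": "Example", "description": "", "authors": ["Doe, Jane"]},
    categories={"model_configuration": ["config.pdf"], "event": []},
)
draft_issues = find_blocking_issues(draft)
```

Returning the full list of issues, rather than stopping at the first one, lets an interface show a researcher everything that remains to be completed in a single pass.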

We also support citation best practices for datasets reused in our publications. When users reuse data from other sources in their data projects, they have the opportunity to include them in the metadata through the Related Works and Referenced Data fields.

Data publications review: Once a month, data curators meet to review new publications. These reviews show us how the community is using and understanding the models, and allow us to verify the overall quality of the data publications. When we identify curation problems (e.g. insufficient or unclear descriptions, file or category misplacement, etc.) that could not be automatically detected, we contact the researchers and work on solving these issues. Based on the feedback, users have the possibility to amend/improve their descriptions and to version their datasets (See amends and version control).

Curation and Publication Assistance

We believe that researchers are best prepared to tell the story of their projects through their data publications; our role is to enable them to communicate their research to the public by providing flexible and easy to use curation resources and guidance that abide by publication best practices. To support researchers organizing, categorizing and describing their data, we provide interactive pipelines with onboarding instructions, different modes of training and documentation, and one-on-one help.

Interactive pipelines: The DDR interface is designed to facilitate large scale data curation and publication through interactive step-by-step capabilities aided by onboarding instructions. This includes the possibility to categorize and tag multiple files in relation to the data models, and to establish relations between categories via diagrams that are intuitive to data producers and easy to understand for data consumers. Onboarding instructions including vocabulary definitions, suggestions for best practices, availability of controlled terms, and automated quality control checks are in place.

One-on-one support: We hold virtual office hours twice a week during which a curator is available for real-time consulting with individuals and teams. Other virtual consulting times can be scheduled on demand. Users can also submit Help tickets, which are answered within 24 hours, as well as send emails to the curators. Users also communicate with curatorial staff via the DesignSafe Slack channel. The curatorial staff includes a natural hazards engineer, a data librarian, and a USEX specialist. Furthermore, developers are on call to assist when needed.

Guidance on Best Practices: Curatorial staff prepares guides and video tutorials, including special training materials for Undergraduate Research Experience students and for Graduate Students working at Experimental Facilities.

Webinars by Researchers: Various researchers in our community contribute to our curation and publication efforts by conducting webinars in which they relay their data curation and publication experiences. Some examples are webinars on curation and publication of hybrid simulations, field research and social sciences datasets.

Data Publication and Usage

Protected Data

Protected data are information subject to regulation under relevant privacy and data protection laws, such as HIPAA, FERPA and FISMA, as well as human subjects data containing Personally Identifiable Information (PII) and data involving vulnerable populations and/or containing sensitive information.

Publishing protected data in the DDR involves complying with the requirements, norms, and procedures approved by the data producers' Institutional Review Board (IRB) or equivalent body regarding human subjects data storage and publication, and managing direct and indirect identifiers in accordance with accepted means of data de-identification. In the DDR, protected data issues are considered at the onset of the curation and publication process and before storing data. Researchers working with protected data in DDR can communicate this to the curation team when they select a project type in DDR, and the curator gets in touch with them to discuss options and procedures.

Unless approved by an IRB, most forms of protected data cannot be published in DesignSafe. No direct identifiers and only up to three indirect identifiers are allowed in published datasets. However, data containing PII can be published in the DDR with proper consent from the subject(s) and documentation of that consent in the project's IRB paperwork. In all publications involving human subjects, researchers should include and publish their IRB documentation showing the agreement.
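The identifier limits stated above can be expressed as a simple pre-publication check. The sketch below is illustrative only and rests on several assumptions: the column names, their classification as direct or indirect identifiers, and the function name are hypothetical, and the code reads the consent exception as lifting both limits, which is one possible interpretation of the policy. In practice these judgments are made by the researchers following their IRB-approved procedures.

```python
MAX_INDIRECT_IDENTIFIERS = 3  # limit stated in the policy text

# Hypothetical classification of dataset columns, for illustration only.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "street_address"}
INDIRECT_IDENTIFIERS = {
    "zip_code", "age", "occupation", "ethnicity", "household_size",
}


def check_identifiers(columns, consent_documented=False):
    """Return a list of policy concerns for the given column names."""
    normalized = {c.lower() for c in columns}
    direct = sorted(normalized & DIRECT_IDENTIFIERS)
    indirect = sorted(normalized & INDIRECT_IDENTIFIERS)

    # Assumption: consent documented in the IRB paperwork lifts both limits.
    if consent_documented:
        return []

    concerns = []
    if direct:
        concerns.append("direct identifiers present: " + ", ".join(direct))
    if len(indirect) > MAX_INDIRECT_IDENTIFIERS:
        concerns.append(
            f"{len(indirect)} indirect identifiers exceed the limit of "
            f"{MAX_INDIRECT_IDENTIFIERS}"
        )
    return concerns


# Illustrative column list: one direct and two indirect identifiers.
survey_concerns = check_identifiers(["Name", "zip_code", "age", "response"])
```

A check like this can only flag columns that are already labeled; it does not detect identifying information hidden in free-text fields or images.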

If, as a consequence of data de-identification, the data loses meaning, it is possible to publish a description of the data, the corresponding IRB documents, and the data instruments if applicable, and obtain a DOI and a citation for the dataset. In this case, the dataset will be shown as Restricted Access. In addition, authors should include information on how to reach them in order to gain access to the dataset or discuss it further. The responsibility for maintaining the protected dataset in compliance with the IRB commitments and for the long term will lie with the authors, and they can use TACC's Protected Data Services if they need to. For more information on how to manage this case see our Protected Data Best Practices.

It is the user’s responsibility to adhere to these policies and the procedures and standards of their IRB or other equivalent institution, and DesignSafe will not be held liable for any violations of these terms regarding improper publication of protected data. Uploads that violate this policy and of which we are notified may be removed from the DDR with or without notice, and the user may be asked to suspend their use of the DDR and other DesignSafe resources. We may also contact the user’s IRB and/or other respective institution with any cases of violation, which could result in an active audit (See 24) of the research project, so users should review their institution’s policies regarding publishing with protected data before using DesignSafe and DDR.

For any data that is not subject to IRB oversight but may still contain PII, such as Google Earth images containing images of people not studied in the scope of the research project, we recommend blocking out or blurring any information that could be considered PII before publishing the data in the DDR. Researchers interested in seeing the raw data are still invited to contact the PI of the research project to request access. See our Protected Data Best Practices for information on how to manage protected data in DDR.

Subsequent Publishing

Attending to the needs expressed by the community, we enable the possibility to publish data and other products subsequently within a project, each with a DOI. This need arises from the longitudinal and/or tiered structure of some research projects, such as experiments and field research missions, which happen at different time periods, may involve multiple distinct teams, and may need to publish different types of materials or to release information promptly after a natural hazards event and later publish related products. Subsequent publishing is enabled in the My Project interface, where users and teams manage and curate their active data throughout their projects' lifecycle.

Timely Data Publication

Although no firm deadline requirements are specified for data publishing, as an NSF-funded platform we expect researchers to publish in a timely manner, so we provide recommended timelines for publishing different types of research data in our Timely Data Publication Best Practices.

Peer Review

Users that need to submit their data for review prior to publishing and assigning a DOI have the opportunity to do so by: a) adding reviewers to their My Project, when there is no need for anonymous review, or b) contacting the DesignSafe data curator through a Help ticket to obtain a Public Accessibility Data Delay (See below). Note that the data must be fully curated prior to requesting a Public Accessibility Delay.

Public Accessibility Delay

Many researchers request a DOI for their data before it is made publicly available, so that they can include it in papers submitted to journals for review. In order to assign a DOI in the DDR, the data has to be curated and ready to be published. Once the DOI is in place, we provide services to researchers with such commitments to delay the public accessibility of their data publication in the DDR, i.e. to make the user’s data publication, via their assigned DOI, not web indexable through DataCite and/or not publicly available in DDR's data browser until the corresponding paper is published in a journal, or for up to one year after the data is deposited. The logic behind this policy is that once a DOI has been assigned, the data will inevitably be published, so this delay can be used to provide reviewers access to a data publication before it is broadly distributed. Note that the data should be fully curated, and that, while not broadly distributed, it will eventually be indexed by search engines. Users that need to amend/correct their publications will be able to do so via version control. See our Data Delay Best Practices for more information on obtaining a public accessibility delay.
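The timing rule above (public release when the paper is published, or at most one year after deposit) can be written as a short date calculation. This is a sketch under stated assumptions, not the DDR's procedure: the function names are hypothetical, "one year" is read as the same calendar date in the following year, and a deposit on 29 February is assumed to roll back to 28 February.

```python
from datetime import date
from typing import Optional


def add_one_year(d: date) -> date:
    """Return the same calendar date one year later.

    Assumption: a 29 February deposit maps to 28 February of the next year.
    """
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def latest_release_date(
    deposited: date, paper_published: Optional[date] = None
) -> date:
    """Date by which a delayed publication becomes publicly accessible.

    The delay ends when the related paper is published, but never later
    than one year after the data was deposited.
    """
    limit = add_one_year(deposited)
    if paper_published is not None and paper_published < limit:
        return paper_published
    return limit


# Illustrative dates only.
release = latest_release_date(date(2024, 3, 1), paper_published=date(2024, 9, 15))
```

If no paper publication date is known, the function falls back to the one-year limit, which matches the upper bound stated in the policy.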

Data Licenses

DDR provides users with five licensing options to accommodate the variety of research outputs generated and how researchers in this community want to be attributed. The following licenses were selected after discussions within our community. In general, DDR users are keen on sharing their data openly but expect attribution. In addition to data, our community issues reports, survey instruments, presentations, learning materials, and code. The licenses are: Creative Commons Attribution (CC-BY 4.0), Creative Commons Public Domain Dedication (CC-0 1.0), Open Data Commons Attribution (ODC-BY 1.0), Open Data Commons Public Domain Dedication and License (ODC-PDDL 1.0), and GNU General Public License (GNU-GPL 3). During the publication process users have the option of selecting one license per publication with a DOI. More specifications of these license options and the works they can be applied to can be found in Licensing Best Practices.

DDR also requires that users reusing data from others in their projects do so in compliance with the terms of the data's original license.

The expectations of DDR and the responsibilities of users in relation to the application of and compliance with licenses are included in the DesignSafe Terms of Use, the Data Usage Agreement, and the Data Publication Agreement. As clearly stated in those documents, in the event that we note or are notified that the licensing policies and best practices are not followed, we will notify the user of the infringement and may cancel their DesignSafe account.

Data Citation

DDR abides by and promotes the Joint Declaration of Data Citation Principles amongst its users.

We encourage and facilitate researchers using data from the DDR to cite it using the DOI and citation language available on the dataset's landing page. The DOI relies on the DataCite schema for citation and accurate access.

For users publishing data in DDR, we enable referencing works and/or data reused in their projects. For this we provide two fields, Related Work and Referenced Data, for citing data and works on their data publication's landing page.

The expectations of DDR and the responsibilities of users in relation to the application and compliance with data citation are included in the DesignSafe Terms of Use, the Data Usage Agreement, and the Data Publication Agreement. As clearly stated in those documents, in the event that we note or are notified that citation policies and best practices are not followed, we will notify the user of the infringement and may cancel their DesignSafe account.

However, given that it is not feasible to know with certainty if users comply with data citation, our approach is to educate our community by reinforcing citation in a positive way. For this we implement outreach strategies to stimulate data citation. Through diverse documentation, FAQs, webinars, and emails, we regularly train our users on data citation best practices. And, by tracking and publishing information about the impact and science contributions of the works they publish citing the data that they use, we demonstrate the value of data reuse and further stimulate publishing and citing data.

Data Publication Agreement

This agreement is read and has to be accepted by the user prior to publishing a dataset.

This submission represents my original work and meets the policies and requirements established by the DesignSafe Policies and Best Practices. I grant the Data Depot Repository (DDR) all required permissions and licenses to make the work I publish in the DDR available for archiving and continued access. These permissions include allowing DesignSafe to:

  1. Disseminate the content in a variety of distribution formats according to the DDR Policies and Best Practices.
  2. Promote and advertise the content publicly in DesignSafe.
  3. Store, translate, copy, or re-format files in any way to ensure their future preservation and accessibility.
  4. Improve usability and/or protect respondent confidentiality.
  5. Exchange and/or incorporate metadata or documentation in the content into public access catalogues.
  6. Transfer data and metadata with their respective DOI to another institution for long-term accessibility if needed for continuous access.

I understand the type of license I choose to distribute my data, and I guarantee that I am entitled to grant the rights contained in them. I agree that when this submission is made public with a unique digital object identifier (DOI), this will result in a publication that cannot be changed. If the dataset requires revision, a new version of the data publication will be published under the same DOI.

I warrant that I am lawfully entitled and have full authority to license the content submitted, as described in this agreement. None of the above supersedes any prior contractual obligations with third parties that require any information to be kept confidential.

If applicable, I warrant that I am following the IRB agreements in place for my research and following Protected Data Best Practices.

I understand that the DDR does not approve data publications before they are posted; therefore, I am solely responsible for the submission, publication, and all possible confidentiality/privacy issues that may arise from the publication.

Data Usage Agreement

Users who access, preview, download or reuse data and metadata from the DesignSafe Data Depot Repository (DDR) agree to the following policies. If these policies are not followed, we will notify the user of the infringement and may cancel their DesignSafe account.

  • Use of the data includes, but is not limited to, viewing parts or the whole of the content; comparing with data or content in other datasets; verifying research results and using any part of the content in other projects, publications, or other related work products.
  • Users will not use the data in any way prohibited by applicable laws, distribution licenses, and permissions explicit in the data publication landing pages.
  • The data are provided “as is,” and their use is at the users' risk. While the DDR promotes data and metadata quality, the data authors and publishers do not guarantee that:
    1. the materials are accurate, complete, reliable or correct;
    2. any defects or errors will be corrected;
    3. the materials and accompanying files are free of viruses or other harmful components; or
    4. the results of using the data will meet the user’s requirements.
  • Use of data in the DDR abides by the DesignSafe Privacy Policy.
  • Users are responsible for abiding by the restrictions outlined by the data author in their publications' landing pages and by the DDR in this agreement, but they are not responsible for any restrictions not otherwise explicitly described here or in the landing pages.
  • Users will not obtain personal information associated with DDR data that results in directly or indirectly identifying research subjects, individuals, or organizations with the aid of other information acquired elsewhere.
  • Users will not in any event hold the DDR or the data authors liable for any and all losses, costs, expenses, or damages arising from use of DDR data or any other violation of this agreement, including infringement of licenses, intellectual property rights, and other rights of people or entities contained in the data.
  • We do not gather the IP addresses of public users who preview or download files from the DDR.
  • Our system logs file actions completed by registered users in the DDR including previewing, downloading or copying published data to My Data or My Projects. We only use this information in aggregate for metrics purposes and do not link it to the user’s identity.

Amends and Version Control

Users can amend and version their data publications. Since the DDR came online, we have helped users correct and/or improve the metadata applied to their datasets after publication. Most requests involve improving the text of the descriptions, changing the order of the authors, and adding references to papers published using the data in the project; users have also asked for the ability to version their datasets. Our amends and version control policy derives from meeting our users' needs.

Changes allowed during amends are:

  • Adding Related Works, such as a paper published after the data.
  • Correcting typos and/or improving the abstract and the keyword list.
  • Correcting or adding an award.
  • Changing the order of the authors.

If users need to add or delete files or change the content of the files, they have the opportunity to version their data publication. The following applies to versions:

  • Versions will have the same DOI, and the title will indicate the version number. The decision to maintain the same DOI was agreed upon by our community to facilitate DOI management for data publishers and users.
  • Users will be able to view all existing versions on the publication's landing page.
  • The DOI will always resolve to the latest version of the publication.
  • Versions are documented by data publishers so other users understand what changed and why. The documentation is publicly displayed.

Documentation of versions requires including the name of the file/s changed, removed or added, and identifying the category in which they are located. We include guidance on how to document versions within the curation and publication onboarding instructions.
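The versioning rules above can be illustrated with a minimal sketch. It is not the DDR implementation: the class names, the title suffix format, and the placeholder DOI are all hypothetical, chosen only to show one DOI shared across versions, resolution to the latest version, and mandatory change documentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Version:
    number: int
    title: str        # title carries the version number (format is an assumption)
    change_note: str  # public documentation of what changed and why


@dataclass
class Publication:
    doi: str          # one DOI shared by every version
    title: str
    versions: List[Version] = field(default_factory=list)

    def publish_version(self, change_note: str) -> Version:
        """Add a new version under the same DOI; documentation is required."""
        if not change_note.strip():
            raise ValueError("each version must document what changed and why")
        number = len(self.versions) + 1
        # Hypothetical title convention: suffix added from version 2 onward.
        title = self.title if number == 1 else f"{self.title} (Version {number})"
        version = Version(number=number, title=title, change_note=change_note)
        self.versions.append(version)
        return version

    def resolve(self) -> Version:
        """The DOI always resolves to the latest version."""
        if not self.versions:
            raise LookupError("nothing has been published under this DOI yet")
        return self.versions[-1]


# Example with a placeholder DOI and made-up file names.
pub = Publication(doi="10.0000/example", title="Shake Table Tests")
pub.publish_version("Initial publication.")
pub.publish_version("Added sensor_02.csv to the Sensor Data category.")
latest = pub.resolve()
```

In this sketch earlier versions stay in `pub.versions`, which mirrors the requirement that all existing versions remain viewable from the landing page.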

The Fedora repository manages all amends and versions so there is a record of all changes. The version number is passed to DataCite as metadata.

More information about the reasons for amends and versioning is in Publication Best Practices.

Leave Data Feedback

Users can click a “Leave Feedback” button on the projects’ landing pages to provide comments on any publication. This feedback is forwarded to the curation team for any needed actions, including contacting the authors. In addition, users can message the authors directly, as their contact information is available via the authors field in the publication landing pages. We encourage users to provide constructive feedback, and we suggest themes they may want to discuss about the publication in our Leave Data Feedback Best Practices.

Data Impact

We understand data impact as a strategy that includes complementary efforts at the crossroads of data discoverability, usage metrics, and scholarly communications.

Search Engine Optimization (SEO)

We have SEO methods in place to enhance the web visibility of the data publications. To increase discoverability and indexing of our publications, we follow guidance from Google Search Console and Google Dataset Search.

Data Usage Metrics

Our metrics follow the Make your Data Count Counter Code of Practice for Research Data.

Below are the definitions for each metric:

File Preview: Examining data in the portal. For example, clicking on a file name brings up a modal window that allows previewing the file. Not all document types can be previewed. Those that can include text, spreadsheets, graphics, and code files (example extensions: .txt, .doc, .docx, .csv, .xlsx, .pdf, .jpg, .m, .ipynb). Those that can't include binary executables, MATLAB containers, compressed files, and video (e.g., .bin, .mat, .zip, .tar, .mp4, .mov).

File Download: Copying a file to the machine the user is running on, or to a storage device that machine has access to. This can be done by ticking the checkbox next to a document and selecting "Download" at the top of the project page. With documents that can be previewed, clicking "Download" at the top of the preview modal window has the same effect. Downloads are counted per project and per individual file. We also count copying a file from the published project to the user's My Data, My Projects, or to Tools and Applications in DesignSafe, or to one of the connected spaces (Box, Dropbox, Google Drive). To copy, tick the checkbox next to a document and select "Copy" at the top of the project page.

File Requests: Total file downloads + total file previews.

Project Downloads: Total downloads of a compressed entire project to a user's machine.
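As an illustration of how these definitions combine, the following is a minimal sketch, not the DDR's reporting code. The `FileEvent` record, the action names, and the project identifier are hypothetical. It assumes copies count as file downloads, as described above, and that whole-project downloads are tallied separately from File Requests.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical action labels for logged file events.
KNOWN_ACTIONS = {"preview", "download", "copy", "project_download"}


@dataclass(frozen=True)
class FileEvent:
    project_id: str  # hypothetical identifier of the published project
    file_path: str   # file acted on (empty for a whole-project download)
    action: str      # one of KNOWN_ACTIONS


def summarize_metrics(events: Iterable[FileEvent]) -> Dict[str, int]:
    """Aggregate events into the four metrics defined above."""
    counts: Counter = Counter()
    for event in events:
        if event.action not in KNOWN_ACTIONS:
            raise ValueError(f"unknown action: {event.action!r}")
        counts[event.action] += 1

    file_previews = counts["preview"]
    # Copies to My Data, My Projects, etc. are counted as file downloads.
    file_downloads = counts["download"] + counts["copy"]
    return {
        "file_previews": file_previews,
        "file_downloads": file_downloads,
        # File Requests = total file downloads + total file previews.
        "file_requests": file_downloads + file_previews,
        "project_downloads": counts["project_download"],
    }


# Example with made-up events for a placeholder project.
example_events = [
    FileEvent("PRJ-0000", "data/run1.csv", "preview"),
    FileEvent("PRJ-0000", "data/run1.csv", "download"),
    FileEvent("PRJ-0000", "data/run2.csv", "copy"),
    FileEvent("PRJ-0000", "", "project_download"),
]
example_summary = summarize_metrics(example_events)
```

The sketch works only on aggregate counts and carries no user identity, consistent with the Data Usage Agreement.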

We report the metrics in the publications' landing pages. To provide context for the metrics, we indicate the total number of files in each publication.

We started counting on May 17, 2021. We update the reports on a monthly basis and we report data metrics to NSF every quarter. Currently we are in the process of formatting the reports to participate in the Make your Data Count initiative.

Data Vignettes

Since 2020 we have conducted Data Reuse Vignettes. For this, we identify published papers and interview researchers who have reused data published in the DDR. In this context, reuse means that researchers are using data published by others for purposes different from those intended by the data creators. During the interviews we use a semi-structured questionnaire to discuss the academic relevance of the research, the ease of access to the data in the DDR, and the understandability of the data publication in relation to metadata and documentation clarity and completeness. We feature the data stories on the DesignSafe website and use the feedback to make changes and to design new reuse strategies. The methodology used in this project was presented at the International Qualitative and Quantitative Methods in Libraries 2020 International Conference. See Perspectives on Data Reuse from the Field of Natural Hazards Engineering.

Data Awards

In 2021 we launched the first Data Publishing Award to encourage excellence in data publication and to stimulate reuse. Data publications are nominated by our user community based on contribution to scientific advancement and curation.

Data Preservation

Data preservation encompasses diverse activities carried out by all the stakeholders involved in the lifecycle of data, from data management planning to data curation, publication and long-term archiving. Once data is submitted to the Data Depot Repository (DDR), we have functionalities and guidance in place to address the long-term preservation of the submitted data.

The DDR has been operational since 2016 and is currently supported by the NSF from October 1, 2020 through September 30, 2025. During this award period, the DDR will continue to preserve the natural hazards research data published since its inception, and will support preservation of and access to legacy data and the accompanying metadata from the Network for Earthquake Engineering Simulation (NEES), a NHERI predecessor, dating from 2005. The legacy data, comprising 33 TB and 5.1 million files, and their metadata were transferred to DesignSafe in 2016 as part of the conditions of the original grant. See NEES data here.

Data in the DDR is preserved according to state-of-the-art digital library standards and best practices. DesignSafe is implemented within the reliable, secure, and scalable storage infrastructure at the Texas Advanced Computing Center (TACC), which has over 20 years of experience and innovation in High Performance Computing. TACC and its predecessors have operated a digital data archive continuously since 1986. The archive is currently implemented in the Corral Data Management system and the Ranch tape archive system, with a capacity of approximately half an exabyte. Corral and Ranch hold the data for DesignSafe and hundreds of other research data collections. For details about the digital preservation architecture and procedures for the DDR, see Data Preservation Best Practices.

Within TACC’s storage infrastructure, a Fedora repository, considered a standard for digital libraries, manages the preservation of the published data. Through its functionalities, Fedora assures the authenticity and integrity of the digital objects, manages versioning, identifies file formats, records preservation events as metadata, maintains RDF metadata in accordance with standard schemas, conducts audits, and maintains the relationships between data and metadata for each published research project and its corresponding datasets. Each published dataset in DesignSafe has a Digital Object Identifier, whose maintenance we understand as a firm commitment to data persistence.

While at the moment the DDR is committed to preserving data in the format in which it is submitted, we procure the necessary authorizations from users to conduct further preservation actions as well as to transfer the data to other organizations if applicable. These permissions are granted through our Data Publication Agreement, which authors acknowledge and have the choice to agree to at the end of the publication workflow and prior to receiving a DOI for their dataset.

Data sustainability is a continuous effort that DDR accomplishes along with the rest of the NHERI partners. In the natural hazards space, data is central to new advances, which is evidenced by the data reuse record of our community and the following initiatives:

Continuity of Access

As part of the requirements of the current award, we have a contingency plan in place to transfer all the DDR data, metadata and corresponding DOIs to a new awardee (should one be selected) without interruption of services and access to data. Fedora has export capabilities for transfer of data and metadata to another repository in a complete and validated fashion. The portability plan is confirmed and updated in the Operations Project Execution Plan that we present annually to the NSF.

If the NSF and/or the other stakeholders involved in this community decide not to continue the NHERI program or a subsequent data repository, we will continue to preserve the published data and provide access to it through TACC, DesignSafe’s host at the University of Texas at Austin. TACC has formally committed to preserving the data with landing pages and corresponding DOIs indefinitely. TACC has on permanent staff a User Services team as well as curators who will attend to users’ requests and help tickets related to the data. Because TACC is constantly updating its high-performance storage resources and security mechanisms, data will be preserved at the same preservation level that is currently available. Considering that DOIs are supported through the University of Texas Libraries and that the web services and the data reside within TACC’s managed resources, access to data will not be interrupted. Fedora is now part of TACC’s software suite and we will continue its maintenance as our preservation repository. As with all systems at TACC, we will revisit its versioning and continuity and make decisions based on state-of-the-art practices. Should funding constraints ever make this no longer possible, TACC will continue to keep an archive copy on Ranch (with landing pages on online storage) for as long as TACC remains a viable entity.