Glossary

Active research data: Research data files that are in the process of continuous change and/or development. Files containing this data are accessed, amended and/or updated as new data is gathered and/or processed. Some datasets may never be ‘finished’. A ‘snapshot’ of active research data can be archived to create a version that is fixed and can be cited. (Cambridge University Libraries Digital Preservation Policy)
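A minimal sketch of taking a fixed "snapshot" of active research data. It assumes the active data lives in a local folder and that a date-stamped zip file is an acceptable fixed version. The helper name `snapshot_dataset` and the `<folder>-<version>.zip` naming scheme are hypothetical illustrations, not a standard.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def snapshot_dataset(data_dir: str, dest_dir: str, version: Optional[str] = None) -> Path:
    """Freeze the current state of an active dataset as a zip archive.

    Hypothetical helper: the "<folder>-<version>.zip" name is an assumption
    chosen for illustration.
    """
    src = Path(data_dir)
    label = version or date.today().isoformat()  # default label: today's date
    base_name = Path(dest_dir) / f"{src.name}-{label}"
    # make_archive appends ".zip" and returns the full path of the archive;
    # entries are stored relative to root_dir, so the folder can keep changing
    archive = shutil.make_archive(str(base_name), "zip", root_dir=src)
    return Path(archive)
```

The live folder keeps changing after the call; the zip file is the fixed version that can be deposited and cited.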

Archive: A data archive is a site where machine-readable materials are stored, preserved, and possibly redistributed to individuals interested in using the materials (noun). (ICPSR: Glossary of Social Science Terms)

To archive is to transfer records from the individual or office of creation to a repository authorized to appraise, preserve, and provide access to those records (verb). (SAA: A Glossary of Archival and Records Terminology)

In computing/information technology: To archive (verb) is to store data offline. An archive (noun) is data stored offline. (SAA: A Glossary of Archival and Records Terminology)

At-risk data: Data that are at risk of being lost. At-risk data include data that are not easily accessible, have been dispersed, have been separated from the research output object, are stored on a medium that is obsolete or at risk of deterioration, data that were not recorded in digital form, and digital data that are available but are not usable because they have been detached from supporting data, metadata, and information needed to use and interpret them intelligently. (The CASRAI Dictionary)

Backup: A copy of all or portions of software or data files on a system kept on storage media, such as tape or disk, or on a separate system so that the files can be restored if the original data is deleted or damaged (noun). To back up is to create such copies of data (verb).

In computing/information technology, 'archive' is commonly used as a synonym for 'backup' and 'back up'. (SAA: A Glossary of Archival and Records Terminology)
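A minimal sketch of backing up a single file and restoring it after the original is deleted or damaged. It assumes the backup location is a local directory; comparing SHA-256 checksums to confirm the copy is an added illustrative safeguard, not part of the definition. The helper names `back_up_file`, `restore_file`, and `sha256_of` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_file(original: str, backup_dir: str) -> Path:
    """Copy a file into backup_dir and confirm the copy matches the original."""
    src = Path(original)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where possible
    if sha256_of(src) != sha256_of(dest):
        raise OSError(f"backup of {src} does not match the original")
    return dest


def restore_file(backup: str, target: str) -> None:
    """Restore a deleted or damaged original from its backup copy."""
    shutil.copy2(backup, target)
```

In practice the backup directory would sit on separate media or a separate system, as the definition describes.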

Digital object identifier (DOI): A name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks. A DOI is a type of Persistent Identifier (PID) issued by the International DOI Foundation. This permanent identifier is associated with a digital object and allows the object to be referenced reliably even if its location and metadata change over time. (The CASRAI Dictionary)
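Because a DOI is a name rather than a location, it is made "actionable" by passing it to a resolver, which redirects to wherever the object currently lives. A minimal sketch, assuming the public resolver at https://doi.org/ and a simplified DOI pattern (a `10.` prefix, a slash, then a suffix) that does not cover every legal DOI. The function name `doi_to_url` is hypothetical; `10.1000/182` is the DOI of the DOI Handbook.

```python
import re
from urllib.parse import quote

# Simplified pattern (an assumption): "10." + registrant code, "/", suffix
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")
RESOLVER = "https://doi.org/"
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def doi_to_url(doi: str) -> str:
    """Normalize a DOI string and return a resolvable https://doi.org/ URL."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL, keeping the slash
    return RESOLVER + quote(doi, safe="/")
```

The URL stays the same even if the object moves; only the resolver's record of its location has to be updated.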

Metadata: Metadata is the data that describes an item such as a data set. Having metadata associated with a data set enables it to be found and cited. It provides other researchers with the information they require to understand the data. Metadata gives context to research data by providing descriptive detail about it. It offers standardized, structured information explaining data in terms of, for example, purpose, origin, time references, geographic location, creator, access conditions, and terms of use of a data collection. Used to enable resource discovery, metadata can provide pathways for searching existing data, serve as a bibliographic record for citation, or facilitate online browsing of data.

An example of a metadata schema, or element set, is the widely used Dublin Core metadata schema. Examples of metadata elements are title, contributor, creator, subject, description, type, format, date, relation, and identifier. (Science Europe Data Glossary)
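A minimal sketch of a Dublin Core record built with Python's standard XML library. The `dc:` namespace URI and the 15 element names are those of the Dublin Core Metadata Element Set; the `metadata` wrapper element, the helper name `dublin_core_record`, and the sample values are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields: Dict[str, List[str]]) -> str:
    """Serialize element -> values pairs as simple Dublin Core XML."""
    ET.register_namespace("dc", DC_NS)  # emit "dc:" prefixes instead of "ns0:"
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        for value in values:  # Dublin Core elements are repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example record
print(dublin_core_record({
    "title": ["Example survey dataset"],
    "creator": ["Doe, Jane"],
    "type": ["Dataset"],
    "format": ["text/csv"],
}))
```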

Open access: Typically used to describe publications, open access refers to online, freely available material that has few or no copyright or licensing restrictions. (Suber, 2004) (Cornell University Glossary of Data Management Terms)

Open researcher and contributor ID (ORCID): ORCID is a non-profit organization that manages ORCID iDs, which are unique identifiers for researchers. The ORCID iD allows both researcher disambiguation and the accurate association of authors with all their works, regardless of how their names and institutional affiliations change over time. (National Network of Libraries of Medicine Data Thesaurus)
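An ORCID iD is 16 characters written in four groups of four; the last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm, which lets software catch most typos before looking an identifier up. A minimal sketch, with hypothetical function names; `0000-0002-1825-0097` is the sample iD ORCID uses in its own documentation.

```python
import re

# Four groups of four; the final character may be "X" (check value 10)
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the hyphenated format and the final check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

A valid checksum only shows that the iD is well formed, not that it has been registered to a researcher.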

Open source: A philosophy and methodology (often used by software developers) where the source code is made freely available, so that others may continue to develop the software. Open source is the philosophical opposite of proprietary. (Cambridge University Libraries Digital Preservation Policy)

Persistent unique identifiers (PIDs): Persistent unique identifiers provide long-lasting identification of digital objects. They are global, standardized, and widely used in the digital environment, and they can provide information about an object regardless of where the object is located. Persistent unique identifiers include DOIs, ARKs, Handles, and ORCID iDs. Assigning persistent unique identifiers to data provides a way to locate specific data among the vast amounts of research data generated every day. (National Network of Libraries of Medicine Data Thesaurus)

Preservation: Digital preservation combines policies, strategies and actions to ensure the accurate rendering of authenticated content over time, regardless of the challenges of media failure and technological change. Digital preservation applies to both born digital and reformatted content. ("Definitions of Digital Preservation", American Library Association, February 21, 2008).

Data preservation consists of a series of managed activities necessary to ensure continued access to data for as long as necessary. Data preservation requires ongoing active management of data from as early in the lifecycle as possible. (National Network of Libraries of Medicine Data Thesaurus)

Public access policies: Public access policies ensure that the results of research are freely available to the public. Funders generally use this term to refer to policies that align with the objectives of the OSTP memo, "Increasing Access to the Results of Federally Funded Scientific Research." (Cornell University Glossary of Data Management Terms)

README file: A readme file provides information about a data file and is intended to help ensure that the data can be correctly interpreted, by yourself at a later date or by others when sharing or publishing data. Standards-based metadata is generally preferable, but where no appropriate standard exists, for internal use, writing “readme” style metadata is an appropriate strategy. (Cornell University Research Data Management Service Group)

Repository: Repositories preserve, manage, and provide access to many types of digital materials in a variety of formats. Materials in online repositories are curated to enable search, discovery, and reuse. There must be sufficient control for the digital material to be authentic, reliable, accessible and usable on a continuing basis. (The CASRAI Dictionary)

Research data: The recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. Research data may be in hard-copy form (including research notes, laboratory notebooks, or photographs) or in electronic form, such as computer software, computer storage/backup, or digital images.

Research data are not limited to raw experimental results and instrument outputs; they encompass associated protocols, numbers, graphs, tables, and charts used to collect and reconstruct the data. Research data include numbers, field notes or observations, procedures for data analysis and/or reduction, data obtained from interviews or surveys, computer files and databases, research notebooks or laboratory journals, slides, audio/video recordings, and/or photographs. (UNH Policy on Ownership, Management, and Sharing of Research Data)

Research data management: Research data management is an explicit process covering the creation and stewardship of research materials to enable their use for as long as they retain value. (Digital Curation Centre (DCC) Glossary)

Research data lifecycle: The data lifecycle represents all of the stages of data throughout its life, from its creation for a study to its distribution and reuse. The data lifecycle begins with one or more researchers developing a concept for a study; once the study concept is developed, data is collected for that study. After data is collected, it is processed for distribution so that it can be archived and used by other researchers at a later date. Once data reaches the distribution stage of the lifecycle, it is stored in a location (e.g., a repository or registry) where other researchers can discover it. Data discovery leads to the repurposing of data, which creates a continual loop back to the data processing stage, where the repurposed data is archived and distributed for discovery. (National Network of Libraries of Medicine Data Thesaurus)
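The stages in this definition form a simple state machine: a linear path from concept to discovery, followed by a loop from repurposing back to processing. A minimal sketch of that structure; the stage names paraphrase the definition, and `Stage`, `NEXT_STAGE`, and `walk` are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    CONCEPT = "study concept"
    COLLECTION = "data collection"
    PROCESSING = "processing for distribution"
    DISTRIBUTION = "distribution (stored in a repository or registry)"
    DISCOVERY = "discovery by other researchers"
    REPURPOSING = "repurposing"


# Each stage leads to exactly one next stage, as in the definition above
NEXT_STAGE = {
    Stage.CONCEPT: Stage.COLLECTION,
    Stage.COLLECTION: Stage.PROCESSING,
    Stage.PROCESSING: Stage.DISTRIBUTION,
    Stage.DISTRIBUTION: Stage.DISCOVERY,
    Stage.DISCOVERY: Stage.REPURPOSING,
    Stage.REPURPOSING: Stage.PROCESSING,  # the continual loop back to processing
}


def walk(start: Stage, steps: int) -> List[Stage]:
    """Follow the lifecycle for a number of transitions from a starting stage."""
    path = [start]
    for _ in range(steps):
        path.append(NEXT_STAGE[path[-1]])
    return path


for stage in walk(Stage.CONCEPT, 7):
    print(stage.value)
```

Note that concept and collection occur only once; every later pass through the cycle starts again at processing.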

Research lifecycle: The research lifecycle is the process a researcher follows to take a project or study from its inception to its completion. Research data management is involved in each step of the research process. (National Network of Libraries of Medicine Data Thesaurus)

Research materials: Research materials are tangible physical objects from which data are obtained, such as environmental samples, biological specimens, cell lines, derived reagents, drilling core samples, or genetically altered microorganisms. While these are not considered to be research data, they should be retained consistent with disciplinary standards. (UNH Policy on Ownership, Management, and Sharing of Research Data)

