- Q: What are the elements of a data management plan?
A: This table is on the webpage "Elements of a Good Data Management Plan."
Example text is from the ICPSR Webinar PowerPoint slides on data management plans presented by Amy Pienta, director of data acquisitions.
The examples and descriptions are drawn mainly from social science sources, as the NIH has had data management plan requirements since 2003.
This list of elements is informed by a gap analysis that ICPSR conducted of existing recommendations for data management plans and other forms of guidance made available for researchers generating data. The result of the gap analysis was a comparison of existing forms of guidance. Elements that are highly recommended for inclusion in effective data management plans are noted.
Element: Data description
Description: A description of the information to be gathered; the nature and scale of the data that will be generated or collected.
Generic Example 1: This project will produce public-use nationally representative survey data for the United States covering Americans' social backgrounds, enduring political predispositions, social and political values, perceptions and evaluations of groups and candidates, opinions on questions of public policy, and participation in political life.
Generic Example 2: [Provide a brief description of the information to be gathered -- the nature, scope, and scale of the data that will be generated or collected.] These data, which will be submitted to [repository], fit within the scope of the [repository] Collection Development Policy. A letter of support describing [repository]'s commitment to the data as they have been described is provided.
Data description is important to include because it will help reviewers understand the characteristics of the data, their relationship to existing data, and any disclosure risks that may apply.
Recommended?: Highly recommended

Element: Existing data
Description: A survey of existing data relevant to the project and a discussion of whether and how these data will be integrated.
Generic Example 1: Data have been collected on this topic previously (for example: [add example(s)]). The data collected as part of this project reflect the current time period and historical context. It is possible that several of these datasets, including the data collected here, could be combined to better understand how processes have unfolded over time.
Recommended?: Recommended

Element: Format
Description: Formats in which the data will be generated, maintained, and made available, including a justification for the procedural and archival appropriateness of those formats.
Generic Example 1: Quantitative survey data files generated will be processed and submitted to the [repository] as [standard-format] files with DDI XML documentation. The data will be distributed in several widely used formats, including ASCII, tab-delimited (for use with Excel), and [standard processing software]. Documentation will be provided as PDF. Data will be stored as ASCII along with setup files for the statistical software packages. Documentation will be preserved using XML and PDF/A.
Generic Example 2: Digital video data files generated will be processed and submitted to the [repository] in MPEG-4 (.mp4) format.
Depositing data and documentation in formats preferred for archiving can make the processing and release of data faster and more efficient. Preservation formats should be platform-independent and non-proprietary to ensure that they will be usable in the future.
Recommended?: Highly recommended

Element: Metadata
Description: A description of the metadata to be provided along with the generated data, and a discussion of the metadata standards used.
Generic Example 1: Metadata will be tagged in XML using the Data Documentation Initiative (DDI) format. The codebook will contain information on study design, sampling methodology, fieldwork, variable-level detail, and all information necessary for a secondary analyst to use the data accurately and effectively.
Data Citation with Digital Object Identifier (DOI): A standard citation will be provided to facilitate attribution. The DOI provides permanent identification for data and ensures that they will always be found at the URL.
Good descriptive metadata are essential to effective data use. Metadata are often the only form of communication between the secondary analyst and the data producer, so they must be comprehensive and provide all of the needed information for accurate analysis. Structured or tagged metadata, like the XML format of the Data Documentation Initiative (DDI) standard, are optimal because the XML offers flexibility in display and is also preservation-ready and machine-actionable.
Recommended?: Highly recommended

Element: Data organization
Description: How the data will be managed during the project, with information about version control, naming conventions, etc.
Generic Example 1: Data will be stored in a CVS system and checked in and out for purposes of versioning. Variables will use a standardized naming convention consisting of a prefix, root, suffix system.
Recommended?: (none indicated)

Element: Quality assurance
Description: Procedures for ensuring data quality during the project.
Generic Example 1: Quality assurance measures will comply with the standards, guidelines, and procedures established by the [appropriate sub-discipline organization].
Generic Example 2: For quantitative data files, the [repository] ensures that missing data codes are defined, that actual data values fall within the range of expected values, and that the data are free from wild codes. Processed data files are reviewed by a supervisory staff member before release.
Producing data of high quality is essential to the advancement of science, and every effort should be made to be transparent with respect to data quality measures undertaken across the data life cycle.
Recommended?: (none indicated)

Element: Storage and backup
Description: Storage methods and backup procedures for the data, including the physical and cyber resources and facilities that will be used for the effective preservation and storage of the research data.
Generic Example 1: [Repository] will place a master copy of each digital file (i.e., research data files, documentation, and other related files) in Archival Storage, with several copies stored at designated locations and synchronized with the master through the Storage Resource Broker.
Digital data are fragile, and best practice for protecting them is to store multiple copies in multiple locations.
Recommended?: Highly recommended

Element: Security
Description: A description of technical and procedural protections for information, including confidential information, and how permissions, restrictions, and embargoes will be enforced.
Security for digital information is important over the data life cycle. Processed data may or may not contain disclosure risk and should be secured in keeping with the level of disclosure risk inherent in the data. Secure work and storage environments may include access restrictions (e.g., passwords), encryption, power supply backup, and virus and intruder protection.
Recommended?: Recommended

Element: Responsibility
Description: Names of the individuals responsible for data management in the research project.
Generic Example 1: The project will assign a qualified data manager certified in disclosure risk management to act as steward for the data while they are being collected, processed, and analyzed.
Typically data are owned by the institution awarded a Federal grant, and the principal investigator oversees the research data (collection and management of data) throughout the project period. It is important to describe any atypical circumstances. For example, if there is more than one principal investigator, the division of responsibilities for the data should be described.
Recommended?: Recommended

Element: Budget
Description: The costs of preparing data and documentation for archiving and how these costs will be paid. Requests for funding may be included.
Generic Example 1: Staff time has been allocated in the proposed budget to cover the costs of preparing data and documentation for archiving. The [repository] has estimated that its additional cost to archive the data is [insert dollar amount]. This fee appears in the budget for this application as well.
Recommended?: (none indicated)

Element: Intellectual property rights
Description: Entities or persons who will hold the intellectual property rights to the data, and how IP will be protected if necessary. Any copyright constraints (e.g., copyrighted data collection instruments) should be noted.
Recommended?: Highly recommended

Element: Legal requirements
Description: A listing of all relevant federal or funder requirements for data management and data sharing.
Generic Example 1: The proposed medical records research falls under the HIPAA Privacy Rule. Consequently, the investigators will provide documentation that an alteration or waiver of research participants' authorization for use/disclosure of information about them for research purposes has been approved by an IRB or a Privacy Board.
Some data have legal restrictions that impact data sharing--for example, data covered by HIPAA, proprietary data, and data collected through the use of copyrighted data collection instruments. How these issues might impact data sharing should be described fully in the data management plan.
Recommended?: Recommended

Element: Access and sharing
Description: A description of how data will be shared, including access procedures, embargo periods, technical mechanisms for dissemination, and whether access will be open or granted only to specific user groups. A timeframe for data sharing and publishing should also be provided.
Generic Example 1: The research data from this project will be deposited with [repository] to ensure that the research community has long-term access to the data.
Generic Example 2: The project team will create a dedicated Web site to manage and distribute the data because the audience for the data is small and has a tradition of interacting as a community. The site will be established using a content management system like Drupal or Joomla so that data users can participate in adding site content over time, making the site self-sustaining. The site will be available at a .org location. For preservation, we will supply periodic copies of the data to [repository]. That repository will be the ultimate home for the data.
Generic Example 3: The research data from this project will be deposited with [repository] to ensure that the research community has long-term access to the data. The data will be under embargo for one year while the investigators complete their analyses.
Generic Example 4: The research data from this project will be deposited with the institutional repository on the grantees’ campus.
Sharing data helps to advance science and to maximize the research investment. A recent paper [link: http://deepblue.lib.umich.edu/handle/2027.42/78307] reported that when data are shared through an archive, research productivity increases and many times more publications result than when data are not shared. With respect to timeliness of data deposit, archival experience has demonstrated that the durability of the data increases and the cost of processing and preservation decreases when data deposits are timely. It is important that data be deposited while the producers are still familiar with the dataset and able to transfer their knowledge fully to the archive.
Recommended?: Highly recommended

Element: Audience
Description: The potential secondary users of the data.
The audience for the data may influence how the data are managed and shared--for example, when audiences beyond the academic community may use the research data.
Recommended?: Recommended

Element: Selection and retention periods
Description: A description of how data will be selected for archiving, how long the data will be held, and plans for eventual transition or termination of the data collection in the future.
Generic Example 1: Our project will generate a large volume of data, some of which may not be appropriate for sharing since it involves a small sample that is not representative. The investigators will work with staff of the [repository] to determine what to archive and how long the deposited data should be retained.
Not all data need to be preserved in perpetuity, so thinking through the proper retention period for the data is important, in particular when there are reasons the data will not be preserved permanently.
Recommended?: Recommended

Element: Archiving and preservation
Description: The procedures in place or envisioned for long-term archiving and preservation of the data, including succession plans for the data should the expected archiving entity go out of existence.
Generic Example 1: By depositing data with [repository], our project will ensure that the research data are migrated to new formats, platforms, and storage media as required by good practice.
Generic Example 2: In addition to distributing the data from a project Web site, future long-term use of the data will be ensured by placing a copy of the data into [repository], ensuring that best practices in digital preservation will safeguard the files.
Digital data need to be actively managed over time to ensure that they will always be available and usable. This is important in order to preserve and protect our investment in science. Preservation of digital information is widely considered to require more constant and ongoing attention than preservation of other media. Depositing data resources with a trusted digital archive can ensure that they are curated and handled according to good practices in digital preservation.
Recommended?: Highly recommended

Element: Ethics and privacy
Description: A discussion of how informed consent will be handled and how privacy will be protected, including any exceptional arrangements that might be needed to protect participant confidentiality, and other ethical issues that may arise.
Generic Example 1: The following language will be used in the informed consent: The information in this study will only be used in ways that will not reveal who you are. You will not be identified in any publication from this study or in any data files shared with other researchers. Your participation in this study is confidential. Federal or state laws may require us to show information to university or government officials [or sponsors], who are responsible for monitoring the safety of this study.
Recommended?: Highly recommended
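
The quality assurance example in the table mentions three checks on quantitative data files: missing data codes are defined, values fall within the expected range, and the data are free from wild codes. The sketch below shows one way a project team might run such checks before deposit. It is illustrative only and is not ICPSR's procedure; the `VariableSpec` class, the `check_records` function, and the sample codebook and records are all hypothetical names and values invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class VariableSpec:
    """Hypothetical codebook entry for one variable (an assumption for illustration)."""
    valid_codes: Optional[Set[int]] = None              # allowed categorical codes
    valid_range: Optional[Tuple[float, float]] = None   # inclusive (low, high) bounds
    missing_codes: Set[int] = field(default_factory=set)  # declared missing-data codes


def check_records(records, codebook):
    """Return a list of (row_index, variable, value, problem) tuples."""
    problems = []
    for i, row in enumerate(records):
        for name, value in row.items():
            spec = codebook.get(name)
            if spec is None:
                problems.append((i, name, value, "variable not in codebook"))
                continue
            if value in spec.missing_codes:
                continue  # a declared missing-data code is acceptable
            if spec.valid_codes is not None and value not in spec.valid_codes:
                problems.append((i, name, value, "wild code"))
            elif spec.valid_range is not None:
                low, high = spec.valid_range
                if not (low <= value <= high):
                    problems.append((i, name, value, "out of range"))
    return problems


# Hypothetical codebook and data, invented for this example.
codebook = {
    "sex": VariableSpec(valid_codes={1, 2}, missing_codes={9}),
    "age": VariableSpec(valid_range=(18, 99), missing_codes={-1}),
}
records = [
    {"sex": 1, "age": 34},
    {"sex": 9, "age": -1},   # both values are declared missing codes
    {"sex": 3, "age": 130},  # a wild code and an out-of-range value
]

for problem in check_records(records, codebook):
    print(problem)
```

Keeping the expected codes and ranges in a codebook structure, rather than hard-coding them in the checks, means the same information can also feed the variable-level documentation that the Metadata element calls for.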