Research Data Management Guide

A guide for managing data.

Documenting Your Data

Regardless of your discipline, your Data Management Plan should include descriptive documentation about your data. This information can be stored in a readme.txt file or embedded within your dataset. Some of the components that you'll want to document include:

  • GENERAL OVERVIEW: a general overview of your project, including who is in charge of it, dates, how the data was generated, how it was funded, etc.
  • CONTENT DESCRIPTION: a description of your data content, such as listing the variables, codes, and language used
  • TECHNICAL DESCRIPTION: a description of your file inventory, file formats, organization, and version stamps
  • ACCESS: a description of any intellectual property rights, licenses, restrictions on the use of the data, as well as where and how your data will be accessed by others
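As a minimal sketch of how these four components could be turned into a readme.txt template, the Python below writes one heading per component. The function names (`build_readme`, `write_readme`) and the prompt text under each heading are assumptions made for illustration, not a required format.

```python
from pathlib import Path

# Headings follow the four components listed above. The prompt text under
# each heading is illustrative and should be replaced with real details.
README_SECTIONS = {
    "GENERAL OVERVIEW": "Project lead, dates, how the data was generated, funding.",
    "CONTENT DESCRIPTION": "Variables, codes, and language used.",
    "TECHNICAL DESCRIPTION": "File inventory, file formats, organization, version stamps.",
    "ACCESS": "Rights, licenses, restrictions, and where and how to access the data.",
}


def build_readme(project_name: str) -> str:
    """Return readme text with one block per documentation component."""
    lines = [f"README for {project_name}", ""]
    for heading, prompt in README_SECTIONS.items():
        lines.append(heading)
        lines.append(f"  {prompt}")
        lines.append("")
    return "\n".join(lines)


def write_readme(folder: Path, project_name: str) -> Path:
    """Write readme.txt into the folder (created if missing) and return its path."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "readme.txt"
    target.write_text(build_readme(project_name), encoding="utf-8")
    return target
```

Calling `write_readme(Path("my_project"), "My Project")` would create the folder if needed and place a readme.txt skeleton inside it, ready to be filled in.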

Visit the DMPTool's Data Management General Guidance page for more information about the types of components you should be documenting.

Creating a Data Dictionary

Data dictionaries provide detailed information about the contents of your dataset. This may include the names of variables, types of formats, text descriptions, possible values of the variables, and explanations of any potential relationship between variables (such as calculations).
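One possible way to keep a data dictionary alongside a dataset is as a small CSV file. The sketch below is only an illustration: the variable names, allowed ranges, and column headings are hypothetical, and the helper `undocumented_columns` is an assumed convenience for spotting dataset columns that have no dictionary entry.

```python
import csv
import io

FIELDS = ["variable", "type", "units", "allowed_values", "description"]

# Hypothetical entries for illustration; replace with your own variables.
DATA_DICTIONARY = [
    {"variable": "participant_id", "type": "string", "units": "",
     "allowed_values": "P001-P999", "description": "Anonymized participant identifier"},
    {"variable": "height_cm", "type": "float", "units": "centimeters",
     "allowed_values": "50-250", "description": "Measured standing height"},
    {"variable": "weight_kg", "type": "float", "units": "kilograms",
     "allowed_values": "2-300", "description": "Measured body weight"},
    {"variable": "bmi", "type": "float", "units": "kg/m^2",
     "allowed_values": "", "description": "Derived: weight_kg / (height_cm / 100) ** 2"},
]


def dictionary_to_csv(entries) -> str:
    """Serialize data dictionary entries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(dataset_columns, entries):
    """Return dataset columns that have no entry in the data dictionary."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in dataset_columns if column not in documented]
```

The `bmi` entry shows how a relationship between variables (here, a calculation) can be recorded in the description field.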

 

When collecting data, make sure you are recording data in its rawest form (e.g., recording height and weight, not a calculated BMI). This ensures you are not locked into a specific calculation and allows you to conduct further calculations in the future. Also, be explicit about units of measurement and variables. If you do not document them, you may later discover that your data was recorded using varying methods of measurement. Non-uniform measurements may result in time lost to data cleanup, or even loss of data.


Table from NYU Health Sciences Library
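To illustrate the advice about raw values and explicit units, the sketch below stores height and weight with the unit in the variable name and derives BMI only when it is needed. The function names and the choice of centimeters and kilograms are assumptions for this example.

```python
CM_PER_INCH = 2.54  # exact by definition


def inches_to_cm(height_in: float) -> float:
    """Convert a height recorded in inches to centimeters."""
    return height_in * CM_PER_INCH


def bmi_from_raw(height_cm: float, weight_kg: float) -> float:
    """Compute BMI in kg/m^2 from raw height (cm) and weight (kg)."""
    if height_cm <= 0 or weight_kg <= 0:
        raise ValueError("height and weight must be positive")
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Raw measurements are kept; BMI is derived on demand, not stored in their place.
record = {"height_cm": 180.0, "weight_kg": 81.0}
derived_bmi = bmi_from_raw(record["height_cm"], record["weight_kg"])
```

Because the raw values are retained, a different formula or unit can be applied later without recollecting the data.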

File Naming Conventions

What makes sense to you today might not make sense to someone else (or to your future self). When naming your files and folders, be specific and consistent. File names may include information such as the file creator or project lead's name or initials, the project name, the date the file was created, the location, and/or the version number.

Other suggestions:

  • Don’t use special characters; instead, use hyphens or underscores -- just make sure your use is consistent
  • Try to keep file names as succinct and descriptive as possible
  • For dates, the ISO 8601 format is suggested (YYYYMMDD), which also sorts chronologically
  • If you will have multiple versions, pad your version numbers with leading zeros (e.g., v01, v02) so that they sort in order

Examples:

  • YYYYMMDD-ProjectName-Version:   20220103-RDM_Guide-v01
  • ProjectName-Creator-Version:   RDM_Guide-Jackson-v02
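The first pattern above could be generated programmatically so that every file follows it. This is a minimal sketch: the function name `build_file_name`, the two-digit default padding, and the rule allowing only letters, digits, and underscores in the project name are assumptions for illustration.

```python
import re
from datetime import date


def build_file_name(project: str, created: date, version: int, width: int = 2) -> str:
    """Build a name of the form YYYYMMDD-ProjectName-vNN."""
    # Reject special characters and spaces; underscores are allowed inside the name.
    if not re.fullmatch(r"[A-Za-z0-9_]+", project):
        raise ValueError("project name may contain only letters, digits, and underscores")
    if version < 0:
        raise ValueError("version must be non-negative")
    # %Y%m%d gives the compact ISO 8601 date; 0{width}d pads the version with zeros.
    return f"{created:%Y%m%d}-{project}-v{version:0{width}d}"


name = build_file_name("RDM_Guide", date(2022, 1, 3), 1)
# name == "20220103-RDM_Guide-v01"
```

Zero padding matters because names sort as text: without it, `v10` sorts before `v2`, while `v02` correctly sorts before `v10`.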

Organizing Your Data

Your data should be organized in a clear and consistent way, ensuring that anyone accessing your data in the future (including yourself!) will be able to find what they are looking for. This should include creating a standardized directory structure for storing your files and using standard file naming conventions. It is highly recommended that you create a readme file (such as readme.txt) for each folder you create, detailing the contents of the folder and any special information that someone browsing the directory would need to know.

Example of an organized file structure:
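As one hypothetical layout, the Python sketch below creates a small project tree and places a readme.txt in each folder. The folder names and descriptions are assumptions chosen for illustration, not a prescribed standard; adapt them to your own project.

```python
from pathlib import Path
from typing import List

# Hypothetical layout: folder names and descriptions are illustrative only.
PROJECT_LAYOUT = {
    "data/raw": "Original, unmodified data files.",
    "data/processed": "Cleaned or derived data produced from data/raw.",
    "docs": "Data dictionary, protocols, and other documentation.",
    "scripts": "Code used to process and analyze the data.",
    "results": "Figures, tables, and other outputs.",
}


def create_project(root: Path) -> List[Path]:
    """Create each folder under root with a readme.txt describing its contents."""
    created = []
    for relative, description in PROJECT_LAYOUT.items():
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "readme.txt"
        # Do not overwrite a readme that someone has already edited.
        if not readme.exists():
            readme.write_text(f"{relative}\n{description}\n", encoding="utf-8")
        created.append(folder)
    return created
```

Running `create_project(Path("my_project"))` a second time is safe: existing folders are reused and existing readme files are left untouched.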

In short, organize your data and document all of the information that would allow someone to reproduce your research.
