The Rolling Deck to Repository program (R2R; rvdata.us) provides shore-side data management for routine
underway environmental sensor data collected on US academic research vessels. This work is a collaboration with vessel operators
and chief scientists. The Research Vessel Data Management Roles and Responsibilities document gives a high-level description of
the operator, Chief Scientist, and R2R responsibilities, whereas this document describes in detail how operators should
provide their data to R2R. For guidance on the metadata that R2R expects from ship operators, please see Metadata Requirements.
- A description of all standard underway devices onboard the vessel, including make, model, location, the filenames of the data and where they sit in the directory structure, and a data format description. R2R should be notified of any changes in equipment or data location before a cruise data distribution is sent.
Operators are encouraged to use the standard directory structure for cruise data developed by the R2R program and the operator community.
- Cruise data distribution(s):
- Includes only vessel-operated underway data in the standard format as described to R2R for this operator/vessel
- Includes Navigation Data (at least one clearly-identified, full resolution GPS data set for creating a simple ship track on a location map)
- Includes an md5deep text file in the root directory (this is explained below).
- Excludes any science party data, Personally Identifiable Information (PII), recreational communications or imagery, anything that would constitute copyright infringement (music, movies, …), etc.
- Excludes data collected by dedicated facilities such as the National Marine Seismic Facility (NMSF), the National Deep Submergence Facility (NDSF), or the Ocean Bottom Seismic Instrument Center (OBSIC). Such data are already archived by other programs.
- Cruise metadata, including cruise ID, chief scientist, departure/arrival ports, dates, and funding source information. The best way to submit this information to R2R is by entering it into the Marine Facility Planning (MFP) system. If you do not use MFP, you are responsible for contacting R2R for further guidance. Do not change cruise identifiers (cruise codes such as RR1808) once data or metadata related to that cruise have been provided to R2R. R2R uses cruise identifiers to uniquely identify cruises throughout its workflow, and changing a cruise ID requires substantial effort.
- Submit Data Early
- R2R should receive new cruise data distributions (“distros”) on at least a quarterly basis.
- Timely submission of data expedites the data publication process and gives R2R more time to verify and validate network-transmitted data.
- Avoid Data Update Requests
- It is difficult to make changes to data once they have been processed at R2R, particularly once they have been archived at the NOAA National Centers for Environmental Information (NCEI).
- Please provide complete, finalized distros.
- If changes are needed, please communicate with R2R in advance of sending a new distro.
- Separate Oversize Filesets
- Filesets over 500 gigabytes, such as those commonly generated by the SBP-29 and EK-80, should be separated from the rest of their distros and sent to R2R as separate tarballs.
- The directory structure of the separate distro should match that of the original such that, if both were merged, the result would be identical to the original distro.
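As a rough illustration of the oversize check described above, the following Python sketch reports which fileset directories in a cruise directory exceed the size limit. It assumes that each device fileset is a direct subdirectory of the cruise directory and that 500 gigabytes means 500 × 10^9 bytes; both are assumptions for illustration, and the function names are hypothetical rather than part of any R2R tooling.

```python
from pathlib import Path

# Assumption: "500 gigabytes" is read as decimal gigabytes (10**9 bytes each).
OVERSIZE_BYTES = 500 * 10**9


def directory_size(path: Path) -> int:
    """Return the total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def find_oversize_filesets(cruise_dir: Path, limit: int = OVERSIZE_BYTES) -> list[tuple[str, int]]:
    """List (path, size) for each direct subdirectory of cruise_dir larger than limit.

    Paths are given relative to the parent of cruise_dir (for example
    "SP2411/ek80"), so a tarball built from that parent keeps the same
    directory structure as the original distro.
    """
    oversize = []
    for entry in sorted(cruise_dir.iterdir()):
        if entry.is_dir():
            size = directory_size(entry)
            if size > limit:
                relative = entry.relative_to(cruise_dir.parent).as_posix()
                oversize.append((relative, size))
    return oversize
```

Calling `find_oversize_filesets(Path("SP2411"))` from the directory that contains the cruise directory would return the filesets that are candidates for a separate tarball. Because the reported paths start at the cruise directory name, archiving them from the same parent directory keeps the layout mergeable with the rest of the distro.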
- Network Transfers (Globus)
- R2R is moving away from physical storage handoffs to network transfers.
- R2R's preferred network transfer method is Globus.
- Globus checks all transmitted data to verify that file integrity is preserved, and it secures the transfer.
- Pushing data to R2R's endpoint does not require a subscription.
- If you are pushing a particularly large distro, please let us know by emailing info@rvdata.us beforehand!
- R2R is happy to help you set up and configure Globus.
- Include Deep Checksum Files (.md5deep extension)
- md5deep files list the contents of a distro together with an MD5 checksum for each file. R2R uses these files to verify that its copy of the distro matches the operator's copy.
- These files are created "manually" by running the md5deep command line tool on the completed cruise distro (after the cruise has ended). To do this, the tool must first be installed. Follow the instructions below:
- First check whether the hashdeep command line tool is already installed. If it is, you should be able to call it from the command line with
md5deep -h. If your terminal doesn't recognize the command, follow the appropriate step below.
- If you're on Linux or Mac OS, install the hashdeep command line tool (the package that provides md5deep) using your system's package manager. For example:
dnf install hashdeep for RHEL, apt install hashdeep for Ubuntu, or the corresponding hashdeep install command of your Mac OS package manager.
- If you're on Windows, download the GitHub release binary (the one called "md5deep-4.4.zip") here. Unzip the downloaded archive and locate the correct ".exe" file:
md5deep64.exe. Add the directory containing this file to your system's PATH environment variable.
- Once hashdeep is installed, navigate into the directory containing the target cruise directory. For this example assume the cruise directory is named
SP2411.
- Extract/unzip the cruise directory if you haven't already; md5deep must be run on the extracted directory, not on an archive file.
- If you're on Mac OS/Linux, run this command:
md5deep -c -r -l -o f -t -z SP2411 > SP2411/SP2411.md5deep. This command was tested with md5deep version 4.4.
- If you're on Windows run:
md5deep64.exe -c -r -l -o f -t -z SP2411 > SP2411\SP2411.md5deep
- Wait for the command to complete. Once it does, you'll see the md5deep file inside your cruise directory.
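To make the idea behind the checksum file concrete, here is a minimal Python sketch that computes an MD5 digest for every file in a cruise directory and compares two such listings. It is not a replacement for the md5deep file that R2R asks for and does not reproduce the md5deep output format; the function names are hypothetical and the listing is a plain in-memory dictionary.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_listing(cruise_dir: Path) -> dict[str, str]:
    """Map each file path (relative to the parent of cruise_dir) to its MD5 digest."""
    base = cruise_dir.parent
    return {
        f.relative_to(base).as_posix(): md5_of_file(f)
        for f in sorted(cruise_dir.rglob("*"))
        if f.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths missing from one listing or whose digests differ between the two."""
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))
```

Comparing a listing taken on the ship with one taken after transfer, `changed_files(ship, shore)` returns an empty list when both copies hold the same files with the same contents, which is the kind of check the md5deep file enables.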
- Safe and Effective File Naming
- Include a date string in your raw data filenames as
YYYY-MM-DD. If a device produces multiple files a day, use the hour/minute/second/millisecond at the time of acquisition to further distinguish them as needed. Avoid Julian Day date/time strings, as they are hard to detect and parse. R2R uses these date/time strings to verify that each file belongs to the correct cruise package.
- Don't put special characters in filenames. Examples of problem file names:
- “Seapath_330_User's_Guide.pdf”: this shouldn't have an apostrophe.
- “Knudsen comparison with SBP39_top line of Oceanus Z_W-bound_same line, same time.jpg”: this shouldn't have commas or spaces.
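The naming guidance above can be checked automatically before a distro is sent. The sketch below is one possible approach, assuming that a "safe" filename uses only ASCII letters, digits, dots, underscores, and hyphens; that character set, the function names, and the example cruise dates are assumptions for illustration, not R2R rules.

```python
import re
from datetime import date

# Assumption: anything outside letters, digits, dot, underscore, hyphen is "special".
UNSAFE_CHAR = re.compile(r"[^A-Za-z0-9._-]")
# A YYYY-MM-DD string that is not part of a longer run of digits.
DATE_STRING = re.compile(r"(?<!\d)(\d{4})-(\d{2})-(\d{2})(?!\d)")


def special_characters(filename: str) -> list[str]:
    """Return the distinct special characters found in filename, sorted."""
    return sorted(set(UNSAFE_CHAR.findall(filename)))


def date_within_cruise(filename: str, start: date, end: date) -> bool:
    """True if filename holds a valid YYYY-MM-DD date between start and end, inclusive."""
    match = DATE_STRING.search(filename)
    if match is None:
        return False
    try:
        stamp = date(*(int(part) for part in match.groups()))
    except ValueError:  # e.g. month 13 or day 40
        return False
    return start <= stamp <= end
```

For the first problem filename listed above, `special_characters("Seapath_330_User's_Guide.pdf")` returns `["'"]`. With hypothetical cruise dates of 2024-06-01 to 2024-06-10, `date_within_cruise("gnss_2024-06-03.txt", date(2024, 6, 1), date(2024, 6, 10))` returns `True`.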
- Regularly Maintain Device Format Documentation
- Many data users need more information than the data files alone contain in order to make full use of a data set. Data format descriptions supply this information. In addition, these documents will help R2R support search-by-parameter features and automated checks on the data in the future.
- A data format description document should be included for each device type in the distro in a top-level “docs” directory. These descriptions should contain the name of the device, the names and descriptions of all fields in the data, the structure, units, etc. If you are using NMEA formats, please include this for each NMEA string. Use this example as a template:
- Keep format documents up to date through time.
- Data-Specific Asks
- Check for effectively empty files, i.e. files that are not zero-byte but are missing required data.
- Include gravity ties and magnetometer layback in every distro even if not collected during an active cruise. Store these files in a consistent directory with a consistent naming scheme.
- If gravimeter data are part of a data acquisition system, gravity ties should be stored in the acquisition system's docs directory. If gravimeter data are a standalone data set within the distro, gravity ties should be stored in the distro's top-level docs directory. Keep gravity tie filenames consistent. A good filename format is
gravtie_<date>.pdf. The format doesn't need to match this exactly, but it must include the date and must be consistent.
- Confirm that water column sonar data contain navigation.
- Lack of navigation in water column sonar data creates downstream issues for archiving at NCEI.
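The check for effectively empty files listed above depends on what counts as "required data" for each device, which this page does not define. As a hedged starting point for text-based logs only, the sketch below flags files that contain fewer than a chosen number of non-blank lines, for example a header with no data rows. The threshold, the `*.txt` pattern, and the function names are all assumptions, and the heuristic is not meaningful for binary formats.

```python
from pathlib import Path


def looks_effectively_empty(path: Path, min_data_lines: int = 2, sample_bytes: int = 65536) -> bool:
    """Heuristic: True if the start of the file holds fewer than min_data_lines non-blank lines."""
    with path.open("rb") as handle:
        sample = handle.read(sample_bytes)
    non_blank = [line for line in sample.splitlines() if line.strip()]
    return len(non_blank) < min_data_lines


def flag_effectively_empty(root: Path, pattern: str = "*.txt") -> list[str]:
    """Return paths (relative to root) of matching files that look effectively empty."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob(pattern)
        if p.is_file() and looks_effectively_empty(p)
    )
```

Running `flag_effectively_empty(Path("SP2411"))` gives a list of candidate files to inspect by hand; a flagged file is a prompt for review, not proof that data are missing.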
Globus is the preferred method for transferring cruise data to R2R. The R2R team would be happy to work with operators/technicians on getting Globus set up. If Globus is not feasible, please contact R2R to discuss an SFTP, SSH/rsync, or cloud download approach.
To transfer data via Globus:
- Get Globus Connect: Download Globus Connect Personal or check whether your institution has a site license.
- Create Account: Register here.
- Set Up Endpoint: Configure an endpoint for your cruise data directory.
- Share ID: Send your Globus ID to info@rvdata.us. We'll set up your landing spot.
- Test Transfer: Upload a small test file to R2R to verify access.
- Notify for Large Transfers: Please inform us before uploading large cruise data so we can confirm disk space.
- Help/Tutorials: link
Please Contact Us
Copyright © 2025 Rolling Deck to Repository (R2R). All Rights Reserved. | Contact Us
Hosted at the Lamont-Doherty Earth Observatory of Columbia University