UMGC Sequencing Data Storage at MSI

UMGC Sequencing Data Storage Timeline (2011 - Present)

MSI has historically provided temporary storage for sequencing data generated by UMGC, subject to data retention policies. Which retention policy applies to the data depends on when it was sequenced.

  • All sequencing data in a group's 'data_release' directory (e.g. /home/group_name/data_release) added during 2017 or prior is subject to the MSI five-year retention policy for 'data_release'.

  • Sequencing data added to ‘data_release’ from 2018 until October 1, 2021, is managed separately by the University of Minnesota Genomics Center (UMGC).

  • Sequencing data added on or after October 1, 2021, is managed by MSI’s Shared User Research Facilities Storage (SURFS) system. It is stored in a group’s ‘data_delivery’ directory (e.g. /home/group_name/data_delivery) and is subject to the MSI one-year retention policy for SURFS data. For details, see here: https://www.msi.umn.edu/surfs

WHEN DATA HAS EXPIRED, IT WILL BE DELETED FROM MSI SYSTEMS. FURTHER STORAGE IS THE RESPONSIBILITY OF THE RESEARCHER. Please see below for options and processes. 

Below is a timeline that shows the changes that have taken place with regards to the storage of UMGC-sequenced data on MSI's Tier 1 Storage.

UMGC Data Cycle

Quarterly Deletion of Expired and Expiring Data

Beginning in Quarter 4 of 2021, MSI is resuming the quarterly deletion of pre-2018 ‘data_release’ data, which was put on hold due to the COVID-19 pandemic. Moving forward, MSI will also maintain a quarterly deletion process for data deposited on or after October 1, 2021, in a group’s ‘data_delivery’ directory.

This means that data will be deleted on a quarterly basis as it reaches the applicable retention limit. PIs and Group Administrators with expiring data will receive a notification at the beginning of the quarter listing the data files that will expire, and they will have approximately three months to transfer their data if they wish to retain it. Expired data will then be deleted at the end of the quarter. These policies apply both to pre-2018 ‘data_release’ data as it reaches its five-year retention limit and to data deposited in ‘data_delivery’ on or after October 1, 2021, as it reaches its one-year retention limit.

  • Quarter 1 (Q1) expiration: notification will be sent in January; data will be deleted March 31.

  • Quarter 2 (Q2) expiration: notification will be sent in April; data will be deleted June 30.

  • Quarter 3 (Q3) expiration: notification will be sent in July; data will be deleted September 30.

  • Quarter 4 (Q4) expiration: notification will be sent in October; data will be deleted December 31.

As of 2021, the bulk of the pre-2018 data* has expired or will expire within the next year. Note that when this deletion process commences, data that has already expired (i.e. data that was sequenced in Q3-2016 or earlier) will be deleted at the end of Q4-2021 (December 31, 2021), alongside any normally expiring Q4-2016 data.

*Data that was sequenced and deposited into ‘data_release’ in Quarter 4 of 2017 or earlier and is subject to the 5-year retention policy.
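The retention arithmetic described above can be sketched in shell. This is an illustration only: deletion_date is a hypothetical helper, not an MSI tool, and it assumes that data expires in the same calendar quarter in which it was deposited, shifted forward by the retention period, which is consistent with the Q4-2016 example above. The notification you receive from MSI remains the authoritative source for your deadline.

```shell
# Hypothetical helper (not an MSI tool): estimate when data will be deleted.
# Arguments: deposit year, deposit month (1-12), retention period in years.
# Assumption: data expires in the quarter it was deposited, N years later,
# and is deleted on the last day of that quarter.
deletion_date() {
    local year="$1"
    local month="${2#0}"            # drop a leading zero, e.g. "08" -> "8"
    local retention_years="$3"
    local quarter=$(( (month - 1) / 3 + 1 ))
    local expiry_year=$(( year + retention_years ))
    local quarter_end
    case "$quarter" in
        1) quarter_end="March 31" ;;
        2) quarter_end="June 30" ;;
        3) quarter_end="September 30" ;;
        4) quarter_end="December 31" ;;
    esac
    echo "Q${quarter}-${expiry_year}: deleted ${quarter_end}, ${expiry_year}"
}

# 'data_release' data from November 2016, five-year retention:
deletion_date 2016 11 5    # prints "Q4-2021: deleted December 31, 2021"
# 'data_delivery' data from October 2021, one-year retention:
deletion_date 2021 10 1    # prints "Q4-2022: deleted December 31, 2022"
```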

Pre-2018 Data Transferred to Google Shared Drive

All pre-2018 data has been backed up to a Google Shared Drive, which has been shared with your group's PI and Group Administrators. To access it, sign into your UMN Google Drive and select "Shared Drives" in the left-hand sidebar (see image below). In your list of Shared Drives, you should find a drive named msi-datarelease <group>, where <group> is your group name. This drive contains your pre-2018 data files.

Image of google drive options (left hand bar once you've signed into drive.google.com) showing how to select the "shared drives" tab.

Alternatively, an email has also been sent providing access to this drive. You may wish to try searching your UMN inbox for an email:

  • From: msi-datarelease University of Minnesota (via Google Drive) <drive-shares-noreply@google.com>

  • Subject: You’ve been added to the shared drive msi-datarelease <group>

Archiving data before it expires and is deleted from MSI systems

If you are done analyzing the data and simply need to archive it, there are several storage options both at MSI and across the wider University. You can explore them using OIT’s digital storage options chooser tool:

https://it.umn.edu/services-technologies/comparisons/select-digital-storage-options

Another alternative is to submit raw sequencing data to NCBI’s Sequence Read Archive (SRA), and use that repository as the permanent, long-term archive of your data.

Downloading data to a local computer

For instructions on how to download your data to a local computer, please see UMGC’s instructions for:

Retaining Access to your Data on MSI systems

If you wish to continue using MSI systems to analyze data that will be expiring, you will need to do one of the following:

  1. Copy your data from ‘data_release’ or ‘data_delivery’ to your Tier 1 or Tier 2 storage prior to the deletion deadline.

  2. For pre-2018 ‘data_release’ data only: copy your data from its Google shared drive location to Tier 1 storage.

There are a couple of options for transferring your data.

Copying data from data_release to Tier 1 Storage

This can easily be done from the command line. For example, to copy a pre-2018 directory called /home/group_name/data_release/umgc/hiseq/160227_SN1293_0411_BD1TE0BCXX/Project_Group_Name_Project_019 from data_release to the group’s shared directory, a member of the group would log into MSI and start an interactive job, then type: 

cp -r /home/group_name/data_release/umgc/hiseq/160227_SN1293_0411_BD1TE0BCXX/Project_Group_Name_Project_019 /home/group_name/shared/Project_Group_Name_Project_019
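Because the source data is eventually deleted, it can be worth confirming that the copy is complete before the deadline. The following is a minimal sketch using only standard tools (cp and diff); copy_and_verify is an illustrative name, not an MSI-provided command.

```shell
# Sketch: copy a directory, then confirm the copy matches the source.
# copy_and_verify is a hypothetical helper name used for illustration.
copy_and_verify() {
    local src="$1"
    local dest="$2"
    # Refuse to copy into an existing path: cp -r would nest the source
    # inside it, and the comparison below would no longer line up.
    if [ -e "$dest" ]; then
        echo "Destination already exists: $dest" >&2
        return 1
    fi
    # -r copies recursively; -p preserves timestamps and permissions.
    cp -rp "$src" "$dest" || return 1
    # diff -rq walks both trees and reports only files that differ or are
    # missing; it exits non-zero if it finds any.
    if diff -rq "$src" "$dest"; then
        echo "Copy verified: $dest"
    else
        echo "Copy differs from source: $dest" >&2
        return 1
    fi
}

# Usage with the example paths above:
# copy_and_verify /home/group_name/data_release/umgc/hiseq/160227_SN1293_0411_BD1TE0BCXX/Project_Group_Name_Project_019 /home/group_name/shared/Project_Group_Name_Project_019
```

Comparing file contents reads every file twice, so for very large projects this check can take a while.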



If there isn’t space in your group’s Tier 1 storage, and

  1. you need the data on Tier 1 so you can analyze it, and

  2. you cannot delete or archive other data within your Tier 1 storage space to make room for it,

you can request a storage quota increase.
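Before requesting an increase, it may help to see what is currently taking up space in the group’s shared directory. The sketch below uses only the standard du and sort commands; largest_items is a hypothetical helper name, and the sizes it reports are on-disk usage rather than MSI's official quota accounting.

```shell
# Sketch: list the size of each item in a directory, largest first.
# largest_items is an illustrative name, not an MSI tool.
largest_items() {
    local dir="$1"
    # du -sk prints one total per argument in units of 1024 bytes;
    # sort -rn orders those totals numerically, largest first.
    du -sk "$dir"/* | sort -rn
}

# Usage (directory taken from the copy example above):
# largest_items /home/group_name/shared
```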



Transferring data from a temporary 'data_release' or ‘data_delivery’ directory on Tier 1 Storage to Tier 2 Storage

If your data is on Google Drive, you must first transfer it to a directory on the MSI Tier 1 storage system before you can move it to Tier 2. From Tier 1, there are two options for transferring your data to Tier 2:

Option 1: Globus

Instructions on how to use Globus (a user-friendly web-based interface) to copy data to Tier 2 can be found here: https://www.msi.umn.edu/support/faq/how-do-i-use-globus-transfer-data-se...

Option 2: Command Line

First, log into MSI and start an interactive job, then follow the instructions on how to copy data to Tier 2 via the command line here: https://www.msi.umn.edu/support/faq/how-do-i-use-second-tier-storage-com...
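As a rough illustration of what a command-line transfer can look like, the sketch below assumes that Tier 2 is reached through an S3-compatible client such as s3cmd and that a bucket already exists. The function name and the bucket name are hypothetical; the linked MSI instructions are the authoritative reference for the actual client, configuration, and bucket setup.

```shell
# Sketch, assuming an S3-compatible Tier 2 endpoint accessed with s3cmd.
# upload_to_tier2 and the bucket name are hypothetical.
upload_to_tier2() {
    local src="$1"       # directory on Tier 1, without a trailing slash
    local bucket="$2"    # name of an existing bucket
    # put --recursive uploads the directory tree; without a trailing slash
    # on the source, objects are stored under the directory's own name.
    s3cmd put --recursive "$src" "s3://$bucket/" || return 1
    # List the uploaded objects so the result can be checked by eye.
    s3cmd ls "s3://$bucket/$(basename "$src")/"
}

# Usage (bucket name is a placeholder):
# upload_to_tier2 /home/group_name/shared/Project_Group_Name_Project_019 my-group-archive
```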

For pre-2018 data_release data ONLY: Transferring data from Google Drive to Tier 1 Storage

All data from pre-2018 ‘data_release’ was copied to Google shared drives, and the drives were then shared with the PI and administrators of the group. At any time (even after the data is deleted from ‘data_release’), you may transfer your data from Google Drive to your group's Tier 1 storage. You can find instructions on transferring data from Google Drive to Tier 1 here: https://www.msi.umn.edu/support/faq/how-do-i-transfer-data-google-drive-.... Note that even if you wish to transfer the data to Tier 2, you must first transfer it to Tier 1.
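For orientation only, a download from a Shared Drive might look like the sketch below. It assumes that rclone is the transfer tool and that an rclone remote pointing at the msi-datarelease Shared Drive has already been configured; the remote name, the function name, and the destination path are all hypothetical, and the linked instructions take precedence.

```shell
# Sketch, assuming rclone with a pre-configured remote for the Shared Drive.
# download_from_drive and the remote name are hypothetical.
download_from_drive() {
    local remote="$1"    # name of the configured rclone remote
    local dest="$2"      # destination directory on Tier 1
    mkdir -p "$dest" || return 1
    # rclone copy transfers files that are missing or changed at the
    # destination; --progress shows transfer status while it runs.
    rclone copy "$remote:" "$dest" --progress || return 1
    # rclone check compares source and destination and reports mismatches.
    rclone check "$remote:" "$dest"
}

# Usage (remote name and destination are placeholders):
# download_from_drive msi_datarelease /home/group_name/shared/data_release_restore
```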

If you cannot access your Shared Drive, please contact help@msi.umn.edu.