UMGC Sequencing Data Storage Timeline (2011 - Present)
MSI has historically provided temporary storage for sequencing data generated from UMGC, subject to data retention policies. The retention policy that applies to the data depends on when it was sequenced.
-
All sequencing data added to a group's 'data_release' directory (e.g. /home/group_name/data_release) in 2017 or earlier is subject to the MSI five-year retention policy for 'data_release'.
-
Sequencing data added to ‘data_release’ from 2018 up to, but not including, October 1, 2021, is managed separately by the University of Minnesota Genomics Center (UMGC).
-
Sequencing data added on or after October 1, 2021, is managed by MSI’s Shared User Research Facilities Storage (SURFS) system. It is stored in a group’s ‘data_delivery’ directory (e.g. /home/group_name/data_delivery) and is subject to the MSI one-year retention policy for SURFS data. For details, see here: https://www.msi.umn.edu/surfs
WHEN DATA HAS EXPIRED, IT WILL BE DELETED FROM MSI SYSTEMS. FURTHER STORAGE IS THE RESPONSIBILITY OF THE RESEARCHER. Please see below for options and processes.
Below is a timeline showing the changes that have taken place with regard to the storage of UMGC-sequenced data on MSI's Tier 1 Storage.
UMGC Data Cycle
Quarterly Deletion of Expired and Expiring Data
Beginning in Quarter 4 of 2021, MSI is resuming the quarterly deletion of pre-2018 ‘data_release’ data, which was put on hold due to the COVID-19 pandemic. Going forward, MSI will also maintain a quarterly deletion process for data deposited on or after October 1, 2021, in a group’s ‘data_delivery’ directory.
This means that data will be deleted on a quarterly basis as it reaches the applicable retention limit. PIs and Group Administrators with expiring data will receive a notification at the beginning of the quarter listing the data files that will expire, and they will have approximately three months to transfer any data they wish to retain. Expired data will then be deleted at the end of the quarter. These policies apply both to pre-2018 ‘data_release’ data as it reaches its five-year retention limit and to data deposited in ‘data_delivery’ on or after October 1, 2021, as it reaches its one-year retention limit.
Quarter 1 (Q1) expiration = Notification will be sent in January; data will be deleted March 31
Quarter 2 (Q2) expiration = Notification will be sent in April; data will be deleted June 30
Quarter 3 (Q3) expiration = Notification will be sent in July; data will be deleted September 30
Quarter 4 (Q4) expiration = Notification will be sent in October; data will be deleted December 31
As of 2021, the bulk of the pre-2018 data* has expired or will expire within the next year. Note that when this deletion process commences, data that has already expired (i.e. data that was sequenced in Q3-2016 or earlier) will be deleted at the end of Q4-2021 (December 31, 2021), alongside the Q4-2016 data expiring on its normal schedule.
*Data that was sequenced and deposited into ‘data_release’ in Quarter 4 of 2017 or earlier, and is subject to the five-year retention policy.
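The quarterly schedule above can be sketched as a small shell calculation. This is only an illustration, not an MSI tool: the function names `deletion_date` and `notification_month` are hypothetical, and the sketch assumes that data expires in the same calendar quarter as its deposit date, shifted forward by the retention period (five years for pre-2018 ‘data_release’, one year for ‘data_delivery’). The notification emailed to your group remains the authoritative source for actual deletion dates.

```shell
# Sketch only: estimate the scheduled deletion date for a deposit.
# Assumption: expiry falls in the same calendar quarter as the deposit,
# shifted forward by the retention period.

# Usage: deletion_date YYYY-MM-DD RETENTION_YEARS
# The ( ... ) function body runs in a subshell, so variables stay local.
deletion_date() (
    year=${1%%-*}            # text before the first "-"
    rest=${1#*-}
    month=${rest%%-*}
    month=${month#0}         # strip a leading zero so "08" is not read as octal
    expiry_year=$((year + $2))
    case $(( (month - 1) / 3 + 1 )) in
        1) echo "${expiry_year}-03-31" ;;
        2) echo "${expiry_year}-06-30" ;;
        3) echo "${expiry_year}-09-30" ;;
        4) echo "${expiry_year}-12-31" ;;
    esac
)

# Usage: notification_month YYYY-MM-DD
# Prints the month in which the expiry notification is sent.
notification_month() (
    rest=${1#*-}
    month=${rest%%-*}
    month=${month#0}
    case $(( (month - 1) / 3 + 1 )) in
        1) echo "January" ;;
        2) echo "April" ;;
        3) echo "July" ;;
        4) echo "October" ;;
    esac
)

deletion_date 2016-11-15 5      # Q4-2016 'data_release' deposit: prints 2021-12-31
deletion_date 2021-10-04 1      # 'data_delivery' deposit: prints 2022-12-31
notification_month 2021-10-04   # prints October
```

The first call reproduces the case described above: data deposited in Q4-2016 under the five-year policy is deleted on December 31, 2021.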
Pre-2018 Data Transferred to Google Shared Drive
All pre-2018 data has been backed up to a Google Shared Drive, which has been shared with your group's PI and Group Administrators. To access this Shared Drive, sign in to your UMN Google Drive and select "Shared Drives" in the left-hand sidebar (see image below). In your list of Shared Drives, you should find the drive named msi-datarelease <group>, replacing <group> with your group name. This drive contains your pre-2018 data files.
An email providing access to this drive has also been sent. If you cannot find the drive, try searching your UMN inbox for an email with the following details:
-
From: msi-datarelease University of Minnesota (via Google Drive) <drive-shares-noreply@google.com>
-
Subject: You’ve been added to the shared drive msi-datarelease <group>
Archiving data before it expires and is deleted from MSI systems
If you are done analyzing the data and simply need to archive it, there are several storage options available, both at MSI and across the wider University. You can explore your options using OIT’s digital storage options chooser tool here:
https://it.umn.edu/services-technologies/comparisons/select-digital-storage-options
Another alternative is to submit raw sequencing data to NCBI’s Sequence Read Archive (SRA), and use that repository as the permanent, long-term archive of your data.
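Whichever archive destination you choose, it can help to bundle a project directory into a single file and record a checksum before moving it. The following is a minimal sketch using the standard `tar` and `sha256sum` tools; the function name `bundle_project` and the example paths are hypothetical, and nothing here is required by MSI or UMGC.

```shell
# Sketch only: bundle a project directory into one tar file and record
# a SHA-256 checksum beside it, so the archive can be verified later.
# No compression is applied, on the assumption that the sequencing
# files are already compressed (e.g. .fastq.gz).

# Usage: bundle_project SRC_DIR OUT_TAR
bundle_project() (
    src=$1
    out=$2
    # -C switches to the parent directory so the archive stores
    # relative paths (Project/...) rather than absolute ones.
    tar -cf "$out" -C "$(dirname "$src")" "$(basename "$src")" || exit 1
    # Write the checksum file next to the archive, using a relative
    # file name so it still verifies after both files are moved together.
    cd "$(dirname "$out")" || exit 1
    sha256sum "$(basename "$out")" > "$(basename "$out").sha256"
)

# Example (hypothetical paths):
#   bundle_project /home/group_name/shared/Project_Group_Name_Project_019 \
#                  /home/group_name/shared/Project_Group_Name_Project_019.tar
# Later, in the directory holding both files:
#   sha256sum -c Project_Group_Name_Project_019.tar.sha256
```

Note that the tar file temporarily needs as much space as the original directory, so this is best done where you have room for both copies.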
Downloading data to a local computer
For instructions on how to download your data to a local computer, please see UMGC’s instructions for:
-
Data stored in ‘data_release’ here: https://umgcdownload.msi.umn.edu/theme/datareleasehelp.html
-
Data stored in ‘data_delivery’ (SURFS, starting 10/1/21) here: https://umgcdownload.msi.umn.edu/theme/surfshelp.html
Retaining Access to your Data on MSI systems
If you wish to continue using MSI systems to analyze data that will be expiring, you will need to do one of the following:
-
Copy your data from ‘data_release’ or ‘data_delivery’ to your Tier 1 or Tier 2 storage prior to the deletion deadline.
-
For pre-2018 ‘data_release’ data only: copy your data from its Google shared drive location to Tier 1 storage.
There are a couple of options for transferring your data.
Copying data from data_release to Tier 1 Storage
This can easily be done from the command line. For example, to copy a pre-2018 directory called /home/group_name/data_release/umgc/hiseq/160227_SN1293_0411_BD1TE0BCXX/Project_Group_Name_Project_019 from data_release to the group’s shared directory, a member of the group would log into MSI and start an interactive job, then type:
cp -r /home/group_name/data_release/umgc/hiseq/160227_SN1293_0411_BD1TE0BCXX/Project_Group_Name_Project_019 /home/group_name/shared/Project_Group_Name_Project_019
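Because the source directory will eventually be deleted, you may want to confirm that the copy is complete before relying on it. The sketch below wraps the same `cp -r` command with a size report and a comparison step using the standard `du` and `diff` tools; the function name `copy_and_verify` is hypothetical and is not an MSI-provided command.

```shell
# Sketch only: copy a directory, then confirm the copy matches the source.

# Usage: copy_and_verify SRC_DIR DEST_DIR
copy_and_verify() (
    src=$1
    dest=$2
    # If DEST_DIR already exists, cp -r would nest the source inside it
    # instead of creating it, so stop rather than guess the intent.
    if [ -e "$dest" ]; then
        echo "refusing to copy: $dest already exists" >&2
        exit 1
    fi
    du -sh "$src"                   # report how much space the copy will need
    cp -r "$src" "$dest" || exit 1
    # diff -r walks both trees; a non-zero exit means a file is missing
    # or differs, and -q prints only the names of such files.
    diff -rq "$src" "$dest" && echo "copy verified: $dest"
)

# Example, using the directories from the command above:
#   copy_and_verify \
#     /home/group_name/data_release/umgc/hiseq/160227_SN1293_0411_BD1TE0BCXX/Project_Group_Name_Project_019 \
#     /home/group_name/shared/Project_Group_Name_Project_019
```

The comparison reads every file in both directories, so for large projects it is best run inside the interactive job rather than on a login node.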
If there isn’t space in your group’s Tier 1 storage, and
-
you need the data on Tier 1 so you can analyze it, and
-
you cannot delete or archive other data within your Tier 1 storage space to make room for it,
you can request a storage quota increase.
Transferring data from a temporary Tier 1 ‘data_release’ or ‘data_delivery’ directory to Tier 2 Storage
Data held in Google Drive cannot be moved directly to Tier 2: you must first transfer it to a directory on the MSI Tier 1 storage system. Once your data is on Tier 1, there are two options for transferring it to Tier 2:
Option 1: Globus
Instructions on how to use Globus (a user-friendly web-based interface) to copy data to Tier 2 can be found here: https://www.msi.umn.edu/support/faq/how-do-i-use-globus-transfer-data-se...
Option 2: Command Line
First, log into MSI and start an interactive job, then follow the instructions on how to copy data to Tier 2 via the command line here: https://www.msi.umn.edu/support/faq/how-do-i-use-second-tier-storage-com...
For pre-2018 data_release data ONLY: Transferring data from Google Drive to Tier 1 Storage
All data from pre-2018 ‘data_release’ was copied to Google shared drives, and the drives were then shared with the PI and administrators of the group. At any time (even after the data is deleted from ‘data_release’), you may transfer your data from Google Drive to your group's Tier 1 storage. You can find instructions on transferring data from Google Drive to Tier 1 here: https://www.msi.umn.edu/support/faq/how-do-i-transfer-data-google-drive-.... Note that even if you wish to transfer the data to Tier 2, you must first transfer it to Tier 1.
If you cannot access your Shared Drive, please contact help@msi.umn.edu.