How do I use S3 buckets to share data on Tier 2 storage with other users?

The answer depends on how you access MSI's Tier 2 storage. If you primarily use it through the Globus interface, we have separate directions for sharing there. This page covers how to share data using S3 command-line tools such as s3cmd.

The best way we have found to share data between users via the S3 interface is using bucket policies.

Unlike our primary storage, S3 has no concept of a "group"; data on Tier 2 is owned directly by the user who created the bucket, with no relationship to the project or its PI. For sharing data within a research group, we suggest that the project PI (or an assigned data manager) own the bucket and share it with the other members using a bucket policy.

This involves writing a policy file using a text editor, and then applying it to your bucket with s3cmd.

A simple policy file would look something like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": [ "arn:aws:iam:::user/uid=54321", "arn:aws:iam:::user/uid=76543" ]},
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": ["arn:aws:s3:::mybucket/*", "arn:aws:s3:::mybucket"]
    }
  ]
}

Some notes about the file contents:

  • The "Version" line reflects a protocol revision; it is not a date that you can change.
  • The "Principal" line contains a comma-separated list of the users who should have access. It is meant to be a single line, though it may appear wrapped on this webpage.
    • For most people, the username internal to Tier 2 is not the same as the Internet ID, but you can look it up using the "s3info" command. For example:
      % s3info info -u username
      Sharing address:
      Tier 2 username: uid=54321
    • "s3info" returns both the S3 keys and the Tier 2 username for your own account; for other users it returns only their Tier 2 username.
  • The "Action" line contains a list of the S3 operations that are permitted. There are many options. In this simple example we permit read-only access to the bucket (both listing the bucket contents and retrieving individual objects).
  • Finally, the "Resource" line defines which paths the policy applies to. We list the bucket itself ("arn:aws:s3:::mybucket") and its contents ("arn:aws:s3:::mybucket/*") separately. You can apply policies to specific sub-paths as well.
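
If you need the same read-only policy for several buckets or a changing list of users, the file can also be generated rather than typed by hand. The following is a minimal Python sketch, not an MSI-provided tool: the function names (principal_arn, build_read_policy, write_policy) are hypothetical, and it assumes the principal ARN format and the three actions shown in the example policy.

```python
import json


def principal_arn(tier2_username):
    """Turn a Tier 2 username such as 'uid=54321' into a principal ARN.

    Assumes the ARN format used in the example policy.
    """
    return f"arn:aws:iam:::user/{tier2_username}"


def build_read_policy(bucket, tier2_usernames):
    """Return a read-only bucket policy as a dict, mirroring the example."""
    return {
        "Version": "2012-10-17",  # protocol revision, not an editable date
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": [principal_arn(u) for u in tier2_usernames]},
                "Action": ["s3:ListBucket", "s3:GetObject", "s3:GetObjectVersion"],
                # The bucket contents and the bucket itself are listed separately.
                "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
            }
        ],
    }


def write_policy(path, policy):
    """Write the policy dict to a file as indented JSON."""
    with open(path, "w") as handle:
        json.dump(policy, handle, indent=2)
        handle.write("\n")
```

Calling write_policy("s3policy-mybucket", build_read_policy("mybucket", ["uid=54321", "uid=76543"])) would produce a file equivalent to the example, with the Tier 2 usernames taken from "s3info".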

With the policy file ready, let's assume it is called "s3policy-mybucket". You can apply it to your bucket with s3cmd like this:

s3cmd setpolicy s3policy-mybucket s3://mybucket
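
A policy file with a stray comma or a missing key is an easy mistake to make in a text editor, so it can help to check the file before applying it. The sketch below is a loose local sanity check only, assuming the file follows the same layout as the example (a "Statement" list whose entries each have "Effect", "Principal", "Action" and "Resource"); the function name check_policy_text is hypothetical, and the check is not the validation the storage service itself performs.

```python
import json

# Keys that every statement in the example policy contains.
REQUIRED_KEYS = ("Effect", "Principal", "Action", "Resource")


def check_policy_text(text):
    """Return a list of problems found in a policy document (empty if none)."""
    try:
        policy = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(policy, dict):
        return ["top level must be a JSON object"]

    problems = []
    if "Version" not in policy:
        problems.append('missing "Version"')

    statements = policy.get("Statement")
    if not isinstance(statements, list) or not statements:
        return problems + ['"Statement" must be a non-empty list']

    for index, statement in enumerate(statements):
        if not isinstance(statement, dict):
            problems.append(f"statement {index}: must be a JSON object")
            continue
        for key in REQUIRED_KEYS:
            if key not in statement:
                problems.append(f'statement {index}: missing "{key}"')
    return problems
```

For instance, passing the text of "s3policy-mybucket" (read with open("s3policy-mybucket").read()) should return an empty list when the file matches the layout of the example.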

Once the policy is applied, all of the listed users should be able to list the bucket and read its objects. This example policy is read-only and does not grant write access.

There is one important caveat here:

Only the original bucket owner will see the bucket in a list of buckets (the output of "s3cmd ls s3://"). Other users can list it only by naming it explicitly, for example "s3cmd ls s3://mybucket", so the name of the bucket must be shared with them by other means.

In the longer term, MSI hopes to provide wrapper scripts that make it easier to create simple policies like the one above.

 
