Advanced usage of LUMI-O¶
Introduction¶
This is not a comprehensive tutorial, but more of a list of examples of things that are possible when using LUMI-O. Please consult the manual pages of the tools for additional details.
The examples here assume that you have properly configured the tools to use LUMI-O; otherwise they will usually default to Amazon AWS S3. This is also the case for most other programs, so if you wish to use LUMI-O with other software, you usually have to find a configuration option or environment variable to set a non-default host name. The correct hostname to use for LUMI-O is https://lumidata.eu.
LUMI-O is an S3-compatible storage solution. However, this does not mean that the system is the same as the "Amazon S3 Cloud Storage". The interface for reading and writing data is exactly the same, but AWS has a number of additional features, like self-service provisioning of IAM users, lifecycle configuration and write-once-read-many functionality, which are not really part of "just" S3 storage.
It's worth keeping the above in mind, as many people use S3 and Amazon S3 interchangeably when writing guides or instructions.
Warning
Some advanced operations which are supported by AWS will complete successfully when run against LUMI-O, e.g. object locks, but will actually have no effect. Unless it is explicitly stated that a feature is provided by LUMI-O, assume that it will not work, and be extra thorough in verifying correct functionality.
Credentials & Configuration¶
Moving tool configuration files¶
In some cases it might be required to read credentials from some other location than the default locations under the home directory. This can be achieved using environment variables or command line flags.
| | rclone | s3cmd | aws |
|---|---|---|---|
| DEFAULT | ~/.config/rclone/rclone.conf | ~/.s3cfg | ~/.aws/credentials and ~/.aws/config |
| ENV | RCLONE_CONFIG | S3CMD_CONFIG | AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE |
| FLAG | --config FILE | -c FILE, --config=FILE | |
The aws cli additionally has the concept of profiles; you can specify which one to use with the --profile <name> flag or the AWS_PROFILE environment variable.
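As a sketch, a profile for LUMI-O could look like the following (the profile name lumi-465000001 and the key values are placeholders; profile-level endpoint_url requires a sufficiently new awscli/boto3, see the note further down):

# ~/.aws/credentials
[lumi-465000001]
aws_access_key_id = <access_key>
aws_secret_access_key = <secret_key>

# ~/.aws/config
[profile lumi-465000001]
endpoint_url = https://lumidata.eu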
Environment¶
Most programs will use the environment variables AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY
when trying to authenticate. So these can be set if one does not wish to save the credentials on disk.
The environment variables do not always take precedence over values set in configuration files; this is the case for s3cmd and rclone. This means that invalid credentials in a config file will lead to an access denied error even if there are valid credentials in the environment. The aws command will use the environment variables instead of ~/.aws/credentials if they are set. rclone will additionally require RCLONE_S3_ENV_AUTH=true in the environment or env_auth = true in the config file.
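A minimal sketch of authenticating purely from the environment (the key values are placeholders):

export AWS_ACCESS_KEY_ID=<access_key>
export AWS_SECRET_ACCESS_KEY=<secret_key>
# Required for rclone to read the credentials from the environment
export RCLONE_S3_ENV_AUTH=true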
Programmatic access¶
When use cases become sufficiently complex, one might want to interact with LUMI-O in a more programmatic fashion instead of using the command line tools. One such option is the AWS SDK for Python, boto3*.
The script
import boto3

# Create a session using the credentials and endpoint of the lumi-465000001 profile
session = boto3.session.Session(profile_name='lumi-465000001')
s3_client = session.client('s3')
buckets = s3_client.list_buckets()
would fetch the buckets of project 465000001 and return the information as a Python dictionary. For the full list of available functions, see the aws s3 client documentation.
If a default profile has been configured in ~/.aws/credentials, the client creation can be shortened to (a minimal sketch):
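import boto3

s3_client = boto3.client('s3')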
boto3 uses the same configuration files and respects the same environment variables as the aws
cli.
Note
You will need a sufficiently new version of boto3 (e.g. version 1.26, which is installed when using Python 3.6, is too old) for it to understand a default profile set in ~/.aws/credentials and the corresponding config file; otherwise the tool will always default to the AWS S3 endpoint and you will need to specify the profile/endpoint when constructing the client.
*If you prefer to work with some other language, there are also options for e.g. Java, Go and JavaScript.
Granular access management¶
Using the rclone config generated by lumio-conf, or using s3cmd put -P, you can easily make objects and buckets public or private. This section explains how to apply more granular rules than fully private/public content, e.g. to:
- Share data with another LUMI project.
- Restrict object access to specific IPs.
- Allow external modification of only specific objects.
Projects in LUMI-O are handled as "single user tenants/accounts", where the numerical project id (e.g. 465000001) corresponds to both the tenant/account name and the project name.
Consequently, all members of a LUMI-O project have exactly the same rights and permissions, unlike on the LUMI filesystem, where files have individual owners. Keep this in mind if you have critical data in LUMI-O, as any other member of your LUMI project could accidentally delete it.
Warning
Be very careful when configuring and updating access to buckets and objects.
It's possible to lock yourself out from your own data, or alternatively make
objects visible to the whole world. In the former case, data recovery might not be possible
and your data could be permanently lost.
ACLs vs Policies¶
There are two ways to manage access to data in LUMI-O:
- Policies
- Access control lists (ACLs)
While ACLs are simpler to configure, they are an older method for access control and offer much less granular control over permissions. We recommend primarily using policies.
Some other differences include:
- ACLs can only be used to allow more access, not to restrict access from the defaults.
- ACLs can be applied to buckets and objects, while policies can only be applied to buckets.
- You can, however, create bucket policies which only affect specific objects in the bucket.
- This also means that ACL changes have to be applied individually/recursively to all objects in a bucket, plus the bucket itself.
Configuring Policies¶
You can apply policies to a bucket, and list the existing policies on a bucket, using s3cmd or aws commands, as sketched below.
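A sketch of the relevant commands, assuming s3cmd's setpolicy/info and the aws s3api put-bucket-policy/get-bucket-policy subcommands:

# Apply a policy from a file
s3cmd setpolicy policy.json s3://<bucket_name>
aws s3api put-bucket-policy --bucket <bucket_name> --policy file://policy.json
# Show the existing policy
s3cmd info s3://<bucket_name>
aws s3api get-bucket-policy --bucket <bucket_name>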
The following example policy would allow the project 465000002 to:
- Download the object out.json from our bucket called fortheauthenticated
- List all objects in the fortheauthenticated bucket
- Create/modify (by overwriting) the upload.json object in the fortheauthenticated bucket
The critical part is the format of the Principal, which is arn:aws:iam::<proj_id>:user/<proj_id>.
The full policy:
policy.json
{
"Version": "2012-10-17",
"Statement": [
{
"Action": ["s3:GetObject"],
"Effect": "Allow",
"Resource": "arn:aws:s3:::fortheauthenticated/out.json",
"Principal": {
"AWS": [
"arn:aws:iam::465000002:user/465000002"
]
}
},
{
"Action": ["s3:ListBucket"],
"Effect": "Allow",
"Resource": "arn:aws:s3:::fortheauthenticated",
"Principal": {
"AWS": [
"arn:aws:iam::465000002:user/465000002"
]
}
},
{
"Action": ["s3:PutObject"],
"Effect": "Allow",
"Resource": "arn:aws:s3:::fortheauthenticated/upload.json",
"Principal": {
"AWS": [
"arn:aws:iam::465000002:user/465000002"
]
}
}
]
}
Another potentially useful policy is a restriction on incoming IPs:
{
"Statement": [
{
"Sid": "IPAllow",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::almostpublic/data*" ,
"Condition":
{
"IpAddress": {"aws:SourceIp": "193.167.209.166"}
}
}
]
}
This would allow anyone connecting from "lumi-uan04.csc.fi" (193.167.209.166) to upload objects with names starting with data to the bucket called almostpublic (but not to download or list them).
Warning
IP restrictions should never be the only measure used to protect your data, especially if there are multiple users on the system. Source IPs can also be spoofed.
For a full list of Actions and Resources, see the AWS documentation. Don't use an action which you do not understand.
To remove policies, you can use the commands sketched below.
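A sketch, assuming s3cmd's delpolicy and the aws s3api delete-bucket-policy subcommand:

s3cmd delpolicy s3://<bucket_name>
aws s3api delete-bucket-policy --bucket <bucket_name>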
Configuring Access control lists (ACLs)¶
You can apply ACLs to buckets or to individual objects.
Important
ACLs are not inherited, e.g. new objects created in a bucket with an ACL will not have any ACLs. By default, created objects are private (unless you have created a policy changing this and applied it to the bucket).
Using aws s3api:
aws s3api put-bucket-acl --acl public-read --bucket <bucket_name>
aws s3api put-object-acl --acl public-read --bucket <bucket_name> --key <object_name>
With s3cmd, the setacl command accepts a --recursive option for applying a change to all objects in a bucket.
Running only the bucket-level command would make the bucket, but not the objects in it, readable to the world, i.e. it is only possible to list the objects, not to download them. The inverse situation, where the bucket is not readable but the objects are, is similar to a UNIX directory with only execute permission and no read permission: files/objects can be retrieved from the directory/bucket, but it is not possible to list its contents. To remove the public access you would run:
aws s3api put-bucket-acl --acl private --bucket <bucket_name>
aws s3api put-object-acl --acl private --bucket <bucket_name> --key <object_name>
put-object-acl
has to be run separately for each object.
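The corresponding s3cmd commands are sketched below (a sketch, assuming the standard setacl flags; add --recursive to also apply the change to all objects in the bucket):

s3cmd setacl --acl-public s3://<bucket_name>
s3cmd setacl --acl-private s3://<bucket_name>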
Meanwhile, an ACL grant with s3cmd (sketched below) would grant read access to all objects in the <bucket_name> bucket for the <proj_id> project.
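A sketch, assuming s3cmd's --acl-grant=PERMISSION:GRANTEE syntax:

s3cmd setacl --acl-grant=read:'<proj_id>$<proj_id>' s3://<bucket_name> --recursive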
The single quotes are important, as otherwise the shell might interpret $<proj_id> as an (empty) variable.
The corresponding command for aws s3api
would be:
aws s3api put-bucket-acl --bucket <bucket_name> --grant-read id='<proj_id>$<proj_id>'
aws s3api put-object-acl --grant-read id='<proj_id>$<proj_id>' --bucket <bucket_name> --key <object_name>
The lumi-pub rclone remote configured by lumio-conf uses ACL settings to make created objects and buckets public, and the same goes for s3cmd put -P. So if you need to "unpublish" or "publish" some data, you can use the above commands.
Warning
Permissions granted with --acl-grant are not revoked automatically when running --acl-private; they have to be explicitly removed with --acl-revoke.
Important
After modifying ACLs, always verify that the intended effect was achieved, i.e. check that things which should be private are private, and that public objects and buckets are accessible without authentication. Public buckets/objects are available using the URL https://<proj_id>.lumidata.eu/<bucket>/<object>; use e.g. wget, curl or a browser to check the access permissions.
The aws cli has a larger selection of ACL settings than s3cmd. For example, the command sketched below can be used to grant read-only access to all authenticated users of LUMI-O, which is useful if data is semi-public but should, for some reason or another, only be available to people with LUMI access. Note that this grants read access only to the bucket itself, not to any of the objects.
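A sketch, assuming the standard S3 AuthenticatedUsers group URI:

aws s3api put-bucket-acl --bucket <bucket_name> --grant-read uri=http://acs.amazonaws.com/groups/global/AuthenticatedUsers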
To view existing ACLs you can use, with aws s3api:
aws s3api get-bucket-acl --bucket <bucket_name>
aws s3api get-object-acl --bucket <bucket_name> --key <object_name>
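or with s3cmd, whose info command also shows ACLs (a sketch):

s3cmd info s3://<bucket_name>
s3cmd info s3://<bucket_name>/<object_name>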
See the s3cmd documentation and aws s3api documentation for a full list of ACLs.
Sharing data with other projects¶
The authentication information used when interacting with LUMI-O partially defines the scope for buckets.
Public buckets/objects for a project are located under https://<proj_id>.lumidata.eu/<bucket>/<object>, but making a request to the same URL while authenticated will try to fetch <bucket> from your own project, not from proj_id. Instead, the format https://lumidata.eu/<proj_id>:<bucket>/<object> must be used.
For public objects, the two URLs above are equivalent. Note that the authorization header of any request is checked before any access rules are verified, so using invalid credentials will lead to an access denied error even for public objects.
Due to the format of the URL, there is currently no known way to use boto3 or the aws cli to interact with data which is specifically shared with your project.
s3cmd and rclone
To access buckets, and subsequently objects, not owned by the authenticated project, use the syntax sketched below, where 465000001 would be your own project (for which you have configured authentication) and <proj_id> is the numerical project id of the other project.
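A sketch of the syntax (the rclone remote name lumi-465000001 is an assumption here; use whichever authenticated remote lumio-conf generated for you, alongside the lumi-pub remote mentioned above):

s3cmd ls s3://<proj_id>:<bucket>
rclone ls lumi-465000001:/<proj_id>:<bucket>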
Curl
Don't use curl unless you have to. The main point here is that the project id owning the bucket has to be included with the bucket and object name both in the URL and when generating the signature.
# Object, bucket and the project that owns the bucket
object=README.md
bucket=BucketName
project=465000001
# The owning project id is part of the signed resource path
resource="/$project:$bucket/$object"
endPoint="https://lumidata.eu$resource"
contentType="text/plain"
dateValue=$(date -R)
stringToSign="GET\n\n${contentType}\n${dateValue}\n${resource}"
s3Key=$S3_ACCESS_KEY_ID
s3Secret=$S3_SECRET_ACCESS_KEY
# AWS v2 signature: base64-encoded HMAC-SHA1 over the string to sign
signature=$(echo -en "${stringToSign}" | openssl sha1 -hmac "${s3Secret}" -binary | base64)
curl -X GET -s -o out.tmp -w "%{http_code}" \
  -H "Host: lumidata.eu" \
  -H "Date: ${dateValue}" \
  -H "Content-Type: ${contentType}" \
  -H "Authorization: AWS ${s3Key}:${signature}" \
  "$endPoint"
Presigned URLs¶
Presigned URLs are URLs generated by the user which grant time-limited "public" access to an object. It's also possible to generate a URL which allows time-limited upload of a specific object (key) to a bucket.
Read-only presigned URLs¶
You can generate a presigned URL using e.g. s3cmd, as sketched below (signurl takes the expiry as a unix epoch timestamp):
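s3cmd signurl s3://<bucket>/<object> <unix_epoch>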
This generates an access link that is valid until the given unix epoch time. To get the required unix epoch time, you can use an online calculator (e.g. when you want to grant access until a specific date) or add the desired duration to the current time. Regardless of the set expiry time, presigned URLs will become invalid when the access key used for the signing expires.
It's also possible to use the aws command to presign, as sketched below (--expires-in takes the validity period in seconds):
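aws s3 presign s3://<bucket>/<object> --expires-in 604800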
Writable presigned URLs¶
There is no way to create presigned URLs for PUT operations using either s3cmd or aws. Below is a short example script using boto3 to generate a valid URL that can then be used to add an object called file.txt to the given bucket.
presign.py
import argparse

import boto3
from botocore.exceptions import ClientError


def generate_presigned_url(s3_client, client_method, method_parameters, expires_in):
    """Generate a presigned URL valid for expires_in seconds."""
    try:
        url = s3_client.generate_presigned_url(
            ClientMethod=client_method, Params=method_parameters, ExpiresIn=expires_in
        )
    except ClientError:
        print("Couldn't get a presigned URL")
        raise
    return url


def usage_demo():
    parser = argparse.ArgumentParser()
    parser.add_argument("bucket", help="The name of the bucket.")
    parser.add_argument("key", help="The key (name) of the object to upload.")
    args = parser.parse_args()
    # Assumes a default profile pointing at LUMI-O has been configured
    s3_client = boto3.client("s3")
    client_action = "put_object"
    # The generated URL is valid for 1000 seconds
    url = generate_presigned_url(
        s3_client, client_action, {"Bucket": args.bucket, "Key": args.key}, 1000
    )
    print(f"Generated put_object url: {url}")


if __name__ == "__main__":
    usage_demo()
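The printed URL can then be used to upload the object, e.g. with curl (a sketch; file.txt is the local file to upload):

curl -X PUT --upload-file file.txt '<generated_url>'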