Data sharing from Comet @SDSC using Globus CLI
Consider a scenario where data on the Comet cluster at SDSC needs to be shared with an arbitrary end user who does not have an account on Comet. Such sharing can be done manually through the Globus web app, but this article describes a scriptable method, so that sharing can be incorporated into existing computation workflows or Science Gateways. For data sharing there are three scenarios:
- The end user does not have a Globus account. (In this case we need to provision a new Globus account for this user using their email address. Upon account provisioning, Globus sends an account invitation to that email address with instructions to follow.)
- The end user has an existing Globus account, and the email address we have for them is linked to their Globus identity. (No extra steps are needed in this case.)
- The end user has a Globus account, but the email address we have is not linked to their Globus identity. (In this case we can provision a Globus account for the email address that we have, then email them a special URL which, on opening, prompts the user to associate this new identity with their original Globus account, if any. This only needs to be done once for each email address that a user wishes to associate with their original Globus identity.)
We could handle each case separately, but in practice we can simply use the solution for the last case, which automatically reduces to the other two.
Note: These instructions are ONLY for sharing data from the Comet cluster at SDSC with other end users. If you only wish to access your own data on Comet via Globus, you don't need anything special and this method does not apply.
Table of contents
- Globus CLI setup on Comet cluster at SDSC
- Create and activate a new shared endpoint on Comet
- Data sharing
- Data download instruction for end user
An access control list (ACL) provides a way to manage read/write privileges on an endpoint, or on a folder inside an endpoint, in a granular manner.
A command line interface (CLI), also known as a console user interface, is a means of interacting with a program through predefined text commands (command lines) with appropriate arguments.
An "endpoint" is one of the two file transfer locations (either the source or the destination) between which files can move. Once a resource (server, cluster, storage system, laptop, or other system) is defined as an endpoint, it will be available to authorized users who can transfer files to or from this endpoint. Globus endpoints are named using the following format:
username#endpoint-name. For example, the XSEDE project has a Globus account under the username "xsede", so its endpoints are named like xsede#comet (for the Comet system at the San Diego Supercomputer Center). Likewise, an individual who has a Globus account under the username "maxim" might have a personal endpoint called maxim#mylaptop.
Globus Connect is easy-to-install, pre-configured software that turns your laptop, server, cluster or other local resource into a Globus endpoint. There are two versions of Globus Connect: one for use with personal machines such as your laptop, and another for use with server-class machines such as campus computing clusters and lab servers. Use Globus Connect Personal to enable file transfer to and from your personal machine (laptop or desktop). A Globus Connect Personal endpoint is intended to be used only by a single user. See more information on this in the data download section below.
A uniform resource locator (URL) is a reference to a web resource that specifies its location on a network and a mechanism for retrieving it.
Globus CLI setup on Comet cluster at SDSC
- Comet cluster @ SDSC
- Python 2.7.x on Comet
- Globus CLI v1.2.0
Sign in to your Globus account
(You may create a new account using your XSEDE or institutional credentials.)
Install Globus CLI
Follow the instructions below, or use the more elaborate virtualenv-based installation described here.
module load python
pip install globus_cli
# or, to install for the current user only:
pip install globus_cli --user
export PATH=~/.local/bin:$PATH   # add this to your path
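After installing, it can help to confirm that the `globus` executable is actually reachable before running the later steps. The following is a minimal sketch; `require_cmd` is a hypothetical helper name, not part of the Globus CLI, and the hint it prints assumes the per-user install location shown above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named command is on PATH,
# otherwise print a hint to stderr and fail.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH; try: export PATH=~/.local/bin:\$PATH" >&2
    return 1
}

# Example use (commented out so that loading this file has no side effects):
# require_cmd globus || exit 1
```

Calling `require_cmd globus` at the top of a workflow script makes a missing or unconfigured installation fail early with a readable message.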
Globus CLI authorization
We need to bind our command line interface (CLI) to our Globus account. This requires a one-time setup that authorizes the Globus CLI client on the Comet cluster to use our identity. It can be accomplished as follows:
- Log in to your Globus account in a web browser
- Issue the ‘globus login’ command on the Comet cluster, which will respond with a long URL and a prompt to enter a code. This code needs to be fetched from a web browser, so copy and paste the URL into a web browser, follow the directions, then paste the code you receive into the command line prompt. Essentially this step binds the CLI to your account. This step is needed only once, as long as at least one command is issued by the CLI every six months. See discussion about this here
Globus CLI lease renewal
We may execute any command to renew the CLI lease. For example, we could add a crontab entry that lists all our shared endpoints on Comet every month:
globus endpoint my-shared-endpoint-list de463f97-6d04-11e5-ba46-22000b92c6ec
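A monthly crontab entry for this renewal could be generated as sketched below. The helper name `lease_cron_line`, the schedule (03:00 on the first day of each month) and the `$HOME/.local/bin/globus` path are assumptions; the path corresponds to a per-user pip install and should be adjusted to wherever `globus` lives on your system.

```shell
#!/bin/bash
# Hypothetical helper: print a crontab line that runs the listing command
# once a month. cron uses a minimal PATH, so the full path to the globus
# executable is written out (assumed here to be ~/.local/bin).
lease_cron_line() {
    local endpoint_id="$1"
    printf '0 3 1 * * %s\n' \
        "\$HOME/.local/bin/globus endpoint my-shared-endpoint-list ${endpoint_id} >/dev/null 2>&1"
}

# Example: append the line to the current user's crontab
# (commented out so that loading this file changes nothing):
# ( crontab -l 2>/dev/null; lease_cron_line de463f97-6d04-11e5-ba46-22000b92c6ec ) | crontab -
```

Whether the environment cron provides on Comet is sufficient to run the CLI (for example, whether `module load python` is also needed) is not covered here and should be checked.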
Create and activate a new shared endpoint on Comet
Share location requirement on Comet
Data sharing from Comet can only be performed from a specific location (due to security policies in place) i.e. /oasis/projects/nsf/YOUR_PROJECT_NAME/YOUR_USERNAME/shared
You may create sub folders at this location and share them with others as needed.
Note: If you deviate from this root path, sharing won't work on Comet.
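Because sharing silently depends on this root path, a script can check a candidate folder before trying to share it. The sketch below is one way to do that; `is_under_share_root` is a hypothetical helper, and it compares path strings only (it does not resolve symlinks or `..` components).

```shell
#!/bin/bash
# Hypothetical helper: succeed only if PATH_TO_CHECK is the share root
# /oasis/projects/nsf/<project>/<user>/shared or lies somewhere below it.
# Usage: is_under_share_root PROJECT USERNAME PATH_TO_CHECK
is_under_share_root() {
    local project="$1" user="$2" path="$3"
    local root="/oasis/projects/nsf/${project}/${user}/shared"
    case "$path" in
        "$root"|"$root"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (commented out):
# is_under_share_root sds165 amit "$(pwd)" || echo "not a shareable location" >&2
```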
Find Globus end point for XSEDE Comet cluster
Identify the endpoint for the Comet cluster using the following command
globus endpoint search "XSEDE Comet"
#Sample output
#ID: de463f97-6d04-11e5-ba46-22000b92c6ec
#Owner: firstname.lastname@example.org
#Display Name: XSEDE Comet
Note: Make note of Comet's endpoint ID de463f97-6d04-11e5-ba46-22000b92c6ec
This will be used subsequently to create our own new endpoint.
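To avoid copying the ID by hand, the value can be pulled out of the command's text output. The sketch below assumes the record-style `Name: value` layout shown in the sample output above; `extract_field` is a hypothetical helper. The JSON output (`-F json`), which the summary script in this article uses, is likely the more robust choice if the text layout differs in your CLI version.

```shell
#!/bin/bash
# Hypothetical helper: read "Name: value" lines on stdin and print the value
# of the first line whose name equals FIELD (default: ID).
# Usage: some_command | extract_field [FIELD]
extract_field() {
    local field="${1:-ID}"
    awk -F': *' -v key="$field" '$1 == key { print $2; exit }'
}

# Example (commented out; requires an authorized Globus CLI):
# comet_endpoint=$(globus endpoint search "XSEDE Comet" | extract_field ID)
```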
Activate Comet's endpoint for our use
Before we create our own endpoint, we need to activate Comet's endpoint to link it with our Globus account. There are different ways to activate an endpoint, as documented here. The easiest is to issue the following command, which uses the web method: simply paste the URL it prints into a web browser and follow the instructions there.
Note: This endpoint activation lasts for 11 days, and it's not clear how to renew the activation automatically. This means that our sharing will stop working after 11 days unless another activation/renewal is performed manually.
globus endpoint activate --no-browser --web de463f97-6d04-11e5-ba46-22000b92c6ec
#Sample output
#Web activation URL: https://www.globus.org/app/endpoints/656d277c-56d6-11e7-befe-22000b9a448b/activate
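Since the activation has to be renewed by hand, it may be useful to record when it was performed and compute how many days remain. The following is a small sketch under that assumption; the helper names are hypothetical, times are Unix epoch seconds, and the 11-day lifetime is simply the figure reported in this article.

```shell
#!/bin/bash
# Hypothetical helpers for tracking the activation window.
ACTIVATION_DAYS=11   # lifetime reported in this article; may differ elsewhere

# Print the epoch time at which an activation performed at $1 expires.
activation_expiry_epoch() {
    local activated_at="$1"
    echo $(( activated_at + ACTIVATION_DAYS * 86400 ))
}

# Print the number of whole days between now ($2) and the expiry time ($1);
# the result is zero or negative once the activation has expired.
activation_days_left() {
    local expiry="$1" now="$2"
    echo $(( (expiry - now) / 86400 ))
}

# Example (commented out):
# expiry=$(activation_expiry_epoch "$(date +%s)")
# echo "days left: $(activation_days_left "$expiry" "$(date +%s)")"
```

A cron job could compare the stored expiry time against the current time and send a reminder a day or two before the share stops working.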
Create a new endpoint
Now we can create a new endpoint for sharing at the path /oasis/projects/nsf/sds165/amit/shared/demoshare with the following instructions:
cd /oasis/projects/nsf/sds165/amit/shared/
mkdir demoshare
cd demoshare
pwd
#Create end point
globus endpoint create \
  --shared de463f97-6d04-11e5-ba46-22000b92c6ec:/oasis/projects/nsf/sds165/amit/shared/demoshare 'Demo endpoint for sharing on Comet' \
  --description 'Example of an endpoint for sharing purpose on Comet'
#Sample output
#Message: Shared endpoint created successfully
#Endpoint ID: 656d277c-56d6-11e7-befe-22000b9a448b
Make a note of this newly created endpoint ID: 656d277c-56d6-11e7-befe-22000b9a448b
This new endpoint will be used later to share sub folders inside this location with arbitrary users via their email address.
Create content to share
We can share this endpoint in its entirety, or sub folders inside it. Here we discuss sharing sub folders with arbitrary users as desired. So let's create a folder with some content inside it, as follows:
mkdir /oasis/projects/nsf/sds165/amit/shared/demoshare/foo
touch /oasis/projects/nsf/sds165/amit/shared/demoshare/foo/123.txt
touch /oasis/projects/nsf/sds165/amit/shared/demoshare/foo/abc.txt
Share a folder privately with an arbitrary user
Share the foo folder with a user at email@example.com with read access
# use --identity instead of --provision-identity if user already has a globus account
globus endpoint permission create \
  --permissions r "656d277c-56d6-11e7-befe-22000b9a448b:/foo/" \
  --provision-identity firstname.lastname@example.org
#Sample output
#Message: Access rule created successfully.
#Rule ID: f6b52a08-56d7-11e7-befe-22000b9a448b
Share a folder publicly with anyone
Share the bar folder (assumed to already exist inside the endpoint) publicly, giving anyone read access. You should not grant write privileges to anonymous users, as that has security implications.
# --anonymous replaces --identity/--provision-identity; no email address is needed
globus endpoint permission create --anonymous \
  --permissions r "656d277c-56d6-11e7-befe-22000b9a448b:/bar/"
#Sample output
#Message: Access rule created successfully.
#Rule ID: 95370758-94ba-11e7-aae0-22000a92523b
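When permission creation is scripted, the advice above about anonymous write access can be enforced before any rule is created. The following is a minimal sketch of such a guard; `check_share_request` and its principal labels (`anonymous`, `identity`) are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Hypothetical guard: validate a requested share before creating the rule.
# Usage: check_share_request PRINCIPAL_TYPE PERMISSIONS
#   PRINCIPAL_TYPE: "anonymous" or anything else (e.g. "identity")
#   PERMISSIONS:    "r" or "rw"
# Returns 0 if acceptable, 1 for anonymous write, 2 for an unknown permission.
check_share_request() {
    local principal="$1" perms="$2"
    case "$perms" in
        r|rw) ;;
        *) echo "error: permissions must be 'r' or 'rw'" >&2; return 2 ;;
    esac
    if [ "$principal" = "anonymous" ] && [ "$perms" != "r" ]; then
        echo "error: refusing to grant write access to anonymous users" >&2
        return 1
    fi
    return 0
}

# Example (commented out):
# check_share_request anonymous r && globus endpoint permission create --anonymous \
#     --permissions r "656d277c-56d6-11e7-befe-22000b9a448b:/bar/"
```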
Create a share URL
We would like to send the sharee the URL for the shared content; however, the Globus CLI does not generate it. Nonetheless, we can craft it as follows (see discussion here). We first need to find the Globus account identifier for the sharee, which can be done by issuing the command "globus get-identities email@example.com"
The following components need to be concatenated to create the share URL:
Base URL: https://www.globus.org/app/transfer?
Query string parameters (joined with &)
- origin_id=YOUR ENDPOINT ID, i.e. 656d277c-56d6-11e7-befe-22000b9a448b
- origin_path=RELATIVE PATH FROM SHARED ENDPOINT, i.e. /foo/ as %2Ffoo%2F. Note: / (slash) must be percent-encoded as %2F
- add_identity=UID OF SHAREE, i.e. the result of globus get-identities firstname.lastname@example.org. Note: This is not needed for folders shared publicly
For example: https://www.globus.org/app/transfer?origin_id=656d277c-56d6-11e7-befe-22000b9a448b&origin_path=%2Ffoo%2F&add_identity=UID_OF_SHAREE
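The concatenation described above can be wrapped in a small function. This is a sketch, not an official Globus facility: `encode_slashes` and `build_share_url` are hypothetical helpers, and only the slash is encoded, as in the description above. Paths containing other reserved characters (spaces, `&`, `?` and so on) would need full percent-encoding.

```shell
#!/bin/bash
# Hypothetical helper: replace every "/" in the argument with %2F.
encode_slashes() {
    printf '%s' "${1//\//%2F}"
}

# Hypothetical helper: build the share URL from its parts.
# Usage: build_share_url ENDPOINT_ID RELATIVE_PATH [SHAREE_IDENTITY_ID]
# Omit the third argument for folders shared publicly.
build_share_url() {
    local endpoint_id="$1" rel_path="$2" identity_id="${3:-}"
    local url="https://www.globus.org/app/transfer?origin_id=${endpoint_id}"
    url="${url}&origin_path=$(encode_slashes "$rel_path")"
    if [ -n "$identity_id" ]; then
        url="${url}&add_identity=${identity_id}"
    fi
    printf '%s\n' "$url"
}

# Example (commented out; requires an authorized Globus CLI):
# uid=$(globus get-identities email@example.com)
# build_share_url 656d277c-56d6-11e7-befe-22000b9a448b /foo/ "$uid"
```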
Data download instruction for end user
Once the sharing setup is complete and we have the share URL, we can send it to the end user (sharee) by email. The end user needs to follow these steps to download the shared data:
- Login to Globus using their Globus account
- Visit the share URL to access the data
- Since Globus currently does not support downloading the data directly from a web browser, create a local endpoint by downloading and installing Globus Connect Personal.
- Browse your local endpoint in one of the listing panes (say the left one), while the other listing pane (say the right one) shows the shared files on the Comet cluster. Select any file/folder in the Comet pane, then click the corresponding arrow to download it to your local endpoint.
- The endpoint activation lasts for 11 days, and it's not clear how to renew the activation automatically. This means that our sharing will stop working after 11 days unless another activation/renewal is performed manually.
- Globus also imposes various limits on the number of endpoints, ACLs, etc. These limits can be found here; the key ones that may affect us are:
  - 1,000 endpoints owned by a single user. This total includes both host endpoints and shared endpoints owned by the user. (This likely won't be an issue on Comet.)
  - 100 effective ACLs per user on an endpoint. (The Globus CLI does not stop us from creating more than 100 ACLs per user; it starts to complain after 1,000, but the damage is already done past 100. When more than 100 ACLs are set for a given user, this user's sharing will break, but it will work as expected for other users. As of this writing there is no fix other than trimming the ACLs for the affected user to fewer than 100.)
  - 1,000 total ACLs per endpoint. (This could be an issue in some cases, e.g. Gateways that wish to set up more than 1,000 folders, i.e. one folder for each user with appropriate ACLs.)
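Because the CLI does not enforce the per-user limit, a script that creates many access rules may want to count them itself. The sketch below assumes you can produce a list with one principal (user identity) per line, one line per access rule on the endpoint, for instance derived from the output of `globus endpoint permission list`; how that list is extracted is left open here. `check_acl_limits` is a hypothetical helper, and the default thresholds are the figures quoted above.

```shell
#!/bin/bash
# Hypothetical helper: read one principal per line on stdin (one line per
# access rule) and report principals at or above the per-user limit, plus
# the endpoint as a whole if it is at or above the per-endpoint limit.
# Usage: check_acl_limits [PER_USER_LIMIT] [PER_ENDPOINT_LIMIT] < principals.txt
check_acl_limits() {
    local per_user_limit="${1:-100}" per_endpoint_limit="${2:-1000}"
    awk -v ulim="$per_user_limit" -v elim="$per_endpoint_limit" '
        NF { count[$1]++; total++ }          # skip blank lines
        END {
            for (p in count)
                if (count[p] >= ulim) print "USER_LIMIT " p " " count[p]
            if (total >= elim) print "ENDPOINT_LIMIT " total
        }'
}
```

Empty output means no limit has been reached; any `USER_LIMIT` or `ENDPOINT_LIMIT` line is a signal to trim rules before adding more.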
Summary
In summary, there are three main steps: the first two require manual intervention once, while the third may be automated.
Authorize Globus CLI
#!/bin/bash
module load python
# Authorize globus cli
globus login
Create and activate your endpoint
#!/bin/bash
# Change the following
#------------------------------#
project_name="sds165"
username="amit"
my_endpoint_folder="perapera2"
share_folder="foo"
sharee_email="email@example.com"
#------------------------------#
# Set up share location
# Sharing location for Comet : /oasis/projects/nsf/$project_name/$username/shared/
# NOTE: Sharing from other locations won't work
# Endpoint folder will be: /oasis/projects/nsf/$project_name/$username/shared/$my_endpoint_folder
# Shared folder will be: $share_folder inside the endpoint folder
cd /oasis/projects/nsf/$project_name
mkdir -p $username
cd $username
mkdir -p shared #this is required
cd shared
mkdir -p $my_endpoint_folder
cd $my_endpoint_folder
my_endpoint_path=`pwd` #Save location for my endpoint (the endpoint root, not the shared sub folder)
mkdir -p $share_folder
cd $share_folder
touch 123.txt
touch abc.txt
# Identify Comet's endpoint (take the first search result)
comet_endpoint=`globus endpoint search -F json "XSEDE Comet" \
  --filter-owner-id "firstname.lastname@example.org" \
  | python -c 'import json, sys; obj=json.load(sys.stdin); print(obj["DATA"][0]["id"])'`
echo $comet_endpoint
# Create an endpoint for sharing
result=`globus endpoint create \
  --shared $comet_endpoint:$my_endpoint_path \
  'Demo endpoint for sharing on Comet' \
  --description 'Example of an endpoint for sharing purpose on Comet' \
  -F json`
echo $result
# Extract this newly created endpoint
my_endpoint=`echo "$result" | python -c 'import json, sys; obj=json.load(sys.stdin); print(obj["id"])'`
echo "My endpoint="$my_endpoint
echo "Now activate my endpoint"
globus endpoint activate --web $my_endpoint
Share a folder with arbitrary user at your endpoint
#!/bin/bash
# Assumes $my_endpoint, $share_folder and $sharee_email are set as in the previous script
# Share the foo folder with a user at $sharee_email with read access
# Use --identity instead of --provision-identity if the user already has globus account
globus endpoint permission create --permissions r "$my_endpoint:/$share_folder/" \
  --provision-identity $sharee_email
# Create share URL
base_url="https://www.globus.org/app/transfer?"
origin_id=$my_endpoint
origin_path=%2F$share_folder%2F # / (slash) must be percent-encoded as %2F
# If the folder is public then add_identity must be removed
add_identity=`globus get-identities $sharee_email`
share_url="${base_url}origin_id=$origin_id&origin_path=$origin_path&add_identity=$add_identity"
echo "Share URL="$share_url