SeedMe workshop: Collaborative data sharing infrastructure for researchers
Date: Aug 24, 2018 (9am to 6:00pm)
Venue: Synthesis Center/Vis Lab E-B143, San Diego Supercomputer Center, UCSD (See directions)
Contact us: Send email regarding the workshop to "amit AT sdsc.edu"
Data is an integral part of scientific research. With rapid growth in data collection and generation capability and the increasingly collaborative nature of research, data management and data sharing have become central to accomplishing research goals. Researchers today have a variety of solutions at their disposal, from local storage to cloud-based storage. However, these solutions rely solely on hierarchical file and folder organization. While such an organization is pervasive and quite useful, it relegates information about the context of the data, such as descriptions and associated collaborative notes, to external systems. Dispersing this vital information into different silos not only impedes the flow of research activities in the near term, but also hampers mid- and long-term retention of knowledge about intermediate steps.
In this workshop, we will introduce and provide hands-on experience with tools designed to close this critical gap via the NSF-supported SeedMe2 platform. The SeedMe2 platform leverages the familiar hierarchical file and folder organization, but extends it with the ability to keep data, its description, and related discussion in one system. It also allows a folder to be shared either privately with collaborators or publicly for wider dissemination of information. Users may interact with the system via a web browser, a command line utility, or a REST API, using familiar concepts for each method. The platform will enable users to rapidly share and access transient data and preliminary results with collaborators in consumable form. The workshop aims to provide practical training to customize and utilize this infrastructure, enabling attendees to overcome existing gaps in collaboration as well as realize several aspects of research data management.
Note: The SeedMe2 platform focuses on data that can be transferred easily on the web with standard tools such as stock web browsers. This limits uploads to 2GB per file; however, any number of files may be uploaded. Moreover, derived products from large-scale raw data tend to be small, so this workshop and platform are still highly relevant to groups that produce large data. In the future, the platform will likely support larger uploads.
- A laptop computer is required (tablets will not be sufficient for this workshop)
- Software requirements
- Web Browser: A recent version of one of the following web browsers for your operating system is required:
Chrome, Edge, Firefox, Opera or Konqueror. (Internet Explorer is not sufficient for this workshop)
- SSH Client:
Windows: Download and install the PuTTY software
Linux and Mac: Built into the operating system
- Attendees must be proficient at using a web browser and able to make simple edits to text files on a terminal with instructions
- (Optional) Ability to use a command line interface
Who may wish to attend this workshop?
We welcome a broad set of attendees to this workshop with complementary interests, such as:
- Researchers: Interested in setting up or using data sharing for project or personal use
- Research IT/Cyberinfrastructure providers: Provide a predefined or custom data-sharing configuration to your users
- Scientific application/Gateway developers: Integrate/extend your Application/Science Gateways to provide data sharing capabilities and increase your impact by disseminating exemplar content from your users.
- Data curators: Create powerful repositories with custom fields to allow easy discovery via search and interactive exploration. Integrate other tools with repository content via powerful web services.
We anticipate that by the end of the workshop attendees will be able to accomplish the following:
- Gain understanding and working knowledge of the SeedMe2 platform and how to leverage it for research data needs
- Take away a working research data sharing website with your own branding that provides
- Ability to configure and manage site-wide data sharing
- Customized data properties tailored to your project needs
- Customized website for other uses such as to disseminate project news, publications, etc.
- Automation and integration: Learn to use command line and web services tools that could be used from remote resources (such as HPC clusters) as well as for automation
- Deployment options: Learn where such a data infrastructure may be hosted
- Do-it-yourself: On site
- Third party vendor
- Regulatory compliant hosting
Instructors: Amit Chourasia and David Nadeau
Intern assistants: Rahul Kulkarni and Ryan Wei
Logistics: Susan Rathbun
- 9:00am: Welcome and opening remarks by Dr. Mike Norman, Director SDSC
Logistics and attendee introduction
- 9:30am: Keynote: Generative value through collaborative curation
(Mark Parsons: Sr. Scientist at Rensselaer Polytechnic Institute, former Secretary General of Research Data Alliance)
- 10:30am: Break
- 10:40am: Lightning overview of web architecture (David Nadeau - Slides)
- 11:30am: Overview of SeedMe2 platform (Amit Chourasia - Slides)
- 12:00pm: Lunch at Cafe Ventana's
- 1:00pm: Invited talk: Sherlock Cloud - An accelerator for protected data computing
(Sandeep Chandra: Director - Health Cyberinfrastructure Division, SDSC)
- 1:20pm: Introduction to FolderShare (Hands-on: Amit Chourasia - Slides)
Introduction to Drupal (Hands-on: Amit Chourasia - Slides)
- 3:00pm: Break
- 3:10pm: FolderShare configuration (Hands-on: David Nadeau - Slides)
- 3:30pm: Customizing FolderShare for your website (Hands-on: Amit Chourasia - Slides)
- 4:00pm: FolderShare web services access from the command line (Hands-on: David Nadeau - Slides)
- 4:30pm: Capstone talk: Using SeedMe2 for collaborative research of laser-plasma interactions
(Dr. Alexey Arefiev: Assistant Professor, Mechanical and Aerospace Engineering, UCSD)
- 5:00pm: Discussion
- 5:30pm: Dinner at Cafe Ventana's
Keynote: Generative value through collaborative curation
Speaker: Mark Parsons (Sr. Scientist, Rensselaer Polytechnic Institute, former Secretary General of Research Data Alliance)
Abstract: Research data sharing and management is increasingly recognized as an important part of the scientific endeavor because it can increase the value of data to research and society. Nonetheless, it is not well understood how to measure the value of research data. Guidance to researchers about data management has largely been limited to data sharing mandates and management plan requirements. Scant attention is given to the need for active and ongoing data curation. In this talk, I suggest we adopt the concept of “generative value” for data by extending Zittrain’s (2008) definition of generativity: “the capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences.” For data this means we must accommodate unanticipated use and unfiltered modifications. Building from decades of international experience and several case-studies, I present an argument and some initial methods for how data can be curated collaboratively by data management and disciplinary experts to maximize generative value.
Invited talk: Sherlock Cloud - An accelerator for protected data computing
Speaker: Sandeep Chandra (Director - Health Cyberinfrastructure Division, SDSC)
Abstract: In this talk I will discuss the perspective of service providers, who must remain agile in the face of ever-changing technological and customer requirements while simultaneously ensuring a compliant environment for secure data. I will also describe Sherlock's ‘Innovation Accelerator Platforms’, which provide customers quick access to on-demand and secure data platforms to process and manage data in the cloud.
Capstone talk: Using SeedMe2 for collaborative research of laser-plasma interactions
Speaker: Dr. Alexey Arefiev (Assistant Professor, Mechanical and Aerospace Engineering, UCSD)
Abstract: My research group conducts large supercomputer simulations of laser-plasma interactions. Resulting data from these simulations is further processed to create derived data products, and we have a strong need to share these derived products with on-site and foreign collaborators. In this talk I will share my group's experience of using SeedMe2 for collaborative research and its impact.
Venue, directions and parking
San Diego Supercomputer Center’s Synthesis Center E-B143 is located on the B1 floor at SDSC’s east entrance. Take the stairs just off the driveway on Hopkins Dr, close to the Hopkins Parking Structure, at the northwest end of the UC San Diego campus.
Venue: Synthesis Center/Vis Lab E-B143
Address: San Diego Supercomputer Center (SDSC)
10100 Hopkins Drive, La Jolla, CA 92093
Google maps exact location
Local map: Download map with SDSC location, housing, parking, coffee, restaurants.
Driving: For driving directions see SDSC's visitors page
Taxi / Shuttle
Cab or shuttle Pick-up/Drop-off: 10100 Hopkins Drive, La Jolla, CA 92093
- Ride sharing services: Lyft & Uber
- Yellow Cab: 619-444-4444
- Super Shuttle: 800-974-8885
Public transportation: Surrounding UC San Diego
Airport: The San Diego International Airport (SAN) is the closest airport to UC San Diego and SDSC.
The SeedMe workshop is based upon work supported by the National Science Foundation under Grant No. 1443083. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.