Presentation | 2014-10-17

Ubiquitous Access to Transient Data and Preliminary Results via the SeedMe Platform

Presented at the University of California, Riverside

Presenter: Amit Chourasia, San Diego Supercomputer Center, UC San Diego

Abstract

The increasing availability of High Performance Computing (HPC), Cloud Computing, and high-resolution, high-update-rate sensing and imaging instruments has enabled researchers to model, observe, and analyze an ever-widening range of scientific phenomena. Recent investments in HPC are providing unprecedented capability and access to a wider gamut of researchers. We are witnessing a rapid increase in scientific data as well as in the number of users of scientific computing. With all of this new data and technology, the challenge now is how to better support scientists as they analyze, explore, visualize, and collaborate to gain new insight.

Swift feedback: Computing tasks that filter, analyze, and model using this big data are often highly iterative in nature – the best input parameters are rarely known ahead of time. This leads to a repeating cycle: run a job, assess results, refine parameters, run another job, and so on. Performing this process well requires rapid access to and assessment of transient data and the preliminary results generated during a compute job. Does the output look reasonable so far? Is the computation converging in an expected way? Is the job using compute power efficiently or is it bottlenecking somewhere? With timely feedback, an erroneous job can be terminated quickly before it wastes valuable time, and job parameters can be quickly adjusted and computation performed again. Swift, informative job feedback and access to preliminary results are essential for the efficient use of a scientist’s time and expensive cyberinfrastructure.

Continuous feedback: If we imagine that big computations could be performed on a personal computer (PC), then jobs could show visual progress indicators and immediate interactive visual results. But big jobs can’t be run on a PC and instead require big remote compute power. This leaves remote jobs without a way to show progress feedback or provide quick visual results. Instead, a job’s output is usually directed to remote output files that remain unavailable until a job completes. This leaves the scientist disconnected from the job they’re running. Even when the job is done, the scientist has to move the data from remote to local systems before they can examine it and see if they’ve got useful results. Continuous feedback during a job is needed to tighten the iteration loop and keep scientists connected to the jobs they are running.

Visual results: On successful job completion, output data needs to be processed to extract meaningful preliminary results. Generating visual results requires post-processing to build quick plots or compose images and videos. While there are tools that can help do this, they take time to start up and require the data to be loaded and processed manually. They also require an investment of time to learn the tools, and the funds to purchase them and the compute platforms they require. All of this complicates generating visuals and collaboratively analyzing preliminary results, and it ultimately delays the submission of the next job iteration with improved parameters. Quick tools to build preliminary visuals are needed to help collaborating scientists review job results and get on to the next job iteration.

Collaboration support: When preliminary results are generated, sharing them among collaborators is often limited to back-and-forth emails, which cannot easily include all of the relevant data or the interactive visuals needed to explore it. Weak sharing tools complicate collaborations, increase errors, and slow down the iterative science process. Simple science collaboration tools are needed to support and track discussions among distributed collaborators.