High-performance image classification and search supporting large-scale seafloor biodiversity and habitat surveys
Overview: The project explores new domain-optimized high-performance computation methods for automatic image classification to enable seafloor population surveys at landscape scales. Recent advances in high-resolution imaging of seafloor communities have created a critical need for new optimized computation methods to process the resulting extremely large data collections. These new methods will enable scientists to create detailed biological and environmental maps of seafloor communities at unprecedented scales and do so repeatedly through time as these habitats change under the influence of human activity. Such maps provide a necessary long-term contextual framework and baseline data for subsequent sustainability data analysis and decision making.
Merit: Our approach uses domain knowledge to guide and tune image classification algorithms so that very large sets of survey images can be processed at high performance. Custom color- and texture-based metrics derived from image tiles form feature vectors that guide the search through a k-d-tree-based species and habitat classification library. The project prototypes new methods and validates their performance and classification accuracy. These methods cull the library based on survey characteristics (geographic region, water temperature and salinity, sea bottom type from acoustic data); tune feature vectors based on survey and library metrics (contextual color gamut and texture detail reduction, principal components analysis to combine and weight features); reduce the nearest-neighbor search set size using k-d tree metrics; restructure the k-d tree to improve common-case search performance and cacheability; and parallelize the search for efficient classification across a cluster. Together, these new methods are expected to substantially increase classification performance.
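As an illustration only, the basic pipeline described above (tile features, library culling, k-d tree nearest-neighbor lookup) might be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the function names, the four-element feature vector (mean red, green, blue plus a luminance standard deviation standing in for texture), and the library entries with a single "region" field are all hypothetical, and the feature tuning, tree restructuring, and cluster parallelization steps are omitted.

```python
import math

def tile_features(tile):
    """Assumed feature vector: [mean_r, mean_g, mean_b, texture].
    tile is a list of rows of (r, g, b) pixels; texture is the
    standard deviation of per-pixel luminance (a stand-in metric)."""
    pixels = [p for row in tile for p in row]
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    lum = [sum(p) / 3 for p in pixels]
    mean_lum = sum(lum) / n
    texture = math.sqrt(sum((v - mean_lum) ** 2 for v in lum) / n)
    return means + [texture]

def cull_library(library, region):
    """Keep only library entries matching the survey's region
    (hypothetical metadata field), as (features, label) pairs."""
    return [(e["features"], e["label"]) for e in library if e["region"] == region]

def build_kdtree(items, depth=0):
    """Build a k-d tree from (vector, label) pairs by median split,
    cycling through the axes by depth."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return {
        "point": items[mid][0],
        "label": items[mid][1],
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (squared_distance, label) of the nearest library entry."""
    if node is None:
        return best
    d = sum((a - b) ** 2 for a, b in zip(node["point"], query))
    if best is None or d < best[0]:
        best = (d, node["label"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best

def classify_tile(tree, tile):
    """Label a tile with the class of its nearest library entry."""
    return nearest(tree, tile_features(tile))[1]

# Hypothetical library entries, for illustration only.
LIBRARY = [
    {"features": [200.0, 180.0, 140.0, 2.0], "label": "sand", "region": "north"},
    {"features": [40.0, 120.0, 60.0, 25.0], "label": "kelp", "region": "north"},
    {"features": [60.0, 30.0, 90.0, 40.0], "label": "urchin", "region": "south"},
]

if __name__ == "__main__":
    tree = build_kdtree(cull_library(LIBRARY, "north"))
    tile = [[(198, 182, 139), (198, 182, 139)],
            [(198, 182, 139), (198, 182, 139)]]
    print(classify_tile(tree, tile))
```

Culling before the tree is built keeps the tree small for a given survey, which is one plausible way the library reduction described above could shorten each nearest-neighbor search.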