This workshop was organised and hosted by the POGO Biological Observations Task Force.
A short report on the event was published in Issue 37 of the POGO Newsletter (August 2019). You can read it here.
The original workshop announcement follows:
NB Workshop places are limited and will be allocated to POGO Member institutions or their nominated partners as a priority.
However, we welcome all expressions of interest from non-POGO organisations. Such applications will be added to a waiting list, to be considered after the closing date of 1 March, dependent on remaining capacity.
Description
The past two decades have seen rapid advances in the technology available to oceanographers seeking to study and manage marine ecosystems. Relatively cheap, compact computers and digital storage have allowed scientists to collect big, complex datasets. Cruises now regularly return to port with terabytes of data, high-temporal-resolution coastal time series contain billions of measurements, and water samples are parsed into millions of DNA sequences. These information-rich datasets have grown so large that analysis with traditional methods has become untenable.
Oceanographers have begun exploring high-throughput, automated methods to make sense of their big datasets. Recent developments in machine learning and artificial intelligence (ML/AI) offer the means to analyze many types of data: acoustic recordings, digital images and video, and eDNA samples, to name a few. ML/AI techniques, when properly applied, could be used to expedite analysis of existing oceanographic data and enable novel experimental designs.
This three-day workshop seeks to educate the POGO community about ML/AI as it is currently being applied in biological oceanography. The gathering will begin with talks on the current state-of-the-art analysis techniques in three broad areas: acoustics, imaging, and genomics. Attendees will then participate in a domain-specific hands-on tutorial, hosted by domain experts, with a focus on data preprocessing and organization. The workshop will conclude with discussions on the direction of ocean observation in the age of big data. The organizers hope that the program will inspire discussions on best practices for applying ML/AI in oceanography and opportunities for crosscutting research at the intersection of ML/AI and biological monitoring. Tutorial materials will be made public at the conclusion of the workshop.
Tutorials
The tutorials will take place over two sessions. The proceedings will begin with a general machine learning overview, geared toward scientists with a limited background in the area. Topics will include basic theory, an overview of techniques, and practical considerations for using machine learning in the field.
Participants will then split up, each joining one of three topical sessions: acoustics, genomics, or imaging. Each domain-specific tutorial will be conducted as a guided, hands-on activity with labeled data supplied by the organizers. The tutorial leaders have prepared coded exercises and examples that participants will work through in Jupyter notebooks. Participants will have access to data and computing resources on Amazon Web Services (AWS) to complete the activities. The tutorials will contain approximately six hours of material.
Attendees will also have the opportunity to temporarily load some of their own data on AWS to use with the tutorial code and experiment on their own. Tutorial leaders will be available to assist on the second day. Instructions for loading the data and prescribed formats are forthcoming.
Acoustics
Leaders: Danelle Cline (MBARI), John Ryan (MBARI)
Passive acoustic sensing in the ocean provides a wealth of information about the presence and activity of marine life, as well as anthropogenic noise that can negatively impact marine life. This mode of sensing generates very large and complex datasets. In this research domain, ML is proving to be highly effective. This tutorial will cover ML fundamentals, optimum decimation filtering, spectrogram enhancement methods, and classification using convolutional neural networks (CNNs). We will focus on end-to-end analysis methods, including detection and classification, for one type of sound source: low-frequency whale calls. Some basic experience with Python programming in Jupyter notebooks would be beneficial. Data will be provided for the tutorial. If you wish to bring your own data for experimenting outside of class time, please contact us to learn the required dataset organization.
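As a rough illustration of the preprocessing steps named above, the following sketch decimates a recording and computes a spectrogram. It is not taken from the tutorial materials: the windowed-sinc low-pass filter is a simple stand-in for the "optimum" decimation filter, the signal is synthetic (a 20 Hz tone in noise, chosen only to sit in a low-frequency band), and the function names and parameters are hypothetical. The CNN classification stage is not shown.

```python
import numpy as np

def decimate(signal, factor, num_taps=101):
    """Low-pass filter with a windowed-sinc FIR, then keep every `factor`-th sample."""
    cutoff = 0.5 / factor  # new Nyquist frequency, in cycles per original sample
    n = np.arange(num_taps) - (num_taps - 1) / 2
    taps = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    taps /= taps.sum()  # unity gain at 0 Hz
    filtered = np.convolve(signal, taps, mode="same")
    return filtered[::factor]

def spectrogram(signal, sample_rate, window_size=256, hop=128):
    """Return (frequencies in Hz, power in dB per frame) from a short-time Fourier transform."""
    window = np.hanning(window_size)
    starts = range(0, len(signal) - window_size + 1, hop)
    frames = np.array([signal[s:s + window_size] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    return freqs, 10 * np.log10(power + 1e-12)

# Synthetic stand-in for a hydrophone recording (assumption, not workshop data):
# 10 seconds of a 20 Hz tone plus white noise, sampled at 2 kHz.
rate = 2000
t = np.arange(0, 10, 1 / rate)
rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 20 * t) + 0.5 * rng.standard_normal(t.size)

factor = 10
low_rate = rate // factor              # 200 Hz after decimation
reduced = decimate(audio, factor)
freqs, spec_db = spectrogram(reduced, low_rate)

# Frequency bin with the highest average power across all frames
peak_hz = freqs[spec_db.mean(axis=0).argmax()]
print(f"{reduced.size} samples at {low_rate} Hz, spectral peak near {peak_hz:.1f} Hz")
```

Decimating first keeps the spectrogram small and concentrates its frequency resolution on the band of interest, which is one reason to reduce the sample rate before further analysis.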
Genomics
Leaders: Tristan Cordier (University of Geneva), Anders Lanzén (AZTI-Tecnalia)
High-throughput amplicon sequencing of environmental DNA (eDNA metabarcoding) is a molecular technique that enables profiling of both the composition and diversity (both known and unknown “species”) of biological communities, directly from environmental samples. It provides biologists with an unprecedented amount of ecologically meaningful data. These tools have recently been tested in an environmental monitoring context, in which the bioindicator values of the species present in an environment indicate its Ecological Quality Status (EQS), i.e. its level of disturbance, usually caused by anthropogenic pressures. Those studies showed that anthropogenic impact can be clearly detected from metabarcoding data, although the taxonomic identification of many sequences remains challenging, hampering the transition of existing monitoring tools and practices into the genomics era. Recent studies have shown that Supervised Machine Learning (SML) algorithms can be successfully used to overcome this challenge in many cases, enabling robust predictive models that can provide EQS predictions from metabarcoding data, regardless of the taxonomic affiliations.
This tutorial will first provide a brief overview of general bioinformatics practices for processing of metabarcoding data, to obtain an exploitable contingency matrix (“OTU table”). We will then focus on the training and testing (cross-validation) of SML models to predict the EQS of samples in a monitoring context. Data collected along a pollution gradient in coastal environments in Norway will be provided and analysed, with the possibility to explore participants’ own data as well. The tutorial will be carried out using command line tools and R, and therefore requires basic familiarity with GNU/Linux (bash) as well as basic statistics commands and data manipulation (“programming”) in R.
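The training-and-testing loop described above can be sketched as k-fold cross-validation over an OTU table. The tutorial itself uses R; this sketch is in Python purely for illustration and does not reproduce the tutorial code. The OTU counts are synthetic, the two EQS classes ("good" and "poor") are hypothetical, and a nearest-centroid classifier stands in for whichever SML models the tutorial actually covers.

```python
import numpy as np

def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test sample the class whose mean training profile is closest."""
    classes = np.unique(train_y)
    centroids = np.array([train_x[train_y == c].mean(axis=0) for c in classes])
    # Euclidean distance from every test sample to every class centroid
    distances = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]

def cross_validate(features, labels, n_folds=5, seed=0):
    """Return the held-out accuracy of each fold in a k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(labels)), n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        predicted = nearest_centroid_predict(
            features[train_idx], labels[train_idx], features[test_idx]
        )
        scores.append(float((predicted == labels[test_idx]).mean()))
    return scores

# Synthetic OTU table (assumption): rows are samples, columns are OTU read counts.
rng = np.random.default_rng(42)
n_per_class, n_otus = 30, 40
base = rng.uniform(5, 50, size=n_otus)   # mean read count per OTU, undisturbed sites
shifted = base.copy()
shifted[:10] *= 4                        # OTUs assumed to respond to disturbance
counts = np.vstack([
    rng.poisson(base, size=(n_per_class, n_otus)),
    rng.poisson(shifted, size=(n_per_class, n_otus)),
])
labels = np.array(["good"] * n_per_class + ["poor"] * n_per_class)

# Convert raw counts to relative abundances so sequencing depth does not dominate
rel_abundance = counts / counts.sum(axis=1, keepdims=True)

scores = cross_validate(rel_abundance, labels)
print("per-fold accuracy:", scores)
```

Note that the classifier only sees OTU abundances, never taxonomic names, which mirrors the point that predictions can be made regardless of taxonomic affiliation.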
Imaging
Leader: Eric Orenstein (SIO-UCSD)
Scientists are increasingly relying on digital imaging technology to study marine organisms. These tools promise to yield new insight into ecosystem function by densely sampling in space and time. But drawing conclusions and developing long term monitoring programs based on imaging systems is challenging due to the sheer volume of data they produce.
This tutorial will use a labeled plankton dataset drawn from the Scripps Plankton Camera System to illustrate a variety of supervised image classification methods. The materials will cover basic image manipulation, feature extraction, and classification with margin-based, ensemble, and neural network methods. All coding examples and exercises will be presented in Python. These techniques are broadly applicable to all sorts of image data, from the macro- to the microscale. Participants are encouraged to bring their own images for experimentation.
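To make the feature-extraction and margin-based classification steps concrete, here is a minimal sketch under stated assumptions. It does not use the Scripps Plankton Camera data or the tutorial code: the images are synthetic disks of two sizes, the two features are hand-picked for illustration, and the hand-rolled linear SVM (trained by subgradient descent on the hinge loss) is one possible example of a margin classifier. Ensemble and neural network methods are not shown.

```python
import numpy as np

def make_blob_image(radius, rng, size=32):
    """Synthetic grayscale image: a bright disk on a dark, noisy background."""
    yy, xx = np.mgrid[:size, :size]
    center = size / 2
    disk = ((yy - center) ** 2 + (xx - center) ** 2) <= radius ** 2
    return disk.astype(float) + 0.1 * rng.standard_normal((size, size))

def extract_features(image, threshold=0.5):
    """Two hand-crafted features: foreground area fraction and mean intensity."""
    mask = image > threshold
    return np.array([mask.mean(), image.mean()])

def train_linear_svm(x, y, epochs=50, lr=0.1, reg=0.01):
    """Subgradient descent on the regularized hinge loss; labels must be -1 or +1."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            if yi * (xi @ w + b) < 1:    # sample violates the margin: move the boundary
                w += lr * (yi * xi - reg * w)
                b += lr * yi
            else:                        # margin satisfied: only shrink the weights
                w -= lr * reg * w
    return w, b

# Two hypothetical classes: small organisms (-1) and large organisms (+1)
rng = np.random.default_rng(1)
images = [make_blob_image(3, rng) for _ in range(40)]
images += [make_blob_image(8, rng) for _ in range(40)]
labels = np.array([-1] * 40 + [1] * 40)

features = np.array([extract_features(img) for img in images])
features = (features - features.mean(axis=0)) / features.std(axis=0)  # standardize

w, b = train_linear_svm(features, labels)
# Accuracy on the training images themselves; a real study would hold out test data
accuracy = float((np.sign(features @ w + b) == labels).mean())
print(f"training accuracy: {accuracy:.2f}")
```

Real plankton images would need richer features (shape, texture) or learned representations, but the pipeline of image, feature vector, then classifier keeps the same shape.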
Logistics
Dates: May 19-22, 2019 (revised dates since first announcement)
Location: VLIZ InnovOcean Center at Ostend, Belgium
Attendance: A maximum of 60 attendees will participate in the workshop. We intend to limit each domain-specific tutorial to 20 people.
Application: Please fill out the online application by March 1st, 2019. Accepted attendees will be notified on March 15th, 2019.
Travel support: Limited travel support is available for qualifying applicants. Please request it in the appropriate section of the application.
Costs: There is no cost to participate in the workshop.
Accommodation: Participants are responsible for their own accommodation arrangements and associated costs.
The organisers have block booked local accommodation, and details of these suggested locations will be supplied with joining instructions.
3-day Schedule (subject to change)
| | Day 0 (19 May) | Day 1 (20 May) | Day 2 (21 May) | Day 3 (22 May) |
|---|---|---|---|---|
| Morning | | Introduction & general ML/AI overview | Tutorial session 2 | Discussion and concluding remarks |
| Afternoon | Attendees arrive | Tutorial session 1 | “Homework” session | Attendees depart |
| Evening | Reception | | | |