May 5, 2016

Soon after starting as data manager with the Institute for Marine and Antarctic Studies (IMAS), Emma Flukes experienced the proud moment of loading her own data onto the Australian Ocean Data Network (AODN).

‘I was really excited to see how my data could be visualised,’ Emma says. ‘The concept of routinely making data available for the public to explore was very appealing, and I hope more and more researchers will start to recognise the benefits of open data. I always like being involved in the new thing: at the cutting edge. Open data is gaining momentum so quickly and I’m really excited to be part of the movement.’

Emma’s data management responsibilities are now divided between IMAS and the Marine Biodiversity Hub, where she sees her role as smoothing the way between researchers and technical (data) specialists. After auditing datasets from Hub projects, she will develop guidelines to help researchers publish their data through the AODN 1-2-3 Data Portal, an online interface that catalogues and visualises open-access data collected by the Australian marine community.
 
‘Open data systems require some effort to get up and running, but once the wheels are turning they hugely decrease the overall effort that goes into collecting and showcasing data,’ Emma says. ‘Open data enables a faster and more efficient path to scientific discovery, encourages collaboration, reduces duplication of research efforts, and can even boost citation rates. So I was really excited to start working with the Marine Biodiversity Hub, which has the policy of making all its data publicly available as quickly as possible.
 
‘Another advantage is that researchers can keep track of who else is using their data, and this can lead to collaboration opportunities. Recent studies have shown that making data open access results in, on average, a 30 percent increase in citations of associated publications, in addition to direct data citations. In fact, data themselves can now be cited. Datasets from Reef Life Survey – a global project supported by the Hub for its national work – were some of the first to be published, in the first edition of the Nature journal Scientific Data.’

Metadata vital to data discovery

Emma says that one of the biggest challenges facing researchers is finding the most powerful way to publicise their data. The first step in this process is creating metadata that describes a data collection and allows it to be publicly discovered.
 
‘A data collection only needs to be described once,’ she says. ‘Creating metadata may take an hour or so and a couple of revisions along the way to describe several years’ worth of research, but it’s not a terribly arduous task and all the information required usually has already been documented in publications. When I first receive a dataset I’ll have a look at the metadata, the file format, what the researcher has actually measured and what filtering process might make sense from a user’s perspective. For example, will an end-user just want to filter the dataset by date or location, or can we apply more specific filters such as substrate type, satellite tag number, species name etc.? You can never be too descriptive.’
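The kind of attribute filtering Emma describes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record fields (`date`, `site`, `species`, `substrate`), the sample values and the `filter_records` helper are hypothetical and do not reflect how the AODN Portal is actually implemented.

```python
from datetime import date

# Hypothetical observation records; field names and values are illustrative only.
records = [
    {"date": date(2014, 3, 2), "site": "Site A",
     "species": "Ecklonia radiata", "substrate": "rocky reef"},
    {"date": date(2015, 7, 19), "site": "Site B",
     "species": "Centrostephanus rodgersii", "substrate": "rocky reef"},
    {"date": date(2015, 11, 5), "site": "Site A",
     "species": "Centrostephanus rodgersii", "substrate": "sand"},
]


def filter_records(rows, **criteria):
    """Return the rows whose fields equal every supplied criterion."""
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    ]


# A more specific filter than date or location alone: species plus substrate type.
urchins_on_reef = filter_records(
    records, species="Centrostephanus rodgersii", substrate="rocky reef"
)
print([row["site"] for row in urchins_on_reef])
```

The more fields the metadata describes, the more filters of this kind become possible, which is the practical payoff of being as descriptive as you can.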

Journeying from kelp to canyons

Emma approaches data management from the perspective of a community ecologist. Her PhD research looked at the effects of climate change on habitat-forming kelps and associated rocky reef communities, and she has also worked on the impacts of overgrazing by the long-spined sea urchin in south-eastern Australia. Now her fascination for ecology is fulfilled by serendipitous currents of open data.
 
‘The AODN Data Portal can be searched using keywords, without the need to know a particular dataset exists, and third parties may use data in ways that, initially, were never envisaged,’ Emma says. ‘It’s fascinating to see what kind of datasets are publicly available. The IMAS Data Portal hosts 170-year-old Port Arthur tide gauge records, 40 years of phytoplankton data from the Derwent River, and some very impressive global fish biodiversity datasets. The NERP submarine canyons dataset is also really exciting.
 
‘I’ve always had a fascination with deep sea trenches: the world down there seems as remote as the moon. I was recently exploring a map of submarine canyons around the Australian continental margin sampled as part of the NERP Hub, and was excited to discover a heap of environmental data was also collected from nearby shelf ecosystems.’

Precision downloading on AODN

Emma says that through the AODN Portal, the ability to connect to data directly from desktop software or visualise data through the online interface, without having to download large data files, is incredibly innovative.
 
‘You can get a preview of what’s contained in a dataset, and subset the data spatially and temporally: that can be really important if you’re dealing with large data sets or you’re interested in really fine-scale model output,’ she says. ‘For example, it might be a satellite-derived sea surface temperature map, or the rocky reef component of an Australia-wide habitat map. The Portal allows you to filter and subset a dataset by time, or to draw a box around a spatial area of interest. It’s all about streamlining the process for the user: the easier it is for someone to discover and download exactly what data they want, the more likely they are to reuse that data for their own further analyses.’
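Subsetting by time and by a box drawn around an area of interest can be sketched in a few lines. This is an illustration only, not the Portal's own code: the `subset` function, the point fields (`lon`, `lat`, `date`, `sst`) and all sample values are hypothetical.

```python
from datetime import date

# Hypothetical sea surface temperature points; all values are made up.
points = [
    {"lon": 148.1, "lat": -42.6, "date": date(2015, 1, 15), "sst": 17.2},
    {"lon": 115.0, "lat": -32.0, "date": date(2015, 1, 20), "sst": 22.8},
    {"lon": 148.0, "lat": -42.7, "date": date(2013, 6, 1), "sst": 13.1},
]


def subset(rows, bbox, start, end):
    """Keep rows inside bbox (west, south, east, north) and dated within [start, end]."""
    west, south, east, north = bbox
    return [
        row for row in rows
        if west <= row["lon"] <= east
        and south <= row["lat"] <= north
        and start <= row["date"] <= end
    ]


# An assumed box off eastern Tasmania, restricted to the 2015 calendar year.
selected = subset(
    points,
    bbox=(147.5, -43.5, 148.5, -42.0),
    start=date(2015, 1, 1),
    end=date(2015, 12, 31),
)
print(len(selected))
```

Only the rows that pass both tests need to be downloaded, which is why this kind of subsetting matters for large or fine-scale datasets.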
 
Emma says the ultimate user experience will be to access all kinds of data available for a particular geographic area (images, videos, model output, empirical measurements) and see them displayed as an integrated map. This has the added power of highlighting areas that have been well studied, as well as ‘black holes’ for data collection. For example, a huge amount of information has been collected by ships on particular common voyage routes, while massive oceanic areas off the Western Australian coast remain conspicuously understudied.

Tackling the challenge of showcasing BRUVS data

Emma says the variety and types of datasets that can be visualised through the Portal are being constantly improved by the AODN team. For example, a challenge at the moment is how best to showcase scoring, video and image data from baited remote underwater video systems (BRUVS).
 
‘Ideally there would be a central aggregation point for data collected by different organisations that could be linked to a global map,’ Emma says. ‘Imagine a map that displays every online BRUVS data collection: that’s the power of using a single central portal.’

Above: a sample IMAS data visualisation for benthic habitats around Tasmania's Maria Island.

The status of the Portal

At present the AODN Portal contains only IMOS-hosted data. In future, AODN will harvest all metadata (and data) that have been prepared in the format required for this Portal. Emma and the AODN technical team are working to ensure that NESP datasets are correctly prepared, and NERP data will be ‘retrofitted’ for viewing on AODN. In the meantime, NESP/NERP data are discoverable through the AODN catalogue, which provides an aggregation point for metadata and access to download the data, but without an interactive spatial mapping facility. A sneak preview of what the AODN 1-2-3 Portal will look like is available from the IMOS and IMAS Data Portals. Watch this space for Marine Biodiversity Hub data!
 
Further reading
Graham J. Edgar and Rick D. Stuart-Smith (2014) Systematic global assessment of reef fish communities by the Reef Life Survey program. Scientific Data, May 2014.
Piwowar HA, Day RS, Fridsma DB (2007) Sharing detailed research data is associated with increased citation rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308