Raising the standard: GigaScience Press on metadata and discoverability
To mark Crossref’s 25th anniversary, we launched our first Metadata Awards to highlight members with the best metadata practices.
GigaScience Press, based in Hong Kong, was the leader among small publishers, defined as organisations with less than USD 1 million in publishing revenue or expenses. We spoke with Scott Edmunds, Ph.D., Editor-in-Chief at GigaScience Press, about how discoverability drives their high metadata standards.
Our objective is to communicate science openly and collaboratively, without barriers, to solve problems in a data- and evidence-driven manner through Open Science publishing. High-quality metadata helps us address these objectives by improving the discoverability, transparency, and provenance of the work we publish. It is an integral part of the FAIR principles and the UNESCO Open Science Recommendation, playing a role in increasing the accessibility of research for both humans and machines. As one of the authors of the FAIR principles paper and an advisor to the Make Data Count project, I've also personally been very conscious of practising what I preach.
We’ve been privileged to work with our technical partners at River Valley Technologies, and the novel XML-first publishing platform they have developed has made it particularly easy to integrate and collect persistent identifiers and other metadata, embedding them into the resulting rich XML. As Open Access advocates, licensing and machine readability were early areas of focus when launching our journals. We ensured that we provided a text and data mining portal, allowing bulk downloads of our content to encourage reuse. Many specific metadata elements are highlighted by the FAIR principles and the UNESCO Open Science Recommendation, so these have also helped guide what should be prioritised. If there’s one specific tool to mention, we’ve been big fans of the Crossref participation reports, as these have helped highlight what is missing and what we need to improve upon.
The participation reports, in particular, have been useful for this: by checking them regularly, we’ve managed to spot when processes have broken. When you’ve added new fields to the reports, like ROR IDs (Research Organization Registry), this has also motivated us to prioritise integrating them, so having a curated list of metadata fields like this definitely helps users focus on what is most important. River Valley Technologies has been very responsive to this type of feedback, and being able to see the participation report data in real time has helped drive them to fix and update our metadata. So I thank them for being so patient and quick to respond to our very demanding standards.
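The kind of check described above can be sketched in a few lines. This is an illustration only, not Crossref's own participation-report logic; the field names follow the JSON shape of a Crossref REST API works record (e.g. `license`, `funder`, `author[].ORCID`, affiliation `id` objects), and the example record is hypothetical.

```python
# Minimal sketch: flag commonly checked metadata fields that are absent
# from a Crossref-style works record (the JSON shape returned by
# https://api.crossref.org/works/{doi}). Not Crossref's own report logic.

def missing_metadata(record):
    """Return a sorted list of metadata field names absent from the record."""
    checks = {
        "abstract": bool(record.get("abstract")),
        "license": bool(record.get("license")),
        "funder": bool(record.get("funder")),
        "references": bool(record.get("reference")),
        # ORCID iDs live on individual author entries
        "orcid": any(a.get("ORCID") for a in record.get("author", [])),
        # ROR IDs appear as "id" objects inside author affiliations
        "ror": any(
            aff.get("id")
            for a in record.get("author", [])
            for aff in a.get("affiliation", [])
        ),
    }
    return sorted(name for name, present in checks.items() if not present)

# Hypothetical record with a license and one ORCID iD, but no funder,
# abstract, reference list, or ROR-identified affiliation
record = {
    "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
    "author": [{"given": "A.", "family": "Author",
                "ORCID": "https://orcid.org/0000-0002-1825-0097",
                "affiliation": [{"name": "Example University"}]}],
}
print(missing_metadata(record))  # → ['abstract', 'funder', 'references', 'ror']
```

Run against a feed of your registered DOIs, a report like this makes it obvious when a production process has quietly stopped depositing a field.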
From an Editorial side, our technical partners at River Valley Technologies have found having this metadata information available very useful in the Research Integrity tools they have developed and integrated into our publication platform. Things like ORCID IDs, RORs, and other identifiers are very useful for tracking provenance and increasing trust.
From a business side, putting the effort into collecting rich metadata has paid off in the long run by making it easier to integrate our publishing data into new platforms: it is easier and quicker, for example, to integrate and track our data via the OA Switchboard. It also helps us more easily mirror and list our content in indexes like PMC, Scopus, Web of Science, and others.
One of the main metadata areas that currently lets us down, funding and funder registries, is a consequence of how affordable our publishing model is. The automated production processes of RVT’s novel publishing platform have allowed us to publish very cost-effectively (the APC of GigaByte is $535). We’ve also received sponsorship from the WHO to publish a series of public health papers, particularly supporting authors from the Global South who may not have sources of funding listed in these registries. Because of this, we’ve published numerous papers from independent researchers, students, and self-financed projects that may not have funding IDs or grant numbers. We’d like to push to get “unfunded” counted as a metadata field to address this.
We’d like to think our authors find this useful, but we’ve not had any specific feedback on it. Our readers, both human and machine, should hopefully appreciate finding our work more easily, and from a purely selfish perspective, this should bring us higher access and citation rates. That is difficult to measure, but as evidence nerds, we have attempted to conduct RCTs examining this for data citations. One anecdote I can give is about an author who told us they pasted their paper into ChatGPT and asked it which was the best journal for their work, and it suggested our journal. I’d like to think that the effort we put into making our papers more machine-readable and comprehensible pays off at times like this, increasing the discoverability and visibility of our journals.
We still need to update older content with RORs, and improve it for the datasets linked to our papers. To do this, we’ve had interns working to improve our DataCite metadata.
We encourage others to think about metadata issues when setting up their workflows. While it may seem like additional work, it will be increasingly important to future-proof journals and get them ready for our increasingly AI-centric age. And as we show here, it makes important tasks easier, such as getting your content indexed and disseminated more quickly and widely.
Strong metadata ties open science, integrity, and discoverability together. GigaScience Press shows how consistent identifiers, machine-readable formats, and continuous checks deliver real benefits. As discovery becomes more AI-assisted, the priority is clear: keep metadata complete, open, and usable.
While it may seem like additional work, it will be increasingly important to future-proof and get journals ready for our increasingly AI-centric age.
– Scott Edmunds, GigaScience
Now, a few words from Scott.
Metadata Awards video - GigaScience