Blog

What’s your (citations’) style?

Bibliographic references in scientific papers are the end result of a process typically composed of: finding the right document to cite, obtaining its metadata, and formatting the metadata using a specific citation style. This end result, however, does not preserve the information about the citation style used to generate it. Can the citation style somehow be guessed from the reference string alone?

TL;DR

  • I built an automatic citation style classifier. It classifies a given bibliographic reference string into one of 17 citation styles or “unknown”.
  • The classifier is based on supervised machine learning. It uses TF-IDF feature representation and a simple Logistic Regression model.
  • For training and testing, I used datasets generated automatically from Crossref metadata.
  • The accuracy of the classifier estimated on the test set is 94.7%.
  • The classifier is open source and can be used as a Python library or REST API.
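The approach in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the actual classifier: it uses scikit-learn, and the toy training strings, the two style labels, the character n-gram settings, and the probability threshold for returning “unknown” are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: two made-up works plus the reference from this post,
# rendered in simplified approximations of two citation styles.
TRAIN = [
    ("Threadgill-Sowder, J. (1983). Question Placement in Mathematical Word "
     "Problems. School Science and Mathematics, 83(2), 107-111", "apa"),
    ("Doe, A. (2001). A Study of Things. Journal of Examples, 12(3), 45-67",
     "apa"),
    ('J. Threadgill-Sowder, "Question Placement in Mathematical Word '
     'Problems," School Science and Mathematics, vol. 83, no. 2, '
     'pp. 107-111, 1983', "ieee"),
    ('A. Doe, "A Study of Things," Journal of Examples, vol. 12, no. 3, '
     'pp. 45-67, 2001', "ieee"),
]


def build_classifier():
    """Fit a TF-IDF + Logistic Regression pipeline on the toy data."""
    texts, labels = zip(*TRAIN)
    model = make_pipeline(
        # Assumption: character n-grams, so punctuation patterns such as
        # "(1983)." or "vol. 83," become features.
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False),
        LogisticRegression(max_iter=1000),
    )
    model.fit(list(texts), list(labels))
    return model


def classify(model, reference, threshold=0.6):
    """Return the most probable style, or "unknown" below the threshold.

    The threshold rule is an assumption for this sketch; the post does not
    say how the real classifier decides on "unknown".
    """
    probs = model.predict_proba([reference])[0]
    best = probs.argmax()
    if probs[best] < threshold:
        return "unknown"
    return str(model.classes_[best])


if __name__ == "__main__":
    clf = build_classifier()
    print(classify(clf, "Roe, B. (1999). Another Title. Some Journal, 5(1), 1-10"))
```

With only four training strings the prediction is not meaningful; the point is the shape of the pipeline, which stays the same when trained on a large automatically generated dataset.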

Introduction

Threadgill-Sowder, J. (1983). Question Placement in Mathematical Word Problems. School Science and Mathematics, 83(2), 107-111

This reference is the end result of a process that typically includes: finding the right document, obtaining its metadata, and formatting the metadata using a specific citation style. Sadly, the intermediate reference forms or the details of this process are not preserved in the end result. In general, just by looking at the reference string we cannot be sure which document it originates from, what its metadata is, or which citation style was used.
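To make the last step of that process concrete, here is a small sketch of formatting one metadata record in two different styles. The two formatter functions are hypothetical, simplified approximations of APA-like and IEEE-like output, and the dictionary keys are invented for this example. Only the information visible in the reference above is used.

```python
# Metadata for the reference shown above. The field names are hypothetical,
# and only the author's initial is known from the reference string.
METADATA = {
    "family": "Threadgill-Sowder",
    "given": "J",
    "year": 1983,
    "title": "Question Placement in Mathematical Word Problems",
    "journal": "School Science and Mathematics",
    "volume": 83,
    "issue": 2,
    "pages": "107-111",
}


def format_apa_like(meta):
    """Simplified APA-like rendering: Family, G. (Year). Title. Journal, V(I), pages"""
    author = f"{meta['family']}, {meta['given'][0]}."
    return (
        f"{author} ({meta['year']}). {meta['title']}. "
        f"{meta['journal']}, {meta['volume']}({meta['issue']}), {meta['pages']}"
    )


def format_ieee_like(meta):
    """Simplified IEEE-like rendering: G. Family, "Title," Journal, vol., no., pp., year"""
    author = f"{meta['given'][0]}. {meta['family']}"
    return (
        f'{author}, "{meta["title"]}," {meta["journal"]}, '
        f"vol. {meta['volume']}, no. {meta['issue']}, "
        f"pp. {meta['pages']}, {meta['year']}"
    )


def labelled_examples(meta):
    """Pair each formatted string with the name of the style that produced it."""
    return [
        (format_apa_like(meta), "apa"),
        (format_ieee_like(meta), "ieee"),
    ]


if __name__ == "__main__":
    for text, style in labelled_examples(METADATA):
        print(f"{style}: {text}")
```

Both strings carry the same metadata, but neither records which style produced it. Keeping the style name next to each generated string is also one plausible way to build labelled data from metadata records.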

Accidental release of internal passwords & API tokens for the Crossref system

TL;DR

On Wednesday, October 2nd, 2019 we discovered that we had accidentally pushed the main Crossref system as part of a docker image into a developer’s account on Docker Hub. The binaries and configuration files that made up the docker image included embedded passwords and API tokens that could have been used to compromise our systems and infrastructure. When we discovered this, we immediately secured the repo, changed all the passwords and secrets, and redeployed the system code. We have since been scanning all of our logs and systems to see if there has been any unusual activity that could be related to the exposure of the container.

Request for feedback: Conference ID implementation

We’ve all been subject to floods of conference invitations, and it can be difficult to sort the relevant from the irrelevant or (even worse) sketchy conferences competing for our attention. In 2017, DataCite and Crossref started a working group to investigate creating identifiers for conferences and projects. Identifiers describe and disambiguate, and applying identifiers to conference events will help build clear, durable connections between scholarly events and scholarly literature.

Chaired by Aliaksandr Birukou, the Executive Editor for Computer Science at Springer Nature, the group has met regularly over the past two years, collaborating to create use cases and define metadata to identify and describe conference series and events. We first asked for input on metadata specifications in April 2018. Technical implementation kicked off in February with a workshop at CERN to discuss the mechanics of making PIDs for conferences a reality.

Speaking, Traveling, Listening, Learning

2019 has been busy for the Community Outreach Team; our small sub-team travels far and wide, talking to members around the world to learn how we can better support the work they do. We run one-day LIVE local events alongside multi-language webinars, with the addition of a new Community Forum, to better support and communicate with our global membership.

This year we held a publisher workshop in London in February, in collaboration with the British Library, to talk about all things metadata and Open Access, before heading over to speak to members in Kyiv in March at the National Technical University of Ukraine. June saw our first ever non-English LIVE local event in Bogota, held in collaboration with Biteca. In an action-packed week in July, Rachael Lammey and I jetted across to Kuala Lumpur and Bangkok, where we collaborated with the Malaysian Ministry of Education, USIM, Chulalongkorn University, iGroup, and ORCID to run two events for our South-East Asian members.

2019 election slate

2019 Board Election

The annual board election is a very important event for Crossref and its members. The board of directors, comprising 16 member organizations, governs Crossref, sets its strategic direction and makes sure that we fulfill our mission. Our members elect the board - it’s “one member one vote” - and we like to see as many members as possible voting. We are very pleased to announce the 2019 election slate - we have a great set of candidates and an update to the ByLaws addressing the composition of the slate to ensure that the board continues to be representative of our membership.

Building better metadata with schema releases

This month we have officially released a new version of our input metadata schema. As well as walking through the latest additions, I’ll describe how we’re starting to develop a new, streamlined, and open approach to schema development using GitLab, along with some of the ideas under discussion for the future.

Introducing our new Director of Product

I’m happy to announce that Bryan Vickery has joined Crossref today as our new Director of Product. Bryan has extensive experience developing products and services at publishers such as Taylor & Francis, where he led the creation of the open-access platform Cogent OA. Most recently he was Managing Director of Research Services at T&F, including Wizdom.ai after it was acquired.

We’ll be rocking your world again at PIDapalooza 2020

The official countdown to PIDapalooza 2020 begins here! It’s 163 days to go till our flame-lighting opening ceremony at the fabulous Belem Cultural Center in Lisbon, Portugal. Your friendly neighborhood PIDapalooza Planning Committee—Helena Cousijn (DataCite), Maria Gould (CDL), Stephanie Harley (ORCID), Alice Meadows (ORCID), and I—are already hard at work making sure it’s the best one so far!

LIVE19, the strategy one: have your say

With a smaller group than usual, we’re dedicating this year’s annual meeting to hearing what you value about Crossref. Which initiatives would you put first and/or last? Where would you have us draw the line between mission and ambition? What is “core” for you? How could/should we adapt for the future in order to meet your needs?

Striving for balance

Different people want different things from us. As Aristotle said: “There is only one way to avoid criticism: do nothing, say nothing, and be nothing.” As we prepare for our 20th year of operation, please join this unique meeting to help shape the future of Crossref.

Funders and infrastructure: let’s get building

Human intelligence and curiosity are the lifeblood of the scholarly world, but not many people can afford to pursue research out of their own pocket. We all have bills to pay. Also, compute time, buildings, lab equipment, administration, and giant underground thingumatrons do not come cheap. In 2017, according to statistics from UNESCO, $1.7 trillion was invested globally in Research and Development. A lot of this money comes from the public: 22 cents in every dollar spent on R&D in the USA comes from government funds, for example. Funders really do support a LOT of research.