Dan Needham, the developer working on the Names Project, attended the Web Services and Repositories workshop that was organised by the EThOS project and held at the British Library on 2nd June.
He gave a presentation [PowerPoint format, 205KB] on the project and the aims behind the web services for the Names prototype, which he has been developing and has recently been testing with colleagues from Cranfield University.
UPDATE: the audio from Dan’s presentation and all the other materials from the day are now available on the EThOS site.
It’s time to share the progress that has been made by the Names project in developing a prototype for a name and factual authority service for use by UK repositories of research outputs. Note that this is just a prototype, not an all-singing and -dancing service that is going to solve anyone’s name-authority issues right now!
The project team have been concentrating on using a subset of data from the Zetoc service to develop a disambiguation algorithm. Zetoc is a huge database of information about journal articles held by the British Library, dating from 1993 to the present. There are millions of author names within Zetoc, together with a fair amount of additional information (subject classification codes, co-authors, journal titles) which has allowed the team to identify individuals and allocate an identifier to them.
Individuals whose names are represented in Zetoc records are people who are active in research, so there will be a high overlap with names of UK researchers who might be submitting their papers to repositories of research outputs (the target audience of the Names project). We started off with just the Zetoc data and the surnames Birtwistle and Needham, to test the disambiguation algorithm. At the moment the resulting records are very minimal: just any variants of the name that appear in Zetoc, and the Zetoc record numbers which have been identified by the algorithm as being related to that individual, as can be seen in this example of one of these records.
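To give a feel for what this kind of disambiguation involves, here is a minimal Python sketch. It is not the project's actual algorithm: the record fields, the sample data and the matching rule (same surname plus at least one shared co-author or journal title) are all assumptions made purely for illustration. It groups records into individuals and produces the same sort of minimal output described above: an identifier, the name variants, and the related record numbers.

```python
from itertools import combinations

# Hypothetical records: field names and values are illustrative only,
# not the real Zetoc schema or real Zetoc data.
records = [
    {"id": "Z1", "name": "Needham, D.", "coauthors": {"Smith, J."}, "journals": {"J. Inf. Sci."}},
    {"id": "Z2", "name": "Needham, Dan", "coauthors": {"Smith, J.", "Lee, K."}, "journals": {"Ariadne"}},
    {"id": "Z3", "name": "Needham, D.", "coauthors": {"Brown, A."}, "journals": {"Phys. Rev."}},
]


def surname(name):
    """Return the lower-cased surname from a 'Surname, Forename' string."""
    return name.split(",")[0].strip().lower()


def shares_evidence(a, b):
    """Assumed rule: two records match if they share a co-author or a journal."""
    return bool(a["coauthors"] & b["coauthors"] or a["journals"] & b["journals"])


def disambiguate(records):
    """Group records into individuals using a simple union-find structure."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of same-surname records that share some evidence.
    for a, b in combinations(records, 2):
        if surname(a["name"]) == surname(b["name"]) and shares_evidence(a, b):
            parent[find(a["id"])] = find(b["id"])

    # Collect name variants and record numbers for each group.
    clusters = {}
    for r in records:
        cluster = clusters.setdefault(find(r["id"]), {"variants": set(), "record_ids": []})
        cluster["variants"].add(r["name"])
        cluster["record_ids"].append(r["id"])

    # Allocate a simple sequential identifier to each individual.
    return [dict(identifier=n, **c) for n, c in enumerate(clusters.values(), start=1)]


for person in disambiguate(records):
    print(person["identifier"], sorted(person["variants"]), person["record_ids"])
```

With this made-up data, Z1 and Z2 end up as one individual because they share a co-author, while Z3 stays separate even though the name string is identical to Z1. A real algorithm would need to weigh evidence such as subject classification codes far more carefully than this.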
If you have access to Zetoc, the link on the record number will take you through to the relevant Zetoc record. In future, additional information from those records (article titles, co-authors etc.) would be presented, rather than just this link. At the end of the record there are options to output the information in MARC, CSV or Names formats (these still need work, but you can see the general idea).
More recently, names from the UK PubMed Central grantees’ database have been added to the data. There weren’t any Birtwistles or Needhams in the UKPMC data, so as yet there is no overlap between the two sets of names, but that will be remedied as more data is added from Zetoc over the next few weeks.
Dan Needham (the project’s developer) has also been working on a test script that uses the Names API to query the prototype’s data. You can see the script in action at this web interface. Again, it’s quite simple at the moment: only the names are displayed as you start to type. In future, institutional affiliations and fields of activity could be added to the names to help narrow down the identities to particular individuals. We’ll be testing this script with colleagues working on developing repositories at Cranfield University over the next month or two.
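The "names displayed as you start to type" behaviour is essentially a prefix lookup. The Python sketch below shows one simple way that could work; it does not use or reflect the real Names API, and the function name `suggest`, the `limit` parameter and the sample names are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical name list, sorted case-insensitively. Not real project data.
NAMES = sorted(
    ["Needham, Dan", "Birtwistle, A.", "Needleman, S.", "Needham, D.", "Birtwistle, Harrison"],
    key=str.lower,
)


def suggest(prefix, names=NAMES, limit=10):
    """Return up to `limit` names starting with `prefix`, ignoring case.

    Assumes `names` is already sorted case-insensitively, so a binary
    search can jump straight to the first candidate.
    """
    keys = [n.lower() for n in names]
    wanted = prefix.lower()
    start = bisect_left(keys, wanted)
    matches = []
    for i in range(start, len(names)):
        # Stop once names no longer share the prefix or we have enough.
        if not keys[i].startswith(wanted) or len(matches) >= limit:
            break
        matches.append(names[i])
    return matches


print(suggest("need"))
print(suggest("needh"))
```

Typing more characters narrows the list, which is where extra details such as institutional affiliation would help a user choose between otherwise identical-looking names.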
As I said, please bear in mind that this is still just a prototype and under active development (and results may be unexpected!). Comments welcome…