Names Project Blog

Names prototype update

Posted in reports by Amanda Hill on 3 February, 2009

It’s time to share the progress that has been made by the Names project in developing a prototype for a name and factual authority service for use by UK repositories of research outputs. Note that this is just a prototype, not an all-singing and -dancing service that is going to solve anyone’s name-authority issues right now!

The project team have been concentrating on using a subset of data from the Zetoc service to develop a disambiguation algorithm. Zetoc is a huge database of information about journal articles held by the British Library, dating from 1993 to the present. There are millions of author names within Zetoc, together with a fair amount of additional information (subject classification codes, co-authors, journal titles) which has allowed the team to identify individuals and allocate an identifier to them.
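To give a flavour of how this kind of additional information can separate people who share a name, here is a minimal sketch. It is not the project's actual algorithm: the record layout, the name-matching rule and the sample data are all assumptions made for illustration. Records are grouped when their author names are compatible (same surname, one set of initials a prefix of the other) and they share at least one co-author or subject code.

```python
from itertools import combinations


def normalise(name):
    """Split a 'Surname, Initials' string into (surname, initials)."""
    surname, _, initials = name.partition(",")
    initials = initials.replace(".", "").replace(" ", "").upper()
    return surname.strip().lower(), initials


def compatible(name_a, name_b):
    """Assumed rule: same surname, and one set of initials prefixes the other."""
    sur_a, ini_a = normalise(name_a)
    sur_b, ini_b = normalise(name_b)
    if sur_a != sur_b:
        return False
    return ini_a.startswith(ini_b) or ini_b.startswith(ini_a)


def cluster(records):
    """Group record ids that appear to belong to the same individual."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        shared = (a["coauthors"] & b["coauthors"]) or (a["subjects"] & b["subjects"])
        if compatible(a["name"], b["name"]) and shared:
            parent[find(i)] = find(j)

    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec["id"])
    return sorted(sorted(ids) for ids in groups.values())


# Invented sample data: ids, co-authors and subject codes are hypothetical.
records = [
    {"id": "rec-1", "name": "Needham, D.", "coauthors": {"Smith, J."}, "subjects": {"004"}},
    {"id": "rec-2", "name": "Needham, D. A.", "coauthors": {"Smith, J."}, "subjects": {"510"}},
    {"id": "rec-3", "name": "Needham, D.", "coauthors": {"Jones, P."}, "subjects": {"610"}},
    {"id": "rec-4", "name": "Birtwistle, H.", "coauthors": {"Smith, J."}, "subjects": {"004"}},
]
groups = cluster(records)
```

In this toy data the first two records end up in the same group because the names are compatible and they share a co-author, while the third stays separate despite having the same name. A real algorithm would need to weigh the evidence far more carefully than this all-or-nothing rule does.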

Individuals whose names are represented in Zetoc records are people who are active in research, so there will be a high overlap with names of UK researchers who might be submitting their papers to repositories of research outputs (the target audience of the Names project). We started off with just the Zetoc data and the surnames Birtwistle and Needham, to test the disambiguation algorithm. At the moment the resulting records are minimal: just the variants of the name that appear in Zetoc, and the Zetoc record numbers which the algorithm has identified as being related to that individual, as can be seen in this example of one of these records.

Example Names record

If you have access to Zetoc, the link on the record number will take you through to the relevant Zetoc record. In future, additional information from those records (article titles, co-authors etc.) would be presented, rather than just this link. At the end of the record there are options to output the information in MARC, CSV or Names formats (these still need work, but you can see the general idea).
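As a rough illustration of what a CSV rendering of such a minimal record could look like, here is a small sketch. The column names, the `|` separator for repeated values and the sample identifiers are assumptions for the example, not the prototype's actual output format.

```python
import csv
import io


def record_to_csv(record):
    """Serialise one minimal record (identifier, name variants, record numbers)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical header row; the prototype's real columns may differ.
    writer.writerow(["identifier", "name_variants", "zetoc_record_numbers"])
    writer.writerow([
        record["identifier"],
        "|".join(record["name_variants"]),
        "|".join(record["zetoc_records"]),
    ])
    return buffer.getvalue()


# Invented sample record: none of these values come from the real data.
sample = {
    "identifier": "example-1",
    "name_variants": ["Birtwistle, A.", "Birtwistle, A. B."],
    "zetoc_records": ["RN000001", "RN000002"],
}
csv_text = record_to_csv(sample)
```

Because the name variants themselves contain commas, the `csv` module quotes that field automatically, which is one reason to use it rather than joining strings by hand.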

More recently, names from the UK PubMed Central grantees’ database have been added to the data. There weren’t any Birtwistles or Needhams in the UKPMC data, so as yet there is no overlap between the two sets of names, but that will be remedied as more data is added from Zetoc over the next few weeks.

Dan Needham (the project’s developer) has also been working on a test script that makes use of the Names API to the prototype’s data. You can see the script in action at this web interface. Again, it’s quite simple at the moment: only the names are displayed as you start to type. In future, institutional affiliations and fields of activity could be added to the names to help narrow down the identities to particular individuals. We’ll be testing this script with colleagues working on developing repositories at Cranfield University over the next month or two.
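The "names appear as you start to type" behaviour can be sketched as a simple prefix lookup. This example does not call the Names API, whose details are not described here; the `NameIndex` class, its `suggest` method and the sample names are hypothetical and only show the general idea against a local list.

```python
from bisect import bisect_left


class NameIndex:
    """Case-insensitive prefix lookup over a fixed list of names."""

    def __init__(self, names):
        # Keep (lowercased key, original name) pairs in sorted order.
        self._entries = sorted((name.lower(), name) for name in set(names))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=10):
        """Return up to `limit` names starting with `prefix`."""
        key = prefix.strip().lower()
        if not key:
            return []
        # Binary search for the first key that is >= the typed prefix.
        start = bisect_left(self._keys, key)
        matches = []
        for entry_key, original in self._entries[start:]:
            if not entry_key.startswith(key) or len(matches) >= limit:
                break
            matches.append(original)
        return matches


# Invented sample names for illustration only.
index = NameIndex(["Needham, D.", "Birtwistle, H.", "Needham, R.", "Neal, A."])
suggestions = index.suggest("need")
```

Adding affiliations or fields of activity, as suggested above, would mean storing a richer entry alongside each name so that the interface can show enough context to tell two similar names apart.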

As I said, please bear in mind that this is still just a prototype and under active development (and results may be unexpected!). Comments welcome…


One Response

  1. OCLC Identities « Museum Pipes said, on 7 March, 2009 at 7:39 pm

    […] works Might be even better when the Names Project gives us our “all-singing and -dancing service that is going to solve anyone’s name-authority issues.” OCLC […]

