6 May, 2011

How vain, without the merit, is the name!1

At the end of October 2010, the Merit project made its cleaned-up version of the 2008 Research Assessment Exercise (RAE) data available through the project’s website. This data set includes the names of the top researchers in the UK (for example, Stephen Hawking, Monica Grady or Brian Cox), together with the titles of the materials that their institutions submitted for assessment. It seemed an ideal set of information for the Names Project to use, because it includes institutional affiliations, which are not easy to track down from the other data sources we have been investigating, such as the Zetoc table of contents data from the British Library.

Names records have now been generated for all of the individuals represented in the Merit data. This creates a core of nearly 47,000 disambiguated names of UK researchers for the project, associated with 158 institutions. As a result of the earlier work of the Merit project, the quality of the data was good. There were occasions where individuals had more than one identifier in the Merit data (when their work had been submitted by more than one institution), but these were successfully identified and merged in the disambiguation process.

Our British Library colleagues’ quality assurance process identified only one case where the system wrongly suggested a match. There were two D. J. Siveters listed in the data, one at the University of Leicester, the other at the University of Oxford, both writing on the subject of palaeontology. A little investigation revealed that these were in fact two distinct individuals: twin brothers (Derek and David) working in the same field who often co-author papers. Perhaps these two form the ultimate test of any disambiguation mechanism?
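To illustrate why a case like this is hard, here is a minimal Python sketch. It is not the Names Project's actual matching process, which is not described in this post. The record fields, the identifiers, the sample people and both functions are invented for illustration. It assumes a naive matcher that keys on surname, initials and subject, followed by a stricter check on full forenames.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str   # hypothetical per-submission identifier
    surname: str
    forenames: str
    institution: str
    subject: str


def naive_key(record):
    """Key on surname, initials and subject only (an assumed, simplistic rule)."""
    initials = "".join(part[0] for part in record.forenames.split())
    return (record.surname.lower(), initials.lower(), record.subject.lower())


def suggest_matches(records):
    """Group records that share a naive key; each group is a suggested match."""
    groups = defaultdict(list)
    for record in records:
        groups[naive_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]


def confirmed_matches(records):
    """Keep only suggestions whose full forenames also agree."""
    confirmed = []
    for group in suggest_matches(records):
        by_forenames = defaultdict(list)
        for record in group:
            by_forenames[record.forenames.lower()].append(record)
        confirmed.extend(g for g in by_forenames.values() if len(g) > 1)
    return confirmed


# Invented sample data: two same-initial twins in one field, plus one person
# submitted by two institutions under two identifiers.
SAMPLE = [
    Record("id-1", "Sample", "Dana J", "University A", "Palaeontology"),
    Record("id-2", "Sample", "Dale J", "University B", "Palaeontology"),
    Record("id-3", "Placeholder", "Robin K", "University A", "Physics"),
    Record("id-4", "Placeholder", "Robin K", "University C", "Physics"),
]

if __name__ == "__main__":
    for group in confirmed_matches(SAMPLE):
        print([record.identifier for record in group])
```

In this sketch the naive key suggests both pairs as matches, while the forename check keeps only the person with two identifiers. Real data is less obliging: full forenames are often missing, which is why human quality assurance of the kind described above remains valuable.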

The project team intend to share the records generated by this process with colleagues working on the ISNI (International Standard Name Identifier), to see whether they can identify matches with records in the ISNI data.

