In the past few weeks the Names team have been working with colleagues at the London School of Economics to uniquely identify individuals who have been involved in research at their institution. As with our previous work with the University of Huddersfield, this involved analysing the contents of LSE’s institutional repository, LSE Research Online.
By processing the RDF data that the repository’s EPrints software provides automatically, we were able to compare its contents against the information Names already held about LSE authors. Where individuals had already been identified from the Merit 2008 Research Assessment Exercise data, the repository information usually provided additional details to augment the Names records, including first names and the titles of other papers that individuals had worked on. For individuals who were not already in Names, we created new records and assigned identifiers to them.
The Names disambiguation algorithm does a good job of automatically matching information from repository data with existing Names records. It is configured to err on the side of caution, however, so that it avoids making false connections between individuals who have similar names but are not the same person. This creates some extra work for the quality assurance process (which is undertaken by our colleagues at the British Library), as the algorithm generates a list of potential matches which have to be checked manually. This is worth doing, as it ensures that the resulting data is more reliable than it would be with just an automated check. The more data is added to Names, the smoother the matching process becomes, as there is more information in the system to compare against each new source of data.
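To illustrate the general idea of cautious matching, here is a minimal sketch in Python. It is not the Names algorithm itself, whose internals are not described in this post. The thresholds, the function names and the sample author names are all made up for illustration, and plain string similarity from the standard library stands in for whatever richer evidence (paper titles, affiliations and so on) a real system would compare.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds, chosen for illustration only.
AUTO_MATCH = 0.95   # at or above this score, accept the match automatically
REVIEW = 0.75       # between REVIEW and AUTO_MATCH, send to manual checking


def normalise(name):
    """Lower-case a name and strip commas, full stops and extra spaces."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a similarity score between 0 and 1 for two name strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def classify(incoming, existing):
    """Compare an incoming name with existing records (id -> name).

    Returns a (decision, record_id) pair, where decision is one of
    "match", "review" or "new". Only near-certain matches are accepted
    automatically; anything merely plausible is left for a person to check.
    """
    best_id, best_score = None, 0.0
    for record_id, name in existing.items():
        score = similarity(incoming, name)
        if score > best_score:
            best_id, best_score = record_id, score
    if best_score >= AUTO_MATCH:
        return "match", best_id
    if best_score >= REVIEW:
        return "review", best_id
    return "new", None


if __name__ == "__main__":
    # Made-up records and names, not taken from Names or LSE Research Online.
    records = {1: "Smith, John", 2: "Jones, Mary"}
    for candidate in ["Smith, John", "Smith, J.", "Patel, Anita"]:
        print(candidate, classify(candidate, records))
```

Setting the automatic threshold high is what produces the list of potential matches for manual checking: an initial-only name such as the second sample is similar to an existing record but not similar enough to be linked without a person confirming it.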
In the record below, the original Merit record has been enhanced with information about the individual’s first name and with the identifier from the LSE repository. Already this person has four separate identifiers assigned to him: the local LSE one, the national Merit one derived from the 2008 Research Assessment Exercise, the Names identifier (15711) and the international ISNI identifier. We’re also currently investigating the best way of linking this data up with the other big international initiative, ORCID.
Colleagues at LSE plan to add the Names identifiers to their local name authority file for use within the institution. I’d like to note here that working in collaboration with the LSE staff helped to improve data both in Names and at the repository. The experience has also helped us to speed up and fine-tune the quality assurance process at the Names end.
In total there are now 1,005 individuals identified in Names who are affiliated with LSE. Of these, 463 were new identities created from information in LSE Research Online and 413 were existing Names records which have been improved with additional information from the LSE repository.