Names Project Blog

Names source code available

Posted in publications, reports by Amanda Hill on 19 December, 2013

The source code for the Names project’s disambiguation service and user front end is now available from the Bitbucket code-sharing service. The components are:

Project Documentation

Disambiguation package

Database manager package

Names user interface application

Example data handler

Example data handler two

Matching service

The user interface to the Names data has been updated to use the code available through Bitbucket. Here’s an example of a Names record in the new web view of the data:

Names record in new interface

Names record in new interface

The Names project came to an end in July 2013 – the final report [PDF, 525KB] is available here. The conclusions of the report were:

  1. The Names project has demonstrated that automated or semi-automated solutions can be applied to bulk-process complex authority control tasks traditionally undertaken by cataloguers on an item-by-item basis. This approach offers the potential to extend authority control to types of resource, such as journal articles, which have previously been neglected on grounds of cost.
  2. The quality of the outcome is directly affected by the range and quality of the metadata available. Publishing conventions, such as use of initials rather than full names, hinder accurate identification and comprehensive disambiguation of individuals. Human intervention is still necessary, but filtering enables the human intervention to be focused on ambiguous and anomalous identities.
  3. Retrospective author disambiguation is complex and costly, even when partially automated, and should be regarded as the solution to a legacy problem rather than the preferred way forward. The Names database and the components of the Names system are resources which can be used by other services to improve their own efficiency.
  4. Integration between national systems such as Names and international services like ISNI is possible. The national system offers the opportunity to liaise with institutions and feed their data into the international level, with the potential to save the research community the fees for institutional membership of ORCID and the registration agency costs for ISNI. Further investment in Names would be required to establish an automatic updating mechanism between the Names system and ISNI and/or ORCID.
  5. The major achievements of the Names project have been the development of the disambiguation algorithm and the quality assurance process for the resulting data. These have enabled the creation of a useful set of information in the Names database which offers free and flexible access to its contents. By making the database structure, the data, and the disambiguation algorithm available through a code-hosting service, it will be possible for other services to make use of these elements in the future. It should be noted that the quality assurance expertise provided by the Names project team is not something that can be made available externally.

As we wind up the project, I would like to acknowledge the huge amount of work that Dan Needham at Mimas has put into developing this code and into sharing it so that others can benefit from his expertise in this area. Also, many thanks to our colleagues in the British Library: Alan Danskin, Stephen Andrews, Michael Docherty, Alison Wood, Richard Moore, Susan Skaife, Jasper Jackson and Andrew MacEwan whose time and efforts contributed to the success of Names, particularly in the development of a data model and in the quality assurance of data. They also helped to ensure that the results of the project live on in the form of ISNI identifiers for many UK researchers.
