One of the principal aims of the Names Project is to assess the feasibility of a name authority service for institutional repositories. Such a service could alleviate many of the issues inherent in the management of identities in an IR environment.
In a climate of ‘publish or perish’ the correct attribution of research output to an author identity is of primary importance. However, inconsistencies arising from the variant forms that names can take hinder such efforts and make repository output less reliable.
Some IR administrators are endeavouring to address this issue by implementing local approaches to name matching. One such repository is Strathprints, the Institutional Repository of the University of Strathclyde. I recently met with staff involved in the administration of this repository in order to gain a greater understanding of the issues involved in name authority control and to determine the viability of such in-house approaches to disambiguation. I also wanted to assess the usefulness of a service like Names as a tool in this regard.
The main points arising from this meeting related to the ineffectiveness of traditional library authority control sources for the management of IR identities, the impracticality of manual disambiguation as a long-term solution to authority control, and the need for a centralised solution to fragmented authority control initiatives.
The Strathprints repository has facilitated access to the research output of the University of Strathclyde since October 2005. It currently runs on EPrints 3 repository software but will be migrating to PURE, an integrated CERIF-based system, in the coming months. This move is part of a wider effort by the University to amalgamate research-related data from various faculties into a single uniform service. The assimilation of previously independent systems (e.g. grant and funding databases) may provide richer contextual data for overall identity management and authority control.
One of the primary motivators for establishing a local approach to identity management was the increasing number of identities ingested into Strathprints through batch imports from Thomson Reuters. This, coupled with the anticipated migration to PURE, highlighted the need for a sustainable approach to authority control.
Initial attempts to manage researcher identities concentrated on traditional library approaches to authority control. Library of Congress authority files were checked for established forms of author names. However, the low percentage of name matches hindered this process. A manual approach to disambiguation was then developed and implemented by Strathprints administrators. Name matching was based on four criteria: name strings, subject areas, journal titles, and affiliation data. Much of this data was generated externally (either through an analysis of articles associated with the name itself or through Google searches). This proved to be both time-consuming and labour-intensive. The possibility of using HR data to facilitate automated name matching was investigated but was not found to be particularly useful.
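To make the four matching criteria more concrete, here is a minimal sketch of how they might be combined into a single score. This is purely illustrative and is not the procedure Strathprints staff followed (their matching was done by hand): the `AuthorRecord` structure, the weights, and the thresholds are all hypothetical, and the name comparison uses a generic string-similarity measure from Python's standard library rather than any name-specific algorithm.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class AuthorRecord:
    """Hypothetical record holding the four criteria described above."""
    name: str
    affiliation: str = ""
    subjects: set = field(default_factory=set)
    journals: set = field(default_factory=set)


def normalise(text):
    """Lower-case, drop full stops and commas, and collapse whitespace."""
    return " ".join(text.lower().replace(".", " ").replace(",", " ").split())


def name_similarity(a, b):
    """Generic string similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def jaccard(a, b):
    """Overlap between two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Illustrative weights only; real values would need tuning against known matches.
WEIGHTS = {"name": 0.4, "subjects": 0.2, "journals": 0.2, "affiliation": 0.2}


def match_score(x, y):
    """Weighted combination of the four criteria, in [0, 1]."""
    same_affiliation = bool(x.affiliation) and (
        normalise(x.affiliation) == normalise(y.affiliation)
    )
    return (
        WEIGHTS["name"] * name_similarity(x.name, y.name)
        + WEIGHTS["subjects"] * jaccard(x.subjects, y.subjects)
        + WEIGHTS["journals"] * jaccard(x.journals, y.journals)
        + WEIGHTS["affiliation"] * (1.0 if same_affiliation else 0.0)
    )


def triage(score, accept=0.8, review=0.5):
    """Route a pair: merge automatically, send to a person, or keep apart."""
    if score >= accept:
        return "merge"
    if score >= review:
        return "manual review"
    return "distinct"


if __name__ == "__main__":
    a = AuthorRecord("Smith, J.", "University of Strathclyde",
                     {"information science"}, {"Journal of Documentation"})
    b = AuthorRecord("Smith, John", "University of Strathclyde",
                     {"information science", "metadata"}, {"Journal of Documentation"})
    score = match_score(a, b)
    print(round(score, 2), triage(score))
```

The middle band in `triage` is where a person would still be needed. Only the clear-cut pairs are handled automatically, so any reduction in manual effort depends on how many pairs fall outside that band.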
Attempts at disambiguating both internal Strathclyde researcher names and external researcher names associated with other institutions proved to be unsustainable in the long term. Such efforts are also likely to be duplicated between institutions, which highlights the potential for a centralised, cooperative service for the management of researcher identities.
A sustainable approach to authority control that combines automated and manual disambiguation processes may address many of the problems identified by Strathprints administrators. A service such as Names might be a logical option in this regard and we look forward to working with this institutional repository in the near future.
Many thanks to Emma McCulloch, Anu Joseph, and Marlene Lightbody of the University of Strathclyde for taking the time to answer my (many!) questions.