Names Project Blog

Names source code available

Posted in publications, reports by Amanda Hill on 19 December, 2013

The source code for the Names project’s disambiguation service and user front end is now available from the Bitbucket code-sharing service. The various components are:

Project Documentation

Disambiguation package

Database manager package

Names user interface application

Example data handler

Example data handler two

Matching service

The user interface to the Names data has been updated to use the code available through Bitbucket. Here’s an example of a Names record in the new web view of the data:

Names record in new interface

The Names project came to an end in July 2013 – the final report [PDF, 525KB] is available here. The conclusions of the report were:

  1. The Names project has demonstrated that automated or semi-automated solutions can be applied to bulk-process complex authority control tasks traditionally undertaken by cataloguers on an item by item basis. This approach offers the potential to extend authority control to types of resource, such as journal articles, which have previously been neglected on grounds of cost.
  2. The quality of the outcome is directly affected by the range and quality of the metadata available. Publishing conventions, such as use of initials rather than full names, hinder accurate identification and comprehensive disambiguation of individuals. Human intervention is still necessary, but filtering enables the human intervention to be focused on ambiguous and anomalous identities.
  3. Retrospective author disambiguation is complex and costly, even when partially automated and should be regarded as the solution to a legacy problem rather than the preferred way forward. The Names database and the components of the Names system are resources which can be used by other services to improve their own efficiency.
  4. Integration between national systems such as Names and international services like ISNI is possible, with the national system offering the opportunity of liaising with institutions to feed data into the international level and with the potential for saving the research community the fees for institutional membership for ORCID and registration agency costs for ISNI. Further investment in Names would be required to establish an automatic updating mechanism between the Names system and ISNI and/or ORCID.
  5. The major achievements of the Names project have been the development of the disambiguation algorithm and the quality assurance process for the resulting data. These have enabled the creation of a useful set of information in the Names database which offers free and flexible access to its contents. By making the database structure, the data, and the disambiguation algorithm available through a code-hosting service, it will be possible for other services to make use of these elements in the future. It should be noted that the quality assurance expertise provided by the Names project team is not something that can be made available externally.

As we wind up the project, I would like to acknowledge the huge amount of work that Dan Needham at Mimas has put into developing this code and into sharing it so that others can benefit from his expertise in this area. Also, many thanks to our colleagues in the British Library: Alan Danskin, Stephen Andrews, Michael Docherty, Alison Wood, Richard Moore, Susan Skaife, Jasper Jackson and Andrew MacEwan whose time and efforts contributed to the success of Names, particularly in the development of a data model and in the quality assurance of data. They also helped to ensure that the results of the project live on in the form of ISNI identifiers for many UK researchers.

UK Researcher ID questionnaire

Posted in Discussions, identifiers, reports by Amanda Hill on 24 May, 2012

The Researcher Identifier Task and Finish Group convened by JISC is seeking responses to a questionnaire about the recommendations of the group. The text of the questionnaire is available in PDF format, if you want to read the whole thing before starting to answer the questions.

The purpose of this questionnaire is to consult within the UK research community about the feasibility and general acceptability of the task and finish group’s recommendations, of which those relating to ORCID and its implementation are central. The data gathered through this survey, along with that from interviews, will inform a report for JISC that will help prepare the ground for the UK-wide use of a common researcher ID which can be used to uniquely identify anyone involved in research.

Time is short – the deadline for completing the questionnaire is 4th June.

Knowledge Exchange DAI Summit: Day 2

Posted in identifiers, meetings, reports by Amanda Hill on 23 March, 2012

KE-DAI participants

On the second day of the Digital Author Identifier Summit, the participants spent time divided into separate groups, looking at issues of governance, interoperability and added value. I was in the Interoperability group which was concerned with identifying barriers to the interchange of digital author identifier information and recommending ‘next steps’ for the international scene.

It was a lively discussion, eventually focusing on the need for a canonical identifier for individuals at the international level. Paolo Bouquet advanced the idea that the canonical ID should be a light-weight service with a minimal set of metadata which would be sufficient to distinguish one entity from another. The first step is to identify who should provide this thin layer: both ORCID and ISNI were seen as candidate services, but ideally they should co-operate in this area. Once the ‘thin’ identifier layer is agreed upon, other identifier services would be able to map information found in their systems to the canonical ID. These lower-level systems would be able to provide various value-added services, tailored for their particular constituencies, and would have to agree standard ways of sharing data between them. (For an example, see the Names Project’s API documentation.)

Paolo demonstrated the sig.ma Semantic Information Mashup as an example of a service which could then aggregate information from other services about an individual (Paolo himself, in this case). Sig.ma illustrates part of what Cliff Lynch was talking about on Day 1: the ability to create new biography services with data from author identifier systems. Paolo’s vision gained a fair degree of support from the group, although the issue of collaboration between ISNI and ORCID was seen as a possible problem area: the two approaches have very different business models and ways of obtaining information.

Priorities of participants

The feedback from the Added Value group was that the practical steps for existing systems would be to develop local IDs for authors/contributors and to make those available to other systems. The Governance group agreed that ISNI and ORCID are part of the solution and complementary but were concerned that if they did not agree on a way of collaborating, the landscape would become fragmented. They saw the importance of aligning business models with available funding sources and thought that the data should be open and trustworthy. In the summing-up of the two days, Cliff Lynch noted that both ORCID and ISNI are relatively young services and that there is still time to provide feedback at a high level to help ensure that they evolve in the most useful direction for the communities which need them.

Brian Kelly has pulled together the tweets from the workshop and there are overall summaries of the event on the Knowledge Exchange site and by Talat Chaudhri at the JISC Innovation Support Centre blog. It was an interesting and stimulating two days (it’s not often that I get to talk for two solid days about digital author identifiers!) and I’d like to take this opportunity to thank the organisers of the event for the chance to take part.

UPDATED 11 April 2012: just to note that the Knowledge Exchange team have now published a report [PDF, 440KB] on the event.

Recent JISC-sponsored reports on researcher identifiers

Posted in identifiers, reports by Amanda Hill on 9 February, 2012

Late 2011 saw a small flurry of reports commissioned by JISC in the area of researcher identifiers, to support the work of the JISC Researcher Identifier Task and Finish Group. These reports are available from the JISC Information Environment Repository.

They are:

Researcher Identifiers Data sources report [PDF, 669Kb] by Cottage Labs

This report provides an overview of sources of data relevant to the task of creating profiles for academic researchers in the UK.

Researcher Identifiers Technical interoperability report [PDF, 506Kb] by Cottage Labs

This report discusses some of the technical aspects of implementing an identifier and profile system for researchers.

Stakeholder use cases and identifier needs: Report One [PDF, 204Kb] by Clax Limited

This report analyses UK research organisations’ use cases, needs, requirements and roles for an identifier system for researchers.

Stakeholder use cases and identifier needs: Report Two [PDF, 378Kb] by Clax Limited

This report investigates which technical systems would need to interoperate with any identifier infrastructure and examines the question of at what point an individual becomes a ‘researcher’.

Report on National Approaches to Researcher Identification Systems [PDF, 463Kb] by Hillbraith Limited

The remit of the report was to examine the approaches taken in other countries to the creation and maintenance of researcher identifiers.

DIGOIDUNA report on digital identifier infrastructures

Posted in identifiers, reports by Amanda Hill on 12 December, 2011

DIGOIDUNA report PDF, 7MB

A report on identifiers for digital objects and authors has been made available on the website of the EU-funded DIGOIDUNA study team. The project team state that:

The final report of the study is focused on three key objectives:

1. analyzing the fundamental role of identifiers as enablers of value in e-infrastructures and presenting forward looking scenarios as examples of the benefits of a systematic usage of identifiers for digital objects and authors to locate and integrate information from multiple sources;
2. reporting the results of the analysis of the Strengths, Weaknesses, Opportunities and Threats (SWOT) associated with establishing in Europe an open, dynamic and sustainable governance of e-infrastructure using identifiers for digital objects and authors;
3. presenting the main challenges and recommendations which European Commission and other relevant stakeholders should address to develop an open and sustainable e-infrastructure for locators of digital objects and identifiers of authors supporting scientific information access, curation and preservation.

The report provides a good analysis of the requirements for establishment of an infrastructure for digital identifiers and maintains that Europe is in a good position to set up initiatives in this area. Some of the issues identified by the report (and familiar to the Names Project team) include: fragmented current approaches, lack of financial sustainability, lack of consensus and resistance to change.

As the authors point out:

…technology is not the main driver in leading this process. Any identifier solution is always used within cultural, geographical, disciplinary and organizational boundaries through a technical system and the process of reaching an agreement between parties over possibly conflicting purposes and objectives is a process which is played out at the interfaces of these boundaries.

EPrints autocompleter using Names API

Posted in EPrints, reports by Amanda Hill on 27 September, 2011

This summer, JISC funded the Names Project to build a plugin for the EPrints software. In this post, developer Phil Cross describes his work on this.


The EPrints software has been designed to ease the process of creating add-ons and customisations for a repository. We wished to provide an automatic search of the Names API when users type author or editor details into an eprint creation form. We also wanted to be able to present disambiguating information to allow the selection of the correct author and to have the Names-assigned person URI added to the eprint metadata.

EPrints creator interface

We discovered that EPrints already has a built-in autocomplete function that searches over existing repository authors and that there is also an existing creator identifier field that allows the system to identify authors of multiple eprints. We therefore created an augmented version of the existing name autocomplete script that searches the Names API. The search pulls back affiliations, fields of interest and publication details as well as name details and the person URI. When this script is inserted into the code for a specific repository, it overrides the global script, adding the new functionality. Simply removing the script returns the repository to its default behaviour.

Drop-down list with Names metadata

The name details are displayed in a drop-down list together with the Names URI. Moving the cursor down the list opens a box next to each entry that contains the disambiguation information. Selecting a name adds the name details and URI to the form.
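As a rough illustration of the behaviour described above, the following Python sketch turns a list of person records into drop-down entries with an accompanying box of disambiguation details. It is not the plugin's code, which is not reproduced in this post. The record layout, the field names (`name`, `uri`, `affiliations`, `fields_of_interest`, `publications`) and the example URI are all assumptions made for the example; they are not the actual Names API response format.

```python
def build_suggestions(records):
    """Turn person records into autocomplete entries.

    Each entry carries a short label for the drop-down list and a longer
    detail text for the box shown next to the highlighted entry.
    The record keys used here are hypothetical.
    """
    suggestions = []
    for record in records:
        # The list shows the name together with the person URI.
        label = f"{record['name']} ({record['uri']})"

        # The detail box only lists the kinds of information that are present.
        detail_lines = []
        for heading, key in (
            ("Affiliations", "affiliations"),
            ("Fields of interest", "fields_of_interest"),
            ("Publications", "publications"),
        ):
            values = record.get(key, [])
            if values:
                detail_lines.append(f"{heading}: {'; '.join(values)}")

        suggestions.append({
            "label": label,
            "detail": "\n".join(detail_lines),
            # Selecting an entry adds these two values to the form.
            "name": record["name"],
            "uri": record["uri"],
        })
    return suggestions


# Hypothetical record, invented for this example.
example_records = [
    {
        "name": "Smith, J.",
        "uri": "http://example.org/names/12345",
        "affiliations": ["Example University"],
        "fields_of_interest": ["Chemistry"],
        "publications": [],
    }
]

suggestions = build_suggestions(example_records)
print(suggestions[0]["label"])
print(suggestions[0]["detail"])
```

Keeping the label short and moving everything else into the detail text mirrors the split between the drop-down list and the pop-up box described above.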

We were also able to make changes to the context-sensitive help for the author and editor fields. The altered help text contains information about the Names API search and provides a link for authors who wish to add their own details to the Names database.

The Extension Package produced can be used with any EPrints 3.x installation by unzipping the compressed package into the top directory of the chosen repository (a single EPrints installation can run multiple repositories). To disable the Names functionality, the administrator simply needs to delete the three files added. The package and instructions on how to install it are available on the Names site.

EPrints Bazaar

The newest version of EPrints, version 3.3, provides access to a new development called the Bazaar Store. This is an application store for the EPrints platform that enables repository administrators to install EPrints Plugins and Extensions with a single click. We have created a Bazaar Package that is a version of the Names Extension Package and this is now available in the Bazaar store for users of version 3.3.

Names autocompleter in Bazaar Store


Comments on this work are welcome and we are also interested in working with you if you would like to include details of researchers from your institution in the Names system, to help improve the data which is returned from the API and the autocompletion plugin.

Our survey said. . .

Posted in feedback, reports by Amanda Hill on 9 November, 2010

Stakeholder report (PDF, 771 KB)


In July this year the Names project team went on a feedback spree. We asked our Expert Panel members difficult questions about the project and, at the same time, we surveyed UK institutional repository managers with some slightly easier ones. The response rate was about the same for both exercises, with around a third of each group giving us their views.

We’ve combined the results of these two pieces of work into one report [PDF, 771KB], which is now available from the Names project’s website. Thanks to everyone who took part and to those institutional repository managers I visited in October as a follow-up exercise. Particular thanks are due to Barbara McCormack, who worked extremely hard on the survey.

From the executive summary of the report:

Expert Panel

Seven members responded to the request for feedback. There was general approval among these panel members for the approach that has been taken by the Names Project in developing its prototype and sharing the information within it. The members of the panel were unanimous in their support for the principle of assigning unique identifiers to researchers and making as much of the information associated with those identifiers as possible available as open linked data. Continuing co-operation with related initiatives was seen as important, with Names being just one of an international set of similar name-related services. The potential benefits that could be realised in a range of fields with the availability of Names services were mentioned, with the proviso that trials should be set up as soon as possible to demonstrate these.

In response to questions about sustainability, there were a number of suggestions about possible additional services which could be made available for a fee. It was generally agreed that a basic level of service should be made available free of charge, certainly within the UK research community.

Repository Managers

The survey of repository managers demonstrated that over three-quarters of respondents had encountered problems relating to identification of authors, including issues around variant forms of names or materials being wrongly attributed to authors with similar names. Around two-thirds of the sixty-five respondents thought that names-related functions such as deduplication, disambiguation and data clean-up services would be either useful or very useful. A quarter of respondents indicated a willingness (in principle) to pay for one or more of these services.

Initiatives for identifying institutions

Posted in identifiers, reports by Amanda Hill on 6 July, 2010

The NISO Institutional Identifiers (I2) group has just released its mid-term report on the development of a new standard for uniquely identifying organisations. It encompasses all institutions, but particularly those…

…engaged in the selection, purchase, licensing, storage, description, management, and delivery of information (“information supply chain”).

This is a broad grouping, including libraries, publishers, government departments, museums, archives, news services, universities and collaborative groups and consortia. There are existing identifiers for many of these organisations, a fact which is recognised in the report, which analyses some of them (namely the ISNI: International Standard Name Identifier (ISO draft standard 27729), the MARC Code List for Organizations, the ANSI/NISO SAN: Standard Address Number for the publishing industry and the D-U-N-S number for businesses).

The disadvantage of all these identifiers except the ISNI, as expressed in the report, is that they do not support the decentralised creation of a core set of metadata about each organisation. Nor do they allow for the inclusion of an external link to an alternative identifier which provides additional information about the organisation. The report points out that many existing codes “pre-date the Web and are not ‘resolvable’” (they can’t be used to link directly to further information about the institutions online). ISNI will support these things, but its disadvantage, from the perspective of I2, is that its metadata set is limited to the following fields (this list is taken from the ‘ISNI Overview’ link on the ISNI site):

– Name of the Public identity
– Date and place of birth and or death (or registration and dissolution for legal entities)
– Class and Roles as defined by the Registration Agency. Classes define the repertoire (such as Musical, Audio-Visual, Literary,..) and Roles can be Author, Performer, Publisher,..
– Title of [or?] reference to a creation
– A URI (or URL) providing a link to more detailed information about the Public Identity.

This limited set of data is deliberate: ISNI is being designed as an open, light-weight, bridging system that will be used to link a variety of other databases, including proprietary ones, which would not share their internal information. The I2 proposed metadata set is richer, allowing for the expression of relationships between different organisations, such as hierarchies within institutions and records of former names. It does seem likely that the I2 group will recommend the use of the ISNI for the identifiers themselves. ISNI has a broader scope, being aimed at all creators of materials: its identifiers are going to be used for individuals as well as for organisations. We have been talking to ISNI representatives about assigning ISNI identifiers to the people and institutions that have been uniquely identified within the Names system. This is likely to happen in the next few months.
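To make the size of that light-weight record concrete, the ISNI field list quoted above could be sketched as a small Python record type. This is only an illustration: the class name, the attribute names and the example values are assumptions chosen for readability, not part of the ISNI standard or of any ISNI software.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class PublicIdentity:
    """Illustrative record with one attribute per item in the ISNI list."""

    # Name of the public identity.
    name: str
    # Classes (e.g. Musical, Literary) and roles (e.g. Author, Publisher).
    classes: List[str] = field(default_factory=list)
    roles: List[str] = field(default_factory=list)
    # Date and place of birth (or registration, for legal entities).
    origin: Optional[str] = None
    # Date and place of death (or dissolution, for legal entities).
    end: Optional[str] = None
    # Title of, or reference to, a creation.
    creation: Optional[str] = None
    # Link to more detailed information held elsewhere.
    uri: Optional[str] = None

    def populated(self):
        """Return the names of the attributes that currently hold a value."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


# Hypothetical organisation record, invented for this example.
org = PublicIdentity(
    name="Example University",
    uri="http://example.org/organisations/1",
)
print(org.populated())
```

Anything richer than this, such as hierarchies or former names, would live in the system that the `uri` points to rather than in the record itself.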

The Names project is principally interested in uniquely identifying a subset of the I2 group’s target audience: UK academic institutions. Currently these are identified in Names at a high level and with minimal additional information (see, for example, this record for The University of Manchester), but in the future we may want to capture information about lower levels of hierarchy within the institutions: department or faculty details, for example. The data model for Names allows for this additional information to be held, although the system does not do so at the moment.

Below is a very rough-and-ready mapping of the corporate name elements identified in the Names project’s Data Analysis Report of March 2008 [PDF file] (which was based on the Functional Requirements for Authority Data) with those in the I2 group’s draft schema and the above list of ISNI elements. Names in bold are represented in both the I2 list and in the Names one. There is a fair degree of agreement across the two, if not exact mappings in all cases.

Names model

I2 Metadata

ISNI

Corporate body

Name

Y

Place

Date

Usage date (for variant names only)

Y

Other designation

Type

Type

Y

Language

Language

Address

Location/Country/State/City

Field of activity

Y

History

Known by

Assigned identifiers

Institution identifier & variant identifiers

Y

URL

Y

Expanded form of name

Variant name

Acronym/abbreviation

Pseudonym

Alternative language form of name

Other variant name

Hierarchical relationship

Affiliated institution

Sequential relationship

Domain

Contact information

Note

The I2 group are looking for feedback on their interim report. If you have views on the ways in which institutional identifiers might be generated, used and sustained, and would like to influence the development of this NISO standard, this is your chance to get involved.

Looking forwards, looking back

Posted in reports by Amanda Hill on 24 August, 2009

Just a brief note to say that the final report from the first phase of the Names project and the project plan for the second phase are both now available from the project website.

Image of Janus at the Musée des Beaux-Arts de Montréal, from Flickr user quinn.anya

Name authority for dead people

Posted in identifiers, reports by Amanda Hill on 13 July, 2009

A JISC-funded project on the possibilities of using automatically generated metadata in the context of UK higher education has recently been co-ordinated by Intrallect Ltd. The project commissioned a series of reports on different aspects of metadata that might be obtained automatically. These reports are now available on the project’s wiki. They include one on ‘Person Metadata’, which was written by me, based on the experiences we’ve had with the Names Project. The wiki allows for the reports to be annotated with comments, so please chip in if you have any observations.

One area in which I am keen to see progress is the building of a name authority file that would be a shared resource for the cultural heritage sector. This formed one of the recommendations in my report. Perhaps it might seem a bit off-topic, but I do worry that the needs of institutional repositories have somewhat eclipsed the requirements of archives, museums and galleries in this area. I’ve been peripherally involved in some discussions with the Archives Hub team and others about this. The National Archives (TNA) maintains the kernel of an archival national name authority file as part of the UK’s National Register of Archives (NRA), but this is not easily added to by staff at other institutions and (from my perspective, anyway) there seems to be little will at TNA to develop this resource further in ways that would make it more useful for the cultural heritage sector and for the users of electronic resources provided by museums, galleries, archives and other organisations with a more historical view of the world.

As is the case with repositories, people mentioned in archives (or creators and owners of archival and museum materials) may not be represented in library authority files. An archival standard for authority files allows for rich description of individuals, families and organisations but as yet there is no easy way for institutions to share this information or to pool these descriptions together. A set of rules developed within the UK archival community in the 1990s gives guidance on creating an authoritative form of a name, but this has not solved the problem, as this screenshot of name index terms in the Archives Hub illustrates:

Browsing the Archives Hub for Alice Green

A way of associating the different forms of a name with a unique identifier would be more useful than ensuring that Alice Green’s name is always written in exactly the same way. That identifier could then be used to group all records relating to Alice together. The National Register of Archives’ page for Alice Green attempts to do just that, but is not open for additions by anyone outside TNA. The NRA’s identifier for Alice is GB/NNAF/P125310 but the number that retrieves her page within the system is an earlier version of this (GB/NNAF/P11998), which isn’t ideal.
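The idea of grouping records by identifier rather than by the written form of a name can be sketched in a few lines of Python. The identifier is the NRA one quoted above; the variant spellings, the record titles and the function name are invented for illustration and are not taken from the Archives Hub or the NRA.

```python
from collections import defaultdict

ALICE_ID = "GB/NNAF/P125310"

# Hypothetical lookup from each written form of a name to one identifier.
variant_to_id = {
    "Green, Alice": ALICE_ID,
    "Alice Green": ALICE_ID,
    "GREEN, Alice": ALICE_ID,
}

# Hypothetical catalogue records, each using whatever form its cataloguer chose.
records = [
    {"title": "Record A", "name": "Green, Alice"},
    {"title": "Record B", "name": "Alice Green"},
    {"title": "Record C", "name": "Brown, Thomas"},
]


def group_by_identifier(records, variant_to_id):
    """Group record titles under the identifier their name form maps to.

    Records whose name form is not in the lookup are collected under None,
    so that they can be reviewed by a person.
    """
    groups = defaultdict(list)
    for record in records:
        identifier = variant_to_id.get(record["name"])
        groups[identifier].append(record["title"])
    return dict(groups)


grouped = group_by_identifier(records, variant_to_id)
print(grouped[ALICE_ID])
```

With a lookup like this, nobody has to agree on a single authoritative spelling: a new variant only needs one extra entry pointing at the existing identifier.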

There seems to me to be an opportunity here to build a collaborative service that would be of enormous benefit to those documenting our heritage and those seeking to find out about it. The current information in the National Register of Archives could be the core of this, in a service that is open to other institutions to edit and that is made available to both web users and to other systems. Lukas Koster’s overview of ‘Linked Data for Libraries’ describes the principles and the end result I have in mind for such information. Tim Berners-Lee’s TED talk in February this year is a great introduction to this area, too.

Actually, now I’ve written all that, this sounds a lot like what we’re trying to do for the repository sector with the Names project. It’s just that there isn’t a big overlap between the people currently active in UK research and those that the cultural heritage community care about…