Yesterday saw an important milestone in the progress of the ORCID researcher and contributor identifier initiative, as the service was launched to the public. You can now register for an ORCID and use the built-in CrossRef search to choose publications that you have written to link to your identifier.
Following on from the launch, the ORCID team hosted a meeting today in Berlin to celebrate and to share news about recent developments in its own work and in that of the broader ORCID stakeholder community.
Howard Ratner gave an overview of the history of ORCID to date to kick off the meeting. The initiative started back in 2009 (we reported on the first stakeholder meeting in this blog post). The first phase of ORCID is aimed squarely at individual researchers and by lunchtime today over 850 people had signed up for an identifier in the system. Future phases will look at options for importing records from other trusted sources and conversations with ORCID representatives at the meeting today confirmed our feeling that the disambiguated data within Names would be a good set for ORCID to work with, especially as the records already contain ISNIs (International Standard Name Identifiers) which are in the same form as ORCIDs. Watch this space for updates on that!
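As an illustration of that shared form: both ISNIs and ORCIDs are, as far as I understand it, 16-character identifiers whose last character is a check character computed with ISO 7064 MOD 11-2. The sketch below shows that check. The function names are made up for this example, and the two sample identifiers are ones commonly used as examples in ORCID documentation, not records from Names.

```python
def check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed(identifier: str) -> bool:
    """Return True if a 16-character identifier carries a valid check character."""
    compact = identifier.replace("-", "").replace(" ", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return check_digit(base) == compact[15]


if __name__ == "__main__":
    for sample in ("0000-0002-1825-0097", "0000-0002-1694-233X", "0000-0002-1825-0098"):
        print(sample, is_well_formed(sample))
```

A check like this only says that an identifier is well formed. It says nothing about whether the identifier has actually been assigned, or to whom.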
A number of systems which had implemented some kind of ORCID integration were demonstrated at the meeting, including ImpactStory, a site which measures the bookmarking, sharing and saving of publications, slideshows and data sets from a range of different websites, including ORCID, SlideShare, Dryad, GitHub and Google Scholar.
After a keynote from JISC’s Josh Brown, a panel session in the afternoon discussed the relationship between international initiatives such as ORCID and ISNI and specialist or national services operating in the same space. Magchiel Bijsterbosch of the SURF foundation in the Netherlands talked about the situation there, where there is a fairly mature author identification system. Magchiel raised a number of challenges faced by national systems but concluded that there would probably still be a role for such systems in the area of disambiguation. There was some consensus in the meeting that ORCID might want to delegate disambiguation to the wider community, particularly to local experts, rather than attempt to take on this role itself for the whole world.
The Names Project was represented at an event in Barcelona today which looked at the role of author identifiers and ways of integrating them into the procedures of institutions, and institutional repositories in particular. A number of different perspectives were presented at the event, including publishers, funders and identifier providers. There are videos of the talks available:
Martin Fenner on ORCID:
Me on the Names Project:
Gerry Lawson on the funder’s perspective:
Some interesting statistics emerged from Gerry Lawson’s talk concerning the number of researchers in Europe. All European governments have to report on the numbers of full-time-equivalent researchers in higher education, business, government and non-profit sectors. These figures are available from the Eurostat site and cover the years 1999 to 2010. The figures for the UK between 2005 and 2010 are fairly static, ranging from a high of 254,009 in 2006 to a low of 235,373 in 2010. Germany has the highest number of researchers, at 327,500, with a noticeable increase in numbers each year. For the EU as a whole, the figure for 2010 is over 1.5 million researchers.
The number of individuals represented by these figures will be higher than the total of FTEs, of course, but at least this gives us an idea of the number of people who may ultimately need to be covered by services like Names (and some confidence in the figure of 20% of UK researchers that we’ve been estimating that Names currently holds). Philip Purnell’s presentation also gave us some figures for the number of researchers who have registered with the ResearcherID service from Thomson Reuters. For the UK, this is currently 14,033.
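To put those UK numbers side by side, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that the 2010 FTE total can stand in for the number of UK researchers, which (as noted above) understates the real headcount. The variable names are invented for the example.

```python
# Figures quoted in the text above.
uk_fte_2006 = 254_009      # highest UK figure, 2005-2010
uk_fte_2010 = 235_373      # lowest UK figure, 2005-2010
researcherid_uk = 14_033   # UK registrations with ResearcherID
names_share = 0.20         # the project's own estimate of its UK coverage


def percentage(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100 * part / whole


# Assumption: FTEs are used as a proxy for people, so these are rough guides only.
decline_pct = percentage(uk_fte_2006 - uk_fte_2010, uk_fte_2006)
names_estimate = names_share * uk_fte_2010
researcherid_pct = percentage(researcherid_uk, uk_fte_2010)

print(f"Drop from 2006 high to 2010 low: {decline_pct:.1f}%")
print(f"20% of the 2010 FTE figure: about {names_estimate:,.0f}")
print(f"ResearcherID registrations as a share of 2010 FTEs: {researcherid_pct:.1f}%")
```

Because the true headcount is higher than the FTE total, both percentages of coverage would come out somewhat lower if calculated against individuals.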
It will be interesting to see how many researchers sign up for an ORCID identifier when the service launches on 15th October. One of the options that will be offered is the ability to transfer information from an existing ResearcherID into an ORCID, which will be useful for those researchers who are already registered in that service. ResearcherID also offers institutions the option of assigning IDs in batches, free of charge, to their researchers. This differs from the ORCID model, which will allow institutions to submit ORCIDs in bulk only if they are ORCID members (at an annual cost of $5,000 for small institutions). Martin Fenner suggested that small institutions might want to encourage their researchers to register themselves, as this process is free of charge.
On the second day of the Digital Author Identifier Summit, the participants spent time divided into separate groups, looking at issues of governance, interoperability and added value. I was in the Interoperability group which was concerned with identifying barriers to the interchange of digital author identifier information and recommending ‘next steps’ for the international scene.
It was a lively discussion, eventually focusing on the need for a canonical identifier for individuals at the international level. Paolo Bouquet advanced the idea that the canonical ID should be a light-weight service with a minimal set of metadata which would be sufficient to distinguish one entity from another. The first step is to identify who should provide this thin layer: both ORCID and ISNI were seen as candidate services, but ideally they should co-operate in this area. Once the ‘thin’ identifier layer is agreed upon, other identifier services would be able to map information found in their systems to the canonical ID. These lower-level systems would be able to provide various value-added services, tailored for their particular constituencies, and would have to agree standard ways of sharing data between them. (For an example, see the Names Project’s API documentation.)
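The ‘thin layer’ idea can be pictured as a very small data structure: a canonical record with just enough metadata to tell people apart, plus a table mapping each service’s local identifiers onto it. The sketch below is only one way of modelling it. The class names, field names, service names and identifiers are all hypothetical and do not describe how ORCID, ISNI or Names actually store data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CanonicalRecord:
    """Minimal metadata: just enough to distinguish one entity from another."""
    canonical_id: str
    name: str
    distinguishing_note: str


@dataclass
class ThinRegistry:
    """Holds canonical records and the links from local IDs to them."""
    records: Dict[str, CanonicalRecord] = field(default_factory=dict)
    links: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def register(self, record: CanonicalRecord) -> None:
        self.records[record.canonical_id] = record

    def link(self, service: str, local_id: str, canonical_id: str) -> None:
        """Map a lower-level service's own identifier to a canonical ID."""
        if canonical_id not in self.records:
            raise KeyError(f"unknown canonical ID: {canonical_id}")
        self.links[(service, local_id)] = canonical_id

    def resolve(self, service: str, local_id: str) -> Optional[CanonicalRecord]:
        """Return the canonical record for a local ID, or None if unmapped."""
        return self.records.get(self.links.get((service, local_id)))

    def local_ids_for(self, canonical_id: str) -> List[Tuple[str, str]]:
        """List every (service, local ID) pair mapped to a canonical ID."""
        return sorted(key for key, cid in self.links.items() if cid == canonical_id)


# Hypothetical example data.
registry = ThinRegistry()
registry.register(CanonicalRecord("canon-0001", "Smith, John", "chemist, active 2000-"))
registry.link("national-dai", "NL-12345", "canon-0001")
registry.link("repository-x", "jsmith42", "canon-0001")
print(registry.resolve("repository-x", "jsmith42"))
print(registry.local_ids_for("canon-0001"))
```

Everything richer than this (publication lists, affiliations, biographies) would stay in the lower-level services, which is where the value-added features would be built.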
Paolo demonstrated the sig.ma Semantic Information Mashup as an example of a service which could then aggregate information from other services about an individual (Paolo himself, in this case). Sig.ma illustrates part of what Cliff Lynch was talking about on Day 1, with the ability to create new biography services with data from author identifier systems. Paolo’s vision gained a fair degree of support from the group, although the issue of collaboration between ISNI and ORCID was seen as a possible problem area: the two approaches have very different business models and ways of obtaining information.
The feedback from the Added Value group was that the practical steps for existing systems would be to develop local IDs for authors/contributors and to make those available to other systems. The Governance group agreed that ISNI and ORCID are part of the solution and complementary but were concerned that if they did not agree on a way of collaborating, the landscape would become fragmented. They saw the importance of aligning business models with available funding sources and thought that the data should be open and trustworthy. In the summing-up of the two days, Cliff Lynch noted that both ORCID and ISNI are relatively young services and that there is still time to provide feedback at a high level to help ensure that they evolve in the most useful direction for the communities which need them.
Brian Kelly has pulled together the tweets from the workshop and there are overall summaries of the event on the Knowledge Exchange site and by Talat Chaudhri at the JISC Innovation Support Centre blog. It was an interesting and stimulating two days (it’s not often that I get to talk for two solid days about digital author identifiers!) and I’d like to take this opportunity to thank the organisers of the event for the chance of taking part.
UPDATED 11 April 2012: just to note that the Knowledge Exchange team have now published a report [PDF, 440KB] on the event.
For a meeting held in the grounds of the former Royal Mint near the Tower of London, it was probably appropriate that a lot of the discussion on the first day of the Digital Author Identifier Summit should focus on the financial aspects of building identifier systems for researchers and/or authors. An international group representing digital infrastructure specialists and people involved in building identifier systems is looking at the requirements of researchers, institutions, funders and publishers in this rapidly evolving field. The meeting has been convened by the Knowledge Exchange, a Danish/Dutch/German/British grouping of institutions interested in using technology to improve access to research materials.
It is interesting to see how the discussion has moved on since March 2009, when many of the participants in this meeting met in Amsterdam to begin discussions in this area. Existing systems have matured since then, and back in March 2009 no-one had heard of ORCID.
Several points came up yesterday which I think are worth mentioning here. One was the notion that different users of author identifier systems have different requirements in terms of the quality and completeness of the data in those systems. So a service which covers 80% of researchers might have enough to be useful for a range of other services, even though it is not complete.
Participants were asked to imagine that they had a magic wand and could grant three wishes in relation to DAIs. Common themes quickly appeared: openness of the data was an oft-mentioned priority – the information needs to be freely available in order to build other useful services on top of it. Other popular choices were the importance of having a single identifier for an author at an international level and an agreed way of aligning national identifier services with international ones. It was agreed that the benefits to the individuals being identified should be easily demonstrated to ensure their engagement.
Group discussions in the afternoon focused on the role of DAIs from the viewpoint of suppliers of information, those needing the data and those in charge of working out how the systems should be overseen. One interesting point from the reporting of these groups was the general acceptance that digital author identifier systems are ‘resistant to traditional business models’ (Cliff Lynch) and ideally should be funded as elements of infrastructure. This is mainly because the data held in the systems needs to be freely available for re-use to make the most of having them (and to create the ‘frictionless sharing’ and ‘bridges of trust’ which were mentioned in the meeting), and because no-one is expecting individual researchers or authors to pay a fee in order to register their identifier.
Today the discussion will move on to analysing opportunities and challenges in issues of governance, interoperability and added value and maybe come up with some actions for members of this international group to take on.
It’s been an interesting week of visits to institutional repositories around Scotland and England. One thing that has become very clear is that no two repositories are doing things in exactly the same way: even those with very similar software set-ups.
Some institutions have a great deal of control over the names in their repository (the University of Warwick is a good example of this), while others have control over the names of researchers from their own institution (often through a manual or automatic check against the human resources database), but no standardisation for co-authors who are outside the university. From conversations this week, it seems that one of the most useful things the Names Project will be able to do in the near future is to take those lists of external researchers (along with the title of their article or conference paper) and provide the repository with an identifier for them, matched from the Zetoc data that we’re using to disambiguate people.
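To make that idea concrete, here is a rough sketch of how a repository’s list of external co-authors (each with a paper title) might be looked up against already-disambiguated records. It is not the Names matching process: the record layout, identifiers and function names are invented for the example, and the matching is deliberately naive (surname plus first initial, plus an exact match on the normalised title).

```python
import re
import unicodedata


def normalise(text: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", unaccented.lower()).split())


def name_key(name: str) -> str:
    """Reduce 'Surname, Forenames' to 'surname f' (surname plus first initial)."""
    surname, _, forenames = name.partition(",")
    initial = normalise(forenames)[:1]
    return f"{normalise(surname)} {initial}".strip()


def build_index(disambiguated):
    """Map (name key, normalised title) to the identifier already assigned."""
    index = {}
    for entry in disambiguated:
        for title in entry["titles"]:
            index[(name_key(entry["name"]), normalise(title))] = entry["id"]
    return index


def match(index, name, title):
    """Return an identifier for this author/title pair, or None if not found."""
    return index.get((name_key(name), normalise(title)))


# Hypothetical disambiguated records, standing in for data derived from Zetoc.
records = [
    {"id": "names:0001", "name": "Smith, John A.", "titles": ["Repositories and name authority"]},
    {"id": "names:0002", "name": "Smith, Jane", "titles": ["Protein folding at scale"]},
]
index = build_index(records)
print(match(index, "Smith, J.", "Repositories and Name Authority."))
print(match(index, "Smith, J.", "Protein folding at scale"))
print(match(index, "Smith, J.", "An unrelated paper"))
```

The title does the real work here: two people who share a surname and initial are still told apart as long as they did not write the same paper. When they did, something more than name and title is needed.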
We don’t have affiliation data in the Names pilot system yet – changing this proves to be a bit of a stumbling block with institutions, who are reluctant to give us information about their researchers (the Data Protector spectre looms large when I raise this topic*). The RAE 2008 data is available on the web now, giving access to information about researchers and institutions, so the project will be using this to supply some of the affiliation data for those researchers covered by that Research Assessment Exercise.
We’ll also be doing a full Zetoc extract and disambiguation in the next month or two, which means that there will be many more names in the pilot system than we’ve got at present. It’s always a bit embarrassing when I’m demonstrating the system and a university’s prize researcher isn’t mentioned anywhere.
A few institutions mentioned metasearch implementations as a future source of name problems: some systems will be searching across existing institutional databases which may have treated names in different ways. Others will search across multiple institutional repositories which may record the names of contributors according to an internal standard, which may be different from the way that other institutions describe the same item, resulting in multiple entries for a single journal article across repositories.
I’m picking up some good examples of specific name-related problems on my travels, too. At the University of Birmingham, for example, there are two professors with the same first name, surname and middle initial, while at De Montfort University there are a husband and wife with the same initial and surname who occasionally collaborate on the same paper. I am sure there are many similar tales at other institutions.
Thanks very much to everyone who has taken the time to explain their system to me – it’s been really interesting and great to see how things work (or don’t work) in relation to names in repositories. I’ll be at Internet Librarian International on Thursday and Friday – so if I haven’t been able to get to you yet and you’re there and would like to talk about the names in your repository, let me know.
*I visualise this as something like the Ghost of Christmas Yet to Come.
The Names Project ran a survey of UK institutional repository managers in July, asking about their experiences of name-related issues in relation to repositories. We had a good response, with 65 people completing the questionnaire (we’ll be sharing the overall findings soon). The last question in the survey was ‘Would you be willing to discuss your answers further or to be a case study for the Names project?’. Over half of the respondents said ‘yes’ to this, so we’re now following up on that promise. I’m spending the next two weeks trying to visit as many of those people as possible.
It’s a great opportunity to see how people are dealing with name-related issues at first hand and to explain in more detail what we’re trying to achieve with the Names project. The tour starts today in Aberdeen and I’ll be visiting ten repositories in eight cities over the next two weeks (a test of the UK rail network, if nothing else). I will also be talking about Names at the Internet Librarian International conference in London on Friday 15th October, so if I haven’t managed to visit your repository, perhaps there will be a chance to catch up then. I am also attending JISC’s The Future of Research? event on 19th October in London and will be demonstrating the Names pilot system at the Mimas stand there.
One of the principal aims of the Names Project is to assess the feasibility of a name authority service for institutional repositories. Such an endeavour could potentially alleviate many of the issues inherent with the management of identities in an IR environment.
In a climate of ‘publish or perish’ the correct attribution of research output to an author identity is of primary importance. However, inconsistencies arising from the variant nature of names hinder such efforts, fostering an element of unreliability with regard to repository output.
Some IR administrators are endeavouring to address this issue by implementing local approaches to name matching. One such repository is Strathprints, the Institutional Repository of the University of Strathclyde. I recently met with staff involved in the administration of this repository in order to gain a greater understanding of the issues involved in name authority control and to determine the viability of such in-house approaches to disambiguation. I also wanted to assess the usefulness of a service like Names as a tool in this regard.
The main points arising from this meeting related to the ineffectiveness of traditional library authority control sources for the management of IR identities, the impracticality of manual disambiguation as a long-term solution to authority control, and the need for a centralised solution to fragmented authority control initiatives.
The Strathprints repository has facilitated access to the research output of the University of Strathclyde since October 2005. It currently runs on EPrints 3 repository software but will be migrating to PURE, an integrated CERIF-based system, in the coming months. This move is part of a wider effort by the University to amalgamate research-related data from various faculties into a single uniform service. The assimilation of previously independent systems (e.g. grant and funding databases) may provide richer contextual data with regard to overall identity management and authority control.
One of the primary motivators for establishing a local approach to identity management related to the increasing number of identities ingested into Strathprints through batch imports from Thomson Reuters. This coupled with the anticipated migration to PURE highlighted the need for a sustainable approach to authority control.
Initial attempts to manage researcher identities concentrated on traditional library approaches to authority control. Library of Congress authority files were checked for established formats of author names. However, low percentages of name matches hindered this process. A manual approach to disambiguation was developed and subsequently implemented by Strathprints administrators. Name matching was based on the following matching criteria: name strings, subject areas, journal titles, and affiliation data. Much of this data was generated externally (either through an analysis of articles associated with the name itself or through Google searches). This proved to be both time consuming and labour intensive. The possibility of using HR data to facilitate automated name matching was investigated but not found to be particularly useful.
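The four matching criteria listed above lend themselves to a simple scoring approach, which hints at how some of the manual effort might be automated. The sketch below is an illustration only, not the procedure used by Strathprints: the weights, field names and sample records are invented, and a real service would need to tune them against checked data.

```python
from difflib import SequenceMatcher

# Illustrative weights only; these are assumptions, not values from Strathprints.
WEIGHTS = {"name": 0.4, "subjects": 0.2, "journals": 0.2, "affiliation": 0.2}


def name_similarity(a: str, b: str) -> float:
    """Similarity of two name strings, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def overlap(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(x: dict, y: dict) -> float:
    """Combine the four criteria into a single score between 0.0 and 1.0."""
    parts = {
        "name": name_similarity(x["name"], y["name"]),
        "subjects": overlap(set(x["subjects"]), set(y["subjects"])),
        "journals": overlap(set(x["journals"]), set(y["journals"])),
        "affiliation": 1.0 if x["affiliation"] and x["affiliation"] == y["affiliation"] else 0.0,
    }
    return sum(WEIGHTS[key] * parts[key] for key in WEIGHTS)


# Hypothetical records: a and b are the same person; c merely shares a name.
a = {"name": "Smith, John", "subjects": ["chemistry"], "journals": ["J Chem"], "affiliation": "Strathclyde"}
b = {"name": "Smith, J.", "subjects": ["chemistry", "materials"], "journals": ["J Chem"], "affiliation": "Strathclyde"}
c = {"name": "Smith, John", "subjects": ["history"], "journals": ["Hist Rev"], "affiliation": "Elsewhere"}

print(match_score(a, b) > match_score(a, c))
```

In this example the abbreviated name with matching context outscores the identical name string with no shared context, which is the kind of judgement a human cataloguer makes when checking articles and Google results by hand.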
Attempts at disambiguating both internal Strathclyde researcher names and external researcher names associated with other institutions proved to be unsustainable in the long term. Such efforts may result in duplication of disambiguation work between institutions, which highlights the potential for a centralised, cooperative service for the management of researcher identities.
A sustainable approach to authority control characterised by automated and manual disambiguation processes may address many of the problems identified by Strathprints administrators. A service such as Names might be a logical option in this regard and we look forward to working with this institutional repository in the near future.
Many thanks to Emma McCulloch, Anu Joseph, and Marlene Lightbody of the University of Strathclyde for taking the time to answer my (many!) questions.
1. to review the current position
2. to come to a shared vision of an international repositories infrastructure or, at least, the infrastructure components that might best be developed internationally
3. to identify the essential components of an international repositories infrastructure
4. to review the approaches to sustainability, scalability and interoperability being taken by these components, bearing in mind the wider research infrastructure
5. to agree ways to resolve any issues identified in (3) above, including areas where practical international collaboration would help
6. to identify critical success factors in achieving the progress identified in (4) above, bearing in mind the current position
7. to consider ways in which the progress might be coordinated and reviewed over time
Yesterday began with a presentation by Norbert Lossau of the DRIVER project which set the scene for the workshop. There are about 100 attendees, who were then invited to attend one of the four breakout groups to draw up action plans which will be presented to various representatives of funding organisations today. Quite a tall order, this, but it certainly concentrates the mind (although I don’t get the impression that the funders are going to be in a position to hand out cash right now…).
The group of most relevance to the Names Project was the one looking at ‘Interoperable Identification Infrastructure’, which is quite a hard phrase to get your tongue around. My impression (mainly gleaned from the lively Twittering that was emanating from the other groups) was that this ended up being quite a small cohort, compared to the other sessions. It was certainly quite tightly focused, thanks in large part to the chair of the group, Andrew Treloar of the Australian National Data Service. Andrew had expressed concern about the connotations of the location we had been allocated, namely the Batavia room, as the Batavia was an infamous Dutch ship that was wrecked off Australia, resulting in mutiny and murder. As it turned out, our discussion was certainly lively, but rarely mutinous.
It was quickly decided to put the initial priority on identifying people, organisations and digital objects. The original draft had also included identification for funding programmes, research projects and collections of digital objects. The action plan for further progress is gradually taking shape and this morning we will be continuing to work on it, so that there is something for the funders to discuss this afternoon. A closing keynote by Cliff Lynch of the Coalition for Networked Information will bring the main event to its conclusion.
I recommend following the Twitter tag #repinf09 for live updates on today’s discussions. Although not from me, I’m afraid. I have great admiration for those who are able to Tweet while listening: it was all I could do to take in everything that was being said, without the additional burden of trying to summarise it for the rest of the world. And they say it’s men that can’t multi-task…