Names at the coal face
It’s been an interesting week of visits to institutional repositories around Scotland and England. One thing that has become very clear is that no two repositories do things in exactly the same way, even those with very similar software set-ups.
Some institutions have a great deal of control over the names in their repository (the University of Warwick is a good example of this), while others have control over the names of researchers from their own institution (often through a manual or automatic check against the human resources database), but no standardisation for co-authors who are outside the university. From conversations this week, it seems that one of the most useful things the Names Project will be able to do in the near future is to take those lists of external researchers (along with the title of their article or conference paper) and provide the repository with an identifier for them, matched from the Zetoc data that we’re using to disambiguate people.
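To make that matching idea concrete, here is a minimal sketch of looking up an external co-author by name plus article title. It is an illustration only, not the Names Project's actual disambiguation method: the `IndexRecord` fields, the `example-…` identifiers and the sample records are all invented for the example, and a real system would need fuzzier comparison than the exact match on normalised strings used here.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class IndexRecord:
    """One row of a hypothetical name index built from article metadata."""
    identifier: str  # invented identifier format, for illustration only
    surname: str
    initials: str
    title: str


def normalise(text: str) -> str:
    """Lower-case, replace anything that is not a-z or 0-9 with a space,
    and collapse runs of whitespace. Accented letters are dropped too,
    which is crude but is applied consistently to both sides of a comparison."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def name_key(surname: str, initials: str) -> tuple:
    """Comparable key for a name: 'Smith', 'J. A.' -> ('smith', 'ja')."""
    return (normalise(surname), normalise(initials).replace(" ", ""))


def match_identifier(surname: str, initials: str, title: str,
                     index: Iterable[IndexRecord]) -> Optional[str]:
    """Return an identifier only when name AND title point to exactly one person.
    Anything ambiguous or unmatched returns None so a human can check it."""
    wanted_name = name_key(surname, initials)
    wanted_title = normalise(title)
    hits = {
        rec.identifier
        for rec in index
        if name_key(rec.surname, rec.initials) == wanted_name
        and normalise(rec.title) == wanted_title
    }
    return hits.pop() if len(hits) == 1 else None


# Made-up sample data: two different people who share a surname and initials.
INDEX = [
    IndexRecord("example-0001", "Smith", "J.A.", "Metadata quality in repositories"),
    IndexRecord("example-0002", "Smith", "J.A.", "Soil erosion in upland catchments"),
]

found = match_identifier("Smith", "J. A.", "Metadata Quality in Repositories.", INDEX)
missing = match_identifier("Smith", "J.A.", "A paper that is not in the index", INDEX)
```

The name alone cannot separate the two sample records; it is the article title supplied by the repository that does the work, which is why the list of external researchers needs to come with titles attached.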
We don’t have affiliation data in the Names pilot system yet – changing this is proving to be a bit of a stumbling block, as institutions are reluctant to give us information about their researchers (the Data Protector spectre looms large when I raise this topic*). The RAE 2008 data is now available on the web, giving access to information about researchers and institutions, so the project will use it to supply some of the affiliation data for the researchers covered by that Research Assessment Exercise.
We’ll also be doing a full Zetoc extract and disambiguation in the next month or two, which means that there will be many more names in the pilot system than we’ve got at present. It’s always a bit embarrassing when I’m demonstrating the system and a university’s prize researcher isn’t mentioned anywhere.
A few institutions mentioned metasearch implementations as a future source of name problems. Some systems will search across existing institutional databases that may have treated names in different ways. Others will search across multiple institutional repositories, each of which may record contributors’ names according to its own internal standard; where those standards differ, the same journal article can show up as multiple entries across repositories.
I’m picking up some good examples of specific name-related problems on my travels, too. At the University of Birmingham, for example, there are two professors with the same first name, surname and middle initial, while at De Montfort University there are a husband and wife with the same initial and surname who occasionally collaborate on the same paper. I am sure there are many similar tales at other institutions.
Thanks very much to everyone who has taken the time to explain their system to me – it’s been really interesting and great to see how things work (or don’t work) in relation to names in repositories. I’ll be at Internet Librarian International on Thursday and Friday – so if I haven’t been able to get to you yet and you’re there and would like to talk about the names in your repository, let me know.
*I visualise this as something like the Ghost of Christmas Yet to Come.