Names Project Blog

Text Mining for Scholarly Communications and Repositories Joint Workshop

Posted in conferences by Amanda Hill on 29 October, 2009
Manchester Interdisciplinary Biocentre

Two members of the Names Project team attended the NaCTeM/UKOLN text mining workshop in Manchester on 28–29 October. The event was an opportunity for us to find out how text mining tools have been used within the academic community and to understand their relevance to repositories and publishers, which are important stakeholders for the Names Project.

The Director of the National Centre for Text Mining (NaCTeM), Sophia Ananiadou, gave a good introduction to the event, explaining that text mining provides annotations to unstructured textual materials which allow semantic enrichment of the text, making implicit knowledge within the materials explicit. A range of perspectives on text mining was then represented, from the academic (linguistics, biology, chemistry and social science) to publishers (Elsevier and the Nature Publishing Group) and service providers (Mimas, EDINA and Microsoft Research).

A theme mentioned by Tony Hey of Microsoft was that if tools like text mining are to be taken up widely by the scientific community (and I presume, by extension, the wider academic world), then they need to be as simple to use as the Web 2.0 tools that are being widely used by general web users. This was echoed in two subsequent talks: Rafael Sidi of Elsevier (who got through an eye-boggling 180 slides in 30 minutes!) emphasised the importance of openness in encouraging innovation and Paul Walk of UKOLN gave us the developers’ point of view, pointing out that access to data without unnecessary obstacles was essential to get the developer community to make use of services.

The closing session allowed a panel of six experts to give their views of the future of text mining, particularly in the context of institutional repositories. Areas that were seen as important were: involving end-users in evaluating the effectiveness of text mining tools (comparing results to those that can be obtained using manual methods); improving repository metadata by using automatic classification of full-text materials such as theses and papers; searching across multiple repositories; developing standards for semantically annotating materials and recording the provenance of those annotations; and capturing work-in-progress information generated by researchers that does not get formally published (e.g. laboratory workbooks recording unsuccessful experiments). One issue that (inevitably) generated a lot of discussion was the problem of getting permission to use full-text materials for text mining purposes, given restrictions imposed by copyright laws and by publishers who put limits on annotation of their articles.

Thanks to UKOLN and NaCTeM for organising an interesting event which gave all the attendees plenty to think about and to discuss.

Names Project recruiting

Posted in recruitment by Amanda Hill on 8 October, 2009

The British Library are recruiting an Analyst for the Names Project. The information below is taken from the job details on their recruitment site.

Ref: O&S00184
Location: Boston Spa, Yorkshire
Position Type: Fixed Term
Specialism: Cataloguers
Salary: £22,063 – £23,896

Fixed term appointment for 2 years

Closing date: 18 October 2009

A. Rose by any other name might be “Alex” to her friends, “Dr. Alexandra Rose” to her students, “Dr. Alexandra N. Rose” to her funders and “A.N. Rose, PhD” to her publishers; and she is not the only A. Rose. For the higher education and research communities, identification of researchers and authors is difficult. The Names 2 Project aims to develop innovative and scalable solutions to problems of identification, attribution and affiliation.

We are recruiting an Analyst to help turn this project from a concept to a service. This is a full time, fixed term post, funded for 2 years by JISC (Joint Information Systems Committee). Names 2 is led by Mimas, based at the University of Manchester.

The successful candidate will have excellent communication skills and work effectively to deadlines. Experience of cataloguing at a professional level, using internationally recognised standards, is essential. First-hand knowledge or experience of institutional repositories or authority control will be an advantage. The post holder will work as part of a distributed project team.

For an informal discussion about this role please contact Alan Danskin on 01937 546669.

Looking forwards, looking back

Posted in reports by Amanda Hill on 24 August, 2009

Just a brief note to say that the final report from the first phase of the Names project and the project plan for the second phase are both now available from the project website.

Image of Janus at the Musée des Beaux-Arts de Montréal, from Flickr user quinn.anya

Name authority for dead people

Posted in identifiers, reports by Amanda Hill on 13 July, 2009

A JISC-funded project on the possibilities of using automatically generated metadata in the context of UK higher education has recently been co-ordinated by Intrallect Ltd. The project commissioned a series of reports on different aspects of metadata that might be obtained automatically. These reports are now available on the project’s wiki. They include one on ‘Person Metadata’, which was written by me, based on the experiences we’ve had with the Names Project. The wiki allows for the reports to be annotated with comments, so please chip in if you have any observations.

One area in which I am keen to see progress is the building of a name authority file that would be a shared resource for the cultural heritage sector. This formed one of the recommendations in my report. Perhaps it might seem a bit off-topic, but I do worry that the needs of institutional repositories have somewhat eclipsed the requirements of archives, museums and galleries in this area. I’ve been peripherally involved in some discussions with the Archives Hub team and others about this. The National Archives (TNA) maintains the kernel of an archival national name authority file as part of the UK’s National Register of Archives (NRA), but staff at other institutions cannot easily add to it and (from my perspective, anyway) there seems to be little will at TNA to develop this resource further in ways that would make it more useful for the cultural heritage sector and for the users of electronic resources provided by museums, galleries, archives and other organisations with a more historical view of the world.

As is the case with repositories, people mentioned in archives (or creators and owners of archival and museum materials) may not be represented in library authority files. An archival standard for authority files allows for rich description of individuals, families and organisations but as yet there is no easy way for institutions to share this information or to pool these descriptions together. A set of rules developed within the UK archival community in the 1990s gives guidance on creating an authoritative form of a name, but this has not solved the problem, as this screenshot of name index terms in the Archives Hub illustrates:

Browsing the Archives Hub for Alice Green

A way of associating the different forms of a name with a unique identifier would be more useful than ensuring that Alice Green’s name is always written in exactly the same way. That identifier could then be used to group all records relating to Alice together. The National Register of Archives’ page for Alice Green attempts to do just that, but is not open for additions by anyone outside TNA. The NRA’s identifier for Alice is GB/NNAF/P125310 but the number that retrieves her page within the system is an earlier version of this (GB/NNAF/P11998), which isn’t ideal.
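To make the idea of grouping by identifier concrete, here is a minimal sketch in Python. It is an illustration only, not how the NRA or the Archives Hub actually works: the variant forms of the name, the sample records and the helper names (`normalise`, `group_by_identifier`) are all made up for the example. Only the identifier GB/NNAF/P125310 is taken from the text above.

```python
from collections import defaultdict

# Hypothetical variant forms of one person's name, all pointing at a single
# identifier. Only the identifier string itself comes from the post.
VARIANT_TO_ID = {
    "Green, Alice": "GB/NNAF/P125310",
    "Green, Alice Stopford": "GB/NNAF/P125310",
    "Green, Mrs Alice": "GB/NNAF/P125310",
}


def normalise(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


# Lookup table keyed on the normalised form of each known variant.
LOOKUP = {normalise(variant): ident for variant, ident in VARIANT_TO_ID.items()}


def group_by_identifier(records):
    """Group (name, description) pairs under the identifier for the name.

    Records whose name is not a known variant are collected under None.
    """
    groups = defaultdict(list)
    for name, description in records:
        groups[LOOKUP.get(normalise(name))].append(description)
    return dict(groups)


# Made-up catalogue records that spell the name differently.
records = [
    ("Green, Alice", "Letters"),
    ("green,  alice stopford", "Diaries"),
    ("Smith, John", "Accounts"),
]

grouped = group_by_identifier(records)
```

Here both of the Alice Green records end up under the same identifier, however the name was written, while the unrecognised name is set aside for someone to review. A real service would need far more careful matching than lower-casing and trimming spaces.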

There seems to me to be an opportunity here to build a collaborative service that would be of enormous benefit to those documenting our heritage and those seeking to find out about it. The current information in the National Register of Archives could be the core of this, in a service that is open to other institutions to edit and that is made available to both web users and to other systems. Lukas Koster’s overview of ‘Linked Data for Libraries’ describes the principles and the end result I have in mind for such information. Tim Berners-Lee’s TED talk in February this year is a great introduction to this area, too.

Actually, now I’ve written all that, this sounds a lot like what we’re trying to do for the repository sector with the Names project. It’s just that there isn’t a big overlap between the people currently active in UK research and those that the cultural heritage community cares about…

Institutional identifiers for repositories

Posted in identifiers by Amanda Hill on 19 June, 2009

The Names Project is represented on the NISO I2 group that is looking into the requirements for unique identifiers for organisations. A subset of this group is focusing on the needs of digital repositories. It is currently asking repository managers and other interested parties to complete a questionnaire about current and future practice in repositories in relation to uniquely identifying organisations and their constituent parts.

If you are interested in this area, please take the survey.

Tweeting

Posted in Uncategorized by Amanda Hill on 19 June, 2009

Dan Needham is now sharing updates about his work developing the Names prototype on Twitter.

Web Services and Repositories

Posted in conferences by Amanda Hill on 5 June, 2009
Web Services and Repositories Slides

Dan Needham, the developer working on the Names Project, attended the Web Services and Repositories workshop that was organised by the EThOS project and held at the British Library on 2nd June.

He gave a presentation [PowerPoint format, 205KB] on the project and the aims behind the web services for the Names prototype that he’s been working on and recently testing with colleagues from Cranfield University.

UPDATE: the audio from Dan’s presentation and all the other materials from the day are now available on the EThOS site.

OCLC’s Networking Names report

Posted in reports by Amanda Hill on 1 May, 2009
Networking Names report

The report from OCLC’s Networking Names Advisory Group which we blogged about last year is now available. The aim of this publication (taken from the introduction) is to:

[articulate] the problem space that the research community needs to address and identify components of a “Cooperative Identities Hub” that would have the most impact across different target audiences. The group developed use case scenarios that provide the context in which different communities would benefit from aggregating information about persons and organizations, corporate and government bodies, and families, and making it available on a network level. This report summarizes the group’s recommendations on the functions and attributes needed to support the use case scenarios.

And if you haven’t got time to read the report, you can always take a look at the Wordle word-cloud that has been generated from it:
Wordle: Networking Names

Double-barrelled Names

Posted in Uncategorized by Amanda Hill on 30 April, 2009

Just a brief update to say that the Names project is entering a second phase, thanks to continuing funding from the JISC. In this next period we will be further developing the prototype name authority system into a pilot. This continuation will extend the project for a further two years, building the prototype into a form that will be useful for repository services, and working with new sources of information to improve the quality of the data within the system.

Managing identities

Posted in conferences by Amanda Hill on 25 March, 2009

Edinburgh Castle at dusk

The JISC conference took place in Edinburgh this year and has been widely held to be a great success. The parallel sessions included a fairly interactive one on identities, called ‘As You Like Identity’, which has been comprehensively blogged about by Rach Colling and also by James Farnhill. (And yes, I am the Amanda mentioned in Rach’s post.) Conclusions from this group included the need to help people to be aware of the significance of their online presence (for example in relation to potential employers checking people out) and the difficulties of correcting information that is false or presented out of context.

In the afternoon I attended the session on e-theses, which was chaired by Owen Stephens and also thoroughly blogged by him (which is quite an impressive feat). Author identities were only touched upon in passing here, but the Entry to EThOS (E2E) project at King’s College is using student record systems to populate name (and other) metadata associated with electronic theses, which sounded interesting. The overlap between the people involved in the creation of theses and those who are producing research outputs is clearly high, meaning that there will be good reasons in the near future for the Names Project to work together with those involved in managing e-theses and digitising the paper versions.