Posted in conferences by Amanda Hill on 29 October, 2009
Manchester Interdisciplinary Biocentre

Two members of the Names Project team attended the NaCTeM/UKOLN text mining workshop in Manchester on 28-29 October. The event was an opportunity for us to find out how text mining tools have been used within the academic community and to understand their relevance to repositories and publishers, both of which are important stakeholders for the Names Project.

The Director of the National Centre for Text Mining (NaCTeM), Sophia Ananiadou, gave a good introduction to the event, explaining that text mining adds annotations to unstructured textual materials, which allows semantic enrichment of the text and makes implicit knowledge within the materials explicit. A range of perspectives on text mining was then represented, from the academic (linguistics, biology, chemistry and social science) to publishers (Elsevier and the Nature Publishing Group) and service providers (Mimas, EDINA and Microsoft Research).

A theme raised by Tony Hey of Microsoft was that if tools like text mining are to be taken up widely by the scientific community (and I presume, by extension, the wider academic world), then they need to be as simple to use as the Web 2.0 tools that general web users have already adopted. This was echoed in two subsequent talks: Rafael Sidi of Elsevier (who got through an eye-boggling 180 slides in 30 minutes!) emphasised the importance of openness in encouraging innovation, and Paul Walk of UKOLN gave us the developers’ point of view, pointing out that access to data without unnecessary obstacles is essential if the developer community is to make use of services.

The closing session allowed a panel of six experts to give their views on the future of text mining, particularly in the context of institutional repositories. Areas that were seen as important were: involving end-users in evaluating the effectiveness of text-mining tools (comparing results to those that can be obtained using manual methods); improving repository metadata by using automatic classification of full-text materials such as theses and papers; searching across multiple repositories; developing standards for semantically annotating materials and recording the provenance of those annotations; and capturing work-in-progress information generated by researchers that does not get formally published (e.g. laboratory workbooks recording unsuccessful experiments). One issue that (inevitably) generated a lot of discussion was the problem of getting permission to use full-text materials for text-mining purposes, given the restrictions imposed by copyright law and by publishers who put limits on annotation of their articles.

Thanks to UKOLN and NaCTeM for organising an interesting event, which gave all the attendees plenty to think about and discuss.



