SAPIENT – Semantic Annotation of Papers: Interface & ENrichment Tool
Tuesday, December 1st, 2009
Aberystwyth University in the UK has just released SAPIENT as part of the ART project.
SAPIENT stands for “Semantic Annotation of Papers: Interface & ENrichment Tool”. It is a web-based annotation interface that helps users annotate scientific papers in XML, sentence by sentence, with a set of concepts called General Scientific Concepts (GSCs, see http://ie-repository.jisc.ac.uk/88/). GSCs are the concepts essential for describing a scientific investigation. SAPIENT can also be used with other annotation schemas to annotate XML papers sentence by sentence. It also incorporates Oscar3 functionality, which automatically annotates chemical named entities.
SAPIENT includes the so-called SAPIENT Sentence Splitter (SSSplit), an XML-aware sentence splitter that preserves XML markup and identifies sentences by adding in-line markup. We developed our own sentence splitter because widely available sentence splitters could not handle XML properly. The inline tags of the XML markup carry useful information about document structure and formatting, which is important for determining the logical structure of the paper.
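The splitting step described above can be sketched in Java. This is a minimal illustration under simplifying assumptions, not SSSplit's actual algorithm: the input is assumed to be the inner content of a single paragraph, boundaries are found with one naive rule (`.`, `?` or `!` followed by whitespace and an uppercase letter), and the class name and the `<s id="N">` wrapper element are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Minimal sketch of an XML-aware sentence splitter (hypothetical, not SSSplit).
 * Existing markup is copied through unchanged; each sentence is wrapped in a
 * hypothetical <s id="N"> element added as in-line markup.
 */
public class XmlSentenceSplitterSketch {

    // Tokens are either tags ("<...>") or runs of text between tags.
    private static final Pattern TOKEN = Pattern.compile("<[^>]+>|[^<]+");
    // Naive boundary rule: terminal punctuation, whitespace, then an uppercase letter.
    private static final Pattern BOUNDARY = Pattern.compile("([.?!])\\s+(?=[A-Z])");

    public static String split(String paragraphContent) {
        StringBuilder out = new StringBuilder();
        int sentenceId = 1;
        int depth = 0; // number of inline elements currently open
        out.append("<s id=\"").append(sentenceId).append("\">");
        Matcher tokens = TOKEN.matcher(paragraphContent);
        while (tokens.find()) {
            String tok = tokens.group();
            if (tok.startsWith("<")) {
                // Copy markup through untouched and track nesting depth.
                if (tok.startsWith("</")) {
                    depth--;
                } else if (!tok.endsWith("/>")) {
                    depth++;
                }
                out.append(tok);
            } else if (depth > 0) {
                // Never close a sentence inside an inline element,
                // so the output stays well-formed.
                out.append(tok);
            } else {
                Matcher m = BOUNDARY.matcher(tok);
                int last = 0;
                while (m.find()) {
                    sentenceId++;
                    out.append(tok, last, m.end(1))             // text up to the punctuation
                       .append("</s>")
                       .append(tok, m.end(1), m.end())          // original whitespace kept
                       .append("<s id=\"").append(sentenceId).append("\">");
                    last = m.end();
                }
                out.append(tok.substring(last));
            }
        }
        out.append("</s>");
        return out.toString();
    }

    public static void main(String[] args) {
        String p = "We used <it>E. coli</it> strains. Growth was measured daily.";
        System.out.println(split(p));
    }
}
```

Running `main` prints `<s id="1">We used <it>E. coli</it> strains.</s> <s id="2">Growth was measured daily.</s>`. Refusing to split inside an open inline element is one simple way to keep the output well-formed; the trade-off is that a genuine boundary inside, say, an `<it>` span is missed, which a real splitter would need to handle more carefully.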
SSSplit is written in Java (version 1.6), a platform-independent language, and is based on and extends open source Perl code for handling plain text. To make our sentence splitter XML-aware, we translated the Perl regular-expression rules into Java and modified them to be compatible with the SciXML schema.
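The kind of translation described above can be illustrated with a small hedged sketch. The Perl rule in the comment is a hypothetical example of a plain-text splitting rule, not one taken from the code SSSplit was based on, and the XML-tolerant variant is just one possible modification, not the actual SciXML-compatible rule set.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of porting a Perl splitting rule to Java. */
public class PerlRuleTranslationSketch {

    // Plain-text rule, Perl form (hypothetical example):  s/([.?!]) +([A-Z])/$1\n$2/g
    static final Pattern PLAIN = Pattern.compile("([.?!]) +([A-Z])");

    // XML-tolerant variant: allow closing tags after the punctuation and
    // opening tags before the capital letter, so ".</it> <b>R" still matches.
    static final Pattern XML_AWARE =
        Pattern.compile("([.?!](?:</[^>]+>)*)\\s+((?:<[^/>][^>]*>)*[A-Z])");

    static String markBoundaries(Pattern rule, String text) {
        // Java's replaceAll accepts $1, $2 group references, like Perl's s///g.
        return rule.matcher(text).replaceAll("$1\n$2");
    }

    public static void main(String[] args) {
        String xml = "Cells were grown in <it>vitro.</it> <b>Results</b> follow.";
        System.out.println(markBoundaries(PLAIN, xml));
        System.out.println(markBoundaries(XML_AWARE, xml));
    }
}
```

The plain-text rule leaves this input unchanged because the period is followed by `</it>` rather than a space, while the XML-tolerant rule inserts a newline after `</it>`, just before `<b>Results`.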
Via: http://zillman.blogspot.com/2009/11/sapient-semantic-annotation-of-papers.html
