Thursday, February 14, 2008

Semantic Extraction - Calais Web Service

According to the FAQ for Calais:
The Calais initiative seeks to help make all the world's content more accessible, interoperable and valuable via the automated generation of rich semantic metadata, the incorporation of user-defined metadata, the transportation of those metadata resources throughout the content ecosystem and the extension of its capabilities by user-contributed components.
The Calais initiative has three major components:
  • The Calais Web Service is the core. The Web service provides for the automated generation of rich semantic metadata in RDF format.
  • A series of sample applications to demonstrate how the web service can be utilized and serve as a starting point for others' development activities.
  • Active support for developers that want to incorporate Calais capabilities in their applications and web sites.
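
To give a feel for what consuming RDF metadata looks like, here is a minimal Python sketch that pulls typed entities out of an RDF/XML document using only the standard library. The sample document, the `http://example.org/vocab#` vocabulary, and the `extract_entities` helper are all made up for illustration; they are not Calais's actual output format or API, which this post does not describe.

```python
import xml.etree.ElementTree as ET

# The RDF namespace is standard; the "ex" vocabulary below is hypothetical
# and stands in for whatever vocabulary a real service would use.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/vocab#"

# Hand-written sample, NOT real Calais output.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/vocab#">
  <rdf:Description rdf:about="http://example.org/entity/1">
    <rdf:type rdf:resource="http://example.org/vocab#Company"/>
    <ex:name>Reuters</ex:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/entity/2">
    <rdf:type rdf:resource="http://example.org/vocab#Company"/>
    <ex:name>ClearForest</ex:name>
  </rdf:Description>
</rdf:RDF>
"""


def extract_entities(rdf_xml):
    """Return (type, name) pairs for each rdf:Description in the document."""
    root = ET.fromstring(rdf_xml)
    entities = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        type_el = desc.find(f"{{{RDF_NS}}}type")
        name_el = desc.find(f"{{{EX_NS}}}name")
        if type_el is None or name_el is None:
            continue  # skip descriptions that lack a type or a name
        type_uri = type_el.get(f"{{{RDF_NS}}}resource", "")
        # Keep only the fragment after '#', e.g. "Company"
        entities.append((type_uri.rsplit("#", 1)[-1], name_el.text))
    return entities


if __name__ == "__main__":
    for entity_type, name in extract_entities(SAMPLE_RDF):
        print(entity_type, name)
```

For anything beyond a toy document, a dedicated RDF library would be a better fit than raw XML parsing, since RDF/XML allows several equivalent serializations of the same graph.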
The Calais initiative is sponsored by Reuters and built on ClearForest technology. Their Roadmap indicates the second release is in April:
Calais R2 is a big step forward. In addition to the functionality of R1, R2 will provide users with a persistent GUID allowing anyone with the GUID to call the Calais service and access the original metadata. For example – an RSS reader may have only a snippet of the original article but by using the Calais GUID the reader has the ability to filter, aggregate and present information based on the rich semantic content of the original document. R2.1 and R2.2 will focus on the normalization of extracted entities such as company names and the incorporation of selected industry-standard ontologies.

The second significant feature of R2 is the ability to support user-generated metadata. At the time content is submitted to Calais for processing the user has the ability to attach their own “bottom up” metadata – which will be available to all downstream consumers via the Calais GUID.
