here you have a very good tool for Keyphrase Extraction. It is GNU and easy to integrate in Lucene.
http://www.paynter.info/academia/Kea.php best jose On 5/8/07, Bill Janssen <[EMAIL PROTECTED]> wrote:
Dawid Weiss wrote: > You could also try splitting the document into paragraphs and use Carrot2's > Lingo algorithm (www.carrot2.org) on a paragraph-level to extract clusters. > Labelling routine in Lingo should extract 'key' phrases; this analysis is > heavily frequency-based, but... you know, you may want to try it. Just to make sure I'm following... So you're suggesting splitting the document into paragraphs, then treating each paragraph as if it were a Carrot2 search result, performing the clustering, then looking at the label Lingo chooses for each cluster, and treating that label as the "key phrase"? Would DirectDocumentFeedExample be a good starting point? Bill --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
-- José Ramón Pérez Agüera Dept. de Ingeniería del Software e Inteligencia Artificial Despacho 411 tlf. 913947599 Facultad de Informática Universidad Complutense de Madrid --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]