here you have a very good tool for Keyphrase Extraction. It is GNU and
easy to integrate in Lucene.

http://www.paynter.info/academia/Kea.php

best

jose

On 5/8/07, Bill Janssen <[EMAIL PROTECTED]> wrote:
Dawid Weiss wrote:
> You could also try splitting the document into paragraphs and use Carrot2's
> Lingo algorithm (www.carrot2.org) on a paragraph-level to extract clusters.
> Labelling routine in Lingo should extract 'key' phrases; this analysis is
> heavily frequency-based, but... you know, you may want to try it.

Just to make sure I'm following...

So you're suggesting splitting the document into paragraphs, then
treating each paragraph as if it were a Carrot2 search result,
performing the clustering, then looking at the label Lingo chooses for
each cluster, and treating that label as the "key phrase"?

Would DirectDocumentFeedExample be a good starting point?

Bill

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




--
José Ramón Pérez Agüera

Dept. de Ingeniería del Software e Inteligencia Artificial
Despacho 411 tlf. 913947599
Facultad de Informática
Universidad Complutense de Madrid

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to