Hi Grant, I think what is more relevant is what you wrote here: http://www.cnlp.org/apachecon2005/
about domain specialization, but it wasn't very (maybe because there are only 4 slides)

On 3/21/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> You might want to look at the Carrot2 project
> (http://www.carrot2.org/website/xml/index.xml).
>
> It does clustering and has support for Lucene.
>
> Valerio Schiavoni wrote:
> > Hello,
> > I'm not sure if the term 'cluster' is the correct one, but here is what I
> > would like to do:
> > I have a small set of categories, and I manually defined some keywords
> > for each category, e.g.:
> >
> > -spielberg: ET, munich, indiana jones;
> > -sport: football, basket, volley, etc.;
> >
> > Then, I have a quite large archive of documents (html, pdf, doc) (~5000,
> > still growing) and I want to 'assign' each document
> > to those categories, possibly using Lucene (if it can help!).
> >
> > What approach could I adopt?
> >
> > thanks,
> > valerio
> >
> > --
> > To Iterate is Human, to Recurse, Divine
> > James O. Coplien, Bell Labs
> > (how good is to be human indeed)
>
> --
> Grant Ingersoll
> Sr. Software Engineer
> Center for Natural Language Processing
> Syracuse University
> School of Information Studies
> 335 Hinds Hall
> Syracuse, NY 13244
>
> http://www.cnlp.org
> Voice: 315-443-5484
> Fax: 315-443-6886

--
To Iterate is Human, to Recurse, Divine
James O. Coplien, Bell Labs
(how good is to be human indeed)
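
For illustration, the keyword-per-category idea from the original question could be sketched roughly as below. This is plain Python, not Lucene and not Carrot2; the function names (`tokenize`, `score_document`, `assign_categories`) and the `min_score` threshold are assumptions made up for this sketch, and the keyword lists are just the two examples from the thread. With an index such as Lucene, the same idea would roughly amount to running one keyword query per category and tagging the documents that match.

```python
import re

# Category -> keywords, copied from the example in the thread (hypothetical data).
CATEGORIES = {
    "spielberg": ["ET", "munich", "indiana jones"],
    "sport": ["football", "basket", "volley"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_phrase(tokens, phrase_tokens):
    """Count occurrences of a (possibly multi-word) keyword in a token list."""
    n = len(phrase_tokens)
    if n == 0:
        return 0
    return sum(
        1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens
    )


def score_document(text, categories=CATEGORIES):
    """Return {category: number of keyword hits} for one document's text."""
    tokens = tokenize(text)
    return {
        name: sum(count_phrase(tokens, tokenize(kw)) for kw in keywords)
        for name, keywords in categories.items()
    }


def assign_categories(text, categories=CATEGORIES, min_score=1):
    """Return the sorted names of all categories with at least min_score hits."""
    scores = score_document(text, categories)
    return sorted(name for name, hits in scores.items() if hits >= min_score)


if __name__ == "__main__":
    print(assign_categories("Indiana Jones plays football in Munich"))
```

Matching is on whole tokens only, so "basket" would not match "basketball"; stemming, weighting, and text extraction from html/pdf/doc files are left out of this sketch.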