The Term Vector code can be used to get the term frequencies for a specific document. Search this list's archives, see the Lucene in Action book, or look at http://www.cnlp.org/apachecon2005 for examples of how to use Term Vectors.
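
As a rough illustration of what you could do once you have the term vector data: in Lucene versions of this era, IndexReader.getTermFreqVector(docNumber, field) returns a TermFreqVector whose getTerms() and getTermFrequencies() give parallel arrays, provided the field was indexed with term vectors stored. The sketch below is plain Java and does not call Lucene at all; it takes such arrays as input. The class name NormalizedTfIdf, the method names, and the weighting formula (tf divided by the document's maximum tf, times ln(numDocs / df)) are my assumptions for illustration, since "normalized Tf/Idf" can mean several things. Document frequencies would come from something like IndexReader.docFreq(Term).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: computes max-tf-normalized tf-idf
// weights from arrays shaped like those a term vector provides.
public class NormalizedTfIdf {

    /** Largest term frequency in one document (0 for an empty document). */
    static int maxTermFrequency(int[] freqs) {
        int max = 0;
        for (int f : freqs) {
            if (f > max) {
                max = f;
            }
        }
        return max;
    }

    /**
     * Assumed formula: weight = (tf / maxTf) * ln(numDocs / df).
     * terms, freqs and docFreqs are parallel arrays for a single document.
     */
    static Map<String, Double> weights(String[] terms, int[] freqs,
                                       int[] docFreqs, int numDocs) {
        if (terms.length != freqs.length || terms.length != docFreqs.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int maxTf = maxTermFrequency(freqs);
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            // Guard against empty documents and terms with no document frequency.
            double tf = maxTf == 0 ? 0.0 : (double) freqs[i] / maxTf;
            double idf = docFreqs[i] == 0
                    ? 0.0
                    : Math.log((double) numDocs / docFreqs[i]);
            result.put(terms[i], tf * idf);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample values, for demonstration only.
        String[] terms = {"lucene", "index", "the"};
        int[] freqs = {4, 2, 8};
        int[] docFreqs = {10, 50, 1000};
        weights(terms, freqs, docFreqs, 1000)
                .forEach((term, w) -> System.out.println(term + " " + w));
    }
}
```

The maximum term frequency falls out of one pass over the frequency array, so there may be no need to patch the Lucene source just to get it; whether this is fast enough for your classifier is something you would have to measure.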

Danilo Cicognani wrote:
Hello everybody.
We are building a complex automatic classification system using Lucene.
We need to manage normalized Tf/Idf (Term Frequency / Inverse Document
Frequency).
We understand that Lucene can give us Tf and Df, and we are using these
values to calculate the normalized Tf/Idf, but we would like to optimize this
calculation for better performance.
Is there any way to expose the maximum term frequency in a document from
Lucene, and maybe to obtain the normalized Tf/Idf from Lucene?
There are no public methods to get these values, but maybe Lucene holds
this information privately, and by modifying the Lucene source we could
have the work done there to speed up the system.

P.S. Sorry for my English: I hope I explained my question clearly.

**** 1000 KBye ****

 [) /\ |\| | |_ ()

web: www.ciconet.it
Web Portal Now: www.webportalnow.com


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
