The Term Vector code can be used to get the term frequencies for a
specific document. Search this list, see the Lucene in Action book, or
look at http://www.cnlp.org/apachecon2005 for examples of how to use
Term Vectors.
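
As a rough illustration of the calculation being asked about, here is a
minimal sketch in plain Java. It assumes you have already pulled the
terms and their frequencies for one document out of a term vector (two
parallel arrays), and that you have looked up each term's document
frequency and the total document count from the index reader (docFreq
and numDocs). The class and method names are made up for this example,
and the formula (tf / maxTf) * ln(numDocs / df) is only one common
definition of normalized Tf/Idf; adjust it to whatever your classifier
expects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: max-tf normalized tf-idf for the terms of one document. */
final class NormalizedTfIdf {

    /** Largest term frequency in the document, or 0 if there are no terms. */
    static int maxTermFrequency(int[] freqs) {
        int max = 0;
        for (int f : freqs) {
            if (f > max) {
                max = f;
            }
        }
        return max;
    }

    /**
     * Weights each term by (tf / maxTf) * ln(numDocs / df).
     * terms, freqs and docFreqs are parallel arrays: freqs[i] is the count
     * of terms[i] in this document, docFreqs[i] is the number of documents
     * in the index that contain terms[i].
     */
    static Map<String, Double> weights(String[] terms, int[] freqs,
                                       int[] docFreqs, int numDocs) {
        if (terms.length != freqs.length || terms.length != docFreqs.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int maxTf = maxTermFrequency(freqs);
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            if (docFreqs[i] <= 0) {
                throw new IllegalArgumentException("document frequency must be positive");
            }
            // Normalize by the most frequent term in this document.
            double tf = maxTf == 0 ? 0.0 : (double) freqs[i] / maxTf;
            double idf = Math.log((double) numDocs / docFreqs[i]);
            result.put(terms[i], tf * idf);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for what the index would return.
        String[] terms = {"lucene", "term", "vector"};
        int[] freqs = {4, 2, 1};
        int[] docFreqs = {10, 50, 100};
        System.out.println(weights(terms, freqs, docFreqs, 100));
    }
}
```

The maximum term frequency is found with a single pass over the
frequencies, so if the per-document cost matters you could compute it
once at indexing time and store it alongside the document rather than
rescanning the term vector on every query.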
Danilo Cicognani wrote:
Hello everybody.
We are building a complex automatic classification system using Lucene.
We need to manage normalized Tf/Idf (Term Frequency / Inverse Document
Frequency).
We understand that Lucene can give us Tf and Df, and we are using these
values to calculate the normalized Tf/Idf, but we would like to optimize this
calculation for better performance.
Is there any way to expose the maximum term frequency in a document from
Lucene, and maybe to obtain the normalized Tf/Idf from Lucene?
There are no public methods to get these values, but maybe Lucene holds
this information privately, and with a modification to the Lucene source we
could get the work done in a way that speeds up the system.
P.S. Sorry for my English; I hope I explained my question clearly.
**** 1000 KBye ****
[) /\ |\| | |_ ()
web: www.ciconet.it
Web Portal Now: www.webportalnow.com
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886