Re: TF-IDF API

2007-03-28 Thread Sengly Heng
Thank you very much for your time. Here is a sample vector of terms: v1 = {"sad", "john", "intelligent", "news", "USA", "disneyland", "MIT", "cambridge", "marry", ...} I'll try out your method. Best regards, Sengly On 3/28/07, karl wettin <[EMAIL PROTECTED]> wrote: 28 mar 2007 kl.

Re: TF-IDF API

2007-03-28 Thread karl wettin
On 28 Mar 2007, at 15:24, Sengly Heng wrote: Thank you, but I still have no clue how to do that with Weka after looking at its API. Let me reformulate my problem: I have a collection of vectors of terms (each vector actually represents the list of tokens extracted from

Re: TF-IDF API

2007-03-28 Thread Sengly Heng
Thank you, but I still have no clue how to do that with Weka after looking at its API. Let me reformulate my problem: I have a collection of vectors of terms (each vector actually represents the list of tokens extracted from a file), and I do not have the original files.
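
For the problem as stated here (token lists only, no original files), the "implement from scratch" route is fairly small. The following is only a sketch in plain Java, not taken from Weka or Lucene. The class name TfIdfSketch, its method names, and the weighting tf * ln(N / df) are assumptions chosen for illustration; other TF-IDF variants normalize or smooth differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Weka or Lucene.
public class TfIdfSketch {

    /** Counts how often each term occurs in one document's token list. */
    public static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** Counts in how many documents each term occurs at least once. */
    public static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> tokens : docs) {
            // A set ensures each term is counted once per document.
            for (String term : new HashSet<>(tokens)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Returns one term-to-weight map per document; weight = tf * ln(N / df). */
    public static List<Map<String, Double>> tfIdf(List<List<String>> docs) {
        Map<String, Integer> df = documentFrequencies(docs);
        int n = docs.size();
        List<Map<String, Double>> result = new ArrayList<>();
        for (List<String> tokens : docs) {
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Integer> e : termFrequencies(tokens).entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                weights.put(e.getKey(), e.getValue() * idf);
            }
            result.add(weights);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up token lists for illustration only.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("sad", "john", "news", "news"),
                Arrays.asList("john", "MIT", "cambridge"),
                Arrays.asList("disneyland", "USA", "john"));
        List<Map<String, Double>> weights = tfIdf(docs);
        for (int i = 0; i < weights.size(); i++) {
            System.out.println("doc " + i + ": " + weights.get(i));
        }
    }
}
```

With this weighting, a term that occurs in every vector gets weight zero, since ln(N / N) = 0, so only terms that distinguish one vector from the others carry weight.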

Re: TF-IDF API

2007-03-28 Thread karl wettin
On 28 Mar 2007, at 10:36, Sengly Heng wrote: Does anyone know of a Java API that handles this problem directly, or do I have to implement it from scratch? You can also try weka.filters.unsupervised.attribute.StringToWordVector; it has many neat features you might be interested in. And if app

Re: TF-IDF API

2007-03-28 Thread Grant Ingersoll
You can pass in a String or a Reader to Field when indexing. There is nothing file-specific about Lucene when it comes to indexing. Take a look at the Field class for the various constructors. On Mar 28, 2007, at 8:20 AM, Sengly Heng wrote: Thanks but in my case I do not have the files. W

Re: TF-IDF API

2007-03-28 Thread Sengly Heng
Thanks, but in my case I do not have the files. What I have is just a collection of vectors of terms. Does Lucene provide any means to index each vector of terms as if it were a file? Or is there a better way to do this? Thanks again, everyone. Regards, Sengly On 3/28/07, thomas arni <[EMAIL PROTECTED]> wr

Re: TF-IDF API

2007-03-28 Thread thomas arni
Have a look at the "TermDocs" interface in the API. You can get the term frequency with an open IndexReader: TermDocs termDocs = reader.termDocs(term); where "term" represents the current Term. Now you can call termDocs.freq() to get the frequency of the term within the current document. For th
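
To make the idea behind per-term, per-document frequencies concrete, here is a plain-Java analogue of a postings list. It is not Lucene's API: the class PostingsSketch and its methods addDocument, postingsFor, freq and docFreq are hypothetical names invented for this sketch, and it only mirrors the concept of looking up how often a term occurs in a given document.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory postings structure; not a Lucene class.
public class PostingsSketch {

    // term -> (document id -> occurrences of the term in that document)
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    /** Adds one document, given only its token list. */
    public void addDocument(int docId, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /** Postings for a term, ordered by document id; empty if the term is unseen. */
    public Map<Integer, Integer> postingsFor(String term) {
        TreeMap<Integer, Integer> p = postings.get(term);
        if (p == null) {
            return Collections.emptyMap();
        }
        return Collections.unmodifiableMap(p);
    }

    /** Frequency of the term within one document (0 if absent). */
    public int freq(String term, int docId) {
        return postingsFor(term).getOrDefault(docId, 0);
    }

    /** Number of documents that contain the term. */
    public int docFreq(String term) {
        return postingsFor(term).size();
    }

    public static void main(String[] args) {
        // Made-up token lists for illustration only.
        PostingsSketch index = new PostingsSketch();
        index.addDocument(0, Arrays.asList("news", "john", "news"));
        index.addDocument(1, Arrays.asList("john", "MIT"));
        // Walk the postings of one term: each entry is (document id, frequency).
        for (Map.Entry<Integer, Integer> e : index.postingsFor("john").entrySet()) {
            System.out.println("doc " + e.getKey() + " freq " + e.getValue());
        }
    }
}
```

Storing frequencies per term rather than per document keeps both lookups cheap: the frequency in one document is a single map access, and the document count for a term is the size of its postings map.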