Thank you very much for your time. Here is a sample vector of terms:
v1 = {"sad", "john", "intelligent", "news", "USA", "disneyland", "MIT",
"cambridge", "marry", ...}
I'll try out your method.
Best regards,
Sengly
On 3/28/07, karl wettin <[EMAIL PROTECTED]> wrote:
On 28 Mar 2007, at 15:24, Sengly Heng wrote:
Thank you, but I still have no clue how to do that with Weka after
looking at its API. Let me reformulate my problem:
I have a collection of term vectors (each vector represents the list of
tokens extracted from a file), and I do not have the original files.
On 28 Mar 2007, at 10:36, Sengly Heng wrote:
Does anyone know of a Java API that handles this problem directly,
or do I have to implement it from scratch?
You can also try
weka.filters.unsupervised.attribute.StringToWordVector; it has many
neat features you might be interested in. And if app
You can pass in a String or a Reader to Field when indexing. There
is nothing file specific about Lucene when it comes to indexing.
Take a look at the Field class for the various constructors.
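To make the String/Reader suggestion concrete for a term vector, here is a minimal sketch in plain Java that turns a list of terms into a single whitespace-separated String, or a Reader over it. The class name TermVectorText and its methods are made up for illustration and are not part of Lucene; the result is simply the kind of value you could hand to one of the Field constructors, whichever one fits your Lucene version.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Lucene class): converts a term vector into
// text that could be passed to a Lucene Field as a String or a Reader.
final class TermVectorText {

    // Joins the terms with single spaces, preserving their order.
    static String toText(List<String> terms) {
        return String.join(" ", terms);
    }

    // Wraps the joined text in a Reader for APIs that accept one.
    static Reader toReader(List<String> terms) {
        return new StringReader(toText(terms));
    }

    public static void main(String[] args) {
        // Sample terms taken from the vector quoted in this thread.
        List<String> v1 = Arrays.asList("sad", "john", "intelligent", "news");
        System.out.println(toText(v1));
    }
}
```

One caveat with this approach: if a single term contains whitespace, a tokenizing analyzer would split it into several tokens, so this sketch assumes every term is a single word.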
On Mar 28, 2007, at 8:20 AM, Sengly Heng wrote:
Thanks, but in my case I do not have the files. What I have is just a
collection of vectors of terms.
Does Lucene provide any means to index each vector of terms as a file? Or
is there a better way to do it?
Thanks, everyone, once again.
Regards,
Sengly
On 3/28/07, thomas arni <[EMAIL PROTECTED]> wrote:
Have a look at the "TermDocs" interface in the API.
You can get the term frequency with an open IndexReader:
TermDocs termDocs = reader.termDocs(term);
where "term" represents the current Term.
Now, after advancing to a document with termDocs.next(), you can call:
termDocs.freq()
to get the frequency of the term within the current document.
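As a rough illustration of what a per-document term frequency is, here is a minimal sketch in plain Java that counts occurrences directly from one term vector, without Lucene. The class name TermVectorCounts, the method termFrequencies, and the sample terms are made up for illustration; this is not how TermDocs works internally, only the same quantity computed by hand.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene class): counts how often each term
// occurs in a single term vector, i.e. in one "document".
final class TermVectorCounts {

    // Returns term -> number of occurrences, in first-seen order.
    static Map<String, Integer> termFrequencies(List<String> terms) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String term : terms) {
            // Start at 1 for a new term, otherwise add 1 to the count.
            freq.merge(term, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Made-up sample vector with one repeated term.
        List<String> v1 = Arrays.asList("sad", "john", "news", "john", "USA");
        System.out.println(termFrequencies(v1));
    }
}
```

If all you need is the frequency of each term inside each vector, something like this may be enough on its own; an index becomes useful once you also want statistics across the whole collection.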
For th