Are you perhaps looking for this:

http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/similar/MoreLikeThis.html

?

        karl

23 jul 2010 kl. 10.54 skrev Xaida:


Hi! thanks for reply! I will try to explain better, sorry if it was unclear.

I have user text document collection. Not too big. Goal is to get the most "important" concepts which would in a way represent user interests. That
is what i mean when i say important :)


So lets say, in my collection I have my school documents, i have some
snowboarding articles, i have some backpacking and easy travelling guides,
my favorite cooking recipes......and so on. Collection is more - less
supervised, so number of documents for each "area" is similar. Not equal,
but there is some balance.

So i would like, as result to get terms which are important in the entire collection. For example, i think that term "cheese" should appear in my results, because i know in my recipes there is a lot of cheese. Also i would
like to get the term "database"...from my school documents. And so on.

So nothing more smart comes to my mind than this :)
step 1 take one document
step 2 calculate tfidf for all its terms
step 3 take the terms with best tfidf and save them somewhere....
step 4 go to step 1......and so on for all the documents

And in the end to merge these results somehow :/

I guess there is better way :)

Thank you!!





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hot-to-get-word-importance-in-lucene-index-tp988836p989301.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to