Hi! thanks for reply! I will try to explain better, sorry if it was unclear.
I have user text document collection. Not too big. Goal is to get the most "important" concepts which would in a way represent user interests. That is what i mean when i say important :) So lets say, in my collection I have my school documents, i have some snowboarding articles, i have some backpacking and easy travelling guides, my favorite cooking recipes......and so on. Collection is more - less supervised, so number of documents for each "area" is similar. Not equal, but there is some balance. So i would like, as result to get terms which are important in the entire collection. For example, i think that term "cheese" should appear in my results, because i know in my recipes there is a lot of cheese. Also i would like to get the term "database"...from my school documents. And so on. So nothing more smart comes to my mind than this :) step 1 take one document step 2 calculate tfidf for all its terms step 3 take the terms with best tfidf and save them somewhere.... step 4 go to step 1......and so on for all the documents And in the end to merge these results somehow :/ I guess there is better way :) Thank you!! -- View this message in context: http://lucene.472066.n3.nabble.com/Hot-to-get-word-importance-in-lucene-index-tp988836p989301.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org