Re: Hot to get word importance in lucene index

Xaida Fri, 23 Jul 2010 01:55:19 -0700

Hi! thanks for reply! I will try to explain better, sorry if it was unclear.

I have user text document collection. Not too big. Goal is to get the most
"important" concepts which would in a way represent user interests. That
is what i mean when i say important :)

So lets say, in my collection I have my school documents, i have some
snowboarding articles, i have some backpacking and easy travelling guides,
my favorite cooking recipes......and so on. Collection is more - less
supervised, so number of documents for each "area" is similar. Not equal,
but there is some balance.

So i would like, as result to get terms which are important in the entire
collection. For example, i think that term "cheese" should appear in my
results, because i know in my recipes there is a lot of cheese. Also i would
like to get the term "database"...from my school documents. And so on.

So nothing more smart comes to my mind than this :)
step 1 take one document
step 2 calculate tfidf for all its terms
step 3 take the terms with best tfidf and save them somewhere....
step 4 go to step 1......and so on for all the documents

And in the end to merge these results somehow :/

I guess there is better way :)

Thank you!!

--
View this message in context:
http://lucene.472066.n3.nabble.com/Hot-to-get-word-importance-in-lucene-index-tp988836p989301.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Hot to get word importance in lucene index

Reply via email to