Phrase Frequency For Analysis

Nader Akhnoukh Wed, 21 Jun 2006 16:33:51 -0700

Hi, I've looked through the archives and it looks like this question has
been asked in one form or another a few times, but without a satisfactory
solution.


I am trying to get the most frequently occurring phrases in a document and
in the index as a whole.  The goal is compare the two to get something like
Amazon's SIPs.

This is straightforward for individual words.  Get the term frequency of
each term in a doc and compare it to the frequency of that term in the
index.  A high ratio indicates that the term appears in this doc much more
than the other docs on average.

Does anyone have an idea of how to do this with phrases of say 1 to 3 words?

Just to be clear,  in this case I am only using Lucene for it's built in
frequency analysis, I'm not actually using it to search for anything that is
indexed.

Thanks,
NSA

Phrase Frequency For Analysis

Reply via email to