> Do you just do this with terms or do you also
> extract phrases?
The scheme involves these phases:
1) Identify the top terms (using the algorithm described below)
2) Identify all term "runs" in the original text
3) Identify sensible phrases from the large list of term runs
4) Provide a shortlist of top-scoring terms AND phrases
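Phases 2-4 could look roughly like the sketch below. It assumes phase 1 has already produced a dict of top terms with scores. A "run" is taken to be a maximal sequence of two or more consecutive tokens that are all top terms, and a "sensible" phrase is one whose run recurs at least `min_count` times. Both definitions, the simple regex tokenizer, and all function names here are hypothetical stand-ins, not the method described in this thread.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric words only.
    # A real setup would reuse the analyzer used to build the index.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_runs(tokens, top_terms):
    """Phase 2: yield maximal runs (length >= 2) of consecutive top terms."""
    run = []
    for tok in tokens:
        if tok in top_terms:
            run.append(tok)
        else:
            if len(run) >= 2:
                yield tuple(run)
            run = []
    if len(run) >= 2:  # flush a run that ends at the end of the text
        yield tuple(run)


def sensible_phrases(runs, min_count=2):
    """Phase 3 (assumed heuristic): keep runs that recur at least min_count times."""
    counts = Counter(runs)
    return [" ".join(r) for r, n in counts.most_common() if n >= min_count]


def shortlist(text, top_terms, min_count=2, limit=10):
    """Phase 4: return (phrases, terms); terms are ordered by their phase-1 score."""
    phrases = sensible_phrases(term_runs(tokenize(text), top_terms), min_count)
    terms = sorted(top_terms, key=lambda t: -top_terms[t])
    return phrases[:limit], terms[:limit]


# Example usage with made-up text and scores.
text = ("Lucene term vectors help. Term vectors store positions. "
        "Lucene term vectors are optional.")
top = {"lucene": 3.0, "term": 2.5, "vectors": 2.0}
phrases, terms = shortlist(text, top)
print(phrases)  # ['lucene term vectors']
print(terms)    # ['lucene', 'term', 'vectors']
```

Because runs are maximal, "term vectors" inside "lucene term vectors" is not counted separately; counting sub-runs as well would be an easy variation if shorter phrases matter.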
On Jul 14, 2005, at 7:17 AM, mark harwood wrote:
I've done this by comparing term frequency in a subset
(in Amazon's case a single book) and looking for a
significant "uplift" in term popularity vs that of the
general corpus popularity. Practically speaking, in
the Amazon case you can treat each page in the example
book as a Lucene document, crea
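The "uplift" comparison can be sketched with plain document-frequency counts, treating each page as one document as suggested above. This sketch assumes that uplift is the ratio of a term's document rate in the subset (the book's pages) to its rate in the whole corpus, and that the corpus includes the subset's pages, so no corpus frequency is zero. The `min_subset_df` cut-off is a crude, hypothetical noise filter rather than a real significance test, and none of this uses actual Lucene APIs.

```python
from collections import Counter


def doc_freqs(docs):
    """Count how many documents (pages) contain each term."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per page
    return df


def uplift_terms(subset_docs, corpus_docs, min_subset_df=2, top_n=10):
    """Rank subset terms by uplift = (subset_df / subset_size) / (corpus_df / corpus_size).

    Assumes corpus_docs includes every page in subset_docs, so corpus_df >= subset_df > 0.
    """
    sub_df = doc_freqs(subset_docs)
    corp_df = doc_freqs(corpus_docs)
    n_sub, n_corp = len(subset_docs), len(corpus_docs)
    scores = {}
    for term, df in sub_df.items():
        if df < min_subset_df:
            continue  # too rare in the subset to trust the ratio
        subset_rate = df / n_sub
        corpus_rate = corp_df[term] / n_corp
        scores[term] = subset_rate / corpus_rate
    # Highest uplift first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Example usage: each inner list is one tokenized page.
book = [["lucene", "index", "query"], ["lucene", "analyzer"], ["lucene", "index"]]
other = [["cooking", "recipe"], ["index", "recipe"], ["travel", "guide"],
         ["cooking", "guide"], ["travel", "recipe"]]
corpus = book + other  # the corpus contains the book's pages
result = uplift_terms(book, corpus)
print(result)
```

Here "lucene" appears on every book page but only 3 of 8 corpus pages, giving an uplift of 8/3, while "index" is diluted by a non-book page and scores 16/9.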