Re: Phrase Frequency For Analysis

2006-06-22 Thread Bob Carpenter
Adding to this growing thread, there's really no reason to index all the term bigrams, trigrams, etc. It's not only slow, it's very memory/disk intensive. All you need to do is two passes over the collection. Pass One: Collect counts of bigrams (or trigrams, or whatever -- if size is an
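
The counting step named in the first pass might look roughly like the sketch below. It is plain Python over pre-tokenized documents, not Lucene code; `ngrams`, `count_ngrams`, and the two-document `collection` are hypothetical names and data introduced only for illustration. The snippet above is cut off before the second pass is described, so the sketch does not attempt it.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def ngrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every run of n consecutive tokens (hypothetical helper)."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def count_ngrams(docs: Iterable[List[str]], n: int = 2) -> Counter:
    """Single pass over the collection: tally each n-gram it contains."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts


# Toy collection, assumed to be tokenized already.
collection = [
    "the quick brown fox".split(),
    "the quick red fox".split(),
]
bigram_counts = count_ngrams(collection, n=2)
```

Only a counter keyed by n-gram is held in memory; nothing extra is written to an index.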

Re: Phrase Frequency For Analysis

2006-06-22 Thread Andrzej Bialecki
Nader Akhnoukh wrote: Yes, Chris is correct, the goal is to determine the most frequently occurring phrases in a document compared to the frequency of that phrase in the index. So there are only output phrases, no inputs. Also performance is not really an issue, this would take place on an irre

Re: Phrase Frequency For Analysis

2006-06-22 Thread Kamal Abou Mikhael
I may be coming into this thread without knowing enough. I have implemented a phrase filter, which indexes all token sequences that are 2 to N tokens long. N is defined in the constructor. It takes a stopword Trie for input because the policy I used, based on a published work I read, was that a
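
As a rough illustration of "all token sequences that are 2 to N tokens long", here is a minimal Python sketch. It is not the poster's filter and not a Lucene TokenFilter; `token_sequences` and `max_len` are hypothetical names. The stopword policy is cut off in the snippet above, so no stopword handling is attempted.

```python
from typing import Iterator, List, Tuple


def token_sequences(tokens: List[str], max_len: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous token sequence of length 2..max_len."""
    for start in range(len(tokens)):
        for length in range(2, max_len + 1):
            if start + length > len(tokens):
                break  # longer sequences would run past the end
            yield tuple(tokens[start:start + length])


phrases = list(token_sequences(["new", "york", "city"], max_len=3))
```

For the three-token input this yields the two bigrams and the single trigram.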

Re: Phrase Frequency For Analysis

2006-06-22 Thread Nader Akhnoukh
Yes, Chris is correct, the goal is to determine the most frequently occurring phrases in a document compared to the frequency of that phrase in the index. So there are only output phrases, no inputs. Also performance is not really an issue, this would take place on an irregular basis and could ru
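
One way to read "compared to the frequency of that phrase in the index" is as a ratio of relative frequencies. The Python sketch below assumes that reading; the thread does not say how the comparison should be scored, and `distinctive_phrases` and the toy counts are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Phrase = Tuple[str, ...]


def distinctive_phrases(
    doc_counts: Counter, index_counts: Counter, top: int = 10
) -> List[Tuple[Phrase, float]]:
    """Rank a document's phrases by (rate in document) / (rate in index)."""
    doc_total = sum(doc_counts.values())
    index_total = sum(index_counts.values())
    scored = []
    for phrase, count in doc_counts.items():
        index_count = index_counts.get(phrase, 0)
        if index_count == 0:
            continue  # no index statistics for this phrase; skip it
        doc_rate = count / doc_total
        index_rate = index_count / index_total
        scored.append((phrase, doc_rate / index_rate))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


# Toy counts: both phrases are equally frequent in the document,
# but one of them is far more common across the whole index.
doc_counts = Counter({("phrase", "frequency"): 3, ("of", "the"): 3})
index_counts = Counter({("phrase", "frequency"): 5, ("of", "the"): 95})
ranked = distinctive_phrases(doc_counts, index_counts)
```

A phrase that is common everywhere scores near or below 1, while a phrase concentrated in the document scores well above it.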

Re: Phrase Frequency For Analysis

2006-06-22 Thread Andrzej Bialecki
Chris Hostetter wrote: I think either you misunderstood Nader's question or I did: I believe the goal is to determine what the most frequently occurring phrases are -- not determine how frequently a particular input phrase appears. Isn't the latter a prerequisite for the former? ;) Regardi

Re: Phrase Frequency For Analysis

2006-06-22 Thread Chris Hostetter
: > I am trying to get the most frequently occurring phrases in a document and : > in the index as a whole. The goal is to compare the two to get something like : > Amazon's SIPs. : Other than indexing the phrases directly, you could use a SpanNearQuery : over the words, use getSpans() on its SpanS

Re: Phrase Frequency For Analysis

2006-06-22 Thread Paul Elschot
of occurrences of the "phrase" in the index. Each time doc() on the Spans returns a given document number, one can increase the phrase frequency count within the document. A Spans always iterates by non-decreasing document number. Btw. that is a search. Regards, Paul Elschot
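
The per-document counting described here can be sketched in plain Python. This is a stand-in, not the Lucene Spans API: `matches` is a hypothetical list of (document number, start, end) tuples in the non-decreasing document order the message mentions, and `phrase_freq_by_doc` is an illustrative name.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple

Match = Tuple[int, int, int]  # (document number, start position, end position)


def phrase_freq_by_doc(matches: Iterable[Match]) -> Iterator[Tuple[int, int]]:
    """Yield (document number, phrase frequency in that document).

    Relies on matches arriving in non-decreasing document order, so each
    document's count is complete as soon as the document number changes.
    """
    for doc_id, group in groupby(matches, key=lambda m: m[0]):
        yield doc_id, sum(1 for _ in group)


# Toy match stream: two hits in document 0, one hit in document 2.
matches = [(0, 3, 5), (0, 10, 12), (2, 1, 3)]
per_doc = list(phrase_freq_by_doc(matches))
total = sum(freq for _, freq in per_doc)
```

Summing the per-document counts gives the number of occurrences across the whole stream.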

Phrase Frequency For Analysis

2006-06-21 Thread Nader Akhnoukh
Hi, I've looked through the archives and it looks like this question has been asked in one form or another a few times, but without a satisfactory solution. I am trying to get the most frequently occurring phrases in a document and in the index as a whole. The goal is to compare the two to get some

phrase frequency??

2006-02-23 Thread sog
I searched the mail archive for my question and found that what I really want is a phrase frequency; it is an old question that was never solved well. I traced the Lucene source code and discovered that I can get a phrase's IDF from the Hits object weight= PhraseQuery$PhraseWeight (id=62

Re: Phrase frequency

2005-09-04 Thread Sean O'Connor
. If I do, I would be happy to share. Good luck, and feel free to post anything you think might be helpful if you implement something. Sean Fabio Cristiano dos Anjos wrote: Hi, How can I get phrase frequency in an index? Thanks in advance

Phrase frequency

2005-09-02 Thread Fabio Cristiano dos Anjos
Hi, How can I get phrase frequency in an index? Thanks in advance!! -- Sincerely, Fábio Cristiano dos Anjos

Phrase frequency

2005-08-24 Thread Ravikumar.Kondadadi
How can I get phrase frequency in an index? termDocs/termPositions in IndexReader work only with words. Thanks, Ravi.