Re: how to reuse a tokenStream?

2010-05-28 Thread Li Li
I want to implement an analyzer which uses WhitespaceAnalyzer first, then my tokenFilter. But my filter needs global information about tokens, such as how many times a token occurs. So in the tokenStream method, I iterate the tokenStream to get all the things I need, then pass this information to my own To
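The two-pass idea described in this message can be sketched without any Lucene classes. This is only an illustration, assuming the tokens fit in memory: `TwoPassTokens`, `tokenize`, `countTokens`, and `filterByCount` are hypothetical names, not Lucene API. In Lucene itself, `CachingTokenFilter` buffers a stream so that it can be replayed, which may be what is wanted here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an analyzer that needs global token statistics.
class TwoPassTokens {
    /** Split on whitespace and buffer every token so it can be read twice. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** First pass: gather the global information (here, occurrence counts). */
    static Map<String, Integer> countTokens(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    /** Second pass: replay the buffered tokens, keeping those seen at least minCount times. */
    static List<String> filterByCount(List<String> tokens, Map<String, Integer> counts, int minCount) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (counts.get(t) >= minCount) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("a b a c a b");
        Map<String, Integer> counts = countTokens(tokens);
        System.out.println(filterByCount(tokens, counts, 2));
    }
}
```

Buffering the tokens in a list is the simplest way to avoid consuming a one-shot stream twice; for very large texts the buffer itself becomes the cost.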

Re: PhraseQuery vs MultiPhraseQuery

2010-05-28 Thread Ahmet Arslan
> Is there a fundamental difference between
>
> PhraseQuery query = new PhraseQuery();
> query.add(term1, 0);
> query.add(term2, 0);
>
> and
>
> MultiPhraseQuery query = new MultiPhraseQuery();
> query.add( new Term[] { term1, term2 } );
>
> The only difference I could think of is that MPQ som

Re: Surround QueryParser and PhraseQuery

2010-05-28 Thread Ahmet Arslan
> I'm having a problem searching for phrases with the Surround
> Query Parser, so
> let's look at these input Surround queries (test examples):
>    1. "yellow orange"
>    2. lemon 2n ("yellow orange") 4n banana
> where 2n, 4n are within connectors.
You don't need phrasequery when you already have spannear

Re: Core dumped

2010-05-28 Thread Michael McCandless
Also, are you indexing largish documents? Lucene must fully index the doc, and then flush, so for such large docs it can easily use more than the 50 MB buffer you allotted. There were some recent memory leak fixes for such large documents, as well, that you might be hitting. Which Lucene version

PhraseQuery vs MultiPhraseQuery

2010-05-28 Thread Emmanuel Bernard
Hello, I am a bit confused by the two. Is there a fundamental difference between
    PhraseQuery query = new PhraseQuery();
    query.add(term1, 0);
    query.add(term2, 0);
and
    MultiPhraseQuery query = new MultiPhraseQuery();
    query.add( new Term[] { term1, term2 } );
The only difference I could think of i
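One way to see the difference between the two snippets in this question is a toy model, not Lucene code. Adding two terms to a PhraseQuery at the same position asks for both terms to occur at that position, while a MultiPhraseQuery term array accepts any one of them. The sketch below assumes a document is a list of positions, each holding the set of terms indexed there (several terms can share a position, for example when synonyms are injected); `SamePositionQueries` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: a document is a list of positions, each with the terms indexed there.
class SamePositionQueries {
    static Set<String> setOf(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    /** PhraseQuery-style: all terms must occur together at one position. */
    static boolean allAtSamePosition(List<Set<String>> doc, String... terms) {
        for (Set<String> termsAtPosition : doc) {
            if (termsAtPosition.containsAll(Arrays.asList(terms))) {
                return true;
            }
        }
        return false;
    }

    /** MultiPhraseQuery-style: any one of the terms may occupy the position. */
    static boolean anyAtPosition(List<Set<String>> doc, String... terms) {
        for (Set<String> termsAtPosition : doc) {
            for (String t : terms) {
                if (termsAtPosition.contains(t)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Position 0 holds only "quick"; position 1 holds only "fox".
        List<Set<String>> plain = Arrays.asList(setOf("quick"), setOf("fox"));
        // Position 0 holds "quick" and its synonym "fast".
        List<Set<String>> withSynonym = Arrays.asList(setOf("quick", "fast"), setOf("fox"));

        System.out.println(allAtSamePosition(plain, "quick", "fast"));       // false
        System.out.println(anyAtPosition(plain, "quick", "fast"));           // true
        System.out.println(allAtSamePosition(withSynonym, "quick", "fast")); // true
    }
}
```

In this model the same-position PhraseQuery behaves like an AND over one position and the MultiPhraseQuery like an OR, so the first only matches documents where both terms were indexed at the same position.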

Re: how to reuse a tokenStream?

2010-05-28 Thread Erick Erickson
What is the problem you're seeing? Maybe a stack trace? You haven't told us what the incorrect behavior is.
Best
Erick
On Fri, May 28, 2010 at 12:52 AM, Li Li wrote:
> I want to analyze a text twice so that I can get some statistical
> information from this text
>

Surround QueryParser and PhraseQuery

2010-05-28 Thread flaiz
Hello, I'm having a problem searching for phrases with the Surround Query Parser, so let's look at these input Surround queries (test examples): 1. "yellow orange" 2. lemon 2n ("yellow orange") 4n banana where 2n, 4n are within connectors. You see I surrounded yellow orange with quotes to let the par

Re: How to get the number of unique terms in the inverted index

2010-05-28 Thread Yonik Seeley
It seems like there should be a formula for estimating the total number of unique terms given that you know the unique term counts for each segment, and make certain assumptions like random document distribution across segments. -Yonik http://www.lucidimagination.com On Thu, May 27, 2010 at 9:17
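Without any distribution assumptions, the per-segment unique term counts only bracket the total: it is at least the largest per-segment count and at most the sum of all of them. The sketch below shows those bounds plus the exact count when the term sets themselves are available. It is not the estimation formula suggested above, and `UniqueTermBounds` and its methods are hypothetical names, not Lucene API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: bounds on total unique terms from per-segment counts.
class UniqueTermBounds {
    /** Every term of the largest segment is in the index, so the total is at least that. */
    static long lowerBound(long[] perSegmentCounts) {
        long max = 0;
        for (long c : perSegmentCounts) {
            max = Math.max(max, c);
        }
        return max;
    }

    /** If no term is shared between segments, the total equals the sum. */
    static long upperBound(long[] perSegmentCounts) {
        long sum = 0;
        for (long c : perSegmentCounts) {
            sum += c;
        }
        return sum;
    }

    /** Exact count, only possible when the term sets themselves can be merged. */
    static long exactCount(List<Set<String>> segmentTerms) {
        Set<String> union = new HashSet<>();
        for (Set<String> terms : segmentTerms) {
            union.addAll(terms);
        }
        return union.size();
    }

    public static void main(String[] args) {
        List<Set<String>> segments = Arrays.asList(
                new HashSet<>(Arrays.asList("apple", "banana", "cherry")),
                new HashSet<>(Arrays.asList("banana", "date")));
        long[] counts = {3, 2};
        System.out.println(lowerBound(counts) + " <= " + exactCount(segments)
                + " <= " + upperBound(counts));
    }
}
```

Any estimate based on a model of how terms overlap between segments has to fall between these two bounds.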