Re: LookaheadTokenFilter

2013-09-07 Thread Michael McCandless
Thanks Benson, I'll have a look. Mike McCandless http://blog.mikemccandless.com On Sat, Sep 7, 2013 at 4:33 PM, Benson Margulies wrote: > LUCENE-5202. It seems to show the problem of the extra peek. I'm still > struggling to make sense of the 'problem' of not always calling > afterPosition();

Re: LookaheadTokenFilter

2013-09-07 Thread Benson Margulies
LUCENE-5202. It seems to show the problem of the extra peek. I'm still struggling to make sense of the 'problem' of not always calling afterPosition(); that may be entirely my own confusion. On Sat, Sep 7, 2013 at 4:21 PM, Michael McCandless wrote: > That would be awesome, thanks! > > Mike McCand

Re: LookaheadTokenFilter

2013-09-07 Thread Michael McCandless
That would be awesome, thanks! Mike McCandless http://blog.mikemccandless.com On Sat, Sep 7, 2013 at 3:40 PM, Benson Margulies wrote: > I think I had better build you a test case for this situation, and > attach it to a JIRA. > > On Sat, Sep 7, 2013 at 3:33 PM, Michael McCandless > wrote: >>

Re: LookaheadTokenFilter

2013-09-07 Thread Benson Margulies
I think I had better build you a test case for this situation, and attach it to a JIRA. On Sat, Sep 7, 2013 at 3:33 PM, Michael McCandless wrote: > Something is wrong; I'm not sure what offhand, but calling peekToken > 10 times should not stack all tokens @ position 0; it should stack the > token

Re: LookaheadTokenFilter

2013-09-07 Thread Michael McCandless
Something is wrong; I'm not sure what offhand, but calling peekToken 10 times should not stack all tokens @ position 0; it should stack the tokens at the positions where they occurred. Are you sure the posIncr att is sometimes 1 (i.e., the position is in fact moving forward for some tokens)? next
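
To make the point about positions concrete, here is a minimal, self-contained sketch of how a position increment on each token turns into an absolute position. It does not use Lucene at all: the class `PositionSketch` and the method `absolutePositions` are hypothetical names invented for this illustration, and the starting value of -1 is an assumption chosen so that a first token with increment 1 lands at position 0. If every increment were 0, all tokens would end up stacked at the same position, which is the symptom described above.

```java
import java.util.Arrays;

public class PositionSketch {

    /**
     * Converts per-token position increments into absolute positions.
     * Hypothetical helper for illustration only; not part of Lucene.
     */
    static int[] absolutePositions(int[] posIncrs) {
        int[] positions = new int[posIncrs.length];
        int pos = -1; // assumption: start before the first token
        for (int i = 0; i < posIncrs.length; i++) {
            pos += posIncrs[i]; // an increment of 0 keeps the token at the same position
            positions[i] = pos;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Four tokens; the third has increment 0, so it shares a position with the second.
        int[] incrs = {1, 1, 0, 1};
        System.out.println(Arrays.toString(absolutePositions(incrs)));
    }
}
```

With increments 1, 1, 0, 1 the tokens land at positions 0, 1, 1, 2. Checking that at least some increments are non-zero is the quick sanity test the question above is asking for.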

Re: LookaheadTokenFilter

2013-09-07 Thread Benson Margulies
nextToken() calls peekToken(). That seems to prevent my lookahead processing from seeing that item later. Am I missing something? On Fri, Sep 6, 2013 at 9:15 PM, Benson Margulies wrote: > I think that the penny just dropped, and I should not be using this class. > > If I call peekToken 10 times

Strange performance of Lucene 4.4.0

2013-09-07 Thread Mirko Sertic
Hi all, I am getting strange performance measurements on Lucene 4.4.0; maybe someone can explain this. The following syntax leads to pretty slow queries on my machine (16 ms for every execution): theSearcher.search(theQuery, null, theSearcher.getIndexReader().maxDoc()); but the following syntax

Re: PositionLengthAttribute

2013-09-07 Thread Benson Margulies
On Sat, Sep 7, 2013 at 8:39 AM, Robert Muir wrote: > On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies wrote: >> In Japanese, compounds are just decompositions of the input string. In >> other languages, compounds can manufacture entire tokens from thin >> air. In those cases, it's something of a

Re: PositionLengthAttribute

2013-09-07 Thread Robert Muir
On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies wrote: > In Japanese, compounds are just decompositions of the input string. In > other languages, compounds can manufacture entire tokens from thin > air. In those cases, it's something of a question how to decide on the > offsets. I think that you

Re: PositionLengthAttribute

2013-09-07 Thread Benson Margulies
In Japanese, compounds are just decompositions of the input string. In other languages, compounds can manufacture entire tokens from thin air. In those cases, it's something of a question how to decide on the offsets. I think that you're right, eventually, insofar as there's some offset in the orig