Re: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

2013-04-15 Thread Jack Krupansky
Yes, reset was always "mandatory" from an API contract sense, but not always enforced in a practical sense in 3.x (no uniformly extreme negative consequences), as the original emailer indicated. Now, it is "mandatory" in a practical sense as well (extremely annoying consequences in all cases of

Re: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread SUJIT PAL
Hi Uwe, I see, makes sense, thanks very much for the info. Sorry about giving you wrong info Carsten. -sujit On Apr 15, 2013, at 1:06 PM, Uwe Schindler wrote: > Hi, > > Original Message- >> From: Sujit Pal [mailto:sujitatgt...@gmail.com] On Behalf Of SUJIT PAL >> Sent: Monday, April 1

RE: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread Uwe Schindler
Hi, Original Message- > From: Sujit Pal [mailto:sujitatgt...@gmail.com] On Behalf Of SUJIT PAL > Sent: Monday, April 15, 2013 9:43 PM > To: java-user@lucene.apache.org > Subject: Re: Statically store sub-collections for search (faceted search?) > > Hi Uwe, > > Thanks for the info, I was

Re: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread SUJIT PAL
Hi Uwe, Thanks for the info, I was under the impression that it didn't... I got this info (that filters don't have a limit because they are not scoring) from a document like the one below. Can't say this is the exact doc because it's been a while since I saw that, though. http://searchhub.org/2

No documents in TermsFilter.getDocIdSet()

2013-04-15 Thread Carsten Schnober
Hi, tying in with the previous thread "Statically store sub-collections for search", I'm trying to focus on the root of the problem that has occurred to me. At first, I generate a TermsFilter with potentially many terms in one term: - List docnames = new Ar

RE: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

2013-04-15 Thread Uwe Schindler
Hi, It was always mandatory! In Lucene 2.x/3.x some Tokenizers just returned bogus, undefined stuff if not correctly reset before usage, especially when Tokenizers are "reused" by the Analyzer, which is now mandatory in 4.x. So we made it throw some Exception (NPE or AIOOBE) in Lucene 4 by init
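The call order discussed in this thread (reset, then incrementToken in a loop, then end and close) can be sketched with a toy class. This is only an illustration under stated assumptions: `ToyTokenStream` is a hypothetical stand-in written for this sketch, not Lucene's `TokenStream`, and it throws `IllegalStateException` when `reset()` is skipped, whereas the messages above report an NPE or AIOOBE from Lucene 4 itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy stand-in for a token stream; NOT a Lucene class. */
final class ToyTokenStream {
    private final String[] words;
    private int pos = -1;   // -1 means reset() has not been called yet
    private String term;

    ToyTokenStream(String text) {
        String trimmed = text.trim();
        // "".split(...) would yield one empty token, so handle empty input explicitly.
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Must be called before the first incrementToken(). */
    void reset() { pos = 0; }

    /** Advances to the next token; returns false when the stream is exhausted. */
    boolean incrementToken() {
        if (pos < 0) {
            // Toy behaviour chosen for this sketch only.
            throw new IllegalStateException("reset() must be called before incrementToken()");
        }
        if (pos >= words.length) return false;
        term = words[pos++];
        return true;
    }

    String term() { return term; }

    void end() { /* nothing to finalize in this toy */ }

    void close() { pos = -1; }

    /** Consumer workflow: reset, loop over incrementToken, end, close. */
    static List<String> consume(ToyTokenStream ts) {
        List<String> out = new ArrayList<>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(ts.term());
        }
        ts.end();
        ts.close();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume(new ToyTokenStream("hello lucene world")));
    }
}
```

With a real Lucene 4.x stream the same order of calls applies; only the class and the attribute used to read the term text differ.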

Re: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread Carsten Schnober
On 15.04.2013 13:43, Uwe Schindler wrote: Hi, > Passing NULL means all documents are allowed, if this would not be the case, > whole Lucene queries and filters would not work at all, so if you get 0 docs, > you must have missed something else. If this is not the case, your filter may > behav

RE: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

2013-04-15 Thread andi rexha
Thank you, that was the reason. > From: j...@basetechnology.com > To: java-user@lucene.apache.org > Subject: Re: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException > Date: Mon, 15 Apr 2013 10:25:26 -0400 > > I didn't read your code, but do you have the "reset" that is now mandatory >

Re: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

2013-04-15 Thread Jack Krupansky
I didn't read your code, but do you have the "reset" that is now mandatory and throws AIOOBE if not present? -- Jack Krupansky -Original Message- From: andi rexha Sent: Monday, April 15, 2013 10:21 AM To: java-user@lucene.apache.org Subject: WhitespaceTokenizer, incrementToke() ArrayO

WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

2013-04-15 Thread andi rexha
Hi, I have tried to get all the tokens from a TokenStream in the same way as I was doing in the 3.x version of Lucene, but now (at least with WhitespaceTokenizer) I get an exception: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1 at java.lang.Character.codePointAtIm

RE: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread Uwe Schindler
Hi, > Hi again, > > >>> You are somehow "misusing" acceptDocs and DocIdSet here, so you > have > >> to take care, semantics are different: > >>> - For acceptDocs "null" means "all documents allowed" -> no deleted > >>> documents > >>> - For DocIdSet "null" means "no documents matched" > >> > >> O
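The two meanings of null quoted above are easy to mix up, so here is a minimal sketch of the distinction. It is an assumption-laden model, not Lucene code: `java.util.BitSet` stands in for both the acceptDocs bits and the filter's result set, and the class and method names are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical model of the two null conventions; uses java.util.BitSet, not Lucene types. */
final class NullSemanticsSketch {

    /** For acceptDocs, null means "all documents allowed". */
    static boolean isAccepted(BitSet acceptDocs, int docId) {
        return acceptDocs == null || acceptDocs.get(docId);
    }

    /** For a filter's result set, null means "no documents matched". */
    static int countMatches(BitSet matched) {
        return matched == null ? 0 : matched.cardinality();
    }

    public static void main(String[] args) {
        BitSet live = new BitSet();
        live.set(0);
        live.set(2);
        System.out.println(isAccepted(null, 1));  // null acceptDocs accepts every document
        System.out.println(isAccepted(live, 1));  // document 1 is not in the accepted set
        System.out.println(countMatches(null));   // a null result set is an empty match
    }
}
```

The practical consequence is that a caller has to null-check both values, but with opposite defaults.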

Re: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread Carsten Schnober
On 15.04.2013 11:27, Uwe Schindler wrote: Hi again, >>> You are somehow "misusing" acceptDocs and DocIdSet here, so you have >> to take care, semantics are different: >>> - For acceptDocs "null" means "all documents allowed" -> no deleted >>> documents >>> - For DocIdSet "null" means "no docume

RE: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread Uwe Schindler
Hi, > > AcceptDocs in Lucene are generally all non-deleted documents. For your > call to Filter.getDocIdSet you should therefore pass > AtomicReader.getLiveDocs() and not Bits.MatchAllBits. > > I see. As far as I understand the documentation, getLiveDocs() returns null if > there are no deleted d

Re: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread Carsten Schnober
On 15.04.2013 10:42, Uwe Schindler wrote: > Not every DocIdSet supports bits(). If it returns null, then bits are not > supported. To make sure a bitset is available, use CachingWrapperFilter (which > internally uses a BitSet to cache). > It might also happen that Filter.getDocIdSet() returns null, w

RE: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread Uwe Schindler
There might be 2 problems: Not every DocIdSet supports bits(). If it returns null, then bits are not supported. To make sure a bitset is available, use CachingWrapperFilter (which internally uses a BitSet to cache). It might also happen that Filter.getDocIdSet() returns null, which means that no docum

Re: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread Carsten Schnober
On 15.04.2013 10:04, Uwe Schindler wrote: > The limit also applies for filters. If you have a list of terms ORed > together, the fastest way is not to use a BooleanQuery at all, but instead a > TermsFilter (which has no limits). Hi Uwe, thanks for the pointer, this looks promising! The only mi

RE: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread Uwe Schindler
The limit also applies for filters. If you have a list of terms ORed together, the fastest way is not to use a BooleanQuery at all, but instead a TermsFilter (which has no limits). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Or

Re: Statically store sub-collections for search (faceted search?)

2013-04-15 Thread Carsten Schnober
On 12.04.2013 20:08, SUJIT PAL wrote: > Hi Carsten, > > Why not use your idea of the BooleanQuery but wrap it in a Filter instead? > Since you are not doing any scoring (only filtering), the max boolean clauses > limit should not apply to a filter. Hi Sujit, thanks for your suggestion! I wasn