Re: Tokenization and PrefixQuery

2014-02-14 Thread Michael McCandless
On Fri, Feb 14, 2014 at 8:21 AM, Yann-Erwan Perio wrote:
> I have written a test which demonstrates that the mistake is indeed on
> my side. It's probably due to inconsistent rules for
> indexing/searching content having special characters (namely the
> "plus" sign).
OK, thanks for bringing clos

Re: Tokenization and PrefixQuery

2014-02-14 Thread Yann-Erwan Perio
On Fri, Feb 14, 2014 at 1:11 PM, Yann-Erwan Perio wrote:
> On Fri, Feb 14, 2014 at 12:33 PM, Michael McCandless wrote:
Hi again,
>> That should not be the case: it should match all terms with that
>> prefix regardless of the term's length. Try to boil it down to a
>> small test case?
>
> I g
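To illustrate the point quoted above (a prefix match should return every term that starts with the prefix, whatever the term's length), here is a minimal sketch in plain Java. It is not Lucene code and does not use PrefixQuery; it only mimics the idea with a sorted set of strings. The class name, method name, and sample terms are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration only: a sorted set stands in for a term dictionary.
public class PrefixMatchSketch {

    // Returns every term that starts with `prefix`, regardless of the term's length.
    public static List<String> termsWithPrefix(NavigableSet<String> sortedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // Start at the first term >= prefix, then scan forward in sorted order.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms sharing a prefix are contiguous, so nothing later can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Sample terms are assumptions, loosely modelled on the "Ae1 Br2 Cy8" items.
        NavigableSet<String> terms =
                new TreeSet<>(Arrays.asList("A", "Ae1", "Ae12", "Ae12345", "Br2"));
        System.out.println(termsWithPrefix(terms, "Ae1"));
    }
}
```

If a real index returns fewer terms than a check like this would suggest, the difference is more likely to come from how the terms were produced at index time or how the query text was processed than from the prefix match itself.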

Re: Tokenization and PrefixQuery

2014-02-14 Thread Yann-Erwan Perio
On Fri, Feb 14, 2014 at 12:33 PM, Michael McCandless wrote:
> This is similar to PathHierarchyTokenizer, I think.
Ah, yes, very much. I'll check it out and see if I can make something of it. I am not sure to what extent it'll be reusable though, as my tokenizer also sets payloads (the next comin

Re: Tokenization and PrefixQuery

2014-02-14 Thread Michael McCandless
On Fri, Feb 14, 2014 at 6:17 AM, Yann-Erwan Perio wrote:
> Hello,
>
> I am designing a system with documents having one field containing
> values such as "Ae1 Br2 Cy8 ...", i.e. a sequence of items made of
> letters and numbers (max=7 per item), all separated by a space,
> possibly 200 items per f

Tokenization and PrefixQuery

2014-02-14 Thread Yann-Erwan Perio
Hello, I am designing a system with documents having one field containing values such as "Ae1 Br2 Cy8 ...", i.e. a sequence of items made of letters and numbers (max=7 per item), all separated by a space, possibly 200 items per field, with no limit upon the number of documents (although I would no