RE: Bigrams for CJK with ICUTokenizer ?

2011-02-04 Thread Burton-West, Tom
Thanks Robert, I opened LUCENE-2906. But I just realized that, in the effort to keep the description short, I forgot to include your option of producing both unigrams and bigrams, which is a nice option. Tom -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Friday, Fe

Re: Bigrams for CJK with ICUTokenizer ?

2011-02-04 Thread Robert Muir
On Fri, Feb 4, 2011 at 3:07 PM, Burton-West, Tom wrote:
> Thanks Robert,
>
> Lucene 2740 looks really interesting. In the meantime a JIRA issue for this sounds like a good idea since I'm guessing other people would like to use the ICUTokenizer but would also like bigrams for CJK.
>
> I'm a

RE: Bigrams for CJK with ICUTokenizer ?

2011-02-04 Thread Burton-West, Tom
Thanks Robert, Lucene 2740 looks really interesting. In the meantime a JIRA issue for this sounds like a good idea since I'm guessing other people would like to use the ICUTokenizer but would also like bigrams for CJK. I'm a bit confused over the relationship of the queryparser to the filter c

Re: Bigrams for CJK with ICUTokenizer ?

2011-02-04 Thread Robert Muir
On Fri, Feb 4, 2011 at 12:46 PM, Burton-West, Tom wrote:
> Hello all,
>
> We are using the ICUTokenizer because we have documents in about 400 different languages. We are also setting autoGeneratePhraseQueries to false so that CJK and other languages that don't use space to separate words

Bigrams for CJK with ICUTokenizer ?

2011-02-04 Thread Burton-West, Tom
Hello all, We are using the ICUTokenizer because we have documents in about 400 different languages. We are also setting autoGeneratePhraseQueries to false so that CJK and other languages that don't use space to separate words won't get tokenized properly by the ICUTokenizer and then the toke
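
The bigram idea discussed in this thread, including the option of emitting unigrams alongside bigrams, can be sketched in plain Java. This is only an illustration under stated assumptions: it is not the ICUTokenizer, not the code attached to any JIRA issue, and the class and method names (CjkBigramSketch, tokens) are hypothetical. It assumes the input is already a single run of CJK characters and forms overlapping pairs of code points.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigramSketch {

    /**
     * Hypothetical helper: turns one run of CJK text into overlapping
     * bigrams of code points. When emitUnigrams is true, every single
     * character is emitted as well. A run of length one always yields
     * its single character so that it is not lost.
     */
    static List<String> tokens(String run, boolean emitUnigrams) {
        int[] cps = run.codePoints().toArray();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < cps.length; i++) {
            if (emitUnigrams || cps.length == 1) {
                out.add(new String(cps, i, 1));   // unigram at position i
            }
            if (i + 1 < cps.length) {
                out.add(new String(cps, i, 2));   // bigram starting at position i
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three Han characters, written as escapes to keep the source ASCII-only.
        String run = "\u6771\u4EAC\u90FD";
        System.out.println(tokens(run, false)); // bigrams only
        System.out.println(tokens(run, true));  // unigrams and bigrams interleaved
    }
}
```

Working on code points rather than char values keeps supplementary-plane characters intact. Whether unigrams are worth the extra index size is a judgment call that the sketch leaves to the caller.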

Re: Syntax for Numeric Range

2011-02-04 Thread Anuj Shah
In my case the query engine is very generic and, like the QueryParser, doesn't know about the fields. So I can't decide between a TermRangeQuery and a NumericRangeQuery. How about a syntax like:
    numericfield:[{1 TO 10}]
Using Luke, this seems to parse into a TermRangeQuery with the { }

RE: Syntax for Numeric Range

2011-02-04 Thread Uwe Schindler
You have everything you need to implement this. This is much easier than changing the syntax.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Anuj Shah [mailto:anujshahw...@gmail.com]
> Sent: Friday, February

Re: Syntax for Numeric Range

2011-02-04 Thread Anuj Shah
Hi, I see why the existing syntax cannot be used to automatically generate a NumericRange. But is it possible to extend the QueryParser to include additional syntax for a numeric range, e.g. numericfield:[1;10]? The user can be trained to use this syntax for certain fields (i.e. those that I kn
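
The separate-syntax idea raised in this thread can be illustrated with a small stand-alone sketch. It uses no Lucene classes: RangeSyntaxSketch and classify are hypothetical names, the regular expressions are assumptions about what the two forms look like, and the method only reports which kind of range a clause would map to. Building an actual TermRangeQuery or NumericRangeQuery is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeSyntaxSketch {

    // Assumed shape of the proposed numeric form: field:[1;10]
    private static final Pattern NUMERIC =
            Pattern.compile("(\\w+):\\[(-?\\d+);(-?\\d+)\\]");

    // Assumed shape of the existing inclusive form: field:[a TO b]
    private static final Pattern TERM =
            Pattern.compile("(\\w+):\\[(\\S+) TO (\\S+)\\]");

    /**
     * Hypothetical dispatcher: decides from the syntax alone, without any
     * knowledge of the field, whether a clause is a numeric or a term range.
     */
    static String classify(String clause) {
        Matcher n = NUMERIC.matcher(clause);
        if (n.matches()) {
            long lo = Long.parseLong(n.group(2));
            long hi = Long.parseLong(n.group(3));
            return "numeric:" + n.group(1) + ":" + lo + ".." + hi;
        }
        Matcher t = TERM.matcher(clause);
        if (t.matches()) {
            return "term:" + t.group(1) + ":" + t.group(2) + ".." + t.group(3);
        }
        throw new IllegalArgumentException("not a range clause: " + clause);
    }

    public static void main(String[] args) {
        System.out.println(classify("numericfield:[1;10]"));
        System.out.println(classify("title:[a TO c]"));
    }
}
```

The point of the design is that the user, not the parser, carries the knowledge of which fields are numeric, which matches the constraint that the query engine is generic.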