Hi Uwe,

Thank you for your help; it is greatly appreciated. Unfortunately, all of my tests fail except RangeInclusive. I've changed the precision step to 6 as you recommended. I had it at the maximum to rule out the precision step as the cause of the test failures.

Essentially, all keys in Cassandra are UTF-8 keys. In Lucandra, the keys are constructed in the following way.

1. Get the token stream for the field. In this case it's a NumericTokenStream with (numeric,valSize=64,precisionStep=6)
2. For all tokens in the stream, create a UTF-8 string in the format <fieldname>\uffff<token value>
3. Set the term frequency to 1

This gives us a list of tokens, each prefixed with the field name and the delimiter. Then, for each term from above, we create a key of the format <indexname>\uffff<fieldname>\uffff<token value> and write it to the TermInfo column family.

After debugging the implementation of LucandraTermEnum, I can see that it correctly returns values that should match my numeric range query. However, those results never appear in the TopDocs result set after they are handed back to the numeric range query object. Any ideas why this is happening?

Thanks,
Todd

On Wed, 2010-06-23 at 08:53 +0200, Uwe Schindler wrote:
> Hi Todd,
>
> I am not sure if I understand your problem correctly. I am not familiar
> with Lucandra/Cassandra at all, but if Lucandra implements the IndexWriter
> and IndexReader according to the documentation, numeric queries should
> work. A NumericField internally creates a TokenStream and "analyzes" the
> number into several tokens, which are somehow "half binary" (they are
> terms consisting of characters in the full 0..127 range for optimal UTF-8
> compression with 3.x versions of Lucene). The exact encoding can be looked
> at in the NumericUtils class + javadocs.
>
> About your test case: The test looks good, so does it fail? If yes, where
> is the problem? You can also look into Lucene's test
> TestNumericRangeQuery64 for more examples. Or modify its @BeforeClass to
> instead build a Lucandra index.
>
> The test has one thing that is not intended to be done like that:
> numeric = new NumericField("long", Integer.MAX_VALUE, Store.YES, true);
>
> You are using MAX_VALUE as precision step; this would slow down all
> queries to the speed of old-style TermRangeQueries. It is always better to
> stick with the default of 4, which creates 64 bits / 4 precStep = 16 terms
> per value. Alternatively, for longs, 6 is a good precision step (see
> NumericRangeQuery documentation). MAX_VALUE is only intended for fields
> that do not do numeric ranges but e.g. sort only. precisionStep is a
> performance tuning parameter; it has nothing to do with better/worse
> precision on terms or different query results. If you are using
> NumericRangeQuery with this large precStep, you are not using the numeric
> features at all, so your test should not behave differently from a
> conventional TermRangeQuery with padded terms.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -----Original Message-----
> > From: Todd Nine [mailto:t...@spidertracks.co.nz]
> > Sent: Wednesday, June 23, 2010 7:53 AM
> > To: java-user@lucene.apache.org
> > Subject: Help with Numeric Range
> >
> > Hi all,
> > I'm new to Lucene, as well as Cassandra. I'm working on the Lucandra
> > project to modify it to add some extra functionality. It hasn't been
> > fully tested with range queries, so I've created some tests and
> > contributed them. You can view my source here.
> >
> > http://github.com/tnine/Lucandra/blob/master/test/lucandra/NumericRangeTests.java
> >
> > First, is this a sensible test? I'm specifically testing the case of
> > longs where I need millisecond precision on my searches.
> >
> > Second, I see that Numeric Fields are built via terms. I think the issue
> > lies in the encoding of these terms into bytes for the Cassandra keys.
> > Can anyone point me to some documentation on numeric queries and terms,
> > and how they are encoded at the byte level based on the precision?
> >
> > Thanks,
> > Todd
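
For reference, here is a minimal sketch of the two pieces of arithmetic discussed in this thread: the key layout <indexname>\uffff<fieldname>\uffff<token value> from the steps at the top, and the number of terms a 64-bit value yields for a given precision step (64 / 4 = 16 in Uwe's reply). It deliberately uses no Lucene or Lucandra classes. The class name, method names, and the placeholder token strings are hypothetical, and the term count assumes one term per shift 0, step, 2*step, ... below the value size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or Lucandra.
final class LucandraKeySketch {
    // Delimiter between index name, field name, and token value.
    static final String DELIMITER = "\uffff";

    // Terms emitted per value, assuming one term per shift
    // 0, step, 2*step, ... while shift < valSize (i.e. ceil(valSize / step)).
    static int termsPerValue(int valSize, int precisionStep) {
        // Use long arithmetic so precisionStep = Integer.MAX_VALUE cannot overflow.
        return (int) (((long) valSize + precisionStep - 1) / precisionStep);
    }

    // Builds one key of the form <indexname>DELIMITER<fieldname>DELIMITER<token value>.
    static String termKey(String indexName, String fieldName, String tokenValue) {
        return indexName + DELIMITER + fieldName + DELIMITER + tokenValue;
    }

    // Builds one key per token value, preserving token order.
    static List<String> termKeys(String indexName, String fieldName, List<String> tokenValues) {
        List<String> keys = new ArrayList<String>();
        for (String token : tokenValues) {
            keys.add(termKey(indexName, fieldName, token));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("step 4: " + termsPerValue(64, 4) + " terms per long");
        System.out.println("step 6: " + termsPerValue(64, 6) + " terms per long");
        System.out.println("step MAX_VALUE: " + termsPerValue(64, Integer.MAX_VALUE) + " term per long");

        // "t0" and "t1" are placeholders; real token values are the
        // prefix-coded strings produced by the NumericTokenStream.
        List<String> tokens = new ArrayList<String>();
        tokens.add("t0");
        tokens.add("t1");
        System.out.println(termKeys("myindex", "long", tokens).size() + " keys built");
    }
}
```

Under that assumption, a precision step of 6 should leave 11 keys per long value in the TermInfo column family, and each of them has to round-trip through the UTF-8 key encoding unchanged for the range query to match.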