Hi Eric,

Yes they help for particularly this problem - they shrik all avilable Longs to 14 character strings. So they free me from the limitation that I wrote about (1970 - 2280 years). So now the advantage that I have with treating fields representing UTC dates as numbers (but not as strings) seems to be some times smaller. If storing UTCs in radix 36 and considering the real situation in my case - I have only 2 UTC-fields with which keep dates of theese years I will save approximate 20MB for 1 million docs which is 0,5 - 2 % of the index size.

But may be this approach that I describe in my first mail (and tested it successfully by now) will help to someone who uses float values in the index or longs that vary reasonably in length.

Best Regards,
Ivan

Erik Hatcher wrote:
Ivan - have you considered using NumberUtils?

<http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/document/NumberTools.html>


I'm curious if those utility methods solve the same problem you're working on.

    Erik


On Sep 13, 2007, at 1:19 PM, Ivan Vasilev wrote:

Hi All,

I have made some changes in my Lucene source, so that values of numeric fields to be treated as numbers but not as Strings. After testing everything seems to work correctly, but I still would like to know your opinion about this.
So my approach is the following:

1. As during the indexing process the terms are ordered according their values I made changes in the methods TermBuffer.compareTo(TermBuffer other) and Term.compareTo(Term other) so that when the filed contains numeric data comparison to be made based on numeric logic (Integer.compareto(..), Float.compareTo(..)). So this orders terms based on numeric logic but not based on lexicographical one.

2. To work correctly range searches similar changes were made in RangeFilter.bits(IndexReader reader) and RangeQuery.rewrite(IndexReader reader) methods.

Changes seem to be very simple, but I did not found case when they lead to wrong behavior. Before these changes to make range queries on numeric fields I made values of those fields with fixed length so that the lexicographical order to be the same like the numeric one. So I had to keep dates in some fields and I made them 13 length fields that keep UTC representation of the date. When the UTC was short I prefixed it with zeroes. This made range searches to work correctly for the time interval 1970 - 2280 approximately. Now with the new implementation I do not have any restrictions about this time interval.

So before some time I digged in Lucene forum and saw there were some discussions about this. So if anybody uses also such approach, or have bad experience with it, please tell me.
Tanks in advance.


Best Regards,
Ivan

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


__________ NOD32 2528 (20070913) Information __________

This message was checked by NOD32 antivirus system.
http://www.eset.com





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to