Thanks for all your ideas. I was expecting the sorting-related fix in 3.0, but 
it would be great if it gets into 3.1.

I am using an Integer for the datetime. As the data grows, I am hitting the upper 
memory limit. As my application is part of a product used in different environments, 
we cannot ask all customers to increase the memory. We need to run with less 
and find some other way out.

One option that several people in the group discussed is using Ehcache. 

Let's consider the data below being indexed. unique_id is the id generated for 
every record.
unique_id,  field1,  field2,  date_time

In Ehcache, consider that I am storing
unique_id, date_time

How could I merge the results from Lucene and Ehcache? Do I need to fetch all 
the search results, compare them against the Ehcache entries and decide the 
order (using FieldComparatorSource)? Could someone help me with how to go about it? 
Consider the data to be largely static: there will be updates and modifications, 
but the unique_id and date_time are not going to change. I cannot use the docid 
from Lucene, as it changes after updates.
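
To make the question concrete, here is a minimal sketch of the "fetch all hits and order them by the cached value" approach. It does not use the Lucene or Ehcache APIs: the list of ids stands in for the unique_id values read from the hits, and a plain Map stands in for the cache. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: orders search hits by a date_time held outside the index.
class ExternalDateSort {

    // hitIds: unique_id values taken from the search hits (stand-in for Lucene).
    // dateTimeById: unique_id -> date_time lookup (stand-in for Ehcache).
    static List<String> sortByExternalDate(List<String> hitIds,
                                           Map<String, Integer> dateTimeById) {
        List<String> sorted = new ArrayList<>(hitIds);
        // Assumption: ids missing from the cache sort last.
        sorted.sort(Comparator.comparingInt(
                (String id) -> dateTimeById.getOrDefault(id, Integer.MAX_VALUE)));
        return sorted;
    }
}
```

This needs every hit in memory at once, so it only moves the cost from the field cache to the result list. The same lookup could presumably sit behind a custom comparator instead, so that only the top hits are kept.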

(Future thought / research) One more thought: is there any way to write the 
index in sorted order, maybe while merging, by assigning docids according to 
the sort order of a selected field? This way we could achieve sorting with zero 
extra RAM. In most applications the sort field is fixed. I am just interested 
to know about these things.
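
A small sketch of what I mean, outside Lucene: if the documents were renumbered in date_time order, sorting by date would be the same as sorting by docid, and no per-document array would be needed at search time. The class and method names are hypothetical, and this is not something Lucene does for me today as far as I know.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: compute a docid assignment that follows date_time order.
class SortedDocIdSketch {

    // Returns the old doc ids in ascending date_time order; position i in the
    // result is the old doc id that would become new doc id i.
    static int[] oldIdsInDateOrder(int[] dateTimeByOldDocId) {
        Integer[] ids = new Integer[dateTimeByOldDocId.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Stable sort, so documents with equal dates keep their old relative order.
        Arrays.sort(ids, Comparator.comparingInt((Integer id) -> dateTimeByOldDocId[id]));
        int[] result = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            result[i] = ids[i];
        }
        return result;
    }
}
```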

Regards
Ganesh

----- Original Message ----- 
From: "Toke Eskildsen" <t...@statsbiblioteket.dk>
To: <java-user@lucene.apache.org>; "Toke Eskildsen" <t...@statsbiblioteket.dk>
Sent: Thursday, December 17, 2009 9:33 PM
Subject: RE: External sort


Sigh... Forest for the trees, Toke.

The date time represented as an integer _is_ the order and at the same time a 
global external representation. No need to jump through all the hoops I made. I 
had my mind full of a more general solution to the memory problem with sort.


Instead of the order-array being a long[] with order and datetime, it should be 
just an int[] with datetime. The FieldCacheImpl does this for INT-sorts, so 
there's no need for extra code if you just store the datetime as an integer 
(something like Integer.toString(datetimeAsInt) for the field-value) and use 
SortField(fieldname, SortField.INT) to sort with.
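
A minimal sketch of producing such a field value, assuming the datetime is kept as whole seconds since the epoch in UTC (the granularity is an assumption, not something stated in this thread). The class and method names are hypothetical, and the Lucene indexing call itself is left out.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper: turns a datetime into the string stored in the index.
class DateTimeAsInt {

    // Assumption: second granularity, UTC. Seconds since the epoch fit in an
    // int until January 2038; toIntExact throws instead of overflowing silently.
    static int toInt(LocalDateTime dateTime) {
        return Math.toIntExact(dateTime.toEpochSecond(ZoneOffset.UTC));
    }

    // The value to index, so that an INT sort can parse it back.
    static String toFieldValue(LocalDateTime dateTime) {
        return Integer.toString(toInt(dateTime));
    }
}
```

The resulting string is what would go into the field that SortField(fieldname, SortField.INT) sorts on.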

If you cannot store the datetime as an integer (e.g. if you don't control the 
indexing), you can use the FieldCacheImpl with a custom int-parser that 
translates your datetime representation to int.
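
As a rough sketch of the translation such a parser would do: the code below assumes the indexed terms look like yyyyMMddHHmmss, which is only a guess at your representation, and maps them to seconds since the epoch. It is plain Java with a hypothetical class name; wiring it into Lucene means putting the same logic behind the parser's parseInt(String) method, if I recall the interface correctly.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical parser logic: indexed datetime term -> sortable int.
class DateTimeIntParser {

    // Assumption: terms are stored as yyyyMMddHHmmss, in UTC.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static int parseInt(String term) {
        LocalDateTime dateTime = LocalDateTime.parse(term, FORMAT);
        // Throws ArithmeticException if the value does not fit in an int.
        return Math.toIntExact(dateTime.toEpochSecond(ZoneOffset.UTC));
    }
}
```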

The internal representation takes 4 bytes/document. If you need to go lower 
than that, I'd say you have a very uncommon setup.

It can be done by making custom code and storing the order-array on disk, but 
access-speed would suffer tremendously for searches with millions of hits. An 
alternative would be to reduce the granularity of the datetime and use 
SortField.SHORT or SortField.BYTE. A third alternative would be to count the 
number of unique datetime-values and make a compressed representation, but that 
would make the creation of the order-array more complex.
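
The last two alternatives can be sketched in plain Java as follows. The class name, the day granularity and the limit on unique values are assumptions for illustration; neither method is existing Lucene code.

```java
import java.util.Arrays;

// Hypothetical sketch of two ways to get below 4 bytes/document.
class CompactOrder {

    // Reduced granularity: whole days instead of seconds. Any int number of
    // seconds yields a day count between -24855 and 24855, which fits a short.
    static short toDays(int epochSeconds) {
        return (short) (epochSeconds / 86400);
    }

    // Compressed representation: replace each datetime by its rank among the
    // unique values. The ranks sort exactly like the original datetimes.
    static short[] toOrdinals(int[] dateTimes) {
        int[] unique = Arrays.stream(dateTimes).sorted().distinct().toArray();
        if (unique.length > Short.MAX_VALUE + 1) {
            throw new IllegalArgumentException("too many unique values for a short");
        }
        short[] ordinals = new short[dateTimes.length];
        for (int i = 0; i < dateTimes.length; i++) {
            // Every value is present in 'unique', so the result is its index.
            ordinals[i] = (short) Arrays.binarySearch(unique, dateTimes[i]);
        }
        return ordinals;
    }
}
```

The ordinal variant keeps full precision as long as there are at most 32768 distinct datetimes, but the extra sorting pass is the added complexity mentioned above.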
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
