Re: External sort

2009-12-18 Thread Toke Eskildsen
On Fri, 2009-12-18 at 12:47 +0100, Ganesh wrote: > I am using Integer for datetime. As the data grows, I am hitting the > upper limit. Could you give us some numbers? Document count, index size in GB, amount of RAM available for Lucene? > One option most of them in the group discussed about usin

Re: External sort

2009-12-18 Thread Ganesh
o: ; "Toke Eskildsen" Sent: Thursday, December 17, 2009 9:33 PM Subject: RE: External sort Sigh... Forest for the trees, Toke. The date time represented as an integer _is_ the order and at the same time a global external representation. No need to jump through all the hoops I made. I h

RE: External sort

2009-12-17 Thread Toke Eskildsen
(reposted to the list as I originally sent my reply to only Marvin by mistake) From: Marvin Humphrey [mar...@rectangular.com] > We use a hash set to perform uniquing, but when there are a lot of unique > values, the memory requirements of the SortWriter go very high. > > Assume that we're sorting

Re: External sort

2009-12-17 Thread Marvin Humphrey
On Thu, Dec 17, 2009 at 09:33:11AM -0800, Marvin Humphrey wrote: > On Thu, Dec 17, 2009 at 05:03:11PM +0100, Toke Eskildsen wrote: > > > A third alternative would be to count the number of unique datetime-values > > and make a compressed representation, but that would make the creation of > > the

Re: External sort

2009-12-17 Thread Marvin Humphrey
On Thu, Dec 17, 2009 at 05:03:11PM +0100, Toke Eskildsen wrote: > A third alternative would be to count the number of unique datetime-values > and make a compressed representation, but that would make the creation of > the order-array more complex. This is what we're doing in Lucy/KinoSearch, tho
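The compressed representation mentioned in the quoted text (count the unique datetime values, then store a small ordinal per document) could look roughly like the following. This is only a minimal sketch under stated assumptions: it is not Lucy/KinoSearch or Lucene code, the class and method names are hypothetical, and it computes the ordinals and the bit width per ordinal without doing the actual bit-packing.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene or Lucy/KinoSearch.
public class OrdinalSketch {
    /** Returns the distinct values in ascending order. */
    static long[] uniqueSorted(long[] values) {
        return Arrays.stream(values).distinct().sorted().toArray();
    }

    /** Maps each document's value to its position among the unique values. */
    static int[] ordinals(long[] values, long[] unique) {
        int[] ords = new int[values.length];
        for (int doc = 0; doc < values.length; doc++) {
            // Every value is present in 'unique', so the result is never negative.
            ords[doc] = Arrays.binarySearch(unique, values[doc]);
        }
        return ords;
    }

    /** Number of bits needed to store one ordinal in the range [0, uniqueCount). */
    static int bitsPerOrdinal(int uniqueCount) {
        if (uniqueCount <= 1) {
            return 1;
        }
        return 32 - Integer.numberOfLeadingZeros(uniqueCount - 1);
    }
}
```

Comparing two documents then only needs their ordinals, and the fewer unique values there are, the fewer bits each entry requires.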

RE: External sort

2009-12-17 Thread Toke Eskildsen
Sigh... Forest for the trees, Toke. The date time represented as an integer _is_ the order and at the same time a global external representation. No need to jump through all the hoops I made. I had my mind full of a more general solution to the memory problem with sort. Instead of the order-ar

Re: External sort

2009-12-17 Thread Toke Eskildsen
he 80MB are temporary on the first search. > Any idea of building external sort cache? You could dump the order-array on disk (huge performance hit on conventional harddisks), but it's hard to avoid the temporary inverse-array upon first search. Of course, you could generate it on index bu
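Dumping the order-array to disk, as suggested above, might be sketched like this. It is an illustration only, assuming the order-array is an int[] indexed by document id; the class and method names are hypothetical and nothing here is Lucene API. For brevity the writer buffers the whole array in memory once, and each lookup costs one positional read, which is where the performance hit on conventional hard disks would come from.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene.
public class DiskOrderArray {
    /** Writes the order-array as consecutive 4-byte big-endian ints. */
    static void write(Path file, int[] order) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(order.length * Integer.BYTES);
        buf.asIntBuffer().put(order); // fills buf's content, leaves buf's position at 0
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    /** Reads the order value for one document with a positional read. */
    static int read(FileChannel ch, int docId) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        long start = (long) docId * Integer.BYTES;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, start + buf.position());
            if (n < 0) {
                throw new EOFException("docId " + docId + " is past the end of the file");
            }
        }
        buf.flip();
        return buf.getInt();
    }
}
```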

Re: External sort

2009-12-17 Thread Ganesh
Thanks Toke. I am worried about using long[] inverted = new long[reader.maxDoc()]; as the memory consumption will be high for millions of documents. Any idea of building external sort cache? Regards Ganesh - Original Message - From: "Toke Eskildsen" To: Sent: Thursday, December

Re: External sort

2009-12-17 Thread Toke Eskildsen
On Thu, 2009-12-17 at 07:48 +0100, Ganesh wrote: > I am using v2.9.1. I have multiple shards and I need to do only > date-time sorting. Sorting consumes 50% of RAM. I'm guessing that your date-times are representable in 32 bits (signed seconds since epoch or such - that'll work until 2038)? I
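The 32-bit guess above can be illustrated with a small sketch. It assumes date-times are stored as signed seconds since the Unix epoch in UTC; the class and method names are hypothetical and not taken from the thread. Because the integer grows with time, comparing two such values compares the date-times directly.

```java
import java.time.Instant;

// Hypothetical helper for illustration only.
public class EpochSeconds32 {
    /** Encodes an instant as signed 32-bit seconds since the epoch. */
    static int toSeconds32(Instant t) {
        // Throws ArithmeticException if the value does not fit in an int.
        return Math.toIntExact(t.getEpochSecond());
    }

    /** Decodes signed 32-bit epoch seconds back to an instant. */
    static Instant fromSeconds32(int seconds) {
        return Instant.ofEpochSecond(seconds);
    }
}
```

The largest representable value, Integer.MAX_VALUE seconds, corresponds to 2038-01-19T03:14:07Z, which is the 2038 limit mentioned above.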

Re: External sort

2009-12-17 Thread Michael McCandless
There was a recent thread on neat algos that can compute the global order in multiple passes to keep RAM usage low. Mike On Thu, Dec 17, 2009 at 1:48 AM, Ganesh wrote: > Hello all, > > We are facing serious issues related to sorting in our production environment. I > know this issue has been discu

External sort

2009-12-16 Thread Ganesh
Hello all, We are facing serious issues related to sorting in our production environment. I know this issue has been discussed in this group. I am using v2.9.1. I have multiple shards and I need to do only date-time sorting. Sorting consumes 50% of RAM. Has anyone attempted to use external