On Fri, 2009-12-18 at 12:47 +0100, Ganesh wrote:
> I am using Integer for datetime. As the data grows, I am hitting the
> upper limit.
Could you give us some numbers? Document count, index size in GB, amount
of RAM available for Lucene?
> One option that most people in the group discussed was usin
To: ...; "Toke Eskildsen"
Sent: Thursday, December 17, 2009 9:33 PM
Subject: RE: External sort
(reposted to the list, as I originally sent my reply only to Marvin by mistake)
From: Marvin Humphrey [mar...@rectangular.com]
> We use a hash set to perform uniquing, but when there are a lot of unique
> values, the memory requirements of the SortWriter go very high.
>
> Assume that we're sorting
On Thu, Dec 17, 2009 at 09:33:11AM -0800, Marvin Humphrey wrote:
> On Thu, Dec 17, 2009 at 05:03:11PM +0100, Toke Eskildsen wrote:
>
> > A third alternative would be to count the number of unique datetime-values
> > and make a compressed representation, but that would make the creation of
> > the
On Thu, Dec 17, 2009 at 05:03:11PM +0100, Toke Eskildsen wrote:
> A third alternative would be to count the number of unique datetime-values
> and make a compressed representation, but that would make the creation of
> the order-array more complex.
This is what we're doing in Lucy/KinoSearch, tho
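The "compressed representation" idea above can be sketched roughly as follows. This is not Lucy/KinoSearch's SortWriter code; every class and method name here is hypothetical, per-document values are assumed to be available as a plain `int[]`, and uniquing is done by sort-and-dedupe rather than the hash set Marvin mentions. Each document's datetime is replaced by its ordinal in the sorted table of unique values, and the ordinals are bit-packed with only as many bits as the unique count requires.

```java
import java.util.Arrays;

// Hypothetical sketch (not Lucy/KinoSearch code): map each document's
// datetime to an ordinal in the sorted list of unique values, then store
// the ordinals bit-packed using only as many bits as the unique count needs.
public class UniqueValueOrdinals {

    /** Sorted, de-duplicated copy of the per-document values. */
    static int[] uniqueSorted(int[] docValues) {
        int[] copy = docValues.clone();
        Arrays.sort(copy);
        int n = 0;
        for (int i = 0; i < copy.length; i++) {
            if (n == 0 || copy[i] != copy[n - 1]) {
                copy[n++] = copy[i];
            }
        }
        return Arrays.copyOf(copy, n);
    }

    /** Ordinal of each document's value; ordinals sort exactly like the values. */
    static int[] toOrdinals(int[] docValues, int[] unique) {
        int[] ords = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            ords[doc] = Arrays.binarySearch(unique, docValues[doc]);
        }
        return ords;
    }

    /** Bits needed to store ordinals 0..count-1 (at least one bit). */
    static int bitsRequired(int count) {
        return count <= 1 ? 1 : 32 - Integer.numberOfLeadingZeros(count - 1);
    }

    /** Packs ords[doc] into a long[] at `bits` bits per document. */
    static long[] pack(int[] ords, int bits) {
        long[] packed = new long[(int) (((long) ords.length * bits + 63) / 64)];
        for (int doc = 0; doc < ords.length; doc++) {
            long bitPos = (long) doc * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = ords[doc];
            packed[word] |= v << shift;
            if (shift + bits > 64) {
                // The value straddles two longs: put its high bits in the next word.
                packed[word + 1] |= v >>> (64 - shift);
            }
        }
        return packed;
    }

    /** Reads back the ordinal stored for one document. */
    static int get(long[] packed, int bits, int doc) {
        long bitPos = (long) doc * bits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = packed[word] >>> shift;
        if (shift + bits > 64) {
            v |= packed[word + 1] << (64 - shift);
        }
        return (int) (v & ((1L << bits) - 1));
    }
}
```

For example, one million unique datetimes need `bitsRequired(1_000_000) == 20` bits per document instead of 64 for a `long[]` entry, plus 4 bytes per unique value for the lookup table. As Toke notes, building the ordinals is the more complex part: it needs the full set of unique values before any document can be assigned one.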
Sigh... Forest for the trees, Toke.
The datetime represented as an integer _is_ the order and, at the same time, a
global external representation. No need to jump through all the hoops I described. I
had my mind full of a more general solution to the memory problem with sort.
Instead of the order-ar
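Toke's point, that an integer datetime already *is* the sort order and means the same thing in every shard, can be illustrated with a small sketch. It is hypothetical: the `Hit` class and the `topN`/`merge` helpers are illustrative names, not Lucene API, and each shard's per-document datetimes are assumed to be available as an `int[]`. Each shard picks its newest hits by comparing the raw ints, and the coordinator merges shard results by comparing the same ints, with no separate order array.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: with datetimes stored as int seconds-since-epoch,
// the value itself is the sort key, both within a shard and across shards.
public class IntValueSort {

    /** A hit: which shard, the shard-local doc id, and its datetime value. */
    static final class Hit {
        final int shard, doc, datetime;

        Hit(int shard, int doc, int datetime) {
            this.shard = shard;
            this.doc = doc;
            this.datetime = datetime;
        }
    }

    /** Newest first (comparingInt uses Integer.compare, so no overflow). */
    static final Comparator<Hit> NEWEST_FIRST =
            Comparator.comparingInt((Hit h) -> h.datetime).reversed();

    /** The n newest documents of one shard, using only its int[] of values. */
    static List<Hit> topN(int shard, int[] datetimes, int n) {
        // Min-heap on the raw value: the root is the oldest hit kept so far.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingInt((Hit h) -> h.datetime));
        for (int doc = 0; doc < datetimes.length; doc++) {
            heap.add(new Hit(shard, doc, datetimes[doc]));
            if (heap.size() > n) {
                heap.poll(); // drop the oldest
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(NEWEST_FIRST);
        return result;
    }

    /** Merges per-shard results; the raw ints are directly comparable across shards. */
    static List<Hit> merge(List<List<Hit>> perShard, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        all.sort(NEWEST_FIRST);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

The cost per document is the 4-byte value itself. A scheme based on per-index ordinals would instead need an extra step to map ordinals back to comparable values before shard results could be merged.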
On Thu, 2009-12-17 at 15:34 +0100, Ganesh wrote:
> Thanks Toke. I am worried about using long[] inverted = new long[reader.maxDoc()];
> as the memory consumption will be high for millions of documents.
Well, how many documents do you have? 10 million? That's just 160MB in
overhead, of which the 80MB are te
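The arithmetic behind the 160MB figure can be checked with a quick calculation. The breakdown is an assumption, because the reply is truncated: it treats the overhead as two arrays of 8 bytes per document (such as two `long[maxDoc]` arrays, one of them temporary) and ignores the small per-array JVM header. MB here means 10^6 bytes.

```java
// Back-of-the-envelope estimate; assumes the 160MB is two 8-byte-per-document
// arrays (the original message is truncated, so this split is a guess).
public class SortMemoryEstimate {

    /** Bytes for one primitive array with one entry per document (array header ignored). */
    static long arrayBytes(long maxDoc, int bytesPerEntry) {
        return maxDoc * bytesPerEntry;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;
        long oneLongArray = arrayBytes(maxDoc, Long.BYTES);    // 80,000,000 bytes
        long twoLongArrays = 2 * oneLongArray;                 // 160,000,000 bytes
        long oneIntArray = arrayBytes(maxDoc, Integer.BYTES);  // 40,000,000 bytes
        System.out.printf("long[]: %d MB, two long[]: %d MB, int[]: %d MB%n",
                oneLongArray / 1_000_000, twoLongArrays / 1_000_000, oneIntArray / 1_000_000);
    }
}
```

Running it prints `long[]: 80 MB, two long[]: 160 MB, int[]: 40 MB`. The last figure shows why a 32-bit datetime halves the per-array cost compared with a `long`.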
Sent: Thursday, December 17, 2009 6:07 PM
Subject: Re: External sort
> On Thu, 2009-12-17 at 07:48 +0100, Ganesh wrote:
>> I am using v2.9.1. I have multiple shards and I need to do only
>> datetime sorting. Sorting consumes 50% of RAM.
>
> I'm guessing that your date-times are repres
On Thu, 2009-12-17 at 07:48 +0100, Ganesh wrote:
> I am using v2.9.1. I have multiple shards and I need to do only
> datetime sorting. Sorting consumes 50% of RAM.
I'm guessing that your date-times are representable in 32 bits (signed
seconds since epoch or such - that'll work until 2038)? I
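The 2038 limit mentioned above follows directly from the largest signed 32-bit value. The sketch below uses only `java.time`. The helper names are hypothetical, and storing datetimes as whole seconds since the Unix epoch is an assumption about Ganesh's encoding. `Math.toIntExact` makes an out-of-range datetime fail loudly instead of silently wrapping.

```java
import java.time.Instant;

// Sketch: signed 32-bit seconds since 1970-01-01T00:00:00Z, the encoding
// Toke guesses at (an assumption about how the index stores datetimes).
public class Epoch32 {

    /** Latest instant representable as signed 32-bit epoch seconds. */
    static Instant maxInstant() {
        return Instant.ofEpochSecond(Integer.MAX_VALUE);
    }

    /** Converts to int seconds; throws ArithmeticException instead of wrapping. */
    static int toEpochSeconds(Instant t) {
        return Math.toIntExact(t.getEpochSecond());
    }

    public static void main(String[] args) {
        System.out.println(maxInstant()); // 2038-01-19T03:14:07Z
    }
}
```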
There was a recent thread on neat algos that can compute the global
order in multiple passes to keep RAM usage low.
Mike
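The thread Mike refers to is not quoted here, so the following is only one possible multi-pass scheme, not necessarily the algorithm discussed there. It is a bucketed distribution sort. Pass 1 finds the value range, pass 2 counts documents per value bucket, and then one pass per bucket sorts only that bucket's documents and assigns positions starting after all earlier buckets. `MultiPassOrder`, `globalOrder` and `bucketOf` are hypothetical names, and the `int[] values` argument stands in for re-reading each document's datetime from the index on every pass.

```java
import java.util.Arrays;

// Hypothetical multi-pass sketch: compute each document's position in the
// global datetime order while only materialising one bucket of
// (value, doc) pairs at a time. More buckets = more passes, less temporary RAM.
public class MultiPassOrder {

    /** Returns order[doc] = position of doc when sorted by value (ties broken by doc id). */
    static int[] globalOrder(int[] values, int buckets) {
        int maxDoc = values.length;
        int[] order = new int[maxDoc];
        if (maxDoc == 0) return order;

        // Pass 1: find the value range.
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long span = (long) max - min + 1; // long avoids overflow for wide ranges

        // Pass 2: count documents per bucket so each bucket knows its starting position.
        int[] counts = new int[buckets];
        for (int v : values) counts[bucketOf(v, min, span, buckets)]++;

        // One further pass per bucket: sort only that bucket's documents.
        int offset = 0;
        for (int b = 0; b < buckets; b++) {
            long[] pairs = new long[counts[b]];
            int n = 0;
            for (int doc = 0; doc < maxDoc; doc++) {
                if (bucketOf(values[doc], min, span, buckets) == b) {
                    // value * 2^32 + doc: sorting the longs orders by value, then doc id.
                    pairs[n++] = ((long) values[doc] << 32) | doc;
                }
            }
            Arrays.sort(pairs);
            for (int i = 0; i < n; i++) {
                order[(int) pairs[i]] = offset + i; // low 32 bits hold the doc id
            }
            offset += n;
        }
        return order;
    }

    /** Equal-width value buckets; assumes a modest bucket count so the product fits in a long. */
    static int bucketOf(int v, int min, long span, int buckets) {
        return (int) (((long) v - min) * buckets / span);
    }
}
```

Temporary memory is 8 bytes times the largest bucket rather than 8 bytes times `maxDoc`, at the cost of `2 + buckets` scans. With skewed datetimes, equal-width buckets can be very uneven, so a real implementation would probably choose bucket boundaries from the counts. The `order` array itself still needs one `int` per document.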
On Thu, Dec 17, 2009 at 1:48 AM, Ganesh wrote:
> Hello all,
>
> We are facing serious issues related to sorting in our production environment. I
> know this issue has been discu