On Fri, 2009-12-18 at 12:47 +0100, Ganesh wrote:
> I am using Integer for datetime. As the data grows, I am hitting the
> upper limit.
Could you give us some numbers? Document count, index size in GB, amount
of RAM available for Lucene?
> One option most of them in the group discussed about using […]
To: "Toke Eskildsen"
Sent: Thursday, December 17, 2009 9:33 PM
Subject: RE: External sort
(reposted to the list as I originally sent my reply to only Marvin by mistake)
From: Marvin Humphrey [mar...@rectangular.com]
> We use a hash set to perform uniquing, but when there are a lot of unique
> values, the memory requirements of the SortWriter go very high.
>
> Assume that we're sorting […]
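Marvin's description above can be illustrated with a toy sketch. This is not Lucy's actual SortWriter (the class and method names here are invented); it only shows why uniquing with a hash map makes memory grow with the number of *unique* values rather than the number of documents:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of hash-based uniquing: each distinct field value gets
// a dense ordinal, and per-doc ordinals are recorded. The map's size -- and
// hence RAM use -- tracks the number of unique values.
public class UniquingSketch {
    public static int[] ordinals(int[] perDocValues) {
        Map<Integer, Integer> unique = new HashMap<>();
        int[] ords = new int[perDocValues.length];
        for (int doc = 0; doc < perDocValues.length; doc++) {
            // size() is read before the new entry is inserted, so ordinals are dense.
            ords[doc] = unique.computeIfAbsent(perDocValues[doc], v -> unique.size());
        }
        return ords;
    }

    public static void main(String[] args) {
        int[] timestamps = {1000, 2000, 1000, 3000};
        int[] ords = ordinals(timestamps);
        // Docs 0 and 2 share a value, so they share an ordinal.
        System.out.println(ords[0] == ords[2]); // true
    }
}
```

With a date-time field at one-second resolution, nearly every document can carry a unique value, so the map grows to roughly one entry per document.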
On Thu, Dec 17, 2009 at 09:33:11AM -0800, Marvin Humphrey wrote:
> On Thu, Dec 17, 2009 at 05:03:11PM +0100, Toke Eskildsen wrote:
>
> > A third alternative would be to count the number of unique datetime-values
> > and make a compressed representation, but that would make the creation of
> the order-array more complex.
On Thu, Dec 17, 2009 at 05:03:11PM +0100, Toke Eskildsen wrote:
> A third alternative would be to count the number of unique datetime-values
> and make a compressed representation, but that would make the creation of
> the order-array more complex.
This is what we're doing in Lucy/KinoSearch, though […]
Sigh... Forest for the trees, Toke.
The date time represented as an integer _is_ the order and at the same time a
global external representation. No need to jump through all the hoops I made. I
had my mind full of a more general solution to the memory problem with sort.
Instead of the order-array […]
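Marvin's point can be shown in a few lines (my own sketch, not Lucy or Lucene code): when the sort field is already an integer, the raw values are themselves a global sort key, so ranking documents needs no separate order-array:

```java
import java.util.Comparator;
import java.util.stream.IntStream;

// The int datetime values ARE the order: comparing documents by their raw
// per-doc int value yields the global sort without any order-array.
public class DirectIntSort {
    // Returns doc IDs sorted ascending by their timestamp.
    public static Integer[] docsByTime(int[] secondsSinceEpoch) {
        return IntStream.range(0, secondsSinceEpoch.length)
                .boxed()
                .sorted(Comparator.comparingInt(doc -> secondsSinceEpoch[doc]))
                .toArray(Integer[]::new);
    }

    public static void main(String[] args) {
        int[] times = {300, 100, 200};          // one int per document
        Integer[] order = docsByTime(times);
        System.out.println(java.util.Arrays.toString(order)); // [1, 2, 0]
    }
}
```

The trade against ordinal-based sorting is per-comparison cost, not memory: the values array is the only structure held in RAM.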
The 80MB are temporary on the first search.
> Any ideas on building an external sort cache?
You could dump the order-array on disk (huge performance hit on
conventional harddisks), but it's hard to avoid the temporary
inverse-array upon first search. Of course, you could generate it on
index build […]
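A minimal sketch of such a cache (the file name, binary format, and class names are invented for illustration; this is not what Lucene 2.9 ships): dump the order-array to disk at index-build time, load it on open, and build the temporary inverse-array Toke mentions:

```java
import java.io.*;
import java.nio.file.*;

// Sketch of an on-disk "external sort cache": persist the order-array
// (rank -> doc) so the first search does not have to recompute it, then
// invert it in memory to get the doc -> rank array used for comparisons.
public class OrderArrayCache {
    public static void dump(int[] order, Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(order.length);
            for (int v : order) out.writeInt(v);
        }
    }

    public static int[] load(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int[] order = new int[in.readInt()];
            for (int i = 0; i < order.length; i++) order[i] = in.readInt();
            return order;
        }
    }

    // rank -> doc becomes doc -> rank: the temporary inverse-array.
    public static int[] invert(int[] order) {
        int[] inverse = new int[order.length];
        for (int rank = 0; rank < order.length; rank++) inverse[order[rank]] = rank;
        return inverse;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sortcache", ".bin");
        dump(new int[]{2, 0, 1}, file);
        int[] inverse = invert(load(file));
        System.out.println(java.util.Arrays.toString(inverse)); // [1, 2, 0]
    }
}
```

Persisting the order-array moves its cost off the first search, but as Toke notes, the inverse-array still has to be materialized in RAM (or also precomputed at build time).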
Thanks Toke. I am worried about using long[] inverted = new long[reader.maxDoc()]; as
the memory consumption will be high for millions of documents.
Any ideas on building an external sort cache?
Regards
Ganesh
- Original Message -
From: "Toke Eskildsen"
To:
Sent: Thursday, December
On Thu, 2009-12-17 at 07:48 +0100, Ganesh wrote:
> I am using v2.9.1. I have multiple shards and I need to do only
> date-time sorting. Sorting consumes 50% of RAM.
I'm guessing that your date-times are representable in 32 bits (signed
seconds since epoch or such - that'll work until 2038)? I […]
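A back-of-the-envelope sketch of Toke's suggestion (class and method names are mine): a signed 32-bit count of seconds since the epoch covers dates up to 2038-01-19 and halves the per-document footprint versus long, which is also where the thread's 80MB figure for a long[] over ~10 million documents comes from:

```java
// Signed 32-bit seconds-since-epoch as a compact sort key: valid until
// January 2038, and 4 bytes per document instead of 8.
public class EpochSeconds {
    public static int toEpochSeconds(long epochMillis) {
        long seconds = epochMillis / 1000L;
        if (seconds > Integer.MAX_VALUE) {
            throw new ArithmeticException("past 2038: does not fit in 32 bits");
        }
        return (int) seconds;
    }

    public static void main(String[] args) {
        // 2^31 - 1 seconds is roughly 68 years after 1970, i.e. early 2038.
        System.out.println(toEpochSeconds(1_261_130_831_000L)); // 1261130831

        // Memory arithmetic for 10 million documents:
        long docs = 10_000_000L;
        System.out.println(docs * Long.BYTES);    // 80000000 bytes: the ~80MB long[]
        System.out.println(docs * Integer.BYTES); // 40000000 bytes with int[]
    }
}
```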
There was a recent thread on neat algos that can compute the global
order in multiple passes to keep RAM usage low.
Mike
On Thu, Dec 17, 2009 at 1:48 AM, Ganesh wrote:
> Hello all,
>
> We are facing serious issues related to Sorting in production environment. I
> know this issue has been discussed […]
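The multi-pass algorithms Mike refers to are not spelled out in this thread; one way such a scheme can work (a bucketed two-pass ranking; my own sketch, not any specific Lucene patch) is: pass 1 only counts values per bucket of the key range, pass 2 ranks one bucket at a time, so peak memory is a single bucket rather than the whole index:

```java
import java.util.*;

// Two-pass global ranking with bounded memory. Pass 1 builds a histogram
// over buckets of the key range, turning bucket sizes into starting ranks.
// Pass 2 revisits the values one bucket at a time and sorts only that
// bucket's documents, assigning each its global rank.
public class MultiPassRank {
    public static int[] ranks(int[] values, int numBuckets) {
        int min = Arrays.stream(values).min().orElse(0);
        int max = Arrays.stream(values).max().orElse(0);
        long span = (long) max - min + 1;

        // Pass 1: count per bucket, then prefix-sum into starting ranks.
        int[] start = new int[numBuckets + 1];
        for (int v : values) start[bucket(v, min, span, numBuckets) + 1]++;
        for (int b = 0; b < numBuckets; b++) start[b + 1] += start[b];

        // Pass 2: only one bucket's doc IDs are held and sorted at a time.
        int[] rank = new int[values.length];
        for (int b = 0; b < numBuckets; b++) {
            List<Integer> docs = new ArrayList<>();
            for (int doc = 0; doc < values.length; doc++) {
                if (bucket(values[doc], min, span, numBuckets) == b) docs.add(doc);
            }
            docs.sort(Comparator.comparingInt(doc -> values[doc]));
            for (int i = 0; i < docs.size(); i++) rank[docs.get(i)] = start[b] + i;
        }
        return rank;
    }

    private static int bucket(int v, int min, long span, int numBuckets) {
        return (int) (((long) v - min) * numBuckets / span);
    }

    public static void main(String[] args) {
        int[] times = {500, 100, 900, 300};
        System.out.println(Arrays.toString(ranks(times, 2))); // [2, 0, 3, 1]
    }
}
```

In a real index the per-doc values would be streamed from disk on each pass rather than held in an array, but the RAM bound is the same: one bucket's worth of documents.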
Hello all,
We are facing serious issues related to sorting in our production environment. I
know this issue has been discussed in this group. I am using v2.9.1. I have
multiple shards and I need to do only date-time sorting. Sorting
consumes 50% of RAM.
Has anyone attempted to use an external sort […]