I think this'd make a nice contribution -- eg it could be bundled up
as a FieldComparator impl, eg LowMemoryStringComparator, that would
compute the global ords in multiple passes with limited RAM usage.
It'd give users the space/time tradeoff...
Mike
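For anyone curious what that might look like: below is a rough, untested skeleton against the Lucene 2.9-era sort API. The class names just follow the suggestion above (they are not existing Lucene classes), the multipass ord build itself is stubbed out, and getting at the top-level reader from setNextReader is hand-waved.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.FieldComparatorSource;

// Sketch only: names follow the suggestion above, not an existing class.
public class LowMemoryStringComparatorSource extends FieldComparatorSource {

  public FieldComparator newComparator(String fieldname, int numHits,
                                       int sortPos, boolean reversed)
      throws IOException {
    return new LowMemoryStringComparator(fieldname, numHits);
  }

  private static class LowMemoryStringComparator extends FieldComparator {
    private final String field;
    private final int[] slotOrds;  // global ord for each competitive hit
    private int[] globalOrds;      // global docID -> global term ord
    private int docBase;
    private int bottomSlot;

    LowMemoryStringComparator(String field, int numHits) {
      this.field = field;
      this.slotOrds = new int[numHits];
    }

    public int compare(int slot1, int slot2) {
      return slotOrds[slot1] - slotOrds[slot2];
    }

    public void setBottom(int slot) {
      bottomSlot = slot;
    }

    public int compareBottom(int doc) {
      return slotOrds[bottomSlot] - globalOrds[docBase + doc];
    }

    public void copy(int slot, int doc) {
      slotOrds[slot] = globalOrds[docBase + doc];
    }

    public void setNextReader(IndexReader reader, int docBase)
        throws IOException {
      this.docBase = docBase;
      if (globalOrds == null) {
        // The interesting part: build docID -> global ord in several
        // passes over the terms, holding only a bounded buffer in RAM
        // (see the multipass sketch further down the thread). Reaching
        // the *top-level* reader from here is glossed over.
        globalOrds = buildGlobalOrdsMultipass(reader, field);
      }
    }

    public Comparable value(int slot) {
      // Raw ord only; resolving back to the term String would take
      // one more lookup pass.
      return Integer.valueOf(slotOrds[slot]);
    }

    private static int[] buildGlobalOrdsMultipass(IndexReader reader,
                                                  String field) {
      throw new UnsupportedOperationException("left as the hard part");
    }
  }
}

Usage would then be new Sort(new SortField("myfield", new LowMemoryStringComparatorSource())), trading slower ord construction for a bounded heap.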
On Mon, Dec 14, 2009 at 9:09 AM, Toke Eskildsen wrote:
On Fri, 2009-12-11 at 14:53 +0100, Michael McCandless wrote:
> How long does Lucene take to build the ords for the toplevel reader?
>
> You should be able to just time FieldCache.getStringIndex(topLevelReader).
>
> I think your 8.5 seconds for first Lucene search was with the
> StringIndex computation included.
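A minimal timing harness for that call might look like the following; the index path argument and the field name "myfield" are placeholders, and this assumes the 2.9-era FieldCache API.

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TimeStringIndex {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new java.io.File(args[0]));
    IndexReader topLevelReader = IndexReader.open(dir);
    long t0 = System.currentTimeMillis();
    // "myfield" is a placeholder for the sort field in question.
    FieldCache.StringIndex ords =
        FieldCache.DEFAULT.getStringIndex(topLevelReader, "myfield");
    long elapsed = System.currentTimeMillis() - t0;
    System.out.println("getStringIndex: " + elapsed + " ms, "
        + ords.lookup.length + " unique terms, "
        + ords.order.length + " docs");
    topLevelReader.close();
  }
}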
>> > ... The order-array is updated for the documents that have
>> > one of these terms. The sliding is repeated multiple times, where terms
>> > ordered before the last term of the previous iteration are ignored.
>> >
>> > Cons: _Very_ slow (too slow in the current implementation) order build.
>> > Pros: Same as above.
>> > Joker: The buffer size determines memory use vs. order build time.
> >
> > The multipass approach looks promising, but requires more work to get to a
> > usable state. Right now it takes minutes to build the order-array for half a
> > million documents, with a buffer size requiring 5 iterations. If I ever get
> > it to work, I'll be sure to share it.
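Toke's code isn't posted here, but the algorithm he describes can be reconstructed in toy form. This plain-Java sketch (no Lucene; docTerms stands in for one indexed term per document) keeps only bufferSize distinct terms in memory per window and slides until every document has a global ordinal:

import java.util.Arrays;
import java.util.TreeSet;

public class MultipassOrdBuilder {
  public static int[] buildOrds(String[] docTerms, int bufferSize) {
    int[] ords = new int[docTerms.length];
    String windowStart = null;  // exclusive lower bound for this window
    int ordBase = 0;
    while (true) {
      // Scan 1: collect the next bufferSize distinct terms that sort
      // after windowStart (terms at or before it are ignored).
      TreeSet<String> window = new TreeSet<String>();
      for (String t : docTerms) {
        if (windowStart != null && t.compareTo(windowStart) <= 0) continue;
        window.add(t);
        if (window.size() > bufferSize) window.pollLast(); // keep smallest N
      }
      if (window.isEmpty()) break;  // every term has been ordered; done
      String[] sorted = window.toArray(new String[0]);
      // Scan 2: assign global ordinals to docs whose term falls inside
      // the current window.
      for (int doc = 0; doc < docTerms.length; doc++) {
        int idx = Arrays.binarySearch(sorted, docTerms[doc]);
        if (idx >= 0) ords[doc] = ordBase + idx;
      }
      ordBase += sorted.length;
      windowStart = sorted[sorted.length - 1];  // slide the window
    }
    return ords;
  }
}

Each window costs two full scans of the documents, so the build takes roughly uniqueTerms / bufferSize passes over the data: exactly the memory-vs-build-time joker described above.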
On Thu, Dec 10, 2009 at 2:05 AM, Ganesh wrote:
> I think this problem will happen for all sorted fields. I am sorting on
> an integer field.
An integer field should take much less RAM than a String field for
sorting today. And there's no efficiency gained by doing this globally (per
segment is just fine).
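For comparison, the int case needs no custom machinery at all. A fragment with placeholder names, assuming an existing searcher and query (2.9-era API):

import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

// Sorting on an int field: the FieldCache holds one int[] per segment
// reader, far smaller than the String[] + int[] pair of a StringIndex.
Sort byCount = new Sort(new SortField("count", SortField.INT));
TopDocs hits = searcher.search(query, null, 10, byCount);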
>
> Regards,
> Toke Eskildsen
>
> ________
> From: TCK [moonwatcher32...@gmail.com]
> Sent: 09 December 2009 22:58
> To: java-user@lucene.apache.org
> Subject: Re: heap memory issues when sorting by a string field
It's not that it's "necessary" -- this is just how Lucene's sorting
has always worked ;) But, it's just software! You could whip up a
patch...
I'm not familiar with the order-maintenance problem & solutions
offhand, but it certainly sounds interesting.
One issue is that loading only certain values ...
Thanks Mike for opening this jira ticket and for your patch. Explicitly
removing the entry from the WHM definitely does reduce the number of GC
cycles taken to free the huge StringIndex objects that get created when
doing a sort by a string field.
But I'm still trying to figure out why it is necessary.
I've opened LUCENE-2135.
Mike
On Tue, Dec 8, 2009 at 5:36 AM, Michael McCandless wrote:
This is a rather disturbing implementation detail of WeakHashMap, that
it needs the one extra step (invoking one of its methods) for its weak
keys to be reclaimable.
Maybe on IndexReader.close(), Lucene should go and evict all entries
in the FieldCache associated with that reader. Ie, step through ...
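Something like the following is the idea. Note that FieldCache.DEFAULT.purge(reader) only exists in later Lucene releases, so on 2.9 this exact call is an assumption rather than available API:

// Evict this reader's FieldCache entries explicitly at close time,
// rather than waiting for the WeakHashMap to notice the dead key.
IndexReader reader = searcher.getIndexReader();
searcher.close();
FieldCache.DEFAULT.purge(reader);  // added in later Lucene releases
reader.close();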
TCK,
CSIndexInput is returned by SegmentReader.getFieldCacheKey().
If you think it's an issue, then it'd be good to open an issue and submit
some code as a patch, maybe a test case showing the WHM isn't
removing values like it's supposed to.
Jason
On Mon, Dec 7, 2009 at 10:45 PM, TCK wrote:
> It's an apache license - but you mentioned something about no third party
> libraries. Is that a policy for Lucene?
Pretty much... Though one can always submit a patch anyways.
On Mon, Dec 7, 2009 at 4:57 PM, Tom Hill wrote:
> Hey, that's a nice little Class! I hadn't seen it before. But it sounds like ...
Thanks for the feedback guys. The evidence I have collected does point to an
issue either in the java WeakHashMap implementation or in Lucene's use of
it. In particular, I used reflection to replace the WeakHashMap instances
with my own dummy Map that does a no-op for the put operation, and although ...
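TCK's exact code isn't shown, but the trick is roughly this. The holder object and the private field name "readerCache" are hypothetical stand-ins for Lucene's version-specific cache internals; the reflective swap itself is the point:

import java.lang.reflect.Field;
import java.util.Map;
import java.util.WeakHashMap;

public final class MapSwapper {
  static void swapForNoOpMap(Object cacheHolder) throws Exception {
    // "readerCache" is a made-up field name; adjust to the actual
    // internals of the Lucene version under test.
    Field f = cacheHolder.getClass().getDeclaredField("readerCache");
    f.setAccessible(true);
    // A map whose put() silently discards entries: if the heap still
    // grows with this in place, the cache map is not the leak.
    Map<Object, Object> noOp = new WeakHashMap<Object, Object>() {
      @Override
      public Object put(Object k, Object v) {
        return null;
      }
    };
    f.set(cacheHolder, noOp);
  }
}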
Hey, that's a nice little Class! I hadn't seen it before. But it sounds like
the asynchronous cleanup might deal with the problem I mentioned above (but
I haven't looked at the code yet).
It's an apache license - but you mentioned something about no third party
libraries. Is that a policy for Lucene?
I wonder if Google Collections' concurrent map (even though we don't use
third-party libraries), which supports weak keys, handles the removal of
weakly referenced keys more elegantly than Java's WeakHashMap?
On Mon, Dec 7, 2009 at 4:38 PM, Tom Hill wrote:
> Hi -
>
> If I understand correctly, WeakHashMap does not free the memory for the ...
Hi -
If I understand correctly, WeakHashMap does not free the memory for the
value (cached data) when the key is nulled, or even when the key is garbage
collected.
It requires one more step: a method on WeakHashMap must be called to allow
it to release its hard reference to the cached data. It appears ...
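This behavior is easy to demonstrate in isolation. The standalone demo below parks ~50 MB behind a GC'ed key and only gets the memory back after size() triggers the map's internal expunging (the exact free-memory numbers will vary by JVM):

import java.util.Map;
import java.util.WeakHashMap;

// Demo: the value survives its key's collection until some map
// operation triggers WeakHashMap's internal expunging of stale entries.
public class WeakHashMapDemo {
  public static void main(String[] args) throws Exception {
    Map<Object, byte[]> cache = new WeakHashMap<Object, byte[]>();
    Object key = new Object();
    cache.put(key, new byte[50 * 1024 * 1024]); // ~50 MB value

    key = null;    // drop the only strong reference to the key
    System.gc();   // the weak key is cleared and enqueued...
    Thread.sleep(200);

    Runtime rt = Runtime.getRuntime();
    System.out.println("used before expunge: "
        + (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024) + " MB");

    // ...but the 50 MB value is still strongly held by the map's entry
    // table. Calling almost any map method (size() here) runs
    // expungeStaleEntries() and finally releases it.
    cache.size();
    System.gc();
    System.out.println("used after expunge:  "
        + (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024) + " MB");
  }
}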
Thanks for the response. But I'm definitely calling close() on the old
reader and opening a new one (not using reopen). Also, to simplify the
analysis, I did my test with a single-threaded requester to eliminate any
concurrency issues.
I'm doing:
sSearcher.getIndexReader().close();
sSearcher.close();
What this sounds like is that you're not really closing your
readers even though you think you are. Sorting indeed uses up
significant memory when it populates internal caches and keeps
it around for later use (which is one of the reasons that warming
queries matter). But if you really do close the readers ...