Very nice - it does pass now!
I wish there was a better way of incorporating the patch than just
shadowing the original StandardDirectoryReader with a patched one, but
unfortunately this class is final and FilterDirectoryReader doesn't seem
to help here, making a cleaner approach seemingly impossible…
Yes, there is also a safety check, but IMO it should be removed.
See the patch on the issue, the test passes now.
Seems to me the bug occurs regardless of whether the passed in newer reader
is NRT or non-NRT. This is because the user operates at the level of
DirectoryReader, not SegmentReader, and modifying the test code to do the
following reproduces the bug:
writer.commit();
DirectoryReader latest = …
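The snippet is cut off in the archive; a minimal sketch of the kind of test
being described (assumed setup and names, not the exact test from the
thread) might look like:

import java.util.List;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

// Assumes dir/writer exist and docs were indexed, with some deleted.
static void reproduce(Directory dir, IndexWriter writer) throws Exception {
  writer.commit();                                          // commit point T0
  DirectoryReader nrt = DirectoryReader.open(writer, true); // NRT view
  // ... more indexing and deletes happen here ...
  writer.commit();                                          // commit point T1
  List<IndexCommit> commits = DirectoryReader.listCommits(dir);
  // Reopen "backwards" from the newer NRT reader to the older commit:
  DirectoryReader latest =
      DirectoryReader.openIfChanged(nrt, commits.get(0));   // NPE reported here
}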
That's because there are 3 constructors in SegmentReader:
1. one used for opening new (checks hasDeletions, only reads liveDocs if so)
2. one used for non-NRT reopen <-- the problem one for you
3. one used for NRT reopen (takes liveDocs as a param, so no bug)
so personally I think you should be able to …
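A rough sketch of the difference between paths 1 and 3, paraphrased from
memory of the 4.x SegmentReader rather than quoted from it:

import org.apache.lucene.codecs.LiveDocsFormat;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.util.Bits;

// Paraphrased sketch of path 1 ("open new"), not actual Lucene source:
static Bits readLiveDocsIfAny(SegmentCommitInfo si) throws Exception {
  if (si.hasDeletions()) {
    LiveDocsFormat fmt = si.info.getCodec().liveDocsFormat();
    return fmt.readLiveDocs(si.info.dir, si, IOContext.READONCE);
  }
  return null; // no deletes recorded for this segment
}
// Path 3 (NRT reopen) instead receives up-to-date liveDocs as a
// constructor argument, so it never re-reads them from disk.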
One other observation - if instead of a reader opened at a later commit
point (T1), I pass in an NRT reader *without* doing the second commit on
the index prior, then there is no exception. This probably also hinges on
the assumption that no buffered docs have been flushed after T0, thus
creating new segments…
>
> Normally, reopens only go forwards in time, so if you could ensure
> that when you reopen one reader to another, the 2nd one is always
> "newer", then I think you should never hit this issue
Mike, I'm not sure if I fully understand your suggestion. In a nutshell,
the use case here is as follows…
Thanks, I'll look at the issue soon.
Right, segment merging won't spontaneously create deletes. Deletes
are only made if you explicitly delete OR (tricky) there is a
non-aborting exception (e.g. an analysis problem) hit while indexing a
document; in that case IW indexes a portion of the document and then
marks it deleted…
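For the explicit-delete case, something as small as the following is
enough to give a segment deletes (writer is an open IndexWriter; the "id"
field and value are hypothetical):

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

static void deleteOne(IndexWriter writer) throws Exception {
  writer.deleteDocuments(new Term("id", "42"));
  writer.commit(); // the affected segment now reports hasDeletions() == true
}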
Okay, created LUCENE-5931 for this. As it turns out, my original test
actually does do deletes on the index so please disregard my question about
segment merging.
I'm on 4.6.1. I'll file an issue for sure, but is there a workaround you could
think of in the meantime? As you probably remember, the reason for doing this
in the first place was to prevent the catastrophic heap exhaustion when
SegmentReader instances are opened from scratch for every new Index…
Hmm, which Lucene version are you using? We recently beefed up the
checking in this code, so you ought to be hitting an exception in
newer versions.
But that being said, I think the bug is real: if you try to reopen
from a newer NRT reader down to an older (commit point) reader then
you can hit this.
I think I see the bug here, but maybe I'm wrong. Here's my theory:
Suppose no segments at a particular commit point contain any deletes. Now,
we also hold open an NRT reader into the index, which may end up with some
deletes after the commit occurred. Then, according to the following
conditional…
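The conditional itself is cut off in the archive; purely to illustrate the
theory (hypothetical helper, not the Lucene source), the suspect decision
would look like:

// Hypothetical illustration, not Lucene source: decide whether a pooled
// SegmentReader can be reused for a commit point.
static boolean canReuseOldReader(boolean commitHasDeletions,
                                 boolean oldReaderHasDeletions) {
  // Suspect assumption: "no deletions at the commit" implies the old
  // reader is safe to share as-is. If the old NRT reader picked up
  // deletes *after* the commit, its liveDocs no longer match the commit
  // point, which would surface as a sporadic NPE downstream.
  return !commitHasDeletions;
}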
UPDATE:
After making the changes we discussed to enable sharing of SegmentReaders
between the NRT reader and a commit point reader, specifically calling
through to DirectoryReader.openIfChanged(DirectoryReader, IndexCommit), I
am seeing this exception, sporadically:
Caused by: java.lang.NullPointerException …
Yes!
On Thu, Aug 28, 2014 at 2:38 PM, Vitaly Funstein wrote: …
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
You can actually use IndexReader.openIfChanged(latestNRTReader,
IndexCommit): this should pull/share SegmentReaders from the pool
inside IW, when available. But it will fail to share e.g. a
SegmentReader no longer part of the NRT view but shared by …
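In 4.x this method lives on DirectoryReader; a usage sketch, assuming
latestNRTReader and commit already exist:

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;

// Opens a reader at the given commit, sharing SegmentReaders from IW's
// pool where possible; returns null if nothing differs from the commit.
DirectoryReader backInTime =
    DirectoryReader.openIfChanged(latestNRTReader, commit);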
On Thu, Aug 28, 2014 at 2:38 PM, Vitaly Funstein wrote:
>
> Looks like this is used inside Lucene41PostingsFormat, which simply passes
> in those defaults - so you are effectively saying the minimum (and
> therefore, maximum) block size can be raised to reduce the size of the terms
> index inside …
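If raising the block sizes is the goal, a hedged 4.6-era sketch would wrap
Lucene41PostingsFormat in a custom codec; the values 64/128 are
illustrative, not a recommendation from the thread, and BlockTreeTermsWriter
enforces maxItemsInBlock >= 2*(minItemsInBlock-1):

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;
import org.apache.lucene.codecs.lucene46.Lucene46Codec;

// Larger min/max term-block sizes -> fewer terms-index entries -> smaller
// in-heap FSTs; 64/128 satisfies max >= 2*(min-1).
Codec codec = new Lucene46Codec() {
  private final PostingsFormat postings = new Lucene41PostingsFormat(64, 128);
  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    return postings;
  }
};
// Then set it on the writer config: new IndexWriterConfig(...).setCodec(codec)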
On Thu, Aug 28, 2014 at 1:25 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> The segments_N file can be different, that's fine: after that, we then
> re-use SegmentReaders when they are in common between the two commit
> points. Each segments_N file refers to many segments...
Thanks for the suggestions! I'll file an enhancement request.
But I am still a little skeptical about the approach of "pooling" segment
readers from prior DirectoryReader instances, opened at earlier commit
points. It looks like the up-to-date check for a non-NRT directory reader
just compares the segments_N …
Ugh, you're right: this still won't re-use from IW's reader pool. Can
you open an issue? Somehow we should make this easier.
In the meantime, I guess you can use openIfChanged from your "back in
time" reader to open another "back in time" reader. This way you have
two pools... IW's pool for the …
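A sketch of that two-pool workaround (names illustrative; commit and
otherCommit would come from DirectoryReader.listCommits): chain the "back
in time" readers so unchanged SegmentReaders are shared between them,
independently of IW's pool.

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;

// First back-in-time reader, opened directly at a commit point:
DirectoryReader backInTime = DirectoryReader.open(commit);
// Later, reopen from it at another commit; unchanged segments are shared:
DirectoryReader next = DirectoryReader.openIfChanged(backInTime, otherCommit);
if (next != null) {
  backInTime.close(); // release SegmentReaders no longer shared
  backInTime = next;
}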
Thanks, Mike - I think the issue is actually the latter, i.e. SegmentReader
on its own can certainly use enough heap to cause problems, which of course
would be made that much worse by failure to pool readers for unchanged
segments.
But where are you seeing the behavior that would result in reuse…
Can you drill down some more to see what's using those ~46 MB? Is it
the FSTs in the terms index?
But, we need to decouple the "single segment is opened with multiple
SegmentReaders" issue from e.g. "single SegmentReader is using too much RAM
to hold the terms index". E.g. from this screenshot it looks…
Here's the link:
https://drive.google.com/file/d/0B5eRTXMELFjjbUhSUW9pd2lVN00/edit?usp=sharing
I'm indexing let's say 11 unique fields per document. Also, the NRT reader
is opened continually, and "regular" searches use that one. But a special
kind of feature allows searching a particular point in time…
Hmm, screenshot didn't make it ... can you post a link?
If you are using an NRT reader then when a new one is opened, it won't
open new SegmentReaders for all segments, just for newly
flushed/merged segments since the last reader was opened. So for your
N commit points that you have readers open for, …
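That reuse falls out of the standard NRT reopen pattern, sketched below
(writer assumed to be an open IndexWriter):

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;

DirectoryReader nrt = DirectoryReader.open(writer, true);
// ... after more indexing ...
DirectoryReader newer = DirectoryReader.openIfChanged(nrt, writer, true);
if (newer != null) {  // null means nothing changed since the last open
  nrt.close();        // old reader releases only its non-shared segments
  nrt = newer;
}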
Mike,
Here's the screenshot; not sure if it will go through as an attachment
though - if not, I'll post it as a link. Please ignore the altered package
names, since Lucene is shaded in as part of our build process.
Some more context about the use case. Yes, the terms are pretty much
unique; the s…
This is surprising: unless you have an excessive number of unique
fields, BlockTreeTermsReader shouldn't be such a big RAM consumer.
But you only have 12 unique fields?
Can you post screenshots of the heap usage?
Mike McCandless
http://blog.mikemccandless.com
On Tue, Aug 26, 2014 at 3:53 PM, Vitaly Funstein wrote: …
This is a follow up to the earlier thread I started to understand memory
usage patterns of SegmentReader instances, but I decided to create a
separate post since this issue is much more serious than the heap overhead
created by use of stored field compression.
Here is the use case, once again. The …