Very nice - it does pass now!
I wish there was a better way of incorporating the patch than just
shadowing the original StandardDirectoryReader with a patched one, but
unfortunately this class is final and FilterDirectoryReader doesn't seem
to help here, making a cleaner approach seemingly impossible…
Yes, there is also a safety check, but IMO it should be removed.
See the patch on the issue, the test passes now.
Seems to me the bug occurs regardless of whether the passed in newer reader
is NRT or non-NRT. This is because the user operates at the level of
DirectoryReader, not SegmentReader, and modifying the test code to do the
following reproduces the bug:
writer.commit();
DirectoryReader latest = …
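The snippet is cut off in the archive; a minimal sketch of the kind of test
being described (assumed setup and names, not the exact test from the
thread) might look like:

import java.util.List;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

// Assumes dir/writer exist and docs were indexed, with some deleted.
static void reproduce(Directory dir, IndexWriter writer) throws Exception {
  writer.commit();                                          // commit point T0
  DirectoryReader nrt = DirectoryReader.open(writer, true); // NRT view
  // ... more indexing and deletes happen here ...
  writer.commit();                                          // commit point T1
  List<IndexCommit> commits = DirectoryReader.listCommits(dir);
  // Reopen "backwards" from the newer NRT reader to the older commit:
  DirectoryReader latest =
      DirectoryReader.openIfChanged(nrt, commits.get(0));   // NPE reported here
}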
That's because there are 3 constructors in SegmentReader:
1. one used for opening new (checks hasDeletions, only reads liveDocs if so)
2. one used for non-NRT reopen <-- the problem one for you
3. one used for NRT reopen (takes liveDocs as a param, so no bug)
so personally I think you should be able to …
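A rough sketch of the difference between paths 1 and 3, paraphrased from
memory of the 4.x SegmentReader rather than quoted from it:

import org.apache.lucene.codecs.LiveDocsFormat;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.util.Bits;

// Paraphrased sketch of path 1 ("open new"), not actual Lucene source:
static Bits readLiveDocsIfAny(SegmentCommitInfo si) throws Exception {
  if (si.hasDeletions()) {
    LiveDocsFormat fmt = si.info.getCodec().liveDocsFormat();
    return fmt.readLiveDocs(si.info.dir, si, IOContext.READONCE);
  }
  return null; // no deletes recorded for this segment
}
// Path 3 (NRT reopen) instead receives up-to-date liveDocs as a
// constructor argument, so it never re-reads them from disk.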
One other observation - if instead of a reader opened at a later commit
point (T1), I pass in an NRT reader *without* doing the second commit on
the index prior, then there is no exception. This probably also hinges on
the assumption that no buffered docs have been flushed after T0, thus
creating new segments…
>
> Normally, reopens only go forwards in time, so if you could ensure
> that when you reopen one reader to another, the 2nd one is always
> "newer", then I think you should never hit this issue
Mike, I'm not sure if I fully understand your suggestion. In a nutshell,
the use case here is as follows…
Thanks, I'll look at the issue soon.
Right, segment merging won't spontaneously create deletes. Deletes
are only made if you explicitly delete OR (tricky) there is a
non-aborting exception (e.g. an analysis problem) hit while indexing a
document; in that case IW indexes a portion of the document and then
marks it deleted…
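For the explicit-delete case, something as small as the following is
enough to give a segment deletes (writer is an open IndexWriter; the "id"
field and value are hypothetical):

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

static void deleteOne(IndexWriter writer) throws Exception {
  writer.deleteDocuments(new Term("id", "42"));
  writer.commit(); // the affected segment now reports hasDeletions() == true
}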
Okay, created LUCENE-5931 for this. As it turns out, my original test
actually does do deletes on the index so please disregard my question about
segment merging.
I'm on 4.6.1. I'll file an issue for sure, but is there a workaround you could
think of in the meantime? As you probably remember, the reason for doing this
in the first place was to prevent the catastrophic heap exhaustion when
SegmentReader instances are opened from scratch for every new Index…
Hmm, which Lucene version are you using? We recently beefed up the
checking in this code, so you ought to be hitting an exception in
newer versions.
But that being said, I think the bug is real: if you try to reopen
from a newer NRT reader down to an older (commit point) reader then
you can hit this.
I think I see the bug here, but maybe I'm wrong. Here's my theory:
Suppose no segments at a particular commit point contain any deletes. Now,
we also hold open an NRT reader into the index, which may end up with some
deletes after the commit occurred. Then, according to the following
conditional…
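The conditional itself is cut off in the archive; purely to illustrate the
theory (hypothetical helper, not the Lucene source), the suspect decision
would look like:

// Hypothetical illustration, not Lucene source: decide whether a pooled
// SegmentReader can be reused for a commit point.
static boolean canReuseOldReader(boolean commitHasDeletions,
                                 boolean oldReaderHasDeletions) {
  // Suspect assumption: "no deletions at the commit" implies the old
  // reader is safe to share as-is. If the old NRT reader picked up
  // deletes *after* the commit, its liveDocs no longer match the commit
  // point, which would surface as a sporadic NPE downstream.
  return !commitHasDeletions;
}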
UPDATE:
After making the changes we discussed to enable sharing of SegmentReaders
between the NRT reader and a commit point reader, specifically calling
through to DirectoryReader.openIfChanged(DirectoryReader, IndexCommit), I
am seeing this exception, sporadically:
Caused by: java.lang.NullPointerException …
Yes!
On Thu, Aug 28, 2014 at 2:38 PM, Vitaly Funstein wrote: …
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
You can actually use IndexReader.openIfChanged(latestNRTReader,
IndexCommit): this should pull/share SegmentReaders from the pool
inside IW, when available. But it will fail to share e.g. a
SegmentReader no longer part of the NRT view but shared by …
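In 4.x this method lives on DirectoryReader; a usage sketch, assuming
latestNRTReader and commit already exist:

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;

// Opens a reader at the given commit, sharing SegmentReaders from IW's
// pool where possible; returns null if nothing differs from the commit.
DirectoryReader backInTime =
    DirectoryReader.openIfChanged(latestNRTReader, commit);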
On Thu, Aug 28, 2014 at 2:38 PM, Vitaly Funstein wrote:
>
> Looks like this is used inside Lucene41PostingsFormat, which simply passes
> in those defaults - so you are effectively saying the minimum (and
> therefore, maximum) block size can be raised to reduce the size of the terms
> index inside …
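If raising the block sizes is the goal, a hedged 4.6-era sketch would wrap
Lucene41PostingsFormat in a custom codec; the values 64/128 are
illustrative, not a recommendation from the thread, and BlockTreeTermsWriter
enforces maxItemsInBlock >= 2*(minItemsInBlock-1):

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat;
import org.apache.lucene.codecs.lucene46.Lucene46Codec;

// Larger min/max term-block sizes -> fewer terms-index entries -> smaller
// in-heap FSTs; 64/128 satisfies max >= 2*(min-1).
Codec codec = new Lucene46Codec() {
  private final PostingsFormat postings = new Lucene41PostingsFormat(64, 128);
  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    return postings;
  }
};
// Then set it on the writer config: new IndexWriterConfig(...).setCodec(codec)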
On Thu, Aug 28, 2014 at 1:25 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> The segments_N file can be different, that's fine: after that, we then
> re-use SegmentReaders when they are in common between the two commit
> points. Each segments_N file refers to many segments...
Thanks for the suggestions! I'll file an enhancement request.
But I am still a little skeptical about the approach of "pooling" segment
readers from prior DirectoryReader instances, opened at earlier commit
points. It looks like the up-to-date check for a non-NRT directory reader
just compares the segments_N …
Ugh, you're right: this still won't re-use from IW's reader pool. Can
you open an issue? Somehow we should make this easier.
In the meantime, I guess you can use openIfChanged from your "back in
time" reader to open another "back in time" reader. This way you have
two pools... IW's pool for the …
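A sketch of that two-pool workaround (names illustrative; commit and
otherCommit would come from DirectoryReader.listCommits): chain the "back
in time" readers so unchanged SegmentReaders are shared between them,
independently of IW's pool.

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;

// First back-in-time reader, opened directly at a commit point:
DirectoryReader backInTime = DirectoryReader.open(commit);
// Later, reopen from it at another commit; unchanged segments are shared:
DirectoryReader next = DirectoryReader.openIfChanged(backInTime, otherCommit);
if (next != null) {
  backInTime.close(); // release SegmentReaders no longer shared
  backInTime = next;
}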
Thanks, Mike - I think the issue is actually the latter, i.e. SegmentReader
on its own can certainly use enough heap to cause problems, which of course
would be made that much worse by failure to pool readers for unchanged
segments.
But where are you seeing the behavior that would result in reuse…
Can you drill down some more to see what's using those ~46 MB? Is it
the FSTs in the terms index?
But, we need to decouple the "single segment is opened with multiple
SegmentReaders" issue from e.g. "single SegmentReader is using too much RAM
to hold the terms index". E.g. from this screenshot it looks…
Here's the link:
https://drive.google.com/file/d/0B5eRTXMELFjjbUhSUW9pd2lVN00/edit?usp=sharing
I'm indexing let's say 11 unique fields per document. Also, the NRT reader
is opened continually, and "regular" searches use that one. But a special
kind of feature allows searching a particular point in time…
Hmm, screenshot didn't make it ... can you post a link?
If you are using an NRT reader then when a new one is opened, it won't
open new SegmentReaders for all segments, just for newly
flushed/merged segments since the last reader was opened. So for your
N commit points that you have readers open for, …
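That reuse falls out of the standard NRT reopen pattern, sketched below
(writer assumed to be an open IndexWriter):

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;

DirectoryReader nrt = DirectoryReader.open(writer, true);
// ... after more indexing ...
DirectoryReader newer = DirectoryReader.openIfChanged(nrt, writer, true);
if (newer != null) {  // null means nothing changed since the last open
  nrt.close();        // old reader releases only its non-shared segments
  nrt = newer;
}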
Mike,
Here's the screenshot; not sure if it will go through as an attachment
though - if not, I'll post it as a link. Please ignore the altered package
names, since Lucene is shaded in as part of our build process.
Some more context about the use case. Yes, the terms are pretty much
unique; the s…
This is surprising: unless you have an excessive number of unique
fields, BlockTreeTermsReader shouldn't be such a big RAM consumer.
But you only have 12 unique fields?
Can you post screenshots of the heap usage?
Mike McCandless
http://blog.mikemccandless.com
On Tue, Aug 26, 2014 at 3:53 PM, Vitaly Funstein wrote: …
This is a follow up to the earlier thread I started to understand memory
usage patterns of SegmentReader instances, but I decided to create a
separate post since this issue is much more serious than the heap overhead
created by use of stored field compression.
Here is the use case, once again. The …