I agree "file sitting" is not great, but at worst this causes higher
transient disk usage, which already happens if you have readers open
against those files, during merging, during CFS building, etc.

A number of users have complained about the apparent RAM usage of
WeakIdentityMap, and it adds complexity to ByteBufferIndexInput to do
this tracking ... I think defaulting the unmap hack to off is best for
users of MMapDir.

Mike McCandless

http://blog.mikemccandless.com
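
A minimal sketch of the "file sitting" behaviour discussed in this thread,
using only plain java.nio rather than Lucene itself. The class and method
names (MappedFileLifetime, readAfterChannelClose, demo) are made up for
illustration and are not part of any Lucene API. It assumes that, with the
unmap hack off, a mapping simply lives until its buffer is garbage collected,
as described in the quoted messages: closing the channel does not release
the mapping, so the file contents stay readable through the buffer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not Lucene code.
class MappedFileLifetime {

    /** Maps the file read-only, closes the channel, then reads via the mapping. */
    static String readAfterChannelClose(Path file) {
        try {
            MappedByteBuffer buffer;
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            }
            // The channel is closed at this point, but the mapping is still
            // valid: the JDK keeps it until the buffer object is collected.
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a small temp file and reads it back through a mapping. */
    static String demo() {
        try {
            Path file = Files.createTempFile("mapped-lifetime", ".bin");
            // Best effort cleanup; on Windows a live mapping may block deletion.
            file.toFile().deleteOnExit();
            Files.write(file, "segment data".getBytes(StandardCharsets.UTF_8));
            return readAfterChannelClose(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the buffer keeps the mapping alive, how long the file stays pinned
depends on when GC runs, which is why a smaller heap tends to release unused
maps sooner.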


On Thu, Aug 8, 2013 at 9:09 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> Hi Mike,
>
> I don't think disabling by default is a good idea. The problem is not only
> wasted 64-bit address space (which is not a problem at all, you are right),
> but the JVM also "sits" on those files:
> - On Windows they cannot be deleted (not even on Java 7 w/ Lucene trunk,
> where you can now delete them if not mmapped) - this may cause major pain...!
> - On POSIX the disk space is locked, so the inode can only be freed once GC
> has freed the mapping
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -----Original Message-----
>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Thursday, August 08, 2013 2:18 PM
>> To: Lucene Users
>> Subject: Re: WeakIdentityMap high memory usage
>>
>> Thanks for bringing closure.
>>
>> Note that you should still run a tight ship, i.e. don't give excess heap to
>> Lucene, and instead let the OS take up the slack of any spare RAM for IO
>> caching.  Especially with unmap disabled, the JVM will now only unmap once
>> a map is GC'd, so the larger your heap the longer these unused maps are
>> held open.
>>
>> Maybe we should disable unmap by default; I don't see what value it brings
>> for 64 bit envs.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Wed, Aug 7, 2013 at 9:34 PM, Denis Bazhenov <dot...@gmail.com>
>> wrote:
>> > Uwe, Michael, thank you very much for your help. We have deployed the
>> > change on one of the nodes in our system and tomorrow I'll have more
>> > information, but it seems that the setUseUnmap(false) trick did the job.
>> > Response time dropped significantly compared to version 3.6.0. We have
>> > about 100 rps per search node and a commit interval of about 1 minute, so
>> > switching off the unmap seems like a good idea.
>> >
>> > There is one more question. As far as I understand, this map acts like a
>> > fuse for situations where clients continue to use an IndexReader after it
>> > has already been closed. So if the code closes IndexReaders correctly (only
>> > after all clients have finished using them), there is no need for this
>> > sort of weak map tracking. Did I get it right?
>> >
>> > On Aug 8, 2013, at 4:31 AM, "Uwe Schindler" <u...@thetaphi.de> wrote:
>> >
>> >> Hi Denis,
>> >>
>> >> I assume you are using Lucene 3.6.0, because in Lucene 3.6.1 the
>> >> tracking of buffers using weak references is also done (although you
>> >> cannot switch it off, unfortunately).
>> >>
>> >> I can confirm what Mike says: It's all weak references, and the overhead
>> >> may be large, but it gets freed when memory gets low. In general it is
>> >> better in most cases not to allocate too much heap space for Lucene, as
>> >> this makes those maps larger and stresses GC. Use only as much heap as
>> >> is needed to avoid OOM, and leave the remaining memory free for the file
>> >> system cache (so there is less paging). In that case, GC will clean up
>> >> the concurrent maps faster.
>> >>
>> >> In general: If you have a large index that seldom changes, but your
>> >> query rate is very high (like 200 queries per second), switch unmapping
>> >> off (works since Lucene 4.2, see the changelog for LUCENE-4740;
>> >> unfortunately the issue itself was closed for 4.4, 4.2 would be
>> >> correct). In that case there is no need to take care of unmapping, and
>> >> as the index reopen rate is low, this does not waste resources.
>> >>
>> >> But if your index changes often, there is no way around unmapping - or
>> >> use NIOFSDir with NRTCachingDirectory for the optimization of near real
>> >> time search with highly changing indexes!
>> >>
>> >> Finally: The only way to fix this would be to make all codec structures
>> >> like TermsEnum or DocsEnum, but also Scorer/DocIdSet/..., implement
>> >> Closeable. When you are done with a Scorer you would have to close it,
>> >> and the underlying cloned IndexInput would be closed, too. In that case,
>> >> the cloned IndexInput would be refcounted and unmapped when the last
>> >> clone is closed. This is a larger change and might be an idea for Lucene
>> >> 5.0 as an "optimization". It would be a backwards break because all
>> >> codecs and all queries would need to close correctly, but with our test
>> >> framework and MockDirWrapper (and other MockFooBarWrappers) we could
>> >> track this so all resources are closed.
>> >> We had TermEnum.close() up to Lucene 3.x, but it was dropped in 4.0
>> >> because it was never working in 3.x (nobody ever called close() on
>> >> TermEnum or TermDocs instances.... :( ). With our new test framework this
>> >> could be tracked now... So maybe worth a try?
>> >>
>> >> Uwe
>> >>
>> >> -----
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: u...@thetaphi.de
>> >>
>> >>> -----Original Message-----
>> >>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> >>> Sent: Wednesday, August 07, 2013 3:45 PM
>> >>> To: Lucene Users
>> >>> Subject: Re: WeakIdentityMap high memory usage
>> >>>
>> >>> This map is used to track all cloned open files, which can be a very
>> >>> large number over time (each search will create maybe 3 of them).
>> >>>
>> >>> This is done as a "best effort" to prevent SEGV (JVM dies) if you
>> >>> accidentally try to use an IndexReader after it was closed, while using
>> >>> MMapDirectory.
>> >>>
>> >>> However, it's a weak map, which means when HEAP is tight GC should
>> >>> drop it.
>> >>>
>> >>> So, this should not cause a real problem in "real life", even though
>> >>> it looks scary when you look at its RAM usage under a profiler.
>> >>>
>> >>> If somehow it's causing "real life" problems, please report back!
>> >>> But a simple workaround is to call MMapDirectory.setUseUnmap(false)
>> >>> to turn off this tracking; this means you rely on GC to (eventually)
>> >>> unmap.
>> >>>
>> >>> Mike McCandless
>> >>>
>> >>> http://blog.mikemccandless.com
>> >>>
>> >>>
>> >>> On Wed, Aug 7, 2013 at 2:45 AM, Denis Bazhenov
>> >>> <bazhe...@farpost.com>
>> >>> wrote:
>> >>>> We have upgraded from Lucene 3.6 to 4.4. In production we faced high
>> >>>> minor GC times. A heap dump showed that one of the biggest objects by
>> >>>> size is org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference:
>> >>>> about 11 million instances with about 377 megabytes of memory in total
>> >>>> (this is not even retained size). Here is a screenshot of the JProfiler
>> >>>> output:
>> >>>> https://dl.dropboxusercontent.com/u/16254496/Screen%20Shot%202013-08-07%20at%205.35.22%20PM.png
>> >>>>
>> >>>> The keys of the map are MMapIndexInput. What is this map for, and how
>> >>>> can I reduce its memory usage?
>> >>>> ---
>> >>>> Denis Bazhenov <bazhe...@farpost.com> FarPost.
>> >>>>
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >>>>
>> >>
>> >
>> > ---
>> > Denis Bazhenov <dot...@gmail.com>
