Hi Mike,

I don't think disabling unmap by default is a good idea. The issue is not only the wasted 64 bit address space (which is not a problem at all, you are right); the JVM also "sits" on those files:
- On Windows they cannot be deleted (not even on Java 7 with Lucene trunk, where you can now delete them if not mmapped) - this may cause major pain...!
- On POSIX the disk space stays locked, so the inode can only be freed once GC has freed the mapping
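
The point about the JVM "sitting" on mapped files can be reproduced with plain java.nio, without Lucene. The sketch below is only an illustration: the class name MappingOutlivesChannel and the helper readAfterClose are made up for this example. It relies on the documented behaviour of FileChannel.map that a mapping stays valid after the channel is closed; when the mapping is actually released is left to GC.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappingOutlivesChannel {

    /** Maps a temp file, closes the channel, then reads through the still-valid mapping. */
    public static String readAfterClose(String content) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));

        MappedByteBuffer buffer;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } // the channel is closed here; the mapping itself is not released

        // Reading still works because the buffer keeps the mapping alive.
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);

        try {
            // Expected to fail on Windows while the file is still mapped.
            Files.delete(file);
        } catch (IOException e) {
            // Fall back to a best-effort cleanup attempt at JVM exit.
            file.toFile().deleteOnExit();
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterClose("still mapped"));
    }
}
```

With unmapping switched off, every closed index file stays in this state until the buffer is collected, which is why a larger heap (and therefore less frequent GC) keeps files and disk space pinned for longer.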

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Thursday, August 08, 2013 2:18 PM
> To: Lucene Users
> Subject: Re: WeakIdentityMap high memory usage
>
> Thanks for bringing closure.
>
> Note that you should still run a tight ship, ie don't give excess heap to
> Lucene, and instead let the OS take up the slack of any spare RAM for IO
> caching. Especially with unmap disabled, the JVM will now only unmap once
> a map is GC'd, so the larger your heap the longer these unused maps are
> held open.
>
> Maybe we should disable unmap by default; I don't see what value it brings
> for 64 bit envs.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Aug 7, 2013 at 9:34 PM, Denis Bazhenov <dot...@gmail.com> wrote:
> > Uwe, Michael, thank you very much for your help. We have deployed one of the nodes in our system and tomorrow I'll have more information on that, but it seems that the setUseUnmap(false) trick did the job. RT has dropped significantly compared to the 3.6.0 version. We have about 100 rps per search node and a commit interval of about 1 minute, so switching off the unmap seems like a good idea.
> >
> > There is one more question. As far as I understand, this map is like a fuse for situations where clients continue to use an IndexReader after it is already closed. So if the code is correctly closing IndexReaders (only after all clients have finished using them), there is no need to use this sort of weak map hashing. Did I get it right?
> >
> > On Aug 8, 2013, at 4:31 AM, "Uwe Schindler" <u...@thetaphi.de> wrote:
> >
> >> Hi Denis,
> >>
> >> I assume you are using Lucene 3.6.0, because in Lucene 3.6.1 the tracking of buffers using weak references is also done (although you cannot switch it off, unfortunately).
> >>
> >> I can confirm what Mike says: It's all weak references and the overhead may be large, but it gets freed when memory gets low. In general, it is in most cases better not to allocate too much heap space for Lucene, as this makes those maps larger and stresses GC. Only use as much memory as is needed to avoid OOM, and leave all remaining memory to the file system cache (so it has less paging). In that case, GC will clean up the concurrent maps faster.
> >>
> >> In general: If you have a large index that changes seldom, but your query rate is very high (like 200 queries per second), switch unmapping off (works since Lucene 4.2, see changelog for LUCENE-4740 - unfortunately the issue itself was closed for 4.4, 4.2 would be correct). In that case there is no need to take care of unmapping, and as the index reopen rate is low, this does not waste resources.
> >>
> >> But if your index changes often, there is no way around unmapping - or use NIOFSDir with NRTCachingDirectory for the optimization of near real time search with highly changing indexes!
> >>
> >> Finally: The only way to fix this would be to make all codec structures like TermsEnum or DocsEnum, but also Scorer/DocIdSet/..., implement Closeable. When you are done with a Scorer you have to close it, and the underlying cloned IndexInput would be closed, too. In that case, the cloned IndexInput would be refcounted and unmapped when the last clone is closed. This is a larger change and might be an idea for Lucene 5.0 as an "optimization". It would be a backwards break because all codecs and all queries would need to close correctly, but with our test framework and MockDirWrapper (and other MockFooBarWrappers) we could track this so all resources are closed.
> >> We had TermEnum.close() up to Lucene 3.x, but it was dropped in 4.0 because it was never working in 3.x (nobody ever called close() on TermEnum or TermDocs instances.... :( ).
> >> With our new test framework this could be tracked now... So maybe worth a try?
> >>
> >> Uwe
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>> -----Original Message-----
> >>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> >>> Sent: Wednesday, August 07, 2013 3:45 PM
> >>> To: Lucene Users
> >>> Subject: Re: WeakIdentityMap high memory usage
> >>>
> >>> This map is used to track all cloned open files, which can be a very large number over time (each search will create maybe 3 of them).
> >>>
> >>> This is done as a "best effort" to prevent SEGV (JVM dies) if you accidentally try to use an IndexReader after it was closed, while using MMapDirectory.
> >>>
> >>> However, it's a weak map, which means when HEAP is tight GC should drop it.
> >>>
> >>> So, this should not cause a real problem in "real life", even though it looks scary when you look at its RAM usage under a profiler.
> >>>
> >>> If somehow it's causing "real life" problems, please report back! But a simple workaround is to call MMapDirectory.setUseUnmap(false) to turn off this tracking; this means you rely on GC to (eventually) unmap.
> >>>
> >>> Mike McCandless
> >>>
> >>> http://blog.mikemccandless.com
> >>>
> >>>
> >>> On Wed, Aug 7, 2013 at 2:45 AM, Denis Bazhenov <bazhe...@farpost.com> wrote:
> >>>> We have upgraded from Lucene 3.6 to 4.4. In production we faced high minor GC time. A heap dump showed that one of the biggest objects by size is org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference: about 11 million instances with about 377 megabytes of memory in total (this is not even retained size). Here is a screenshot of the JProfiler output: https://dl.dropboxusercontent.com/u/16254496/Screen%20Shot%202013-08-07%20at%205.35.22%20PM.png.
> >>>>
> >>>> The keys of the map are MMapIndexInput. What is this map for, and how can I reduce its memory usage?
> >>>> ---
> >>>> Denis Bazhenov <bazhe...@farpost.com> FarPost.
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> > ---
> > Denis Bazhenov <dot...@gmail.com>
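
Uwe's suggestion earlier in the thread, that cloned IndexInputs could be refcounted and unmapped when the last clone is closed, might look roughly like the sketch below. This is not Lucene code: RefCountedMapping, Mapping and Input are hypothetical names invented for the illustration, and the "unmap" step is reduced to setting a flag.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedMapping {

    /** Hypothetical stand-in for a mapped region shared by an input and its clones. */
    public static final class Mapping {
        private final AtomicInteger refCount = new AtomicInteger(1);
        private volatile boolean released = false;

        void incRef() {
            int current;
            do {
                current = refCount.get();
                // Never resurrect a mapping whose last reference is already gone.
                if (current <= 0) throw new IllegalStateException("mapping already released");
            } while (!refCount.compareAndSet(current, current + 1));
        }

        void decRef() {
            // A real implementation would unmap the buffer when the count reaches zero.
            if (refCount.decrementAndGet() == 0) released = true;
        }

        public boolean isReleased() { return released; }
    }

    /** Hypothetical input handle; each handle is assumed to be used by one thread. */
    public static final class Input implements AutoCloseable {
        private final Mapping mapping;
        private boolean closed = false;

        private Input(Mapping mapping) { this.mapping = mapping; }

        public static Input open() { return new Input(new Mapping()); }

        /** Clones share the mapping and each hold one reference to it. */
        public Input cloneInput() {
            if (closed) throw new IllegalStateException("input is closed");
            mapping.incRef();
            return new Input(mapping);
        }

        public boolean mappingReleased() { return mapping.isReleased(); }

        @Override
        public void close() {
            if (!closed) {       // closing twice is a no-op
                closed = true;
                mapping.decRef();
            }
        }
    }

    public static void main(String[] args) {
        Input original = Input.open();
        Input clone = original.cloneInput();
        original.close();
        System.out.println(clone.mappingReleased()); // mapping still held by the clone
        clone.close();
        System.out.println(clone.mappingReleased()); // last reference dropped
    }
}
```

The catch Uwe describes shows up directly here: the scheme only works if every holder of a clone (TermsEnum, DocsEnum, Scorer, ...) reliably calls close(), which is the backwards-incompatible part.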