I agree "file sitting" is not great, but at worst this causes higher transient disk usage, which already happens if you have readers open against those files, during merging, during CFS building, etc.

A number of users have complained about the apparent RAM usage of
WeakIdentityMap, and it adds complexity to ByteBufferIndexInput to do
this tracking ... I think defaulting the unmap hack to off is best for
users of MMapDir.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Aug 8, 2013 at 9:09 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> Hi Mike,
>
> I don't think disabling by default is a good idea. It is not only 64 bit
> wasted address space (which is not a problem at all, you are right), but the
> JVM also "sits" on those files:
> - On Windows they cannot be deleted (not even on Java 7 w/ Lucene trunk,
>   where you can now delete them if not mmapped) - this may cause major pain...!
> - On POSIX the disk space is locked, so the inode can only be freed once GC
>   has freed the mapping
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -----Original Message-----
>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Thursday, August 08, 2013 2:18 PM
>> To: Lucene Users
>> Subject: Re: WeakIdentityMap high memory usage
>>
>> Thanks for bringing closure.
>>
>> Note that you should still run a tight ship, i.e. don't give excess heap to
>> Lucene, and instead let the OS take up the slack of any spare RAM for IO
>> caching. Especially with unmap disabled, the JVM will now only unmap once
>> a map is GC'd, so the larger your heap the longer these unused maps are
>> held open.
>>
>> Maybe we should disable unmap by default; I don't see what value it brings
>> for 64 bit envs.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Wed, Aug 7, 2013 at 9:34 PM, Denis Bazhenov <dot...@gmail.com> wrote:
>> > Uwe, Michael, thank you very much for your help. We have deployed one
>> > of the nodes in our system and tomorrow I'll have more information on
>> > that, but it seems that the setUseUnmap(false) trick did the job. RT
>> > drops significantly compared to the 3.6.0 version. We have about 100 rps
>> > per search node and a commit interval of about 1 minute, so switching
>> > off the unmap seems like a good idea.
>> >
>> > There is one more question. As far as I understand, this map is like a
>> > fuse for situations where clients continue to use an IndexReader after
>> > it is already closed. So if the code is correctly closing IndexReaders
>> > (only after all clients have finished using them), there is no need to
>> > use this sort of weak map hashing. Did I get it right?
>> >
>> > On Aug 8, 2013, at 4:31 AM, "Uwe Schindler" <u...@thetaphi.de> wrote:
>> >
>> >> Hi Denis,
>> >>
>> >> I assume you are using Lucene 3.6.0, because in Lucene 3.6.1 the
>> >> tracking of buffers using weak references is also done (although you
>> >> cannot switch it off, unfortunately).
>> >>
>> >> I can confirm what Mike says: It's all weak references and the
>> >> overhead is maybe large, but it gets freed when memory gets low. In
>> >> general it's in most cases better to not allocate too much heap space
>> >> for Lucene, as this makes those maps larger and GC gets stressed. Only
>> >> use as much memory as needed to avoid OOM and instead free all
>> >> remaining memory for the file system cache (so it has less paging). In
>> >> that case, GC will clean up the concurrent maps faster.
>> >>
>> >> In general: If you have a large index that changes seldom, but your
>> >> query rate is very high (like 200 queries per second), switch
>> >> unmapping off (works since Lucene 4.2, see changelog for LUCENE-4740 -
>> >> unfortunately the issue itself was closed for 4.4, 4.2 would be
>> >> correct). In that case it's not needed to take care of unmapping and,
>> >> as the index reopen rate is low, this does not waste resources.
>> >>
>> >> But if your index changes often, there is no way around unmapping - or
>> >> use NIOFSDir with NRTCachingDirectory for the optimization of near
>> >> real time search with highly changing indexes!
>> >>
>> >> Finally: The only way to fix this would be to make all codec
>> >> structures like TermsEnum or DocsEnum, but also Scorer/DocIdSet/...
>> >> implement Closeable. When you are done with a Scorer you have to close
>> >> it and the underlying cloned IndexInput would be closed, too. In that
>> >> case, the cloned IndexInput would be refcounted and unmapped when the
>> >> last clone is closed. This is a larger change and might be an idea for
>> >> Lucene 5.0 as "optimization". It would be a backwards break because
>> >> all codecs and all queries would need to close correctly, but with our
>> >> test framework and MockDirWrapper (and other MockFooBarWrappers) we
>> >> could track this so all resources are closed.
>> >>
>> >> We had TermEnum.close() up to Lucene 3.x, but it was dropped in 4.0
>> >> because it was never working in 3.x (nobody ever called close() on
>> >> TermEnum or TermDocs instances.... :( ). With our new test framework
>> >> this could be tracked now... So maybe worth a try?
>> >>
>> >> Uwe
>> >>
>> >> -----
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: u...@thetaphi.de
>> >>
>> >>> -----Original Message-----
>> >>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> >>> Sent: Wednesday, August 07, 2013 3:45 PM
>> >>> To: Lucene Users
>> >>> Subject: Re: WeakIdentityMap high memory usage
>> >>>
>> >>> This map is used to track all cloned open files, which can be a very
>> >>> large number over time (each search will create maybe 3 of them).
>> >>>
>> >>> This is done as a "best effort" to prevent SEGV (JVM dies) if you
>> >>> accidentally try to use an IndexReader after it was closed, while
>> >>> using MMapDirectory.
>> >>>
>> >>> However, it's a weak map, which means when HEAP is tight GC should
>> >>> drop it.
>> >>>
>> >>> So, this should not cause a real problem in "real life", even though
>> >>> it looks scary when you look at its RAM usage under a profiler.
>> >>>
>> >>> If somehow it's causing "real life" problems, please report back!
>> >>> But a simple workaround is to call MMapDirectory.setUseUnmap(false)
>> >>> to turn off this tracking; this means you rely on GC to (eventually)
>> >>> unmap.
>> >>>
>> >>> Mike McCandless
>> >>>
>> >>> http://blog.mikemccandless.com
>> >>>
>> >>>
>> >>> On Wed, Aug 7, 2013 at 2:45 AM, Denis Bazhenov
>> >>> <bazhe...@farpost.com> wrote:
>> >>>> We have upgraded from Lucene 3.6 to 4.4. In production we faced high
>> >>>> minor GC time. A heap dump showed that one of the biggest objects by
>> >>>> size is org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference:
>> >>>> about 11 million instances with about 377 megabytes of memory in
>> >>>> total (this is not even retained size). Here is a screenshot of the
>> >>>> JProfiler output:
>> >>>> https://dl.dropboxusercontent.com/u/16254496/Screen%20Shot%202013-08-07%20at%205.35.22%20PM.png
>> >>>>
>> >>>> The keys of the map are MMapIndexInput. What is this map for and how
>> >>>> can I reduce its memory usage?
>> >>>> ---
>> >>>> Denis Bazhenov <bazhe...@farpost.com> FarPost.
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >
>> > ---
>> > Denis Bazhenov <dot...@gmail.com>
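
Editor's sketch: Uwe's suggestion in the thread is that cloned IndexInputs could be refcounted, with the mapping released when the last clone is closed. The following is a minimal, JDK-only illustration of that idea, not Lucene code. The class name RefCountedInput, the methods cloneInput() and openHandles(), and the Runnable standing in for the real unmap call are all hypothetical and chosen only for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Lucene's implementation): a handle whose clones
 * share one reference count. The shared resource is released exactly once,
 * when the last open handle is closed.
 */
public final class RefCountedInput implements AutoCloseable {
    // Shared by the original handle and all of its clones.
    private final AtomicInteger refCount;
    // Stand-in for the real unmap operation (assumption for this sketch).
    private final Runnable release;
    // Per-handle flag so a repeated close() does not decrement twice.
    private final AtomicBoolean closed = new AtomicBoolean(false);

    public RefCountedInput(Runnable release) {
        this(new AtomicInteger(1), release);
    }

    private RefCountedInput(AtomicInteger refCount, Runnable release) {
        this.refCount = refCount;
        this.release = release;
    }

    /** Returns a new handle on the same resource and bumps the shared count. */
    public RefCountedInput cloneInput() {
        ensureOpen();
        int n;
        do {
            n = refCount.get();
            if (n == 0) {
                throw new IllegalStateException("resource already released");
            }
        } while (!refCount.compareAndSet(n, n + 1));
        return new RefCountedInput(refCount, release);
    }

    /** Number of handles (original plus clones) that are still open. */
    public int openHandles() {
        return refCount.get();
    }

    /** Fails with an exception instead of touching a released resource. */
    public void ensureOpen() {
        if (closed.get()) {
            throw new IllegalStateException("this handle is closed");
        }
    }

    @Override
    public void close() {
        // Only the first close() of a handle counts; the last one releases.
        if (closed.compareAndSet(false, true) && refCount.decrementAndGet() == 0) {
            release.run();
        }
    }

    public static void main(String[] args) {
        AtomicBoolean unmapped = new AtomicBoolean(false);
        RefCountedInput master = new RefCountedInput(() -> unmapped.set(true));
        RefCountedInput clone = master.cloneInput();

        master.close();
        System.out.println("after master.close(): unmapped=" + unmapped.get()); // false
        clone.close();
        System.out.println("after clone.close(): unmapped=" + unmapped.get());  // true
    }
}
```

The trade-off is the one raised in the thread: this removes the need for a weak map that tracks every clone, but only if every consumer (Scorer, TermsEnum, and so on) reliably calls close(), which is why Uwe describes it as a backwards-incompatible change.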