Hmm... this looks like a side-effect of LUCENE-2680, which was merged back from trunk to 3.1.
So the problem is, IW recycles the RAM it has allocated, and so this
method is returning the allocated RAM, even if those buffers are not
in fact in use right now (ie, filled with postings data). I think
it's important that it does this, ie, it should be honest that it is
in fact tying up RAM.

Maybe we could fix this by adding a new method that tells you how
much of the buffers are really in-use... but I don't think we
directly track that now; it'd have to be computed from the free
buffers lists inside DocumentsWriter.

BTW, why not have IW flush by RAM itself? This way it will flush (but
not commit) the postings to disk... commit is rather costly since it
fsyncs all the newly written files.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Aug 23, 2011 at 12:17 AM, Trejkaz <trej...@trypticon.org> wrote:
> Hi all.
>
> We are using IndexWriter with no limits set and managing the commits
> ourselves, mainly so that we can ensure they are done at the same time
> as other (non-Lucene) commits.
>
> After upgrading from 3.0 ~ 3.3, we are seeing a change in
> ramSizeInBytes() behaviour where it is no longer resetting to zero
> after a commit(). The end result is that after a while, the code
> wants to commit after adding even a single document.
>
> I boiled it down to a test case (though I'm obviously just using JUnit
> as a helper here):
>
>     @Test
>     public void testIndexWriterByteCount() throws Exception
>     {
>         Directory directory = new RAMDirectory();
>         IndexWriter writer = new IndexWriter(directory,
>             new WhitespaceAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
>         System.out.println("At start: " + writer.ramSizeInBytes());
>
>         for (int j = 0; j < 3; j++)
>         {
>             for (int i = 0; i < 5; i++)
>             {
>                 Document document = new Document();
>                 document.add(new Field("text", "a", Field.Store.YES,
>                     Field.Index.ANALYZED));
>                 writer.addDocument(document);
>             }
>             System.out.println("After adding some docs: " +
>                 writer.ramSizeInBytes());
>
>             writer.commit();
>             System.out.println("After commit: " + writer.ramSizeInBytes());
>         }
>
>         writer.close();
>         directory.close();
>     }
>
> The results on Lucene 3.3.0:
>
>     At start: 0
>     After adding some docs: 99400
>     After commit: 99344
>     After adding some docs: 99400
>     After commit: 99344
>     After adding some docs: 99400
>     After commit: 99344
>
> The results of running more or less the same test on Lucene 3.0.3:
>
>     At start: 0
>     After adding some docs: 115712
>     After commit: 0
>     After adding some docs: 50176
>     After commit: 0
>     After adding some docs: 50176
>     After commit: 0
>
> Questions:
>
> (1) Is Lucene now caching more than it used to be caching, which would
> account for the extra space usage, or is this simply a bug where the
> count isn't being updated correctly?
>
> (2) Is checking ramSizeInBytes() still the recommended way to
> determine whether it's time to commit()?
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
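
To make the "allocated vs. really in-use" distinction above concrete, here is a toy sketch in plain Java. It is not Lucene code and does not reflect how DocumentsWriter is actually implemented: the class RecyclingBufferPool and its methods (acquire, release, allocatedBytes, inUseBytes) are hypothetical names made up for illustration. It only shows why a recycling pool keeps reporting its allocated size after buffers are handed back, and how an in-use figure could be derived from the free list.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a recycling buffer pool. Hypothetical; NOT Lucene's DocumentsWriter. */
final class RecyclingBufferPool {
    private final int blockSize;
    private final Deque<byte[]> freeBlocks = new ArrayDeque<byte[]>();
    private long allocatedBytes = 0;

    RecyclingBufferPool(int blockSize) {
        this.blockSize = blockSize;
    }

    /** Hands out a recycled block if one is free, otherwise allocates a new one. */
    byte[] acquire() {
        byte[] block = freeBlocks.poll();
        if (block == null) {
            block = new byte[blockSize];
            allocatedBytes += blockSize;
        }
        return block;
    }

    /** Parks a block on the free list; the memory stays allocated. */
    void release(byte[] block) {
        freeBlocks.push(block);
    }

    /** Everything the pool is holding on to, whether in use or not. */
    long allocatedBytes() {
        return allocatedBytes;
    }

    /** Allocated bytes minus the blocks currently sitting on the free list. */
    long inUseBytes() {
        return allocatedBytes - (long) freeBlocks.size() * blockSize;
    }

    public static void main(String[] args) {
        RecyclingBufferPool pool = new RecyclingBufferPool(32 * 1024);
        byte[] a = pool.acquire();
        byte[] b = pool.acquire();
        System.out.println("While buffering: allocated=" + pool.allocatedBytes()
            + " inUse=" + pool.inUseBytes());

        // Simulate the buffers being written out and handed back for reuse.
        pool.release(a);
        pool.release(b);
        System.out.println("After release:   allocated=" + pool.allocatedBytes()
            + " inUse=" + pool.inUseBytes());
    }
}
```

In this model the allocated figure never drops once blocks are recycled, which resembles the non-zero ramSizeInBytes() after commit() reported above, while the in-use figure falls back to zero. Whether a real in-use counter could be added this cheaply depends on how the free buffer lists are tracked, which this sketch does not attempt to reproduce.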