The Problem: periodically we see thousands of files created by an
IndexWriter in a Java process in a very short span of time. Since we
started tracking this, we've seen an index go from ~25 files to over
200K files in about half an hour.
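For reference, the 2.3 settings that govern how many files a writer
produces are the compound-file flag, the merge factor, and the RAM
flush buffer; a minimal illustration of where they'd be tuned (the
path and values are placeholders, not our production settings):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class WriterSettings {
        public static void main(String[] args) throws Exception {
            // Open an existing index (create=false); path is illustrative.
            IndexWriter writer = new IndexWriter("/path/to/index",
                                                 new StandardAnalyzer(), false);
            writer.setUseCompoundFile(true);  // one .cfs per segment vs. many loose files
            writer.setMergeFactor(10);        // segments allowed before a merge kicks in
            writer.setRAMBufferSizeMB(16.0);  // flush on RAM used, not doc count
            writer.close();
        }
    }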
The Context: a hand-rolled, all-in-one Lucene server (2.3.2 codebase)
that responds to searches and performs index updates, running under
Tomcat on Java 1.6 on 32-bit Linux with 2GB of memory, reading and
writing local disk. It's a threaded environment serving about 15-20
requests a second (mostly searches, with a 10:1 search/update ratio).
We wrap all of the update code around IndexWriter so that every thread
goes through a single writer and we never close a writer that is
actively in use (sketched below).
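A minimal sketch of that pattern; the class and method names here are
illustrative, not our actual code:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    // All update threads funnel through one writer; close() only ever
    // runs from shutdown(), never while an update is in flight.
    public class SharedWriter {
        private final String path;
        private IndexWriter writer;

        public SharedWriter(String path) { this.path = path; }

        public synchronized void update(Term id, Document doc) throws IOException {
            if (writer == null) {
                writer = new IndexWriter(path, new StandardAnalyzer(), false);
            }
            writer.updateDocument(id, doc); // delete-then-add under one lock
        }

        public synchronized void shutdown() throws IOException {
            if (writer != null) {
                writer.close();
                writer = null;
            }
        }
    }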
We cache about 40 IndexSearchers (really IndexReaders) using an MRU
cache and leave it to Java to garbage-collect those that fall out of
scope (also sketched below).
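Again a sketch; a LinkedHashMap in access order is one way to express
the approach, though it isn't our exact code:

    import java.io.IOException;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    // Keeps the N most-recently-used searchers; evicted entries are not
    // closed, just dropped for the garbage collector to reclaim.
    public class SearcherCache {
        private final LinkedHashMap<String, IndexSearcher> cache;

        public SearcherCache(final int capacity) {
            this.cache = new LinkedHashMap<String, IndexSearcher>(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<String, IndexSearcher> eldest) {
                    return size() > capacity; // evict LRU entry without closing it
                }
            };
        }

        public synchronized IndexSearcher get(String indexPath) throws IOException {
            IndexSearcher s = cache.get(indexPath);
            if (s == null) {
                s = new IndexSearcher(IndexReader.open(indexPath));
                cache.put(indexPath, s);
            }
            return s;
        }
    }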
We can potentially serve ~150 different search indexes, most with
document counts under 1 million, fairly sparsely populated fields, and
under about 100 fields each. We don't store much in any index,
generally just IDs that we then use for DB look-ups. Our biggest index
is about 7GB on disk, comprises roughly 18 million records, and is
almost always in use (being searched or updated). We sometimes go days
without seeing The Problem, and we've seen it happen twice in the span
of 4 hours.
Accompanying Symptom: we see an OOM error for heap space. I'm not sure
whether the explosion of files triggers the error or results from it.
It's the only error we see accompanying the problem; performance and
memory usage look fine right up to the OOM.
Current Workaround: moving the same server to a 64-bit machine and
throwing 10GB of RAM at it seems (4 days and counting) to have
"solved" the problem.
What I'd really like is to understand the underlying problem. We have
some theories, but before charging down one path or another I was
hoping to find out a) whether people have seen something similar
before and b) what they did about it. Our theories:
- Opening IndexReaders faster than Java can garbage-collect the ones
that have gone out of scope. We do know that too many open readers
(e.g. around 100 of our indexes) can exhaust memory. This scenario
seems unlikely given our usage; we have 2-3 heavily used indexes and
very light usage on the rest. That said, with some recent code changes
we decided to rely on garbage collection to fix another bug (a race
condition where a searcher was being closed while still in use); see
the reference-counting sketch after this list.
- A race condition around IndexWriter, either in our code or in this
version of the library, that makes it go nuts.
- Particularly heavy-duty search/update hits, e.g. iterating across
all documents (not likely) or updating a large number of documents in
an index (more likely).
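On the first theory, the alternative to leaning on GC would be
explicit reference counting, so a reader is closed only after its last
user releases it. A minimal sketch of the idea (the class is
illustrative, not something we actually run):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;

    // Close the underlying reader only when the last borrower releases
    // it, avoiding the used-while-closing race without waiting on GC.
    public class RefCountedReader {
        private final IndexReader reader;
        private int refCount = 1;       // the cache itself holds one reference
        private boolean evicted = false;

        public RefCountedReader(IndexReader reader) { this.reader = reader; }

        public synchronized IndexReader acquire() {
            refCount++;
            return reader;
        }

        public synchronized void release() throws IOException {
            if (--refCount == 0 && evicted) reader.close();
        }

        // Called once, when the cache drops this entry.
        public synchronized void evict() throws IOException {
            evicted = true;
            if (--refCount == 0) reader.close();
        }
    }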
Really scientific, I know, but I'd welcome any discussion that
involves juggling the Java heap (what do you do with your OOMs?), our
particular problem, or running Lucene in a threaded environment (like
Solr does).
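For what it's worth, the next step on our end is probably to capture a
heap dump at the moment of failure; something along these lines in the
Tomcat environment should do it (the heap size and dump path are
illustrative):

    export CATALINA_OPTS="-Xmx1800m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp"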
thanks!
Micah