Hi everyone, I'm a bit of a Lucene newb, but a fairly experienced Java developer. Hope someone can give me some clues as to what I may be doing wrong.
In essence I've got a Lucene index built off of a database table that gets updated at a rate of about 1 row changing every 2 seconds or so. I've got a webapp whose sole purpose in life is to provide a simple front end for searching this table.

The table in question lives in an Oracle db (not that Java cares) and it has 2 datetime/timestamp columns: ent_dtm and upd_dtm. When a new row gets inserted into the table, a trigger sets ent_dtm to "right now". When a row gets updated, a trigger sets upd_dtm to "right now". Queries like:

    SELECT COL1, COL2,... COLn from THE_TABLE where ENT_DTM > (some timestamp)

are very fast, as are queries like:

    SELECT COL1, COL2,... COLn from THE_TABLE where UPD_DTM > (some timestamp)

These are the sorts of queries I use to keep my Lucene index in sync with the table, and there are no issues with them.

As you would expect, each Document in my Lucene index roughly corresponds to a row in THE_TABLE, including 2 fields called "ent_dtm" and "upd_dtm". THE_TABLE has a primary key which I will call THE_ID. Correspondingly, a Document in the Lucene index has a field called "the_id". Values of "the_id" are typically numbers (Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS), with the exception of a "special" value of "newest". The Document whose "the_id" field has the value "newest" contains just 2 more fields, ent_dtm and upd_dtm. This Document is just used to keep track of "what's the newest thing in Lucene's world".

So this is what my webapp is doing: in a background thread, every 1.2 seconds, it checks the Lucene index for "what's the newest thing in my world" (call that X) and uses that to hit the database, asking it in essence "have you got anything newer in your world than X?". If it returns, say, 3 rows newer than X, call the newest of those rows Y. The background thread then updates the Document with the_id="newest" with Y and goes to sleep again for 1.2 seconds. Lather, rinse, repeat.
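The polling loop described above can be sketched roughly as follows. This is an illustration only, not the actual webapp code: IndexMarker and RowSource are hypothetical interfaces standing in for the "newest" Document and the Oracle queries, timestamps are simplified to epoch milliseconds, and no Lucene or JDBC calls are shown.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SyncLoopSketch {

    /** Hypothetical stand-in for the Document with the_id="newest". */
    interface IndexMarker {
        long readNewest();              // X: newest timestamp the index knows about
        void writeNewest(long millis);  // overwrite the marker with Y
    }

    /** Hypothetical stand-in for the ENT_DTM / UPD_DTM queries. */
    interface RowSource {
        List<Long> timestampsNewerThan(long millis);
    }

    /** One pass of the loop; returns how many newer rows the database reported. */
    static int syncOnce(IndexMarker marker, RowSource db) {
        long x = marker.readNewest();
        List<Long> newer = db.timestampsNewerThan(x);
        if (newer.isEmpty()) {
            return 0;                   // nothing newer than X, leave the marker alone
        }
        long y = x;
        for (long t : newer) {
            y = Math.max(y, t);         // Y: the newest of the returned rows
        }
        // Assumption: a real pass would also add or update one Document per row here.
        marker.writeNewest(y);
        return newer.size();
    }

    /** Runs syncOnce, then sleeps 1.2 seconds, repeatedly, on one background thread. */
    static ScheduledExecutorService start(IndexMarker marker, RowSource db) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(() -> {
            try {
                syncOnce(marker, db);
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and carry on.
                e.printStackTrace();
            }
        }, 0, 1200, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

Keeping one pass in its own method (syncOnce) means it can be timed or exercised separately from the scheduling, which may help when narrowing down where a long pause comes from.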
Incoming search requests attempt to use a "Near Real Time" IndexReader (with an IndexSearcher wrapped around it) to search the index. Again, everything seems to do what it says on the box.

My problem is that I can't seem to avoid the occasional 100 second pause while the IndexReader "refreshes itself". I create my one-and-only shared IndexReader thusly:

    indexReader = IndexReader.open(indexWriter, true);

I check whether it needs to be refreshed by calling indexReader.isCurrent(), and I "refresh" it with the following method:

    public static IndexReader freshVersionOf(IndexReader indexReader) throws IOException {
        StopWatch stopWatch = new StopWatch();
        final IndexReader newReader = IndexReader.openIfChanged(indexReader, true);
        logger.info("IndexReader.openIfChanged() took " + stopWatch.elapsedSeconds() + " seconds");
        if (newReader == null) {
            // nothing changed, keep using the existing reader
            return indexReader;
        } else {
            // close the old reader and hand back the fresh one
            indexReader.close();
            return newReader;
        }
    }

This is basically a Lucene method moved into a static method in my own code (my method closes the old indexReader; that's the only difference).

Sometimes IndexReader.openIfChanged(indexReader, true) takes what seems like a crapload of time. If I don't "freshen" the IndexReader, it doesn't see the latest-and-greatest timestamp (i.e. what is newest in the Lucene world). I've tried calling indexWriter.commit() in my background thread, but that can take on the order of 100 seconds as well.

Anyway, all the searching and updating of the index is working just fine; it's just that I'm seeing these occasional long pauses, which seem to be unavoidable. Any suggestions of things to try would be appreciated!

PS. I'm using Lucene 3.6, which it seems lots of people have used successfully in the past, so I'm guessing "use the newer Lucene" won't necessarily help me.

-- Mike Jennings