And by the way, I cannot see it ever making sense to keep reopening an index reader every second or so. It has to be MUCH more efficient to even wait every 2 or 4 seconds...even that is going to be pretty nasty, but you have to allow for a bit of batch man. You will waste so much time opening those readers that its not going to be real-time anyway. You are just going to be in a world of slow.
On 7/30/07, Mark Miller <[EMAIL PROTECTED]> wrote: > > I believe there is an issue in JIRA that handles reopening an IndexReader > without reopening segments that have not changed. > > On 7/30/07, Tim Sturge < [EMAIL PROTECTED]> wrote: > > > > Thanks for the reply Erick, > > > > I believe it is the gc for four reasons: > > > > - I've tried the "warmup" approach alredy and it didn't change the > > situation. > > > > - The server completely pauses for several seconds. I run jstack to find > > out where the pause is, and it also pauses for several seconds before > > telling me the server is doing something perfectly innocuous. If I was > > stuck in some search overhead, I would expect jstack to tell me where > > (and I would expect the where to be somewhere interesting and vaguely > > repeatable) > > > > - The impact is very uneven. Over 50000 queries (sequentially) I get > > 49500 at 3 msec, 450 at 300 msec and 50 at 3 sec or more (ouch). I > > really would be much happier with a consistent 10msec (which adds up to > > the same amount of time in total) or even 25msec > > > > - "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC" changes the pauses (I get > > 100 msec and 1 sec pauses instead, but 5x as many for slower overall > > time; 1 sec is far too slow) > > > > Your solution looks possible, but seems really too complex for what I am > > trying to do (which is basic incremental update). What I really am > > looking for is a way to avoid reopening the first segment of my FSDir. I > > > > have a single 6G segment, and then another 20-50 segments with updates, > > but they are <100M in total size. So if I could have lucene open just > > the segments file and the new or changed *.del and *.cfs files (without > > reopening the unchanged *.cfs files) that would be a huge win for me I > > think. > > > > It strikes me this should be possible with a thin but complex layer > > between the SegmentReader and MultiReader, and perhaps a way to get > > SegmentReader to update what *.del file it is using. I'm just curious > > why this doesn't already exist. > > > > Tim > > > > Erick Erickson wrote: > > > Why do you believe that it's the gc? I admit i just scanned your > > > e-mail, but I *do* know that the first search (especially sorts) on > > > a newly-opened IndexReader incure a bunch of overhead. Could > > > that be what you're seeing? > > > > > > I'm not sure there is a "best practice", but I have seen two > > > solutions mentioned, both more complex than opening/closing > > > the reader. > > > 1> open the reader in the background, fire a few "warmup" queries > > > at it, then switch it with the one you actually use to answer queries. > > > > > > > > 2> Use a RAMDirectory to hold your new entries for some period > > > of time. You'd have to do some fancy dancing to keep this straight > > > since you're updating documents, but it might be viable. The scheme > > > is something like > > > Open your FSDIR > > > Open a RAMdir. > > > > > > Add all new documents to BOTH of them. When servicing a query, > > > look in both indexes, but you only open/close the RAMdir for > > > every query. Note that since, when you open a reader, it > > > takes a snapshot of the index, these two views will be disjoint. When > > you > > > get your results back, you'll have to do something about the documents > > > > > from the FSdir that have been replaced in the RAMdir, which is where > > > the fancy dancing part comes in. But I leave that as an exercise for > > > the reader. > > > > > > Periodically, shut everything down and repeat. The point here is that > > > you can (probably) close/open your RAMdir with very small costs and > > > have the whole thing be up to date. > > > > > > There'll be some coordination issues, and you'll have to cope with > > data > > > integrity if your process barfs before you've closed your FSDir.... > > > > > > Or, you could ask whether 5 seconds is really necessary.I've seen a > > lot > > > of times when "real time" could be 5 minutes and nobody would really > > > complain, and other times when it really is critical. But that's > > between you > > > and our Product Manager.... > > > > > > Hope this helps > > > Erick > > > > > > On 7/25/07, Tim Sturge <[EMAIL PROTECTED]> wrote: > > > > > >> Hi, > > >> > > >> I am indexing a set of constantly changing documents. The change rate > > is > > >> moderate (about 10 docs/sec over a 10M document collection with a 6G > > >> total size) but I want to be right up to date (ideally within a > > second > > >> but within 5 seconds is acceptable) with the index. > > >> > > >> Right now I have code that adds new documents to the index and > > deletes > > >> old ones using updateDocument() in the 2.1 IndexWriter. In order to > > see > > >> the changes, I need to recreate the IndexReader/IndexSearcher every > > >> second or so. I am not calling optimize() on the index in the writer, > > >> and the mergeFactor is 10. > > >> > > >> The problem I am facing is that java gc is terrible at collecting the > > > > >> IndexSearchers I am discarding. I usually have a 3msec query time, > > but I > > >> get gc pauses of 300msec to 3 sec (I assume is is collecting the > > >> "tenured" generation in these pauses, which is my old IndexSearcher) > > >> > > >> I've tried "-Xincgc", "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC" and > > >> calling System.gc() right after I close the old index without much > > luck > > >> (I get the pauses down to 1sec, but get 3x as many. I want < 25 msec > > >> pauses). So my question is, should I be avoiding reloading my index > > in > > >> this way? Should I keep a separate IndexReader (which only deletes > > old > > >> documents) and one for new documents? Is there a standard technique > > for > > >> a quickly changing index? > > >> > > >> Thanks, > > >> > > >> Tim > > >> > > >> > > >> --------------------------------------------------------------------- > > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > > >> For additional commands, e-mail: [EMAIL PROTECTED] > > >> > > >> > > >> > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > >