Hi all, I've got an indexing issue that I think other folks might be interested in hearing about, and I wanted to get some feedback before I went ahead and implemented a new approach.
Currently we update indices by sending individual delete/add document requests to each of our search boxes. Each box is serving about 20-30 qps while this is happening. The problem I'm seeing is that when index segments get merged [honestly, I don't know that much about segment merging] (our merge factor is set to 5), a heavily used old segment is dropped and its merged replacement isn't in the disk cache yet; most search requests to that box then become prohibitively slow (10-80+ seconds), and I see the pgpgin/pgpgout stats spike in sar.

I'm planning to implement something like the Solr replication model, using the rsync method that Doug Cutting outlined on this list a long time ago, and to force the new files into the disk cache with fadvise.

Is there another strategy here? Could I create a merge policy that forces new segments into the disk cache before Lucene nukes the old ones?

Thanks,
M
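P.S. To make the "force the new files into the disk cache" part concrete, here is a rough sketch of what I have in mind: plain Java that just reads each new file sequentially so the OS pulls its pages into the page cache. It's a poor man's posix_fadvise(POSIX_FADV_WILLNEED), since calling fadvise directly from Java would need JNI. SegmentWarmer, warmFile, and warmDirectory are my own made-up names, not anything in Lucene or Solr:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    /**
     * Hypothetical cache warmer: reads new segment files end to end so the
     * OS pulls their pages into the disk cache before queries hit them.
     */
    public class SegmentWarmer {

        private static final int BUF_SIZE = 1 << 20; // 1 MB read buffer

        /** Read every byte of the file and discard it; the only point is
         *  the side effect of the pages landing in the OS disk cache. */
        public static void warmFile(File f) throws IOException {
            byte[] buf = new byte[BUF_SIZE];
            FileInputStream in = new FileInputStream(f);
            try {
                while (in.read(buf) != -1) {
                    // intentionally empty: we only want the caching side effect
                }
            } finally {
                in.close();
            }
        }

        /** Warm every file in an index directory, e.g. right after rsync
         *  finishes and before the searcher is switched to the new snapshot. */
        public static void warmDirectory(File indexDir) throws IOException {
            File[] files = indexDir.listFiles();
            if (files == null) {
                return;
            }
            for (File f : files) {
                if (f.isFile()) {
                    warmFile(f);
                }
            }
        }
    }

The idea would be to run warmDirectory over the new snapshot right after rsync completes, before pointing the live searcher at it. (I believe newer Lucene versions, 2.9+, also expose IndexWriter.setMergedSegmentWarmer for warming newly merged segments before they become visible, which might cover the merge-policy question, but I haven't tried it.)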