Hi all,

I've got an indexing issue I think other folks might be interested in
hearing about and I wanted to get feedback before I went ahead and
implemented a new method.

Currently, the way we update indices is by sending individual delete/add
document requests to each of our search boxes. Each box is doing about
20-30 qps while this is happening. The problem I'm seeing is that when
segments of the index are merged (our merge factor is set to 5; honestly
I don't know that much about segment merging) and an old, heavily used
segment of the index is evicted from the disk cache, most of the search
requests to that box get prohibitively slow (10-80+ secs) and I see the
page-in/page-out stats spike in sar. I'm planning on implementing a method
similar to the Solr model, using the rsync approach that Doug Cutting
outlined a long time ago on this list, and forcing the new files into the
disk cache using fadvise.
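
To make the fadvise idea concrete, here's a rough sketch of the warming
helper I have in mind (assuming Linux and posix_fadvise with
POSIX_FADV_WILLNEED; the file paths would come from whatever the rsync
step just copied over, and none of this is tested against our setup yet):

/* warm.c: hypothetical helper that hints the kernel to pull freshly
 * rsync'd segment files into the page cache before queries hit them.
 * Assumes Linux; file paths are passed on the command line. */
#define _XOPEN_SOURCE 600   /* for posix_fadvise */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int warm(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    struct stat st;
    if (fstat(fd, &st) == 0) {
        /* posix_fadvise returns an errno value directly, 0 on success */
        int err = posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);
        if (err != 0)
            fprintf(stderr, "posix_fadvise(%s) failed: %d\n", path, err);
    }
    close(fd);
    return 0;
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        warm(argv[i]);
    return 0;
}

The plan would be to run it right after the rsync finishes (e.g. warm on
the new snapshot's files) and only then swap the new index in. WILLNEED is
only a hint, though, so under memory pressure the readahead may be dropped
and the files might not end up fully resident.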

Is there another strategy here? Could I write a merge policy that forces
new segments into the disk cache before Lucene nukes the old ones?

Thanks,
M