On Sat, Nov 29, 2008 at 11:11 AM, Yonik Seeley <yo...@apache.org> wrote:
> On Sat, Nov 29, 2008 at 12:45 PM, Michael Stoppelman <stop...@gmail.com>
> wrote:
> > Hi all,
> >
> > I've got an indexing issue I think other folks might be interested in
> > hearing about, and I wanted to get feedback before I went ahead and
> > implemented a new method.
> >
> > Currently, we update indices by sending individual delete/add document
> > requests to each of our search boxes. Each box is serving about 20-30 qps
> > while this is happening. The problem I'm seeing is that when a segment
> > from the index is merged [honestly I don't know that much about segment
> > merging] (our merge factor is set to 5) and an old, heavily used segment
> > of the index is lost from the disk cache, most of the search requests to
> > that box get prohibitively slow (10-80+ secs) and I see the pg/in + pg/out
> > stats spike in sar.
>
> > I'm planning on implementing a method similar to the Solr model, using
> > the rsync method that Doug Cutting outlined a long time ago on this list,
> > and forcing the new files into the disk cache using fadvise.
>
> FYI, if you don't actually want to use rsync, Solr has very recently
> implemented this in pure Java.
> But forcing new files into the cache means ejecting currently used
> files from the cache (for queries currently in flight). Doesn't seem
> any way to really win here if you don't have enough memory to hold the
> critical parts of both indexes.

I found an rsync patch that adds a flag (--drop-caches) that lets you tell
the kernel not to disk-cache the new files being copied over:
http://lists.samba.org/archive/rsync/2007-May/017707.html

> I think Mike pointed out in a recent post that it would be nice to
> advise the kernel not to cache files being written during merging. In
> Solr, we would want to allow the same thing when copying over new
> segment files during replication. Unfortunately, I don't think there
> is a way to do this through Java.
>
> -Yonik
>
> > Is there another strategy here? Could I create a merge policy that forces
> > new segments into the disk cache before Lucene nukes the old ones?
> >
> > Thanks,
> > M
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
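Both ideas in this thread, warming newly copied index files ("forcing the new files into the disk cache using fadvise") and the opposite (telling the kernel not to keep freshly copied files cached, as the rsync patch does), correspond to the two posix_fadvise(2) advice values POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED. Below is a minimal sketch in Python, assuming Linux, where os.posix_fadvise exposes that call (it is not available on macOS or Windows). The helper names advise_file/advise_files and the "willneed"/"dontneed" mode strings are hypothetical. This is not how Solr or the rsync patch implement it, and the kernel is free to ignore the advice.

```python
import os

# Hypothetical mode names mapped onto the real posix_fadvise advice constants.
ADVICE = {
    # Ask the kernel to start reading the file into the page cache now,
    # e.g. to warm newly copied segment files before searchers open them.
    "willneed": os.POSIX_FADV_WILLNEED,
    # Ask the kernel to drop the file's cached pages, e.g. so freshly copied
    # files don't push the live index out of the cache. Only clean pages can
    # be dropped, so whoever wrote the file should fsync it first.
    "dontneed": os.POSIX_FADV_DONTNEED,
}


def advise_file(path, mode):
    """Apply page-cache advice to the whole of `path` (the kernel may ignore it)."""
    if mode not in ADVICE:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(ADVICE)}")
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 with len=0 covers everything from offset 0 to end of file.
        os.posix_fadvise(fd, 0, 0, ADVICE[mode])
    finally:
        os.close(fd)


def advise_files(paths, mode):
    """Apply the same advice to several files, e.g. all files of a new segment."""
    for path in paths:
        advise_file(path, mode)
```

For example, after copying a new index snapshot you might call advise_files(new_segment_paths, "willneed"), where new_segment_paths is a hypothetical list of the copied file paths. As Yonik points out, though, WILLNEED on the new files can still evict pages the old index is actively using if RAM can't hold the critical parts of both.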