While I digest Nicolas' novel :) (thanks for the additional insight on
Lucene by the way), I will suggest one other idea.

We could allow the option of a Solr instance co-located with the
repository on one machine to serve up the index stored on the repository.
IvyDE could be configured by the user to either read the index directly
from the remote filesystem or send its requests via HTTP to a Solr server.
The Solr server would not be responsible for maintaining the index in the
same way that Archiva/Nexus/Artifactory do, but would simply be a querying
tool.  In the case where Solr is serving the index, the index would still be
maintained through some combination of the index Ant task and the publish
proxy.
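
To make the two modes concrete, here is a rough sketch of how the client
side could pick between them.  It is Python only for brevity, and every
name in it (the config keys, the "module" field, the host and paths) is
made up for illustration; none of it is existing Ivy or IvyDE API.  The
only real thing assumed is that Solr answers queries on a "/select"
handler that takes a "q" parameter.

```python
from urllib.parse import urlencode


def build_search_request(config, query):
    """Describe how a client would run `query` under the given config.

    Nothing is actually opened or sent; the function only returns a tuple
    saying what the client would do, so the two modes can be compared.
    """
    mode = config["mode"]
    if mode == "filesystem":
        # Client opens the index files itself from the remote filesystem.
        return ("open-index", config["index_path"], query)
    if mode == "solr":
        # Client sends the query over HTTP; Solr only reads the index.
        params = urlencode({"q": query, "wt": "json"})
        return ("http-get", config["solr_url"] + "/select?" + params)
    raise ValueError("unknown index mode: " + mode)


# Hypothetical user settings, one per mode.
fs_config = {"mode": "filesystem", "index_path": "/mnt/repo/.index"}
solr_config = {"mode": "solr", "solr_url": "http://repo.example.org:8983/solr"}

fs_request = build_search_request(fs_config, "module:commons-lang")
solr_request = build_search_request(solr_config, "module:commons-lang")
```

Either way the index itself is written by the same tooling; only the read
path changes.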

This way we don't get into the complexity of pushing out index updates to
clients.

The rsync strategy is a very intriguing idea though, especially in light of
how Lucene segments its index into multiple files.  What happens when
optimize is called on the index and the segments are combined into one file?
In this case, any search slaves would essentially have to download the whole
index, right?  How much segmentation is considered too much segmentation
before we optimize the index to cater to search speed over index publishing
speed?
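
A back-of-the-envelope way to look at that trade-off is below.  It is
only a sketch, assuming that segment files are never rewritten in place
and that a file-level sync only fetches files the slave does not already
have under the same name and size.  The file names and sizes are invented
for the example, not taken from a real index.

```python
def bytes_to_fetch(local_files, remote_files):
    """Sum the sizes of remote files that are missing or differ locally."""
    return sum(
        size
        for name, size in remote_files.items()
        if local_files.get(name) != size
    )


# Hypothetical listings (file name -> size in bytes).
slave = {"_0.cfs": 4000, "_1.cfs": 1000, "segments_2": 50}

# Master after a small publish: one new segment plus a new segments file.
after_add = {"_0.cfs": 4000, "_1.cfs": 1000, "_2.cfs": 300, "segments_3": 60}

# Master after optimize: everything merged into one new segment.
after_optimize = {"_3.cfs": 5200, "segments_4": 40}

incremental_cost = bytes_to_fetch(slave, after_add)       # only the new files
optimize_cost = bytes_to_fetch(slave, after_optimize)     # the whole index
```

With these made-up numbers the small publish costs 360 bytes to sync,
while the optimized index costs 5240, i.e. all of it.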

I'll be trying to wrap this up enough (at least with the remote filesystem
index read strategy) to make a patch so others can see it in action.  We are
a little busy at work, but I will be coming back to it in the coming days.

Thanks for all the feedback so far,
Jon
