On 19 Nov. 2009, at 12:06, Xavier Hanin wrote:

> I really like the idea of using a Solr instance colocated with the repository.
> I saw a presentation on Solr yesterday at Devoxx, and it sounds very close to
> what we need. The only problem I see with it is that it requires installing a
> server-side component, getting closer to what repository managers do. I'm not
> sure why, if we install a Solr instance, we wouldn't use it to update the
> index too. Solr takes care of problems like transactions and concurrency, so
> I think it's a perfect fit...
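[A minimal sketch of what "use it to update the index too" might look like from the publish side, assuming a SolrJ client next to the repository. The Solr URL and the organisation/module/revision field names below are illustrative assumptions, not anything Ivy or the thread defines:]

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PublishIndexer {

    public static void main(String[] args) throws Exception {
        // Hypothetical Solr instance running next to the Ivy repository.
        SolrServer solr = new CommonsHttpSolrServer("http://repo.example.com:8983/solr");

        // Describe the newly published artifact; these field names are
        // assumed to exist in the Solr schema, they are not defined by Ivy.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "org.example#mymodule;1.0");
        doc.addField("organisation", "org.example");
        doc.addField("module", "mymodule");
        doc.addField("revision", "1.0");

        // Send the document and commit so it becomes searchable;
        // Solr itself handles the locking and commit semantics.
        solr.add(doc);
        solr.commit();
    }
}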
I think transactions would be supported at the Lucene index level. I don't think there is any mechanism to make Solr manage an extra "data storage"; as far as I remember, Solr is only able to read an external "data storage" in order to index it. But what would work is a Solr instance deployed right next to an Ivy repository: let Ivy publish artifacts like it already does, but also have Ivy request that Solr index the newly published artifact.

And, as spotted by a friend, Solr 1.4 [1] supports replication in Java [2], à la rsync! So Solr might be the easiest way of achieving an Ivy indexer.

I have to admit I am not a big fan of having to deploy a webapp next to a dumb, simple repo. On the other hand, managing an index on the client side depends enormously on the kind of repository (at work we have an Ivy repository in svn, accessible both over HTTP and as a local checkout), it would consume more bandwidth, some publication locking would probably need to be in place, etc...

Nicolas

[1] http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.0/CHANGES.txt
[2] https://issues.apache.org/jira/browse/SOLR-561

> My 2 c.
>
> Xavier
>
> 2009/11/18 Jon Schneider <jkschnei...@gmail.com>
>
>> While I digest Nicolas' novel :) (thanks for the additional insight on
>> Lucene, by the way), I will suggest one other idea.
>>
>> We could allow for the option of a Solr instance colocated with the
>> repository on one machine to serve up the index stored on the repository.
>> IvyDE could be configured by the user either to read the index directly
>> from the remote filesystem or to send its requests via HTTP to a Solr
>> server. The Solr server would not be responsible for maintaining the index
>> in the same way that Archiva/Nexus/Artifactory do, but would simply be a
>> querying tool. In the case where Solr is serving the index, the index would
>> still be maintained through some combination of the index Ant task and the
>> publish proxy.
>>
>> This way we don't get into the complexity of pushing out index updates to
>> clients.
>>
>> The rsync strategy is a very intriguing idea though, especially in light of
>> how Lucene segments its index into multiple files. What happens when
>> optimize is called on the index and the segments are combined into one
>> file? In that case, any search slaves would essentially have to download
>> the whole index, right? How much segmentation is considered too much before
>> we optimize the index, trading index publishing speed for search speed?
>>
>> I'll be trying to wrap this up enough (at least with the remote-filesystem
>> index read strategy) to make a patch so others can see it in action. We are
>> a little busy at work, but I will be coming back to it in the coming days.
>>
>> Thanks for all the feedback so far,
>> Jon
>>
>
>
> --
> Xavier Hanin - 4SH France - http://www.4sh.fr/
> BordeauxJUG creator & leader - http://www.bordeauxjug.org/
> Apache Ivy Creator - http://ant.apache.org/ivy/
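[For the "send its requests via HTTP to a Solr server" option Jon describes above, a client such as IvyDE could query the colocated Solr instance with SolrJ roughly as follows. Again, the URL and the field names are assumptions for illustration only, not an existing Ivy or IvyDE API:]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class RepositorySearch {

    public static void main(String[] args) throws Exception {
        // Hypothetical Solr instance serving the repository index over HTTP.
        SolrServer solr = new CommonsHttpSolrServer("http://repo.example.com:8983/solr");

        // Search for modules whose name starts with "commons"; the field
        // names are assumed, not part of an existing Ivy/Solr schema.
        SolrQuery query = new SolrQuery("module:commons*");
        query.setRows(20);

        QueryResponse response = solr.query(query);
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("organisation") + "#"
                    + doc.getFieldValue("module") + ";"
                    + doc.getFieldValue("revision"));
        }
    }
}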