So I couldn't resist and attempted this tonight. I used the solrconfig you mentioned (as is, no modifications), set up a 2-shard cluster for collection1, sent 1 doc to one of the shards, then updated it and sent the update to the other. I don't see the modification, though; I only see the original document. The following is the test:
public void update() throws Exception {
    String key = "1";

    // index the original document through the first shard
    SolrInputDocument solrDoc = new SolrInputDocument();
    solrDoc.setField("key", key);
    solrDoc.addField("content", "initial value");
    SolrServer server = servers.get("http://localhost:8983/solr/collection1");
    server.add(solrDoc);
    server.commit();

    // send the updated document to the other shard through the
    // distributed update chain
    solrDoc = new SolrInputDocument();
    solrDoc.addField("key", key);
    solrDoc.addField("content", "updated value");
    server = servers.get("http://localhost:7574/solr/collection1");
    UpdateRequest ureq = new UpdateRequest();
    ureq.setParam("update.chain", "distrib-update-chain");
    ureq.add(solrDoc);
    ureq.setParam("shards",
        "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
    ureq.setParam("self", "foo");
    ureq.setAction(ACTION.COMMIT, true, true);
    server.request(ureq);
    System.out.println("done");
}

key is my unique field in schema.xml. What am I doing wrong?

On Thu, Dec 1, 2011 at 8:51 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Yes, the ZK method seems much more flexible. Adding a new shard would
> simply be a matter of updating the range assignments in ZK. Where is
> this currently on the list of things to accomplish? I don't have time
> to work on this now, but if you (or anyone) could provide direction
> I'd be willing to work on it when I have spare time. I guess a JIRA
> detailing where/how to do this would help. Not sure the design has
> been thought out that far, though.
>
> On Thu, Dec 1, 2011 at 8:15 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> Right now, let's say you have one shard - everything there hashes to
>> range X.
>>
>> Now you want to split that shard with an Index Splitter.
>>
>> You divide range X in two, giving you two ranges, and then you start
>> splitting. This is where the current Splitter needs a little
>> modification. You decide which doc should go into which new index by
>> rehashing each doc id in the index you are splitting - if its hash is
>> greater than X/2, it goes into index1; if it's less, index2. I think
>> there are a couple of current Splitter impls, but one of them does
>> something like: give me an id - if the ids in the index are above
>> that id, go to index1, if below, index2. We need to do a quick hash
>> instead of a simple id compare.
>>
>> Why do you need to do this on every shard?
>>
>> The other part we need that we don't have yet is storing hash range
>> assignments in ZooKeeper - we don't do that yet because it's not
>> needed yet. Instead we currently just calculate them on the fly (too
>> often at the moment - on every request :) I intend to fix that, of
>> course).
>>
>> At the start, ZK would say: for range X, go to this shard. After the
>> split, it would say: for ranges less than X/2, go to the old node;
>> for ranges greater than X/2, go to the new node.
>>
>> - Mark
>>
>> On Dec 1, 2011, at 7:44 PM, Jamie Johnson wrote:
>>
>>> Hmmm... this doesn't sound like the hashing algorithm that's on the
>>> branch, right? The algorithm you're mentioning sounds like there is
>>> some logic which is able to tell that a particular range should be
>>> distributed between 2 shards instead of 1. So it seems like a
>>> trade-off between repartitioning the entire index (on every shard)
>>> and having a custom hashing algorithm which is able to handle the
>>> situation where 2 or more shards map to a particular range.
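(For reference, the split decision Mark describes above boils down to something like the following. This is only a rough sketch: the class name, hash function, and range bounds are placeholders, not the actual contrib splitter or branch code.)

public class HashSplitSketch {

    // Placeholder hash - the branch may hash doc ids differently.
    static int hash(String docId) {
        return docId.hashCode();
    }

    // The old shard owned [rangeStart, rangeEnd]; after the split, docs
    // whose rehashed id falls in the lower half go to the first new index,
    // the rest to the second.
    static boolean goesToFirstNewIndex(String docId, int rangeStart, int rangeEnd) {
        int mid = rangeStart + (rangeEnd - rangeStart) / 2;
        return hash(docId) <= mid;
    }
}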
>>>
>>> On Thu, Dec 1, 2011 at 7:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>
>>>> On Dec 1, 2011, at 7:20 PM, Jamie Johnson wrote:
>>>>
>>>>> I am not familiar with the index splitter that is in contrib, but
>>>>> I'll take a look at it soon. So the process sounds like it would
>>>>> be to run this on all of the current shards' indexes based on the
>>>>> hash algorithm.
>>>>
>>>> Not something I've thought deeply about myself yet, but I think the
>>>> idea would be to split as many as you felt you needed to.
>>>>
>>>> If you wanted to keep the full balance always, this would mean
>>>> splitting every shard at once, yes. But this depends on how many
>>>> boxes (partitions) you are willing/able to add at a time.
>>>>
>>>> You might just split one index to start - now its hash range would
>>>> be handled by two shards instead of one (if you have 3 replicas per
>>>> shard, this would mean adding 3 more boxes). When you needed to
>>>> expand again, you would split another index that was still handling
>>>> its full starting range. As you grow, once you have split every
>>>> original index, you'd start again, splitting one of the now-halved
>>>> ranges.
>>>>
>>>>> Is there also an index merger in contrib which could be used to
>>>>> merge indexes? I'm assuming this would be the process?
>>>>
>>>> You can merge with IndexWriter.addIndexes (Solr also has an admin
>>>> command that can do this). But I'm not sure where this fits in?
>>>>
>>>> - Mark
>>>>
>>>>>
>>>>> On Thu, Dec 1, 2011 at 7:18 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>> Not yet - we don't plan on working on this until a lot of other
>>>>>> stuff is working solid at this point. But someone else could jump
>>>>>> in!
>>>>>>
>>>>>> There are a couple of ways to go about it that I know of:
>>>>>>
>>>>>> A more long-term solution may be to start using micro shards -
>>>>>> each index starts as multiple indexes. This makes it pretty fast
>>>>>> to move micro shards around as you decide to change partitions.
>>>>>> It's also less flexible, as you are limited by the number of micro
>>>>>> shards you start with.
>>>>>>
>>>>>> A simpler and more likely first step is to use an index splitter.
>>>>>> We already have one in Lucene contrib - we would just need to
>>>>>> modify it so that it splits based on the hash of the document id.
>>>>>> This is super flexible, but splitting will obviously take a little
>>>>>> while on a huge index. The current index splitter is a multi-pass
>>>>>> splitter - good enough to start with, but with most files under
>>>>>> codec control these days, we may be able to make a single-pass
>>>>>> splitter soon as well.
>>>>>>
>>>>>> Eventually you could imagine using both options - micro shards
>>>>>> that could also be split as needed. Though I still wonder if micro
>>>>>> shards will be worth the extra complications myself...
>>>>>>
>>>>>> Right now though, the idea is that you should pick a good number
>>>>>> of partitions to start with, given your expected data ;) Adding
>>>>>> more replicas is trivial, though.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>> On Thu, Dec 1, 2011 at 6:35 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>
>>>>>>> Another question: is there any support for repartitioning of the
>>>>>>> index if a new shard is added? What is the recommended approach
>>>>>>> for handling this? It seemed that the hashing algorithm (and
>>>>>>> probably any) would require the index to be repartitioned should
>>>>>>> a new shard be added.
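(For reference, merging with IndexWriter.addIndexes, as Mark mentions above, looks roughly like this. This is a minimal sketch against the Lucene 3.x API of that era; the paths and analyzer are placeholders, and the Solr admin command Mark mentions wraps the same call.)

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MergeIndexesSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder paths - point these at the target and source index dirs.
        Directory target = FSDirectory.open(new File("/path/to/target/index"));
        Directory source = FSDirectory.open(new File("/path/to/source/index"));

        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_35,
                new StandardAnalyzer(Version.LUCENE_35));
        IndexWriter writer = new IndexWriter(target, cfg);
        writer.addIndexes(source); // merges the source index into the target
        writer.close();
    }
}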
>>>>>>>
>>>>>>> On Thu, Dec 1, 2011 at 6:32 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>> Thanks, I will try this first thing in the morning.
>>>>>>>>
>>>>>>>> On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>>> On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I am currently looking at the latest solrcloud branch and was
>>>>>>>>>> wondering if there was any documentation on configuring the
>>>>>>>>>> DistributedUpdateProcessor? What specifically in solrconfig.xml
>>>>>>>>>> needs to be added/modified to make distributed indexing work?
>>>>>>>>>
>>>>>>>>> Hi Jamie - take a look at solrconfig-distrib-update.xml in
>>>>>>>>> solr/core/src/test-files
>>>>>>>>>
>>>>>>>>> You need to enable the update log, add an empty replication
>>>>>>>>> handler def, and an update chain with
>>>>>>>>> solr.DistributedUpdateProcessFactory in it.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> - Mark
>>>>>>>>>
>>>>>>>>> http://www.lucidimagination.com
>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>>
>>>>>> http://www.lucidimagination.com
>>>>
>>>> - Mark Miller
>>>> lucidimagination.com
>>
>> - Mark Miller
>> lucidimagination.com
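(For anyone else following along, the three pieces Mark describes would look roughly like this in solrconfig.xml. This is a sketch based on the description above rather than the actual solrconfig-distrib-update.xml; the exact processor class name, chain contents, and update log settings on the branch may differ.)

  <!-- 1. Enable the update log -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>

  <!-- 2. Empty replication handler definition -->
  <requestHandler name="/replication" class="solr.ReplicationHandler" />

  <!-- 3. Update chain containing the distributed update processor,
          matching the update.chain=distrib-update-chain param in the test -->
  <updateRequestProcessorChain name="distrib-update-chain">
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.DistributedUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>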