Nitin Sharma created SOLR-9241:
----------------------------------

             Summary: Rebalance API for SolrCloud
                 Key: SOLR-9241
                 URL: https://issues.apache.org/jira/browse/SOLR-9241
             Project: Solr
          Issue Type: New Feature
          Components: SolrCloud
    Affects Versions: 4.6.1
         Environment: Ubuntu, Mac OS X
            Reporter: Nitin Sharma
             Fix For: 4.6.1
         Attachments: rebalance.diff

This is v1 of the patch for the SolrCloud Rebalance API, built at Bloomreach by
Nitin Sharma and Suruchi Shah. The goal of the API is to provide a zero-downtime
mechanism for data manipulation and efficient core allocation in SolrCloud. The
API was envisioned as the base layer that enables SolrCloud to become an
auto-scaling platform, working in unison with other complementary monitoring
and scaling features.


# Patch Status:
===============
The patch is a work in progress and will be extended incrementally. We have done
a few rounds of code cleanup, but wanted to get the patch out early to gather
initial feedback. We will continue to work on making it more open-source
friendly and easier to test.

# Deployment Status:
====================
The platform is deployed in production at Bloomreach and has been battle-tested
under large-scale load (millions of documents and hundreds of collections).

# Internals:
=============
The internals and performance of the API are described at:
http://engineering.bloomreach.com/solrcloud-rebalance-api/

It is built on top of the Collections API (/admin/collections) as a new action
with several flavors. At a high level, the Rebalance API provides two constructs:

Scaling Strategy: Decides how to move the data. Every flavor has multiple
options, which are described in the API spec.
  Re-distribute - Move data around the cluster based on capacity/allocation.
  Auto Shard - Dynamically shard a collection to any size.
  Smart Merge (Distributed Mode) - Merge data from a larger shard setup into a
    smaller one (the source shard count should be divisible by the destination
    shard count).
  Scale Up - Add replicas on the fly.
  Scale Down - Remove replicas on the fly.
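The divisibility requirement for Smart Merge can be illustrated with a small planning sketch. This is not code from the patch: the class `SmartMergePlanner` and its `plan` method are hypothetical, and the sketch assumes source shards are numbered in hash-range order, so that folding contiguous groups of `source / destination` shards into one destination keeps each destination shard's hash range contiguous.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical Smart Merge planner (not from the patch). Assumes source shards are
 * indexed 0..S-1 in hash-range order, so merging contiguous groups of S / D source
 * shards keeps each destination shard's hash range contiguous.
 */
final class SmartMergePlanner {

    private SmartMergePlanner() {}

    /** Maps each destination shard index to the source shard indices merged into it. */
    static Map<Integer, List<Integer>> plan(int sourceShards, int destShards) {
        if (sourceShards <= 0 || destShards <= 0) {
            throw new IllegalArgumentException("shard counts must be positive");
        }
        // The source shard count must be an exact multiple of the destination count.
        if (sourceShards % destShards != 0) {
            throw new IllegalArgumentException(
                sourceShards + " source shards cannot be merged evenly into " + destShards);
        }
        int ratio = sourceShards / destShards; // source shards folded into each destination
        Map<Integer, List<Integer>> plan = new LinkedHashMap<>();
        for (int dest = 0; dest < destShards; dest++) {
            List<Integer> group = new ArrayList<>();
            for (int i = 0; i < ratio; i++) {
                group.add(dest * ratio + i);
            }
            plan.put(dest, group);
        }
        return plan;
    }
}
```

For example, `plan(8, 2)` merges source shards 0-3 into destination 0 and shards 4-7 into destination 1, while `plan(6, 4)` is rejected because 6 is not divisible by 4.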

Allocation Strategy: Decides where to put the data (for example, nodes with the
fewest cores, or nodes that do not yet host this collection). Custom
implementations can be built on top as well. One such example is an
availability-zone-aware strategy, which distributes data so that every replica
is placed in a different availability zone to support high availability (HA).
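The pluggable allocation strategies described above can be sketched as a small interface with two implementations. All names here (`NodeInfo`, `AllocationStrategy`, `LeastCoresStrategy`, `ZoneAwareStrategy`) are hypothetical and do not come from the patch or from Solr's APIs; the node view is a plain record rather than Solr's cluster state, and Java 16+ is assumed for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical node view; fields are illustrative, not Solr's cluster state API. */
record NodeInfo(String name, String zone, int coreCount, Set<String> collections) {}

/** Hypothetical pluggable strategy: picks the nodes that will host a shard's replicas. */
interface AllocationStrategy {
    List<NodeInfo> place(String collection, int replicas, List<NodeInfo> nodes);
}

/** Prefers nodes that do not host the collection yet, then nodes with the fewest cores. */
final class LeastCoresStrategy implements AllocationStrategy {
    @Override
    public List<NodeInfo> place(String collection, int replicas, List<NodeInfo> nodes) {
        if (nodes.size() < replicas) {
            throw new IllegalStateException("not enough nodes for " + replicas + " replicas");
        }
        List<NodeInfo> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator
            .comparing((NodeInfo n) -> n.collections().contains(collection)) // false sorts first
            .thenComparingInt(NodeInfo::coreCount)
            .thenComparing(NodeInfo::name)); // deterministic tie-break
        return new ArrayList<>(sorted.subList(0, replicas));
    }
}

/** Places each replica in a different zone, taking the best-ranked node from each zone. */
final class ZoneAwareStrategy implements AllocationStrategy {
    private final AllocationStrategy ranking = new LeastCoresStrategy();

    @Override
    public List<NodeInfo> place(String collection, int replicas, List<NodeInfo> nodes) {
        // Rank every node with the least-cores ordering, then keep one node per zone.
        List<NodeInfo> ranked = ranking.place(collection, nodes.size(), nodes);
        List<NodeInfo> chosen = new ArrayList<>();
        Set<String> usedZones = new HashSet<>();
        for (NodeInfo n : ranked) {
            if (chosen.size() == replicas) {
                break;
            }
            if (usedZones.add(n.zone())) { // false when the zone already has a replica
                chosen.add(n);
            }
        }
        if (chosen.size() < replicas) {
            throw new IllegalStateException("fewer availability zones than replicas");
        }
        return chosen;
    }
}
```

With three replicas, the least-cores strategy may stack two replicas in the same zone, whereas the zone-aware variant reuses the same ranking but takes at most one node per zone, failing fast when there are fewer zones than replicas.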

# Detailed API Spec:
====================
  https://github.com/bloomreach/solrcloud-rebalance-api

# Contributors:
=====================
  Nitin Sharma
  Suruchi Shah

# Questions/Comments:
=====================
  You can reach me at [email protected]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)