[
https://issues.apache.org/jira/browse/SOLR-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Noble Paul updated SOLR-12993:
------------------------------
Description:
This is just a proposal to minimize ZK load and improve the scalability of very
large clusters.
Every time a small state change occurs for a collection/replica, the entire
following file has to be rewritten and then read n times (where n = the number of
replicas in the collection). The proposal is to split the main file into two.
{code:json}
{"gettingstarted":{
"pullReplicas":"0",
"replicationFactor":"2",
"router":{"name":"compositeId"},
"maxShardsPerNode":"-1",
"autoAddReplicas":"false",
"nrtReplicas":"2",
"tlogReplicas":"0",
"shards":{
"shard1":{
"range":"80000000-ffffffff",
"replicas":{
"core_node3":{
"core":"gettingstarted_shard1_replica_n1",
"base_url":"http://10.0.0.80:8983/solr",
"node_name":"10.0.0.80:8983_solr",
"state":"active",
"type":"NRT",
"force_set_state":"false",
"leader":"true"},
"core_node5":{
"core":"gettingstarted_shard1_replica_n2",
"base_url":"http://10.0.0.80:7574/solr",
"node_name":"10.0.0.80:7574_solr",
"type":"NRT",
"force_set_state":"false"}}},
"shard2":{
"range":"0-7fffffff",
"state":"active",
"replicas":{
"core_node7":{
"core":"gettingstarted_shard2_replica_n4",
"base_url":"http://10.0.0.80:7574/solr",
"node_name":"10.0.0.80:7574_solr",
"type":"NRT",
"force_set_state":"false"},
"core_node8":{
"core":"gettingstarted_shard2_replica_n6",
"base_url":"http://10.0.0.80:8983/solr",
"node_name":"10.0.0.80:8983_solr",
"type":"NRT",
"force_set_state":"false",
"leader":"true"}}}}}}
{code}
The frequently changing state would move into another, small file, {{status.json}}:
{code:json}
{
"shard1": {
"s": 1,
"core_node3": {"s": 1},
"core_node5": {"s": 1}
},
"shard2": {
"s": 1,
"core_node7": {"s": 1},
"core_node8": {"s": 1}}
}
{code}
Here the size of this file is roughly one tenth of the other, which leads to a
dramatic reduction in the amount of data written to and read from ZK.
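To make the split concrete, here is a minimal sketch (not part of the proposal itself) of how a client could consume the two files: read the large structural file once and cache it, and keep a ZooKeeper watch only on the small {{status.json}}. The znode paths, the class name, and the meaning of the {{"s"}} entries are assumptions for illustration; the sketch uses the plain ZooKeeper client plus Jackson for JSON parsing.
{code:java}
import java.util.Map;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

import com.fasterxml.jackson.databind.ObjectMapper;

public class SplitStateWatcher {

  // Hypothetical znode paths for the two halves of the collection state.
  private static final String STRUCTURE_PATH = "/collections/gettingstarted/state.json";
  private static final String STATUS_PATH = "/collections/gettingstarted/status.json";

  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    CountDownLatch connected = new CountDownLatch(1);

    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {
      if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
        connected.countDown();
      }
    });
    connected.await();

    // Read the large, rarely modified structural file once and cache it.
    Map<?, ?> structure =
        mapper.readValue(zk.getData(STRUCTURE_PATH, false, new Stat()), Map.class);
    System.out.println("cached structural state for: " + structure.keySet());

    // Watch only the small status file; each state change now costs a read of
    // a few hundred bytes instead of the whole state.json.
    Watcher statusWatcher = new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
          try {
            Map<?, ?> status =
                mapper.readValue(zk.getData(STATUS_PATH, this, new Stat()), Map.class);
            // "s" is presumably a compact per-shard / per-replica state marker.
            System.out.println("status changed: " + status);
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }
    };
    zk.getData(STATUS_PATH, statusWatcher, new Stat());

    Thread.sleep(60_000); // keep the demo alive long enough to observe a change
  }
}
{code}
With such a split, the per-change ZK traffic seen by watchers is bounded by the size of {{status.json}} rather than by the full collection state.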
was:
Every time a small state change occurs for a collection replica, the entire
following file needs to be updated and read n times (where n = the number of
replicas for this collection). The proposal is to split the main file into 2.
{code:json}
{"gettingstarted":{
"pullReplicas":"0",
"replicationFactor":"2",
"router":{"name":"compositeId"},
"maxShardsPerNode":"-1",
"autoAddReplicas":"false",
"nrtReplicas":"2",
"tlogReplicas":"0",
"shards":{
"shard1":{
"range":"80000000-ffffffff",
"replicas":{
"core_node3":{
"core":"gettingstarted_shard1_replica_n1",
"base_url":"http://10.0.0.80:8983/solr",
"node_name":"10.0.0.80:8983_solr",
"state":"active",
"type":"NRT",
"force_set_state":"false",
"leader":"true"},
"core_node5":{
"core":"gettingstarted_shard1_replica_n2",
"base_url":"http://10.0.0.80:7574/solr",
"node_name":"10.0.0.80:7574_solr",
"type":"NRT",
"force_set_state":"false"}}},
"shard2":{
"range":"0-7fffffff",
"state":"active",
"replicas":{
"core_node7":{
"core":"gettingstarted_shard2_replica_n4",
"base_url":"http://10.0.0.80:7574/solr",
"node_name":"10.0.0.80:7574_solr",
"type":"NRT",
"force_set_state":"false"},
"core_node8":{
"core":"gettingstarted_shard2_replica_n6",
"base_url":"http://10.0.0.80:8983/solr",
"node_name":"10.0.0.80:8983_solr",
"type":"NRT",
"force_set_state":"false",
"leader":"true"}}}}}}
{code}
Another file, {{status.json}}, would be small and frequently updated:
{code:json}
{
"shard1": {
"s": 1,
"core_node3": {"s": 1},
"core_node5": {"s": 1}
},
"shard2": {
"s": 1,
"core_node7": {"s": 1},
"core_node8": {"s": 1}}
}
{code}
Here the size of this file is roughly one tenth of the other, which leads to a
dramatic reduction in the amount of data written to and read from ZK.
> Split state.json into 2: a small, frequently modified part + a large,
> unmodified part
> ---------------------------------------------------------------------------------------
>
> Key: SOLR-12993
> URL: https://issues.apache.org/jira/browse/SOLR-12993
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]