[
https://issues.apache.org/jira/browse/SOLR-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tomás Fernández Löbbe updated SOLR-10233:
-----------------------------------------
Description:
For the majority of cases, SolrCloud's current distributed indexing works
great. There is a subset of use cases for which the legacy Master/Slave
replication may fit better:
* NRT is not required
* LIR can become an issue; read availability is preferred over consistency or NRT
* A high number of searches (requiring many search nodes)
SOLR-9835 is adding replicas that don’t do any indexing and just update their
transaction log. This Jira extends that idea to provide the following
replica types:
* *Realtime:* Writes updates to the transaction log and indexes them locally.
Replicas of type “realtime” support NRT (soft commits) and RTG. Any _realtime_
replica can become a leader. This is the only type supported in SolrCloud at
this time and will be the default.
* *Append:* Writes to the transaction log, but not to the index; it keeps its
index up to date via replication. Any _append_ replica can become leader (by
first applying all of its local transaction log entries). If a replica of type
_append_ is also the leader, it will behave as a _realtime_ replica. This is
exactly what SOLR-9835 is proposing (non-live replicas).
* *Passive:* Doesn’t index or write to the transaction log. It just replicates
from _realtime_ or _append_ replicas. Passive replicas can’t become shard
leaders (i.e., if at some point there are only passive replicas in the
collection, updates will fail the same as if there were no leader, while
queries continue to work), so they don’t even participate in leader elections.
When the leader replica of the shard receives an update, it will distribute it
to all _realtime_ and _append_ replicas, the same as it does today. It won't
distribute to _passive_ replicas.
By using a combination of _append_ and _passive_ replicas, one can achieve an
equivalent of the legacy Master/Slave architecture in SolrCloud mode with most
of its benefits, including high availability of writes.
h2. API (v1 style)
{{/admin/collections?action=CREATE…&*realtime=X&append=Y&passive=Z*}}
{{/admin/collections?action=ADDREPLICA…&*type=\[realtime/append/passive\]*}}
* “replicationFactor=” will translate to “realtime=” for backward compatibility
* if _passive_ > 0, _append_ or _realtime_ needs to be >= 1 (a shard can’t
consist of passive replicas only)
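For illustration only, assuming the proposed parameters, a collection with one
_realtime_, one _append_ and two _passive_ replicas per shard could be created,
and another _passive_ replica added later, like this (these parameters are part
of this proposal, they don't exist yet):
{code}
/admin/collections?action=CREATE&name=gettingstarted&numShards=2&realtime=1&append=1&passive=2
/admin/collections?action=ADDREPLICA&collection=gettingstarted&shard=shard1&type=passive
{code}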
h2. Placement Strategies
By using replica placement rules, one should be able to dedicate nodes to
search-only and write-only workloads. For example:
{code}
shard:*,replica:*,type:passive,fleet:slaves
{code}
where “type” is a new condition supported by the rule engine, and
“fleet:slaves” is a regular tag. Note that rules are only applied when the
replicas are created, so a later change in tags won't affect existing replicas.
Also, rules are per collection, so each collection can define its own rules.
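As an illustration of how the proposed {{type}} condition could be passed
together with the existing {{rule}} parameter at creation time (the {{type}}
condition doesn't exist yet, and {{fleet}} is assumed to be a tag provided by
a snitch):
{code}
/admin/collections?action=CREATE&name=gettingstarted&numShards=2&realtime=1&passive=2&rule=shard:*,replica:*,type:passive,fleet:slaves
{code}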
Note that on the server side Solr also needs to know how to distribute shard
requests (maybe in ShardHandler?) if we want to hit only a subset of replicas
(e.g. _passive_ replicas only, or similar rules).
h2. SolrJ
The SolrCloud client could be made smart enough to prefer _passive_ replicas
for search requests when they are available (and if configured to do so).
_Passive_ replicas can’t respond to RTG requests, so those should go to
_append_ or _realtime_ replicas.
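A rough SolrJ sketch of this idea, assuming the proposed {{type}} property is
published in the cluster state (the property and the routing approach below
are assumptions of this proposal, not existing client behavior):
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

public class PassivePreferringRouting {

  /**
   * Route a query to the proposed "passive" replicas only, by building an
   * explicit "shards" parameter from cluster state. The "type" replica
   * property is the one proposed in this issue, not an existing property.
   */
  public static SolrQuery preferPassive(CloudSolrClient client, String collection, SolrQuery query) {
    DocCollection coll = client.getZkStateReader().getClusterState().getCollection(collection);
    List<String> perShard = new ArrayList<>();
    for (Slice slice : coll.getActiveSlices()) {
      List<String> urls = new ArrayList<>();
      for (Replica replica : slice.getReplicas()) {
        String type = replica.getStr("type"); // proposed property
        if ("passive".equals(type) && replica.getState() == Replica.State.ACTIVE) {
          urls.add(replica.getCoreUrl());
        }
      }
      if (urls.isEmpty()) {
        return query; // no active passive replica for this shard; keep default routing
      }
      perShard.add(String.join("|", urls)); // "|" separates alternatives within a shard
    }
    query.set("shards", String.join(",", perShard)); // "," separates shards
    return query;
  }
}
{code}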
h2. Cluster/Collection state
{code}
{"gettingstarted":{
"replicationFactor":"1",
"router":{"name":"compositeId"},
"maxShardsPerNode":"2",
"autoAddReplicas":"false",
"shards":{
"shard1":{
"range":"80000000-ffffffff",
"state":"active",
"replicas":{
"core_node5":{
"core":"gettingstarted_shard1_replica1",
"base_url":"http://127.0.0.1:8983/solr",
"node_name":"127.0.0.1:8983_solr",
"state":"active",
"leader":"true",
**"type": "realtime"**},
"core_node10":{
"core":"gettingstarted_shard1_replica2",
"base_url":"http://127.0.0.1:7574/solr",
"node_name":"127.0.0.1:7574_solr",
"state":"active",
**"type": "passive"**}},
}},
"shard2":{
...
{code}
h2. Back compatibility
We should be able to maintain backward compatibility by assuming that replicas
without a “type” property are _realtime_ replicas.
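In client or server code, that rule could be a simple default when reading the
property (a minimal sketch, assuming the proposed {{type}} property):
{code:java}
import org.apache.solr.common.cloud.Replica;

// Back-compat sketch: replicas created by older Solr versions have no "type"
// property in the cluster state and are treated as "realtime" replicas.
public final class ReplicaTypes {
  public static String typeOf(Replica replica) {
    String type = replica.getStr("type"); // "type" is the property proposed here
    return type == null ? "realtime" : type;
  }
}
{code}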
h2. Failure Scenarios for passive replicas
h3. Replica-Leader partition
In SolrCloud today, the replica would be placed in LIR in this scenario. With
_passive_ replicas, a replica may not be able to replicate for some time (and
fall behind on the index), but queries can still be served. Once the connection
is re-established, replication will continue.
h3. Replica ZooKeeper partition
The _passive_ replica will leave the cluster. “Smart clients” and other
replicas (e.g. for distributed search) won’t find it and won’t query it.
Direct search requests to the replica may still succeed.
h3. Passive replica dies (or is unreachable)
The replica won’t be queryable. On restart, the replica will recover from the
leader, following the same flow as _realtime_ replicas: set state to DOWN, then
RECOVERING, and finally ACTIVE. _Passive_ replicas will use a different
{{RecoveryStrategy}} implementation that omits the *preparerecovery* call and
the peer sync attempt and jumps straight to replication. If the leader didn't
change, or if the other replicas are of type _append_, replication should be
incremental. Once the first replication is done, the passive replica will
declare itself active and start serving traffic.
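A rough sketch of what that simplified flow could look like (hypothetical
names, not actual Solr internals; the real change would be a
{{RecoveryStrategy}} variant):
{code:java}
// Hypothetical outline of the simplified recovery flow for passive replicas:
// no preparerecovery call, no peer sync, no tlog buffering; just replicate
// the index and go active.
public class PassiveReplicaRecoverySketch {

  interface StatePublisher { void publish(String state); }       // assumed helper
  interface IndexFetcher { boolean fetchIndexFrom(String url); } // assumed helper

  public void recover(StatePublisher publisher, IndexFetcher fetcher, String sourceReplicaUrl) {
    publisher.publish("down");
    publisher.publish("recovering");
    // Incremental if the source kept the same index files (e.g. an append leader);
    // otherwise this may turn into a full index copy.
    boolean replicated = fetcher.fetchIndexFrom(sourceReplicaUrl);
    if (replicated) {
      publisher.publish("active"); // start serving search traffic
    }
    // On failure the replica would retry and remain in RECOVERING.
  }
}
{code}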
h3. Leader dies
Passive replicas won’t be able to replicate. The cluster won’t take updates
until a new leader is elected, and passive replicas will remain active and keep
serving query traffic during the “write outage”. Once a new leader is elected,
updates go back to normal and replication will restart (possibly from a
different node).
h3. Leader ZooKeeper partition
Same as today. Leader will abandon leadership and a new replica will be elected
as leader.
h2. Q&A
h3. Can I use a combination of _passive_ + _realtime_?
You could. The problem is that, since _realtime_ replicas each generate their
own index, any change of leadership could force all the _passive_ replicas to
do a full replication. The biggest benefit of _append_ replicas is that they
share the same index files as the leader, which means that even if the leader
changes, the number of segments to replicate will remain low. For that reason,
using _append_ replicas is recommended when using _passive_ replicas.
h3. Can I use _passive_ + _append_ + _realtime_?
The issue with mixing _realtime_ replicas with _append_ replicas is that if a
different _realtime_ replica becomes the leader, the whole purpose of using
_append_ replicas is defeated, since they will all have to replicate the full
index.
h3. What happens if replication from *passives* fails?
TBD: In general we want those replicas to continue serving search traffic, but
we may want a way to say “if you can’t replicate after X hours, put yourself in
recovery” or something similar.
[~varunthacker] suggested that we include in the response the time since the
last successful replication, so the client can choose what to do with the
results (in a multi-shard request, this would be the oldest value across all
shards).
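For example, a client could use such a field roughly like this (the
{{lastReplicationAgeSeconds}} field name is purely hypothetical, just to
illustrate the idea):
{code:java}
import org.apache.solr.client.solrj.response.QueryResponse;

// Hypothetical client-side handling of a "time since last successful
// replication" field in the response; the field name is invented here.
public class StalenessCheck {
  static final long MAX_AGE_SECONDS = 3600; // example threshold chosen by the client

  public static boolean isFreshEnough(QueryResponse response) {
    Object age = response.getResponse().get("lastReplicationAgeSeconds");
    if (age == null) {
      return true; // field absent (e.g. served by a realtime replica); nothing to check
    }
    return ((Number) age).longValue() <= MAX_AGE_SECONDS;
  }
}
{code}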
h3. Do _passive_ replicas need to replicate from the leader only?
This is not necessary. _Passive_ replicas can replicate from any _realtime_ or
_append_ replica, although this would add some extra delay before the latest
updates are visible. Replicating from a _realtime_ replica may not be a good
idea, though; see the question “Can I use a combination of _passive_ +
_realtime_?”
h3. What if I need NRT?
Then you can’t query _append_ or _passive_ replicas; you should make all your
replicas _realtime_.
h3. Will new _passive_ replicas start receiving traffic immediately after being added?
_Passive_ replicas will go through the same states as _realtime_/_append_
replicas: they’ll join the cluster as “DOWN” and move to “RECOVERING” until
they can replicate from the leader. Then they’ll run the replication process
and become “ACTIVE”, at which point they’ll start responding to queries.
They’ll use a different {{RecoveryStrategy}} that skips peer sync and buffering
of docs, and just replicates.
h3. What if a _passive_ replica receives an update?
This will work the same as it does today with non-leader replicas: the replica
will just forward the update to the correct leader.
h3. What is the difference between using _append_ + _passive_ replicas and
legacy master/slave?
These are just some differences I can think of:
* You now need ZooKeeper to run in SolrCloud mode
* High availability for writes, as long as you have more than 1 active replica
* Shard management by Solr at index time and query time.
* Full support for Collections and Collections API
* SolrCloudClient support
I'd like to get some thoughts on this proposal.
> Add support for different replica types in Solr
> -----------------------------------------------
>
> Key: SOLR-10233
> URL: https://issues.apache.org/jira/browse/SOLR-10233
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Tomás Fernández Löbbe
> Assignee: Tomás Fernández Löbbe
>