[ 
https://issues.apache.org/jira/browse/SOLR-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Kroiss updated SOLR-17306:
--------------------------------
    Description: 
We are testing Solr 9.6.2 in a leader - repeater - follower configuration. 
During periods of heavy writes to the leader, replication is disabled to save 
bandwidth.

If the repeater restarts for any reason while replication is disabled on the 
leader, the repeater loses all its documents and does not recover when the 
leader is reopened for replication.

The documents are deleted, but the indexVersion and generation properties are 
set to the leader's values, so the repeater or follower does not recover when 
replication is enabled on the leader again.

It recovers only when there are new commits on the leader after replication 
has been re-enabled.

Log:

2024-05-22 06:18:42.186 INFO  (qtp16373883-27-null-23) [c: s: r: x:mycore t:null-23] o.a.s.c.S.Request webapp=/solr path=/replication params={wt=json&command=details} status=0 QTime=10
2024-05-22 06:18:46.195 INFO  (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Leader's generation: 0
2024-05-22 06:18:46.195 INFO  (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Leader's version: 0
2024-05-22 06:18:46.195 INFO  (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Follower's generation: 2913
2024-05-22 06:18:46.195 INFO  (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Follower's version: 1716300697144
2024-05-22 06:18:46.195 INFO  (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher New index in Leader. Deleting mine...

 

--> There is no new index on the leader; it is only closed for replication.

We think the problem is in IndexFetcher:

old: if (IndexDeletionPolicyWrapper.getCommitTimestamp(commit) != 0L) {

Also checking forceReplication will probably fix the problem:

new: if (forceReplication && IndexDeletionPolicyWrapper.getCommitTimestamp(commit) != 0L) {
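
To make the difference between the two conditions concrete, here is a minimal, self-contained sketch. It is a simplified model, not the Solr source: the class and method names are hypothetical, the outer "leader reports version 0" condition is an assumption inferred from the log above, and the follower's logged version is used only as a sample non-zero commit timestamp. It also assumes forceReplication is false for a regular poll.

```java
// Simplified, hypothetical model of the check discussed above; NOT the Solr source.
final class FetchDecisionSketch {

    /** Check as quoted under "old": a non-zero local commit timestamp is enough. */
    static boolean clearsLocalIndexOld(long leaderVersion, long localCommitTimestamp) {
        // Assumption: this branch is only reached when the leader reports version 0.
        return leaderVersion == 0L && localCommitTimestamp != 0L;
    }

    /** Check as quoted under "new": additionally requires forceReplication. */
    static boolean clearsLocalIndexNew(
            long leaderVersion, long localCommitTimestamp, boolean forceReplication) {
        return leaderVersion == 0L && forceReplication && localCommitTimestamp != 0L;
    }

    public static void main(String[] args) {
        long leaderVersion = 0L;                    // "Leader's version: 0" in the log
        long localCommitTimestamp = 1716300697144L; // sample non-zero value from the log

        System.out.println("old check clears index: "
                + clearsLocalIndexOld(leaderVersion, localCommitTimestamp));
        System.out.println("new check clears index (forceReplication=false): "
                + clearsLocalIndexNew(leaderVersion, localCommitTimestamp, false));
        System.out.println("new check clears index (forceReplication=true): "
                + clearsLocalIndexNew(leaderVersion, localCommitTimestamp, true));
    }
}
```

Under these assumptions the old condition clears the local index whenever the leader reports version 0, which is also what a leader with replication disabled appears to report in the log, while the proposed condition does so only when replication is explicitly forced.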

 

 

 

 

While investigating the problem, we also found an inconsistency in the 
response to the details request. It contains two leader fragments. When the 
leader is closed for replication, the property leader.replicationEnabled is 
still reported as true, whereas the property 
follower.leaderDetails.leader.replicationEnabled is correct.

 

Example

curl -s "https://solr9-repeater:8983/solr/mycore/replication?wt=json&command=details" |
jq '.details | {
  indexSize: .indexSize,
  indexVersion: .indexVersion,
  generation: .generation,
  indexPath: .indexPath,
  leader: {
    replicableVersion: .leader.replicableVersion,
    replicableGeneration: .leader.replicableGeneration,
    replicationEnabled: .leader.replicationEnabled
  },
  follower: {
    leaderDetails: {
      indexSize: .follower.leaderDetails.indexSize,
      generation: .follower.leaderDetails.generation,
      indexVersion: .follower.leaderDetails.indexVersion,
      indexPath: .follower.leaderDetails.indexPath,
      leader: {
        replicableVersion: .follower.leaderDetails.leader.replicableVersion,
        replicableGeneration: .follower.leaderDetails.leader.replicableGeneration,
        replicationEnabled: .follower.leaderDetails.leader.replicationEnabled
      }
    }
  }
}'

 

{
  "indexSize": "10.34 GB",
  "indexVersion": 1716358708159,
  "generation": 2913,
  "indexPath": "/var/solr/data/mycore/data/index.20240522061946262",
  "leader": {
    "replicableVersion": 1716358708159,
    "replicableGeneration": 2913,
    "replicationEnabled": "true"
  },
  "follower": {
    "leaderDetails": {
      "indexSize": "10.34 GB",
      "generation": 2913,
      "indexVersion": 1716358708159,
      "indexPath": "/var/solr/data/mycore/data/restore.20240508131046932",
      "leader": {
        "replicableVersion": 1716358708159,
        "replicableGeneration": 2913,
        "replicationEnabled": "false"
      }
    }
  }
}
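
The mismatch in the output above could be detected programmatically. The following is a rough sketch only: the class and helper names are hypothetical, and it assumes the details response has already been parsed into nested maps by whatever JSON library is in use. Whether the two fragments are meant to describe the same node is not established here; the sketch only reports that the two values differ.

```java
import java.util.Map;

// Hypothetical helper; compares the two replicationEnabled values from the details output.
final class ReplicationFlagCheck {

    /** Walks a nested map along the given keys; returns null if any step is missing. */
    static Object at(Map<?, ?> root, String... path) {
        Object cur = root;
        for (String key : path) {
            if (!(cur instanceof Map)) {
                return null;
            }
            cur = ((Map<?, ?>) cur).get(key);
        }
        return cur;
    }

    /** True when both flags are present and their string values differ. */
    static boolean flagsDisagree(Map<?, ?> details) {
        Object topLevel = at(details, "leader", "replicationEnabled");
        Object nested = at(details, "follower", "leaderDetails", "leader", "replicationEnabled");
        return topLevel != null && nested != null
                && !topLevel.toString().equals(nested.toString());
    }

    public static void main(String[] args) {
        // Only the two flags from the example output are reproduced here.
        Map<String, Object> details = Map.of(
                "leader", Map.of("replicationEnabled", "true"),
                "follower", Map.of("leaderDetails",
                        Map.of("leader", Map.of("replicationEnabled", "false"))));
        System.out.println("flags disagree: " + flagsDisagree(details));
    }
}
```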

> Solr Repeater or Slave loses data after restart when replication is not 
> enabled on leader
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-17306
>                 URL: https://issues.apache.org/jira/browse/SOLR-17306
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 9.2, 9.3, 9.4, 9.6, 9.5.0
>            Reporter: Peter Kroiss
>            Priority: Major


