[ https://issues.apache.org/jira/browse/SOLR-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Kroiss updated SOLR-17306:
--------------------------------
    Description:
We are testing Solr 9.6.2 in a leader - repeater - follower configuration. At times we write heavily to the leader, and during those periods replication is disabled to save bandwidth.

If the repeater restarts for some reason while replication is disabled on the leader, the repeater loses all documents and does not recover when the leader is opened for replication.

The documents are deleted, but the indexVersion and generation properties are set to the leader's values, so the repeater or follower does not recover when the leader is opened for replication again. It recovers only when there are commits on the leader after replication has been opened.

Log:
2024-05-22 06:18:42.186 INFO (qtp16373883-27-null-23) [c: s: r: x:mycore t:null-23] o.a.s.c.S.Request webapp=/solr path=/replication params={wt=json&command=details} status=0 QTime=10
2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Leader's generation: 0
2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Leader's version: 0
2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Follower's generation: 2913
2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Follower's version: 1716300697144
2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher New index in Leader. Deleting mine...

--> There is no new index in the leader; it is only closed for replication.

We think the problem is in IndexFetcher:
old: if (IndexDeletionPolicyWrapper.getCommitTimestamp(commit) != 0L) {
new: if (forceReplication && IndexDeletionPolicyWrapper.getCommitTimestamp(commit) != 0L) {
Adding the forceReplication check will probably fix the problem.

While investigating the problem we also found some inconsistencies in the details request. The response contains two leader fragments. When the leader is closed for replication, the property leader.replicationEnabled is set to true, whereas the property follower.leaderDetails.leader.replicationEnabled is correct.

Example:
curl -s "https://solr9-repeater:8983/solr/mycore/replication?wt=json&command=details" | jq '.details | {
  indexSize: .indexSize,
  indexVersion: .indexVersion,
  generation: .generation,
  indexPath: .indexPath,
  leader: {
    replicableVersion: .leader.replicableVersion,
    replicableGeneration: .leader.replicableGeneration,
    replicationEnabled: .leader.replicationEnabled
  },
  follower: {
    leaderDetails: {
      indexSize: .follower.leaderDetails.indexSize,
      generation: .follower.leaderDetails.generation,
      indexVersion: .follower.leaderDetails.indexVersion,
      indexPath: .follower.leaderDetails.indexPath,
      leader: {
        replicableVersion: .follower.leaderDetails.leader.replicableVersion,
        replicableGeneration: .follower.leaderDetails.leader.replicableGeneration,
        replicationEnabled: .follower.leaderDetails.leader.replicationEnabled
      }
    }
  }
}'

{
  "indexSize": "10.34 GB",
  "indexVersion": 1716358708159,
  "generation": 2913,
  "indexPath": "/var/solr/data/mycore/data/index.20240522061946262",
  "leader": {
    "replicableVersion": 1716358708159,
    "replicableGeneration": 2913,
    "replicationEnabled": "true"
  },
  "follower": {
    "leaderDetails": {
      "indexSize": "10.34 GB",
      "generation": 2913,
      "indexVersion": 1716358708159,
      "indexPath": "/var/solr/data/mycore/data/restore.20240508131046932",
      "leader": {
        "replicableVersion": 1716358708159,
        "replicableGeneration": 2913,
        "replicationEnabled": "false"
      }
    }
  }
}

  was:
We are testing Solr 9.6.2 in a leader - repeater - follower configuration. At times we write heavily to the leader, and during those periods replication is disabled to save bandwidth.

If the repeater restarts for some reason while replication is disabled on the leader, the repeater loses all documents and does not recover when the leader is opened for replication.

The documents are deleted, but the indexVersion and generation properties are set to the leader's values, so the repeater or follower does not recover when the leader is opened for replication again. It recovers only when there are commits on the leader after replication has been opened.

Log:
2024-05-22 06:18:42.186 INFO (qtp16373883-27-null-23) [c: s: r: x:mycore t:null-23] o.a.s.c.S.Request webapp=/solr path=/replication params={wt=json&command=details} status=0 QTime=10
2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Leader's generation: 0
2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Leader's version: 0
2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Follower's generation: 2913
2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Follower's version: 1716300697144
2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher New index in Leader. Deleting mine...

We think the problem is in IndexFetcher:
old: if (IndexDeletionPolicyWrapper.getCommitTimestamp(commit) != 0L) {
new: if (forceReplication && IndexDeletionPolicyWrapper.getCommitTimestamp(commit) != 0L) {
Adding the forceReplication check will probably fix the problem.

While investigating the problem we also found some inconsistencies in the details request. The response contains two leader fragments. When the leader is closed for replication, the property leader.replicationEnabled is set to true, whereas the property follower.leaderDetails.leader.replicationEnabled is correct.

Example:
curl -s "https://solr9-repeater:8983/solr/mycore/replication?wt=json&command=details" | jq '.details | {
  indexSize: .indexSize,
  indexVersion: .indexVersion,
  generation: .generation,
  indexPath: .indexPath,
  leader: {
    replicableVersion: .leader.replicableVersion,
    replicableGeneration: .leader.replicableGeneration,
    replicationEnabled: .leader.replicationEnabled
  },
  follower: {
    leaderDetails: {
      indexSize: .follower.leaderDetails.indexSize,
      generation: .follower.leaderDetails.generation,
      indexVersion: .follower.leaderDetails.indexVersion,
      indexPath: .follower.leaderDetails.indexPath,
      leader: {
        replicableVersion: .follower.leaderDetails.leader.replicableVersion,
        replicableGeneration: .follower.leaderDetails.leader.replicableGeneration,
        replicationEnabled: .follower.leaderDetails.leader.replicationEnabled
      }
    }
  }
}'

{
  "indexSize": "10.34 GB",
  "indexVersion": 1716358708159,
  "generation": 2913,
  "indexPath": "/var/solr/data/mycore/data/index.20240522061946262",
  "leader": {
    "replicableVersion": 1716358708159,
    "replicableGeneration": 2913,
    "replicationEnabled": "true"
  },
  "follower": {
    "leaderDetails": {
      "indexSize": "10.34 GB",
      "generation": 2913,
      "indexVersion": 1716358708159,
      "indexPath": "/var/solr/data/mycore/data/restore.20240508131046932",
      "leader": {
        "replicableVersion": 1716358708159,
        "replicableGeneration": 2913,
        "replicationEnabled": "false"
      }
    }
  }
}


> Solr Repeater or Slave loses data after restart when replication is not enabled on leader
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-17306
>                 URL: https://issues.apache.org/jira/browse/SOLR-17306
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>    Affects Versions: 9.2, 9.3, 9.4, 9.6, 9.5.0
>            Reporter: Peter Kroiss
>            Priority: Major
>
> We are testing Solr 9.6.2 in a leader - repeater - follower configuration.
> At times we write heavily to the leader, and during those periods
> replication is disabled to save bandwidth.
> If the repeater restarts for some reason while replication is disabled on
> the leader, the repeater loses all documents and does not recover when the
> leader is opened for replication.
> The documents are deleted, but the indexVersion and generation properties
> are set to the leader's values, so the repeater or follower does not recover
> when the leader is opened for replication again.
> It recovers only when there are commits on the leader after replication has
> been opened.
> Log:
> 2024-05-22 06:18:42.186 INFO (qtp16373883-27-null-23) [c: s: r: x:mycore t:null-23] o.a.s.c.S.Request webapp=/solr path=/replication params={wt=json&command=details} status=0 QTime=10
> 2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Leader's generation: 0
> 2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Leader's version: 0
> 2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Follower's generation: 2913
> 2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher Follower's version: 1716300697144
> 2024-05-22 06:18:46.195 INFO (indexFetcher-43-thread-1) [c: s: r: x:mycore t:] o.a.s.h.IndexFetcher New index in Leader. Deleting mine...
>
> --> There is no new index in the leader; it is only closed for replication.
>
> We think the problem is in IndexFetcher:
> old: if (IndexDeletionPolicyWrapper.getCommitTimestamp(commit) != 0L) {
> new: if (forceReplication && IndexDeletionPolicyWrapper.getCommitTimestamp(commit) != 0L) {
> Adding the forceReplication check will probably fix the problem.
>
> While investigating the problem we also found some inconsistencies in the
> details request. The response contains two leader fragments. When the leader
> is closed for replication, the property leader.replicationEnabled is set to
> true, whereas the property follower.leaderDetails.leader.replicationEnabled
> is correct.
>
> Example:
> curl -s "https://solr9-repeater:8983/solr/mycore/replication?wt=json&command=details" | jq '.details | {
>   indexSize: .indexSize,
>   indexVersion: .indexVersion,
>   generation: .generation,
>   indexPath: .indexPath,
>   leader: {
>     replicableVersion: .leader.replicableVersion,
>     replicableGeneration: .leader.replicableGeneration,
>     replicationEnabled: .leader.replicationEnabled
>   },
>   follower: {
>     leaderDetails: {
>       indexSize: .follower.leaderDetails.indexSize,
>       generation: .follower.leaderDetails.generation,
>       indexVersion: .follower.leaderDetails.indexVersion,
>       indexPath: .follower.leaderDetails.indexPath,
>       leader: {
>         replicableVersion: .follower.leaderDetails.leader.replicableVersion,
>         replicableGeneration: .follower.leaderDetails.leader.replicableGeneration,
>         replicationEnabled: .follower.leaderDetails.leader.replicationEnabled
>       }
>     }
>   }
> }'
>
> {
>   "indexSize": "10.34 GB",
>   "indexVersion": 1716358708159,
>   "generation": 2913,
>   "indexPath": "/var/solr/data/mycore/data/index.20240522061946262",
>   "leader": {
>     "replicableVersion": 1716358708159,
>     "replicableGeneration": 2913,
>     "replicationEnabled": "true"
>   },
>   "follower": {
>     "leaderDetails": {
>       "indexSize": "10.34 GB",
>       "generation": 2913,
>       "indexVersion": 1716358708159,
>       "indexPath": "/var/solr/data/mycore/data/restore.20240508131046932",
>       "leader": {
>         "replicableVersion": 1716358708159,
>         "replicableGeneration": 2913,
>         "replicationEnabled": "false"
>       }
>     }
>   }
> }



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
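
For readers following the IndexFetcher discussion, the proposed condition change can be modelled in isolation. The sketch below is a simplified, hypothetical model and not Solr source code: the class ClearIndexDecision and its method names are invented for illustration, and it assumes (based only on the log above) that the "Deleting mine..." branch is reached when the leader reports version 0. It only shows how adding forceReplication to the condition would change the outcome for the values seen in the log.

```java
/**
 * Simplified, hypothetical model of the condition discussed in the report.
 * This is NOT Solr source code; names and structure are assumptions.
 */
final class ClearIndexDecision {

    private ClearIndexDecision() {
        // static helpers only
    }

    /**
     * Models the "old" condition quoted in the report: the local index is
     * cleared whenever the local commit carries a non-zero timestamp.
     * Assumption: the branch is only reached when the leader reports version 0.
     */
    static boolean clearsLocalIndexOld(long leaderVersion, long localCommitTimestamp) {
        return leaderVersion == 0L && localCommitTimestamp != 0L;
    }

    /**
     * Models the "new" condition proposed in the report: the local index is
     * cleared only if replication was explicitly forced as well.
     */
    static boolean clearsLocalIndexProposed(long leaderVersion,
                                            long localCommitTimestamp,
                                            boolean forceReplication) {
        return leaderVersion == 0L && forceReplication && localCommitTimestamp != 0L;
    }
}
```

With the values from the log (leader version 0, follower version 1716300697144), clearsLocalIndexOld returns true, which corresponds to the observed deletion, while clearsLocalIndexProposed returns false unless forceReplication is set. Whether this matches the actual control flow in IndexFetcher would need to be confirmed against the Solr source.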