This has happened yet again. Does anyone yet have any input on the idea of using the Leader's collection name in Leader/Follower replication (or pre-Solr8.7 Master/Slave replication), rather than the core name?
-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov.INVALID>
Sent: Thursday, June 3, 2021 10:30 AM
To: users@solr.apache.org
Subject: RE: Cores renamed

As a potential solution, I was wondering about implementing Master/Slave replication using the collection name of the Master rather than the core name. My initial experiment with this in a test environment seemed to work.

Does anyone have any input on the idea of using the Master's collection name in Master/Slave replication, rather than the core name?

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov.INVALID>
Sent: Wednesday, June 02, 2021 5:46 PM
To: users@solr.apache.org
Subject: RE: Cores renamed

It happened again this morning. Attached is an excerpt from solr.log (with port #s & IP addresses redacted) and below is the current CLUSTERSTATUS (with port #s redacted).

Is there yet any explanation?

{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"80000000-7fffffff",
            "state":"active",
            "replicas":{
              "core_node8":{
                "core":"ipg_report_large_shard1_replica_n7",
                "base_url":"http://solrdbprod26.be-md:####/solr",
                "node_name":"solrdbprod26.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"},
              "core_node10":{
                "core":"ipg_report_large_shard1_replica_n9",
                "base_url":"http://solrdbprod25.be-md:####/solr",
                "node_name":"solrdbprod25.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":741,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:####_solr",
      "solrdbprod25.be-md:####_solr"]}}

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov.INVALID>
Sent: Monday, May 17, 2021 5:01 PM
To: users@solr.apache.org
Subject: RE: Cores renamed

The entire directory for the old core gets removed.

Here is CLUSTERSTATUS (again with port numbers redacted). I ran CLUSTERSTATUS on both nodes, and the only difference was QTime (that is, there was no real difference):

{
  "responseHeader":{
    "status":0,
    "QTime":5},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"80000000-7fffffff",
            "state":"active",
            "replicas":{
              "core_node4":{
                "core":"ipg_report_large_shard1_replica_n3",
                "base_url":"http://solrdbprod26.be-md:####/solr",
                "node_name":"solrdbprod26.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"},
              "core_node6":{
                "core":"ipg_report_large_shard1_replica_n5",
                "base_url":"http://solrdbprod25.be-md:####/solr",
                "node_name":"solrdbprod25.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":710,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:####_solr",
      "solrdbprod25.be-md:####_solr"]}}

-----Original Message-----
From: matthew sporleder <msporle...@gmail.com>
Sent: Monday, May 17, 2021 4:34 PM
To: users@solr.apache.org
Subject: Re: Cores renamed

Can you verify all of your zkHost connection params across the entire cluster, and share the replicationFactor, autoAddReplicas, etc for the collection?

My theory is that you have two zookeeper configs conflicting as master elections happen, causing new replicas to get created on-the-fly.

Also -- do these cores get deleted from the filesystem or left around?

On Mon, May 17, 2021 at 4:11 PM Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov.invalid> wrote:
>
> > What does the core rename itself to, that would probably be the biggest
> > hint.
>
> At 4:01pm 1/14/21, Solr decided on its own to drop the core
> ipg_report_large_shard1_replica_n1 and to create the core
> ipg_report_large_shard1_replica_n7 in its place
>
> At 4:33am 1/16/21, Solr decided on its own to drop the core
> ipg_report_large_shard1_replica_n5 (on another node of the same SolrCloud)
> and to create the core ipg_report_large_shard1_replica_n9 in its place
>
> At about 4:10pm 1/26/21, Solr decided on its own to drop this core
> ipg_report_large_shard1_replica_n9 and to create the core
> ipg_report_large_shard1_replica_n13 in its place
>
> In March, we created a new SolrCloud for the same collection, and reloaded
> the data
>
> At 7:59am 5/12/21, Solr decided on its own to drop the core
> ipg_report_large_shard1_replica_n1 and to create the core
> ipg_report_large_shard1_replica_n5 in its place
>
> I am attaching an excerpt from solr.log for the most recent problem (with IP
> addresses and port numbers redacted)
>
> Please note that Master/Slave replication breaks when a core is renamed, so
> this can be a major problem
>
>
> Any ideas?
>
> -----Original Message-----
> From: Alexandre Rafalovitch <arafa...@gmail.com>
> Sent: Wednesday, May 12, 2021 2:10 PM
> To: users@solr.apache.org
> Subject: Re: Cores renamed
>
> This is truly a shot in the dark, but is it possible you have
> something in core.properties file (which is where the core name is for
> non-Cloud setup)?
>
> What does the core rename itself to, that would probably be the biggest hint.
>
> Regards,
> Alex.
>
> On Wed, 12 May 2021 at 14:00, Oakley, Craig (NIH/NLM/NCBI) [C]
> <craig.oak...@nih.gov.invalid> wrote:
> >
> > This phenomenon has happened again (this time without any REQUESTRECOVERY)
> >
> > Does anyone yet have any explanation of this?
> >
> > -----Original Message-----
> > From: Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov.INVALID>
> > Sent: Thursday, January 28, 2021 10:57 AM
> > To: solr-u...@lucene.apache.org
> > Subject: Cores renamed
> >
> > We recently have had a few occasions when cores for one specific collection
> > were renamed (or more likely dropped and recreated, and thus ended up with
> > a different core name).
> >
> > Is this a known phenomenon? Is there any explanation?
> >
> > It may be relevant that we just recently started running this SolrCloud on
> > version 8.5.2, although the collection was created under Solr7.4. Also,
> > this collection seems to experience some heavy updates such that the
> > non-Leader replica has trouble keeping up. One of these renames occurred at
> > 4:33am, so I highly suspect that the rename (or drop and recreate) was done
> > by some internal Solr thread rather than by any of my coworkers. One other
> > potential clue is that I can see that
> > /solr/admin/cores?action=REQUESTRECOVERY was usually run on the new core a
> > moment after it was created.
> >
> > Does anyone have any insights?
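
For readers who want to check for the same symptom, the two CLUSTERSTATUS responses quoted in this thread can be compared with a small script. What follows is only a rough sketch, not anything posted by the thread's participants: the function names (`cores_by_node`, `renamed_cores`, `replication_url`) are hypothetical, the sample data is trimmed from the responses above, and it assumes at most one replica of the collection per node (as with `maxShardsPerNode` of 1 here). The collection-based replication URL reflects the idea proposed in the thread, which was reported to work only in an initial test; it is not confirmed behaviour.

```python
def cores_by_node(cluster_status, collection):
    """Map node_name -> (core name, is_leader) for every replica of a collection.

    Assumes one replica of the collection per node (hypothetical helper).
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    result = {}
    for shard in shards.values():
        for replica in shard["replicas"].values():
            # CLUSTERSTATUS reports the leader flag as the string "true".
            is_leader = replica.get("leader") == "true"
            result[replica["node_name"]] = (replica["core"], is_leader)
    return result


def renamed_cores(before, after, collection):
    """Return {node_name: (old_core, new_core)} for nodes whose core name changed."""
    old = cores_by_node(before, collection)
    new = cores_by_node(after, collection)
    return {
        node: (old[node][0], new[node][0])
        for node in old.keys() & new.keys()
        if old[node][0] != new[node][0]
    }


def replication_url(base_url, name):
    """Build a replication handler URL from a core name or a collection name."""
    return f"{base_url}/{name}/replication"


def _status(replicas):
    # Wrap a replicas dict in the minimal CLUSTERSTATUS structure used above.
    return {"cluster": {"collections": {"ipg_report_large": {
        "shards": {"shard1": {"replicas": replicas}}}}}}


# Trimmed from the May 17 and June 2 responses quoted in the thread.
MAY_17 = _status({
    "core_node4": {"core": "ipg_report_large_shard1_replica_n3",
                   "node_name": "solrdbprod26.be-md:####_solr"},
    "core_node6": {"core": "ipg_report_large_shard1_replica_n5",
                   "node_name": "solrdbprod25.be-md:####_solr",
                   "leader": "true"},
})
JUNE_2 = _status({
    "core_node8": {"core": "ipg_report_large_shard1_replica_n7",
                   "node_name": "solrdbprod26.be-md:####_solr",
                   "leader": "true"},
    "core_node10": {"core": "ipg_report_large_shard1_replica_n9",
                    "node_name": "solrdbprod25.be-md:####_solr"},
})

if __name__ == "__main__":
    for node, (old, new) in sorted(renamed_cores(MAY_17, JUNE_2, "ipg_report_large").items()):
        print(f"{node}: {old} -> {new}")
    base = "http://solrdbprod26.be-md:####/solr"
    # A core-based URL goes stale when the core is recreated under a new name;
    # a collection-based URL (the idea under discussion) would not change.
    print(replication_url(base, "ipg_report_large_shard1_replica_n7"))
    print(replication_url(base, "ipg_report_large"))
```

Run against periodic CLUSTERSTATUS snapshots, a check like this would at least flag when a core has been dropped and recreated, even though it does not explain why.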