Thanks again Dave, helpful information! Yes, we have one slave as the primary, serving queries, and the other for failover... We are in the slow process of moving to SolrCloud on Solr 8/9, but a lot of work still goes into maintaining our Solr 6 deployment. So learning more about this type of replication is valuable.
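In case it is useful context, here is a rough sketch of how the legacy
replication API can be used to compare the slaves against the master.
The hostnames and core name below are placeholders, not our actual ones,
and this is only an illustration of the API, not our exact procedure:

  # Index version/generation on the master (placeholder host/core names)
  curl "http://solr-master:8983/solr/mycore/replication?command=indexversion&wt=json"

  # Same check on each slave; the generation should match the master
  # once a poll cycle has completed
  curl "http://solr-slave1:8983/solr/mycore/replication?command=indexversion&wt=json"
  curl "http://solr-slave2:8983/solr/mycore/replication?command=indexversion&wt=json"

  # Fuller picture on a slave: index size on disk, whether a fetch is in
  # progress, and timestamps of recent replication cycles
  curl "http://solr-slave2:8983/solr/mycore/replication?command=details&wt=json"

If one slave keeps reporting a generation that lags or jumps relative to
the other, that lines up with the "out of whack" behavior described below.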
Matt

On Tue, Oct 11, 2022 at 12:32 PM Dave <hastings.recurs...@gmail.com> wrote:

> I’ve seen this happen where the slaves behave differently from each other
> or get the version of the index out of whack; usually it happened if the
> latency of one slave vs another to the master isn’t the same. But again,
> that’s why you should have at least double the size on your slaves for the
> index. I also wonder if both or however many slaves have the exact same
> disk space and memory, and maybe if one slave is outside of the network,
> increase the replication timeout to not battle with the immediate server.
>
> The entire thing is a dance. For example, if the one server was giving
> issues and re-replicating, it could be because the index changed partway
> through, so it had to make another temp index folder and repeat the same
> process. So many entertaining things to watch.
>
> Take care and good luck. Also, another tip from experience: use only one
> slave for queries and have the others as backup, as a single server can
> cache the fields way faster than round robin or whatever other metric is
> used to determine who serves.
> -Dave
>
> On Oct 11, 2022, at 1:32 PM, mtn search <search...@gmail.com> wrote:
> >
> > Thanks Dave! Yes, we ran into this issue yesterday and do need to review
> > the disk space we have available (as well as the large size of our cores).
> > Also, there was some interesting context for this event. We have 2 slaves
> > on separate servers replicating from the master. One slave replicated fine
> > over the weekend, with only a fraction of the files needing to be updated.
> > However, on the other slave, Solr believed it needed to do a full
> > replication. Over and over it filled up disk, failed, appeared to clean up
> > the failed attempt, and tried again. Yesterday, after a couple Solr
> > restarts and then a full Solr stop/start, it appears that Solr recognized
> > it did not need to perform a full replication, and then completed
> > successfully by only copying over the subset of index files needed (like
> > the other slave did).
> >
> > I am not sure how to explain it other than that for a time Solr was in a
> > state that required a full replication and the stop/start forced it to
> > reassess what was actually needed in the replication. Replication is
> > healthy on both today.
> >
> > Matt
> >
> > On Mon, Oct 10, 2022 at 1:18 PM Dave <hastings.recurs...@gmail.com> wrote:
> >>
> >> Only an optimize or a large segment merge would cause large file deposits
> >> there. That’s why “slaves” should always have double the index size
> >> available, as Solr will decide on its own when to merge or optimize on
> >> the master, so the slaves need to be ready for double the size, and the
> >> master needs to be ready for triple the size. If you don’t have the disk
> >> space ready to handle this, you’re going to eventually run into some
> >> serious issues, or just not be able to replicate.
> >>
> >> -dave
> >>
> >> On Oct 10, 2022, at 2:56 PM, mtn search <search...@gmail.com> wrote:
> >>>
> >>> As I go back through
> >>> https://solr.apache.org/guide/6_6/index-replication.html, the picture is
> >>> filling in a little more. My guess is that the tmp dir referenced is the
> >>> index.<timestamp> dir.
> >>>
> >>> Very interested in cases that might generate a full replication. To my
> >>> knowledge no optimize commands have been issued against the core in
> >>> question.
> >>>
> >>> On Mon, Oct 10, 2022 at 12:38 PM mtn search <search...@gmail.com> wrote:
> >>>>
> >>>> Hello, I am learning more about replication as I maintain a large Solr 6
> >>>> set of Solr servers configured for Master/Slave.
> >>>>
> >>>> I noticed during some replication activities that, in addition to the
> >>>> original index dir under the core name on the file system, there is a
> >>>> dir named "index" with a timestamp: index.<timestamp>. Files are written
> >>>> to this timestamped dir during replication. I am interested in how this
> >>>> works:
> >>>>
> >>>> Is this timestamped dir created for every core replicating from its
> >>>> master?
> >>>>
> >>>> Or is this timestamped dir created/used only in special circumstances?
> >>>> If so, which?
> >>>>
> >>>> - Are there cases that cause a full replication within Solr 6?
> >>>>
> >>>> Is the original index dir removed and the timestamped dir renamed to
> >>>> "index" after replication?
> >>>>
> >>>> I initially figured all replication activities happened within the index
> >>>> dir, but that does not appear to be the case.
> >>>>
> >>>> Any tips or documentation references would be appreciated.
> >>>>
> >>>> Thanks,
> >>>> Matt
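P.S. For anyone who finds this thread while a slave is stuck repeating a
full copy, a few replication API commands can help while investigating.
The host and core names below are placeholders, and this is a sketch of
what the legacy ReplicationHandler supports rather than what we actually
ran:

  # Stop the slave from polling the master while investigating, so it
  # does not immediately start another full fetch
  curl "http://solr-slave2:8983/solr/mycore/replication?command=disablepoll"

  # Abort a fetch that is already in progress; this should also clean up
  # the temporary index.<timestamp> directory for that attempt
  curl "http://solr-slave2:8983/solr/mycore/replication?command=abortfetch"

  # Once there is enough free disk, trigger a one-off fetch by hand and
  # then re-enable polling
  curl "http://solr-slave2:8983/solr/mycore/replication?command=fetchindex"
  curl "http://solr-slave2:8983/solr/mycore/replication?command=enablepoll"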