I’ve seen this happen where the slaves behave differently from each other or
get out of sync on the index version; it usually happened when the latency
from one slave to the master wasn’t the same as the other’s. But again, that’s
why you should have at least double the index size free on your slaves. I’d
also check whether all of your slaves have identical disk space and memory,
and if one slave sits outside the local network, consider raising the
replication timeouts so it isn’t fighting the closer server.
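
If it helps, the timeouts I mean live on the slave side of the replication
handler in solrconfig.xml. Just a rough sketch, not a recommendation; the
host, core name, and millisecond values below are placeholders:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <!-- placeholder URL: point at the real master core -->
        <str name="masterUrl">http://master-host:8983/solr/core-name/replication</str>
        <!-- how often the slave polls the master for a newer index generation -->
        <str name="pollInterval">00:00:60</str>
        <!-- HTTP timeouts in ms; raise these for a slave on a slower or remote link -->
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">10000</str>
      </lst>
    </requestHandler>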

The entire thing is a dance. For example, if the one server was having issues
and kept re-replicating, it could be because the index changed partway
through, so it had to make another temp index folder and repeat the same
process. So many entertaining things to watch.
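
If you want to actually watch it, the replication handler’s details command
shows what the slave thinks it is doing (index generation, files still to
fetch, whether a fetch is in progress). Something like this, with the host
and core name as placeholders:

    http://slave-host:8983/solr/core-name/replication?command=details

It’s the same information the Admin UI replication page displays, just raw.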

Take care and good luck. One more tip from experience: use only one slave for
queries and keep the others as backups, since a single server can keep its
field caches warm far better than round robin or whatever other metric is used
to decide which server handles a query.
-Dave


> On Oct 11, 2022, at 1:32 PM, mtn search <search...@gmail.com> wrote:
> 
> Thanks Dave!  Yes, we ran into this issue yesterday and do need to review
> the disk space we have available (as well as the large size of our cores).
> Also, there was some interesting context for this event.  We have 2 slaves
> on separate servers replicating from the master.  One slave replicated fine
> over the weekend, with only a fraction of the files needing to be updated.
> However, on the other slave, Solr believed it needed to do a full
> replication.  Over and over it filled up the disk, failed, appeared to clean
> up the failed attempt, and tried again.  Yesterday, after a couple of Solr restarts
> and then a full Solr start/stop, it appears that Solr recognized it did not
> need to perform a full replication, and it then completed successfully by only
> copying over the subset of index files needed (like the other slave did).
> 
> I am not sure how to explain it other than that, for a time, Solr was in a
> state that required a full replication and the stop/start forced it to reassess
> what was actually needed in the replication.  Replication is healthy on
> both today.
> 
> Matt
> 
>> On Mon, Oct 10, 2022 at 1:18 PM Dave <hastings.recurs...@gmail.com> wrote:
>> 
>> Only an optimize or a large segment merge would cause large file
>> deposits there. That’s why “slaves” should always have double the index
>> size available: Solr will decide on its own when to merge or optimize on
>> the master, so the slaves need to be ready for double the size, and the
>> master needs to be ready for triple the size.  If you don’t have the disk
>> space ready to handle this, you’re eventually going to run into some
>> serious issues, or just not be able to replicate.
>> 
>> -dave
>> 
>>>> On Oct 10, 2022, at 2:56 PM, mtn search <search...@gmail.com> wrote:
>>> 
>>> As I go back through
>>> https://solr.apache.org/guide/6_6/index-replication.html, the picture is
>>> filling in a little more.  My guess is that the tmp dir referenced is the
>>> index.<timestamp> dir.
>>> 
>>> Very interested in cases that might generate a full replication.  To my
>>> knowledge, no optimize command has been issued against the core in
>> question.
>>> 
>>>> On Mon, Oct 10, 2022 at 12:38 PM mtn search <search...@gmail.com> wrote:
>>>> 
>>>> Hello,  I am learning more about replication as I maintain a large set of
>>>> Solr 6 servers configured for Master/Slave.
>>>> 
>>>> I noticed that during some replication activities, in addition to the
>>>> original index dir under the core name on the file system, there is a dir
>>>> named "index" with a timestamp: index.<timestamp>.  Files are written to
>>>> this timestamped dir during replication.  I am interested in how this works:
>>>> 
>>>> For every core replicating from its master, is this timestamped dir
>>>> created?
>>>> 
>>>> Or is this timestamped dir created/used only in special circumstances?
>>>> If so, what?
>>>> 
>>>>     - Are there cases that cause a full replication within Solr 6?
>>>> 
>>>> Is the original index dir removed and the timestamped dir renamed to
>>>> "index" after replication?
>>>> 
>>>> I initially figured all replication activities happened within the index
>>>> dir, but that does not appear to be the case.
>>>> 
>>>> Any tips, or documentation references would be appreciated.
>>>> 
>>>> Thanks,
>>>> Matt
>>>> 
>> 
