* Daniel Cho (daniel...@qnap.com) wrote:
> Hi Zhang,
>
> We use the qemu-4.1.0 release in this case.
>
> > I think we need to use block mirror to sync the disk to the secondary
> > node first, then stop the primary VM and build the COLO system.
> >
> > At the stop moment, you need to add some netfilter and chardev socket
> > nodes for COLO; maybe you need to re-check this part.
>
> Our test already follows those steps. Let me describe the test flow and
> the issue in detail.
>
> Step 1:
> Create the primary VM without any netfilter or chardev for COLO, and
> ping the primary VM continually from another host.
>
> Step 2:
> Create the secondary VM (with the same devices/drives as the primary VM)
> and do the block mirror sync. (Ping to the primary VM works normally.)
>
> Step 3:
> After the block mirror sync finishes, add the netfilter and chardev to
> the primary VM and secondary VM for COLO. (*Can't* ping the primary VM,
> but those packets will be received later.)
>
> Step 4:
> Start migrating the primary VM to the secondary VM; the primary VM and
> secondary VM are both running. (Ping to the primary VM works, and the
> packets from the step 3 state are received.)
>
> Going from Step 3 to Step 4 takes 10~20 seconds in our environment.
>
> I can imagine this issue (delayed reply packets) is caused by the COLO
> proxy being set up in a temporary state, but we think 10~20 seconds
> might be a little long. (If the primary VM is already doing some jobs,
> it might lose data.)
>
> Could we reduce that time? Or does the delay depend on the VM?
I think you need to set up the netfilter and chardev on the primary at
the start; the filter contains the state of the TCP connections working
with the VM, so adding it later can't gain that state for existing
connections.

Dave

> Best regards,
> Daniel Cho.
>
> Zhang, Chen <chen.zh...@intel.com> wrote on Saturday, November 30, 2019
> at 2:04 AM:
>
> > *From:* Daniel Cho <daniel...@qnap.com>
> > *Sent:* Friday, November 29, 2019 10:43 AM
> > *To:* Zhang, Chen <chen.zh...@intel.com>
> > *Cc:* Dr. David Alan Gilbert <dgilb...@redhat.com>; lukasstra...@web.de;
> > qemu-devel@nongnu.org
> > *Subject:* Re: Network connection with COLO VM
> >
> > Hi David, Zhang,
> >
> > Thanks for replying to my question.
> > We now know why this issue occurs.
> > As you said, the COLO VM's network needs
> > colo-proxy to control packets, so the guest's
> > interface should have the filter set to solve the problem.
> >
> > But we found another problem: when we set the
> > fault-tolerance feature on the guest (primary VM is running,
> > secondary VM is paused), the guest's network does not
> > respond to any request for a while (in our environment
> > about 20~30 secs) after the secondary VM runs.
> >
> > Is this normal, or a known issue?
> >
> > Our test creates the primary VM and runs it for a while, then creates
> > the secondary VM to enable the COLO feature.
> >
> > Hi Daniel,
> >
> > Happy to hear you have solved the ssh disconnection issue.
> >
> > Do you use Lukas's patch in this case?
> >
> > I think we need to use block mirror to sync the disk to the secondary
> > node first, then stop the primary VM and build the COLO system.
> >
> > At the stop moment, you need to add some netfilter and chardev socket
> > nodes for COLO; maybe you need to re-check this part.
> >
> > Best regards,
> > Daniel Cho
> >
> > Zhang, Chen <chen.zh...@intel.com> wrote on Thursday, November 28, 2019
> > at 9:26 AM:
> >
> > > -----Original Message-----
> > > From: Dr. David Alan Gilbert <dgilb...@redhat.com>
> > > Sent: Wednesday, November 27, 2019 6:51 PM
> > > To: Daniel Cho <daniel...@qnap.com>; Zhang, Chen
> > > <chen.zh...@intel.com>; lukasstra...@web.de
> > > Cc: qemu-devel@nongnu.org
> > > Subject: Re: Network connection with COLO VM
> > >
> > > * Daniel Cho (daniel...@qnap.com) wrote:
> > > > Hello everyone,
> > > >
> > > > Could we ssh to a COLO VM (meaning PVM & SVM are both started)?
> > > >
> > >
> > > Let's cc in Zhang Chen and Lukas Straub.
> >
> > Thanks Dave.
> >
> > > > SSH connects to the COLO VM for a while, but then it disconnects
> > > > with the error
> > > > *client_loop: send disconnect: Broken pipe*
> > > >
> > > > It seems the COLO VM could not keep the network session.
> > > >
> > > > Is it a known issue?
> > >
> > > That sounds like the COLO proxy is getting upset; it's supposed to
> > > compare packets sent by the primary and secondary and only send one
> > > to the outside - you shouldn't be talking directly to the guest, but
> > > always via the proxy. See docs/colo-proxy.txt
> > >
> >
> > Hi Daniel,
> >
> > I have tried ssh to a COLO guest for 8 hours, and this issue did not
> > occur. Please check your network/qemu configuration.
> > But I found another problem that may be related to this issue: if
> > there is no network communication for a period of time (maybe 10 min),
> > the first message sent to the guest may be delayed (maybe 1-5 sec).
> > I will try to fix it when I have time.
> >
> > Thanks
> > Zhang Chen
> >
> > > Dave
> > >
> > > > Best regards,
> > > > Daniel Cho
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> >
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
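
Dave's reply in this thread says the filter holds the state of the TCP
connections, so a filter added later cannot recover that state for
connections that already exist. A toy model may make that concrete. This
is not QEMU code: `Packet`, `ToyConnTracker` and `replay` are hypothetical
names, and the sketch only assumes that a stateful filter learns a
connection by seeing its SYN.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    """Minimal stand-in for a TCP segment (hypothetical, for illustration)."""
    src: str
    dst: str
    syn: bool = False


@dataclass
class ToyConnTracker:
    """Toy stateful filter: it learns a flow only by observing its SYN."""
    flows: set = field(default_factory=set)

    def observe(self, pkt: Packet) -> bool:
        """Return True if the packet belongs to a flow this tracker knows."""
        key = frozenset((pkt.src, pkt.dst))  # direction-independent flow key
        if pkt.syn:
            self.flows.add(key)
        return key in self.flows


def replay(packets, attach_at):
    """Feed packets[attach_at:] to a fresh tracker; earlier ones are unseen."""
    tracker = ToyConnTracker()
    return [tracker.observe(p) for p in packets[attach_at:]]


# One connection: the handshake packet first, then two data packets.
stream = [
    Packet("client:5000", "guest:22", syn=True),
    Packet("client:5000", "guest:22"),
    Packet("guest:22", "client:5000"),
]

early = replay(stream, attach_at=0)  # filter present from the start
late = replay(stream, attach_at=1)   # filter added after the handshake
```

In this model every entry of `early` is True while every entry of `late`
is False, which mirrors why the filters need to be in place on the
primary before the connections are established.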