+1 for the qemu guest agent approach.
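
For anyone following along, here is a rough Python sketch of the guest agent sequence Marcus shows further down the thread (guest-file-open, guest-file-write, guest-file-close), with a guest-ping first as in Rohit's description. It is not Rohit's bash script and has not been tested against a real system VM. The function names and the injectable `send` parameter are my own, and it assumes `virsh qemu-agent-command` is available on the host and that the payload fits in a single write.

```python
import base64
import json
import subprocess


def virsh_agent_command(domain, command):
    """Send one guest agent command through virsh and return the parsed reply."""
    out = subprocess.check_output(
        ["virsh", "qemu-agent-command", str(domain), json.dumps(command)]
    )
    return json.loads(out)


def write_guest_file(domain, path, payload, send=virsh_agent_command):
    """Write payload (bytes) to path inside the guest; return the byte count reported."""
    # Fail early if the agent is not answering.
    send(domain, {"execute": "guest-ping"})

    # Same mode as the example in the thread.
    opened = send(domain, {"execute": "guest-file-open",
                           "arguments": {"path": path, "mode": "w+"}})
    handle = opened["return"]
    try:
        encoded = base64.b64encode(payload).decode("ascii")
        written = send(domain, {"execute": "guest-file-write",
                                "arguments": {"handle": handle,
                                              "buf-b64": encoded}})
        return written["return"]["count"]
    finally:
        # Always release the handle, even if the write failed.
        send(domain, {"execute": "guest-file-close",
                      "arguments": {"handle": handle}})
```

Something like `write_guest_file(19, "/var/cache/cloud/cmdline", cmdline_bytes)` would then replace the socket write. Fixing file permissions and retrying until the agent is up are left out here.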
________________________________
From: Wido den Hollander <w...@widodh.nl>
Sent: Saturday, April 13, 2019 2:32 PM
To: dev@cloudstack.apache.org; Rohit Yadav
Subject: Re: Latest Qemu KVM EV appears to be broken with ACS

On 4/12/19 9:33 PM, Rohit Yadav wrote:
> Thanks, I was already exploring a solution using qemu guest agent since
> morning today. It just so happened that you also thought of the approach, and
> I could validate my script to work with qemu ev 2.12 by the end of my day.
>

That would be great actually. The Qemu Guest Agent is a lot better to use.

We might want to explore that indeed. Not for now, but it is a better
option to talk to VMs imho.

Wido

> A proper fix might require some additional changes in cloud-early-config and
> therefore a new systemvmtemplate for 4.13.0.0/4.11.3.0, I'll start a PR on
> that in the following week(s).
>
> Regards.
>
> Regards,
> Rohit Yadav
>
> ________________________________
> From: Marcus <shadow...@gmail.com>
> Sent: Saturday, April 13, 2019 12:31:33 AM
> To: dev@cloudstack.apache.org
> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
>
> Wow, that was fast. Good work.
>
> The script seems to work for me. There was one case where I rebooted the
> router and got the old link local IP somehow. I'm not sure if that was a
> timing issue in seeing the existing /var/cache/cloud/cmdline before the new
> one was written or what, but if it was a timing issue it would seem like we
> should already have that problem with the existing cloud-early-config.
>
> On Fri, Apr 12, 2019 at 12:24 PM Rohit Yadav <rohit.ya...@shapeblue.com>
> wrote:
>
>> Hi Marcus, Simon,
>>
>>
>> I explored two of the short term solutions and I've a working (work in
>> progress) script that replaces the patchviasocket script to use the qemu
>> guest agent (that is installed in 4.11+ systemvmtemplate). This was part of
>> a scoping exercise for solving the patching problem for qemu 2.12+ (Ubuntu
>> 19.04 has 3.x version).
>>
>>
>> This is what I've so far, however, further testing is needed:
>>
>> https://gist.github.com/rhtyd/ddb42c4c7581c4129ca04fbb829f16cf
>>
>>
>> The logic is completely written in bash as:
>>
>> - Try if we're able to contact the guest agent
>> - Once we're able to connect, confirm that the I/O is not error prone
>> - Then write the payload as file (the ssh public key and cmdline string)
>> - Then fix file permissions
>> - Hope that internally cloud-early-config would detect the cmdline we had
>>   saved and patching would work
>>
>>
>> While this may work, for the long term a proper fix is needed that should
>> be a standard patching mechanism across all hypervisors.
>>
>>
>> Regards,
>>
>> Rohit Yadav
>>
>> Software Architect, ShapeBlue
>>
>> https://www.shapeblue.com
>>
>> ________________________________
>> From: Marcus <shadow...@gmail.com>
>> Sent: Friday, April 12, 2019 11:30:46 PM
>> To: dev@cloudstack.apache.org
>> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
>>
>> Long ago it was a disk. The problem was that these disks had to go
>> somewhere, a place where they could survive migrations, which didn't work
>> well for block based primary storage... at least for the code base at the
>> time. Using virtio socket was seen as a fairly standard way to communicate
>> temporary information to the guest, and didn't require managing the
>> lifecycle of a special disk.
>>
>> I believe the current problem is that the sender needs to remain connected
>> until the receiver has read. Maybe socat does this, but if so we need to
>> ensure that it is available and applied as a new RPM dependency. In my
>> testing, waiting on the sender side didn't 100% fix things, or sometimes
>> took a very long time due to the backoff algorithm on the
>> cloud-early-config receiver. Some tweaks to that made it more robust, but
>> it is still a game of trying to coordinate timing of two services on either
>> end. If it works though, I'm all for it.
>>
>> Just to throw another idea out there... If we want to fix this without
>> involving storage, I might suggest switching to the qemu-guest-agent that
>> now exists, with a socket and listening client already in the system vm.
>> This would be far more robust, I think, than our scripting reading unix
>> sockets without any sort of protocol or buffer control considerations, and
>> would likely be more robust to changes in qemu as the guest agent is the
>> primary target for the feature.
>>
>> We can directly write our /var/cache/cloud/cmdline from the host like so
>> (I'm using virsh but we could perhaps communicate with the guest agent
>> socket directly or via socat):
>>
>> virsh qemu-agent-command 19 '{"execute":"guest-file-open", "arguments":{"path":"/tmp/testfile","mode":"w+"}}'
>> {"return":1001}
>>
>> virsh qemu-agent-command 19 '{"execute":"guest-file-write", "arguments":{"handle":1001,"buf-b64":"Zm9vIHdhcyBoZXJlCg=="}}'
>> {"return":{"count":13,"eof":false}}
>>
>> virsh qemu-agent-command 19 '{"execute":"guest-file-close", "arguments":{"handle":1001}}'
>> {"return":{}}
>>
>> root@r-54850-VM:~# cat /tmp/testfile
>> foo was here
>>
>> We are also able to detect via libvirt that the qemu guest agent is up and
>> ready. You can see it in the XML when you list a VM.
>>
>> We do need to keep other hypervisors in mind. This is just an option for a
>> fix that doesn't involve a larger redesign.
>>
>> On Fri, Apr 12, 2019 at 10:21 AM Rohit Yadav <rohit.ya...@shapeblue.com>
>> wrote:
>>
>>> Hi Simon,
>>>
>>>
>>> I'm exploring a solution for the same, I've found that the python based
>>> patching script fails to wait for the message to be written on the unix
>>> socket before the socket is closed. I reckon this could be related to
>>> serial port device handling related changes in qemu-ev 2.12, as the same
>>> mechanism used to work in past versions.
>>>
>>>
>>> I'm exploring/testing a solution where I replace the python based
>>> patching script with a bash one. Can you test the following in your
>>> environment (ensure socat is installed), just back up and replace the
>>> patchviasocket.py file with this:
>>>
>>> https://gist.github.com/rhtyd/aab23357fef2d8a530c0e83ec8be10c5
>>>
>>>
>>> The short term solution would be one of the ways to ensure patching works
>>> without much change in the scripts or systemvmtemplate. However, longer
>>> term we need to explore and standardize patching mechanism across all
>>> hypervisors, for example by using a small payload via a config drive iso.
>>>
>>>
>>> Regards,
>>>
>>> Rohit Yadav
>>>
>>> Software Architect, ShapeBlue
>>>
>>> https://www.shapeblue.com
>>>
>>> ________________________________
>>> From: Simon Weller <swel...@ena.com.INVALID>
>>> Sent: Friday, April 12, 2019 8:29:04 PM
>>> To: dev; users
>>> Subject: Latest Qemu KVM EV appears to be broken with ACS
>>>
>>> All,
>>>
>>> After troubleshooting a strange issue with a new lab environment
>>> yesterday, it appears that the patchviasocket functionality we rely on for
>>> key and ip injection into our router/SSVM/CPVM images is broken with
>>> qemu-kvm-ev-2.12.0-18.el7 (January 2019 release). This was tested on
>>> CentOS 7.6.
>>> No data is injected and this was confirmed using socat on /dev/vport0p1.
>>> qemu-kvm-ev-2.10.0-21.el7_5.7.1 works, so hopefully this will save someone
>>> some pain and suffering trying to figure out why the deployment seems
>>> broken.
>>>
>>> We're going to dig in and see if we can figure out the patches responsible
>>> for it breaking.
>>>
>>> -Si
>>>
>>>
>>> rohit.ya...@shapeblue.com
>>> www.shapeblue.com<http://www.shapeblue.com>
>>> Amadeus House, Floral Street, London WC2E 9DP, UK
>>> @shapeblue