Thanks. I had been exploring a solution using the qemu guest agent since this morning. It just so happened that you thought of the same approach, and by the end of my day I was able to validate that my script works with qemu-ev 2.12.
A proper fix might require some additional changes in cloud-early-config and therefore a new systemvmtemplate for 4.13.0.0/4.11.3.0. I'll start a PR on that in the following week(s).

Regards,
Rohit Yadav

________________________________
From: Marcus <shadow...@gmail.com>
Sent: Saturday, April 13, 2019 12:31:33 AM
To: dev@cloudstack.apache.org
Subject: Re: Latest Qemu KVM EV appears to be broken with ACS

Wow, that was fast. Good work. The script seems to work for me. There was one case where I rebooted the router and got the old link local IP somehow. I'm not sure if that was a timing issue in seeing the existing /var/cache/cloud/cmdline before the new one was written or what, but if it was a timing issue it would seem like we should already have that problem with the existing cloud-early-config.

On Fri, Apr 12, 2019 at 12:24 PM Rohit Yadav <rohit.ya...@shapeblue.com> wrote:

> Hi Marcus, Simon,
>
> I explored two of the short term solutions and I have a working (work in
> progress) script that replaces the patchviasocket script and uses the qemu
> guest agent (which is installed in the 4.11+ systemvmtemplate). This was
> part of a scoping exercise for solving the patching problem for qemu 2.12+
> (Ubuntu 19.04 has a 3.x version).
>
> This is what I have so far; however, further testing is needed:
>
> https://gist.github.com/rhtyd/ddb42c4c7581c4129ca04fbb829f16cf
>
> The logic is written entirely in bash:
>
> - Check whether we're able to contact the guest agent
> - Once we're able to connect, confirm that the I/O is not error prone
> - Then write the payload as files (the ssh public key and cmdline string)
> - Then fix file permissions
> - Hope that internally cloud-early-config would detect the cmdline we had
>   saved and patching would work
>
> While this may work, for the long term a proper fix is needed that should
> be a standard patching mechanism across all hypervisors.
>
> Regards,
>
> Rohit Yadav
>
> Software Architect, ShapeBlue
>
> https://www.shapeblue.com
>
> ________________________________
> From: Marcus <shadow...@gmail.com>
> Sent: Friday, April 12, 2019 11:30:46 PM
> To: dev@cloudstack.apache.org
> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
>
> Long ago it was a disk. The problem was that these disks had to go
> somewhere, a place where they could survive migrations, which didn't work
> well for block based primary storage... at least for the code base at the
> time. Using virtio socket was seen as a fairly standard way to communicate
> temporary information to the guest, and didn't require managing the
> lifecycle of a special disk.
>
> I believe the current problem is that the sender needs to remain connected
> until the receiver has read. Maybe socat does this, but if so we need to
> ensure that it is available and applied as a new RPM dependency. In my
> testing, waiting on the sender side didn't 100% fix things, or sometimes
> took a very long time due to the backoff algorithm on the
> cloud-early-config receiver. Some tweaks to that made it more robust, but
> it is still a game of trying to coordinate timing of two services on either
> end. If it works though, I'm all for it.
>
> Just to throw another idea out there... If we want to fix this without
> involving storage, I might suggest switching to the qemu-guest-agent that
> now exists, with a socket and listening client already in the system vm.
> This would be far more robust, I think, than our scripting reading unix
> sockets without any sort of protocol or buffer control considerations, and
> would likely be more robust to changes in qemu as the guest agent is the
> primary target for the feature.
>
> We can directly write our /var/cache/cloud/cmdline from the host like so
> (I'm using virsh but we could perhaps communicate with the guest agent
> socket directly or via socat):
>
> virsh qemu-agent-command 19 '{"execute":"guest-file-open", "arguments":{"path":"/tmp/testfile","mode":"w+"}}'
> {"return":1001}
>
> virsh qemu-agent-command 19 '{"execute":"guest-file-write", "arguments":{"handle":1001,"buf-b64":"Zm9vIHdhcyBoZXJlCg=="}}'
> {"return":{"count":13,"eof":false}}
>
> virsh qemu-agent-command 19 '{"execute":"guest-file-close", "arguments":{"handle":1001}}'
> {"return":{}}
>
> root@r-54850-VM:~# cat /tmp/testfile
> foo was here
>
> We are also able to detect via libvirt that the qemu guest agent is up and
> ready. You can see it in the XML when you list a VM.
>
> We do need to keep other hypervisors in mind. This is just an option for a
> fix that doesn't involve a larger redesign.
>
> On Fri, Apr 12, 2019 at 10:21 AM Rohit Yadav <rohit.ya...@shapeblue.com>
> wrote:
>
> > Hi Simon,
> >
> > I'm exploring a solution for the same. I've found that the python based
> > patching script fails to wait for the message to be written to the unix
> > socket before the socket is closed. I reckon this could be related to
> > serial port device handling changes in qemu-ev 2.12, as the same
> > mechanism used to work in past versions.
> >
> > I'm exploring/testing a solution where I replace the python based
> > patching script with a bash one. Can you test the following in your
> > environment (ensure socat is installed)? Just back up and replace the
> > patchviasocket.py file with this:
> >
> > https://gist.github.com/rhtyd/aab23357fef2d8a530c0e83ec8be10c5
> >
> > The short term solution would be one of the ways to ensure patching works
> > without much change in the scripts or systemvmtemplate.
> > However, longer term we need to explore and standardize the patching
> > mechanism across all hypervisors, for example by using a small payload
> > via a config drive iso.
> >
> > Regards,
> >
> > Rohit Yadav
> >
> > Software Architect, ShapeBlue
> >
> > https://www.shapeblue.com
> >
> > ________________________________
> > From: Simon Weller <swel...@ena.com.INVALID>
> > Sent: Friday, April 12, 2019 8:29:04 PM
> > To: dev; users
> > Subject: Latest Qemu KVM EV appears to be broken with ACS
> >
> > All,
> >
> > After troubleshooting a strange issue with a new lab environment
> > yesterday, it appears that the patchviasocket functionality we rely on
> > for key and ip injection into our router/SSVM/CPVM images is broken with
> > qemu-kvm-ev-2.12.0-18.el7 (January 2019 release). This was tested on
> > CentOS 7.6.
> > No data is injected and this was confirmed using socat on /dev/vport0p1.
> > qemu-kvm-ev-2.10.0-21.el7_5.7.1 works, so hopefully this will save
> > someone some pain and suffering trying to figure out why the deployment
> > seems broken.
> >
> > We're going to dig in and see if we can figure out the patches
> > responsible for it breaking.
> >
> > -Si
> >
> > rohit.ya...@shapeblue.com
> > www.shapeblue.com
> > Amadeus House, Floral Street, London WC2E 9DP UK
> > @shapeblue
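
For reference, the guest agent file write that Marcus demonstrates with virsh (guest-file-open, guest-file-write, guest-file-close) could be scripted roughly as below. This is only a minimal sketch, not the code in either gist: the helper names (open_cmd, write_cmd, close_cmd, run_virsh, write_guest_file) are hypothetical, it assumes `virsh qemu-agent-command` is available on the host and prints the agent's JSON reply on stdout, and it does not handle retries, partial writes, or file permissions.

```python
import base64
import json
import subprocess


def open_cmd(path, mode="w+"):
    # guest-file-open returns an integer handle in the "return" field.
    return {"execute": "guest-file-open",
            "arguments": {"path": path, "mode": mode}}


def write_cmd(handle, data):
    # The guest agent expects the payload base64-encoded in "buf-b64".
    encoded = base64.b64encode(data).decode("ascii")
    return {"execute": "guest-file-write",
            "arguments": {"handle": handle, "buf-b64": encoded}}


def close_cmd(handle):
    return {"execute": "guest-file-close", "arguments": {"handle": handle}}


def run_virsh(domain, command):
    # Assumption: virsh is on PATH and the reply on stdout is a JSON object.
    result = subprocess.run(
        ["virsh", "qemu-agent-command", str(domain), json.dumps(command)],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def write_guest_file(domain, path, data, run=run_virsh):
    """Open, write, and close a file in the guest; return bytes written."""
    handle = run(domain, open_cmd(path))["return"]
    try:
        reply = run(domain, write_cmd(handle, data))["return"]
    finally:
        # Always release the handle, even if the write failed.
        run(domain, close_cmd(handle))
    return reply["count"]
```

Calling write_guest_file(19, "/tmp/testfile", b"foo was here\n") would issue the same three commands as the virsh example in the thread, using the domain ID 19 from that example. The `run` parameter is there so the transport can be swapped, for example for a direct connection to the guest agent socket or for a stub in tests.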