+1 for the qemu guest agent approach.

________________________________
From: Wido den Hollander <w...@widodh.nl>
Sent: Saturday, April 13, 2019 2:32 PM
To: dev@cloudstack.apache.org; Rohit Yadav
Subject: Re: Latest Qemu KVM EV appears to be broken with ACS



On 4/12/19 9:33 PM, Rohit Yadav wrote:
> Thanks, I had already been exploring a solution using the qemu guest agent
> since this morning. It just so happened that you also thought of the approach,
> and I could validate that my script works with qemu ev 2.12 by the end of my day.
>

That would be great actually. The Qemu Guest Agent is a lot better to
use. We might want to explore that indeed. Not for now, but it is a
better option to talk to VMs imho.

Wido

> A proper fix might require some additional changes in cloud-early-config and 
> therefore a new systemvmtemplate for 4.13.0.0/4.11.3.0, I'll start a PR on 
> that in the following week(s).
>
> Regards,
> Rohit Yadav
>
> ________________________________
> From: Marcus <shadow...@gmail.com>
> Sent: Saturday, April 13, 2019 12:31:33 AM
> To: dev@cloudstack.apache.org
> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
>
> Wow, that was fast. Good work.
>
> The script seems to work for me. There was one case where I rebooted the
> router and got the old link local IP somehow. I'm not sure if that was a
> timing issue in seeing the existing /var/cache/cloud/cmdline before the new
> one was written or what, but if it was a timing issue it would seem like we
> should already have that problem with the existing cloud-early-config.
>
> On Fri, Apr 12, 2019 at 12:24 PM Rohit Yadav <rohit.ya...@shapeblue.com>
> wrote:
>
>> Hi Marcus, Simon,
>>
>>
>> I explored two of the short-term solutions and I have a working (work in
>> progress) script that replaces the patchviasocket script and uses the qemu
>> guest agent (which is installed in the 4.11+ systemvmtemplate). This was part
>> of a scoping exercise for solving the patching problem for qemu 2.12+ (Ubuntu
>> 19.04 has a 3.x version).
>>
>>
>> This is what I have so far; however, further testing is needed:
>>
>> https://gist.github.com/rhtyd/ddb42c4c7581c4129ca04fbb829f16cf
>>
>>
>> The logic is written entirely in bash:
>>
>> - Check whether we're able to contact the guest agent
>>
>> - Once we're able to connect, confirm that the I/O is not error prone
>>
>> - Then write the payload as a file (the ssh public key and cmdline string)
>>
>> - Then fix file permissions
>>
>> - Hope that internally cloud-early-config would detect the cmdline we had
>> saved and patching would work
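
The steps listed above can be sketched in Python roughly as follows. This is
not the script from the gist; it is a minimal illustration under assumptions:
`send` is a hypothetical transport callable (for instance a wrapper around
`virsh qemu-agent-command`), the byte-count check stands in for the I/O sanity
check, and the `0600` mode and the use of `guest-exec` with `/bin/chmod` are
guesses, since the thread does not say how the script fixes permissions. The
ssh public key would be written the same way to its own path.

```python
import base64
import json


def patch_guest(send, cmdline, path="/var/cache/cloud/cmdline", mode="0600"):
    """Write `cmdline` to `path` inside the guest via the qemu guest agent.

    `send` is a hypothetical callable: it takes one JSON request string,
    delivers it to the agent and returns the decoded JSON reply (a dict).
    `mode` is an assumed permission value, not taken from the thread.
    """
    def call(execute, arguments=None):
        request = {"execute": execute}
        if arguments is not None:
            request["arguments"] = arguments
        reply = send(json.dumps(request))
        if "error" in reply:
            raise RuntimeError("%s failed: %r" % (execute, reply["error"]))
        return reply["return"]

    # Step 1: check that the agent responds at all.
    call("guest-ping")

    # Steps 2 and 3: write the payload and verify every byte was accepted.
    payload = cmdline.encode("utf-8")
    handle = call("guest-file-open", {"path": path, "mode": "w"})
    try:
        result = call("guest-file-write", {
            "handle": handle,
            "buf-b64": base64.b64encode(payload).decode("ascii"),
        })
    finally:
        # Always release the handle, even if the write failed.
        call("guest-file-close", {"handle": handle})
    if result["count"] != len(payload):
        raise RuntimeError("short write: %d of %d bytes"
                           % (result["count"], len(payload)))

    # Step 4: fix permissions (assumes /bin/chmod exists in the guest).
    call("guest-exec", {"path": "/bin/chmod", "arg": [mode, path]})
```

Because the transport is injected, the sequence can be exercised with a fake
`send` before it is pointed at a real VM.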
>>
>>
>> While this may work, for the long term a proper fix is needed that should
>> be a standard patching mechanism across all hypervisors.
>>
>>
>> Regards,
>>
>> Rohit Yadav
>>
>> Software Architect, ShapeBlue
>>
>> https://www.shapeblue.com
>>
>> ________________________________
>> From: Marcus <shadow...@gmail.com>
>> Sent: Friday, April 12, 2019 11:30:46 PM
>> To: dev@cloudstack.apache.org
>> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
>>
>> Long ago it was a disk. The problem was that these disks had to go
>> somewhere, a place where they could survive migrations, which didn't work
>> well for block based primary storage... at least for the code base at the
>> time. Using virtio socket was seen as a fairly standard way to communicate
>> temporary information to the guest, and didn't require managing the
>> lifecycle of a special disk.
>>
>> I believe the current problem is that the sender needs to remain connected
>> until the receiver has read. Maybe socat does this, but if so we need to
>> ensure that it is available and added as a new RPM dependency. In my
>> testing, waiting on the sender side didn't 100% fix things, or sometimes
>> took a very long time due to the backoff algorithm on the
>> cloud-early-config receiver. Some tweaks to that made it more robust, but
>> it is still a game of trying to coordinate timing of two services on either
>> end. If it works though, I'm all for it.
>>
>> Just to throw another idea out there... If we want to fix this without
>> involving storage, I might suggest switching to the qemu-guest-agent that
>> now exists, with a socket and listening client already in the system vm.
>> This would be far more robust, I think, than our scripting reading unix
>> sockets without any sort of protocol or buffer control considerations, and
>> would likely be more robust to changes in qemu as the guest agent is the
>> primary target for the feature.
>>
>> We can directly write our /var/cache/cloud/cmdline from the host like so
>> (I'm using virsh but we could perhaps communicate with the guest agent
>> socket directly or via socat):
>>
>> virsh qemu-agent-command 19 '{"execute":"guest-file-open",
>> "arguments":{"path":"/tmp/testfile","mode":"w+"}}'
>> {"return":1001}
>>
>> virsh qemu-agent-command 19 '{"execute":"guest-file-write",
>> "arguments":{"handle":1001,"buf-b64":"Zm9vIHdhcyBoZXJlCg=="}}'
>> {"return":{"count":13,"eof":false}}
>>
>> virsh qemu-agent-command 19 '{"execute":"guest-file-close",
>> "arguments":{"handle":1001}}'
>> {"return":{}}
>>
>> root@r-54850-VM:~# cat /tmp/testfile
>> foo was here
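
For anyone scripting this rather than typing it, the three payloads above can
be generated along these lines. A small Python sketch; the function name is
made up for illustration, and the handle has to come from the real
guest-file-open reply (1001 here only mirrors the transcript above).

```python
import base64
import json


def build_file_write_commands(path, data, handle):
    """Return the guest-file-open/write/close requests as JSON strings.

    `handle` is the integer the agent returned from guest-file-open, so in
    practice the write and close requests are built after the open reply.
    """
    open_cmd = {"execute": "guest-file-open",
                "arguments": {"path": path, "mode": "w+"}}
    write_cmd = {"execute": "guest-file-write",
                 "arguments": {
                     "handle": handle,
                     # The agent expects the file content base64-encoded.
                     "buf-b64": base64.b64encode(data).decode("ascii"),
                 }}
    close_cmd = {"execute": "guest-file-close",
                 "arguments": {"handle": handle}}
    return [json.dumps(cmd) for cmd in (open_cmd, write_cmd, close_cmd)]


if __name__ == "__main__":
    for cmd in build_file_write_commands("/tmp/testfile",
                                         b"foo was here\n", 1001):
        print(cmd)
```

Each printed line can be passed as the JSON argument to
`virsh qemu-agent-command`.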
>>
>> We are also able to detect via libvirt that the qemu guest agent is up and
>> ready. You can see it in the XML when you list a VM.
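
As a rough illustration of that check, the sketch below parses the domain XML
(as printed by `virsh dumpxml`) and looks at the `state` attribute that
libvirt reports on the guest agent channel target. The sample XML is trimmed
and its socket path is illustrative, not copied from a real host.

```python
import xml.etree.ElementTree as ET

AGENT_CHANNEL = "org.qemu.guest_agent.0"

# Trimmed, illustrative domain XML; a real dump contains much more.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/example.agent'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
    </channel>
  </devices>
</domain>
"""


def guest_agent_connected(domain_xml):
    """Return True if the guest agent channel is reported as connected."""
    root = ET.fromstring(domain_xml)
    for target in root.findall("./devices/channel/target"):
        if target.get("name") == AGENT_CHANNEL:
            return target.get("state") == "connected"
    # No guest agent channel is defined for this domain.
    return False


if __name__ == "__main__":
    print(guest_agent_connected(SAMPLE_XML))
```

A host-side script could poll this before attempting any file writes.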
>>
>> We do need to keep other hypervisors in mind. This is just an option for a
>> fix that doesn't involve a larger redesign.
>>
>> On Fri, Apr 12, 2019 at 10:21 AM Rohit Yadav <rohit.ya...@shapeblue.com>
>> wrote:
>>
>>> Hi Simon,
>>>
>>>
>>> I'm exploring a solution for the same. I've found that the python-based
>>> patching script fails to wait for the message to be written to the unix
>>> socket before the socket is closed. I reckon this could be related to
>>> serial port device handling changes in qemu-ev 2.12, as the same
>>> mechanism used to work in past versions.
>>>
>>>
>>> I'm exploring/testing a solution where I replace the python-based patching
>>> script with a bash one. Can you test the following in your environment
>>> (ensure socat is installed)? Just back up the patchviasocket.py file and
>>> replace it with this:
>>>
>>> https://gist.github.com/rhtyd/aab23357fef2d8a530c0e83ec8be10c5
>>>
>>>
>>> The short-term solution would be one way to ensure patching works
>>> without much change in the scripts or systemvmtemplate. However, in the
>>> longer term we need to explore and standardize a patching mechanism across
>>> all hypervisors, for example by using a small payload via a config drive iso.
>>>
>>>
>>> Regards,
>>>
>>> Rohit Yadav
>>>
>>> Software Architect, ShapeBlue
>>>
>>> https://www.shapeblue.com
>>>
>>> ________________________________
>>> From: Simon Weller <swel...@ena.com.INVALID>
>>> Sent: Friday, April 12, 2019 8:29:04 PM
>>> To: dev; users
>>> Subject: Latest Qemu KVM EV appears to be broken with ACS
>>>
>>> All,
>>>
>>> After troubleshooting a strange issue with a new lab environment
>>> yesterday, it appears that the patchviasocket functionality we rely on for
>>> key and ip injection into our router/SSVM/CPVM images is broken with
>>> qemu-kvm-ev-2.12.0-18.el7 (January 2019 release). This was tested on
>>> CentOS 7.6.
>>> No data is injected and this was confirmed using socat on /dev/vport0p1.
>>> qemu-kvm-ev-2.10.0-21.el7_5.7.1 works, so hopefully this will save someone
>>> some pain and suffering trying to figure out why the deployment seems
>>> broken.
>>>
>>> We're going to dig in and see if we can figure out the patches responsible
>>> for it breaking.
>>>
>>> -Si
>>>
>>>
>>>
>>> rohit.ya...@shapeblue.com
>>> www.shapeblue.com<http://www.shapeblue.com>
>>> Amadeus House, Floral Street, London  WC2E 9DP UK
>>> @shapeblue
>>>
>>>
>>>
>>>
>>
>
