Valerio Aimale <vale...@aimale.com> writes:

> On 10/21/15 4:54 AM, Markus Armbruster wrote:
>> Valerio Aimale <vale...@aimale.com> writes:
>>
>>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>>> Valerio Aimale <vale...@aimale.com> writes:
>>>>
>>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>>> vale...@aimale.com writes:
>>>>>>
>>>>>>> All-
>>>>>>>
>>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>>> introspect QEMU/KVM VMs.
>>>>>>>
>>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>>
>>>>>>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>>
>>>>>>> The client writes binary commands to the socket, in the form of a C
>>>>>>> structure:
>>>>>>>
>>>>>>> struct request {
>>>>>>>     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>>     uint64_t address;  // address to read from OR write to
>>>>>>>     uint64_t length;   // number of bytes to read OR write
>>>>>>> };
>>>>>>>
>>>>>>> The client receives as a response either (length+1) bytes, if it is a
>>>>>>> read operation, or 1 byte if it is a write operation.
>>>>>>>
>>>>>>> The last byte of a read operation response indicates success (1
>>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>>> indicates the same (1 success, 0 failure).
>>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>>> garbage followed by the "it failed" byte?
>>>>> Markus, that appears to be the case. However, I did not write the
>>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>>> person who wrote the protocol did not want to bother with
>>>>> overcomplicating things.
>>>>>
>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>
>>>>> I'm thinking he assumed reads would be small in size and the price of
>>>>> reading garbage was less than the price of writing a more complicated
>>>>> protocol. I can see his point; confronted with the same problem, I
>>>>> might have done the same.
>>>> All right, the interface is designed for *small* memory blocks then.
>>>>
>>>> Makes me wonder why he needs a separate binary protocol on a separate
>>>> socket. Small blocks could be done just fine in QMP.
>>> The problem is speed. If one's analyzing the memory space of a running
>>> process (physical and paged), libvmi will make a large number of small
>>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>>> quite significant. xp has overhead due to encoding, and pmemsave has
>>> overhead due to file open/write (server), file open/read/close/unlink
>>> (client).
>>>
>>> Others have gone through the problem before me. It appears that
>>> pmemsave and xp are significantly slower than reading memory using a
>>> socket via pmemaccess.
>> That they're slower isn't surprising, but I'd expect the cost of
>> encoding a small block to be insignificant compared to the cost of the
>> network roundtrips.
>>
>> As block size increases, the space overhead of encoding will eventually
>> bite.  But for that usage, the binary protocol appears ill-suited,
>> unless the client can pretty reliably avoid read failure.  I haven't
>> examined its failure modes, yet.
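For concreteness, a minimal client for the protocol quoted above might
look roughly like the sketch below.  It assumes the server reads struct
request verbatim off the socket (so both ends must agree on struct
padding and byte order) and that "/tmp/pmemaccess" stands in for
whatever path was passed to the pmemaccess command; neither detail is
spelled out above, so treat this as illustration only.

    /* Minimal client sketch for the pmemaccess socket protocol described
     * above.  Error handling and short-read loops are abbreviated. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    struct request {
        uint8_t  type;     /* 0 quit, 1 read, 2 write, ... rest reserved */
        uint64_t address;  /* address to read from OR write to */
        uint64_t length;   /* number of bytes to read OR write */
    };

    /* Read 'len' bytes of guest-physical memory at 'addr' into 'buf'.
     * The reply is 'len' data bytes followed by one status byte; on
     * failure the data bytes are garbage and the status byte is 0. */
    static int read_phys(int fd, uint64_t addr, void *buf, uint64_t len)
    {
        struct request req = { .type = 1, .address = addr, .length = len };
        uint8_t status;

        if (write(fd, &req, sizeof(req)) != (ssize_t)sizeof(req)) {
            return -1;
        }
        /* A robust client would loop here: stream sockets may return
         * short reads. */
        if (read(fd, buf, len) != (ssize_t)len ||
            read(fd, &status, 1) != 1) {
            return -1;
        }
        return status == 1 ? 0 : -1;
    }

    int main(void)
    {
        struct sockaddr_un sa = { .sun_family = AF_UNIX };
        uint8_t buf[4096];
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        strncpy(sa.sun_path, "/tmp/pmemaccess", sizeof(sa.sun_path) - 1);
        if (fd < 0 || connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
            perror("pmemaccess socket");
            return 1;
        }
        if (read_phys(fd, 0x1000, buf, sizeof(buf)) == 0) {
            printf("read %zu bytes at 0x1000\n", sizeof(buf));
        }
        close(fd);
        return 0;
    }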
>>
>>> The following data is not mine, but it shows the time, in
>>> milliseconds, required to resolve the content of a paged memory
>>> address via socket (pmemaccess), pmemsave, and xp:
>>>
>>> http://cl.ly/image/322a3s0h1V05
>>>
>>> Again, I did not produce those data points; they come from an old
>>> libvmi thread.
>> 90ms is a very long time.  What exactly was measured?
> That is a fair question to ask. Unfortunately, I extracted that data
> plot from an old thread in some libvmi mailing list. I do not have the
> data and code that produced it. Sifting through the thread, I can see
> the code was never published. I will take it upon myself to produce
> code that compares timing - in a fair fashion - of libvmi doing an
> atomic operation and a larger-scale operation (like listing running
> processes) via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a
> version of xp that returns byte streams of memory regions base64 or
> base85 encoded in JSON strings. I'll publish results and code.
>
> However, given workload and life happening, it will be some time
> before I complete that task.
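On the base64-over-JSON variant: the server-side cost is essentially one
encode pass over the block, roughly 4/3 size expansion.  A hypothetical
sketch of such a command handler, imagined as sitting next to
qmp_pmemsave() in cpus.c so the usual headers are already in scope; no
such command exists today, the name and signature are made up, and
error handling is omitted:

    /* HYPOTHETICAL sketch only.  Reuses cpu_physical_memory_read(), as
     * qmp_pmemsave() does, plus glib's g_base64_encode(), which QEMU
     * already links against. */
    char *qmp_memread_base64(int64_t addr, int64_t size, Error **errp)
    {
        uint8_t *buf = g_malloc(size);
        char *b64;

        cpu_physical_memory_read(addr, buf, size);
        b64 = g_base64_encode(buf, size);
        g_free(buf);
        return b64;    /* would go back to the client as a 'str' */
    }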
No problem.  I'd like to have your use case addressed, but there's no
need for haste.

[...]

>>>>>>> Also, the pmemsave command's QAPI should be changed to be usable
>>>>>>> with 64-bit VMs.
>>>>>>>
>>>>>>> in qapi-schema.json
>>>>>>>
>>>>>>> from
>>>>>>>
>>>>>>> ---
>>>>>>> { 'command': 'pmemsave',
>>>>>>>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>>> ---
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>> ---
>>>>>>> { 'command': 'pmemsave',
>>>>>>>   'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>>> ---
>>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes,
>>>>>> that's confusing.
>>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>>> 8 GB of RAM and want to snapshot the whole physical memory, via HMP
>>>>> over telnet this is what happens:
>>>>>
>>>>> $ telnet localhost 1234
>>>>> Trying 127.0.0.1...
>>>>> Connected to localhost.
>>>>> Escape character is '^]'.
>>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>>> (qemu) help pmemsave
>>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>>> at 'addr' of size 'size'
>>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>>> Try "help pmemsave" for more information
>>>>> (qemu) quit
>>>> Your change to pmemsave's definition in qapi-schema.json is
>>>> effectively a no-op.
>>>>
>>>> Your example shows *HMP* command pmemsave.  The definition of an HMP
>>>> command is *independent* of the QMP command.  The implementation
>>>> *uses* the QMP command.
>>>>
>>>> QMP pmemsave is defined in qapi-schema.json as
>>>>
>>>>     { 'command': 'pmemsave',
>>>>       'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>
>>>> Its implementation is in cpus.c:
>>>>
>>>>     void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>>                       Error **errp)
>>>>
>>>> Note the int64_t size.
>>>>
>>>> HMP pmemsave is defined in hmp-commands.hx as
>>>>
>>>>     {
>>>>         .name       = "pmemsave",
>>>>         .args_type  = "val:l,size:i,filename:s",
>>>>         .params     = "addr size file",
>>>>         .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>>         .mhandler.cmd = hmp_pmemsave,
>>>>     },
>>>>
>>>> Its implementation is in hmp.c:
>>>>
>>>>     void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>>     {
>>>>         uint32_t size = qdict_get_int(qdict, "size");
>>>>         const char *filename = qdict_get_str(qdict, "filename");
>>>>         uint64_t addr = qdict_get_int(qdict, "val");
>>>>         Error *err = NULL;
>>>>
>>>>         qmp_pmemsave(addr, size, filename, &err);
>>>>         hmp_handle_error(mon, &err);
>>>>     }
>>>>
>>>> Note the uint32_t size.
>>>>
>>>> Arguably, the QMP size argument should use 'size' (an alias for
>>>> 'uint64'), and the HMP args_type should use 'size:o'.
>>> Understood all that. Indeed, I've re-implemented 'pmemaccess' the same
>>> way pmemsave is implemented. There is a single function, and two
>>> points of entry, one for HMP and one for QMP. I think pmemaccess
>>> mimics pmemsave closely.
>>>
>>> However, if one wants to simply dump a memory region via HMP, for
>>> human ease of use/debug/testing purposes, one cannot dump memory
>>> regions that reside higher than 2^32-1.
>> Can you give an example?
> Yes. I was trying to dump the full extent of physical memory of a VM
> that has an 8 GB memory space (ballooned). I simply did this:
>
> $ telnet localhost 1234
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> QEMU 2.4.0.1 monitor - type 'help' for more information
> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
> 'pmemsave' has failed: integer is for 32-bit values
>
> Maybe I misunderstood how pmemsave works. Maybe I should have used
> dump-guest-memory.

This is an unnecessary limitation caused by 'size:i' instead of
'size:o'.  Fixable.
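The fix might look roughly as follows (untested sketch).  Note that
changing the schema type also changes the generated prototype of
qmp_pmemsave() from int64_t to uint64_t size, so cpus.c would need the
matching one-line tweak.

In qapi-schema.json ('size' is the alias for 'uint64'):

    { 'command': 'pmemsave',
      'data': {'val': 'int', 'size': 'size', 'filename': 'str'} }

In hmp-commands.hx ('o' is a 64-bit size argument that also accepts
suffixes such as 8G):

    .args_type  = "val:l,size:o,filename:s",

In hmp.c, widen the local to match:

    void hmp_pmemsave(Monitor *mon, const QDict *qdict)
    {
        uint64_t size = qdict_get_int(qdict, "size");
        const char *filename = qdict_get_str(qdict, "filename");
        uint64_t addr = qdict_get_int(qdict, "val");
        Error *err = NULL;

        qmp_pmemsave(addr, size, filename, &err);
        hmp_handle_error(mon, &err);
    }

With that, the HMP example above should accept e.g.

    (qemu) pmemsave 0 8G "/tmp/memsaved"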