>-----Original Message-----
>From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
>Sent: Thursday, September 19, 2013 10:14 PM
>To: 'Mark Trumpold'
>Cc: 'qemu-devel', 'Paul Clements', nbd-gene...@lists.sourceforge.net,
>bonz...@stefanha-thinkpad.redhat.com, w...@uter.be
>Subject: Re: [Qemu-devel] Hibernate and qemu-nbd
>
>On Thu, Sep 19, 2013 at 10:44 PM, Mark Trumpold <ma...@netqa.com> wrote:
>>
>>>-----Original Message-----
>>>From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
>>>Sent: Wednesday, September 18, 2013 06:12 AM
>>>To: 'Mark Trumpold'
>>>Cc: qemu-devel@nongnu.org, 'Paul Clements',
>>>nbd-gene...@lists.sourceforge.net,
>>>bonz...@stefanha-thinkpad.redhat.com, w...@uter.be
>>>Subject: Re: [Qemu-devel] Hibernate and qemu-nbd
>>>
>>>On Tue, Sep 17, 2013 at 07:10:44AM -0700, Mark Trumpold wrote:
>>>> I am using the kernel functionality directly with the commands:
>>>>     echo platform >/sys/power/disk
>>>>     echo disk >/sys/power/state
>>>>
>>>> The following appears in dmesg when I attempt to hibernate:
>>>>
>>>> ====================================================
>>>> [ 38.881397] nbd (pid 1473: qemu-nbd) got signal 0
>>>> [ 38.881401] block nbd0: shutting down socket
>>>> [ 38.881404] block nbd0: Receive control failed (result -4)
>>>> [ 38.881417] block nbd0: queue cleared
>>>> [ 87.463133] block nbd0: Attempted send on closed socket
>>>> [ 87.463137] end_request: I/O error, dev nbd0, sector 66824
>>>> ====================================================
>>>>
>>>> My environment:
>>>>     Debian: 6.0.5
>>>>     Kernel: 3.3.1
>>>>     Qemu userspace: 1.2.0
>>>
>>>This could be a bug in the nbd client kernel module.
>>>drivers/block/nbd.c:sock_xmit() does the following:
>>>
>>>        result = kernel_recvmsg(sock, &msg, &iov, 1, size,
>>>                                msg.msg_flags);
>>>
>>>        if (signal_pending(current)) {
>>>                siginfo_t info;
>>>                printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
>>>                        task_pid_nr(current), current->comm,
>>>                        dequeue_signal_lock(current, &current->blocked, &info));
>>>                result = -EINTR;
>>>                sock_shutdown(nbd, !send);
>>>                break;
>>>        }
>>>
>>>The signal number in the log output looks bogus; we shouldn't get 0.
>>>sock_xmit() actually blocks all signals except SIGKILL before calling
>>>kernel_recvmsg().  I guess this is an artifact of the suspend-to-disk
>>>operation; maybe the signal pending flag is set on the process.
>>>
>>>Perhaps someone with a better understanding of the kernel internals
>>>can check this?
>>>
>>>What happens next is that the nbd kernel module shuts down the NBD
>>>connection.
>>>
>>>As a workaround, please try running a separate nbd-client(1) process
>>>and drop the qemu-nbd -c command-line argument.  This way nbd-client(1)
>>>uses the nbd kernel module instead of the qemu-nbd process, and you'll
>>>get the benefit of nbd-client's automatic reconnect.
>>>
>>>Stefan
>>>
>>
>> Hi Stefan,
>>
>> Thank you for the information.
>>
>> I did some experiments per your suggestion.  I wasn't sure if the
>> following was what you had in mind:
>>
>> 1) Configured 'nbd-server' and started it (/etc/nbd-server/config):
>>        [generic]
>>        [export]
>>            exportname = /root/qemu/q1.img
>>            port = 2000
>
>You can use qemu-nbd instead of nbd-server.  This way you'll be able
>to serve up qcow2 and other image formats.
>
>Just avoid the qemu-nbd -c option.  This makes qemu-nbd purely run the
>NBD network protocol and skips simultaneously running the kernel NBD
>client.  (Since qemu-nbd doesn't reconnect when ioctl(NBD_DO_IT) fails
>with EINTR, the workaround is to use nbd-client(1) to drive the kernel
>NBD client instead.)
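>For example, something along these lines (just a sketch; the image
>path and port are the ones from your config):
>
>    # Serve the image over the NBD protocol only -- note: no -c option
>    qemu-nbd -p 2000 /root/qemu/q1.img &
>
>    # Drive the kernel NBD client separately; nbd-client reconnects
>    # automatically if the kernel module bails out
>    nbd-client localhost 2000 /dev/nbd0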
>
>> 2) Started 'nbd-client':
>>        -> nbd-client localhost 2000 /dev/nbd0
>>
>> 3) Verified '/dev/nbd0' is in use (it appears in the list):
>>        -> cat /proc/partitions
>>
>> At this point I could mount '/dev/nbd0' as expected, but that isn't
>> necessary to demonstrate the problem.
>>
>> Now if I enter S1 (standby), S3 (suspend to RAM), or S4 (suspend to
>> disk), I get the same dmesg output as before indicating 'nbd0' caught
>> signal 0 and exited.
>>
>> When I resume I simply repeat step #3 to verify.
>
>It's expected that you get the same kernel messages.  The difference
>should be that /dev/nbd0 is still accessible after resuming from disk,
>because nbd-client automatically reconnects after the nbd kernel
>module bails out with EINTR.
>
>> ==================
>>
>> Also, before contacting the group I had modified the same kernel
>> source that you identified in drivers/block/nbd.c:sock_xmit() so that
>> it takes no action.  This was strictly for troubleshooting:
>>
>> 199         result = kernel_recvmsg(sock, &msg, &iov, 1, size,
>> 200                                 msg.msg_flags);
>> 201
>> 202         if (signal_pending(current)) {
>> 203                 siginfo_t info;
>> 204                 printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
>> 205                         task_pid_nr(current), current->comm,
>> 206                         dequeue_signal_lock(current, &current->blocked, &info));
>> 207
>> 208                 //result = -EINTR;
>> 209                 //sock_shutdown(nbd, !send);
>> 210                 //break;
>> 211         }
>>
>> We then got errors ("Wrong magic ...") in the following section:
>>
>>     /* NULL returned = something went wrong, inform userspace */
>>     static struct request *nbd_read_stat(struct nbd_device *lo)
>>     {
>>             int result;
>>             struct nbd_reply reply;
>>             struct request *req;
>>
>>             reply.magic = 0;
>>             result = sock_xmit(lo, 0, &reply, sizeof(reply), MSG_WAITALL);
>>             if (result <= 0) {
>>                     dev_err(disk_to_dev(lo->disk),
>>                             "Receive control failed (result %d)\n", result);
>>                     goto harderror;
>>             }
>>
>>             if (ntohl(reply.magic) != NBD_REPLY_MAGIC) {
>>                     dev_err(disk_to_dev(lo->disk), "Wrong magic (0x%lx)\n",
>>                             (unsigned long)ntohl(reply.magic));
>>                     result = -EPROTO;
>>                     goto harderror;
>>
>> So it seemed to me the call at line #199 above must be returning an
>> error after we commented out the signal action logic.
>
>I'm not familiar enough with the code to say what is happening.  As
>the next step I would print out the kernel_recvmsg() return value when
>the signal is pending, and look into what happens during
>suspend-to-disk (there's some sort of process freezing that takes
>place).
>
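>For concreteness, that debug print might look something like this (a
>sketch against the snippet above; 'result' already holds the
>kernel_recvmsg() return value in sock_xmit()):
>
>        if (signal_pending(current)) {
>                siginfo_t info;
>                /* log the recvmsg result alongside the dequeued signal */
>                printk(KERN_WARNING "nbd (pid %d: %s) got signal %d (xmit result %d)\n",
>                        task_pid_nr(current), current->comm,
>                        dequeue_signal_lock(current, &current->blocked, &info),
>                        result);
>                result = -EINTR;
>                sock_shutdown(nbd, !send);
>                break;
>        }
>
>Sorry I can't be of more help.  Hopefully someone more familiar with
>the nbd kernel module will have time to chime in.
>
>Stefan
>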
Stefan,

So, I tried the following:

    -> qemu-nbd -p 2000 /root/qemu/q1.img &
    -> nbd-client localhost 2000 /dev/nbd0 &

At this point I can mount /dev/nbd0, etc.

    -> echo platform >/sys/power/disk
    -> echo disk >/sys/power/state

At this point we are 'hibernated'.

On power cycle, the OS seems to come back to the state it was in
before hibernation, with the exception of QEMU:

    nbd.c:nbd_receive_request():L517: read failed    <-- on command line

    [78979.269039] Freezing user space processes ...
    [78979.269122] nbd (pid 2455: nbd-client) got signal 0
    [78979.269127] block nbd0: shutting down socket
    [78979.269151] block nbd0: Receive control failed (result -4)
    [78979.269165] block nbd0: queue cleared

=============================
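After resume I verify the device the same way as before (step #3 from
my earlier mail; the mount point here is just an example):

    -> cat /proc/partitions      # nbd0 should still be listed
    -> mount /dev/nbd0 /mnt      # fails if the connection is gone

Is this the test you had in mind?

Thanks for your input!

Regards,
Mark T.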