>-----Original Message-----
>From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
>Sent: Thursday, September 19, 2013 10:14 PM
>To: 'Mark Trumpold'
>Cc: 'qemu-devel', 'Paul Clements', nbd-gene...@lists.sourceforge.net,
>bonz...@stefanha-thinkpad.redhat.com, w...@uter.be
>Subject: Re: [Qemu-devel] Hibernate and qemu-nbd
>
>On Thu, Sep 19, 2013 at 10:44 PM, Mark Trumpold <ma...@netqa.com> wrote:
>>
>>>-----Original Message-----
>>>From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
>>>Sent: Wednesday, September 18, 2013 06:12 AM
>>>To: 'Mark Trumpold'
>>>Cc: qemu-devel@nongnu.org, 'Paul Clements',
>>>nbd-gene...@lists.sourceforge.net,
>>>bonz...@stefanha-thinkpad.redhat.com, w...@uter.be
>>>Subject: Re: [Qemu-devel] Hibernate and qemu-nbd
>>>
>>>On Tue, Sep 17, 2013 at 07:10:44AM -0700, Mark Trumpold wrote:
>>>> I am using the kernel functionality directly with the commands:
>>>>     echo platform >/sys/power/disk
>>>>     echo disk >/sys/power/state
>>>>
>>>> The following appears in dmesg when I attempt to hibernate:
>>>>
>>>> ====================================================
>>>> [ 38.881397] nbd (pid 1473: qemu-nbd) got signal 0
>>>> [ 38.881401] block nbd0: shutting down socket
>>>> [ 38.881404] block nbd0: Receive control failed (result -4)
>>>> [ 38.881417] block nbd0: queue cleared
>>>> [ 87.463133] block nbd0: Attempted send on closed socket
>>>> [ 87.463137] end_request: I/O error, dev nbd0, sector 66824
>>>> ====================================================
>>>>
>>>> My environment:
>>>>     Debian: 6.0.5
>>>>     Kernel: 3.3.1
>>>>     Qemu userspace: 1.2.0
>>>
>>>This could be a bug in the nbd client kernel module.
>>>drivers/block/nbd.c:sock_xmit() does the following:
>>>
>>>        result = kernel_recvmsg(sock, &msg, &iov, 1, size,
>>>                                msg.msg_flags);
>>>
>>>        if (signal_pending(current)) {
>>>                siginfo_t info;
>>>                printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
>>>                        task_pid_nr(current), current->comm,
>>>                        dequeue_signal_lock(current, &current->blocked, &info));
>>>                result = -EINTR;
>>>                sock_shutdown(nbd, !send);
>>>                break;
>>>        }
>>>
>>>The signal number in the log output looks bogus; we shouldn't get 0.
>>>sock_xmit() actually blocks all signals except SIGKILL before calling
>>>kernel_recvmsg().  I guess this is an artifact of the suspend-to-disk
>>>operation; maybe the signal pending flag is set on the process.
>>>
>>>Perhaps someone with a better understanding of the kernel internals
>>>can check this?
>>>
>>>What happens next is that the nbd kernel module shuts down the NBD
>>>connection.
>>>
>>>As a workaround, please try running a separate nbd-client(1) process
>>>and drop the qemu-nbd -c command-line argument.  This way nbd-client(1)
>>>uses the nbd kernel module instead of the qemu-nbd process, and you'll
>>>get the benefit of nbd-client's automatic reconnect.
>>>
>>>Stefan
>>>
>>
>> Hi Stefan,
>>
>> Thank you for the information.
>>
>> I did some experiments per your suggestion.  I wasn't sure if the
>> following was what you had in mind:
>>
>> 1) Configured 'nbd-server' and started it (/etc/nbd-server/config):
>>        [generic]
>>        [export]
>>            exportname = /root/qemu/q1.img
>>            port = 2000
>
>You can use qemu-nbd instead of nbd-server.  This way you'll be able
>to serve up qcow2 and other image formats.
>
>Just avoid the qemu-nbd -c option.  This makes qemu-nbd purely run the
>NBD network protocol and skips simultaneously running the kernel NBD
>client.  (Since qemu-nbd doesn't reconnect when ioctl(NBD_DO_IT) fails
>with EINTR, the workaround is to use nbd-client(1) to drive the kernel
>NBD client instead.)
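>For example, something along these lines (just a sketch; the image
>path and port are the ones from your config):
>
>    # Serve the image over the NBD protocol only -- note: no -c option
>    qemu-nbd -p 2000 /root/qemu/q1.img &
>
>    # Drive the kernel NBD client separately; nbd-client reconnects
>    # automatically if the kernel module bails out
>    nbd-client localhost 2000 /dev/nbd0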
>
>> 2) Started 'nbd-client':
>>        -> nbd-client localhost 2000 /dev/nbd0
>>
>> 3) Verified '/dev/nbd0' is in use (it appears in the list):
>>        -> cat /proc/partitions
>>
>> At this point I could mount '/dev/nbd0' as expected, but that isn't
>> necessary to demonstrate the problem.
>>
>> Now if I enter S1 (standby), S3 (suspend to RAM), or S4 (suspend to
>> disk), I get the same dmesg output as before indicating 'nbd0' caught
>> signal 0 and exited.
>>
>> When I resume I simply repeat step #3 to verify.
>
>It's expected that you get the same kernel messages.  The difference
>should be that /dev/nbd0 is still accessible after resuming from disk,
>because nbd-client automatically reconnects after the nbd kernel
>module bails out with EINTR.
>
>> ==================
>>
>> Also, before contacting the group I had modified the same kernel
>> source that you identified in drivers/block/nbd.c:sock_xmit() so that
>> it takes no action.  This was strictly for troubleshooting:
>>
>> 199         result = kernel_recvmsg(sock, &msg, &iov, 1, size,
>> 200                                 msg.msg_flags);
>> 201
>> 202         if (signal_pending(current)) {
>> 203                 siginfo_t info;
>> 204                 printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
>> 205                         task_pid_nr(current), current->comm,
>> 206                         dequeue_signal_lock(current, &current->blocked, &info));
>> 207
>> 208                 //result = -EINTR;
>> 209                 //sock_shutdown(nbd, !send);
>> 210                 //break;
>> 211         }
>>
>> We then got errors ("Wrong magic ...") in the following section:
>>
>>     /* NULL returned = something went wrong, inform userspace */
>>     static struct request *nbd_read_stat(struct nbd_device *lo)
>>     {
>>             int result;
>>             struct nbd_reply reply;
>>             struct request *req;
>>
>>             reply.magic = 0;
>>             result = sock_xmit(lo, 0, &reply, sizeof(reply), MSG_WAITALL);
>>             if (result <= 0) {
>>                     dev_err(disk_to_dev(lo->disk),
>>                             "Receive control failed (result %d)\n", result);
>>                     goto harderror;
>>             }
>>
>>             if (ntohl(reply.magic) != NBD_REPLY_MAGIC) {
>>                     dev_err(disk_to_dev(lo->disk), "Wrong magic (0x%lx)\n",
>>                             (unsigned long)ntohl(reply.magic));
>>                     result = -EPROTO;
>>                     goto harderror;
>>
>> So it seemed to me the call at line #199 above must be returning an
>> error after we commented out the signal action logic.
>
>I'm not familiar enough with the code to say what is happening.  As
>the next step I would print out the kernel_recvmsg() return value when
>the signal is pending, and look into what happens during
>suspend-to-disk (there's some sort of process freezing that takes
>place).
>
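>For concreteness, that debug print might look something like this (a
>sketch against the snippet above; 'result' already holds the
>kernel_recvmsg() return value in sock_xmit()):
>
>        if (signal_pending(current)) {
>                siginfo_t info;
>                /* log the recvmsg result alongside the dequeued signal */
>                printk(KERN_WARNING "nbd (pid %d: %s) got signal %d (xmit result %d)\n",
>                        task_pid_nr(current), current->comm,
>                        dequeue_signal_lock(current, &current->blocked, &info),
>                        result);
>                result = -EINTR;
>                sock_shutdown(nbd, !send);
>                break;
>        }
>
>Sorry I can't be of more help.  Hopefully someone more familiar with
>the nbd kernel module will have time to chime in.
>
>Stefan
>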
Stefan,

So, I tried the following:

    -> qemu-nbd -p 2000 /root/qemu/q1.img &
    -> nbd-client localhost 2000 /dev/nbd0 &

At this point I can mount /dev/nbd0, etc.

    -> echo platform >/sys/power/disk
    -> echo disk >/sys/power/state

At this point we are 'hibernated'.

On power cycle, the OS seems to come back to the state it was in
before hibernation, with the exception of QEMU:

    nbd.c:nbd_receive_request():L517: read failed    <-- on command line

    [78979.269039] Freezing user space processes ...
    [78979.269122] nbd (pid 2455: nbd-client) got signal 0
    [78979.269127] block nbd0: shutting down socket
    [78979.269151] block nbd0: Receive control failed (result -4)
    [78979.269165] block nbd0: queue cleared

=============================
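After resume I verify the device the same way as before (step #3 from
my earlier mail; the mount point here is just an example):

    -> cat /proc/partitions      # nbd0 should still be listed
    -> mount /dev/nbd0 /mnt      # fails if the connection is gone

Is this the test you had in mind?

Thanks for your input!

Regards,
Mark T.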