Re: [Qemu-devel] Livelock with qemu 2.5.0

2016-12-02 Thread Brian Candler
On 02/12/2016 14:19, Brian Candler wrote: I am running a VM under qemu 2.5.0/Ubuntu 16.04. The guest VM is also Ubuntu 16.04, with ZFS and LXD Correction: it's btrfs (not zfs). I re-ran the test with qemu 2.7.0, and this time got a btrfs kernel panic in the guest [below]. The gue

[Qemu-devel] Livelock with qemu 2.5.0

2016-12-02 Thread Brian Candler
I am running a VM under qemu 2.5.0/Ubuntu 16.04. The guest VM is also Ubuntu 16.04, with ZFS and LXD, and I am hitting it hard with I/O, concurrently building multiple LXD containers. Previously it was serialized to build one container at a time, and it was fine. With the concurrent builds I

Re: [Qemu-devel] Crashing in tcp_close

2016-11-13 Thread Brian Candler
On 12/11/2016 10:44, Samuel Thibault wrote: Oops, sorry, my patch was completely bogus, here is a proper one. Excellent. I've run the original build process 18 times (each run takes about 25 minutes) without valgrind, and it hasn't crashed once. So this looks good. Thank you! Regards, Bri

Re: [Qemu-devel] Crashing in tcp_close

2016-11-12 Thread Brian Candler
On 12/11/2016 09:33, Brian Candler wrote: So I sent a SIGABRT, here is the backtrace: And here is some state from the core dump: (gdb) print so $1 = (struct socket *) 0x564b181fc940 (gdb) print *so $2 = {so_next = 0x564b18258c60, so_prev = 0x564b181fcb00, canary1 = -559038737, s = 28

Re: [Qemu-devel] Crashing in tcp_close

2016-11-12 Thread Brian Candler
On 11/11/2016 22:09, Samuel Thibault wrote: Ooh, I see. Now it's obvious, now that it's not coming from the tcb loop:) Could you try the attached patch? It looks like it now goes into an infinite loop when a connection is closed. Packer output stopped here: ... 2016/11/12 09:29:04 ui:

Re: [Qemu-devel] Crashing in tcp_close

2016-11-11 Thread Brian Candler
On 11/11/2016 16:17, Samuel Thibault wrote: Could you increase the value given to valgrind's --num-callers= so we can make sure the context of this call? OK: re-run with --num-callers=250. It took a few iterations, but I captured it again. (I have grepped out all the "invalid file descriptor"

Re: [Qemu-devel] Crashing in tcp_close

2016-11-11 Thread Brian Candler
On 11/11/2016 15:02, Brian Candler wrote: But over more than 10 runs (some with MALLOC_xxx_ and some without) it did not crash once :-( Aha!! Looking carefully at valgrind output, I see some definite cases of use-after-free in tcp_output. Does the info below help? Regards, Brian. ==18350

Re: [Qemu-devel] Crashing in tcp_close

2016-11-11 Thread Brian Candler
On 09/11/2016 11:27, Stefan Hajnoczi wrote: Heap corruption. Valgrind's memcheck tool could be fruitful here: http://valgrind.org/docs/manual/quick-start.html#quick-start.mcrun This is really frustrating. I have been running with the following script instead of invoking qemu directly: $ ca

Re: [Qemu-devel] Crashing in tcp_close

2016-11-08 Thread Brian Candler
On 07/11/2016 10:42, Stefan Hajnoczi wrote: On Mon, Nov 07, 2016 at 08:42:17AM +, Brian Candler wrote: >On 06/11/2016 18:04, Samuel Thibault wrote: > >Brian, could you run it with > > > >export MALLOC_CHECK_=2 > > > >and also this could be useful:

Re: [Qemu-devel] Crashing in tcp_close

2016-11-08 Thread Brian Candler
On 07/11/2016 20:52, Brian Candler wrote: So either this means that using tap networking instead of user networking is fixing all the problems; or it is some other option which is different. Really I now need to run qemu with exactly the same settings as before, except with tap instead of user

Re: [Qemu-devel] Crashing in tcp_close

2016-11-07 Thread Brian Candler
On 07/11/2016 11:09, Brian Candler wrote: On 07/11/2016 10:42, Stefan Hajnoczi wrote: Let's try to isolate the cause of this crash: Are you able to switch -netdev user to -netdev tap so we can rule out the slirp user network stack as the source of memory corruption? Let me try to set th

Re: [Qemu-devel] Crashing in tcp_close

2016-11-07 Thread Brian Candler
On 07/11/2016 10:42, Stefan Hajnoczi wrote: Let's try to isolate the cause of this crash: Are you able to switch -netdev user to -netdev tap so we can rule out the slirp user network stack as the source of memory corruption? Let me try to set that up. Using packer.io, I will have to start a VM b

Re: [Qemu-devel] Crashing in tcp_close

2016-11-07 Thread Brian Candler
On 07/11/2016 08:42, Brian Candler wrote: The following crashes occurred when running with a single vcpu. Normally I have been running with -smp 8,sockets=1,cores=4,threads=2 as it seems to crash less with those settings; however I'm trying it again like that in a loop to see if I can

Re: [Qemu-devel] Crashing in tcp_close

2016-11-07 Thread Brian Candler
On 06/11/2016 18:04, Samuel Thibault wrote: Brian, could you run it with export MALLOC_CHECK_=2 and also this could be useful: export MALLOC_PERTURB_=1234 Also, to rule out the double-free scenario, and try to catch a buffer overflow coming from the socket structure itself, I have attached a

[Qemu-devel] Crashing in tcp_close

2016-10-20 Thread Brian Candler
gc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4604 (gdb) So: * Is this of interest? * If so, what additional gdb output would you like me to provide? * If developers want to reproduce this, let me know and I can probably send the VM qcow2 file and/or packer source privately off-list [I need to check perm