Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
Hi,

a few months later, Ingo decided to give it another try, as he really
doesn't want to keep IPv6 disabled in 2016.

He tried Xen 4.8, which didn't help: the crash reappeared. He then managed
to build Xen with debug=y, and soon it crashed with the following output,
which is somewhat longer than the output without debug:

http://paste.debian.net/895464/

If this still doesn't help, we would really appreciate more information on
how to do proper debugging. The information we found online is either very
old or confusing, or it is very well hidden.

Andreas.

Original message
Subject: Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
From: Wei Liu
To: Ingo Jürgensmann
Date: 2.8.2016, 14:37:58

> On Tue, Aug 02, 2016 at 12:30:30PM +0200, Ingo Jürgensmann wrote:
>> On 02.08.2016 11:20, Wei Liu wrote:
>>> On Fri, Jul 29, 2016 at 10:17:22PM +0200, Ingo Jürgensmann wrote:
>>> What is also interesting is that you seem to be running some sort of
>>> ip accounting software (pmacctd) which also segfault'ed.
>>
>> Yeah, it is segfaulting, because the database (in a domU VM) where it is
>> storing the accounting is not yet available after the crash. When the
>> database is up and running, those segfaults go away.
>>
>
> At least we can now rule out that it is related to the issue you
> reported.
>
>>> Still not sure what to make of that though.
>>
>> Me neither. ;-)
>>
>> I already tried to get a core dump by setting ulimit -c unlimited, but
>> that didn't work either, which makes me believe that the crash happens
>> in the hypervisor, not in the dom0 kernel. If it were the dom0 kernel, I
>> would expect dumping a core file to work.
>>
>
> We can't draw the conclusion that the crash is in the hypervisor yet. If
> your dom0 crashes, the hypervisor would normally decide to reboot the
> machine.
>
> Wei.
>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
> On Mon, Jul 25, 2016 at 11:38:20AM +0200, Ingo Jürgensmann wrote:
>> On 22.07.2016 12:21, Ingo Jürgensmann wrote:
>>> On 22.07.2016 11:03, Ingo Jürgensmann wrote:
>>> In the meantime, I activated IPv6 again on Tuesday evening, and today
>>> the server crashed again some minutes ago. Here's the output from
>>> netconsole:
>>> ... and the second subsequent crash:
>>
>> ... and another crash below...
>>
>> But in the meantime, I wonder if anyone except Andreas and me has some
>> interest in fixing this issue, as there were absolutely no comments or
>> feedback on my previous mails. Quite disappointing. :-/
>>
>
> I did skim your emails. But the oops was happening in memcpy+0x6, which
> brings us back to the original question of why it would get an
> exception there.
> [...]
> Your report and the Debian report suggested that the Dom0 kernel is less
> likely to be the culprit because you've tried different Dom0 kernels.

Yes, we did, but nothing newer than 3.16 so far. We could try that, too.

> As for Xen, not sure if you would be up for trying a debug build from
> the source tree. That would help provide information on whether this is
> a bug in Xen or not.

OK, we'll try!

> As for hardware, it would be worth checking whether this issue happens
> on another hardware platform.

As I wrote earlier on the list, we have two platforms which only have the
Intel Xeon CPU in common, nothing else, and even that is from a very
different generation.

Andreas.
Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
>> Your report and the Debian report suggested that the Dom0 kernel is less
>> likely to be the culprit because you've tried different Dom0 kernels.
>
> Yes, we did, but nothing newer than 3.16 so far. We could try that, too.

I have to correct myself: we also tried 4.4 a while ago. Maybe we should
try 4.6 now.
Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
Hello Wei,

we have now tried kernel 4.6, but the crash happened again.

Next we want to try the Xen debug build, but we couldn't find any
information on how to enable debug for the build. Perhaps you could give us
a hint.

- Andreas

Original message
Subject: Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
From: Wei Liu
To: Ingo Jürgensmann
Date: 25.7.2016, 15:13:06

> On Mon, Jul 25, 2016 at 01:41:41PM +0200, Ingo Jürgensmann wrote:
>> On 25.07.2016 12:23, Wei Liu wrote:
>>
>> First, thank you for replying! Very much appreciated! :)
>>
>>> I did skim your emails. But the oops was happening in memcpy+0x6, which
>>> brings us back to the original question of why it would get an
>>> exception there.
>>>
>>> Just staring at the code doesn't get me anywhere. Without a concrete
>>> reproduction of the issue, I'm afraid I can't provide more input here.
>>
>> Well, from my point of view, it happens quite often when accessing the
>> server via SSH. For example, today it crashed when I wanted to add
>> something, right after I clicked into PuTTY and typed the first char. In
>> another PuTTY window, where I have my netconsole log open, I instantly
>> saw the oops.
>>
>> But as to what exactly is causing these kinds of reboots, I'm as
>> clueless as you are. I only know that I experience far more frequent
>> crashes when accessing the server from my workplace via PuTTY on Windows
>> than via SSH on OSX or Debian Linux.
>>
>>> There are several moving parts:
>>> 0. Hardware
>>> 1. Xen hypervisor
>>> 2. Dom0 kernel
>>> Your report and the Debian report suggested that the Dom0 kernel is
>>> less likely to be the culprit because you've tried different Dom0
>>> kernels.
>>
>> As just written in the other mail, I already tried kernel 4.5 from
>> backports. Still crashing.
>>
>>> As for Xen, not sure if you would be up for trying a debug build from
>>> the source tree. That would help provide information on whether this
>>> is a bug in Xen or not.
>>
>> Will try to build from Debian source, but how do I enable a debug build?
>>
>
> I was thinking about building directly from xen.git.
>
> http://wiki.xenproject.org/wiki/Compiling_Xen_From_Source
>
> Probably try the Xen 4.7 release.
>
> Wei.
>
>> --
>> Ciao...          //        http://blog.windfluechter.net
>>       Ingo     \X/     XMPP: i...@jabber.windfluechter.net
>>
>>
>> gpg pubkey:  http://www.juergensmann.de/ij_public_key.asc
Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
Hello,

one of the servers that sometimes crashes because of this issue is mine.
Jan is also using this server and is therefore interested in a solution.

We are not sure whether it is a Xen or a kernel issue, but no one answered
on the kernel and Debian bug trackers, so we figured it might be worth
asking on this list.

The kernel panics first happened in summer 2014, multiple times a week.
We tried various kernel versions back then. The panics suddenly stopped,
although we didn't change anything after the latest crash at that time.

After that, the server ran without problems on kernel 3.15.7 and Xen 4.4.0
until November 2015.
From then on, the server ran on Xen 4.4.4-pre and Debian's 3.16 kernel.
The crash happened again in May and then, running Xen 4.4.4 and an updated
Debian 3.16 kernel, once more in June.

In May, Ingo Jürgensmann also started experiencing this problem and
blogged about it:
https://blog.windfluechter.net/content/blog/2016/03/23/1721-xen-randomly-crashing-server
https://blog.windfluechter.net/content/blog/2016/05/12/1723-xen-randomly-crashing-server-part-2

He is pretty sure that the problem went away after disabling IPv6.

But we can't say for sure, because on our server the crash sometimes
happened often within a short period of time and then didn't happen for
months. And disabling IPv6 is not an option for me at all.

@Wei
So far we haven't found a way to reproduce the kernel panic...

Andreas.

Original message
Subject: Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
From: Jan Prunk
To: Wei Liu
Date: Fri, 8 Jul 2016 14:22:37 +0200

> Hello,
>
> Please also send a CC: to 804...@bugs.debian.org for future reference.
> The administrator of the server is Andreas Ziegler <m...@conemu.de>,
> maybe he will be able to log/reproduce the bug, I was only an initial
> reporter.
>
> Kind regards,
> Jan
>
> On Fri, Jul 8, 2016 at 1:14 PM, Wei Liu <wei.l...@citrix.com> wrote:
>
> > On Wed, Jul 06, 2016 at 03:14:15PM +0100, George Dunlap wrote:
> > > On Mon, Jul 4, 2016 at 7:06 PM, Jan Prunk <janpr...@gmail.com> wrote:
> > > > Hello !
> > > >
> > > > I am posting Xen virtualisation bug links to this e-mail address,
> > > > because I wasn't able to find the Xen specific bugtracker list.
> > > > This bug has been discovered in 2015 and so far it hasn't been
> > > > resolved through the Debian/Kernel bug lists. I submit the
> > > > links to bug reports for you.
> > > >
> > > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=804079
> > >
> > > The serial log at the bottom looks like there was a crash in the ipv6
> > > handling as the result of a packet delivery, perhaps? David / Wei, do
> > > you have any ideas? Not sure who else has worked on the netback side
> > > of things.
> > >
> >
> > The original bug report showed that there was an exception in the
> > middle of a memcpy instruction, while the latest log showed that the
> > exception could potentially be somewhere else. Both logs showed that
> > the exception took place when ipv6 was involved.
> >
> > If Jan can come up with a reliable repro I might be able to have a look.
> >
> > Wei.
> >
> > >  -George
Re: [Xen-devel] Kernel panic on Xen virtualisation in Debian
Hi everyone,

did the information that Ingo provided (I cited his message to the list
below) help in narrowing down the possible issue?

If you need additional information, we can try to get it for you, as Ingo
might be able to reproduce the kernel panic, although not reliably.

By the way, Ingo and I compared the output of "lspci" on both our servers,
and they have no hardware in common other than a Xeon CPU, and mine is an
earlier generation than his. Maybe this rules out driver-related problems.

Andreas.

On 10.07.2016 at 15:18, Ingo Jürgensmann wrote:
> On 10.07.2016 at 00:29, Andreas Ziegler wrote:
>
>> In May, Ingo Jürgensmann also started experiencing this problem and
>> blogged about it:
>> https://blog.windfluechter.net/content/blog/2016/03/23/1721-xen-randomly-crashing-server
>> https://blog.windfluechter.net/content/blog/2016/05/12/1723-xen-randomly-crashing-server-part-2
>
> Actually I’ve been suffering from this problem since April 2013. Here’s
> my story… ;)
>
> Everything was working smoothly when I was still using a rootserver at
> Hetzner. The setup there was somewhat non-standard, as I needed to have
> eth0 as the outgoing interface without it being part of the Xen bridge.
> So I used a mixture of bridge and routed in xend-config.sxp. This setup
> worked for years without problems.
>
> However: as Hetzner started to bill for every single IPv4 address, I
> moved to my new provider, where I could get the same address space (/26)
> without being forced to pay for every IPv4 address. The server back then
> was a Cisco C200 M2.
>
> Since I got my own VLAN at the new location, I was then able to drop the
> mixed setup of routing and bridging and used only bridging, with eth0 now
> being part of the Xen bridge. The whole setup consists of two bridges:
> one for the external IP addresses (xenbr0) and one for internal traffic
> (xenbr1). This was already that way with Hetzner.
>
> However, shortly after I moved to the new provider, the issues started:
> random crashes of the host. Together with the new provider, who was and
> still is very helpful, we replaced the memory, for example. The provider
> also reported that other Cisco C200 servers with Ubuntu LTS didn’t show
> this issue.
>
> Over time a pattern showed up that might cause the frequent crashes
> (sometimes several times in a row, let’s say 2-10 times a day!):
>
> My setup is this:
>
> Debian stable with packaged Xen hypervisor and these VMs:
> 1) Mail, Database, Nameserver, OpenVPN
> 2) Webserver, Squid3
> 3) Login server
> 4) … some more servers (10 in total), e.g. Tor Relay…
>
> IPv4 /26 network, IPv6 /48 network
>
> From my workplace I need to log in to 3) and have a tunnel to the Squid
> on 2) via the internal addresses on xenbr1. Of course Squid queries the
> nameserver on 1), so there is some internal traffic going back and forth
> on the internal bridge and traffic originating from the external bridge
> (xenbr0). Using Squid I access my Roundcube on my small homebrew server
> that is connected to 1) via OpenVPN. Of course the webserver on 2)
> queries the database on 1).
>
> So, most crashes happen while I’m using the SSH tunnel from my
> workplace. If a crash happens, it’s most likely that at least two in a
> row will happen in a short time frame (within 1-2 hours), sometimes even
> within 10 mins after the server came back. From time to time my
> impression was that the server crashes the second time instantly when I
> try to access my Roundcube at home.
>
> Furthermore, I switched from the Cisco C200 server to my own server with
> a Supermicro X9SRi-F mainboard and a XEON E5-2630L V2, but still with the
> same provider, and the same issue persists: the new server crashes the
> same way as the Cisco server did. With the new server we replaced the
> memory as well: from 32G to 128G. So over time we have switched memory
> twice and hardware once. Since then I no longer assume that this might be
> hardware related.
>
> In the meantime I switched from using Squid on 2) to tinyproxy running
> on 2) as well as running tinyproxy on another third party VPS. Still the
> crashes happen, regardless of using Squid on 2) or not.
>
> In May the server crashed again several times a week and several times a
> day. Really, really annoying!
> So together with my provider we set up a netconsole to catch more
> information about the crash than just the few lines from the IPMI
> console.
>
> Trying linux-image 4.4 from backports didn’t help either. I switched from
> PV to PVHVM as well some months ago.
>
>> He is pretty sure that the problem went away after disabling IPv6.
>
> Indeed. Since I disabled IPv6 for all of my VMs