Re: [Xen-devel] Regression, host crash with 4.5rc1

2015-03-02 Thread Jan Beulich
>>> On 27.02.15 at 18:50, wrote:
> If this issue were to happen on Linux/bare-metal, this is how I'd debug it.
> Hopefully some of this will translate to Xen in one way or another.

Sadly not really - the kernel plays only a minor role (forwarding ACPI data to the hypervisor) in C-state handling u

Re: [Xen-devel] Regression, host crash with 4.5rc1

2015-02-27 Thread Brown, Len
(Please forgive my lack of Xen-fu knowledge in advance) If this issue were to happen on Linux/bare-metal, this is how I'd debug it. Hopefully some of this will translate to Xen in one way or another. dmesg | grep idle will tell us what idle driver is running (on Dom0 kernel) and if it is intel_id

Re: [Xen-devel] Regression, host crash with 4.5rc1

2015-02-27 Thread Dugger, Donald D
Thursday, November 27, 2014 2:28 AM
To: Steve Freitas; Dugger, Donald D; Nakajima, Jun
Cc: xen-devel@lists.xen.org; Don Slutz
Subject: Re: [Xen-devel] Regression, host crash with 4.5rc1

>>> On 27.11.14 at 06:29, wrote:
> On 11/25/2014 03:00 AM, Jan Beulich wrote:
>> Okay, so it'

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-12-03 Thread Dugger, Donald D
Steve Freitas; Dugger, Donald D; Nakajima, Jun
Cc: xen-devel@lists.xen.org; Don Slutz
Subject: Re: [Xen-devel] Regression, host crash with 4.5rc1

>>> On 27.11.14 at 06:29, wrote:
> On 11/25/2014 03:00 AM, Jan Beulich wrote:
>> Okay, so it's not really the mwait-idle driver caus

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-28 Thread Steve Freitas
On Nov 28, 2014, at 00:50, Jan Beulich wrote:
On 28.11.14 at 09:24, wrote:
>>> And with 6 errata
>>> documented it's not all that unlikely that there's a 7th one with
>>> MONITOR/MWAIT behavior. The commit you bisected to (and
>>> which you had verified to be the culprit by just forcing
>>

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-28 Thread Jan Beulich
>>> On 28.11.14 at 09:24, wrote:
>> And with 6 errata
>> documented it's not all that unlikely that there's a 7th one with
>> MONITOR/MWAIT behavior. The commit you bisected to (and
>> which you had verified to be the culprit by just forcing
>> arch_skip_send_event_check() to always return false)

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-28 Thread Steve Freitas
On 11/27/2014 01:27 AM, Jan Beulich wrote: This was precisely the reason why I told you that the numbering differs (and is confusing and has nothing to do with actual C state numbers): What max_cstate refers to in the mwait-idle driver is what above is listed as type[Cx], i.e. the state at index

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-27 Thread Jan Beulich
>>> On 27.11.14 at 06:29, wrote:
> On 11/25/2014 03:00 AM, Jan Beulich wrote:
>> Okay, so it's not really the mwait-idle driver causing the regression,
>> but it is C-state related. Hence we're now down to seeing whether all
>> or just the deeper C states are affected, i.e. I now need to ask you
>

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-26 Thread Steve Freitas
On 11/25/2014 03:00 AM, Jan Beulich wrote: Okay, so it's not really the mwait-idle driver causing the regression, but it is C-state related. Hence we're now down to seeing whether all or just the deeper C states are affected, i.e. I now need to ask you to play with "max_cstate=". For that you'll

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-25 Thread Jan Beulich
>>> On 25.11.14 at 10:38, wrote:
> On 11/25/2014 12:16 AM, Jan Beulich wrote:
>> Interesting, so other than for me (perhaps due to other patches
>> I have in my tree) the change resulted in C states now being used
>> again despite mwait-idle=0, which is good. Question now is - with
>> this being t

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-25 Thread Steve Freitas
On 11/25/2014 12:16 AM, Jan Beulich wrote:
(XEN) 'c' pressed -> printing ACPI Cx structures
(XEN) ==cpu0==
(XEN) active state:C0
(XEN) max_cstate:C7
(XEN) states:
(XEN) C1:type[C1] latency[001] usage[5664] method[ FFH] duration[4042540627]
(XEN) C2:type[C3] l
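For anyone comparing several of these 'c' dumps, here is a minimal Python sketch that pulls the per-state fields out of lines like the `C1:type[C1] latency[001] usage[5664] method[ FFH] duration[4042540627]` one quoted above. The function name `parse_cx_states` and the returned dictionary layout are my own choices for illustration, not anything from Xen, and the pattern is only modelled on the lines visible in this thread, so other Xen versions may print a different layout.

```python
import re

# Modelled on the per-state lines quoted in this thread, e.g.
#   (XEN) C1:type[C1] latency[001] usage[5664] method[ FFH] duration[4042540627]
# The field layout is an assumption taken from those lines only.
STATE_RE = re.compile(
    r"\b(?P<name>C\d+):\s*type\[(?P<type>C\d+)\]\s*"
    r"latency\[(?P<latency>\d+)\]\s*"
    r"usage\[\s*(?P<usage>\d*)\s*\]\s*"
    r"method\[\s*(?P<method>\w+)\s*\]\s*"
    r"duration\[(?P<duration>\d+)\]"
)


def parse_cx_states(log_text):
    """Return one dict per C-state line found in the given log text."""
    states = []
    for match in STATE_RE.finditer(log_text):
        usage = match.group("usage")
        states.append({
            "name": match.group("name"),          # index label, e.g. "C1"
            "type": match.group("type"),          # ACPI type, e.g. "C3"
            "latency": int(match.group("latency")),
            # An empty usage[] field is kept as None rather than guessed.
            "usage": int(usage) if usage else None,
            "method": match.group("method"),
            "duration": int(match.group("duration")),
        })
    return states


if __name__ == "__main__":
    sample = (
        "(XEN) C1:type[C1] latency[001] usage[5664] "
        "method[ FFH] duration[4042540627]"
    )
    for state in parse_cx_states(sample):
        print(state)
```

Keeping `name` and `type` as separate fields matters here, since the thread points out that the index label and the ACPI type do not have to agree (the dump above lists `C2:type[C3]`).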

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-25 Thread Jan Beulich
>>> On 24.11.14 at 23:17, wrote:
> I'm combining this action with your patch, see below. Please let me know
> if this was incorrect.

Thanks, that's perfectly fine.

> (XEN) 'c' pressed -> printing ACPI Cx structures
> (XEN) ==cpu0==
> (XEN) active state:C0
> (XEN) max_cstate:C7
>

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-24 Thread Steve Freitas
On 24.11.14 at 10:08, wrote: On Nov 24, 2014, at 00:45, Jan Beulich wrote: On 23.11.14 at 02:28, wrote: As promised, below is the apic-verbosity=debug log, with 'i'. Thanks! I'm sorry, I misspelled the option, it's really "apic_verbosity=debug". The 'i' output at least already confirms that

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-24 Thread Jan Beulich
>>> On 24.11.14 at 10:08, wrote:
> On Nov 24, 2014, at 00:45, Jan Beulich wrote:
> > On 23.11.14 at 02:28, wrote:
>>> With mwait-idle=0:
>>>
>>> (XEN) 'c' pressed -> printing ACPI Cx structures
>>> (XEN) ==cpu0==
>>> (XEN) active state: C0
>>> (XEN) max_cstate: C7

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-24 Thread Jan Beulich
>>> On 24.11.14 at 10:08, wrote:
> On Nov 24, 2014, at 00:45, Jan Beulich wrote:
> On 23.11.14 at 02:28, wrote:
>>> As promised, below is the apic-verbosity=debug log, with 'i'. Thanks!
>>
>> I'm sorry, I misspelled the option, it's really "apic_verbosity=debug".
>> The 'i' output at least

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-24 Thread Steve Freitas
On Nov 24, 2014, at 00:45, Jan Beulich wrote:
On 23.11.14 at 02:28, wrote:
>> With mwait-idle=0:
>>
>> (XEN) 'c' pressed -> printing ACPI Cx structures
>> (XEN) ==cpu0==
>> (XEN) active state: C0
>> (XEN) max_cstate: C7
>> (XEN) states:
>> (XEN) C1: type[C1]

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-24 Thread Jan Beulich
>>> On 23.11.14 at 02:28, wrote:
> With mwait-idle=0:
>
> (XEN) 'c' pressed -> printing ACPI Cx structures
> (XEN) ==cpu0==
> (XEN) active state: C0
> (XEN) max_cstate: C7
> (XEN) states:
> (XEN) C1: type[C1] latency[001] usage[] method[ FFH]
> duration[0

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-22 Thread Steve Freitas
On 11/21/2014 0:42, Jan Beulich wrote: On 20.11.14 at 21:07, wrote: Running with mwait-idle=0 solves (hides?) the problem. Next step is to fiddle with the C states? For that I'd first of all like to know how much use of C states the system still makes with that option in place. For that I'd ne

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-21 Thread Jan Beulich
>>> On 20.11.14 at 21:07, wrote:
> Running with mwait-idle=0 solves (hides?) the problem. Next step is to
> fiddle with the C states?

So this also prompted me to go over the list of errata. Just to confirm - your CPU is family 6 model 44? What stepping? And what nominal frequency? There are a c

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-21 Thread Jan Beulich
>>> On 20.11.14 at 21:07, wrote:
> Running with mwait-idle=0 solves (hides?) the problem. Next step is to
> fiddle with the C states?

For that I'd first of all like to know how much use of C states the system still makes with that option in place. For that I'd need the output of "xenpm get-cpuid

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-20 Thread Steve Freitas
Hi Jan, Thanks for all your help so far! Here's my latest update. On 11/17/2014 23:54, Jan Beulich wrote: Plus, without said adjustment, first just disable the MWAIT CPU idle driver ("mwait-idle=0") and then, if that didn't make a difference, use of C states altogether ("cpuidle=0"). If any of
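The narrowing-down order suggested in this thread (first "mwait-idle=0", then "cpuidle=0", later "max_cstate=") can be written down as a small Python sketch that just builds the Xen command lines to boot with, one per test run. The helper name `xen_cmdline_steps` is hypothetical, the `max_cstate` values are placeholders to be chosen by whoever runs the tests, and whether "max_cstate=" has to be combined with another option is not visible in the truncated messages, so the sketch adds it on its own.

```python
def xen_cmdline_steps(base_options, max_cstate_values=()):
    """Return (label, command line) pairs in the order discussed in the thread.

    base_options: options kept on every run, e.g. the debug flags
                  mentioned elsewhere in the thread.
    max_cstate_values: placeholder values for "max_cstate="; the thread
                  does not show which values were actually tried.
    """
    steps = [
        ("baseline", []),
        ("MWAIT idle driver disabled", ["mwait-idle=0"]),
        ("C states disabled", ["cpuidle=0"]),
    ]
    for value in max_cstate_values:
        option = f"max_cstate={value}"
        steps.append((option, [option]))
    return [
        (label, " ".join(list(base_options) + extra))
        for label, extra in steps
    ]


if __name__ == "__main__":
    base = ["iommu=debug", "apic_verbosity=debug"]
    for label, cmdline in xen_cmdline_steps(base, max_cstate_values=[1, 2]):
        print(f"{label}: {cmdline}")
```

Changing only one option per run keeps each result attributable to a single setting, which is what the step-by-step suggestions in the thread rely on.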

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-20 Thread Jan Beulich
>>> On 20.11.14 at 02:23, wrote:
> On 11/17/2014 23:54, Jan Beulich wrote:
>> Another thing - now that serial logging appears to be working for
>> you, did you try whether the host, once hung, still reacts to serial
>> input (perhaps force input to go to Xen right at boot via the
>> "conswitch=" o

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-19 Thread Steve Freitas
On 11/17/2014 23:54, Jan Beulich wrote: On 17.11.14 at 20:21, wrote: Okay, I did a bisection and was not able to correlate the above error message with the problem I'm seeing. Not saying it's not related, but I had plenty of successful test runs in the presence of that error. Took me about a w

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-17 Thread Jan Beulich
>>> On 17.11.14 at 20:21, wrote:
> Okay, I did a bisection and was not able to correlate the above error
> message with the problem I'm seeing. Not saying it's not related, but I
> had plenty of successful test runs in the presence of that error.
>
> Took me about a week (sometimes it takes as

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-17 Thread Steve Freitas
Hi Jan,

On 11/11/2014 0:05, Jan Beulich wrote:
And these

[ 199.775209] pcieport :00:03.0: AER: Multiple Corrected error received: id=0018
[ 199.775238] pcieport :00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0018(Transmitter ID)
[ 199.7

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-11 Thread Jan Beulich
>>> On 10.11.14 at 21:05, wrote:
> On 11/10/2014 0:51, Jan Beulich wrote:
>> Raising the kernel log level to maximum too would have helped.
>
> Okay, I've done that and the output is here, let me know if you have any
> preferred logging flags instead:
>
> http://pastebin.com/M3yvWNTT

Hmm, I c

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-10 Thread Steve Freitas
On 11/10/2014 0:51, Jan Beulich wrote: On 10.11.14 at 09:03, wrote: Sorry for the delay, took some debugging on another computer to get serial logging working. Due to its size, I've posted the entire log of a crashed session here: http://pastebin.com/AiPHUZRH In this case I used a 3.0 gig memor

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-10 Thread Jan Beulich
>>> On 10.11.14 at 09:03, wrote:
> Sorry for the delay, took some debugging on another computer to get
> serial logging working. Due to its size, I've posted the entire log of a
> crashed session here: http://pastebin.com/AiPHUZRH In this case I used a
> 3.0 gig memory size for the Windows domU

Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-10 Thread Steve Freitas
On 11/04/2014 02:15 AM, Jan Beulich wrote: In fact this would not just be nice, but is strictly needed, and (as with any pass-through related problems) additionally requires running with "iommu=debug" alongside the usual need of setting the log levels to the maximum. Hi all, Sorry for the del