On Mon, May 11, 2015 at 04:00:03PM +0200, Ingo Molnar wrote:
> > So potentially, CPU0 generated an interrupt that caused
> > vcpu_enter_guest to be called on CPU1. However, when
> > vmx_handle_external_intr was called, it didn't progress any further.
>
> So the IPI does look to be lost in t
* Chris J Arges wrote:
> Later in the trace we see the same call followed by
> vmx_handle_external_intr() ignoring the call:
>
> [ 603.248016] 2452.083823 | 0) |ptep_clear_flush() {
> [ 603.248016] 2452.083824 | 0) | flush_tlb_page() {
> [ 603.2480
On Thu, Apr 16, 2015 at 06:31:40PM +0200, Ingo Molnar wrote:
> Now what would be nice is to observe whether the CPU that is not
> doing the CSD wait is truly locked up.
>
> It might be executing random KVM-ish workloads and the various
> backtraces we've seen so far are just a random sample
* Chris J Arges wrote:
> A previous backtrace of a 3.19 series kernel is here, showing interrupts
> enabled on both CPUs on L1:
> https://lkml.org/lkml/2015/2/23/234
> http://people.canonical.com/~inaddy/lp1413540/BACKTRACES.txt
>
> [...]
>
> Yes, I think at this point I'll go through the var
On Thu, Apr 16, 2015 at 01:04:23PM +0200, Ingo Molnar wrote:
>
> * Chris J Arges wrote:
>
> > Ingo,
> >
> > Below are the patches and data I've gathered from the reproducer. My
> > methodology was as described previously; however, I used gdb on the
> > qemu process in order to breakpoint L1 once we've detected the hang.
* Chris J Arges wrote:
> Ingo,
>
> Below are the patches and data I've gathered from the reproducer. My
> methodology was as described previously; however, I used gdb on the
> qemu process in order to breakpoint L1 once we've detected the hang.
> This made dumping the kvm_lapic structures on
On Mon, Apr 13, 2015 at 08:14:51AM +0200, Ingo Molnar wrote:
> > FYI, I'm working on getting better data at the moment and here is my
> > approach:
> > * For the L0 kernel:
> > - In arch/x86/kvm/lapic.c, I enabled 'apic_debug' to get more output (and
> > print
> > the addresses of various u
* Chris J Arges wrote:
> /sys/module/kvm_intel/parameters/enable_apicv on the affected
> hardware is not enabled, and unfortunately my hardware doesn't have
> the necessary features to enable it. So we are dealing with KVM's
> lapic implementation only.
That's actually pretty fortunate, as w
> So it would be really important to see the stack dump of CPU#0 at this
> point, and also do an APIC state dump of it.
>
> Because from previous dumps it appeared that the 'stuck' CPU was just
> in its idle loop - which is 'impossible' as the idle loop should still
> allow APIC irqs to arrive
* Chris J Arges wrote:
> Ingo,
>
> Looks like sched_clock() works in this case.
>
> Adding the dump_stack() line caused various issues such as the VM
> oopsing on boot or the softlockup never being detected properly (and
> thus not crashing). So the below is running with your patch and
> 'd
On Tue, Apr 7, 2015 at 1:59 PM, Chris J Arges wrote:
>
> Here is the log leading up to the soft lockup (I adjusted CSD_LOCK_TIMEOUT to
> 5s):
> [ 22.669630] kvm [1523]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x
> [ 38.712710] csd: Detected non-responsive CSD lock (#1) on CPU#00, waiting
On Tue, Apr 07, 2015 at 11:21:21AM +0200, Ingo Molnar wrote:
>
> * Linus Torvalds wrote:
>
> > On Mon, Apr 6, 2015 at 9:58 AM, Chris J Arges wrote:
> > >
> > > I noticed that with this patch it never reached 'csd: Detected
> > > non-responsive CSD lock...' because it seems that ts_delta never
> > > reached CSD_LOCK_TIMEOUT.
* Linus Torvalds wrote:
> On Mon, Apr 6, 2015 at 9:58 AM, Chris J Arges wrote:
> >
> > I noticed that with this patch it never reached 'csd: Detected
> > non-responsive CSD lock...' because it seems that ts_delta never reached
> > CSD_LOCK_TIMEOUT. I tried adjusting the TIMEOUT value and still
On Mon, Apr 6, 2015 at 9:58 AM, Chris J Arges wrote:
>
> I noticed that with this patch it never reached 'csd: Detected
> non-responsive CSD lock...' because it seems that ts_delta never reached
> CSD_LOCK_TIMEOUT. I tried adjusting the TIMEOUT value and still got a
> hang without reaching this st
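(A side note on the measurement itself: one way to take jiffies out of the picture when the computed delta never grows is to base the check on sched_clock(), which a later mail in this thread reports as working in this setup ("Looks like sched_clock() works in this case"). The fragment below is only a sketch of that idea, not the patch actually used in the thread; CSD_LOCK_TIMEOUT_NS and the message text are invented for illustration, and it assumes the kernel/smp.c context of the quoted patch.)

/* Illustrative only: csd_lock_wait() debug check measured with sched_clock()
 * (nanoseconds, not tied to the timer tick) instead of jiffies. */
#define CSD_LOCK_TIMEOUT_NS	(5ULL * NSEC_PER_SEC)	/* 5s, as in the log above */

static void csd_lock_wait(struct call_single_data *csd, int cpu)
{
	u64 ts0 = sched_clock();

	while (csd->flags & CSD_FLAG_LOCK) {
		if (sched_clock() - ts0 > CSD_LOCK_TIMEOUT_NS) {
			pr_err("csd: CSD lock still not released, CPU#%d waiting\n", cpu);
			ts0 = sched_clock();	/* re-arm so the warning repeats */
		}
		cpu_relax();
	}
}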
On 04/03/2015 12:43 AM, Ingo Molnar wrote:
>
> * Chris J Arges wrote:
>
>> Ingo,
>>
>> I think tracking IPI calls from 'generic_exec_single' would make a lot
>> of sense. When you say poll for completion, do you mean a loop after
>> 'arch_send_call_function_single_ipi' in kernel/smp.c? My main co
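For readers without kernel/smp.c open, the path being discussed looks roughly like this. It is a condensed paraphrase of the ~3.19-era code, not a verbatim quote; the local-CPU fast path, csd locking and error handling are left out. The point is that CSD_FLAG_LOCK is only cleared by the target CPU's IPI handler, so a lost IPI leaves the sender spinning in csd_lock_wait() indefinitely.

static int generic_exec_single(int cpu, struct call_single_data *csd,
			       smp_call_func_t func, void *info)
{
	csd->func = func;
	csd->info = info;

	/*
	 * Queue the csd on the target CPU's call_single_queue; if the list
	 * was empty, kick the target with an IPI so it runs the queue.
	 */
	if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
		arch_send_call_function_single_ipi(cpu);

	return 0;
}

static void csd_lock_wait(struct call_single_data *csd)
{
	/* Cleared by csd_unlock() in the target CPU's IPI handler. */
	while (csd->flags & CSD_FLAG_LOCK)
		cpu_relax();
}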
* Ingo Molnar wrote:
> +static void csd_lock_wait(struct call_single_data *csd, int cpu)
>  {
> -	while (csd->flags & CSD_FLAG_LOCK)
> +	int bug_id = 0;
> +	u64 ts0, ts1, ts_delta;
> +
> +	ts0 = jiffies_to_msecs(jiffies);
Note that while 'jiffies' is a global variable, it's read
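(For comparison, the conventional kernel idiom for a jiffies-based timeout is to compute a deadline once and test it with time_after(), which also copes with jiffies wrapping. This is purely illustrative, not part of the quoted patch; it assumes the csd_lock_wait(csd, cpu) context above and that CSD_LOCK_TIMEOUT is expressed in milliseconds, as the jiffies_to_msecs() conversion suggests.)

	/* Deadline idiom: wrap-safe comparison via time_after(). */
	unsigned long deadline = jiffies + msecs_to_jiffies(CSD_LOCK_TIMEOUT);

	while (csd->flags & CSD_FLAG_LOCK) {
		if (time_after(jiffies, deadline)) {
			pr_err("csd: still waiting for CSD lock on CPU#%d\n", cpu);
			deadline = jiffies + msecs_to_jiffies(CSD_LOCK_TIMEOUT);
		}
		cpu_relax();
	}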