[167986.379006] WARNING: at net/core/skbuff.c:3444 skb_try_coalesce+0x359/0x390()

2012-11-28 Thread Sander Eikelenboom
On one of my virtual machines I got this splat (running 3.7.0-rc7):


[167986.378985] ------------[ cut here ]------------
[167986.379006] WARNING: at net/core/skbuff.c:3444 skb_try_coalesce+0x359/0x390()
[167986.379012] Modules linked in:
[167986.379021] Pid: 3231, comm: apache2 Not tainted 3.7.0-rc7-20121126-persistent #1
[167986.379028] Call Trace:
[167986.379032][] warn_slowpath_common+0x7a/0xb0
[167986.379047]  [] ? local_bh_enable+0xb5/0x160
[167986.379053]  [] warn_slowpath_null+0x15/0x20
[167986.379059]  [] skb_try_coalesce+0x359/0x390
[167986.379067]  [] tcp_try_coalesce+0x69/0xc0
[167986.379073]  [] tcp_queue_rcv+0x54/0x100
[167986.379079]  [] ? tcp_rcv_state_process+0x84f/0xc70
[167986.379086]  [] tcp_rcv_established+0x2bb/0x6a0
[167986.379093]  [] ? tcp_v4_rcv+0x6cf/0xb10
[167986.379098]  [] tcp_v4_do_rcv+0x135/0x480
[167986.379106]  [] ? _raw_spin_lock_nested+0x42/0x50
[167986.379112]  [] ? tcp_v4_rcv+0x6cf/0xb10
[167986.379118]  [] tcp_v4_rcv+0x95d/0xb10
[167986.379125]  [] ? lock_acquire+0xd8/0x100
[167986.379133]  [] ? ip_local_deliver_finish+0x45/0x230
[167986.379140]  [] ip_local_deliver_finish+0x11a/0x230
[167986.379149]  [] ? ip_local_deliver_finish+0x45/0x230
[167986.379156]  [] ip_local_deliver+0x38/0x80
[167986.379162]  [] ip_rcv_finish+0x15a/0x630
[167986.379169]  [] ip_rcv+0x218/0x300
[167986.379176]  [] __netif_receive_skb+0x65d/0x8d0
[167986.379182]  [] ? __netif_receive_skb+0x145/0x8d0
[167986.379189]  [] ? trace_hardirqs_on+0xd/0x10
[167986.379197]  [] ? free_hot_cold_page+0x1b3/0x1e0
[167986.379204]  [] netif_receive_skb+0x28/0xf0
[167986.379210]  [] ? __pskb_pull_tail+0x253/0x340
[167986.400668]  [] xennet_poll+0xad5/0xe10
[167986.400679]  [] net_rx_action+0x136/0x260
[167986.400692]  [] ? __do_softirq+0x71/0x1a0
[167986.400698]  [] __do_softirq+0xc9/0x1a0
[167986.400705]  [] call_softirq+0x1c/0x30
[167986.400709][] do_softirq+0x85/0xf0
[167986.400721]  [] ? ip_finish_output+0x246/0x530
[167986.400727]  [] local_bh_enable+0x153/0x160
[167986.400733]  [] ip_finish_output+0x246/0x530
[167986.400739]  [] ? ip_finish_output+0xcd/0x530
[167986.400749]  [] ip_output+0x59/0xe0
[167986.400755]  [] ip_local_out+0x28/0x90
[167986.400760]  [] ip_queue_xmit+0x17f/0x4a0
[167986.400766]  [] ? ip_send_unicast_reply+0x340/0x340
[167986.400773]  [] ? getnstimeofday+0x47/0xe0
[167986.400779]  [] ? __skb_clone+0x29/0x120
[167986.400786]  [] tcp_transmit_skb+0x400/0x8d0
[167986.400793]  [] tcp_write_xmit+0x22c/0xa70
[167986.400799]  [] __tcp_push_pending_frames+0x2d/0x90
[167986.400809]  [] tcp_sendmsg+0x17d/0xe10
[167986.400816]  [] inet_sendmsg+0xa9/0x100
[167986.400822]  [] ? inet_autobind+0x70/0x70
[167986.400829]  [] ? sock_destroy_inode+0x40/0x40
[167986.400835]  [] sock_aio_write+0x12d/0x140
[167986.400842]  [] do_sync_readv_writev+0x9b/0xe0
[167986.400849]  [] do_readv_writev+0xcf/0x1d0
[167986.400855]  [] vfs_writev+0x3e/0x60
[167986.400861]  [] sys_writev+0x5a/0xc0
[167986.400871]  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[167986.400878]  [] system_call_fastpath+0x16/0x1b
[167986.400884] ---[ end trace 2f73b807ee74fc5a ]---

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


3.9-rc0: Boot fails: RIP: 0010:[] [] stop_machine_cpu_stop+0x8a/0x160

2013-02-26 Thread Sander Eikelenboom
Hi All,

Current tip (last commit c41b3810c09e60664433548c5218cc6ece6a8903, "Merge tag 'pm+acpi-fixes-3.9-rc1'") fails to boot with:

[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.8.0-rc0-20130226 (root@serveerstertje) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP PREEMPT Tue Feb 26 10:31:29 CET 2013
[0.00] Command line: BOOT_IMAGE=/vmlinuz-3.8.0-rc0-20130226 root=/dev/mapper/serveerstertje-root ro nomodeset verbose mem=1024M vga=794 video=vesafb acpi_enforce_resources=lax r8169.use_dac=1 max_loop=50 loop_max_part=10 debug loglevel=10 kmemleak=on console=vga console=ttyS0,38400 earlyprint=serial,ttyS0,38
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009efff] usable
[0.00] BIOS-e820: [mem 0x0009f000-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e4000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xaff8] usable
[0.00] BIOS-e820: [mem 0xaff9-0xaff9dfff] ACPI data
[0.00] BIOS-e820: [mem 0xaff9e000-0xaffd] ACPI NVS
[0.00] BIOS-e820: [mem 0xaffe-0xafff] reserved
[0.00] BIOS-e820: [mem 0xffe0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00024fff] usable
[0.00] e820: remove [mem 0x4000-0xfffe] usable
[0.00] NX (Execute Disable) protection: active
[0.00] e820: user-defined physical RAM map:
[0.00] user: [mem 0x-0x0009efff] usable
[0.00] user: [mem 0x0009f000-0x0009] reserved
[0.00] user: [mem 0x000e4000-0x000f] reserved
[0.00] user: [mem 0x0010-0x3fff] usable
[0.00] user: [mem 0xaff9-0xaff9dfff] ACPI data
[0.00] user: [mem 0xaff9e000-0xaffd] ACPI NVS
[0.00] user: [mem 0xaffe-0xafff] reserved
[0.00] user: [mem 0xffe0-0x] reserved
[0.00] SMBIOS 2.5 present.
[0.00] DMI: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] No AGP bridge found
[0.00] e820: last_pfn = 0x4 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-E uncachable
[0.00]   F-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base  mask 8000 write-back
[0.00]   1 base 8000 mask E000 write-back
[0.00]   2 base A000 mask F000 write-back
[0.00]   3 disabled
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] TOM2: 00025000 aka 9472M
[0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[0.00] e820: update [mem 0xb000-0x] usable ==> reserved
[0.00] found SMP MP-table at [mem 0x000ff780-0x000ff78f] mapped at [880ff780]
[0.00] Scanning 1 areas for low memory corruption
[0.00] ACPI: RSDP 000fb100 00014 (v00 ACPIAM)
[0.00] ACPI: RSDT aff9 00048 (v01 MSIOEMSLIC  20100913 MSFT 0097)
[0.00] ACPI: FACP aff90200 00084 (v01 7640MS A7640100 20100913 MSFT 0097)
[0.00] ACPI: DSDT aff905e0 09427 (v01  A7640 A7640100 0100 INTL 20051117)
[0.00] ACPI: FACS aff9e000 00040
[0.00] ACPI: APIC aff90390 00088 (v01 7640MS A7640100 20100913 MSFT 0097)
[0.00] ACPI: MCFG aff90420 0003C (v01 7640MS OEMMCFG  20100913 MSFT 0097)
[0.00] ACPI: SLIC aff90460 00176 (v01 MSIOEMSLIC  20100913 MSFT 0097)
[0.00] ACPI: OEMB aff9e040 00072 (v01 7640MS A7640100 20100913 MSFT 0097)
[0.00] ACPI: SRAT aff9a5e0 00108 (v03 AMDFAM_F_10 0002 AMD  0001)
[0.00] ACPI: HPET aff9a6f0 00038 (v01 7640MS OEMHPET  20100913 MSFT 0097)
[0.00] ACPI: IVRS aff9a730 00100 (v01  AMD RD890S 00202031 AMD  )
[0.00] ACPI: SSDT aff9a830 00DA4 (v01 A M I  POWERNOW 0001 AMD  0001)
[0.00] ACPI: Local APIC address 0xfee0
[0.00] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[0.00] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[0.00] SRAT: PXM 0 -> APIC 0x02 -> Node 0
[0.00] SRAT: PXM 0 -> APIC 0x03 -> Node 0
[0.00] SRAT: PXM 0 -> APIC 0x04 -> Node 

Re: [Xen-devel] Regression introduced with 14e568e78f6f80ca1e27256641ddf524c7dbdc51 (stop_machine: Use smpboot threads)

2013-02-26 Thread Sander Eikelenboom

Tuesday, February 26, 2013, 1:36:36 PM, you wrote:

> On Fri, 22 Feb 2013, Konrad Rzeszutek Wilk wrote:
>> 
>> I don't know if this is b/c the Xen code is missing something or
>> expects something that never happened. I hadn't looked at your
>> patch in any detail (was going to do that on Monday).
>> 
>> Either way, if I boot a HVM guest with PV extensions (aka PVHVM)

Hmm, I'm seeing this when booting on bare metal as well.
(see http://lkml.indiana.edu/hypermail/linux/kernel/1302.3/00836.html)

>> this is what I get:
>> [0.133081] cpu 1 spinlock event irq 71
>> [0.134049] smpboot: Booting Node   0, Processors  #1[0.008000] installing Xen timer for CPU 1
>> [0.205154] Brought up 2 CPUs
>> [0.205156] smpboot: Total of 2 processors activated (16021.74 BogoMIPS)
>> 
>> [   28.134000] BUG: soft lockup - CPU#0 stuck for 23s! [migration/0:8]
>> [   28.134000] Modules linked in:
>> [   28.134000] CPU 0 
>> [   28.134000] Pid: 8, comm: migration/0 Tainted: GW 3.8.0upstream-06472-g6661875-dirty #1 Xen HVM domU
>> [   28.134000] RIP: 0010:[]  [] stop_machine_cpu_stop+0x7b/0xf0

> So the migration thread loops in stop_machine_cpu_stop(). Now the
> interesting question is what work was scheduled for that cpu.

> The main difference between the old code and the new one, is that the
> thread is created earlier and not destroyed on cpu offline.

> Could you add some instrumentation, so we can see what kind of cpu
> stop work is scheduled and from where?

> Thanks,

> tglx





Re: [Xen-devel] Regression introduced with 14e568e78f6f80ca1e27256641ddf524c7dbdc51 (stop_machine: Use smpboot threads)

2013-02-26 Thread Sander Eikelenboom

Tuesday, February 26, 2013, 6:44:33 PM, you wrote:

> On Tue, 26 Feb 2013, Sander Eikelenboom wrote:
>> Tuesday, February 26, 2013, 1:36:36 PM, you wrote:
>> > On Fri, 22 Feb 2013, Konrad Rzeszutek Wilk wrote:
>> >> 
>> >> I don't know if this is b/c the Xen code is missing something or
>> >> expects something that never happened. I hadn't looked at your
>> >> patch in any detail (was going to do that on Monday).
>> >> 
>> >> Either way, if I boot a HVM guest with PV extensions (aka PVHVM)
>> 
>> Hmm, I'm seeing this when booting on bare metal as well.
>> (see http://lkml.indiana.edu/hypermail/linux/kernel/1302.3/00836.html)

> Ok. I decoded it with the help of Konrad. Does the patch below work
> for you as well?

Did a few reboots and yes, it seems to work here as well!
Thx

--
Sander

> Thanks,

> tglx

> Index: linux-2.6/include/linux/smpboot.h
> ===================================================================
> --- linux-2.6.orig/include/linux/smpboot.h
> +++ linux-2.6/include/linux/smpboot.h
> @@ -24,6 +24,9 @@ struct smpboot_thread_data;
>   *			parked (cpu offline)
>   * @unpark:		Optional unpark function, called when the thread is
>   *			unparked (cpu online)
> + * @pre_unpark:		Optional unpark function, called before the thread is
> + *			unparked (cpu online). This is not guaranteed to be
> + *			called on the target cpu of the thread. Careful!
>   * @selfparking:	Thread is not parked by the park function.
>   * @thread_comm:	The base name of the thread
>   */
> @@ -37,6 +40,7 @@ struct smp_hotplug_thread {
>  	void				(*cleanup)(unsigned int cpu, bool online);
>  	void				(*park)(unsigned int cpu);
>  	void				(*unpark)(unsigned int cpu);
> +	void				(*pre_unpark)(unsigned int cpu);
>  	bool				selfparking;
>  	const char			*thread_comm;
>  };
> Index: linux-2.6/kernel/smpboot.c
> ===================================================================
> --- linux-2.6.orig/kernel/smpboot.c
> +++ linux-2.6/kernel/smpboot.c
> @@ -209,6 +209,8 @@ static void smpboot_unpark_thread(struct
>  {
>  	struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
>  
> +	if (ht->pre_unpark)
> +		ht->pre_unpark(cpu);
>  	kthread_unpark(tsk);
>  }
>  
> Index: linux-2.6/kernel/stop_machine.c
> ===================================================================
> --- linux-2.6.orig/kernel/stop_machine.c
> +++ linux-2.6/kernel/stop_machine.c
> @@ -336,7 +336,7 @@ static struct smp_hotplug_thread cpu_sto
>  	.create			= cpu_stop_create,
>  	.setup			= cpu_stop_unpark,
>  	.park			= cpu_stop_park,
> -	.unpark			= cpu_stop_unpark,
> +	.pre_unpark		= cpu_stop_unpark,
>  	.selfparking		= true,
>  };
>  
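The ordering change in the patch above (moving cpu_stop_unpark from .unpark to .pre_unpark) can be illustrated with a small userspace model. This is a hedged sketch, not kernel code: struct hotplug_thread_model, model_unpark_thread(), model_kthread_unpark() and the other model_* names are hypothetical stand-ins for struct smp_hotplug_thread, smpboot_unpark_thread(), kthread_unpark() and cpu_stop_unpark(). It only models one thing: whether there is a window in which the per-cpu thread is already unparked while its stopper is still marked disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NR_CPUS 4

/* Hypothetical, cut-down stand-in for struct smp_hotplug_thread. */
struct hotplug_thread_model {
	void (*unpark)(unsigned int cpu);     /* runs after the thread is unparked */
	void (*pre_unpark)(unsigned int cpu); /* runs before the thread is unparked */
};

/* Stand-in for the per-cpu stopper "enabled" state. */
static bool stopper_enabled[MODEL_NR_CPUS];
/* Set if the thread became runnable while its stopper was still disabled. */
static bool woke_while_disabled;

/* Stand-in for cpu_stop_unpark(): mark this cpu's stopper as usable. */
static void model_cpu_stop_unpark(unsigned int cpu)
{
	stopper_enabled[cpu] = true;
}

/* Stand-in for kthread_unpark(): from here on the thread may run. */
static void model_kthread_unpark(unsigned int cpu)
{
	if (!stopper_enabled[cpu])
		woke_while_disabled = true;
}

/*
 * Mirrors the ordering of the patched smpboot_unpark_thread(): the optional
 * pre_unpark hook first, then the unpark itself. In the kernel the .unpark
 * hook is called by the per-cpu thread itself once it runs again; calling
 * it right after the unpark step here is a simplification of that.
 */
static void model_unpark_thread(const struct hotplug_thread_model *ht,
				unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	if (ht->pre_unpark)
		ht->pre_unpark(cpu);
	model_kthread_unpark(cpu);
	if (ht->unpark)
		ht->unpark(cpu);
}

/* Returns true if the modelled thread could run before its stopper was enabled. */
static bool model_scenario(bool use_pre_unpark, unsigned int cpu)
{
	struct hotplug_thread_model ht = { 0 };

	memset(stopper_enabled, 0, sizeof(stopper_enabled));
	woke_while_disabled = false;

	if (use_pre_unpark)
		ht.pre_unpark = model_cpu_stop_unpark; /* patched: .pre_unpark = cpu_stop_unpark */
	else
		ht.unpark = model_cpu_stop_unpark;     /* before: .unpark = cpu_stop_unpark */

	model_unpark_thread(&ht, cpu);
	return woke_while_disabled;
}
```

In this model, model_scenario(false, 1) reports the window (the stopper is enabled only after the thread can run), while model_scenario(true, 1) does not. It says nothing about which work was queued in the real lockup; it only shows why running the hook before kthread_unpark() closes the ordering gap.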




libata-acpi.c ata_acpi_register_power_resource copy and paste mistake?

2013-02-28 Thread Sander Eikelenboom
Hi Jeff,

During the last merge of the ahci code (d9978ec5680059d727b39d6c706777c6973587f2),
I noticed this change:


--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -1024,30 +1024,20 @@ static void ata_acpi_register_power_resource(struct ata_device *dev)
 {
 	struct scsi_device *sdev = dev->sdev;
 	acpi_handle handle;
-	struct device *device;
 
 	handle = ata_dev_acpi_handle(dev);
-	if (!handle)
-		return;
-
-	device = &sdev->sdev_gendev;
-
-	acpi_power_resource_register_device(device, handle);
+	if (handle)
+		acpi_dev_pm_remove_dependent(handle, &sdev->sdev_gendev);
 }

Shouldn't

acpi_dev_pm_remove_dependent(handle, &sdev->sdev_gendev);

be

acpi_dev_pm_add_dependent(handle, &sdev->sdev_gendev);

in the ata_acpi_register_power_resource() function?

(It looks like a copy and paste mistake from the unregister function.)
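To make the suspected mistake concrete, here is a minimal userspace sketch of the intended pairing. The mock_* functions and struct mock_device are hypothetical stand-ins for acpi_dev_pm_add_dependent(), acpi_dev_pm_remove_dependent() and the SCSI device; they are not the ACPI API. The point is only that the register path should add the dependent and the unregister path should remove it, so one register/unregister cycle leaves the state balanced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an ACPI handle and a struct device. */
typedef void *mock_handle;
struct mock_device {
	int dependent_refs; /* how many times this device was added as a dependent */
};

/* Stand-in for acpi_dev_pm_add_dependent(): record the dependency. */
static void mock_add_dependent(mock_handle handle, struct mock_device *dev)
{
	(void)handle;
	dev->dependent_refs++;
}

/* Stand-in for acpi_dev_pm_remove_dependent(): drop the dependency. */
static void mock_remove_dependent(mock_handle handle, struct mock_device *dev)
{
	(void)handle;
	/* In this mock, removing a dependent that was never added is a bug. */
	assert(dev->dependent_refs > 0);
	dev->dependent_refs--;
}

/* Register path: should ADD the dependent (the merged code calls remove). */
static void mock_register_power_resource(mock_handle handle,
					 struct mock_device *dev)
{
	if (handle)
		mock_add_dependent(handle, dev);
}

/* Unregister path: the mirror image of the register path. */
static void mock_unregister_power_resource(mock_handle handle,
					   struct mock_device *dev)
{
	if (handle)
		mock_remove_dependent(handle, dev);
}
```

With the merged code's behaviour (calling the remove variant from the register path), the assertion in mock_remove_dependent() would fire on the very first registration, which is the asymmetry being pointed out.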

--
Sander





Re: [Xen-devel] [GIT PULL] (xen) stable/for-linus-3.7-rc2-tag for v3.7-rc2

2012-10-23 Thread Sander Eikelenboom
Hi Konrad,

Did you push the tag?
It doesn't seem to be in your git repo.

--
Sander

Tuesday, October 23, 2012, 6:27:07 PM, you wrote:

> Hey Linus,

> Please git pull the following tag:

>  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
> stable/for-linus-3.7-rc2-tag

> which has bug-fixes. Most of them are just code cleanup to make x86 and ARM
> code work in unison. There is one serious bug-fix which manifested itself in
> some applications mysteriously getting SIGKILL. Besides that nothing
> earth-shattering.

> :
>  * Fix mysterious SIGSEGV or SIGKILL in applications due to corrupting
>    of the %eip when returning from a signal handler.
>  * Fix various ARM compile issues after the merge fallout.
>  * Continue on making more of the Xen generic code usable by ARM platform.
>  * Fix SR-IOV passthrough to mirror multifunction PCI devices.
>  * Fix various compile warnings.
>  * Remove hypercalls that don't exist anymore.
> 

>  arch/arm/include/asm/xen/interface.h |   12 +---
>  arch/arm/include/asm/xen/page.h      |   13 ++---
>  arch/arm/xen/grant-table.c           |    2 +-
>  arch/x86/include/asm/xen/interface.h |    4 ++--
>  arch/x86/kernel/entry_32.S           |    8 +---
>  arch/x86/kernel/entry_64.S           |    2 +-
>  arch/x86/xen/enlighten.c             |    2 --
>  drivers/xen/balloon.c                |    3 +--
>  drivers/xen/dbgp.c                   |    2 ++
>  drivers/xen/events.c                 |    4
>  drivers/xen/grant-table.c            |    8
>  drivers/xen/sys-hypervisor.c         |    4 +++-
>  drivers/xen/xen-pciback/vpci.c       |   14 ++
>  drivers/xen/xenbus/xenbus_xs.c       |    2 ++
>  include/xen/grant_table.h            |    2 +-
>  include/xen/interface/grant_table.h  |    2 +-
>  include/xen/interface/memory.h       |   24 ++--
>  17 files changed, 58 insertions(+), 50 deletions(-)

> David Vrabel (1):
>   xen/x86: don't corrupt %eip when returning from a signal handler

> Ian Campbell (11):
>   xen: xenbus: quirk uses x86 specific cpuid
>   xen: sysfs: include err.h for PTR_ERR etc
>   xen: sysfs: fix build warning.
>   xen: XENMEM_translate_gpfn_list was remove ages ago and is unused.
>   xen: events: pirq_check_eoi_map is X86 specific
>   xen: grant: use xen_pfn_t type for frame_list.
>   xen: balloon: don't include e820.h
>   xen: arm: make p2m operations NOPs
>   xen: balloon: use correct type for frame_list
>   xen: arm: comment on why 64-bit xen_pfn_t is safe even on 32 bit
>   xen: dbgp: Fix warning when CONFIG_PCI is not enabled.

> Konrad Rzeszutek Wilk (1):
>   xen/xenbus: Fix compile warning.

> Laszlo Ersek (1):
>   xen PV passthru: assign SR-IOV virtual functions to separate virtual slots

> Wei Yongjun (1):
>   xen/x86: remove duplicated include from enlighten.c


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [GIT PULL] (xen) stable/for-linus-3.7-rc2-tag for v3.7-rc2

2012-10-23 Thread Sander Eikelenboom

Tuesday, October 23, 2012, 7:37:53 PM, you wrote:

> Hi Konrad,

> Did you push the tag ?
> It doesn't seem to be in your git repo.

Ah nevermind, it's there now. Sorry for the noise.




Re: [Xen-devel] [PATCH] xen: remove unused Kconfig parameter

2013-07-09 Thread Sander Eikelenboom

Tuesday, July 9, 2013, 5:05:54 PM, you wrote:

> On Tue, Jul 09, 2013 at 10:48:40AM -0400, Konrad Rzeszutek Wilk wrote:
>> Then that should be discussed on grub2 to remove said check and modify
>> the code so that it can properly work without regression.

> Actually, the kernel patch removing that symbol should be applied so
> that grub2 breaks faster. One can't possibly rely on kernel internals
> for anything, as it is insanely insane (yep, the tautology is on purpose
> :-)).


How insanely insane is it to be able to determine whether a certain compiled
kernel binary supports a certain function?

Grub does this in its update script to prevent adding a Xen + kernel
combination that has no chance of booting when dom0 support has not been
configured in the kernel.
That doesn't seem to be an unreasonable thought.

Grepping the accompanying config file in /boot for the Xen dom0 Kconfig
parameter seems to be the best effort Grub can make at the moment, especially
since Kconfig parameter names don't change that often.

If you know a better way for Grub to determine whether a certain function is
supported by a kernel binary, then please elaborate.
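As a rough illustration of the config-file check described above, the sketch below scans the text of a kernel config file for a line that is exactly `<option>=y`, similar in spirit to `grep -qx`. The function name config_option_enabled() is hypothetical, and CONFIG_XEN_DOM0 in the usage note is only an example symbol; Grub's real update script is a shell script and may test a different option.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if config_text (the contents of a /boot/config-* file)
 * contains a line that is exactly "<option>=y". Matching whole lines avoids
 * false hits on "# <option> is not set" comments, on "=m" settings, and on
 * longer option names that merely share a prefix.
 */
static bool config_option_enabled(const char *config_text, const char *option)
{
	size_t opt_len = strlen(option);
	const char *line = config_text;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

		if (line_len == opt_len + 2 &&
		    strncmp(line, option, opt_len) == 0 &&
		    line[opt_len] == '=' && line[opt_len + 1] == 'y')
			return true;

		line = eol ? eol + 1 : NULL; /* advance to the next line, if any */
	}
	return false;
}
```

For example, config_option_enabled(text, "CONFIG_XEN_DOM0") is false for a config containing only "# CONFIG_XEN_DOM0 is not set". As the thread notes, this relies on the config file being installed next to the kernel and on the symbol name staying stable.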

--
Sander



Re: [Xen-devel] [PATCH] xen: remove unused Kconfig parameter

2013-07-10 Thread Sander Eikelenboom

Wednesday, July 10, 2013, 8:19:34 AM, you wrote:

> On Wed, Jul 10, 2013 at 12:34:58AM +0200, Sander Eikelenboom wrote:
>> 
>> Tuesday, July 9, 2013, 5:05:54 PM, you wrote:
>> 
>> > On Tue, Jul 09, 2013 at 10:48:40AM -0400, Konrad Rzeszutek Wilk wrote:
>> >> Then that should be discussed on grub2 to remove said check and modify
>> >> the code so that it can properly work without regression.
>> 
>> > Actually, the kernel patch removing that symbol should be applied so
>> > that grub2 breaks faster. One can't possibly rely on kernel internals
>> > for anything, as it is insanely insane (yep, the tautology is on purpose
>> > :-)).
>>
>> How insanely insane is it to be able to determine whether a certain
>> compiled kernel binary supports a certain function?
>> 
>> Grub does this in its update script to prevent adding a xen +
>> kernel combination that has no chance of booting when dom0 support
>> has not been configured in the kernel.  That doesn't seem to be an
>> unreasonable thought.
>> 
>> Grepping the accompanied config file in /boot for the xen dom0
>> Kconfig parameter seems the best possible effort grub can do at the
>> moment.

> I think this can be improved, even with the situation today.

>> Especially since the Kconfig parameter naming doesn't change that
>> often.
>> 
>> If you know a better way for grub to determine if a certain function
>> for a kernel binary is supported, then please elaborate.

> Certainly. Parse the ELF notes that are present in a dom0-capable
> Linux kernel binary itself.

> $ readelf -n vmlinux

> Notes at offset 0x0069be88 with length 0x017c:
>   Owner Data size   Description
>   Xen   0x0006  Unknown note type: (0x0006)
>   Xen   0x0004  Unknown note type: (0x0007)
>   Xen   0x0008  Unknown note type: (0x0005)
>   Xen   0x0008  Unknown note type: (0x0003)
>   Xen   0x0008  NT_VERSION (version)
>   Xen   0x0008  NT_ARCH (architecture)
>   Xen   0x002a  Unknown note type: (0x000a)
>   Xen   0x0004  Unknown note type: (0x0009)
>   Xen   0x0008  Unknown note type: (0x0008)
>   Xen   0x0010  Unknown note type: (0x000d)
>   Xen   0x0004  Unknown note type: (0x000e)
>   Xen   0x0008  Unknown note type: (0x000c)
>   Xen   0x0008  Unknown note type: (0x0004)
>   GNU   0x0014  NT_GNU_BUILD_ID (unique build ID bitstring)

> See arch/x86/xen/xen-head.S.

> There's a new note type (XEN_ELFNOTE_SUPPORTED_FEATURES) that we can
> use to make dom0 support explicit.

> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/domain_build.c;hb=HEAD#l415
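The ELF-note approach suggested above can be sketched in C. This is a minimal sketch under stated assumptions: it walks a buffer that already holds raw note data (locating the PT_NOTE segment or notes section in vmlinux is left out), uses the Elf64_Nhdr layout from <elf.h> with 4-byte padding of the name and descriptor fields, and returns the descriptor of the first note whose owner is "Xen" and whose type matches. The helper names find_xen_note() and note_align4() are hypothetical; the numeric note types themselves are defined in Xen's public elfnote.h and are not hard-coded here.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round up to the 4-byte alignment used for ELF note name and desc fields. */
static size_t note_align4(size_t n)
{
	return (n + 3u) & ~(size_t)3u;
}

/*
 * Walk a buffer of raw ELF notes and return a pointer to the descriptor of
 * the first note owned by "Xen" with the given type, or NULL if none.
 * On success, *desc_len receives the descriptor size in bytes.
 */
static const unsigned char *find_xen_note(const unsigned char *buf, size_t len,
					  uint32_t type, size_t *desc_len)
{
	size_t off = 0;

	while (len - off >= sizeof(Elf64_Nhdr)) {
		Elf64_Nhdr nh;
		size_t name_off, desc_off, next;

		memcpy(&nh, buf + off, sizeof(nh)); /* buf may be unaligned */
		name_off = off + sizeof(nh);
		desc_off = name_off + note_align4(nh.n_namesz);
		next = desc_off + note_align4(nh.n_descsz);
		if (next > len)
			break; /* truncated or malformed note: stop scanning */

		if (nh.n_type == type && nh.n_namesz == sizeof("Xen") &&
		    memcmp(buf + name_off, "Xen", sizeof("Xen")) == 0) {
			*desc_len = nh.n_descsz;
			return buf + desc_off;
		}
		off = next;
	}
	return NULL;
}
```

A dom0-capability check could then look for the supported-features note type and inspect its descriptor, rather than relying on a Kconfig symbol; checking the owner name as well as the type matters because other owners (such as the GNU build-ID note shown by readelf) reuse small type numbers.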

That seems like a better option, although dropping the check completely could
be an option too.
Since dom0 support is in mainline, distributions (at least Debian; I haven't
checked the other major ones yet) no longer ship a separate Xen-enabled kernel,
so any distro-supplied kernel has Xen support. For the self-built case,
Borislav is probably right that you have to watch out yourself.

So it would be nice to have at least some time to address this with upstream
grub and the main distributions, so they can patch their grub.

--
Sander


> --msw




Re: [media] cx25821 regression from 3.9: BUG: bad unlock balance detected!

2013-07-12 Thread Sander Eikelenboom

Friday, May 17, 2013, 11:52:17 AM, you wrote:

> On Fri May 17 2013 11:04:50 Sander Eikelenboom wrote:
>> 
>> Friday, May 17, 2013, 10:25:24 AM, you wrote:
>> 
>> > On Thu May 16 2013 19:41:42 Sander Eikelenboom wrote:
>> >> Hi Hans / Mauro,
>> >> 
>> >> With 3.10.0-rc1 (including the cx25821 changes from Hans), I get the bug below which wasn't present with 3.9.
>> 
>> > How do I reproduce this? I've tried to, but I can't make this happen.
>> 
>> > Looking at the code I can't see how it could hit this bug anyway.
>> 
>> I'm using "motion" to grab and process 6 of the video streams of the card I have (a card with 8 inputs).
>> It seems the cx25821 underwent quite some changes between 3.9 and 3.10.

> It did.

>> And in the past there have been some more locking issues around mmap and media devices, although they seem to appear as circular locking dependencies and with different devices.
>>    - http://www.mail-archive.com/linux-media@vger.kernel.org/msg46217.html
>>    - Under kvm: http://www.spinics.net/lists/linux-media/msg63322.html

> Neither of those are related to this issue.

>> 
>> - Perhaps running in a VM has something to do with it?
>>    - The driver on 3.9 occasionally gives this, probably latency related (but continues to work):
>>      cx25821: cx25821_video_wakeup: 2 buffers handled (should be 1)
>> 
>>  Could something be unlocking twice in that path?
>> 
>> - Is there any extra debugging I could enable that could pinpoint the issue?

> Try this patch:

> diff --git a/drivers/media/pci/cx25821/cx25821-core.c b/drivers/media/pci/cx25821/cx25821-core.c
> index b762c5b..8f8d0e0 100644
> --- a/drivers/media/pci/cx25821/cx25821-core.c
> +++ b/drivers/media/pci/cx25821/cx25821-core.c
> @@ -1208,7 +1208,6 @@ void cx25821_free_buffer(struct videobuf_queue *q, struct cx25821_buffer *buf)
>  	struct videobuf_dmabuf *dma = videobuf_to_dma(&buf->vb);
>  
>  	BUG_ON(in_interrupt());
> -	videobuf_waiton(q, &buf->vb, 0, 0);
>  	videobuf_dma_unmap(q->dev, dma);
>  	videobuf_dma_free(dma);
>  	btcx_riscmem_free(to_pci_dev(q->dev), &buf->risc);

> I don't think the waiton is really needed for this driver.

> What really should happen is that videobuf is replaced by videobuf2 in this
> driver, but that's a fair amount of work.

Hi Hans,

After being busy for quite some time, I have some spare time now.

Since I'm still having trouble with this driver, is there a patch series for a
similar driver that was converted to videobuf2?
I don't know if it is entirely in my league, but I could give it a try if I
had an example.

--
Sander


> Regards,

> Hans

>> 
>> 
>> --
>> 
>> Sander
>> 
>> 
>> 
>> > Regards,
>> 
>> > Hans
>> 
>> >> 
>> >> --
>> >> Sander
>> >> 
>> >> 
>> >> [   53.004968] =
>> >> [   53.004968] [ BUG: bad unlock balance detected! ]
>> >> [   53.004968] 3.10.0-rc1-20130516-jens+ #1 Not tainted
>> >> [   53.004968] -
>> >> [   53.004968] motion/3328 is trying to release lock (&dev->lock) at:
>> >> [   53.004968] [] mutex_unlock+0x9/0x10
>> >> [   53.004968] but there are no more locks to release!
>> >> [   53.004968]
>> >> [   53.004968] other info that might help us debug this:
>> >> [   53.004968] 1 lock held by motion/3328:
>> >> [   53.004968]  #0:  (&mm->mmap_sem){++}, at: [] vm_munmap+0x3e/0x70
>> >> [   53.004968]
>> >> [   53.004968] stack backtrace:
>> >> [   53.004968] CPU: 1 PID: 3328 Comm: motion Not tainted 3.10.0-rc1-20130516-jens+ #1
>> >> [   53.004968] Hardware name: Xen HVM domU, BIOS 4.3-unstable 05/16/2013
>> >> [   53.004968]  819be5f9 88002ac35c58 819b9029 88002ac35c88
>> >> [   53.004968]  810e615e 88002ac35cb8 88002b7c18a8 819be5f9
>> >> [   53.004968]   88002ac35d28 810eb17e 810e7ba5
>> >> [   53.004968] Call Trace:
>> >> [   53.004968]  [] ? mutex_unlock+0x9/0x10
>> >> [   53.004968]  [] dump_stack+0x19/0x1b
>> >> [   53.004968]  [] print_u

[fuse][xen][3.10-rc5] kernel oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [] __list_add+0x17/0xd0

2013-06-16 Thread Sander Eikelenboom
Hi All,

Tonight one of my PV guest kernels on Xen oopsed, by the looks of it on some
FUSE activity (by glusterfs).

--
Sander

Oops:

[107481.132631] BUG: unable to handle kernel NULL pointer dereference at 0008
[107481.132650] IP: [] __list_add+0x17/0xd0
[107481.132660] PGD 0
[107481.132664] Oops:  [#1] PREEMPT SMP
[107481.132670] Modules linked in:
[107481.132676] CPU: 0 PID: 2851 Comm: glusterfs Not tainted 3.10.0-rc5-20130613-jens-konrad #1
[107481.132684] task: 88000ed3 ti: 88000ef34000 task.ti: 88000ef34000
[107481.132691] RIP: e030:[]  [] __list_add+0x17/0xd0
[107481.132699] RSP: e02b:88000ef35988  EFLAGS: 00010086
[107481.132703] RAX: 88000ec13ff8 RBX: 88000d4ca150 RCX:
[107481.132709] RDX:  RSI: 88000ec13ff8 RDI: 88000d4ca150
[107481.132716] RBP: 88000ef359a8 R08:  R09: 88000ed30700
[107481.132722] R10:  R11: 00038d60 R12: 0001
[107481.132727] R13: 88000d4ca128 R14: 88000b7ae600 R15:
[107481.132735] FS:  7f3fcb666700() GS:88000fc0() knlGS:
[107481.132742] CS:  e033 DS:  ES:  CR0: 8005003b
[107481.132746] CR2: 0008 CR3: 0db0e000 CR4: 0660
[107481.132752] DR0:  DR1:  DR2:
[107481.132758] DR3:  DR6: 0ff0 DR7: 0400
[107481.132763] Stack:
[107481.132766]  88000fc14140 88000b7ae600 0001 88000d4ca128
[107481.132776]  88000ef359c8 810cc180 88000d4ca128 88000fc13700
[107481.132784]  88000ef35a28 810cf241 88000ef35a08 810cac98
[107481.132793] Call Trace:
[107481.132799]  [] account_entity_enqueue+0x80/0x90
[107481.132806]  [] enqueue_task_fair+0x211/0xbb0
[107481.132813]  [] ? sched_clock_cpu+0xb8/0x130
[107481.132819]  [] enqueue_task+0x58/0x60
[107481.132824]  [] activate_task+0x1d/0x20
[107481.135844]  [] ttwu_do_activate.constprop.64+0x36/0x70
[107481.135844]  [] try_to_wake_up+0x257/0x320
[107481.135844]  [] default_wake_function+0xd/0x10
[107481.135844]  [] autoremove_wake_function+0x18/0x40
[107481.135844]  [] __wake_up_common+0x4d/0x80
[107481.135844]  [] __wake_up+0x3b/0x60
[107481.135844]  [] request_end+0xc5/0x190
[107481.135844]  [] fuse_dev_do_write+0xa3f/0xd10
[107481.135844]  [] ? __lock_acquire+0x3dc/0x2040
[107481.135844]  [] ? sock_aio_read.part.23+0xe7/0x110
[107481.135844]  [] fuse_dev_write+0x61/0x80
[107481.135844]  [] do_sync_readv_writev+0x6e/0xa0
[107481.135844]  [] do_readv_writev+0xe2/0x250
[107481.135844]  [] ? ep_poll+0x137/0x390
[107481.135844]  [] ? lock_release+0x133/0x250
[107481.135844]  [] vfs_writev+0x30/0x60
[107481.135844]  [] SyS_writev+0x50/0xc0
[107481.135844]  [] system_call_fastpath+0x16/0x1b
[107481.135844] Code: 48 83 c4 08 5b 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 89 fb 4c 89 6d f8 <4c> 8b 42 08 49 89 f5 49 89 d4 49 39 f0 75 31 4d 8b 45 00 4d 39
[107481.135844] RIP  [] __list_add+0x17/0xd0
[107481.135844]  RSP 
[107481.135844] CR2: 0008
[107481.135844] ---[ end trace d628d0543f7ba8cb ]---
[107481.135844] BUG: spinlock lockup suspected on CPU#0, glusterfs/2851
[107481.135844]  lock: 0x88000fc13700, .magic: dead4ead, .owner: glusterfs/2851, .owner_cpu: 0
[107481.135844] CPU: 0 PID: 2851 Comm: glusterfs Tainted: G  D  3.10.0-rc5-20130613-jens-konrad #1
[107481.135844]  88000fc13700 88000fc03ab8 819b7f45 88000fc03ad8
[107481.135844]  819b7fd3 88000fc13700 bebf63f0 88000fc03b08
[107481.135844]  81406dd5 88000fc13700 88000fc13700 88000ee0a708
[107481.135844] Call Trace:
[107481.135844][] dump_stack+0x19/0x1b
[107481.135844]  [] spin_dump+0x8c/0x91
[107481.135844]  [] do_raw_spin_lock+0x75/0x140
[107481.135844]  [] _raw_spin_lock+0x3e/0x50
[107481.135844]  [] ? try_to_wake_up+0x24c/0x320
[107481.135844]  [] try_to_wake_up+0x24c/0x320
[107481.135844]  [] default_wake_function+0xd/0x10
[107481.135844]  [] autoremove_wake_function+0x18/0x40
[107481.135844]  [] __wake_up_common+0x4d/0x80
[107481.135844]  [] __wake_up+0x3b/0x60
[107481.135844]  [] wake_up_klogd_work_func+0x48/0x80
[107481.135844]  [] __irq_work_run+0x7c/0xb0
[107481.135844]  [] ? tick_sched_do_timer+0x40/0x40
[107481.135844]  [] irq_work_run+0x1e/0x40
[107481.135844]  [] update_process_times+0x5d/0x80
[107481.135844]  [] tick_sched_handle.isra.12+0x1e/0x50
[107481.135844]  [] tick_sched_timer+0x47/0x70
[107481.135844]  [] __run_hrtimer.isra.28+0x6f/0x120
[107481.135844]  [] hrtimer_interrupt+0xf7/0x230
[107481.135844]  [] xen_timer_interrupt+0x3a/0x1f0
[107481.135844]  [] ? net_rps_action_and_irq_enable.isra.75+0x8d/0xb0
[107481.135844]  [] handle_irq_event_percpu+0x47/0x1a0
[107481.135844]  [] ? info_for_irq+0x9/0x20
[107481.135844

Libata pull request ?

2013-03-22 Thread Sander Eikelenboom
Hi Jeff,

Your tree git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git 
has been quiet for a few weeks now.
Is there any reason for not sending a pull request for the (few) pending fixes to Linus?

--
Sander

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] libata-acpi.c: fix copy and paste mistake in ata_acpi_register_power_resource

2013-03-01 Thread Sander Eikelenboom
Fix a copy and paste mistake introduced in:

commit bc9b6407bd6df3ab7189e5622816bbc11ae9d2d8
"ACPI / PM: Rework the handling of devices depending on power resources"

Signed-off-by: Sander Eikelenboom 
---
 drivers/ata/libata-acpi.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index 0ea1018..cb3eab6d 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -1027,7 +1027,7 @@ static void ata_acpi_register_power_resource(struct ata_device *dev)
 
 	handle = ata_dev_acpi_handle(dev);
 	if (handle)
-		acpi_dev_pm_remove_dependent(handle, &sdev->sdev_gendev);
+		acpi_dev_pm_add_dependent(handle, &sdev->sdev_gendev);
 }
 
 static void ata_acpi_unregister_power_resource(struct ata_device *dev)
-- 
1.7.2.5
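For illustration only, a toy userspace model in C of why the one-word mistake matters. Every name in it (dep_list, dep_add, dep_remove, register_dependent, unregister_dependent) is hypothetical and is not the ACPI PM API; it assumes the real add/remove helpers simply maintain a list of dependent devices. With the buggy register path the device is never added, so the dependency is silently missing and the later unregister has nothing to remove.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy list of devices that depend on a power resource.
 * This is NOT the ACPI PM API; it only models add/remove bookkeeping. */
struct dep_list {
	const void *items[8];
	size_t count;
};

/* Append a dependent; returns -1 if the toy list is full. */
static int dep_add(struct dep_list *l, const void *dev)
{
	if (l->count == sizeof(l->items) / sizeof(l->items[0]))
		return -1;
	l->items[l->count++] = dev;
	return 0;
}

/* Remove a dependent; returns -1 if it was never added. */
static int dep_remove(struct dep_list *l, const void *dev)
{
	for (size_t i = 0; i < l->count; i++) {
		if (l->items[i] == dev) {
			/* Order does not matter: move the last entry into the hole. */
			l->items[i] = l->items[--l->count];
			return 0;
		}
	}
	return -1;
}

/* Register path. 'buggy' reproduces the copy-and-paste mistake of
 * calling the remove helper where the add helper was intended. */
static void register_dependent(struct dep_list *l, const void *dev, int buggy)
{
	if (buggy)
		(void)dep_remove(l, dev);	/* no-op: dev was never added */
	else
		(void)dep_add(l, dev);
}

/* Unregister path: pairs with a successful register. */
static int unregister_dependent(struct dep_list *l, const void *dev)
{
	return dep_remove(l, dev);
}
```

Calling register_dependent() with buggy set leaves the list empty, so unregister_dependent() then returns -1 instead of 0.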



Re: [Xen-devel] [GIT PULL] (xen) stable/for-jens-3.10 xenwatch: page allocation failure: order:7, mode:0x10c0d0

2013-05-04 Thread Sander Eikelenboom
Hello Sander,

Monday, April 29, 2013, 6:05:20 PM, you wrote:


> Monday, April 29, 2013, 5:46:23 PM, you wrote:

>> On Wed, Apr 24, 2013 at 08:16:37PM +0200, Sander Eikelenboom wrote:
>>> Friday, April 19, 2013, 4:44:01 PM, you wrote:
>>> 
>>> > Hey Jens,
>>> 
>>> > Please in your spare time (if there is such a thing at a conference)
>>> > pull this branch:
>>> 
>>> >  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
>>> > stable/for-jens-3.10
>>> 

>> .. snip..
>>> Hi Konrad / Roger,
>>> 
>>> I tried this pull on top of Linus's latest linux-3.9 tree, but 
>>> although it seems to boot and work fine at first, I seem to get trouble 
>>> after running for about a day.
>>> Without this pull it runs fine for several days.
>> .. snip..

>>> [18496.013743] xenwatch: page allocation failure: order:7, mode:0x10c0d0
>>> [18496.031948] Pid: 54, comm: xenwatch Not tainted 3.9.0-rc8-20130424-jens+ #1
>>> [18496.049897] Call Trace:
>>> [18496.067674]  [] warn_alloc_failed+0xf1/0x140
>>> [18496.085453]  [] ? trace_hardirqs_on+0xd/0x10
>>> [18496.102951]  [] ? on_each_cpu_mask+0x94/0xd0
>>> [18496.120270]  [] __alloc_pages_nodemask+0x69f/0x960
>>> [18496.137306]  [] alloc_pages_current+0xb1/0x160
>>> [18496.154051]  [] __get_free_pages+0x9/0x40
>>> [18496.170579]  [] __kmalloc+0x134/0x160
>>> [18496.186921]  [] xen_blkbk_probe+0x170/0x2f0
>>> [18496.202963]  [] xenbus_dev_probe+0x77/0x130
>>> [18496.218714]  [] ? __driver_attach+0xa0/0xa0
>>> [18496.234237]  [] driver_probe_device+0x81/0x220
>>> [18496.249605]  [] ? klist_next+0x8c/0x110
>>> [18496.264681]  [] ? __driver_attach+0xa0/0xa0
>>> [18496.279500]  [] __device_attach+0x4b/0x50
>>> [18496.294138]  [] bus_for_each_drv+0x68/0x90
>>> [18496.308553]  [] device_attach+0x89/0x90
>>> [18496.322694]  [] bus_probe_device+0xa8/0xd0
>>> [18496.336640]  [] device_add+0x650/0x720

>> .. snip..

>> Jens,
>> I don't know if you had pulled this git tree yet (I don't see it in
>> your for-3.10/* branches).

>> But if you have, Sander has found a bug (and Roger has a fix for it).

>> Whether you would like to wait until v3.11 to pull it (and me sending
>> the git pull around a month) is OK. Or pull it now and we will fix the
>> bugs in the -rc's as they creep up.

> Roger's fix seems to work for me ..

Hmm, although it takes longer, I still see my memory getting used up.
I will see what setting:

echo 384 > /sys/module/xen_blkback/parameters/max_persistent_grants
echo 256 >> /sys/module/xen_blkback/parameters/max_buffer_pages

does this time around.
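Each echo line just writes a decimal value into a module parameter file under /sys/module/xen_blkback/parameters/. Here is a minimal C sketch of the same write. It assumes the two parameters are writable at runtime on the running kernel, which is not checked here. write_param is a hypothetical helper, not part of any Xen tool.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write a decimal value followed by a newline to 'path', much like
 * `echo VALUE > PATH` does. Returns 0 on success or a negative errno. */
static int write_param(const char *path, long value)
{
	char buf[32];
	int len = snprintf(buf, sizeof(buf), "%ld\n", value);
	assert(len > 0 && (size_t)len < sizeof(buf));

	int fd = open(path, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return -errno;

	ssize_t n = write(fd, buf, (size_t)len);
	int err = 0;
	if (n < 0)
		err = -errno;
	else if (n != len)
		err = -EIO;	/* short write: treat as failure */

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}
```

For example, write_param("/sys/module/xen_blkback/parameters/max_persistent_grants", 384) mirrors the first echo line. Like echo, it needs root.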

--
Sander


(XEN) [2013-05-03 23:00:21] grant_table.c:1250:d1 Expanding dom (1) grant table from (7) to (8) frames.
(XEN) [2013-05-03 23:00:41] grant_table.c:1250:d1 Expanding dom (1) grant table from (8) to (9) frames.
(XEN) [2013-05-03 23:01:01] grant_table.c:1250:d1 Expanding dom (1) grant table from (9) to (10) frames.
(XEN) [2013-05-03 23:01:01] grant_table.c:1250:d1 Expanding dom (1) grant table from (10) to (11) frames.
(XEN) [2013-05-04 03:15:32] grant_table.c:289:d0 Increased maptrack size to 10 frames
(XEN) [2013-05-04 03:15:34] grant_table.c:1250:d9 Expanding dom (9) grant table from (4) to (5) frames.
(XEN) [2013-05-04 03:15:34] grant_table.c:1250:d9 Expanding dom (9) grant table from (5) to (6) frames.
(XEN) [2013-05-04 03:15:34] grant_table.c:1250:d9 Expanding dom (9) grant table from (6) to (7) frames.
(XEN) [2013-05-04 03:15:34] grant_table.c:289:d0 Increased maptrack size to 11 frames
(XEN) [2013-05-04 03:15:36] grant_table.c:1250:d8 Expanding dom (8) grant table from (4) to (5) frames.
(XEN) [2013-05-04 03:15:36] grant_table.c:1250:d8 Expanding dom (8) grant table from (5) to (6) frames.
(XEN) [2013-05-04 03:15:36] grant_table.c:289:d0 Increased maptrack size to 12 frames
(XEN) [2013-05-04 03:15:36] grant_table.c:289:d0 Increased maptrack size to 13 frames
(XEN) [2013-05-04 03:15:37] grant_table.c:1250:d4 Expanding dom (4) grant table from (4) to (5) frames.
(XEN) [2013-05-04 03:15:37] grant_table.c:1250:d4 Expanding dom (4) grant table from (5) to (6) frames.
(XEN) [2013-05-04 03:15:37] grant_table.c:1250:d4 Expanding dom (4) grant table from (6) to (7) frames.
(XEN) [2013-05-04 03:15:37] grant_table.c:289:d0 Increased maptrack size to 14 frames
(XEN) [2013-05-04 03:15:37] grant_table.c:289:d0 Increased maptrack size to 15 frames
(XEN) [2013-05-04 03:15:39] grant_table.c:1250:d3 Expanding dom (3) grant table from (4) to (5) frames.
(XEN) [2013-05-04 03:15:39] grant_table.c:1250:d3 Expanding dom (3) grant table from

Re: [Xen-devel] [GIT PULL] (xen) stable/for-jens-3.10 xenwatch: page allocation failure: order:7, mode:0x10c0d0

2013-04-29 Thread Sander Eikelenboom

Monday, April 29, 2013, 5:46:23 PM, you wrote:

> On Wed, Apr 24, 2013 at 08:16:37PM +0200, Sander Eikelenboom wrote:
>> Friday, April 19, 2013, 4:44:01 PM, you wrote:
>> 
>> > Hey Jens,
>> 
>> > Please in your spare time (if there is such a thing at a conference)
>> > pull this branch:
>> 
>> >  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
>> > stable/for-jens-3.10
>> 

> .. snip..
>> Hi Konrad / Roger,
>> 
>> I tried this pull on top of Linus's latest linux-3.9 tree, but although 
>> it seems to boot and work fine at first, I seem to get trouble after running 
>> for about a day.
>> Without this pull it runs fine for several days.
> .. snip..

> Jens,
> I don't know if you had pulled this git tree yet (I don't see it in
> your for-3.10/* branches).

> But if you have, Sander has found a bug (and Roger has a fix for it).

> Whether you would like to wait until v3.11 to pull it (and me sending
> the git pull around a month) is OK. Or pull it now and we will fix the
> bugs in the -rc's as they creep up.

Roger's fix seems to work for me ..

--
Sander

> Thanks!




[media] cx25821 regression from 3.9: BUG: bad unlock balance detected!

2013-05-16 Thread Sander Eikelenboom
Hi Hans / Mauro,

With 3.10.0-rc1 (including the cx25821 changes from Hans), I get the bug below 
which wasn't present with 3.9.

--
Sander


[   53.004968] =
[   53.004968] [ BUG: bad unlock balance detected! ]
[   53.004968] 3.10.0-rc1-20130516-jens+ #1 Not tainted
[   53.004968] -
[   53.004968] motion/3328 is trying to release lock (&dev->lock) at:
[   53.004968] [] mutex_unlock+0x9/0x10
[   53.004968] but there are no more locks to release!
[   53.004968]
[   53.004968] other info that might help us debug this:
[   53.004968] 1 lock held by motion/3328:
[   53.004968]  #0:  (&mm->mmap_sem){++}, at: [] vm_munmap+0x3e/0x70
[   53.004968]
[   53.004968] stack backtrace:
[   53.004968] CPU: 1 PID: 3328 Comm: motion Not tainted 3.10.0-rc1-20130516-jens+ #1
[   53.004968] Hardware name: Xen HVM domU, BIOS 4.3-unstable 05/16/2013
[   53.004968]  819be5f9 88002ac35c58 819b9029 88002ac35c88
[   53.004968]  810e615e 88002ac35cb8 88002b7c18a8 819be5f9
[   53.004968]   88002ac35d28 810eb17e 810e7ba5
[   53.004968] Call Trace:
[   53.004968]  [] ? mutex_unlock+0x9/0x10
[   53.004968]  [] dump_stack+0x19/0x1b
[   53.004968]  [] print_unlock_imbalance_bug+0xfe/0x110
[   53.004968]  [] ? mutex_unlock+0x9/0x10
[   53.004968]  [] lock_release_non_nested+0x1ce/0x320
[   53.004968]  [] ? debug_check_no_locks_freed+0x105/0x1b0
[   53.353529]  [] ? mutex_unlock+0x9/0x10
[   53.353529]  [] lock_release+0xfc/0x250
[   53.353529]  [] __mutex_unlock_slowpath+0xb2/0x1f0
[   53.353529]  [] mutex_unlock+0x9/0x10
[   53.353529]  [] videobuf_waiton+0x55/0x230
[   53.353529]  [] ? tlb_finish_mmu+0x32/0x50
[   53.353529]  [] ? unmap_region+0xc6/0x100
[   53.353529]  [] ? kmem_cache_free+0x195/0x230
[   53.353529]  [] cx25821_free_buffer+0x49/0xa0
[   53.353529]  [] cx25821_buffer_release+0x9/0x10
[   53.353529]  [] videobuf_vm_close+0xc5/0x160
[   53.353529]  [] remove_vma+0x25/0x60
[   53.353529]  [] do_munmap+0x307/0x410
[   53.353529]  [] vm_munmap+0x4c/0x70
[   53.353529]  [] SyS_munmap+0x9/0x10
[   53.353529]  [] system_call_fastpath+0x16/0x1b
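lockdep raises this report when a task releases a lock it does not hold. Here motion/3328 holds only &mm->mmap_sem, yet mutex_unlock() is called on &dev->lock from videobuf_waiton(). The check itself can be sketched with a toy per-task list of held locks. This is a simplified model with hypothetical names (held_locks, toy_acquire, toy_release), not lockdep's actual bookkeeping:

```c
#include <assert.h>
#include <stdio.h>

#define MAX_HELD 16

/* Toy per-task list of held locks, in acquisition order. A lock is
 * identified by pointer, standing in for a lock instance. */
struct held_locks {
	const void *locks[MAX_HELD];
	int depth;
};

static void toy_acquire(struct held_locks *h, const void *lock)
{
	assert(h->depth < MAX_HELD);
	h->locks[h->depth++] = lock;
}

/* Release 'lock'. Returns 0 if it was held, or -1 if the task does not
 * hold it: the situation lockdep reports as "bad unlock balance". */
static int toy_release(struct held_locks *h, const void *lock)
{
	for (int i = h->depth - 1; i >= 0; i--) {
		if (h->locks[i] != lock)
			continue;
		/* Shift newer entries down to close the gap. */
		for (int j = i; j < h->depth - 1; j++)
			h->locks[j] = h->locks[j + 1];
		h->depth--;
		return 0;
	}
	fprintf(stderr, "bad unlock balance: lock %p is not held\n", (void *)lock);
	return -1;
}
```

Releasing &dev->lock while only &mm->mmap_sem is on the list is exactly the case that makes toy_release() return -1.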



Re: [media] cx25821 regression from 3.9: BUG: bad unlock balance detected!

2013-05-17 Thread Sander Eikelenboom

Friday, May 17, 2013, 10:25:24 AM, you wrote:

> On Thu May 16 2013 19:41:42 Sander Eikelenboom wrote:
>> Hi Hans / Mauro,
>> 
>> With 3.10.0-rc1 (including the cx25821 changes from Hans), I get the bug 
>> below which wasn't present with 3.9.

> How do I reproduce this? I've tried to, but I can't make this happen.

> Looking at the code I can't see how it could hit this bug anyway.

I'm using "motion" to grab and process 6 of the video streams from the card I 
have (a card with 8 inputs).
It seems the cx25821 driver underwent quite a few changes between 3.9 and 3.10.

In the past there have also been other locking issues around mmap and media 
devices, although those appeared as circular locking dependencies and involved 
different devices:
   - http://www.mail-archive.com/linux-media@vger.kernel.org/msg46217.html
   - Under kvm: http://www.spinics.net/lists/linux-media/msg63322.html

- Could running in a VM have something to do with it?
   - On 3.9 the driver occasionally prints this, probably latency related (but it 
continues to work):
 cx25821: cx25821_video_wakeup: 2 buffers handled (should be 1)

 Could something be unlocking twice in that path?

- Is there any extra debugging I could enable to pinpoint the issue?


--

Sander



> Regards,

> Hans

>> 
>> --
>> Sander
>> 
>> 
>> .. snip..
>> 




Re: [media] cx25821 regression from 3.9: BUG: bad unlock balance detected!

2013-05-17 Thread Sander Eikelenboom

Friday, May 17, 2013, 11:52:17 AM, you wrote:

> On Fri May 17 2013 11:04:50 Sander Eikelenboom wrote:
>> 
>> Friday, May 17, 2013, 10:25:24 AM, you wrote:
>> 
>> > On Thu May 16 2013 19:41:42 Sander Eikelenboom wrote:
>> >> Hi Hans / Mauro,
>> >> 
>> >> With 3.10.0-rc1 (including the cx25821 changes from Hans), I get the bug 
>> >> below which wasn't present with 3.9.
>> 
>> > How do I reproduce this? I've tried to, but I can't make this happen.
>> 
>> > Looking at the code I can't see how it could hit this bug anyway.
>> 
>> I'm using "motion" to grab and process 6 of the video streams from the card 
>> I have (a card with 8 inputs).
>> It seems the cx25821 driver underwent quite a few changes between 3.9 and 3.10.

> It did.

>> In the past there have also been other locking issues around mmap and 
>> media devices, although those appeared as circular locking dependencies 
>> and involved different devices:
>>    - http://www.mail-archive.com/linux-media@vger.kernel.org/msg46217.html
>>    - Under kvm: http://www.spinics.net/lists/linux-media/msg63322.html

> Neither of those are related to this issue.

>> 
>> - Could running in a VM have something to do with it?
>>    - On 3.9 the driver occasionally prints this, probably latency related (but 
>> it continues to work):
>>  cx25821: cx25821_video_wakeup: 2 buffers handled (should be 1)
>> 
>>  Could something be unlocking twice in that path?
>> 
>> - Is there any extra debugging I could enable to pinpoint the issue?

> Try this patch:

Hmm, it seems to be gone after pulling in Linus's latest tree, which has some 
workqueue / RCU fixes.
(I'm running without the patch below now.)

Thx,

Sander


> diff --git a/drivers/media/pci/cx25821/cx25821-core.c b/drivers/media/pci/cx25821/cx25821-core.c
> index b762c5b..8f8d0e0 100644
> --- a/drivers/media/pci/cx25821/cx25821-core.c
> +++ b/drivers/media/pci/cx25821/cx25821-core.c
> @@ -1208,7 +1208,6 @@ void cx25821_free_buffer(struct videobuf_queue *q, struct cx25821_buffer *buf)
>  	struct videobuf_dmabuf *dma = videobuf_to_dma(&buf->vb);
>  
>  	BUG_ON(in_interrupt());
> -	videobuf_waiton(q, &buf->vb, 0, 0);
>  	videobuf_dma_unmap(q->dev, dma);
>  	videobuf_dma_free(dma);
>  	btcx_riscmem_free(to_pci_dev(q->dev), &buf->risc);

> I don't think the waiton is really needed for this driver.

> What really should happen is that videobuf is replaced by videobuf2 in this
> driver, but that's a fair amount of work.

> Regards,

> Hans

>> 
>> 
>> --
>> 
>> Sander
>> 
>> 
>> 
>> > Regards,
>> 
>> > Hans
>> 
>> >> 
>> >> --
>> >> Sander
>> >> 
>> >> 
>> >> .. snip..

3.7.0-pre-rc1 INFO: inconsistent lock state kswapd0/792 [HC0[0]:SC0[0]:HE1:SE1] takes: (&anon_vma->mutex){+.+.?.}, at: [] page_lock_anon_vma+0x12d/0x1a0

2012-10-13 Thread Sander Eikelenboom
On Linux kernel 3.7.0-pre-rc1 (last commit = 
4d7127dace8cf4b05eb7c8c8531fc204fbb195f4), I get:

[ 2954.552722]
[ 2954.563914] =
[ 2954.573011] [ INFO: inconsistent lock state ]
[ 2954.582002] 3.6.0pre-rc1-20121013 #1 Tainted: GW   
[ 2954.591174] -
[ 2954.600275] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 2954.609099] kswapd0/792 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 2954.617855]  (&anon_vma->mutex){+.+.?.}, at: [] page_lock_anon_vma+0x12d/0x1a0
[ 2954.626841] {RECLAIM_FS-ON-W} state was registered at:
[ 2954.635560]   [] mark_held_locks+0xa4/0x130
[ 2954.644236]   [] lockdep_trace_alloc+0xe4/0x130
[ 2954.652867]   [] kmem_cache_alloc+0x33/0xd0
[ 2954.661416]   [] do_mmu_notifier_register+0x7f/0x160
[ 2954.669906]   [] mmu_notifier_register+0xe/0x10
[ 2954.678215]   [] gntdev_open+0xa3/0xe0
[ 2954.686432]   [] misc_open+0xb0/0x1a0
[ 2954.694523]   [] chrdev_open+0x98/0x170
[ 2954.702526]   [] do_dentry_open+0x25e/0x310
[ 2954.710471]   [] finish_open+0x30/0x50
[ 2954.718305]   [] do_last+0x30e/0xe90
[ 2954.725953]   [] path_openat+0xae/0x4e0
[ 2954.733444]   [] do_filp_open+0x44/0xa0
[ 2954.740911]   [] do_sys_open+0x103/0x1f0
[ 2954.748388]   [] sys_open+0x1c/0x20
[ 2954.755772]   [] system_call_fastpath+0x16/0x1b
[ 2954.763015] irq event stamp: 2815
[ 2954.770156] hardirqs last  enabled at (2815): [] mutex_trylock+0x15d/0x200
[ 2954.777538] hardirqs last disabled at (2814): [] mutex_trylock+0x67/0x200
[ 2954.784781] softirqs last  enabled at (0): [] copy_process+0x52a/0x14b0
[ 2954.792016] softirqs last disabled at (0): [<  (null)>]   (null)
[ 2954.799237] 
[ 2954.799237] other info that might help us debug this:
[ 2954.813342]  Possible unsafe locking scenario:
[ 2954.813342] 
[ 2954.827364]        CPU0
[ 2954.834223]        ----
[ 2954.840907]   lock(&anon_vma->mutex);
[ 2954.847630]   <Interrupt>
[ 2954.854238]     lock(&anon_vma->mutex);
[ 2954.860924] 
[ 2954.860924]  *** DEADLOCK ***
[ 2954.860924] 
[ 2954.880533] no locks held by kswapd0/792.
[ 2954.887085] 
[ 2954.887085] stack backtrace:
[ 2954.900162] Pid: 792, comm: kswapd0 Tainted: GW 3.6.0pre-rc1-20121013 #1
[ 2954.906779] Call Trace:
[ 2954.913335]  [] print_usage_bug+0x244/0x2e0
[ 2954.919970]  [] mark_lock+0x60c/0x6a0
[ 2954.926503]  [] __lock_acquire+0x636/0xdd0
[ 2954.933108]  [] ? mark_held_locks+0xa4/0x130
[ 2954.939759]  [] lock_acquire+0xba/0x100
[ 2954.946193]  [] ? page_lock_anon_vma+0x12d/0x1a0
[ 2954.952383]  [] ? page_lock_anon_vma+0x12d/0x1a0
[ 2954.958260]  [] mutex_lock_nested+0x4c/0x450
[ 2954.963944]  [] ? page_lock_anon_vma+0x12d/0x1a0
[ 2954.969414]  [] ? trace_hardirqs_on_caller+0xf8/0x200
[ 2954.974640]  [] ? lock_release+0x117/0x250
[ 2954.979593]  [] page_lock_anon_vma+0x12d/0x1a0
[ 2954.984605]  [] ? page_mapped_in_vma+0xa0/0xa0
[ 2954.989597]  [] page_referenced+0x16b/0x2a0
[ 2954.994498]  [] ? _raw_spin_unlock_irq+0x2b/0x70
[ 2954.999459]  [] shrink_active_list+0x1bd/0x300
[ 2955.004413]  [] shrink_lruvec+0x484/0x640
[ 2955.009311]  [] ? zone_watermark_ok_safe+0xa4/0xc0
[ 2955.014236]  [] kswapd+0x854/0xda0
[ 2955.018987]  [] ? trace_hardirqs_on_caller+0xf8/0x200
[ 2955.023806]  [] ? wake_up_bit+0x40/0x40
[ 2955.028496]  [] ? _raw_spin_unlock_irqrestore+0x53/0xa0
[ 2955.033247]  [] ? zone_reclaim+0x420/0x420
[ 2955.037975]  [] kthread+0xd6/0xe0
[ 2955.042682]  [] ? __init_kthread_worker+0x70/0x70
[ 2955.047400]  [] ret_from_fork+0x7c/0xb0
[ 2955.052152]  [] ? __init_kthread_worker+0x70/0x70
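Read literally, the report makes two claims about &anon_vma->mutex. First, it was held while an allocation that may enter filesystem reclaim was made: the kmem_cache_alloc() call under do_mmu_notifier_register() in the gntdev_open() path, recorded as {RECLAIM_FS-ON-W}. Second, kswapd0 later took the same lock from inside reclaim, recorded as {IN-RECLAIM_FS-W}. If both can happen, an allocation made while holding the lock could recurse into reclaim and block on that same lock, which is the DEADLOCK scenario printed above. Below is a toy C model of the two usage bits. It is a simplification with hypothetical names, not lockdep's real state machine:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy version of two lockdep usage bits for one lock class. */
struct toy_lock_usage {
	bool reclaim_fs_on;	/* held while an allocation that may enter FS reclaim was made */
	bool in_reclaim_fs;	/* acquired from inside FS reclaim, e.g. by kswapd */
};

/* Record one use of the lock. Returns true once both usages have been
 * seen, i.e. the "inconsistent lock state" condition. */
static bool toy_mark_usage(struct toy_lock_usage *u, bool from_reclaim)
{
	if (from_reclaim)
		u->in_reclaim_fs = true;
	else
		u->reclaim_fs_on = true;
	return u->reclaim_fs_on && u->in_reclaim_fs;
}
```

Either usage on its own is fine. Only the combination, in any order, triggers the report.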


--
Sander



3.8.0-rc3: possible circular locking dependency: &tty->legacy_mutex / &tty->hangup_work with serial/RFCOMM connection via USB bluetooth dongle

2013-01-12 Thread Sander Eikelenboom
Hi,

Running a 3.8.0-rc3 kernel (latest commit 
b719f43059903820c31edb30f4663a2818836e7f) on Debian squeeze, I'm running into 
the lockdep warning below:

- It occurs while running a Perl script that uses RFCOMM to communicate via 
Bluetooth with a Bluetooth/TTL converter.
- The script can run OK for a few hours before the lockdep warning occurs and 
the script freezes.
- The info related to Bluetooth from syslog:

Jan 12 10:24:08 serveerstertje kernel: [7.919775] Bluetooth: Virtual HCI driver ver 1.3
Jan 12 10:24:08 serveerstertje kernel: [7.920314] Bluetooth: HCI UART driver ver 2.2
Jan 12 10:24:08 serveerstertje kernel: [7.920316] Bluetooth: HCI H4 protocol initialized
Jan 12 10:24:08 serveerstertje kernel: [7.920317] Bluetooth: HCI BCSP protocol initialized
Jan 12 10:24:08 serveerstertje kernel: [7.920318] Bluetooth: HCILL protocol initialized
Jan 12 10:24:08 serveerstertje kernel: [7.920318] Bluetooth: HCIATH3K protocol initialized
Jan 12 10:24:08 serveerstertje kernel: [7.920319] Bluetooth: HCI Three-wire UART (H5) protocol initialized

Jan 12 10:24:08 serveerstertje kernel: [8.191897] Bluetooth: RFCOMM TTY layer initialized
Jan 12 10:24:08 serveerstertje kernel: [8.191930] Bluetooth: RFCOMM socket layer initialized
Jan 12 10:24:08 serveerstertje kernel: [8.191931] Bluetooth: RFCOMM ver 1.11
Jan 12 10:24:08 serveerstertje kernel: [8.191932] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
Jan 12 10:24:08 serveerstertje kernel: [8.191933] Bluetooth: BNEP filters: protocol multicast
Jan 12 10:24:08 serveerstertje kernel: [8.191944] Bluetooth: BNEP socket layer initialized
Jan 12 10:24:08 serveerstertje kernel: [8.191945] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
Jan 12 10:24:08 serveerstertje kernel: [8.191954] Bluetooth: HIDP socket layer initialized

Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Bluetooth deamon 4.66
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Starting SDP server
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Starting experimental netlink support
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Failed to find Bluetooth netlink family
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Failed to init netlink plugin
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: bridge pan0 created
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: HCI dev 0 registered
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Failed to open RFKILL control device
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: HCI dev 0 up
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Starting security manager 0
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Adapter /org/bluez/3912/hci0 has been enabled
Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Failed to access HAL


- And the lockdep warning itself:

[28678.458250]
[28678.476588] ==
[28678.494887] [ INFO: possible circular locking dependency detected ]
[28678.513013] 3.8.0-rc3-20130112-netpatched-rocketscience-radeon #1 Not tainted
[28678.530909] ---
[28678.548636] kworker/2:1/19513 is trying to acquire lock:
[28678.566070]  (&tty->legacy_mutex){+.+.+.}, at: [] tty_lock_nested+0x3e/0x80
[28678.583577]
[28678.583577] but task is already holding lock:
[28678.617615]  ((&tty->hangup_work)){+.+...}, at: [] process_one_work+0x158/0x4b0
[28678.634569]
[28678.634569] which lock already depends on the new lock.
[28678.634569]
[28678.683868]
[28678.683868] the existing dependency chain (in reverse order) is:
[28678.715354]
[28678.715354] -> #2 ((&tty->hangup_work)){+.+...}:
[28678.745890][] __lock_acquire+0x44e/0xdd0
[28678.760975][] lock_acquire+0xba/0x100
[28678.775834][] flush_work+0x3a/0x250
[28678.790408][] tty_ldisc_flush_works+0x18/0x40
[28678.804877][] tty_ldisc_release+0x2e/0x90
[28678.818952][] tty_release+0x3c7/0x590
[28678.832813][] __fput+0xa9/0x2c0
[28678.846411][] fput+0x9/0x10
[28678.859644][] task_work_run+0x95/0xb0
[28678.872661][] do_notify_resume+0x6d/0x80
[28678.885516][] int_signal+0x12/0x17
[28678.898047]
[28678.898047] -> #1 (&tty->legacy_mutex/1){+.+...}:
[28678.922334][] __lock_acquire+0x44e/0xdd0
[28678.934268][] lock_acquire+0xba/0x100
[28678.945916][] mutex_lock_nested+0x4c/0x450
[28678.957318][] tty_lock_nested+0x3e/0x80
[28678.968500][] tty_lock_pair+0x6a/0x70
[28678.979405][] tty_release+0x16b/0x590
[28678.990012][] __fput+0xa9/0x2c0
[28679.000367][] fput+0x9/0x10
[28679.009455] FW: BLOCKED low udp input: IN=eth0 OUT= MAC=40:61:86:f4:67:d9:00:08:ae:10:46:60:08:00 SRC=112.203.174.221 DST=88.159.69.252 LEN=131 TOS=0x00 PREC=0x00 TTL=38 ID=17898 PROTO=UDP SPT=27001 DPT=1024 LEN=111
[28679.030869][] task_work_run+0x95/0xb0
[28679.040727][] do_notify_resume+0x6d/0x80
[28679.050419][] in
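The circular-dependency report comes down to a cycle check on a graph of lock-ordering edges. In the chain above, hangup_work is taken (via flush_work()) while legacy_mutex/1 is held. The kworker now wants legacy_mutex while it holds hangup_work. The #0 entry is cut off in the archived log; assuming it records legacy_mutex being held when tty_lock_pair() takes legacy_mutex/1, the new edge closes a loop. Below is a toy C model with hypothetical names, not lockdep's implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock-order graph: edge[a][b] means "b was acquired while a was held".
 * The indices are hypothetical labels for the lock classes in the report. */
enum { LEGACY_MUTEX, LEGACY_MUTEX_1, HANGUP_WORK, NLOCKS };

static bool edge[NLOCKS][NLOCKS];

/* Depth-first search: is there a path from 'from' to 'to'? */
static bool reachable(int from, int to, bool seen[NLOCKS])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NLOCKS; next++)
		if (edge[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* Record "acquire 'want' while holding 'held'". Returns false, without
 * adding the edge, if 'held' is already reachable from 'want': the new
 * dependency would close a cycle. */
static bool add_dependency(int held, int want)
{
	bool seen[NLOCKS] = { false };
	if (reachable(want, held, seen))
		return false;
	edge[held][want] = true;
	return true;
}
```

Adding legacy_mutex -> legacy_mutex/1 and then legacy_mutex/1 -> hangup_work succeeds. Requesting hangup_work -> legacy_mutex is then refused, which corresponds to the warning.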

Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC

2012-12-16 Thread Sander Eikelenboom

Sunday, December 16, 2012, 6:38:24 PM, you wrote:

> On Fri, Dec 14, 2012 at 04:55:57PM +0100, Sander Eikelenboom wrote:
>> Hi Konrad,
>> 
>> I just tried to boot a 3.8.0-rc0 kernel (last commit: 
>> 7313264b899bbf3988841296265a6e0e8a7b6521) as dom0 on my machine with current 
>> xen-unstable.

> Yeah, saw it over the Dec 11->Dec 12 merges and was out on
> vacation during that time (just got back).

> Did you by any chance try to do a git bisect to narrow down
> which merge it was?

Hi Konrad,

Nope haven't had the time, I only tried resetting to commit 
189251705649bdfdf5e5850eb178f8cbfdac5480 as a "hunch"(just before a lot of x86 
and rcu commits), but the result didn't boot ..

--
Sander

> Thanks!
>> The boot stalls:
>> 
>> [0.00] ACPI: PM-Timer IO Port: 0x808
>> [0.00] ACPI: Local APIC address 0xfee0
>> [0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] enabled)
>> [0.00] ACPI: IOAPIC (id[0x06] address[0xfec0] gsi_base[0])
>> [0.00] IOAPIC[0]: apic_id 6, version 33, address 0xfec0, GSI 0-23
>> [0.00] ACPI: IOAPIC (id[0x07] address[0xfec2] gsi_base[24])
>> [0.00] IOAPIC[1]: apic_id 7, version 33, address 0xfec2, GSI 24-
>> [   64.598628] INFO: rcu_preempt detected stalls on CPUs/tasks:
>> [   64.598676]  0: (1 GPs behind) idle=aed/140/0 drain=5 . timer not pending
>> [   64.598683]  (detected by 1, t=18004 jiffies, g=18446744073709551414, c=18446744073709551413, q=162)
>> [   64.598692] sending NMI to all CPUs:
>> [   64.598716] xen: vector 0x2 is not implemented
>> 
>> 
>> Perhaps an interesting line is the incomplete one (no end of range), and it 
>> stalls there for some time before the kernel reports the stall itself:
>> [0.00] IOAPIC[1]: apic_id 7, version 33, address 0xfec2, GSI 24-
>> 
>> 
>> The exact same config with 3.7.0 as the kernel works fine.
>> Complete serial log is attached.
>> 
>> --
>> 
>> Sander
>> 
>> 






Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC

2012-12-17 Thread Sander Eikelenboom

Sunday, December 16, 2012, 6:38:24 PM, you wrote:

> On Fri, Dec 14, 2012 at 04:55:57PM +0100, Sander Eikelenboom wrote:
>> Hi Konrad,
>> 
>> I just tried to boot a 3.8.0-rc0 kernel (last commit: 
>> 7313264b899bbf3988841296265a6e0e8a7b6521) as dom0 on my machine with current 
>> xen-unstable.

> Yeah, saw it over the Dec 11->Dec 12 merges and was out on
> vacation during that time (just got back).

> Did you by any chance try to do a git bisect to narrow down
> which merge it was?

Hi Konrad,

I tried to bisect, but have not succeeded so far. Somehow I have the feeling 
it is at least partly .config related.
After making a new clone and bisecting down by hand, I ended up back at 
v3.7, but that also didn't boot.
So I will see if I can do it the other way around :S

--
Sander

> Thanks!
>> The boot stalls:
>> 
>> [0.00] ACPI: PM-Timer IO Port: 0x808
>> [0.00] ACPI: Local APIC address 0xfee0
>> [0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] enabled)
>> [0.00] ACPI: IOAPIC (id[0x06] address[0xfec0] gsi_base[0])
>> [0.00] IOAPIC[0]: apic_id 6, version 33, address 0xfec0, GSI 0-23
>> [0.00] ACPI: IOAPIC (id[0x07] address[0xfec2] gsi_base[24])
>> [0.00] IOAPIC[1]: apic_id 7, version 33, address 0xfec2, GSI 24-
>> [   64.598628] INFO: rcu_preempt detected stalls on CPUs/tasks:
>> [   64.598676]  0: (1 GPs behind) idle=aed/140/0 drain=5 . timer not pending
>> [   64.598683]  (detected by 1, t=18004 jiffies, g=18446744073709551414, c=18446744073709551413, q=162)
>> [   64.598692] sending NMI to all CPUs:
>> [   64.598716] xen: vector 0x2 is not implemented
>> 
>> 
>> Perhaps an interesting line is the incomplete one (no end of range); it 
>> stalls there for some time before the kernel reports the stall itself:
>> [0.00] IOAPIC[1]: apic_id 7, version 33, address 0xfec2, GSI 24-
>> 
>> 
>> The exact same config with 3.7.0 as kernel works fine.
>> Complete serial log is attached.
>> 
>> --
>> 
>> Sander
>> 
>> 






Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC

2012-12-17 Thread Sander Eikelenboom

Sunday, December 16, 2012, 6:38:24 PM, you wrote:

> On Fri, Dec 14, 2012 at 04:55:57PM +0100, Sander Eikelenboom wrote:
>> Hi Konrad,
>> 
>> I just tried to boot a 3.8.0-rc0 kernel (last commit: 
>> 7313264b899bbf3988841296265a6e0e8a7b6521) as dom0 on my machine with current 
>> xen-unstable.

> Yeah, saw it over the Dec 11->Dec 12 merges and was out on
> vacation during that time (just got back).

> Did you by any chance try to do a git bisect to narrow down
> which merge it was?

Hi Konrad,

With some more effort, the bisection leads to:

git bisect start
# bad: [fa4c95bfdb85d568ae327d57aa33a4f55bab79c4] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect bad fa4c95bfdb85d568ae327d57aa33a4f55bab79c4
# good: [29594404d7fe73cd80eaa4ee8c43dcc53970c60e] Linux 3.7
git bisect good 29594404d7fe73cd80eaa4ee8c43dcc53970c60e
# bad: [98870901cce098bbe94d90d2c41d8d1fa8d94392] mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
git bisect bad 98870901cce098bbe94d90d2c41d8d1fa8d94392
# good: [8966961b31c251b854169e9886394c2a20f2cea7] Merge tag 'staging-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good 8966961b31c251b854169e9886394c2a20f2cea7
# bad: [22a40fd9a60388aec8106b0baffc8f59f83bb1b4] Merge tag 'dlm-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
git bisect bad 22a40fd9a60388aec8106b0baffc8f59f83bb1b4
# good: [aefb058b0c27dafb15072406fbfd92d2ac2c8790] Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good aefb058b0c27dafb15072406fbfd92d2ac2c8790
# good: [b64c5fda3868cb29d5dae0909561aa7d93fb7330] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good b64c5fda3868cb29d5dae0909561aa7d93fb7330
# bad: [139353ffbe42ac7abda42f3259c1c374cbf4b779] Merge tag 'please-pull-einj-fix-for-acpi5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
git bisect bad 139353ffbe42ac7abda42f3259c1c374cbf4b779
# bad: [d07e43d70eef15a44a2c328a913d8d633a90e088] Merge branch 'omap-serial' of git://git.linaro.org/people/rmk/linux-arm
git bisect bad d07e43d70eef15a44a2c328a913d8d633a90e088
# bad: [a05a4e24dcd73c2de4ef3f8d520b8bbb44570c60] Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad a05a4e24dcd73c2de4ef3f8d520b8bbb44570c60
# bad: [a71c8bc5dfefbbf80ef90739791554ef7ea4401b] x86, topology: Debug CPU0 hotplug
git bisect bad a71c8bc5dfefbbf80ef90739791554ef7ea4401b
# bad: [42e78e9719aa0c76711e2731b19c90fe5ae05278] x86-64, hotplug: Add start_cpu0() entry point to head_64.S
git bisect bad 42e78e9719aa0c76711e2731b19c90fe5ae05278
# good: [4d25031a81d3cd32edc00de6596db76cc4010685] x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
git bisect good 4d25031a81d3cd32edc00de6596db76cc4010685
# bad: [209efae12981f3d2d694499b761def10895c078c] x86, hotplug, suspend: Online CPU0 for suspend or hibernate
git bisect bad 209efae12981f3d2d694499b761def10895c078c
# bad: [30106c174311b8cfaaa3186c7f6f9c36c62d17da] x86, hotplug: Support functions for CPU0 online/offline
git bisect bad 30106c174311b8cfaaa3186c7f6f9c36c62d17da



30106c174311b8cfaaa3186c7f6f9c36c62d17da is the first bad commit
commit 30106c174311b8cfaaa3186c7f6f9c36c62d17da
Author: Fenghua Yu 
Date:   Tue Nov 13 11:32:41 2012 -0800

x86, hotplug: Support functions for CPU0 online/offline

Add smp_store_boot_cpu_info() to store cpu info for BSP during boot time.

Now smp_store_cpu_info() stores cpu info for bringing up BSP or AP after
it's offline.

Continue to online CPU0 in native_cpu_up().

Continue to offline CPU0 in native_cpu_disable().

Signed-off-by: Fenghua Yu 
Link: 
http://lkml.kernel.org/r/1352835171-3958-5-git-send-email-fenghua...@intel.com
Signed-off-by: H. Peter Anvin 

:04 04 729e56e8eddaaf5d0f55257b82f28006dffb9aab 
d5c98e50cd92814351ee6c741b7e4c9afa29487c M  arch


It seems to have been merged in:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=74b84233458e9db7c160cec67638efdbec748ca9

--

Sander


> Thanks!
>> The boot stalls:
>> 
>> [0.00] ACPI: PM-Timer IO Port: 0x808
>> [0.00] ACPI: Local APIC address 0xfee0
>> [0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] enabled)
>> [0.00] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] enabled)
>> [0.00] ACPI: IOAPIC (id[0x06

Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC

2012-12-17 Thread Sander Eikelenboom

Monday, December 17, 2012, 10:12:40 PM, you wrote:

> On Mon, Dec 17, 2012 at 03:46:34PM -0500, Konrad Rzeszutek Wilk wrote:
>> On Mon, Dec 17, 2012 at 09:32:17PM +0100, Sander Eikelenboom wrote:
>> > 
>> > Sunday, December 16, 2012, 6:38:24 PM, you wrote:
>> > 
>> > > On Fri, Dec 14, 2012 at 04:55:57PM +0100, Sander Eikelenboom wrote:
>> > >> Hi Konrad,
>> > >> 
>> > >> I just tried to boot a 3.8.0-rc0 kernel (last commit: 
>> > >> 7313264b899bbf3988841296265a6e0e8a7b6521) as dom0 on my machine with 
>> > >> current xen-unstable.
>> > 
>> > > Yeah, saw it over the Dec 11->Dec 12 merges and was out on
>> > > vacation during that time (just got back).
>> > 
>> > > Did you by any chance try to do a git bisect to narrow down
>> > > which merge it was?
>> > 
>> > Hi Konrad,
>> 
>> Hey Sander,
>> 
>> Thank you for doing the bisection.
>> 
>> Fenghua - any ideas what might be amiss in the Xen subsystem?
>> I hadn't looked at the patchset of the CPU0 offlining/onlining
>> so I am not completely up to speed on the particulars of the patches.

>> > 30106c174311b8cfaaa3186c7f6f9c36c62d17da is the first bad commit
>> > commit 30106c174311b8cfaaa3186c7f6f9c36c62d17da
>> > Author: Fenghua Yu 
>> > Date:   Tue Nov 13 11:32:41 2012 -0800
>> > 
>> > x86, hotplug: Support functions for CPU0 online/offline
>> > 
>> > Add smp_store_boot_cpu_info() to store cpu info for BSP during boot 
>> > time.
>> > 
>> > Now smp_store_cpu_info() stores cpu info for bringing up BSP or AP 
>> > after
>> > it's offline.
>> > 
>> > Continue to online CPU0 in native_cpu_up().
>> > 
>> > Continue to offline CPU0 in native_cpu_disable().
>> > 
>> > Signed-off-by: Fenghua Yu 
>> > Link: 
>> > http://lkml.kernel.org/r/1352835171-3958-5-git-send-email-fenghua...@intel.com
>> > Signed-off-by: H. Peter Anvin 
>> > 

> This patch:


> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 353c50f..4f7d259 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -254,7 +254,7 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
>  	}
>  	xen_init_lock_cpu(0);
>  
> -	smp_store_cpu_info(0);
> +	smp_store_boot_cpu_info();
>  	cpu_data(0).x86_max_cores = 1;
>  
>  	for_each_possible_cpu(i) {

> Would do the corresponding change in the Xen subsystem that the above
> mentioned commit did. Perhaps that is all that is needed? I am going to
> be able to test this and look in more details tomorrow.

Seems like it. I don't know if there are other things still lurking, but with 
your patch it boots again as dom0 :-)

Thx !

--
Sander




Re: 3.8.0-rc3: possible circular locking dependency: &tty->legacy_mutex / &tty->hangup_work with serial/RFCOMM connection via USB bluetooth dongle

2013-01-17 Thread Sander Eikelenboom

Saturday, January 12, 2013, 7:46:39 PM, you wrote:

> Hi,

> Running a 3.8.0-rc3 kernel (latest commit 
> b719f43059903820c31edb30f4663a2818836e7f) on Debian squeeze, I'm 
> running into this lockdep warning when:

Ok, this seems to be fixed by commit: 852e4a8152b427c3f318bb0e1b5e938d64dcdc32 
(tty: don't deadlock while flushing workqueue)


> - Running a perl script that uses rfcomm to communicate via bluetooth with a 
> bluetooth/TTL converter.
> - It can run ok for a few hours before this lockdep occurs and the perl 
> script freezes.
> - The info related to bluetooth from syslog:

> Jan 12 10:24:08 serveerstertje kernel: [7.919775] Bluetooth: Virtual HCI 
> driver ver 1.3
> Jan 12 10:24:08 serveerstertje kernel: [7.920314] Bluetooth: HCI UART 
> driver ver 2.2
> Jan 12 10:24:08 serveerstertje kernel: [7.920316] Bluetooth: HCI H4 
> protocol initialized
> Jan 12 10:24:08 serveerstertje kernel: [7.920317] Bluetooth: HCI BCSP 
> protocol initialized
> Jan 12 10:24:08 serveerstertje kernel: [7.920318] Bluetooth: HCILL 
> protocol initialized
> Jan 12 10:24:08 serveerstertje kernel: [7.920318] Bluetooth: HCIATH3K 
> protocol initialized
> Jan 12 10:24:08 serveerstertje kernel: [7.920319] Bluetooth: HCI 
> Three-wire UART (H5) protocol initialized

> Jan 12 10:24:08 serveerstertje kernel: [8.191897] Bluetooth: RFCOMM TTY 
> layer initialized
> Jan 12 10:24:08 serveerstertje kernel: [8.191930] Bluetooth: RFCOMM 
> socket layer initialized
> Jan 12 10:24:08 serveerstertje kernel: [8.191931] Bluetooth: RFCOMM ver 
> 1.11
> Jan 12 10:24:08 serveerstertje kernel: [8.191932] Bluetooth: BNEP 
> (Ethernet Emulation) ver 1.3
> Jan 12 10:24:08 serveerstertje kernel: [8.191933] Bluetooth: BNEP 
> filters: protocol multicast
> Jan 12 10:24:08 serveerstertje kernel: [8.191944] Bluetooth: BNEP socket 
> layer initialized
> Jan 12 10:24:08 serveerstertje kernel: [8.191945] Bluetooth: HIDP (Human 
> Interface Emulation) ver 1.2
> Jan 12 10:24:08 serveerstertje kernel: [8.191954] Bluetooth: HIDP socket 
> layer initialized

> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Bluetooth deamon 4.66
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Starting SDP server
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Starting experimental 
> netlink support
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Failed to find Bluetooth 
> netlink family
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Failed to init netlink plugin
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: bridge pan0 created
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: HCI dev 0 registered
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Failed to open RFKILL 
> control device
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: HCI dev 0 up
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Starting security manager 0
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Adapter /org/bluez/3912/hci0 
> has been enabled
> Jan 12 10:24:09 serveerstertje bluetoothd[3912]: Failed to access HAL


> - And the lockdep warning itself:

> [28678.458250]
> [28678.476588] ==
> [28678.494887] [ INFO: possible circular locking dependency detected ]
> [28678.513013] 3.8.0-rc3-20130112-netpatched-rocketscience-radeon #1 Not 
> tainted
> [28678.530909] ---
> [28678.548636] kworker/2:1/19513 is trying to acquire lock:
> [28678.566070]  (&tty->legacy_mutex){+.+.+.}, at: [] 
> tty_lock_nested+0x3e/0x80
> [28678.583577]
> [28678.583577] but task is already holding lock:
> [28678.617615]  ((&tty->hangup_work)){+.+...}, at: [] 
> process_one_work+0x158/0x4b0
> [28678.634569]
> [28678.634569] which lock already depends on the new lock.
> [28678.634569]
> [28678.683868]
> [28678.683868] the existing dependency chain (in reverse order) is:
> [28678.715354]
> [28678.715354] -> #2 ((&tty->hangup_work)){+.+...}:
> [28678.745890][] __lock_acquire+0x44e/0xdd0
> [28678.760975][] lock_acquire+0xba/0x100
> [28678.775834][] flush_work+0x3a/0x250
> [28678.790408][] tty_ldisc_flush_works+0x18/0x40
> [28678.804877][] tty_ldisc_release+0x2e/0x90
> [28678.818952][] tty_release+0x3c7/0x590
> [28678.832813][] __fput+0xa9/0x2c0
> [28678.846411][] fput+0x9/0x10
> [28678.859644][] task_work_run+0x95/0xb0
> [28678.872661][] do_notify_resume+0x6d/0x80
> [28678.885516][] int_signal+0x12/0x17
> [28678.898047]
> [28678.898047] -> #1 (&tty->legacy_mutex/1){+.+...}:
> [28678.922334][] __lock_acquire+0x44e/0xdd0
> [28678.934268][] lock_acquire+0xba/0x100
> [28678.945916][] mutex_lock_nested+0x4c/0x450
> [28678.957318][] tty_lock_nested+0x3e/0x80
> [28678.968500][] tty_lock_pair+0x6a/0x70
> [28678.979405][] tty_release+0x16b/0x590
> [28678.990012][] __fput+0xa9/0x2c0
> [28679.000367]   

Bluetooth / TTY: [ 1806.484970] INFO: task kworker/0:1:25023 blocked for more than 120 seconds.

2013-01-08 Thread Sander Eikelenboom
I'm trying to use a USB bluetooth dongle to connect to a bluetooth to serial 
device with RFCOMM.
It works fine for a while, but it consistently fails after some time.

Sometimes this happens right at the start, when connecting to /dev/rfcomm0, 
but it can also take several hours of running fine while connected and 
exchanging data.

This is the stack trace I get:

[ 1806.484970] INFO: task kworker/0:1:25023 blocked for more than 120 seconds.
[ 1806.503488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1806.521864] kworker/0:1 D 0201 0 25023  2 0x
[ 1806.540026]  88000baa7be8 0216 880037079148 
880037079148
[ 1806.557926]  8800386fa0e0 00013040 88000baa7fd8 
88000baa6010
[ 1806.575622]  00013040 00013040 88000baa7fd8 
00013040
[ 1806.592981] Call Trace:
[ 1806.610066]  [] ? lock_release+0x117/0x250
[ 1806.627150]  [] ? lock_acquire+0xd8/0x100
[ 1806.643901]  [] ? tty_lock_nested+0x3e/0x80
[ 1806.660460]  [] schedule+0x24/0x70
[ 1806.676724]  [] schedule_preempt_disabled+0x13/0x20
[ 1806.692780]  [] mutex_lock_nested+0x1ab/0x450
[ 1806.708582]  [] ? tty_lock_nested+0x3e/0x80
[ 1806.724140]  [] tty_lock_nested+0x3e/0x80
[ 1806.739421]  [] tty_lock+0xb/0x10
[ 1806.754418]  [] __tty_hangup+0x65/0x3c0
[ 1806.769153]  [] ? process_one_work+0x158/0x4b0
[ 1806.783648]  [] do_tty_hangup+0x10/0x20
[ 1806.797905]  [] process_one_work+0x1c0/0x4b0
[ 1806.811958]  [] ? process_one_work+0x158/0x4b0
[ 1806.825752]  [] ? __tty_hangup+0x3c0/0x3c0
[ 1806.839332]  [] worker_thread+0x11e/0x3d0
[ 1806.852654]  [] ? manage_workers+0x2e0/0x2e0
[ 1806.865719]  [] kthread+0xd6/0xe0
[ 1806.878518]  [] ? __init_kthread_worker+0x70/0x70
[ 1806.891064]  [] ret_from_fork+0x7c/0xb0
[ 1806.903376]  [] ? __init_kthread_worker+0x70/0x70
[ 1806.939888] INFO: lockdep is turned off.
[ 1806.951766] INFO: task zabbix_slimmeme:27798 blocked for more than 120 
seconds.
[ 1806.963521] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1806.975059] zabbix_slimmeme D 88002a619070 0 27798  27355 0x
[ 1806.986497]  88bb7818 0216 8802 
8202ae38
[ 1806.997893]  88002a619070 00013040 88bb7fd8 
88bb6010
[ 1807.008944]  00013040 00013040 88bb7fd8 
00013040
[ 1807.019692] Call Trace:
[ 1807.030165]  [] ? __module_text_address+0xd/0x60
[ 1807.040524]  [] ? __module_text_address+0xd/0x60
[ 1807.050568]  [] ? is_module_text_address+0x2b/0x60
[ 1807.060389]  [] ? __kernel_text_address+0x58/0x80
[ 1807.069996]  [] ? local_bh_disable+0x17/0x20
[ 1807.079383]  [] ? lock_acquire+0xd8/0x100
[ 1807.088467]  [] schedule+0x24/0x70
[ 1807.097296]  [] schedule_timeout+0x1bd/0x220
[ 1807.105884]  [] ? lock_acquire+0xd8/0x100
[ 1807.114211]  [] ? wait_for_common+0x31/0x170
[ 1807.122301]  [] ? lock_release+0x117/0x250
[ 1807.130156]  [] wait_for_common+0x101/0x170
[ 1807.137804]  [] ? try_to_wake_up+0x310/0x310
[ 1807.145193]  [] wait_for_completion+0x18/0x20
[ 1807.152350]  [] flush_work+0x195/0x250
[ 1807.159275]  [] ? flush_work+0x1b0/0x250
[ 1807.165957]  [] ? cwq_dec_nr_in_flight+0xd0/0xd0
[ 1807.172401]  [] tty_ldisc_flush_works+0x18/0x40
[ 1807.178634]  [] tty_ldisc_release+0x2e/0x90
[ 1807.184586]  [] tty_release+0x3c7/0x590
[ 1807.190264]  [] ? trace_hardirqs_on+0xd/0x10
[ 1807.195910]  [] ? __mutex_unlock_slowpath+0x149/0x1d0
[ 1807.201455]  [] ? try_to_wake_up+0x310/0x310
[ 1807.206927]  [] tty_open+0x3c4/0x5f0
[ 1807.212366]  [] chrdev_open+0x98/0x170
[ 1807.217803]  [] ? lg_local_unlock+0x3d/0x70
[ 1807.223255]  [] ? cdev_put+0x30/0x30
[ 1807.228678]  [] do_dentry_open+0x25e/0x310
[ 1807.234040]  [] finish_open+0x30/0x50
[ 1807.239445]  [] do_last+0x30e/0xe90
[ 1807.244805]  [] ? link_path_walk+0x9a/0x9f0
[ 1807.250170]  [] path_openat+0xae/0x4e0
[ 1807.255503]  [] ? lock_release+0x117/0x250
[ 1807.260835]  [] ? do_select+0x3f4/0x6d0
[ 1807.266174]  [] do_filp_open+0x44/0xa0
[ 1807.271504]  [] ? __alloc_fd+0xb3/0x150
[ 1807.276904]  [] do_sys_open+0x103/0x1f0
[ 1807.282262]  [] sys_open+0x1c/0x20
[ 1807.287579]  [] system_call_fastpath+0x16/0x1b
[ 1807.292892] INFO: lockdep is turned off.



Re: [Intel-gfx] Linux 4.8-rc?: WARNING: at drivers/gpu/drm/i915/intel_pm.c:7866 sandybridge_pcode_write Missing switch case (16) in gen6_check_mailbox_status

2016-09-07 Thread Sander Eikelenboom

On 2016-09-07 16:49, Jani Nikula wrote:
> On Tue, 06 Sep 2016, li...@eikelenboom.it wrote:
>> On 2016-09-06 11:25, Jani Nikula wrote:
>>> On Tue, 06 Sep 2016, li...@eikelenboom.it wrote:
>>>> L.S.,
>>>>
>>>> Since one of the last 4.8 RCs, I'm getting the warning below when
>>>> booting on my sandybridge based thinkpad.
>>>> From what I can tell, the machine still works fine, though.
>>>
>>> What does 'lspci -nns 2' say for you?
>>
>> 00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd
>> Generation Core Processor Family Integrated Graphics Controller
>> [8086:0126] (rev 09)
>
> Fixed in drm-intel-fixes by
>
> commit fc2780b66b15092ac68272644a522c1624c48547
> Author: Chris Wilson 
> Date:   Fri Aug 26 11:59:26 2016 +0100
>
>     drm/i915: Add GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE to SNB
>
> BR,
> Jani.

Works-for-me, thx!

--
Sander


Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!

2017-10-26 Thread Sander Eikelenboom
On 26/10/17 10:05, Sander Eikelenboom wrote:
> On 26/10/17 00:02, Craig Bergstrom wrote:
>> Thanks for the notification, my apologies for the breakage.  I'll take a
>> close look and see if I can figure out what went wrong.
>>
>> Sander, any chance you can send /proc/iomem and the inputs to the mmap call
>> that fail on your affected system?
> 
> Hi Craig,
> 
> The output from /proc/iomem is simple to get and attached.
> The mmap call is probably issued by qemu and will require more digging.

Ah, grepping qemu gave a pointer; it's probably the code in:

http://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blob;f=hw/xen/xen_pt_msi.c;h=ff9a79f5d27ad7d74a1b22297be560feb455063c;hb=5cd7ce5dde3f228b3b669ed9ca432f588947bd40

around line 571. That would also explain why only this device has the
problem, since it's the only one trying to use MSI(-X) interrupts.
I will see if I can add some logging to that function.

--
Sander


> 
> I don't know if there is that much time left for 4.14, since we are at
> RC6 already.
> 
> --
> Sander
> 
> 
>>
>>
>> On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky >> wrote:
>>
>>> On 10/23/2017 10:44 PM, Fengguang Wu wrote:
>>>> Greetings,
>>>>
>>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>> master
>>>>
>>>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381
>>>> Author: Craig Bergstrom 
>>>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600
>>>> Commit: Ingo Molnar 
>>>> CommitDate: Fri Oct 20 09:48:00 2017 +0200
>>>>
>>>>  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
>>>
>>> Also note
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html
>>>
>>> -boris
>>>
>>
> 



Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!

2017-10-26 Thread Sander Eikelenboom
On 26/10/17 00:02, Craig Bergstrom wrote:
> Thanks for the notification, my apologies for the breakage.  I'll take a
> close look and see if I can figure out what went wrong.
> 
> Sander, any chance you can send /proc/iomem and the inputs to the mmap call
> that fail on your affected system?

Hi Craig,

The output from /proc/iomem is simple to get and attached.
The mmap call is probably issued by qemu and will require more digging.

I don't know if there is that much time left for 4.14, since we are at
RC6 already.

--
Sander


> 
> 
> On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky > wrote:
> 
>> On 10/23/2017 10:44 PM, Fengguang Wu wrote:
>>> Greetings,
>>>
>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> master
>>>
>>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381
>>> Author: Craig Bergstrom 
>>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600
>>> Commit: Ingo Molnar 
>>> CommitDate: Fri Oct 20 09:48:00 2017 +0200
>>>
>>>  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
>>
>> Also note
>> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html
>>
>> -boris
>>
> 

-0fff : Reserved
1000-00095fff : System RAM
00096000-000963ff : RAM buffer
00096400-000f : Reserved
  000a-000b : PCI Bus :00
  000c-000cfdff : Video ROM
  000d-000d : PCI Bus :00
000d4800-000d4bff : Adapter ROM
  000f-000f : System ROM
0010-7fff : System RAM
  0100-01d2a703 : Kernel code
  01d2a704-025450ff : Kernel data
  02b3f000-02cc1fff : Kernel bss
c7f9-c7f9dfff : ACPI Tables
c7f9e000-c7fd : ACPI Non-volatile Storage
c7fe-c7ff : Reserved
c800-dfff : PCI Bus :00
  cfe0-cfef : PCI Bus :0c
cfef8000-cfefbfff : :0c:00.0
  cfef8000-cfefbfff : r8169
cfeff000-cfef : :0c:00.0
  cfeff000-cfef : r8169
  cff0-cfff : PCI Bus :0d
cfff8000-cfffbfff : :0d:00.0
  cfff8000-cfffbfff : r8169
c000-cfff : :0d:00.0
  c000-cfff : r8169
  d000-dfff : PCI Bus :0f
d000-dfff : :0f:00.0
  d000-d0ff : vesafb
e000-efff : PCI MMCONFIG  [bus 00-ff]
  e000-efff : pnp 00:07
f000-febf : PCI Bus :00
  f600-f6003fff : Reserved
f600-f6003fff : pnp 00:01
  fdcf7000-fdcf7fff : :00:12.0
fdcf7000-fdcf7fff : ohci_hcd
  fdcf8000-fdcfbfff : :00:14.2
  fdcfc000-fdcfcfff : :00:13.0
fdcfc000-fdcfcfff : ohci_hcd
  fdcfd000-fdcfdfff : :00:14.5
fdcfd000-fdcfdfff : ohci_hcd
  fdcfe000-fdcfefff : :00:16.0
fdcfe000-fdcfefff : ohci_hcd
  fdcff000-fdcff3ff : :00:11.0
fdcff000-fdcff3ff : ahci
  fdcff400-fdcff4ff : :00:12.2
fdcff400-fdcff4ff : ehci_hcd
  fdcff800-fdcff8ff : :00:13.2
fdcff800-fdcff8ff : ehci_hcd
  fdcffc00-fdcffcff : :00:16.2
fdcffc00-fdcffcff : ehci_hcd
  fde0-fdef : PCI Bus :04
fdef8000-fdef8fff : :04:00.0
fdef9000-fdef9fff : :04:00.1
fdefa000-fdefafff : :04:00.2
fdefb000-fdefbfff : :04:00.3
fdefc000-fdefcfff : :04:00.4
fdefd000-fdefdfff : :04:00.5
fdefe000-fdefefff : :04:00.6
fdeff000-fdef : :04:00.7
  fdf0-fe1f : PCI Bus :05
fdfe-fdff : :05:00.0
fe00-fe1f : PCI Bus :06
  fe00-fe0f : PCI Bus :07
fe0e-fe0e : :07:00.0
fe0ff800-fe0f : :07:00.0
  fe0ff800-fe0f : ahci
  fe10-fe1f : PCI Bus :08
fe1fe000-fe1f : :08:00.0
  fe20-fe3f : PCI Bus :09
fe20-fe3f : :09:00.0
  fe40-fe4f : PCI Bus :0a
fe4f8000-fe4f8fff : :0a:00.0
fe4f9000-fe4f9fff : :0a:00.1
fe4fa000-fe4fafff : :0a:00.2
fe4fb000-fe4fbfff : :0a:00.3
fe4fc000-fe4fcfff : :0a:00.4
fe4fd000-fe4fdfff : :0a:00.5
fe4fe000-fe4fefff : :0a:00.6
fe4ff000-fe4f : :0a:00.7
  fe50-fe5f : PCI Bus :0b
fe5fe000-fe5f : :0b:00.0
  fe60-fe6f : PCI Bus :0c
fe6e-fe6f : :0c:00.0
  fe70-fe7f : PCI Bus :0d
fe7e-fe7f : :0d:00.0
  fe80-fe8f : PCI Bus :0e
fe8fe000-fe8f : :0e:00.0
  fe90-fe9f : PCI Bus :0f
fe9e-fe9e : :0f:00.0
fe9fc000-fe9f : :0f:00.1
  fe9fc000-fe9f : ICH HD audio
fec0-fec00fff : Reserved
  fec0-fec003ff : IOAPIC 0
fec1-fec1001f : pnp 00:06
fec2-fec20fff : Reserved
  fec2-fec203ff : IOAPIC 1
fed0-fed003ff : HPET 2
  fed0-fed003ff : PNP0103:00
fed8-fed80fff : pnp 00:06
fee0-feef : Reserved
  fee0-fee00fff : Local APIC
fee0-fee00fff : pnp 00:05
ffb8-ffbf : pnp 00:06
ffe0- : Reserved
fd-ff : Reserve

Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!

2017-10-26 Thread Sander Eikelenboom
On 26/10/17 10:12, Sander Eikelenboom wrote:
> On 26/10/17 10:05, Sander Eikelenboom wrote:
>> On 26/10/17 00:02, Craig Bergstrom wrote:
>>> Thanks for the notification, my apologies for the breakage.  I'll take a
>>> close look and see if I can figure out what went wrong.
>>>
>>> Sander, any chance you can send /proc/iomem and the inputs to the mmap call
>>> that fail on your affected system?
>>
>> Hi Craig,
>>
>> The output from /proc/iomem is simple to get and attached.
>> The mmap call is probably issued by qemu and will require more digging.
> 
> Ah, grepping qemu gave a pointer; it's probably the code in:
> 
> http://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blob;f=hw/xen/xen_pt_msi.c;h=ff9a79f5d27ad7d74a1b22297be560feb455063c;hb=5cd7ce5dde3f228b3b669ed9ca432f588947bd40
> 
> around line 571. That would also explain why only this device has the
> problem, since it's the only one trying to use MSI(-X) interrupts.
> I will see if I can add some logging to that function.

Attached is the qemu debug output, with an extra line printing all the values
used to calculate the arguments of the mmap call.
--
Sander

 
> --
> Sander
> 
> 
>>
>> I don't know if there is that much time left for 4.14, since we are at
>> RC6 already.
>>
>> --
>> Sander
>>
>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky >>> wrote:
>>>
>>>> On 10/23/2017 10:44 PM, Fengguang Wu wrote:
>>>>> Greetings,
>>>>>
>>>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>>> master
>>>>>
>>>>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381
>>>>> Author: Craig Bergstrom 
>>>>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600
>>>>> Commit: Ingo Molnar 
>>>>> CommitDate: Fri Oct 20 09:48:00 2017 +0200
>>>>>
>>>>>  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
>>>>
>>>> Also note
>>>> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html
>>>>
>>>> -boris
>>>>
>>>
>>
> 

qemu-system-i386: -serial pty: char device redirected to /dev/pts/16 (label serial0)
[00:05.0] xen_pt_realize: Assigning real physical device 08:00.0 to devfn 0x28
[00:05.0] xen_pt_register_regions: IO region 0 registered (size=0x2000 base_addr=0xfe1fe000 type: 0x4)
[00:05.0] xen_pt_config_reg_init: Offset 0x000e mismatch! Emulated=0x0080, host=0x, syncing to 0x0080.
[00:05.0] xen_pt_config_reg_init: Offset 0x0010 mismatch! Emulated=0x, host=0xfe1fe004, syncing to 0xfe1fe004.
[00:05.0] xen_pt_config_reg_init: Offset 0x0052 mismatch! Emulated=0x, host=0x4803, syncing to 0x0003.
[00:05.0] xen_pt_config_reg_init: Offset 0x0072 mismatch! Emulated=0x, host=0x0086, syncing to 0x0080.
[00:05.0] xen_pt_config_reg_init: Offset 0x00a4 mismatch! Emulated=0x, host=0x8fc0, syncing to 0x8fc0.
[00:05.0] xen_pt_config_reg_init: Offset 0x00b2 mismatch! Emulated=0x, host=0x1012, syncing to 0x1012.
[00:05.0] xen_pt_msix_init: get MSI-X table BAR base 0xfe1fe000
[00:05.0] xen_pt_msix_init: table_off = 0x1000, total_entries = 8
[00:05.0] xen_pt_msix_init: table_off = 0x1000, total_entries = 8, PCI_MSIX_ENTRY_SIZE = 0x10,  msix->table_offset_adjust = 0,  msix->table_base = 0xfe1fe000
[00:05.0] xen_pt_msix_init: Error: Can't map physical MSI-X table: Invalid argument
[00:05.0] xen_pt_msix_size_init: Error: Internal error: Invalid xen_pt_msix_init.
Failed to initialize 12/15, type = 0x1, rc: -22
[00:05.0] xen_pt_msi_set_enable: disabling MSI.
*** Error in `/usr/local/lib/xen/bin/qemu-system-i386': corrupted size vs. prev_size: 0x55ce13565570 ***
=== Backtrace: =
/lib/x86_64-linux-gnu/libc.so.6(+0x70bcb)[0x7f700ab7ebcb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76f96)[0x7f700ab84f96]
/lib/x86_64-linux-gnu/libc.so.6(+0x77388)[0x7f700ab85388]
/lib/x86_64-linux-gnu/libc.so.6(+0x78dca)[0x7f700ab86dca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0x27b)[0x7f700ab89b4b]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_malloc0+0x21)[0x7f700bbbee61]
/usr/local/lib/xen/bin/qemu-system-i386(+0x6d78ee)[0x55ce114298ee]
/usr/local/lib/xen/bin/qemu-system-i386(+0x6d309e)[0x55ce1142509e]
/usr/local/lib/xen/bin/qemu-system-i386(+0x6d316f)[0x55ce1142516f]
/usr/local/lib/xen/bin/qemu-system-i386(+0x24d79b)[0x55ce10f9f79b]
/usr/local/lib/xen/bin/qemu-system-i386(+0x6da8bf)[0x55ce1142c8bf]
/usr/local/lib/xen/bin/qemu-system-i386(+0x70717c)[0x55ce1145917c]
/usr/local/lib/xen/

Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!

2017-10-26 Thread Sander Eikelenboom
On 26/10/17 19:49, Craig Bergstrom wrote:
> Sander, thanks for the details, they've been very useful.
> 
> I suspect that your host system's mem=2048M parameter is causing the
> problem.  Any chance you can confirm by removing the parameter and
> running the guest code path?

I removed it, but kept the hypervisor limiting dom0 memory to 2048M intact (in 
grub, using the Xen boot command: 
"multiboot   /xen-4.10.gz  dom0_mem=2048M,max:2048M .").

Unfortunately, that doesn't change anything; the guest still fails to start 
with the same errors.

> More specifically, since you're telling the kernel that its high
> memory address is at 2048M and your device is at 0xfe1fe000 (~4G), the
> new mmap() limits are preventing you from mapping addresses that are
> explicitly disallowed by the parameter.
> 

That would probably mean the current patch prevents hard-limiting dom0 
memory to a value below 4G, at least in combination with PCI passthrough. 
The only option left would then be to have no hard memory restriction on dom0 
and rely on auto-ballooning, but I'm not a great fan of that.

I don't know how KVM handles setting memory limits for the host system, but 
perhaps it suffers from the same issue.

I also tried the patch from one of your last mails that makes the check "less
strict", but I still get the same errors (when using the hard memory limits).

--
Sander

 
> 
> On Thu, Oct 26, 2017 at 10:39 AM, Ingo Molnar  wrote:
>>
>> * Craig Bergstrom  wrote:
>>
>>> Yes, not much time left for 4.14, it might be reasonable to pull the
>>> change out since it's causing problems. [...]
>>
>> Ok, I'll queue up a revert tomorrow morning and send it to Linus ASAP if there's
>> no good fix by then. In hindsight I should have queued it for v4.15 ...
>>
>> Thanks,
>>
>> Ingo



Linux 4.14-rc6 bisected regression tun devices not working anymore in openvpn

2017-10-28 Thread Sander Eikelenboom
L.S.,

While testing a Linux 4.14-rc6 kernel, I noticed that OpenVPN didn't function
anymore. My OpenVPN config uses tun devices and is pretty standard.
The OpenVPN version is the current Debian stable one: openvpn 2.4.0-6+deb9u2

From the OpenVPN logging:
Sat Oct 28 16:03:34 2017 us=175829 TUN/TAP device  opened
Sat Oct 28 16:03:34 2017 us=183027 Note: Cannot set tx queue length on : No such device (errno=19)
Sat Oct 28 16:03:34 2017 us=183055 do_ifconfig, tt->did_ifconfig_ipv6_setup=0
Sat Oct 28 16:03:34 2017 us=183071 /sbin/ip link set dev  up mtu 1500
Cannot find device ""
Sat Oct 28 16:03:34 2017 us=200445 Linux ip link set failed: external program exited with error status: 1
Sat Oct 28 16:03:34 2017 us=200482 Exiting due to fatal error
Sat Oct 28 16:38:17 2017 us=923381 TCP/UDP: Closing socket
Sat Oct 28 16:38:17 2017 us=925986 Closing TUN/TAP interface


The offending commit is: 
0ad646c81b2182f7fa67ec0c8c825e0ee165696d
"tun: call dev_get_valid_name() before register_netdevice()" 

Reverting this commit fixes the issue for me; unfortunately, the commit itself
seems to fix another issue.
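
For anyone wanting to check a kernel for this without OpenVPN, here is a minimal reproducer sketch: create a tun device via the TUNSETIFF ioctl with a "tun%d" name template and look at the name the kernel writes back. The OpenVPN log above shows an empty device name, so on an affected kernel one would expect the returned name to be empty; that expectation is an inference from the log, not something verified here. `tun_open()` and `tun_name_is_set()` are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* True if the kernel handed back a usable interface name. */
bool tun_name_is_set(const char *name)
{
	return name != NULL && name[0] != '\0';
}

/* Open /dev/net/tun and request a tun device from the "tun%d" template.
 * On success returns the fd and copies the kernel-assigned name into
 * name (IFNAMSIZ bytes). On failure returns -errno. Needs CAP_NET_ADMIN. */
int tun_open(char name[IFNAMSIZ])
{
	struct ifreq ifr;
	int fd;

	assert(name != NULL);
	fd = open("/dev/net/tun", O_RDWR);
	if (fd < 0)
		return -errno;

	memset(&ifr, 0, sizeof(ifr));
	ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
	strncpy(ifr.ifr_name, "tun%d", IFNAMSIZ - 1);

	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		int err = -errno;

		close(fd);
		return err;
	}

	/* The kernel writes the resolved name (e.g. "tun0") back into ifr. */
	memcpy(name, ifr.ifr_name, IFNAMSIZ);
	name[IFNAMSIZ - 1] = '\0';
	return fd;
}
```

Call tun_open() from a small main() as root and print the returned name. Since the device is not made persistent, it goes away again when the fd is closed.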

--
Sander


Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer

2018-09-27 Thread Sander Eikelenboom
On 27/09/18 16:26, Jens Axboe wrote:
> On 9/27/18 1:12 AM, Juergen Gross wrote:
>> On 22/09/18 21:55, Boris Ostrovsky wrote:
>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>> added support for purging persistent grants when they are not in use. As
>>> part of the purge, the grants were removed from the grant buffer, This
>>> eventually causes the buffer to become empty, with BUG_ON triggered in
>>> get_free_grant(). This can be observed even on an idle system, within
>>> 20-30 minutes.
>>>
>>> We should keep the grants in the buffer when purging, and only free the
>>> grant ref.
>>>
>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>> Signed-off-by: Boris Ostrovsky 
>>
>> Reviewed-by: Juergen Gross 
> 
> Since Konrad is out, I'm going to queue this up for 4.19.
> 

Hi Boris/Juergen.

Last week I tested a linux-4.19-rc4 kernel with xen-next and this patch from
Boris pulled on top. Unfortunately it made a VM hang (probably because its rootFS
is shuffled from under its feet), and it gave these in dom0 dmesg:

[ 9251.696090] xen-blkback: requesting a grant already in use
[ 9251.705861] xen-blkback: trying to add a gref that's already in the tree
[ 9251.715781] xen-blkback: requesting a grant already in use
[ 9251.725756] xen-blkback: trying to add a gref that's already in the tree
[ 9251.735698] xen-blkback: requesting a grant already in use
[ 9251.745573] xen-blkback: trying to add a gref that's already in the tree

The VM was an HVM guest with 4 vCPUs and 2 phy disks:
xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) persistent grants
xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) persistent grants


For a couple of days now I have been running 4.19-rc5 with xen-next on top and
commit a46b53672b2c reverted. That seems to run stably for me (it's a small box,
so I'm not hit by what a46b53672b2c tried to fix).

If you can come up with a debug patch, I can give it a spin tomorrow evening or
over the weekend, so that we are hopefully still in time for the 4.19 release.
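
The failure mode described in the commit message can be illustrated with a deliberately simplified model of the grant buffer. Everything below is hypothetical (`struct grant_pool`, `pool_peek()`, `purge_remove()`, `purge_keep()`), it assumes every grant in the buffer is idle at purge time, and it is not blkfront's actual code. It only shows why dropping purged grants from the buffer eventually empties it, while keeping them and releasing only the grant reference does not.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4
#define GREF_INVALID (-1)	/* hypothetical "no grant reference" marker */

/* Hypothetical, heavily simplified grant buffer: entries 0..count-1 are
 * the grants currently available in the buffer. */
struct grant_pool {
	int gref[POOL_SIZE];
	size_t count;
};

/* Return the index of the grant that would be handed out next. The real
 * driver hits a BUG_ON when the buffer is empty; assert() stands in. */
size_t pool_peek(const struct grant_pool *p)
{
	assert(p->count > 0);
	return p->count - 1;
}

/* Purge as described for a46b53672b2c: the idle grant is dropped from the
 * buffer altogether, so repeated purges shrink it towards empty. */
void purge_remove(struct grant_pool *p)
{
	if (p->count > 0)
		p->count--;
}

/* Purge as proposed in the fix: keep every entry in the buffer and only
 * release its grant reference, so the buffer never shrinks. */
void purge_keep(struct grant_pool *p)
{
	for (size_t i = 0; i < p->count; i++)
		p->gref[i] = GREF_INVALID;
}
```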

--
Sander


Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer

2018-09-27 Thread Sander Eikelenboom
On 27/09/18 21:06, Boris Ostrovsky wrote:
> On 9/27/18 2:56 PM, Jens Axboe wrote:
>> On 9/27/18 12:52 PM, Sander Eikelenboom wrote:
>>> On 27/09/18 16:26, Jens Axboe wrote:
>>>> On 9/27/18 1:12 AM, Juergen Gross wrote:
>>>>> On 22/09/18 21:55, Boris Ostrovsky wrote:
>>>>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>>>>> added support for purging persistent grants when they are not in use. As
>>>>>> part of the purge, the grants were removed from the grant buffer, This
>>>>>> eventually causes the buffer to become empty, with BUG_ON triggered in
>>>>>> get_free_grant(). This can be observed even on an idle system, within
>>>>>> 20-30 minutes.
>>>>>>
>>>>>> We should keep the grants in the buffer when purging, and only free the
>>>>>> grant ref.
>>>>>>
>>>>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>>>>> Signed-off-by: Boris Ostrovsky 
>>>>> Reviewed-by: Juergen Gross 
>>>> Since Konrad is out, I'm going to queue this up for 4.19.
>>>>
>>> Hi Boris/Juergen.
>>>
>>> Last week i tested a linux-4.19-rc4 kernel with xen-next and this patch 
>>> from Boris pulled on top. 
>>> Unfortunately it made a VM hang (probably because it's rootFS is shuffled 
>>> from under it's feet 
> 
> What do you mean by "rootFS is shuffled from under it's feet " ?

My assumption is that blkfront got borked, resulting in either a kernel crash or
the rootfs being remounted read-only. I didn't (try to) check, though.

>>> and it gave these in dom0 dmesg:
>>>
>>> [ 9251.696090] xen-blkback: requesting a grant already in use
>>> [ 9251.705861] xen-blkback: trying to add a gref that's already in the tree
>>> [ 9251.715781] xen-blkback: requesting a grant already in use
>>> [ 9251.725756] xen-blkback: trying to add a gref that's already in the tree
>>> [ 9251.735698] xen-blkback: requesting a grant already in use
>>> [ 9251.745573] xen-blkback: trying to add a gref that's already in the tree
>>>
>>> The VM was a HVM with 4 vcpu's and 2 phy disks:
>>> xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) persistent grants
>>> xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) persistent grants
>>>
>>>
>>> Currently i have been running 4.19-rc5 with xen-next on top and commit
>>> a46b53672b2c reverted, for a couple of days. That seems to run stable
>>> for me (since it's a small box so i'm not hit by what a46b53672b2c
>>> tried to fix.
>>>
>>> If you can come up with a debug patch i can give that a spin tomorrow
>>> evening or in the weekend, so we are hopefully still in time for the
>>> 4.19 release.
>> At this late in the game, might make more sense to simply revert the
>> buggy commit.  Especially since what is currently out there doesn't fix
>> the issue for you.
I don't know if Boris or Juergen have a hunch about the issue; if not, perhaps a
revert is best.

> If decision is to revert then I think the whole series needs to be
> reverted.
> 
> -boris
> 

For Boris and Juergen:
Would it make sense to have a "xen-next" branch in the xen-tip tree that:
- is based on the previous stable kernel;
- has the for-linus branches for the upcoming kernel release on top;
- has the patches for net(-next) and block changes on top (since these don't go
  via the tree but only via mailing-list patches, which are scattered and
  difficult to track and use for automated testing);
- has dependency patches for the above, if necessary, to be able to build.

That way there would be one branch that can be used to test ALL pending
kernel-related Xen patches, and which could be used in OSStest without as many
potential false alarms as linux-next will have?

--
Sander


Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer

2018-09-27 Thread Sander Eikelenboom
On 27/09/18 23:48, Boris Ostrovsky wrote:
> On 9/27/18 5:37 PM, Jens Axboe wrote:
>> On 9/27/18 2:33 PM, Sander Eikelenboom wrote:
>>> On 27/09/18 21:06, Boris Ostrovsky wrote:
>>>> On 9/27/18 2:56 PM, Jens Axboe wrote:
>>>>> On 9/27/18 12:52 PM, Sander Eikelenboom wrote:
>>>>>> On 27/09/18 16:26, Jens Axboe wrote:
>>>>>>> On 9/27/18 1:12 AM, Juergen Gross wrote:
>>>>>>>> On 22/09/18 21:55, Boris Ostrovsky wrote:
>>>>>>>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>>>>>>>> added support for purging persistent grants when they are not in use. As
>>>>>>>>> part of the purge, the grants were removed from the grant buffer, This
>>>>>>>>> eventually causes the buffer to become empty, with BUG_ON triggered in
>>>>>>>>> get_free_grant(). This can be observed even on an idle system, within
>>>>>>>>> 20-30 minutes.
>>>>>>>>>
>>>>>>>>> We should keep the grants in the buffer when purging, and only free the
>>>>>>>>> grant ref.
>>>>>>>>>
>>>>>>>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants")
>>>>>>>>> Signed-off-by: Boris Ostrovsky 
>>>>>>>> Reviewed-by: Juergen Gross 
>>>>>>> Since Konrad is out, I'm going to queue this up for 4.19.
>>>>>>>
>>>>>> Hi Boris/Juergen.
>>>>>>
>>>>>> Last week i tested a linux-4.19-rc4 kernel with xen-next and this patch 
>>>>>> from Boris pulled on top. 
>>>>>> Unfortunately it made a VM hang (probably because it's rootFS is 
>>>>>> shuffled from under it's feet 
>>>> What do you mean by "rootFS is shuffled from under it's feet " ?
>>> Assumption that block-front getting borked and either a kernel crash or 
>>> rootfs becoming mounted readonly. Didn't (try) to check though.
>>>
>>>>>> and it gave these in dom0 dmesg:
>>>>>>
>>>>>> [ 9251.696090] xen-blkback: requesting a grant already in use
>>>>>> [ 9251.705861] xen-blkback: trying to add a gref that's already in the tree
>>>>>> [ 9251.715781] xen-blkback: requesting a grant already in use
>>>>>> [ 9251.725756] xen-blkback: trying to add a gref that's already in the tree
>>>>>> [ 9251.735698] xen-blkback: requesting a grant already in use
>>>>>> [ 9251.745573] xen-blkback: trying to add a gref that's already in the tree
>>>>>>
>>>>>> The VM was a HVM with 4 vcpu's and 2 phy disks:
>>>>>> xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) persistent grants
>>>>>> xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) persistent grants
>>>>>>
>>>>>>
>>>>>> Currently i have been running 4.19-rc5 with xen-next on top and commit
>>>>>> a46b53672b2c reverted, for a couple of days. That seems to run stable
>>>>>> for me (since it's a small box so i'm not hit by what a46b53672b2c
>>>>>> tried to fix.
>>>>>>
>>>>>> If you can come up with a debug patch i can give that a spin tomorrow
>>>>>> evening or in the weekend, so we are hopefully still in time for the
>>>>>> 4.19 release.
>>>>> At this late in the game, might make more sense to simply revert the
>>>>> buggy commit.  Especially since what is currently out there doesn't fix
>>>>> the issue for you.
>>> Don't know if Boris or Juergen have a hunch about the issue, if not
>>> perhaps a revert is the best.
>> Anyone? Unless I hear otherwise, I'll revert the series tomorrow.
> 
> Juergen may have something to say by tomorrow, but from my perspective,
> given that we are coming up on rc6 --- yes.
> 
> I looked at the patches again and didn't see anything obvious.
> 
> -boris

It could also be that what I hit is a latent bug that is not caused by these
patches but merely got uncovered by them.

xl dmesg also shows quite a few of these:
(XEN) [2018-09-24 03:15:46.847] grant_table.c:1755:d14v0 Expanding d14 grant table from 19 to 20 frames
(XEN) [2018-09-24 03:15:46.849] grant_table.c:1755:d14v0 Expanding d14 grant table from 20 to 21 frames
(and it has done that for ages on my box, without leading to any direct problems
to my knowledge)

I don't know if these could be related, i.e. whether something around the
(persistent) grants for block devices could be leaking under some conditions?
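
For scale, a rough calculation, assuming d14 uses version-1 grant entries (8 bytes each: 16-bit flags, 16-bit domid, 32-bit frame) and 4 KiB grant-table frames: each frame then holds 4096 / 8 = 512 grant references, so growing from 19 to 21 frames means going from 9728 to 10752 grant references. A small sketch of that arithmetic (the struct only mirrors the v1 layout; it is not Xen's header):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of a version-1 grant entry: flags, domid, frame. */
struct grant_entry_v1_layout {
	uint16_t flags;
	uint16_t domid;
	uint32_t frame;
};

#define GNT_FRAME_SIZE 4096u	/* assumed: one grant-table frame = one 4 KiB page */

/* Number of v1 grant references held by `frames` grant-table frames. */
size_t grants_for_frames(unsigned int frames)
{
	_Static_assert(sizeof(struct grant_entry_v1_layout) == 8,
		       "a v1 grant entry is 8 bytes");
	return (size_t)frames *
	       (GNT_FRAME_SIZE / sizeof(struct grant_entry_v1_layout));
}
```

Whether that many grants is expected for a guest with 2 disks and 4 queues each using persistent grants depends on blkfront's per-queue limits, which this sketch does not model.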

--
Sander



Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Sander Eikelenboom

Thursday, December 4, 2014, 1:24:47 PM, you wrote:

> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>> 
>> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
>>>
>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>>>>
>>>> Instead of doing all this complex dance, we depend on the toolstack
>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
>>>> which 'xl' uses when a device is detached or attached from/to a guest.
>>>> It bypasses the need to worry about the PCI lock.
>>>
>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>>> proposed). 
>>>
>> 
>> Which does not work as the kobj will complain (as there is already an 
>> 'reset' associated with the PCI device).

> It is only needed if the core won't provide one.

> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> +{
> +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> +   struct device *dev = &pci->dev;
> +   int ret;
> +
> +   /* Already have a per-function reset? */
> +   if (pci_probe_reset_function(pci) == 0)
> +       return 0;
> +
> +   ret = device_create_file(dev, &dev_attr_reset);
> +   if (ret < 0)
> +       return ret;
> +   dev_data->created_reset_file = true;
> +   return 0;
> +}

Wouldn't the "core-reset-sysfs-file" still end up calling
"pci.c:__pci_dev_reset"?

The problem with that function is that, from my testing, the first option
"pci_dev_specific_reset" always seems to return success, so all the other options
(flr, pm, slot, bus) are skipped. However, the device itself is not reset
properly. (Perhaps pci_dev_specific_reset is good enough for non-virtualization
purposes, and it's probably the least intrusive. For virtualization, however, it
would be nice to be sure the device resets properly, or to have a way to force a
specific reset routine.)

So it's the ordering and skipping of the other resets that seems to make
this workaround necessary in the first place.
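
The ordering problem can be modelled in a few lines. As far as I can tell from pci.c of this era, __pci_dev_reset() tries the reset methods in a fixed order and stops at the first one that does not return -ENOTTY. The sketch below reproduces only that control flow with hypothetical stand-in functions (it leaves out methods such as the AF FLR), plus a hypothetical "forced method" variant of the kind asked for above. None of these names are kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum reset_method {
	RESET_DEV_SPECIFIC,	/* device-specific quirk */
	RESET_FLR,		/* PCIe function-level reset */
	RESET_PM,		/* D3hot/D0 power-management reset */
	RESET_SLOT,		/* hotplug slot reset */
	RESET_BUS,		/* secondary bus reset */
	RESET_NR_METHODS
};

/* Each hook returns 0 on success, -ENOTTY if it does not apply to the
 * device, or another negative errno on failure. */
typedef int (*reset_fn)(void *dev);

/* Fixed-order chain: the first hook that does not return -ENOTTY decides
 * the outcome, so a quirk that claims success hides all later methods. */
int reset_first_applicable(reset_fn methods[RESET_NR_METHODS], void *dev,
			   int *used)
{
	if (used)
		*used = -1;
	for (int i = 0; i < RESET_NR_METHODS; i++) {
		int rc;

		if (!methods[i])
			continue;
		rc = methods[i](dev);
		if (rc != -ENOTTY) {
			if (used)
				*used = i;
			return rc;
		}
	}
	return -ENOTTY;
}

/* Hypothetical alternative: run exactly the requested method. */
int reset_forced(reset_fn methods[RESET_NR_METHODS], enum reset_method m,
		 void *dev)
{
	assert(m < RESET_NR_METHODS);
	return methods[m] ? methods[m](dev) : -ENOTTY;
}

/* Example hooks; dev points to an int recording which method ran. */
int stub_quirk_claims_success(void *dev) { *(int *)dev = RESET_DEV_SPECIFIC; return 0; }
int stub_slot_reset(void *dev) { *(int *)dev = RESET_SLOT; return 0; }
int stub_not_supported(void *dev) { (void)dev; return -ENOTTY; }
```

Putting a quirk hook that always returns 0 in the first position makes reset_first_applicable() stop there and never try the slot reset, which matches the behaviour described above, while reset_forced(..., RESET_SLOT, ...) reaches the slot reset directly.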

> David





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Sander Eikelenboom

Thursday, December 4, 2014, 2:43:06 PM, you wrote:

> On 04/12/14 13:10, Sander Eikelenboom wrote:
>> 
>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>> 
>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>
>>>> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
>>>>>
>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>>>>
>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>>>>> It bypasses the need to worry about the PCI lock. 
>>>>>
>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>>>>> proposed). 
>>>>>
>>>>
>>>> Which does not work as the kobj will complain (as there is already an 
>>>> 'reset' associated with the PCI device).
>> 
>>> It is only needed if the core won't provide one.
>> 
>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>> +{
>>> +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>> +   struct device *dev = &pci->dev;
>>> +   int ret;
>>> +
>>> +   /* Already have a per-function reset? */
>>> +   if (pci_probe_reset_function(pci) == 0)
>>> +   return 0;
>>> +
>>> +   ret = device_create_file(dev, &dev_attr_reset);
>>> +   if (ret < 0)
>>> +   return ret;
>>> +   dev_data->created_reset_file = true;
>>> +   return 0;
>>> +}
>> 
>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>> "pci.c:__pci_dev_reset" ?
>> 
>> The problem with that function is that from my testing it seems that the
>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>> other options are skipped (flr, pm, slot, bus). However the device it self is
>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
>> none virtualization purposes and it's probably the least intrusive. For
>> virtualization however it would be nice to be sure it resets properly, or have a
>> way to force a specific reset routine.)

> Then you need work with the maintainer for those specific devices or
> drivers to fix their specific reset function.

> I'm not adding stuff to pciback to workaround broken quirks.

OK, that's a pretty clear message. So if one wants to use PCI and VGA
passthrough, one had better use KVM and vfio-pci.

vfio-pci has:
- logic to do the try-slot-bus-reset
- quirks specific to VGA passthrough
implemented in what looks like quite a clean driver.
(The main issue with it so far was that you could only seize devices based on
vendor and device ID, which can be a problem when you have multiple devices.
However, that was resolved recently, if I am correct.)

And neither of those will be supported by xen-pciback, if I get your message
right?

--
Sander

> David




Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Sander Eikelenboom
Hello Sander,

Thursday, December 4, 2014, 3:09:09 PM, you wrote:


> Thursday, December 4, 2014, 2:43:06 PM, you wrote:

>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>> 
>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>> 
>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>>
>>>>> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
>>>>>>
>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>>>>>
>>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>>>>>> It bypasses the need to worry about the PCI lock. 
>>>>>>
>>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly 
>>>>>> proposed). 
>>>>>>
>>>>>
>>>>> Which does not work as the kobj will complain (as there is already an 
>>>>> 'reset' associated with the PCI device).
>>> 
>>>> It is only needed if the core won't provide one.
>>> 
>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>>> +{
>>>> +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>>> +   struct device *dev = &pci->dev;
>>>> +   int ret;
>>>> +
>>>> +   /* Already have a per-function reset? */
>>>> +   if (pci_probe_reset_function(pci) == 0)
>>>> +   return 0;
>>>> +
>>>> +   ret = device_create_file(dev, &dev_attr_reset);
>>>> +   if (ret < 0)
>>>> +   return ret;
>>>> +   dev_data->created_reset_file = true;
>>>> +   return 0;
>>>> +}
>>> 
>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>>> "pci.c:__pci_dev_reset" ?
>>> 
>>> The problem with that function is that from my testing it seems that the
>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>>> other options are skipped (flr, pm, slot, bus). However the device it self is
>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
>>> none virtualization purposes and it's probably the least intrusive. For
>>> virtualization however it would be nice to be sure it resets properly, or have a
>>> way to force a specific reset routine.)

>> Then you need work with the maintainer for those specific devices or
>> drivers to fix their specific reset function.

>> I'm not adding stuff to pciback to workaround broken quirks.

> OK that's a pretty clear message there, so if one wants to use pci and vga
> passthrough one should better use KVM and vfio-pci. 

> vfio-pci has:
> - logic to do the try-slot-bus-reset logic
> - it has quirks specific to vga passthrough
Hrmm, I have to correct myself: the VGA passthrough quirks are part of the
vfio-pci code in QEMU.

The try-slot-bus-reset logic is part of the kernel vfio-pci driver, though, and
it seems they faced the same locking issue:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61cf16d8bd38c3dc52033ea75d5b1f8368514a17
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=890ed578df82f5b7b5a874f9f2fa4f117305df5f
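
The "try slot, else bus" idea can be sketched as below. This is a hedged model, not vfio-pci's code: the `struct reset_ops` hooks are hypothetical stand-ins for the kernel's slot and bus reset helpers. It prefers the narrower slot reset when the slot supports it, falls back to a bus reset otherwise, and uses non-blocking "try" hooks that return -EAGAIN instead of waiting on a busy lock, which appears to be the point of the two commits linked above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks standing in for the kernel's slot/bus reset helpers. */
struct reset_ops {
	int (*probe_slot)(void *dev);	/* 0 if a slot reset is possible */
	int (*probe_bus)(void *dev);	/* 0 if a bus reset is possible */
	int (*try_slot)(void *dev);	/* 0, -EAGAIN if a lock is busy, or -errno */
	int (*try_bus)(void *dev);	/* same convention as try_slot */
};

/* Prefer a slot reset; fall back to a bus reset only if the slot cannot be
 * reset. Never blocks: -EAGAIN is passed back so the caller can retry. */
int try_slot_then_bus_reset(const struct reset_ops *ops, void *dev,
			    bool *used_slot)
{
	bool slot;

	assert(ops != NULL);
	slot = ops->probe_slot(dev) == 0;
	if (!slot && ops->probe_bus(dev) != 0)
		return -ENODEV;
	if (used_slot)
		*used_slot = slot;
	return slot ? ops->try_slot(dev) : ops->try_bus(dev);
}

/* Example hooks for experimenting with the fallback order. */
int stub_ok(void *dev) { (void)dev; return 0; }
int stub_unsupported(void *dev) { (void)dev; return -ENOTTY; }
int stub_busy(void *dev) { (void)dev; return -EAGAIN; }
```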

> implemented in from the looks of it a quite clean driver.
> (the main issue with it so far was you could only seize devices based on 
> vendor and device id, which can be a problem when you have multiple devices.
> However that was resolved recently if i am correct.)

> And neither of those will be supported by xen-pciback if i get your message 
> right ?

> --
> Sander

>> David




Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Sander Eikelenboom

Thursday, December 4, 2014, 3:31:11 PM, you wrote:

> On 04/12/14 14:09, Sander Eikelenboom wrote:
>> 
>> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
>> 
>>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>>>
>>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>>>
>>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>>>
>>>>>> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
>>>>>>>
>>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>>>>>>>>
>>>>>>>> Instead of doing all this complex dance, we depend on the toolstack 
>>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute 
>>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest. 
>>>>>>>> It bypasses the need to worry about the PCI lock. 
>>>>>>>
>>>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly
>>>>>>> proposed).
>>>>>>>
>>>>>>
>>>>>> Which does not work as the kobj will complain (as there is already an 
>>>>>> 'reset' associated with the PCI device).
>>>>
>>>>> It is only needed if the core won't provide one.
>>>>
>>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>>>> +{
>>>>> +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>>>> +   struct device *dev = &pci->dev;
>>>>> +   int ret;
>>>>> +
>>>>> +   /* Already have a per-function reset? */
>>>>> +   if (pci_probe_reset_function(pci) == 0)
>>>>> +   return 0;
>>>>> +
>>>>> +   ret = device_create_file(dev, &dev_attr_reset);
>>>>> +   if (ret < 0)
>>>>> +   return ret;
>>>>> +   dev_data->created_reset_file = true;
>>>>> +   return 0;
>>>>> +}
>>>>
>>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling 
>>>> "pci.c:__pci_dev_reset" ?
>>>>
>>>> The problem with that function is that from my testing it seems that the
>>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>>>> other options are skipped (flr, pm, slot, bus). However the device it self is
>>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
>>>> none virtualization purposes and it's probably the least intrusive. For
>>>> virtualization however it would be nice to be sure it resets properly, or have a
>>>> way to force a specific reset routine.)
>> 
>>> Then you need work with the maintainer for those specific devices or
>>> drivers to fix their specific reset function.
>> 
>>> I'm not adding stuff to pciback to workaround broken quirks.
>> 
>> OK that's a pretty clear message there, so if one wants to use pci and vga
>> passthrough one should better use KVM and vfio-pci.

> Have you (or anyone else) ever raised the problem with the broken reset
> quirk for certain devices with the relevant maintainer?

>> vfio-pci has:
>> - logic to do the try-slot-bus-reset logic

> Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> as well.

That depends on what you call an "incorrect fix": it fixes a quirk.
You can say that's incorrect, but then you would have to remove 50% of
the kernel and Xen code as well.

(I do agree in general that it's better to strive for a generic solution, though;
that's exactly why I brought up that this function doesn't seem to work perfectly
for virtualization purposes.)

> It makes no sense for both pciback and vfio-pci to workaround problems
> with pci_function_reset() in different ways -- it should be fixed in the
> core PCI code so both can benefit and make use of the same code.

Well, perhaps Bjorn knows why the order of resets in "pci.c:__pci_dev_reset",
and the skipping of the remaining ones, was implemented that way?

In particular, what is the expectation of pci_dev_specific_reset doing a proper
reset for, say, a VGA card:
- I know it doesn't work on a Radeon card (the screen doesn't blank, on the next
  guest boot it reports it's already POSTed, and power management doesn't work),
- while with a slot/bus reset it all just works: the screen blanks immediately
  and everything else works as well.

I added Alex as well, since he added this workaround for KVM/vfio-pci; perhaps he
knows why he introduced the workaround in vfio-pci instead of trying to fix it in
the core PCI code?

--
Sander


> David





Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

2014-12-04 Thread Sander Eikelenboom

Thursday, December 4, 2014, 4:39:06 PM, you wrote:

> On Thu, 2014-12-04 at 15:50 +0100, Sander Eikelenboom wrote:
>> Thursday, December 4, 2014, 3:31:11 PM, you wrote:
>> 
>> > On 04/12/14 14:09, Sander Eikelenboom wrote:
>> >> 
>> >> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
>> >> 
>> >>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>> >>>>
>> >>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>> >>>>
>> >>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>> >>>>>>
>> >>>>>> On Dec 4, 2014 6:30 AM, David Vrabel  wrote:
>> >>>>>>>
>> >>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote: 
>> >>>>>>>>
>> >>>>>>>> Instead of doing all this complex dance, we depend on the toolstack
>> >>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
>> >>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest.
>> >>>>>>>> It bypasses the need to worry about the PCI lock.
>> >>>>>>>
>> >>>>>>> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly
>> >>>>>>> proposed).
>> >>>>>>>
>> >>>>>>
>> >>>>>> Which does not work as the kobj will complain (as there is already an 
>> >>>>>> 'reset' associated with the PCI device).
>> >>>>
>> >>>>> It is only needed if the core won't provide one.
>> >>>>
>> >>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>> >>>>> +{
>> >>>>> +   struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>> >>>>> +   struct device *dev = &pci->dev;
>> >>>>> +   int ret;
>> >>>>> +
>> >>>>> +   /* Already have a per-function reset? */
>> >>>>> +   if (pci_probe_reset_function(pci) == 0)
>> >>>>> +   return 0;
>> >>>>> +
>> >>>>> +   ret = device_create_file(dev, &dev_attr_reset);
>> >>>>> +   if (ret < 0)
>> >>>>> +   return ret;
>> >>>>> +   dev_data->created_reset_file = true;
>> >>>>> +   return 0;
>> >>>>> +}
>> >>>>
>> >>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
>> >>>> "pci.c:__pci_dev_reset" ?
>> >>>>
>> >>>> The problem with that function is that from my testing it seems that the
>> >>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>> >>>> other options are skipped (flr, pm, slot, bus). However the device it self is
>> >>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
>> >>>> none virtualization purposes and it's probably the least intrusive. For
>> >>>> virtualization however it would be nice to be sure it resets properly, or have a
>> >>>> way to force a specific reset routine.)
>> >> 
>> >>> Then you need work with the maintainer for those specific devices or
>> >>> drivers to fix their specific reset function.
>> >> 
>> >>> I'm not adding stuff to pciback to workaround broken quirks.
>> >> 
>> >> OK that's a pretty clear message there, so if one wants to use pci and vga
>> >> passthrough one should better use KVM and vfio-pci.
>> 
>> > Have you (or anyone else) ever raised the problem with the broken reset
>> > quirk for certain devices with the relevant maintainer?
>> 
>> >> vfio-pci has:
>> >> - logic to do the try-slot-bus-reset logic
>> 
>> > Just because vfio-pci fixed it incorrectly doesn'

if power button pressed 1x it creates the same ACPI event twice

2015-01-13 Thread Sander Eikelenboom
Hi Rafael / Len,

When I press the power button on my Intel NUC, I get two ACPI power events shortly
after one another (within a second), instead of just one.

It doesn't matter whether I have the old /proc/acpi interface enabled or disabled
in the kernel config.

I have tested a few older kernels lingering around on the box, and 3.14 already
has this problem. 3.2 (from the Debian repo) doesn't have the problem; it only
fires one event.

I did find another report:
Re: lenovo ultrabay docking station: if power button pressed 1x it creates 2x the same ACPI event
http://www.spinics.net/lists/linux-acpi/msg54723.html
But that hasn't come to a conclusion ...


Unfortunately 3.2 to 3.19-rc4 is a rather large bisect window, so that's
unfeasible :-)

I compiled in ACPI debug support and tried:
 echo "0x0008" > /sys/module/acpi/parameters/debug_layer
 echo "0x" > /sys/module/acpi/parameters/debug_level

But I don't get any extra output in dmesg?

Do you have any ideas for a debug patch, or better values, to figure out what is
going on?

--
Sander



Re: if power button pressed 1x it creates the same ACPI event twice

2015-01-13 Thread Sander Eikelenboom
Tuesday, January 13, 2015, 5:35:10 PM, you wrote:

> Hi Rafael / Len,

> When i press the power-button on my intel NUC, i get twp ACPI power-events shortly
> after one another (within a second), instead of just one.

> It doesn't matter if i have the old /proc/acpi interface enabled or disabled 
> in the kernel config.

> I have tested a few older kernels lingering around on the box and 3.14 already
> has this problem. 3.2 (from the debian repo) doesn't have the problem, it only
> fires one event.

> I did find another report:
> Re: lenovo ultrabay docking station: if power button pressed 1x it 
> creates 2x the same ACPI event
> http://www.spinics.net/lists/linux-acpi/msg54723.html
> But that hasn't come to a conclusion ...


> Unfortunately 3.2 - 3.19-rc4 is a bit of a largish bisect window, so that's 
> unfeasible :-)

> I compiled in apci debug support and tried:
>  echo "0x0008" > /sys/module/acpi/parameters/debug_layer
>  echo "0x" > /sys/module/acpi/parameters/debug_level

> But I don't get any extra output in dmesg?

> Do you have any ideas for a debug patch or better values to figure out what 
> is going on ?

> --
> Sander

Hmm is it normal that it registers in this way (twice) ?

[5.817980] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[5.837907] ACPI: Power Button [PWRB]
[5.847540] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[5.866915] ACPI: Power Button [PWRF]

--
Sander



3.19-rc4: BUG: unable to handle kernel paging request at ffff880055f15000 ovs_packet_cmd_execute+0x1f/0x229

2015-01-14 Thread Sander Eikelenboom
Hi,

I was testing 3.19-rc4 with openvswitch and encountered the splat below.

#addr2line -e /boot/vmlinux-3.19.0-rc4-creanuc-20150114-doflr-apicpatchv3-apicrevert+ 818a1690
/mnt/kernelbuild/linux-tip/net/openvswitch/datapath.c:527
--
Sander

[  463.033308] BUG: unable to handle kernel paging request at 880055f15000
[  463.072154] IP: [] ovs_packet_cmd_execute+0x1f/0x229
[  463.106202] PGD 1e10067 PUD 2097067 PMD 5ff54067 PTE 0
[  463.126940] Oops:  [#1] SMP
[  463.147505] Modules linked in:
[  463.166938] CPU: 2 PID: 3049 Comm: ovs-vswitchd Not tainted 3.19.0-rc4-creanuc-20150114-doflr-apicpatchv3-apicrevert+ #1
[  463.187507] Hardware name:  /D53427RKE, BIOS RKPPT10H.86A.0017.2013.0425.1251 04/25/2013
[  463.208553] task: 880058d3 ti: 880055c38000 task.ti: 880055c38000
[  463.229734] RIP: e030:[]  [] ovs_packet_cmd_execute+0x1f/0x229
[  463.251082] RSP: e02b:880055c3ba48  EFLAGS: 00010296
[  463.271786] RAX: 88004fe38818 RBX: 81ed4cc0 RCX: 
[  463.293072] RDX: 880055c3bb00 RSI: 880055c3bad0 RDI: 8800559dc700
[  463.314521] RBP: 8800559dc700 R08: 81b08d00 R09: 7000
[  463.336189] R10: 88004fe38814 R11: 81ed4cc0 R12: 880055f14fc0
[  463.356906] R13: 88004fe38800 R14: 880055f14fc0 R15: 81b08c60
[  463.377482] FS:  7f196321c700() GS:88005f70() knlGS:88005f68
[  463.398646] CS:  e033 DS:  ES:  CR0: 80050033
[  463.419995] CR2: 880055f15000 CR3: 5622e000 CR4: 00042660
[  463.441577] Stack:
[  463.462975]  000c 88004fe38814 0005 8130b116
[  463.485114]  81ed4cc0 81ed4cc0 8800559dc700 880055f14fc0
[  463.507367]  88004fe38800 0008 81b08c60 81794364
[  463.530186] Call Trace:
[  463.552330]  [] ? nla_parse+0x57/0xe7
[  463.574869]  [] ? genl_family_rcv_msg+0x243/0x2a9
[  463.597276]  [] ? __slab_alloc.constprop.63+0x2bb/0x2e5
[  463.619394]  [] ? genl_rcv_msg+0x38/0x5b
[  463.641361]  [] ? __netlink_lookup+0x3a/0x40
[  463.663192]  [] ? genl_family_rcv_msg+0x2a9/0x2a9
[  463.685141]  [] ? netlink_rcv_skb+0x36/0x7c
[  463.706874]  [] ? genl_rcv+0x1f/0x2c
[  463.729152]  [] ? netlink_unicast+0x100/0x19c
[  463.751315]  [] ? netlink_sendmsg+0x311/0x36b
[  463.772483]  [] ? do_sock_sendmsg+0x62/0x7b
[  463.793309]  [] ? copy_msghdr_from_user+0x158/0x17c
[  463.814032]  [] ? ___sys_sendmsg+0x11f/0x197
[  463.834595]  [] ? sock_poll+0xf2/0xfd
[  463.854970]  [] ? ep_send_events_proc+0x91/0x153
[  463.875603]  [] ? ep_read_events_proc+0x92/0x92
[  463.896168]  [] ? _raw_spin_unlock_irqrestore+0x42/0x5b
[  463.917050]  [] ? ep_scan_ready_list.isra.14+0x163/0x182
[  463.938458]  [] ? ep_poll+0x250/0x2c4
[  463.958214]  [] ? __sys_sendmsg+0x3b/0x5d
[  463.977581]  [] ? system_call_fastpath+0x12/0x17
[  463.996860] Code: ff 89 d8 5b 5d 41 5c 41 5d 41 5e c3 41 57 41 56 41 55 41 54 55 53 48 83 ec 28 48 8b 46 18 4c 8b 76 20 48 89 44 24 08 49 8b 46 08 <49> 8b 6e 40 48 85 c0 0f 84 e0 01 00 00 49 83 7e 10 00 0f 84 d5
[  464.037236] RIP  [] ovs_packet_cmd_execute+0x1f/0x229
[  464.056926]  RSP 
[  464.076182] CR2: 880055f15000
[  464.095097] ---[ end trace 8bcb28ced5309e55 ]---
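A worked check of the register dump, as a sketch. It assumes the archive only dropped the leading "ffff" from these addresses (the subject line keeps it) and that the bytes starting at <49> on the Code: line decode as commented below. Under those assumptions the faulting instruction is an 8-byte load from R14 + 0x40, and R14 + 0x40 is exactly CR2, the first byte of the page whose PTE is 0. Which structure R14 points to is not established here; that would need a disassembly of ovs_packet_cmd_execute from this exact build. The helper names are illustrative only.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Register values from the oops above, with the leading "ffff" that the
 * archive stripped restored from the subject line (an assumption). */
#define OOPS_R14 UINT64_C(0xffff880055f14fc0)
#define OOPS_CR2 UINT64_C(0xffff880055f15000)
#define PAGE_4K  UINT64_C(4096)

/* "<49> 8b 6e 40" decodes as mov 0x40(%r14),%rbp:
 *   0x49  REX prefix: W=1 (64-bit operand), B=1 (rm field selects r8..r15)
 *   0x8b  MOV r64, r/m64
 *   0x6e  ModRM: mod=01 (8-bit displacement), reg=101 (rbp), rm=110 (r14 via REX.B)
 *   0x40  the 8-bit displacement */
static uint64_t mov_disp8_address(uint64_t base, int8_t disp8)
{
    /* The CPU sign-extends 8-bit displacements before adding them. */
    return base + (uint64_t)(int64_t)disp8;
}

static uint64_t page_offset(uint64_t addr)
{
    return addr & (PAGE_4K - 1);
}

/* Prints where a load of `len` bytes at base+disp8 lands relative to its page. */
static void explain_fault(uint64_t base, int8_t disp8, unsigned len)
{
    assert(len > 0);
    uint64_t ea = mov_disp8_address(base, disp8);
    printf("base %#" PRIx64 " (page offset %#" PRIx64 ")\n", base, page_offset(base));
    printf("load of %u bytes at %#" PRIx64 " (page offset %#" PRIx64 ")\n",
           len, ea, page_offset(ea));
}
```

For example, mov_disp8_address(OOPS_R14, 0x40) gives 0xffff880055f15000, matching CR2, and page_offset(OOPS_R14) is 0xfc0, so the address in R14 lies 64 bytes before the end of its page; explain_fault(OOPS_R14, 0x40, 8) prints both facts.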



Re: [PATCH v3] [Bugfix] x86/apic: Fix xen IRQ allocation failure caused by commit b81975eade8c

2015-01-14 Thread Sander Eikelenboom

Wednesday, January 14, 2015, 4:09:35 AM, you wrote:

> Commit b81975eade8c ("x86, irq: Clean up irqdomain transition code")
> breaks xen IRQ allocation because xen_smp_prepare_cpus() doesn't invoke
> setup_IO_APIC(), so no irqdomains created for IOAPICs and
> mp_map_pin_to_irq() fails at the very beginning.

> So move creating of IOAPIC irqdomains from setup_IO_APIC() into
> arch_early_ioapic_init().

> Signed-off-by: Jiang Liu 
> Signed-off-by: David Vrabel 
> Reported-and-tested-by: Sander Eikelenboom 
> Cc: Konrad Rzeszutek Wilk 

Thanks Gerry !
Will you send (a backport) to stable for 3.17 and 3.18 when 
it's applied to -tip ?

--
Sander

> ---
>  arch/x86/kernel/apic/io_apic.c |   13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)

> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 3f5f60406ab1..1117c84cefe4 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -245,6 +245,8 @@ static void free_ioapic_saved_registers(int idx)
> ioapics[idx].saved_registers = NULL;
>  }
>  
> +static int mp_irqdomain_create(int ioapic);
> +
>  int __init arch_early_ioapic_init(void)
>  {
> struct irq_cfg *cfg;
> @@ -253,8 +255,10 @@ int __init arch_early_ioapic_init(void)
> if (!nr_legacy_irqs())
> io_apic_irqs = ~0UL;
>  
> -   for_each_ioapic(i)
> +   for_each_ioapic(i) {
> +   BUG_ON(mp_irqdomain_create(i));
> alloc_ioapic_saved_registers(i);
> +   }
>  
> /*
>  * For legacy IRQ's, start with assigning irq0 to irq15 to
> @@ -2371,16 +2375,12 @@ static void ioapic_destroy_irqdomain(int idx)
>  
>  void __init setup_IO_APIC(void)
>  {
> -   int ioapic;
> -
> /*
>  * calling enable_IO_APIC() is moved to setup_local_APIC for BP
>  */
> io_apic_irqs = nr_legacy_irqs() ? ~PIC_IRQS : ~0UL;
>  
> apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n");
> -   for_each_ioapic(ioapic)
> -   BUG_ON(mp_irqdomain_create(ioapic));
>  
> /*
>   * Set up IO-APIC IRQ routing.
> @@ -2929,7 +2929,8 @@ int mp_register_ioapic(int id, u32 address, u32 gsi_base,
> /*
>  * If mp_register_ioapic() is called during early boot stage when
>  * walking ACPI/SFI/DT tables, it's too early to create irqdomain,
> -* we are still using bootmem allocator. So delay it to setup_IO_APIC().
> +* we are still using bootmem allocator. So delay it to
> +* arch_early_ioapic_init().
>  */
> if (hotplug) {
> if (mp_irqdomain_create(idx)) {
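The ordering problem in the commit message can be modelled in a few lines of plain C. This is a toy sketch, not kernel code: every name ending in _model, the two-IOAPIC machine and the 24-pins-per-IOAPIC IRQ numbering are hypothetical, and one boolean per IOAPIC stands in for its irqdomain. It only shows why a boot path that never reaches setup_IO_APIC() (the Xen path in the commit message) ends up with no domains unless they are created in arch_early_ioapic_init(). Whether creating them for Xen PV is right at all is a separate question, which the v4 cover letter later in this thread answers in the negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_IOAPICS_MODEL      2  /* hypothetical machine with two IOAPICs */
#define PINS_PER_IOAPIC_MODEL 24 /* hypothetical IRQ numbering */

/* One flag per IOAPIC stands in for "this IOAPIC has an irqdomain". */
static bool domain_exists[NR_IOAPICS_MODEL];

static int mp_irqdomain_create_model(int ioapic)
{
    assert(ioapic >= 0 && ioapic < NR_IOAPICS_MODEL);
    domain_exists[ioapic] = true;
    return 0;
}

/* Fails when the IOAPIC has no domain, like the early mp_map_pin_to_irq()
 * failure described in the commit message. */
static int map_pin_to_irq_model(int ioapic, int pin)
{
    assert(ioapic >= 0 && ioapic < NR_IOAPICS_MODEL);
    if (!domain_exists[ioapic])
        return -1;
    return ioapic * PINS_PER_IOAPIC_MODEL + pin;
}

/* create_early == false: domains created in setup_IO_APIC() (before the patch).
 * create_early == true:  domains created in arch_early_ioapic_init() (after).
 * native == false models xen_smp_prepare_cpus(), which skips setup_IO_APIC(). */
static void boot_model(bool create_early, bool native)
{
    memset(domain_exists, 0, sizeof(domain_exists));

    /* arch_early_ioapic_init(): reached on both boot paths. */
    if (create_early)
        for (int i = 0; i < NR_IOAPICS_MODEL; i++)
            (void)mp_irqdomain_create_model(i);

    /* setup_IO_APIC(): reached from native_smp_prepare_cpus() only. */
    if (native && !create_early)
        for (int i = 0; i < NR_IOAPICS_MODEL; i++)
            (void)mp_irqdomain_create_model(i);
}
```

Calling boot_model(false, false), the old placement on the Xen path, leaves map_pin_to_irq_model(0, 9) returning -1; after boot_model(true, false) the same call returns 9.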



3.19-rc4: Xen pci-passthrough regression, bisected to commit cffe0a2b5a34c95a4dadc9ec7132690a5b0f6687 "x86, irq: Keep balance of IOAPIC pin reference count"

2015-01-14 Thread Sander Eikelenboom
Hi Gerry / David / Konrad,

Some more testing uncovered another issue under Xen, this time with 
PCI-passthrough.

I have bisected it to the following commit: 
cffe0a2b5a34c95a4dadc9ec7132690a5b0f6687 "x86, irq: Keep balance of IOAPIC pin 
reference count"

It causes these symptoms:

- On Intel
  - Running on Xen with PCI devices seized on host boot with the xen-pciback.hide= parameter
  - Running an HVM guest with PCI passthrough of two devices (NIC + wireless NIC)
  - While the driver loads fine, the device isn't working properly; /proc/interrupts
    in the guest shows that it doesn't receive any interrupts.
  - Reverting this particular commit (in the dom0 kernel only) makes the device
    receive interrupts and work properly again.

- On AMD (more subtle symptom)
  - Running on Xen with PCI devices seized on host boot with the xen-pciback.hide= parameter
  - Running an HVM guest with PCI passthrough of one device (video grabber)
  - While the driver loads fine and the device looks like it's working, the video
    stream isn't stable and it skips or repeats frames.
  - Reverting this particular commit (in the dom0 kernel only) makes the device
    work properly again with a stable video stream.
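To make the "no interrupts in the guest" check repeatable rather than done by eye, one option is to total the per-CPU counters on the device's /proc/interrupts line before and after generating traffic. A minimal sketch, assuming the usual layout (an "N:" label, one decimal counter per CPU, then the chip and device names); sum_irq_counts() is a hypothetical helper, not an existing tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sums the per-CPU counters on one /proc/interrupts line, e.g.
 *   " 41:        120         35   PCI-MSI-edge      wlan0"
 * Returns -1 when the line has no "label:" prefix (such as the CPU header).
 * Counting stops at the first field that does not start with a digit,
 * which is the interrupt chip name in the usual layout. */
static long long sum_irq_counts(const char *line)
{
    assert(line != NULL);
    const char *p = strchr(line, ':');
    if (p == NULL)
        return -1;
    p++; /* skip the ':' after the IRQ label */

    long long total = 0;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;
        char *end;
        total += strtoll(p, &end, 10);
        p = end;
    }
    return total;
}
```

For an invented line such as " 41:  120  35  PCI-MSI-edge  wlan0" it returns 155; a total that stays at 0 while the device is in use would match the Intel symptom above.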

--
Sander



Re: [Bugfix v4 0/2] Fix xen IRQ allocation failure caused by commit b81975eade8c

2015-01-15 Thread Sander Eikelenboom

Thursday, January 15, 2015, 2:04:34 PM, you wrote:

> With more knowledge of Xen interrupt management subsystem, I realized
> previous three versions to fix https://lkml.org/lkml/2014/12/19/178 are
> just plainly wrong. Those patches try to fix the issue by creating
> irqdomain for IOAPICs for PV domains, which effectively let native
> IOAPIC driver and Xen PV interrupt management subsystem to manage
> IOAPIC irqs concurrently, sounds unpredictable.

> Sorry for those wrong fixes. The good news is that the new fix does
> make code simpler and easier to maintain.

> I have tested the patchset on Intel platform with bare metal and Dom0
> kernels.

> Hi Sander,
> Could you please help to test it again?

> Regards!
> Gerry

Hi Gerry,

Sure, I will test this afternoon and report back.

Thanks

--
Sander

> Jiang Liu (2):
>   xen/pci: Kill function xen_setup_acpi_sci()
>   xen/pci: Simplify x86/pci/xen.c by killing gsi_override related code

>  arch/x86/kernel/acpi/boot.c |   26 -
>  arch/x86/pci/xen.c  |   68 
> ---
>  2 files changed, 19 insertions(+), 75 deletions(-)




Re: [Bugfix v4 0/2] Fix xen IRQ allocation failure caused by commit b81975eade8c

2015-01-15 Thread Sander Eikelenboom

Thursday, January 15, 2015, 2:04:34 PM, you wrote:

> With more knowledge of Xen interrupt management subsystem, I realized
> previous three versions to fix https://lkml.org/lkml/2014/12/19/178 are
> just plainly wrong. Those patches try to fix the issue by creating
> irqdomain for IOAPICs for PV domains, which effectively let native
> IOAPIC driver and Xen PV interrupt management subsystem to manage
> IOAPIC irqs concurrently, sounds unpredictable.

> Sorry for those wrong fixes. The good news is that the new fix does
> make code simpler and easier to maintain.

> I have tested the patchset on Intel platform with bare metal and Dom0
> kernels.

> Hi Sander,
> Could you please help to test it again?

> Regards!
> Gerry

Hi Gerry,

These patches fix the first symptom of the power button not working.

Unfortunately they don't fix the second symptom with PCI passthrough:
the device still doesn't receive IRQs on Intel, and the video device still has
issues on AMD.

What I have tested extensively and what works stably for me is:
David's patch + revert of cffe0a2b5a34c95a4dadc9ec7132690a5b0f6687 "x86, irq:
Keep balance of IOAPIC pin reference count"

Hope that helps in finding the solution.

--
Sander


> Jiang Liu (2):
>   xen/pci: Kill function xen_setup_acpi_sci()
>   xen/pci: Simplify x86/pci/xen.c by killing gsi_override related code

>  arch/x86/kernel/acpi/boot.c |   26 -
>  arch/x86/pci/xen.c  |   68 
> ---
>  2 files changed, 19 insertions(+), 75 deletions(-)




media

2015-01-10 Thread Sander Eikelenboom
Hi all, 

With a 3.19-rc3 kernel I'm running into the warning below on boot with a
pvrusb2 device.

It's hitting this warn:
/*
 * Drivers MUST fill in device_caps, so check for this and
 * warn if it was forgotten.
 */
WARN_ON(!(cap->capabilities & V4L2_CAP_DEVICE_CAPS) ||
!cap->device_caps);

--
Sander

[   25.604846] [ cut here ]
[   25.622165] WARNING: CPU: 4 PID: 2133 at drivers/media/v4l2-core/v4l2-ioctl.c:1025 v4l_querycap+0x3e/0x70()
[   25.654888] Modules linked in:
[   25.667494] CPU: 4 PID: 2133 Comm: v4l_id Not tainted 3.19.0-rc3-20150110-pciback-doflr+ #1
[   25.695927] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[   25.723022]  81fcf468 880546fefbc8 81bb3fc9 880544a83240
[   25.748684]   880546fefc08 810c738d 880544a83240
[   25.774301]  880546fefd38 80685600  880544b78e00
[   25.799936] Call Trace:
[   25.810584]  [] dump_stack+0x45/0x57
[   25.829319]  [] warn_slowpath_common+0x8d/0xd0
[   25.850669]  [] warn_slowpath_null+0x15/0x20
[   25.871477]  [] v4l_querycap+0x3e/0x70
[   25.890692]  [] __video_do_ioctl+0x284/0x300
[   25.911478]  [] ? __lock_acquire+0x4e6/0x21a0
[   25.932412]  [] video_usercopy+0x1f1/0x4e0
[   25.952525]  [] ? v4l_printk_ioctl+0xa0/0xa0
[   25.973136]  [] ? trace_hardirqs_on+0xd/0x10
[   25.993791]  [] video_ioctl2+0x10/0x20
[   26.012844]  [] pvr2_v4l2_ioctl+0xa7/0x180
[   26.032939]  [] v4l2_ioctl+0x12f/0x150
[   26.051927]  [] do_vfs_ioctl+0x83/0x5b0
[   26.071161]  [] ? final_putname+0x21/0x50
[   26.090911]  [] ? sysret_check+0x22/0x5d
[   26.110395]  [] SyS_ioctl+0x47/0x90
[   26.128537]  [] system_call_fastpath+0x12/0x17
[   26.149531] ---[ end trace 52e366625e9023ef ]---
[   26.166528] [ cut here ]



Re: [Bugfix] x86/apic: Fix xen IRQ allocation failure caused by commit b81975eade8c

2015-01-10 Thread Sander Eikelenboom

Wednesday, January 7, 2015, 7:13:49 AM, you wrote:

> Commit b81975eade8c ("x86, irq: Clean up irqdomain transition code")
> breaks xen IRQ allocation because xen_smp_prepare_cpus() doesn't invoke
> setup_IO_APIC(), so no irqdomains created for IOAPICs and
> mp_map_pin_to_irq() fails at the very beginning.

> Enhance xen_smp_prepare_cpus() to call setup_IO_APIC() to initialize
> irqdomain for IOAPICs.

> Signed-off-by: Jiang Liu 
> Reported-and-tested-by: Sander Eikelenboom 
> Cc: Konrad Rzeszutek Wilk 
> ---
> Hi all,
> This patch should be backported to v3.17, but there are
> conflicts. So I will send backported patch to 3.17/3.18 stable tree
> once this patch has been merged into mainstream kernel.
> Thanks!
> Gerry

Hi Gerry / Konrad / Thomas,

This patch doesn't apply cleanly to current linux-tip.

Unfortunately the "Tested-by" seems only valid for the Intel hardware I have
(Intel NUC). Testing on AMD delivered some interesting results:

- Under Xen: Host freeze early in dom0 kernel boot, unfortunately no more info.

- On bare metal with the IOMMU enabled and ivrs_ioapic[6]=00:14.0 ivrs_hpet[0]=00:14.0
  as a command-line override for a borked BIOS:
  It doesn't boot and spits out:
  
  [0.339811] AMD-Vi: Command-line override present for HPET id 0 - ignoring
  [0.460563] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
  [0.511535] Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
  [0.537042] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-rc3-20150110-pciback-doflr-apic-fixed+ #1
  [0.564887] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
  [0.588558]  88054b00c000 880547d8fda8 81bb3fc9 880547db
  [0.610722]  81f20070 880547d8fe28 81baeeb9 880547d8fde8
  [0.632886]  0008 880547d8fe38 880547d8fdd8 fffea093
  [0.655052] Call Trace:
  [0.662349]  [] dump_stack+0x45/0x57
  [0.677712]  [] panic+0xcd/0x212
  [0.692026]  [] panic_if_irq_remap+0x17/0x20
  [0.709460]  [] setup_IO_APIC+0x2bb/0x74c
  [0.726112]  [] native_smp_prepare_cpus+0x2c9/0x35a
  [0.745365]  [] kernel_init_freeable+0x153/0x298
  [0.763839]  [] ? kernel_init+0x9/0xf0
  [0.779712]  [] ? finish_task_switch+0x8b/0x100
  [0.797937]  [] ? rest_init+0xc0/0xc0
  [0.813549]  [] kernel_init+0x9/0xf0
  [0.828904]  [] ret_from_fork+0x7c/0xb0
  [0.845036]  [] ? rest_init+0xc0/0xc0
  [0.860660] ---[ end Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC


- On bare metal with the IOMMU enabled and without the command-line overrides:
  It boots, and the IOMMU is disabled (as expected), but I also get this lockdep trace:

  [0.339808] [Firmware Bug]: AMD-Vi: IOAPIC[6] not in IVRS table
  [0.357519] [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found
  [0.375220] AMD-Vi: Disabling interrupt remapping
  [0.389723] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
  [0.440685] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
  [0.458128] ...trying to set up timer (IRQ0) through the 8259A ...
  [0.476602] . (found apic 0 pin 2) ...
  [0.488834] [ cut here ]
  [0.502631] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:2744 lockdep_trace_alloc+0x12c/0x140()
  [0.530215] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
  [0.546347] Modules linked in:
  [0.556006] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-rc3-20150110-pciback-doflr-apic-fixed+ #1
  [0.583839] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
  [0.607512]  81f2408e 880547d8fcd8 81bb3fc9 880547db
  [0.629675]  880547d8fd28 880547d8fd18 810c738d 880547d8fd08
  [0.651840]  880547db 0086 0080
  [0.674004] Call Trace:
  [0.681302]  [] dump_stack+0x45/0x57
  [0.696653]  [] warn_slowpath_common+0x8d/0xd0
  [0.714605]  [] warn_slowpath_fmt+0x41/0x50
  [0.731780]  [] ? vprintk_emit+0x312/0x5f0
  [0.748692]  [] lockdep_trace_alloc+0x12c/0x140
  [0.766906]  [] kmem_cache_alloc_node+0x3f/0x160
  [0.785379]  [] ? __add_pin_to_irq_node+0x6c/0xc0
  [0.804112]  [] __add_pin_to_irq_node+0x6c/0xc0
  [0.822326]  [] setup_IO_APIC+0x34f/0x74c
  [0.838990]  [] native_smp_prepare_cpus+0x2c9/0x35a
  [0.858243]  [] kernel_init_freeable+0x153/0x298
  [0.876719]  [] ? kernel_init+0x9/0xf0
  [0.892602]  [] ? finish_task_switch+0x8b/0x100
  [0.910815]  [] ? rest_init+0xc0/0xc0
  [0.926428]  [] kernel_init+0x9/0xf0
  [0.941782]  [] ret_from_fork+0x7c/0xb0
  [0.957913]  [] ? rest_init+0xc0/0xc0
  [0.973531] ---[ end trace 5f14749f8239057a ]---
  [1.020338] .

Re: [Xen-devel] [PATCH v2] [Bugfix] x86/apic: Fix xen IRQ allocation failure caused by commit b81975eade8c

2015-01-12 Thread Sander Eikelenboom

Monday, January 12, 2015, 4:01:00 PM, you wrote:

> On 12/01/15 13:39, Jiang Liu wrote:
>> Commit b81975eade8c ("x86, irq: Clean up irqdomain transition code")
>> breaks xen IRQ allocation because xen_smp_prepare_cpus() doesn't invoke
>> setup_IO_APIC(), so no irqdomains created for IOAPICs and
>> mp_map_pin_to_irq() fails at the very beginning.
>> 
>> Enhance xen_smp_prepare_cpus() to call setup_IO_APIC() to initialize
>> irqdomain for IOAPICs.

> Having Xen call setup_IO_APIC() to initialize the irq domains then having to
> add special cases to it is just wrong.

> The bits of init deferred by mp_register_apic() are also deferred to
> two different places which looks odd.

> What about something like the following (untested) patch?

Hi David / Gerry,

David's patch (after fixing a few compile issues) fixes the problem.

The power button now works for me on:
- intel baremetal
- intel xen
- amd baremetal (no issues with the override anymore)
- amd xen   (no freeze issues anymore)

Big thanks David !

--
Sander


> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 3f5f604..e180680 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -253,8 +253,10 @@ int __init arch_early_ioapic_init(void)
> if (!nr_legacy_irqs())
> io_apic_irqs = ~0UL;
>  
> -   for_each_ioapic(i)
> +   for_each_ioapic(i) {
> +   BUG_ON(mp_irqdomain_create(ioapic));
> alloc_ioapic_saved_registers(i);
> +   }
>  
> /*
>  * For legacy IRQ's, start with assigning irq0 to irq15 to
> @@ -2379,8 +2381,6 @@ void __init setup_IO_APIC(void)
> io_apic_irqs = nr_legacy_irqs() ? ~PIC_IRQS : ~0UL;
>  
> apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n");
> -   for_each_ioapic(ioapic)
> -   BUG_ON(mp_irqdomain_create(ioapic));
>  
> /*
>   * Set up IO-APIC IRQ routing.
> @@ -2929,7 +2929,8 @@ int mp_register_ioapic(int id, u32 address, u32 gsi_base,
> /*
>  * If mp_register_ioapic() is called during early boot stage when
>  * walking ACPI/SFI/DT tables, it's too early to create irqdomain,
> -* we are still using bootmem allocator. So delay it to setup_IO_APIC().
> +* we are still using bootmem allocator. So delay it to
> +* arch_early_ioapic_init().
>  */
> if (hotplug) {
> if (mp_irqdomain_create(idx)) {

>> --- a/arch/x86/kernel/apic/io_apic.c
>> +++ b/arch/x86/kernel/apic/io_apic.c
>> @@ -2369,6 +2381,15 @@ static void ioapic_destroy_irqdomain(int idx)
>>   ioapics[idx].pin_info = NULL;
>>  }
>>  
>> +static void setup_IO_APIC_IDs(void)
>> +{
>> + if (xen_domain())
>> + return;

> This would have to be xen_pv_domain().

> David
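David's remark turns on the difference between the two predicates: xen_domain() is true for any Xen guest, HVM included, while xen_pv_domain() is true only for PV guests. A small sketch modelled on the macros in include/xen/xen.h, re-expressed here as functions that take the domain type as a parameter instead of reading a global (skips_ioapic_ids() is a hypothetical helper), shows which guests the early return in the quoted setup_IO_APIC_IDs() hunk would skip with each check.

```c
#include <assert.h>
#include <stdbool.h>

/* Same three values as enum xen_domain_type in include/xen/xen.h. */
enum xen_domain_type { XEN_NATIVE, XEN_PV_DOMAIN, XEN_HVM_DOMAIN };

/* Function forms of the xen_domain() / xen_pv_domain() macros. */
static bool is_xen_domain(enum xen_domain_type t)
{
    return t != XEN_NATIVE;
}

static bool is_xen_pv_domain(enum xen_domain_type t)
{
    return is_xen_domain(t) && t == XEN_PV_DOMAIN;
}

/* Hypothetical helper: would "if (check()) return;" at the top of the quoted
 * setup_IO_APIC_IDs() hunk skip the IOAPIC ID setup for this guest type? */
static bool skips_ioapic_ids(enum xen_domain_type t, bool use_pv_check)
{
    assert(t == XEN_NATIVE || t == XEN_PV_DOMAIN || t == XEN_HVM_DOMAIN);
    return use_pv_check ? is_xen_pv_domain(t) : is_xen_domain(t);
}
```

With the xen_domain()-style check an HVM guest would skip the IOAPIC ID setup as well; with the xen_pv_domain()-style check only PV guests do.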




Re: if power button pressed 1x it creates the same ACPI event twice

2015-01-19 Thread Sander Eikelenboom
Hello Sander,

Tuesday, January 13, 2015, 5:47:24 PM, you wrote:

> Tuesday, January 13, 2015, 5:35:10 PM, you wrote:

>> Hi Rafael / Len,

>> When I press the power button on my Intel NUC, I get two ACPI power events
>> shortly after one another (within a second), instead of just one.

>> It doesn't matter if i have the old /proc/acpi interface enabled or disabled 
>> in the kernel config.

>> I have tested a few older kernels lingering around on the box, and 3.14
>> already has this problem. 3.2 (from the Debian repo) doesn't have the
>> problem; it only fires one event.

>> I did find another report:
>> Re: lenovo ultrabay docking station: if power button pressed 1x it 
>> creates 2x the same ACPI event
>> http://www.spinics.net/lists/linux-acpi/msg54723.html
>> But that hasn't come to a conclusion ...


>> Unfortunately 3.2 - 3.19-rc4 is a bit of a largish bisect window, so that's 
>> unfeasible :-)

>> I compiled in ACPI debug support and tried:
>>  echo "0x0008" > /sys/module/acpi/parameters/debug_layer
>>  echo "0x" > /sys/module/acpi/parameters/debug_level

>> But I don't get any extra output in dmesg?

>> Do you have any ideas for a debug patch or better values to figure out what 
>> is going on ?

>> --
>> Sander

> Hmm is it normal that it registers in this way (twice) ?

> [5.817980] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
> [5.837907] ACPI: Power Button [PWRB]
> [5.847540] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
> [5.866915] ACPI: Power Button [PWRF]

> --
> Sander

Hi Rafael / Len,

Got some more time to test:

- It is indeed generating one event for each of the two registered power buttons
  (which in reality are just one hardware button); printing acpid's "%e" revealed that:
  1421672289-161192235 | button/power PBTN 0080 
  1421672289-279647745 | button/power LNXPWRBN:00 0080 0004

So I tested on another box (AMD instead of Intel), and that also registers two
power buttons:

# dmesg | grep -i button
[   13.435060] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[   13.435294] ACPI: Power Button [PWRB]
[   13.435495] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[   13.435683] ACPI: Power Button [PWRF]
 
So the question is .. should both register and should scripts adjust to pick up 
just one of them ?
Or can, as a more general solution, just one be ignored when registering ?
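On the userspace side of the question, a script could act on only the first power event of a burst. A minimal sketch, assuming acpid "%e" lines shaped like the two above and that the part after the '-' in the timestamp is nanoseconds; struct power_filter and power_event_should_run() are hypothetical. This is a workaround for scripts, not an answer to whether the kernel should register both buttons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical filter state: time of the last power event acted upon. */
struct power_filter {
    bool seen;
    long long last_ns;
};

/* Parses a line such as
 *   "1421672289-161192235 | button/power PBTN 0080"
 * (seconds-nanoseconds, then the event). Returns true if the caller should
 * act on it: the first button/power event, or one arriving at least
 * window_ns after the last one acted upon. */
static bool power_event_should_run(struct power_filter *f, const char *line,
                                   long long window_ns)
{
    long long sec, nsec;
    char cls[64];

    assert(f != NULL && line != NULL);
    if (sscanf(line, "%lld-%lld | %63s", &sec, &nsec, cls) != 3)
        return false;
    if (strcmp(cls, "button/power") != 0)
        return false;

    long long now = sec * 1000000000LL + nsec;
    if (f->seen && now - f->last_ns < window_ns)
        return false; /* duplicate from the other registered button */
    f->seen = true;
    f->last_ns = now;
    return true;
}
```

With a one-second window, the PBTN line above is accepted and the LNXPWRBN:00 line about 118 ms later is dropped. Keying on the device name instead would also work on a given machine; the trade-off of a time window is that a genuine second press inside the window is swallowed too.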

--
Sander



Re: if power button pressed 1x it creates the same ACPI event twice

2015-01-19 Thread Sander Eikelenboom

Monday, January 19, 2015, 2:09:34 PM, you wrote:

> Hello Sander,

> Tuesday, January 13, 2015, 5:47:24 PM, you wrote:

>> Tuesday, January 13, 2015, 5:35:10 PM, you wrote:

>>> Hi Rafael / Len,

>>> When I press the power button on my Intel NUC, I get two ACPI power events
>>> shortly after one another (within a second), instead of just one.

>>> It doesn't matter if i have the old /proc/acpi interface enabled or 
>>> disabled in the kernel config.

>>> I have tested a few older kernels lingering around on the box, and 3.14
>>> already has this problem. 3.2 (from the Debian repo) doesn't have the
>>> problem; it only fires one event.

>>> I did find another report:
>>> Re: lenovo ultrabay docking station: if power button pressed 1x it 
>>> creates 2x the same ACPI event
>>> http://www.spinics.net/lists/linux-acpi/msg54723.html
>>> But that hasn't come to a conclusion ...


>>> Unfortunately 3.2 - 3.19-rc4 is a bit of a largish bisect window, so that's 
>>> unfeasible :-)

>>> I compiled in ACPI debug support and tried:
>>>  echo "0x0008" > /sys/module/acpi/parameters/debug_layer
>>>  echo "0x" > /sys/module/acpi/parameters/debug_level

>>> But I don't get any extra output in dmesg?

>>> Do you have any ideas for a debug patch or better values to figure out what 
>>> is going on ?

>>> --
>>> Sander

>> Hmm is it normal that it registers in this way (twice) ?

>> [5.817980] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
>> [5.837907] ACPI: Power Button [PWRB]
>> [5.847540] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
>> [5.866915] ACPI: Power Button [PWRF]

>> --
>> Sander

> Hi Rafael / Len,

> Got some more time to test:

> - It is indeed generating one event for each of the two registered power buttons
>   (which in reality are just one hardware button); printing acpid's "%e" revealed that:
>   1421672289-161192235 | button/power PBTN 0080 
>   1421672289-279647745 | button/power LNXPWRBN:00 0080 0004

> So I tested on another box (AMD instead of Intel), and that also registers two
> power buttons:

> # dmesg | grep -i button
> [   13.435060] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
> [   13.435294] ACPI: Power Button [PWRB]
> [   13.435495] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
> [   13.435683] ACPI: Power Button [PWRF]
>  
> So the question is .. should both register and should scripts adjust to pick 
> up just one of them ?
> Or can, as a more general solution, just one be ignored when registering ?

> --
> Sander

Whoops, sorry for the fragmentation of the report ... but I just thought the output
under kernel 3.2.0 could be handy as well, as that only fires one event:

[4.003753] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input1
[4.003838] ACPI: Power Button [PWRB]
[4.003977] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
[4.005530] ACPI: Power Button [PWRF]

It also registers two ..
But when the power button is pressed, it only fires this one:

1421673524-817561070 | button/power PBTN 0080 

--
Sander



Re: [RC6 Bell Chime] [PATCH 00/24] rfcomm fixes

2014-03-13 Thread Sander Eikelenboom

Tuesday, March 11, 2014, 4:14:38 PM, you wrote:

> Hi John,

>>> Since:
>>> - 3.14-RC6 has been cut
>>> - this regression is known and reported since the merge window
>>> - the fix (revert of 3 patches) is known for over a month now
>>> - but it's still not in mainline
>>> - my polite ping request from last week seems to have provoked exactly 0 
>>> (zero) response.
>>> 
>>> IT'S TIME TO CHIME SOME BELLS :-)
>>> 
>>> Hope that WILL be heard somewhere ...
>>> 
>>> --
>>> Sander
>>> 
>>> PS. on the informative side the 3 commits to be reverted are:
>>> 
>>> f86772af6a0f643d3e13eb3f4f9213ae0c333ee4 Bluetooth: Remove 
>>> rfcomm_carrier_raised()
>>> 4a2fb3ecc7467c775b154813861f25a0ddc11aa0 Bluetooth: Always wait for a 
>>> connection on RFCOMM open()
>>> e228b63390536f5b737056059a9a04ea016b1abf Bluetooth: Move 
>>> rfcomm_get_device() before rfcomm_dev_activate()
>> 
>> Gustavo, should I be expecting a pull request?

> you have all 3 reverts already in wireless-next as part of a larger RFCOMM 
> change from Peter that we had to do to get this bug fixed. The whole series 
> ended up as 24 patches and was way too late in the -rc stage. In addition we
> did not have enough exposure from people running that patch set. I did not
> want to end up in a ping-pong situation with apply + revert over and over
> again until we gave this some test exposure.

> I think it is pretty safe to revert these 3 patches from 3.14-rc6 since they 
> broke more than they actually fixed. However everybody has to be aware of 
> that the real fix only comes in 3.15 since the change unfortunately is large. 
> As far as we can tell, only users of RFCOMM TTY are affected. All RFCOMM 
> socket users are fine.

> So I do not know what the best way of getting these reverts merged. Gustavo 
> has a tree ready for you to pull from. Or would it be better if they get 
> cherry picked from wireless-next tree.

> Regards

> Marcel



Is it just me .. or is this going at the speed of about a bluetooth connection ..
and probably missing the boat for 3.14? (for no good reason IMHO)


(It was not in John's nor Dave's last pull request. Although it seems to be
reverted in the bluetooth tree now, I didn't see any formal pull request from
that .. to get it even *starting* to traverse all the trees up to Linus ... )

--
Sander



Re: [RC6 Bell Chime] [PATCH 00/24] rfcomm fixes

2014-03-15 Thread Sander Eikelenboom

Friday, March 14, 2014, 2:29:43 AM, you wrote:

> Hi Sander,

> On 03/13/2014 08:49 PM, Sander Eikelenboom wrote:
>> 
>> Is it just me .. or is this going at the speed of about a bluetooth
>> connection ..
>> and probably missing the boat for 3.14? (for no good reason IMHO)
>> 
>>
>> (it was not in John's nor Dave's last pull request, although it seems to be 
>> reverted in the bluetooth tree now .. i didn't
>> see any formal pull request from that .. to get it even *starting* to 
>> traverse all the trees up to Linus ... )

> Known bugs sometimes roll out into mainline release because the
> alternative can be worse.

> As I explained in the follow-up to my patch series, I would
> not have expected Marcel to pick up any of the fixes for 3.14.
> There are a lot of moving parts in usb + bluetooth + rfcomm + tty,
> and the unfortunate reality is that -next doesn't get as much
> testing as it should.

> The fault is mine because Gianluca let me know about the
> problems with the conversion to tty_port, but the holidays really
> interfered with my ability to put this work first, and I'm sorry
> for that.

> I know the breakage around RFCOMM is frustrating but I think the
> worst is behind us. After 3.15 gets some -rc testing, I will
> be happy to cherry-pick the critical fixes for -stable inclusion.

Ok, but the breakage/regression has been known since around the merge window.
I thought the "standard policy", when things cause a new regression and are not
fixable (too intrusive, too time-consuming, you name the reason), was to revert ..
(and preferably ASAP, so the reverts also get some test coverage in mainline RCs).
That's also why I tested the revert ASAP, to let you know that it worked .. at least for me.

But no big deal for me .. the reverts are simple enough to apply privately.

> Regards,
> Peter Hurley




Re: [RC6 Bell Chime] [PATCH 00/24] rfcomm fixes

2014-03-15 Thread Sander Eikelenboom

Saturday, March 15, 2014, 9:45:03 PM, you wrote:

> On 03/15/2014 01:53 PM, Linus Torvalds wrote:
>> Guys, why is this being discussed?

> FWIW, the 'known breakage' for 3.14 is a (valid) lockdep report.

Hmm .. whoops, you are right .. I remembered it as being an "oops",
but it was merely a lockdep warning.
Should have checked that before opening my big mouth .. sorry for that!

> This regression was introduced by a small patchset added to -next
> over the holidays that was intended to address 2 bug reports
> stemming from a long-overdue overhaul of the RFCOMM tty driver by
> Gianluca Anzolin (which fixed numerous problems and several hangs
> reported since 3.8).

Ok, if what went in for 3.14 fixes some hangs, that should probably outweigh
the lockdep warning regression.

> It is the bulk of that small patchset which is being reverted.

> The point of this brief history is that:
> 1) the 3.13 state of rfcomm is just as broken as 3.14-rcX, but
> in a different way
> 2) there are plenty of serious defects in both versions regardless.

> I mean for this to be informative and not argumentative --
> either outcome is ok with me. In fact, I'm ok if you want to
> pick my entire 24-patch series that addresses the bugs in 3.13
> and 3.14, plus a bunch of other problems that I found at the time.

> Regards,
> Peter Hurley





Re: [PATCH 00/24] rfcomm fixes

2014-03-03 Thread Sander Eikelenboom

Wednesday, February 12, 2014, 12:06:44 PM, you wrote:

> Monday, February 10, 2014, 11:09:38 PM, you wrote:

>> Hi Peter,

>>> This patch series addresses a number of previously unknown issues
>>> with the RFCOMM tty device implementation, in addition to
>>> addressing the locking regression recently reported [1].
>>> 
>>> As Gianluca suggested and I agree, this series first reverts
>>> 3 of the 4 patches of 3.14-rc1 for bluetooth/rfcomm/tty.c.

>> so for 3.14 we should revert 3 patches. And then the other 21 are intended for the 3.15 merge window.

>> I realize that we still have to deal with some breakage, but we do not want regressions and I am clearly not going to take 24 patches for 3.14 at this point in time.

>> What I can do is take all 24 patches into bluetooth-next and let them sit for 1 week and have people test them. And then we go ahead with reverting 3 patches from 3.14. Does that make sense?

> Reverting those 3 patches works for me.

> --
> Sander

>> Regards

>> Marcel

Hi Marcel,

Ping... it seems these 3 reverts, which fix the regressions, are still not in 3.14-rc5?

--
Sander



Re: [Xen-devel] [GIT PULL] (xen) stable/for-jens-3.14 : NFO: trying to register non-static key. the code is fine but needs lockdep annotation.

2014-02-11 Thread Sander Eikelenboom
Hi Konrad,

Today I decided to try out another kernel RC with your pull request to Jens on top
of it .. and I encountered this one:


[  438.029756] INFO: trying to register non-static key.
[  438.029759] the code is fine but needs lockdep annotation.
[  438.029760] turning off the locking correctness validator.
[  438.029770] CPU: 3 PID: 9593 Comm: blkback.2.xvda Tainted: GW 3.14.0-rc2-20140211-pcireset-net-btrevert-xenblock+ #1
[  438.029773] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[  438.029784]  88005224c4f0 88004e5d9b68 81b808c4 88004ba2b510
[  438.029791]  0002 88004e5d9c38 81116eab 88004e5d9bf8
[  438.029798]  81117b35   82cee570
[  438.029799] Call Trace:
[  438.029815]  [] dump_stack+0x46/0x58
[  438.029826]  [] __lock_acquire+0x1c2b/0x2220
[  438.029833]  [] ? lock_acquire+0xe5/0x150
[  438.029841]  [] lock_acquire+0xbd/0x150
[  438.029847]  [] ? flush_work+0x5/0x290
[  438.029852]  [] flush_work+0x3d/0x290
[  438.029856]  [] ? flush_work+0x5/0x290
[  438.029863]  [] ? lock_acquire+0xe5/0x150
[  438.029872]  [] ? xen_blkif_schedule+0x1a1/0x8d0
[  438.029881]  [] ? _raw_spin_unlock_irqrestore+0x6d/0x90
[  438.029888]  [] ? trace_hardirqs_on_caller+0xfb/0x240
[  438.029894]  [] ? trace_hardirqs_on+0xd/0x10
[  438.029901]  [] xen_blkif_schedule+0x289/0x8d0
[  438.029907]  [] ? __init_waitqueue_head+0x60/0x60
[  438.029913]  [] ? trace_hardirqs_on+0xd/0x10
[  438.029919]  [] ? _raw_spin_unlock_irqrestore+0x81/0x90
[  438.029925]  [] ? xen_blkif_be_int+0x40/0x40
[  438.029932]  [] kthread+0xe4/0x100
[  438.029938]  [] ? _raw_spin_unlock_irq+0x30/0x50
[  438.029946]  [] ? __init_kthread_worker+0x70/0x70
[  438.029951]  [] ret_from_fork+0x7c/0xb0
[  438.029958]  [] ? __init_kthread_worker+0x70/0x70

Doesn't seem too serious .. but nevertheless :-)

--

Sander


Monday, February 10, 2014, 8:54:02 PM, you wrote:

> On Mon, Feb 10 2014, Konrad Rzeszutek Wilk wrote:
>> Hey Jens,
>> 
>> Please git pull the following branch:
>> 
>>  git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git stable/for-jens-3.14
>> 
>> which is based off v3.13-rc6. If you would like me to rebase it on
>> a different branch/tag I would be more than happy to do so.

> Older is fine, it's only an issue if you are ahead of the branch you
> want to go into.

>> 
>> The patches are all bug-fixes and hopefully can go in 3.14.
>> 
>> They deal with xen-blkback shutdown and cause memory leaks
>> as well as shutdown races. They should go to stable tree and if you
>> are OK with I will ask them to backport those fixes.
>> 
>> There is also a fix to xen-blkfront to deal with unexpected state
>> transition. And lastly a fix to the header where it was using the
>> __aligned__ unnecessarily.

> Pulled!




Re: 3.14-mw regression: rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe

2014-02-11 Thread Sander Eikelenboom
2] r8169 0000:0b:00.0: single idx 586 P=3c7160c0 N=3c716 D=c940c0 L=36 DMA_TO_DEVICE dma map error checked
[  218.477874] r8169 0000:0b:00.0: single idx 586 P=3c716280 N=3c716 D=c95280 L=36 DMA_TO_DEVICE dma map error checked
[  218.480075] r8169 0000:0b:00.0: single idx 587 P=3c716440 N=3c716 D=c96440 L=36 DMA_TO_DEVICE dma map error checked
[  218.482245] r8169 0000:0b:00.0: single idx 587 P=3c716600 N=3c716 D=c97600 L=36 DMA_TO_DEVICE dma map error checked
[  218.484390] r8169 0000:0b:00.0: single idx 588 P=3c7167c0 N=3c716 D=c987c0 L=42 DMA_TO_DEVICE dma map error checked
[  218.486510] r8169 0000:0b:00.0: single idx 588 P=3c7169c0 N=3c716 D=c999c0 L=36 DMA_TO_DEVICE dma map error checked
[  218.488603] r8169 0000:0b:00.0: single idx 589 P=3c716b80 N=3c716 D=c9ab80 L=42 DMA_TO_DEVICE dma map error checked
[  218.490682] r8169 0000:0b:00.0: single idx 589 P=3c716d80 N=3c716 D=c9bd80 L=42 DMA_TO_DEVICE dma map error checked
[  218.492735] r8169 0000:0b:00.0: single idx 590 P=3c716f80 N=3c716 D=c9cf80 L=42 DMA_TO_DEVICE dma map error not checked
[  218.494788] r8169 0000:0b:00.0: DMA-API: exceeded 7 overlapping mappings of pfn 3c716 .. end of dump

--
Sander





Thursday, February 6, 2014, 3:26:09 PM, you wrote:

> On Thu, Feb 6, 2014 at 5:09 AM, Sander Eikelenboom  
> wrote:
>> Hmm ok that last message was false .. sorry for that .. it did happen again without r8169.use_dac=1, it just doesn't seem to happen all the time...
>>
>> Konrad / Wei, do you happen to know of any xen related change that went into the 3.14 merge window that relates to dma / xen networking?
>>
>> --
>> Sander
>>
>> complete stacktrace:
>>
>> [  342.710738] [ cut here ]
>> [  342.726890] WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:491 add_dma_entry+0x105/0x130()
>> [  342.743210] DMA-API: exceeded 7 overlapping mappings of pfn 40b00
>> [  342.759510] Modules linked in:
>> [  342.775557] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-20140206-pcireset-net-btrevert+ #1
>> [  342.791706] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>> [  342.807627]  0009 88005f603828 81ad29fc 822134e0
>> [  342.823430]  88005f603878 88005f603868 810bdf62 8800
>> [  342.839081]  00040b00 ffef 822102e0 8800592b9098
>> [  342.854572] Call Trace:
>> [  342.869748][] dump_stack+0x46/0x58
>> [  342.884915]  [] warn_slowpath_common+0x82/0xb0
>> [  342.899710]  [] warn_slowpath_fmt+0x41/0x50
>> [  342.914395]  [] ? active_pfn_read_overlap+0x3a/0x70
>> [  342.929166]  [] add_dma_entry+0x105/0x130
>> [  342.943733]  [] debug_dma_map_page+0x126/0x150
>> [  342.957988]  [] rtl8169_start_xmit+0x216/0xa20
>> [  342.972306]  [] ? dev_queue_xmit_nit+0x1ef/0x260
>> [  342.986523]  [] ? dev_loopback_xmit+0x1e0/0x1e0
>> [  343.000689]  [] dev_hard_start_xmit+0x2e6/0x4a0
>> [  343.014466]  [] sch_direct_xmit+0xfe/0x280
>> [  343.028052]  [] __dev_queue_xmit+0x23c/0x630
>> [  343.041338]  [] ? dev_hard_start_xmit+0x4a0/0x4a0
>> [  343.054483]  [] ? ip_output+0x54/0xf0
>> [  343.067659]  [] dev_queue_xmit+0xb/0x10
>> [  343.080804]  [] ip_finish_output+0x2cb/0x670
>> [  343.093746]  [] ? ip_output+0x54/0xf0
>> [  343.106391]  [] ip_output+0x54/0xf0
>> [  343.118683]  [] ip_forward_finish+0x71/0x1a0
>> [  343.130901]  [] ip_forward+0x1a3/0x440
>> [  343.142829]  [] ? lock_is_held+0x8b/0xb0
>> [  343.154346]  [] ip_rcv_finish+0x150/0x660
>> [  343.165748]  [] ip_rcv+0x22b/0x370
>> [  343.176838]  [] ? packet_rcv_spkt+0x42/0x190
>> [  343.187659]  [] __netif_receive_skb_core+0x6d2/0x8a0
>> [  343.198209]  [] ? __netif_receive_skb_core+0x114/0x8a0
>> [  343.208819]  [] ? xen_clocksource_read+0x20/0x30
>> [  343.219471]  [] ? getnstimeofday+0x9/0x30
>> [  343.229862]  [] __netif_receive_skb+0x1c/0x70
>> [  343.239953]  [] netif_receive_skb_internal+0x1e/0xf0
>> [  343.249908]  [] napi_gro_receive+0x70/0xa0
>> [  343.259509]  [] rtl8169_poll+0x2d3/0x680
>> [  343.268982]  [] ? _raw_spin_unlock_irq+0x2b/0x50
>> [  343.278091]  [] net_rx_action+0x161/0x260
>> [  343.287056]  [] __do_softirq+0x12c/0x280
>> [  343.295756]  [] irq_exit+0xa2/0xd0
>> [  343.304235]  [] xen_evtchn_do_upcall+0x2f/0x40
>> [  343.312387]  [] xen_do_hypervisor_callback+0x1e/0x30
>> [  343.320389][] ? xen_hypercall_sched_op+0xa/0x20
>> [  343.328171]  [] ? xen_hypercall_sched_op+0xa/0x20
>> [  343.335738]  [] ? xen_safe_halt+0x10/0x20
>> [  343.343142]  [] ? default_idle+0x18/0x20
>> [  343.350

Re: [Xen-devel] [GIT PULL] (xen) stable/for-jens-3.14 : NFO: trying to register non-static key. the code is fine but needs lockdep annotation.

2014-02-11 Thread Sander Eikelenboom

Tuesday, February 11, 2014, 7:21:56 PM, you wrote:

> On 11/02/14 18:15, Roger Pau Monné wrote:
>> On 11/02/14 18:52, David Vrabel wrote:
>>> 
>> That would mean that unmap_purged_grants would no longer be static and 
>> I should also add a prototype for it in blkback/common.h, which is kind 
>> of ugly IMHO.

> But less ugly than initializing work with a NULL function, IMO.

>> commit 980e72e45454b64ccb7f23b6794a769384e51038
>> Author: Roger Pau Monne 
>> Date:   Tue Feb 11 19:04:06 2014 +0100
>> 
>> xen-blkback: init persistent_purge_work work_struct
>> 
>> Initialize persistent_purge_work work_struct on xen_blkif_alloc (and
>> remove the previous initialization done in purge_persistent_gnt). This
>> prevents flush_work from complaining even if purge_persistent_gnt has
>> not been used.
>> 
>> Signed-off-by: Roger Pau Monné 

> Reviewed-by: David Vrabel 

And a Tested-by: Sander Eikelenboom 

Thanks !

> Thanks.

> David




Re: 3.14-mw regression: rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe

2014-02-11 Thread Sander Eikelenboom

Tuesday, February 11, 2014, 10:28:52 PM, you wrote:

> On Tue, 2014-02-11 at 20:56 +0100, Sander Eikelenboom wrote:
>> Hi Dan,
>> 
>> FYI, I just tested with Xen out of the equation (booting baremetal) and it still persists.
>> 
>> I tried something else .. don't know if it gives you any more insights, but it's worth a try:
>> 
>> diff --git a/lib/dma-debug.c b/lib/dma-debug.c
>> index 2defd13..0fe5b75 100644
>> --- a/lib/dma-debug.c
>> +++ b/lib/dma-debug.c
>> @@ -474,11 +474,11 @@ static int active_pfn_set_overlap(unsigned long pfn, int overlap)
>> return overlap;
>>  }
>> 
>> -static void active_pfn_inc_overlap(unsigned long pfn)
>> +static void active_pfn_inc_overlap(struct dma_debug_entry *ent)
>>  {
>> -   int overlap = active_pfn_read_overlap(pfn);
>> +   int overlap = active_pfn_read_overlap(ent->pfn);
>> 
>> -   overlap = active_pfn_set_overlap(pfn, ++overlap);
>> +   overlap = active_pfn_set_overlap(ent->pfn, ++overlap);
>> 
>> /* If we overflowed the overlap counter then we're potentially
>>  * leaking dma-mappings.  Otherwise, if maps and unmaps are
>> @@ -486,15 +486,43 @@ static void active_pfn_inc_overlap(unsigned long pfn)
>>  * debug_dma_assert_idle() as the pfn may be marked idle
>>  * prematurely.
>>  */
>> +
>> WARN_ONCE(overlap > ACTIVE_PFN_MAX_OVERLAP,
>>   "DMA-API: exceeded %d overlapping mappings of pfn %lx\n",
>> - ACTIVE_PFN_MAX_OVERLAP, pfn);
>> + ACTIVE_PFN_MAX_OVERLAP, ent->pfn);
>> +
>> +   if(overlap > ACTIVE_PFN_MAX_OVERLAP){
>> +
>> +   dev_info(ent->dev, "DMA-API: exceeded %d overlapping mappings of pfn %lx .. start dump\n", ACTIVE_PFN_MAX_OVERLAP, ent->pfn);
>> +   int idx;
>> +
>> +   for (idx = 0; idx < HASH_SIZE; idx++) {
>> +struct hash_bucket *bucket = &dma_entry_hash[idx];
>> +struct dma_debug_entry *entry;
>> +   unsigned long flags;
>> +
>> +list_for_each_entry(entry, &bucket->list, list) {
>> +   if (entry->pfn == ent->pfn) {
>> +   dev_info(entry->dev, "%s idx %d P=%Lx N=%lx D=%Lx L=%Lx %s %s\n",
>> +type2name[entry->type], idx,
>> +phys_addr(entry), entry->pfn,
>> +entry->dev_addr, entry->size,
>> +dir2name[entry->direction],
>> +   maperr2str[entry->map_err_type]);
>> +   }
>> +}
>> +   }
>> +   dev_info(ent->dev, "DMA-API: exceeded %d overlapping mappings of pfn %lx .. end of dump\n", ACTIVE_PFN_MAX_OVERLAP, ent->pfn);
>> +   }
>>  }
>> 
>> 
>> @@ -505,10 +533,10 @@ static int active_pfn_insert(struct dma_debug_entry *entry)
>> 
>> spin_lock_irqsave(&radix_lock, flags);
>> rc = radix_tree_insert(&dma_active_pfn, entry->pfn, entry);
>> -   if (rc == -EEXIST)
>> -   active_pfn_inc_overlap(entry->pfn);
>> +   if (rc == -EEXIST){
>> +   active_pfn_inc_overlap(entry);
>> +   }
>> spin_unlock_irqrestore(&radix_lock, flags);
>> -
>> return rc;
>>  }
>> 
>> 
>> This results in:
>> [   27.708678] r8169 0000:0a:00.0 eth1: link down
>> [   27.712102] r8169 0000:0a:00.0 eth1: link down
>> [   28.015340] r8169 0000:0b:00.0 eth0: link down
>> [   28.015368] r8169 0000:0b:00.0 eth0: link down
>> [   29.654844] r8169 0000:0b:00.0 eth0: link up
>> [   30.278542] r8169 0000:0a:00.0 eth1: link up
>> [   60.829503] EXT4-fs (dm-2): mounted filesystem with ordered data mode. 
>> Opts: barrier=1,errors=remount-ro
>> [   69.708979] EXT4-fs (dm-42): mounted filesystem with ordered data mode. 
>> Opts: barrier=1,errors=remount-ro
>> [   76.128678] EXT4-fs (dm-43): mounted filesystem with ordered data mode. 
>> Opts: barrier=1,errors=remount-ro
>> [   82.922836] EXT4-fs (dm-44): mounted filesystem with ordered data mode. 

Re: [PATCH 00/24] rfcomm fixes

2014-02-12 Thread Sander Eikelenboom

Monday, February 10, 2014, 11:09:38 PM, you wrote:

> Hi Peter,

>> This patch series addresses a number of previously unknown issues
>> with the RFCOMM tty device implementation, in addition to
>> addressing the locking regression recently reported [1].
>> 
>> As Gianluca suggested and I agree, this series first reverts
>> 3 of the 4 patches of 3.14-rc1 for bluetooth/rfcomm/tty.c.

> so for 3.14 we should revert 3 patches. And then the other 21 are intended for the 3.15 merge window.

> I realize that we still have to deal with some breakage, but we do not want regressions and I am clearly not going to take 24 patches for 3.14 at this point in time.

> What I can do is take all 24 patches into bluetooth-next and let them sit for 1 week and have people test them. And then we go ahead with reverting 3 patches from 3.14. Does that make sense?

Reverting those 3 patches works for me.

--
Sander

> Regards

> Marcel





Re: [PATCH v2] Setting the IORESOURCE_ROM_SHADOW flag on a VGA card other than the primary prevents it from reading it's own rom. It will get the content of the shadowrom at C000 instead, which is of

2014-02-13 Thread Sander Eikelenboom
Hi Bjorn,

I have given it another email and another week, but it hasn't gained any
Reviewed-by's or Acked-by's.
It seems the only way forward is to shovel it into linux-next early, give it a
good soak and see if anyone starts to squeal .. or whether everything seems to be ok :-)

Would you need a v3 with the Acked-by and Reviewed-by from Konrad for x86 in it?

--

Sander


Monday, February 3, 2014, 3:52:05 PM, you wrote:

> On Fri, Jan 31, 2014 at 10:28:22AM +0100, Sander Eikelenboom wrote:
>> Hi Bjorn / Tony,
>> 
>> I fixed up ia64 as well and brought it in line again with the x86 code,
>> but I don't have an ia64 machine, so that part is untested.
>> 
>> Sander
>> 
>> 
>> 
>> Setting the IORESOURCE_ROM_SHADOW flag on a VGA card other than the primary
>> prevents it from reading its own ROM. It will get the content of the shadow ROM
>> at C000 instead, which belongs to the primary VGA card, and the driver of the
>> secondary card will bail out.
>> 
>> Fix this by checking if the arch code or vga-arbitration has already
>> determined the vga_default_device; if so, only apply the fix to this
>> primary video device, and update the comment to reflect this.
>> 
>> v2:
>> - Fix pci_fixup_video both in x86 and ia64
>> 
>> 
>> Sander Eikelenboom (1):
>>   Setting the IORESOURCE_ROM_SHADOW flag on a VGA card other than the
>> primary prevents it from reading it's own rom. It will get the
>> content of the shadowrom at C000 instead, which is of the
>> primary VGA card and the driver of the secondary card will bail
>> out.

> Your editor mutilated your subject line. It ought to have been just
> one line.

> Anyhow, you can also add 'Reviewed-by: Konrad Rzeszutek Wilk 
>  on the patch for the x86 part.

> The ia64 "looks" OK to me, but my ia64 box won't boot v3.11 or later
> so I can't give it a 'Tested-by' yet.

>> 
>>  arch/ia64/pci/fixup.c |   24 +---
>>  arch/x86/pci/fixup.c  |   18 ++
>>  2 files changed, 23 insertions(+), 19 deletions(-)
>> 
>> -- 
>> 1.7.10.4
>> 




3.14-mw regression: rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe

2014-01-26 Thread Sander Eikelenboom
Hi,

I've got a regression with a 3.14-mw kernel (last commit is 4ba9920e5e9c0e16b5ed24292d45322907bb9035).
It looks like it's related to the rtl8169 ...

--
Sander

Jan 26 11:36:26 serveerstertje kernel: [   89.105537] [ cut here ]
Jan 26 11:36:26 serveerstertje kernel: [   89.116779] WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:491 add_dma_entry+0x103/0x130()
Jan 26 11:36:26 serveerstertje kernel: [   89.128148] DMA-API: exceeded 7 overlapping mappings of pfn 55ebe
Jan 26 11:36:26 serveerstertje kernel: [   89.139397] Modules linked in:
Jan 26 11:36:26 serveerstertje kernel: [   89.150535] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-20140125-mw-pcireset+ #1
Jan 26 11:36:26 serveerstertje kernel: [   89.161784] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
Jan 26 11:36:26 serveerstertje kernel: [   89.172965]  0009 88005f603838 81acbcfa 822134e0
Jan 26 11:36:26 serveerstertje kernel: [   89.184156]  88005f603888 88005f603878 810bdf62 8800
Jan 26 11:36:26 serveerstertje kernel: [   89.195186]  00055ebe ffef 0200 8800592ea098
Jan 26 11:36:26 serveerstertje kernel: [   89.206227] Call Trace:
Jan 26 11:36:26 serveerstertje kernel: [   89.217027] [] dump_stack+0x46/0x58
Jan 26 11:36:26 serveerstertje kernel: [   89.227907]  [] warn_slowpath_common+0x82/0xb0
Jan 26 11:36:26 serveerstertje kernel: [   89.238678]  [] warn_slowpath_fmt+0x41/0x50
Jan 26 11:36:26 serveerstertje kernel: [   89.249336]  [] ? active_pfn_read_overlap+0x3a/0x70
Jan 26 11:36:26 serveerstertje kernel: [   89.259904]  [] add_dma_entry+0x103/0x130
Jan 26 11:36:26 serveerstertje kernel: [   89.270416]  [] debug_dma_map_page+0x126/0x150
Jan 26 11:36:26 serveerstertje kernel: [   89.280840]  [] rtl8169_start_xmit+0x216/0xa20
Jan 26 11:36:26 serveerstertje kernel: [   89.291073]  [] ? __kfree_skb+0x3a/0xb0
Jan 26 11:36:26 serveerstertje kernel: [   89.301252]  [] ? dev_queue_xmit_nit+0x1ef/0x260
Jan 26 11:36:26 serveerstertje kernel: [   89.311392]  [] ? dev_loopback_xmit+0x1e0/0x1e0
Jan 26 11:36:26 serveerstertje kernel: [   89.321418]  [] dev_hard_start_xmit+0x2e6/0x4a0
Jan 26 11:36:26 serveerstertje kernel: [   89.331236]  [] sch_direct_xmit+0xfe/0x280
Jan 26 11:36:26 serveerstertje kernel: [   89.341013]  [] __dev_queue_xmit+0x23c/0x630
Jan 26 11:36:26 serveerstertje kernel: [   89.350668]  [] ? dev_hard_start_xmit+0x4a0/0x4a0
Jan 26 11:36:26 serveerstertje kernel: [   89.360264]  [] ? ip_output+0x54/0xf0
Jan 26 11:36:26 serveerstertje kernel: [   89.369698]  [] dev_queue_xmit+0xb/0x10
Jan 26 11:36:26 serveerstertje kernel: [   89.379034]  [] ip_finish_output+0x2cb/0x670
Jan 26 11:36:26 serveerstertje kernel: [   89.388373]  [] ? ip_output+0x54/0xf0
Jan 26 11:36:26 serveerstertje kernel: [   89.397498]  [] ip_output+0x54/0xf0
Jan 26 11:36:26 serveerstertje kernel: [   89.406584]  [] ip_forward_finish+0x71/0x1a0
Jan 26 11:36:26 serveerstertje kernel: [   89.415534]  [] ip_forward+0x1a3/0x440
Jan 26 11:36:26 serveerstertje kernel: [   89.424400]  [] ip_rcv_finish+0x150/0x650
Jan 26 11:36:26 serveerstertje kernel: [   89.433108]  [] ip_rcv+0x22b/0x370
Jan 26 11:36:26 serveerstertje kernel: [   89.441737]  [] ? packet_rcv_spkt+0x42/0x190
Jan 26 11:36:26 serveerstertje kernel: [   89.450226]  [] __netif_receive_skb_core+0x6d2/0x8a0
Jan 26 11:36:26 serveerstertje kernel: [   89.458687]  [] ? __netif_receive_skb_core+0x114/0x8a0
Jan 26 11:36:26 serveerstertje kernel: [   89.467109]  [] ? xen_clocksource_read+0x20/0x30
Jan 26 11:36:26 serveerstertje kernel: [   89.475362]  [] ? getnstimeofday+0x9/0x30
Jan 26 11:36:26 serveerstertje kernel: [   89.483548]  [] __netif_receive_skb+0x1c/0x70
Jan 26 11:36:26 serveerstertje kernel: [   89.491608]  [] netif_receive_skb_internal+0x1e/0xf0
Jan 26 11:36:26 serveerstertje kernel: [   89.499596]  [] napi_gro_receive+0x70/0xa0
Jan 26 11:36:26 serveerstertje kernel: [   89.507486]  [] rtl8169_poll+0x2d3/0x680
Jan 26 11:36:26 serveerstertje kernel: [   89.515222]  [] net_rx_action+0x161/0x260
Jan 26 11:36:26 serveerstertje kernel: [   89.523097]  [] __do_softirq+0x11d/0x250
Jan 26 11:36:26 serveerstertje kernel: [   89.530973]  [] irq_exit+0xa2/0xd0
Jan 26 11:36:26 serveerstertje kernel: [   89.538915]  [] xen_evtchn_do_upcall+0x2f/0x40
Jan 26 11:36:26 serveerstertje kernel: [   89.546876]  [] xen_do_hypervisor_callback+0x1e/0x30
Jan 26 11:36:26 serveerstertje kernel: [   89.554591] [] ? xen_hypercall_sched_op+0xa/0x20
Jan 26 11:36:26 serveerstertje kernel: [   89.562139]  [] ? xen_hypercall_sched_op+0xa/0x20
Jan 26 11:36:26 serveerstertje kernel: [   89.569503]  [] ? xen_safe_halt+0x10/0x20
Jan 26 11:36:26 serveerstertje kernel: [   89.576788]  [] ? default_idle+0x18/0x20
Jan 26 11:36:26 serveerstertje kernel: [   89.583863]  [] ? arch_cpu_idle+0x2e/0x40
Jan 26 11:36:26 serveerstertje kernel: [   89.590627]

In "pci_fixup_video" check if this is or should be the primary video device to prevent setting the IORESOURCE_ROM_SHADOW flag on a secondary VGA card

2014-01-11 Thread Sander Eikelenboom
Hi Eiichiro / Dave / Greg,

While trying to get secondary PCI/VGA passthrough of an AMD 6570 card to a Xen
guest working with the radeon driver and modesetting, I'm running into the problem
that the driver says the BIOS is a COMBIOS, while it expects an ATOMBIOS for this card.

So the guest uses both its normal emulated VGA card provided by Qemu (e.g.
Cirrus Logic) and a real VGA card via PCI passthrough.

While debugging, it turned out that the BIOS the driver read was not the
AMD BIOS, but the BIOS of the emulated card (so it wasn't a COMBIOS either ..).

At first I thought the culprit was Xen, SeaBIOS or Qemu .. so it took quite a
while and some debugging, but finally my eye fell on this in the guest dmesg:

[2.545728] pci 0000:00:00.0: calling quirk_natoma+0x0/0x40
[2.545730] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[2.558998] pci 0000:00:00.0: calling quirk_passive_release+0x0/0x90
[2.559121] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[2.572412] pci 0000:00:01.0: calling quirk_isa_dma_hangs+0x0/0x40
[2.572415] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[2.586527] pci 0000:00:03.0: calling pci_fixup_video+0x0/0xd0
[2.586609] pci 0000:00:03.0: Boot video device
[2.586696] pci 0000:00:05.0: calling pci_fixup_video+0x0/0xd0
[2.586827] pci 0000:00:05.0: Boot video device
[2.586928] pci 0000:00:06.0: calling quirk_e100_interrupt+0x0/0x1c0

It's calling the "pci_fixup_video" quirk ... and it's calling it twice ..
which, if I read the comment correctly .. shouldn't be the case:

 /*
 * Fixup to mark boot BIOS video selected by BIOS before it changes
 *
 * From information provided by "Jon Smirl" 
 *
 * The standard boot ROM sequence for an x86 machine uses the BIOS
 * to select an initial video card for boot display. This boot video
 * card will have it's BIOS copied to C0000 in system RAM.
 * IORESOURCE_ROM_SHADOW is used to associate the boot video
 * card with this copy. On laptops this copy has to be used since
 * the main ROM may be compressed or combined with another image.
 * See pci_map_rom() for use of this flag. IORESOURCE_ROM_SHADOW
 * is marked here since the boot video device will be the only enabled
 * video device at this point.
 */


But the code doesn't check whether it's actually the only enabled (or first)
video device at that point .. so it marks 2 boot video devices and sets both to
use IORESOURCE_ROM_SHADOW at C000 .. which happens to hold the BIOS of the emulated card.

With this patch applied, passthrough of the card works fine in the guest and
dmesg reports:

[2.167076] pci 0000:00:00.0: calling quirk_natoma+0x0/0x40
[2.167078] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[2.179807] pci 0000:00:00.0: calling quirk_passive_release+0x0/0x90
[2.179953] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[2.192953] pci 0000:00:01.0: calling quirk_isa_dma_hangs+0x0/0x40
[2.192955] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[2.206543] pci 0000:00:03.0: calling pci_fixup_video+0x0/0xe0
[2.206623] pci 0000:00:03.0: Boot video device
[2.206710] pci 0000:00:05.0: calling pci_fixup_video+0x0/0xe0
[2.206842] pci 0000:00:06.0: calling quirk_e100_interrupt+0x0/0x1c0

--
Sander

Sander Eikelenboom (1):
  In "pci_fixup_video" check if this is or should be the primary video
device to prevent setting the IORESOURCE_ROM_SHADOW flag on a
secondary VGA card

 arch/x86/pci/fixup.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

-- 
1.7.10.4



[PATCH] In "pci_fixup_video" check if this is or should be the primary video device to prevent setting the IORESOURCE_ROM_SHADOW flag on a secondary VGA card

2014-01-11 Thread Sander Eikelenboom
Setting the IORESOURCE_ROM_SHADOW flag on a secondary VGA card prevents it from
reading its own ROM. It will get the content of the shadow ROM at C000 instead,
which belongs to the primary VGA card, and the driver of the secondary card will
bail out.

Fix this by checking if this is or should be the primary video device before
applying the fix, and update the comment to reflect this.

Signed-off-by: Sander Eikelenboom 
---
 arch/x86/pci/fixup.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index b046e07..525e49a 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -314,9 +314,9 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1,pcie_r
  * IORESOURCE_ROM_SHADOW is used to associate the boot video
  * card with this copy. On laptops this copy has to be used since
  * the main ROM may be compressed or combined with another image.
- * See pci_map_rom() for use of this flag. IORESOURCE_ROM_SHADOW
- * is marked here since the boot video device will be the only enabled
- * video device at this point.
+ * See pci_map_rom() for use of this flag. Before we mark the device
+ * with IORESOURCE_ROM_SHADOW we have to check if this is or should become
+ * the primary video card, since this quirk is run for all video devices.
  */
 
 static void pci_fixup_video(struct pci_dev *pdev)
@@ -347,12 +347,13 @@ static void pci_fixup_video(struct pci_dev *pdev)
}
bus = bus->parent;
}
-   pci_read_config_word(pdev, PCI_COMMAND, &config);
-   if (config & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {
-   pdev->resource[PCI_ROM_RESOURCE].flags |= IORESOURCE_ROM_SHADOW;
-   dev_printk(KERN_DEBUG, &pdev->dev, "Boot video device\n");
-   if (!vga_default_device())
+   if (!vga_default_device() || pdev == vga_default_device()) {
+   pci_read_config_word(pdev, PCI_COMMAND, &config);
+   if (config & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {
+   pdev->resource[PCI_ROM_RESOURCE].flags |= IORESOURCE_ROM_SHADOW;
+   dev_printk(KERN_DEBUG, &pdev->dev, "Boot video device\n");
vga_set_default_device(pdev);
+   }
}
 }
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID,
-- 
1.7.10.4



Re: In "pci_fixup_video" check if this is or should be the primary video device t

2014-01-14 Thread Sander Eikelenboom

Tuesday, January 14, 2014, 6:54:17 AM, you wrote:

> Hi Sander,

>>It's calling the "pci_fixup_video" quirk ... and it's calling it twice ..
>>which if i read the comment correctly .. shouldn't be the case:

> I think that only one bridge VGA Enable bit is set on a normal machine.
> I guess two virtual VGA Enable bits are set on your virtual machine.

> static void pci_fixup_video(struct pci_dev *pdev)
> ...
>  if (!(config & PCI_BRIDGE_CTL_VGA))
>  return;
>  }
> ...

> Eiichiro

I have added some printk statements .. and they show that this code never runs for
either of the 2 devices ..
since the while loop runs only once and bridge is NULL (!bridge) ..
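
To illustrate why the VGA Enable check is never reached here, a minimal userspace model of the bridge walk follows. The structs and the function are hypothetical simplifications, not the kernel's API; they also ignore the bridge header-type check that the real code performs before reading PCI_BRIDGE_CONTROL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_bus / the upstream bridge,
 * reduced to the fields the VGA-routing walk looks at. */
struct model_bridge {
	bool vga_enable;            /* models PCI_BRIDGE_CTL_VGA being set */
};

struct model_bus {
	struct model_bridge *self;  /* upstream bridge, NULL on a root bus */
	struct model_bus *parent;   /* NULL at the top of the hierarchy */
};

/* Returns false as soon as a bridge between the device and the root has
 * VGA Enable cleared. A bus without a bridge (self == NULL) is skipped,
 * so every VGA device sitting directly on a root bus passes the check. */
static bool model_vga_routed(const struct model_bus *bus)
{
	while (bus) {
		if (bus->self && !bus->self->vga_enable)
			return false;
		bus = bus->parent;
	}
	return true;
}
```

In the guest both 00:03.0 and 00:05.0 sit on root bus 00, which has no upstream bridge, so in this model the walk returns true for both and both go on to the shadow-ROM marking.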

Here is the complete lspci output; 00:03.0 is the emulated device, 00:05.0 the
one passed through:

# lspci -vvknn
00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237] (rev 02)
	Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
	Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- ...
...

>>Hi Eiichiro / Dave / Greg,
>>
>>While trying to get secondary PCI/VGA passthrough of a AMD 6570 card to a Xen 
>>guest with the radeon driver and modesetting
>>i'm running into the problem that the driver says the BIOS is a COMBIOS while 
>>it expects a ATOMBIOS for the cards.
>>
>>So the Guest uses both it's normal emulated VGA card provided by Qemu (f.e. 
>>cirrus logic) and a real VGA card via
>>PCI passthrough.
>>
>>While debugging it turned out that the bios that the driver read was not the 
>>AMD bios, but the bios from the emulated card.
>>(so it wasn't a COMBIOS either ..)
>>
>>I first thought the culprit was with Xen, Seabios or Qemu ..
>>So it took quite a while and debugging, but finally my eye fell on this in 
>>the guest dmesg:
>>
>>[2.545728] pci :00:00.0: calling quirk_natoma+0x0/0x40
>>[2.545730] pci :00:00.0: Limiting direct PCI/PCI transfers
>>[2.558998] pci :00:00.0: calling quirk_passive_release+0x0/0x90
>>[2.559121] pci :00:01.0: PIIX3: Enabling Passive Release
>>[2.572412] pci :00:01.0: calling quirk_isa_dma_hangs+0x0/0x40
>>[2.572415] pci :00:01.0: Activating ISA DMA hang workarounds
>>[2.586527] pci :00:03.0: calling pci_fixup_video+0x0/0xd0
>>[2.586609] pci :00:03.0: Boot video device
>>[2.586696] pci :00:05.0: calling pci_fixup_video+0x0/0xd0
>>[2.586827] pci :00:05.0: Boot video device
>>[2.586928] pci :00:06.0: calling quirk_e100_interrupt+0x0/0x1c0
>>
>>It's calling the "pci_fixup_video" quirk ... and it's calling it twice ..
>>which if i read the comment correctly .. shouldn't be the case:
>>
>> /*
>> * Fixup to mark boot BIOS video selected by BIOS before it changes
>> *
>> * From information provided by "Jon Smirl" 
>> *
>> * The standard boot ROM sequence for an x86 machine uses the BIOS
>> * to select an initial video card for boot display. This boot video
>> * card will have it's BIOS copied to C in system RAM.
>> * IORESOURCE_ROM_SHADOW is used to associate the boot video
>> * card with this copy. On laptops this copy has to be used since
>> * the main ROM may be compressed or combined with another image.
>> * See pci_map_rom() for use of this flag. IORESOURCE_ROM_SHADOW
>> * is marked here since the boot video device will be the only enabled
>> * video device at this point.
>> */
>>
>>
>>But the code doesn't check if it's actually the only enabled (or first) video 
>>device at that point ..
>>and it's setting 2 boot video devices and setting both to use the 
>>IORESOURCE_ROM_SHADOW at C000 ..
>>which happens to be the bios from the emulated card.
>>
>>With this patch applied the passthrough of the card works fine in the guest 
>>and dmesg reports:
>>
>>[2.167076] pci :00:00.0: calling quirk_natoma+0x0/0x40
>>[2.167078] pci :00:00.0: Limiting direct PCI/PCI transfers
>>[2.179807] pci :00:00.0: calling quirk_passive_release+0x0/0x90
>>[2.179953] pci :00:01.0: PIIX3: Enabling Passive Release
>>[2.192953] pci :00:01.0: calling quirk_isa_dma_hangs+0x0/0x40
>>[2.192955] pci :00:01.0: Activating ISA DMA hang workarounds
>>[2.206543] pci 

Re: In "pci_fixup_video" check if this is or should be the primary video devi

2014-01-15 Thread Sander Eikelenboom
>>>>the AMD bios, but the bios from the emulated card.
>>>>(so it wasn't a COMBIOS either ..)
>>>>
>>>>I first thought the culprit was with Xen, Seabios or Qemu ..
>>>>So it took quite a while and debugging, but finally my eye fell on this in 
>>>>the guest dmesg:
>>>>
>>>>[2.545728] pci :00:00.0: calling quirk_natoma+0x0/0x40
>>>>[2.545730] pci :00:00.0: Limiting direct PCI/PCI transfers
>>>>[2.558998] pci :00:00.0: calling quirk_passive_release+0x0/0x90
>>>>[2.559121] pci :00:01.0: PIIX3: Enabling Passive Release
>>>>[2.572412] pci :00:01.0: calling quirk_isa_dma_hangs+0x0/0x40
>>>>[2.572415] pci :00:01.0: Activating ISA DMA hang workarounds
>>>>[2.586527] pci :00:03.0: calling pci_fixup_video+0x0/0xd0
>>>>[2.586609] pci :00:03.0: Boot video device
>>>>[2.586696] pci :00:05.0: calling pci_fixup_video+0x0/0xd0
>>>>[2.586827] pci :00:05.0: Boot video device
>>>>[2.586928] pci :00:06.0: calling quirk_e100_interrupt+0x0/0x1c0
>>>>
>>>>It's calling the "pci_fixup_video" quirk ... and it's calling it twice ..
>>>>which if i read the comment correctly .. shouldn't be the case:
>>>>
>>>> /*
>>>> * Fixup to mark boot BIOS video selected by BIOS before it changes
>>>> *
>>>> * From information provided by "Jon Smirl" 
>>>> *
>>>> * The standard boot ROM sequence for an x86 machine uses the BIOS
>>>> * to select an initial video card for boot display. This boot video
>>>> * card will have it's BIOS copied to C in system RAM.
>>>> * IORESOURCE_ROM_SHADOW is used to associate the boot video
>>>> * card with this copy. On laptops this copy has to be used since
>>>> * the main ROM may be compressed or combined with another image.
>>>> * See pci_map_rom() for use of this flag. IORESOURCE_ROM_SHADOW
>>>> * is marked here since the boot video device will be the only enabled
>>>> * video device at this point.
>>>> */
>>>>
>>>>
>>>>But the code doesn't check if it's actually the only enabled (or first) 
>>>>video device at that point ..
>>>>and it's setting 2 boot video devices and setting both to use the 
>>>>IORESOURCE_ROM_SHADOW at C000 ..
>>>>which happens to be the bios from the emulated card.
>>>>
>>>>With this patch applied the passthrough of the card works fine in the guest 
>>>>and dmesg reports:
>>>>
>>>>[2.167076] pci :00:00.0: calling quirk_natoma+0x0/0x40
>>>>[2.167078] pci :00:00.0: Limiting direct PCI/PCI transfers
>>>>[2.179807] pci :00:00.0: calling quirk_passive_release+0x0/0x90
>>>>[2.179953] pci :00:01.0: PIIX3: Enabling Passive Release
>>>>[2.192953] pci :00:01.0: calling quirk_isa_dma_hangs+0x0/0x40
>>>>[2.192955] pci :00:01.0: Activating ISA DMA hang workarounds
>>>>[2.206543] pci :00:03.0: calling pci_fixup_video+0x0/0xe0
>>>>[2.206623] pci :00:03.0: Boot video device
>>>>[2.206710] pci :00:05.0: calling pci_fixup_video+0x0/0xe0
>>>>[2.206842] pci :00:06.0: calling quirk_e100_interrupt+0x0/0x1c0
>>>>
>>>>--
>>>>Sander
>>>>
>>>>Sander Eikelenboom (1):
>>>>  In "pci_fixup_video" check if this is or should be the primary video
>>>>device to prevent setting the IORESOURCE_ROM_SHADOW flag on a
>>>>secondary VGA card
>>>>
>>>> arch/x86/pci/fixup.c |   17 +
>>>> 1 file changed, 9 insertions(+), 8 deletions(-)
>>>>
>>>>-- 
>>>>1.7.10.4
>>>>
>>>>
>>
>>
>>




Re: In "pci_fixup_video" check if this is or should be the primary video devi

2014-01-15 Thread Sander Eikelenboom
 EqualizationPhase2-, EqualizationPhase3-, 
>> LinkEqualizationRequest-
>>Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>Address: fee57000  Data: 4300
>>Capabilities: [100 v9] #1002
>>Kernel driver in use: radeon
>>
>>>>Hi Eiichiro / Dave / Greg,
>>>>
>>>>While trying to get secondary PCI/VGA passthrough of a AMD 6570 card to a 
>>>>Xen guest with the radeon driver and modesetting
>>>>i'm running into the problem that the driver says the BIOS is a COMBIOS 
>>>>while it expects a ATOMBIOS for the cards.
>>>>
>>>>So the Guest uses both it's normal emulated VGA card provided by Qemu (f.e. 
>>>>cirrus logic) and a real VGA card via
>>>>PCI passthrough.
>>>>
>>>>While debugging it turned out that the bios that the driver read was not 
>>>>the AMD bios, but the bios from the emulated card.
>>>>(so it wasn't a COMBIOS either ..)
>>>>
>>>>I first thought the culprit was with Xen, Seabios or Qemu ..
>>>>So it took quite a while and debugging, but finally my eye fell on this in 
>>>>the guest dmesg:
>>>>
>>>>[2.545728] pci :00:00.0: calling quirk_natoma+0x0/0x40
>>>>[2.545730] pci :00:00.0: Limiting direct PCI/PCI transfers
>>>>[2.558998] pci :00:00.0: calling quirk_passive_release+0x0/0x90
>>>>[2.559121] pci :00:01.0: PIIX3: Enabling Passive Release
>>>>[2.572412] pci :00:01.0: calling quirk_isa_dma_hangs+0x0/0x40
>>>>[2.572415] pci :00:01.0: Activating ISA DMA hang workarounds
>>>>[2.586527] pci :00:03.0: calling pci_fixup_video+0x0/0xd0
>>>>[2.586609] pci :00:03.0: Boot video device
>>>>[2.586696] pci :00:05.0: calling pci_fixup_video+0x0/0xd0
>>>>[2.586827] pci :00:05.0: Boot video device
>>>>[2.586928] pci :00:06.0: calling quirk_e100_interrupt+0x0/0x1c0
>>>>
>>>>It's calling the "pci_fixup_video" quirk ... and it's calling it twice ..
>>>>which if i read the comment correctly .. shouldn't be the case:
>>>>
>>>> /*
>>>> * Fixup to mark boot BIOS video selected by BIOS before it changes
>>>> *
>>>> * From information provided by "Jon Smirl" 
>>>> *
>>>> * The standard boot ROM sequence for an x86 machine uses the BIOS
>>>> * to select an initial video card for boot display. This boot video
>>>> * card will have it's BIOS copied to C in system RAM.
>>>> * IORESOURCE_ROM_SHADOW is used to associate the boot video
>>>> * card with this copy. On laptops this copy has to be used since
>>>> * the main ROM may be compressed or combined with another image.
>>>> * See pci_map_rom() for use of this flag. IORESOURCE_ROM_SHADOW
>>>> * is marked here since the boot video device will be the only enabled
>>>> * video device at this point.
>>>> */
>>>>
>>>>
>>>>But the code doesn't check if it's actually the only enabled (or first) 
>>>>video device at that point ..
>>>>and it's setting 2 boot video devices and setting both to use the 
>>>>IORESOURCE_ROM_SHADOW at C000 ..
>>>>which happens to be the bios from the emulated card.
>>>>
>>>>With this patch applied the passthrough of the card works fine in the guest 
>>>>and dmesg reports:
>>>>
>>>>[2.167076] pci :00:00.0: calling quirk_natoma+0x0/0x40
>>>>[2.167078] pci :00:00.0: Limiting direct PCI/PCI transfers
>>>>[2.179807] pci :00:00.0: calling quirk_passive_release+0x0/0x90
>>>>[2.179953] pci :00:01.0: PIIX3: Enabling Passive Release
>>>>[2.192953] pci :00:01.0: calling quirk_isa_dma_hangs+0x0/0x40
>>>>[2.192955] pci :00:01.0: Activating ISA DMA hang workarounds
>>>>[2.206543] pci :00:03.0: calling pci_fixup_video+0x0/0xe0
>>>>[2.206623] pci :00:03.0: Boot video device
>>>>[2.206710] pci :00:05.0: calling pci_fixup_video+0x0/0xe0
>>>>[2.206842] pci :00:06.0: calling quirk_e100_interrupt+0x0/0x1c0
>>>>
>>>>--
>>>>Sander
>>>>
>>>>Sander Eikelenboom (1):
>>>>  In "pci_fixup_video" check if this is or should be the primary video
>>>>device to prevent setting the IORESOURCE_ROM_SHADOW flag on a
>>>>secondary VGA card
>>>>
>>>> arch/x86/pci/fixup.c |   17 +
>>>> 1 file changed, 9 insertions(+), 8 deletions(-)
>>>>
>>>>-- 
>>>>1.7.10.4
>>>>
>>>>
>>
>>
>>




Re: In "pci_fixup_video" check if this is or should be the primary video devi

2014-01-15 Thread Sander Eikelenboom

Wednesday, January 15, 2014, 10:50:09 PM, you wrote:

> On Wed, Jan 15, 2014 at 12:36 PM, Sander Eikelenboom
>  wrote:
>> ...
>> And that's just what my patch does ..
>>
>> +   if (!vga_default_device() || pdev == vga_default_device()) {
>>
>> If we don't know the vga_default_device ... because we don't have that 
>> knowledge
>> or
>> if this is actually the vga_default_device  ... because we do have that 
>> knowledge ..
>>
>> and only then .. run the fixup code and set this device as the 
>> vga_default_device
>>
>>
>> So this change actually makes the code adhere to the comment already above 
>> it .. saying it should only be applied to the vga_default_device aka
>> boot video device.
>>
>> Also added Bjorn and linux-pci to the CC .. should have done that right away 
>> ..
>> sorry for that.

> Can you resend your patch to linux-pci?  I don't think it made it there.

> Bjorn


Sure .. also attached in case it gets mangled ..

Date: Sun, 12 Jan 2014 04:49:44 +0100
Subject: [PATCH] In "pci_fixup_video" check if this is or should be the
 primary video device to prevent setting the
 IORESOURCE_ROM_SHADOW flag on a secondary VGA card
To: Dave Airlie ,
Eiichiro Oiwa ,
Greg Kroah-Hartman 
Cc: Konrad Rzeszutek Wilk ,
linux-kernel@vger.kernel.org 

Setting the IORESOURCE_ROM_SHADOW flag on a secondary VGA card prevents it from
reading its own ROM. It gets the contents of the shadow ROM at C000 instead,
which belongs to the primary VGA card, and the driver of the secondary card
bails out.

Fix this by checking if this is or should be the primary video device before
applying the fixup, and update the comment to reflect this.

Signed-off-by: Sander Eikelenboom 
---
 arch/x86/pci/fixup.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index b046e07..525e49a 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -314,9 +314,9 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1,pcie_r
  * IORESOURCE_ROM_SHADOW is used to associate the boot video
  * card with this copy. On laptops this copy has to be used since
  * the main ROM may be compressed or combined with another image.
- * See pci_map_rom() for use of this flag. IORESOURCE_ROM_SHADOW
- * is marked here since the boot video device will be the only enabled
- * video device at this point.
+ * See pci_map_rom() for use of this flag. Before we mark the device
+ * with IORESOURCE_ROM_SHADOW we have to check if this is or should become
+ * the primary video card, since this quirk is run for all video devices.
  */
 
 static void pci_fixup_video(struct pci_dev *pdev)
@@ -347,12 +347,13 @@ static void pci_fixup_video(struct pci_dev *pdev)
 		}
 		bus = bus->parent;
 	}
-	pci_read_config_word(pdev, PCI_COMMAND, &config);
-	if (config & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {
-		pdev->resource[PCI_ROM_RESOURCE].flags |= IORESOURCE_ROM_SHADOW;
-		dev_printk(KERN_DEBUG, &pdev->dev, "Boot video device\n");
-		if (!vga_default_device())
+	if (!vga_default_device() || pdev == vga_default_device()) {
+		pci_read_config_word(pdev, PCI_COMMAND, &config);
+		if (config & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {
+			pdev->resource[PCI_ROM_RESOURCE].flags |= IORESOURCE_ROM_SHADOW;
+			dev_printk(KERN_DEBUG, &pdev->dev, "Boot video device\n");
 			vga_set_default_device(pdev);
+		}
 	}
 }
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID,
-- 
1.7.10.4


0001-In-pci_fixup_video-check-if-this-is-or-should-be-the.patch
Description: Binary data
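
The effect of the patch on two decoding VGA devices (as in the guest: 00:03.0 emulated, 00:05.0 passed through) can be sketched with a small userspace model. All names are hypothetical; model_default stands in for vga_default_device() and assumes nothing was selected earlier (in the kernel the VGA arbiter may already have picked a default). model_rom_source() assumes, as I understand pci_map_rom(), that a shadowed ROM is read from the legacy copy at C000:0 (physical 0xC0000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a VGA-class device as seen by the quirk. */
struct model_vga {
	const char *bdf;     /* e.g. "00:03.0" */
	bool decodes;        /* models PCI_COMMAND_IO or PCI_COMMAND_MEMORY set */
	bool rom_shadow;     /* models IORESOURCE_ROM_SHADOW on the ROM resource */
};

/* Stand-in for vga_default_device(); NULL until a device is chosen. */
static struct model_vga *model_default;

/* Before the patch: every decoding VGA device is marked as shadowed,
 * only the first one becomes the default device. */
static void model_fixup_old(struct model_vga *dev)
{
	if (dev->decodes) {
		dev->rom_shadow = true;
		if (!model_default)
			model_default = dev;
	}
}

/* After the patch: only act when no default exists yet or this device
 * already is the default, so a secondary card is left alone. */
static void model_fixup_new(struct model_vga *dev)
{
	if (!model_default || dev == model_default) {
		if (dev->decodes) {
			dev->rom_shadow = true;
			model_default = dev;
		}
	}
}

/* Simplified view of where the ROM is read from: a shadowed device is
 * pointed at the legacy copy at 0xC0000 instead of its own ROM BAR. */
static unsigned long model_rom_source(const struct model_vga *dev,
				      unsigned long rom_bar)
{
	return dev->rom_shadow ? 0xC0000UL : rom_bar;
}
```

In this model, the old logic leaves both devices reading the emulated card's ROM at 0xC0000; with the new logic only 00:03.0 is shadowed, and 00:05.0 keeps reading its own ROM BAR.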


Kconfig help entry for CONFIG_PARAVIRT_SPINLOCK

2013-10-15 Thread Sander Eikelenboom
Hi Raghavendra,

Since the ticketlock series has landed in this merge window (thanks :-) ), the
help text accompanying the Kconfig entry doesn't seem to reflect the current
state well.

- Wasn't the whole purpose of the ticketlock series to reduce this 5%
  performance hit to something far less, so distro kernels could enable this
  in their normal kernels ?
  I don't have the exact performance figures, though.

- Perhaps a suggestion to enable this when running on supported hypervisors
  (Xen and KVM ?) could be added ?

--
Sander


CONFIG_PARAVIRT_SPINLOCKS:

Paravirtualized spinlocks allow a pvops backend to replace the
spinlock implementation with something virtualization-friendly
(for example, block the virtual CPU rather than spinning).

Unfortunately the downside is an up to 5% performance hit on
native kernels, with various workloads.

If you are unsure how to answer this question, answer N.

Symbol: PARAVIRT_SPINLOCKS [=y]
Type  : boolean
Prompt: Paravirtualization layer for spinlocks
  Location:
    -> Processor type and features
      -> Linux guest support (HYPERVISOR_GUEST [=y])
        -> Enable paravirtualization code (PARAVIRT [=y])
  Defined at arch/x86/Kconfig:632
  Depends on: HYPERVISOR_GUEST [=y] && PARAVIRT [=y] && SMP [=y]
  Selects: UNINLINE_SPIN_UNLOCK [=y]
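
For readers unfamiliar with the idea in the help text, a virtualization-friendly ticket lock can be sketched as a ticket lock that spins for a bounded number of polls and then gives up the CPU. This is a userspace sketch in C11 atomics, not the kernel implementation: the names and the threshold are arbitrary, and sched_yield() merely stands in for blocking the vCPU and being kicked by the unlocker, which this sketch does not model.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Arbitrary number of polls before giving up the CPU. */
#define PVT_SPIN_THRESHOLD 1024

struct pv_ticket_lock {
	atomic_uint next;   /* next ticket to hand out */
	atomic_uint owner;  /* ticket that currently owns the lock */
};

static void pvt_init(struct pv_ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

static void pvt_lock(struct pv_ticket_lock *l)
{
	/* Take a ticket; waiters are served strictly in FIFO order. */
	unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
						    memory_order_relaxed);

	for (;;) {
		/* Fast path: spin for a bounded number of polls. */
		for (int i = 0; i < PVT_SPIN_THRESHOLD; i++) {
			if (atomic_load_explicit(&l->owner,
						 memory_order_acquire) == me)
				return;
		}
		/* Slow path: a paravirt backend would block the vCPU here and
		 * rely on the unlocker to kick it; sched_yield() is only a
		 * userspace stand-in for that. */
		sched_yield();
	}
}

static void pvt_unlock(struct pv_ticket_lock *l)
{
	/* Only the holder writes 'owner', so a relaxed load is enough. */
	unsigned int cur = atomic_load_explicit(&l->owner,
						memory_order_relaxed);

	atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}

static int pvt_is_locked(struct pv_ticket_lock *l)
{
	return atomic_load(&l->owner) != atomic_load(&l->next);
}
```

The point of the bounded spin is that an uncontended or briefly contended lock is taken in the fast path, so the slow-path machinery should rarely run on native hardware; how much overhead remains is exactly the performance question raised above.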




Re: [cfg80211 / iwlwifi] setting wireless regulatory domain doesn't work.

2013-12-16 Thread Sander Eikelenboom

Wednesday, December 11, 2013, 7:38:50 PM, you wrote:

> On Wed, Dec 11, 2013 at 7:11 PM, Sander Eikelenboom
>  wrote:
>>
>> Wednesday, December 11, 2013, 6:53:07 PM, you wrote:
>>
>>> The best way to address all this is by automatic region awareness and
>>> doing the right thing on devices, this however requires good
>>> architecture / calibration data  / etc and all that needs to be
>>> verified by the system integrators, and finally they need to be
>>> certified. If you want to hack your firmware and software go at it,
>>> just be aware there are reasons for things.
>>
>> Well the general problem seems to be "we don't trust the user" so we FORCE 
>> him to the lowest
>> common denominator (without a way to overrule that) so he is forced to 
>> operate *well* within the law.

> Its simply stupid to have the user be involved, period, the fact that
> a user would be involved should only be for testing or helping
> compliance for a busted device, development, research and obviously
> hacking. Linux allows all these but by default a device with firmware
> and a custom regdomain that will barf if you try to use a channel that
> is not allowed is a restriction in firmware. Feel free to reverse
> engineer that if you don't like it but it just won't be supported or
> go upstream. Now, the common denominator is generally optimized for
> best performance as well so you shouldn't have to do anything, and for
> APs -- this is typically carefully crafted for a region, also highly
> optimized.

>>>>> It doesn't seem like you are getting your original requests getting
>>>>> processed, so I don't think CRDA is passing it. Can you verify running
>>>>> from CRDA code:
>>>>
>>>> They don't get processed unless i remove the return from the code as i 
>>>> indicated.
>>>> If i remove that return it processes the request.
>>>>
>>>>> ./regdbdump /usr/lib/crda/regulatory.bin
>>>>
>>>> Although it's in a different location on Debian, /lib/crda/regulatory.bin
>>>> the dump seems fine.
>>
>>> OK thanks. Can you send a patch of what exact change you made, it was
>>> unclear from the paste you made.
>>
>>> diff -u file.c.orig file.c
>>
>> Well i just did a pull from wireless-next, to try Avinash Patil's patch.
>> net/wireless/reg.c had already changed much so i couldn't apply his patch 
>> without.
>>
>> With his patch it sets the regulatory domain, although as now expected i 
>> still can not use channels 12 and 13 yet,
>> probably due to those firmware restrictions.

> Its unclear what results you got, and yeah if the device is restricted
> then its just the fw telling the driver its channels and you can't use
> them. That's it. You won't be able to override information then unless
> you hack the firmware

Ping ?

Is there any more information you need to *fix* the problem ?

>   Luis




Re: [cfg80211 / iwlwifi] setting wireless regulatory domain doesn't work.

2013-12-16 Thread Sander Eikelenboom

Monday, December 16, 2013, 12:37:47 PM, you wrote:

> On 12/16/2013 12:22 PM, Sander Eikelenboom wrote:
>> 
>> Wednesday, December 11, 2013, 7:38:50 PM, you wrote:
>> 
>>> On Wed, Dec 11, 2013 at 7:11 PM, Sander Eikelenboom
>>>  wrote:
>>>>
>>>> Wednesday, December 11, 2013, 6:53:07 PM, you wrote:
>>>>
>>>>> The best way to address all this is by automatic region awareness and
>>>>> doing the right thing on devices, this however requires good
>>>>> architecture / calibration data  / etc and all that needs to be
>>>>> verified by the system integrators, and finally they need to be
>>>>> certified. If you want to hack your firmware and software go at it,
>>>>> just be aware there are reasons for things.
>>>>
>>>> Well the general problem seems to be "we don't trust the user" so we FORCE 
>>>> him to the lowest
>>>> common denominator (without a way to overrule that) so he is forced to 
>>>> operate *well* within the law.
>> 
>>> Its simply stupid to have the user be involved, period, the fact that
>>> a user would be involved should only be for testing or helping
>>> compliance for a busted device, development, research and obviously
>>> hacking. Linux allows all these but by default a device with firmware
>>> and a custom regdomain that will barf if you try to use a channel that
>>> is not allowed is a restriction in firmware. Feel free to reverse
>>> engineer that if you don't like it but it just won't be supported or
>>> go upstream. Now, the common denominator is generally optimized for
>>> best performance as well so you shouldn't have to do anything, and for
>>> APs -- this is typically carefully crafted for a region, also highly
>>> optimized.
>> 
>>>>>>> It doesn't seem like you are getting your original requests getting
>>>>>>> processed, so I don't think CRDA is passing it. Can you verify running
>>>>>>> from CRDA code:
>>>>>>
>>>>>> They don't get processed unless i remove the return from the code as i 
>>>>>> indicated.
>>>>>> If i remove that return it processes the request.
>>>>>>
>>>>>>> ./regdbdump /usr/lib/crda/regulatory.bin
>>>>>>
>>>>>> Although it's in a different location on Debian, /lib/crda/regulatory.bin
>>>>>> the dump seems fine.
>>>>
>>>>> OK thanks. Can you send a patch of what exact change you made, it was
>>>>> unclear from the paste you made.
>>>>
>>>>> diff -u file.c.orig file.c
>>>>
>>>> Well i just did a pull from wireless-next, to try Avinash Patil's patch.
>>>> net/wireless/reg.c had already changed much so i couldn't apply his patch 
>>>> without.
>>>>
>>>> With his patch it sets the regulatory domain, although as now expected i 
>>>> still can not use channels 12 and 13 yet,
>>>> probably due to those firmware restrictions.
>> 
>>> Its unclear what results you got, and yeah if the device is restricted
>>> then its just the fw telling the driver its channels and you can't use
>>> them. That's it. You won't be able to override information then unless
>>> you hack the firmware
>> 
>> Ping ?
>> 
>> Is there anymore information you need to *fix* the problem ?

> Maybe you did not get the essence of the response from Luis: There is
> *no* problem to be fixed.

*sigh* ..

Let's start from scratch then ...


a) Isn't the point of the whole regulatory domain system that I can select (and
   restrict) the channels/frequencies my devices transmit on, so I can abide by
   the law ?
b) If so, does it set a regulatory domain from firmware ?
c) If so, should it let me *restrict* the available channels even more by
   setting the regulatory domain to the region in which the device is currently
   being used ?
d) If so, why have I not been able to do so with my Intel driver for a long
   time (over a month now) ?
# iw reg get
country 00:
(2402 - 2472 @ 40), (6, 20)
(2457 - 2482 @ 40), (6, 20), PASSIVE-SCAN
(2474 - 2494 @ 20), (6, 20), NO-OFDM, PASSIVE-SCAN
(5170 - 5250 @ 160), (6, 20), PASSIVE-SCAN
(5250 - 5330 @ 160), (6, 20), DFS, PASSIVE-SCAN
(5490 - 5730 @ 160), (6, 20), DFS, PASSIVE-SCAN
# iw reg set US
# iw reg get
country 00:
(2402 - 2472 @ 40), (6, 20)
(2457 

Re: [Xen-devel] [RFC PATCH] Xen PCI back - do slot and bus reset (v0).

2013-12-16 Thread Sander Eikelenboom

Monday, December 16, 2013, 3:35:15 PM, you wrote:

> On Mon, Dec 16, 2013 at 10:59:01AM +, David Vrabel wrote:
>> On 13/12/13 16:09, Konrad Rzeszutek Wilk wrote:
>> > Hey,
>> > 
>> > While I was trying to narrow down the state of GPU passthrough
>> > (still not finished) and figuring what needs to be done I realized
>> > that Xen PCIback did not reset my GPU properly (when I crashed the
>> > Windows guest by mistake). It does an FLR reset or Power one - if
>> > the device supports it. But it seems that some of these GPUs
>> > are liars and actually don't do the power part properly.
>> 
>> In my experience the devices do not lie.  They correctly report that
>> they do not perform a reset in D3hot.
>> 
>> Here's the patch I'm using to solve this.  It does something similar.
>> i.e., a SBR if all devices on that bus are safe to be reset.
>> 
>> I prefer it because it provides the standard 'reset' sysfs file that the
>> toolstack/userspace can use.

> We can still add the 'reset' to SysFS
>> 
>> It does have some limitations:  a) It does not check whether a device is
>> in use (only if it is bound to pciback); and b) it hand rolls
>> pci_slot_reset() (because it didn't exist at the time).

> .. which can have those limiations removed and be based on this patchset.
> Meaning it won't do a bus-reset or device reset if the rest of the devices
> are _not_ assigned to pciback.

Perhaps there is something to learn from the steps vfio-pci takes to do this ?
(they have sorted out quite a lot around PCI/VGA passthrough)
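
The sibling check in the quoted patch can be restated as a small predicate. This is a hypothetical userspace model (struct model_fn and model_sbr_allowed() are not kernel names) covering only the sibling loop; like the quoted code, it checks that the other functions are bound to pciback, not that they are assigned to the same guest, which is the limitation raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one function on the bus. */
struct model_fn {
	const char *bdf;     /* e.g. "0000:01:00.1" */
	const char *driver;  /* bound driver name, or NULL if unbound */
};

/* Mirrors the sibling loop in the quoted __pcistub_reset_function():
 * a secondary bus reset hits every function below the bridge, so it is
 * only allowed when every *other* function is bound to pciback. The
 * target's own binding is not checked, just as in the quoted code. */
static bool model_sbr_allowed(const struct model_fn *fns, size_t n,
			      size_t target)
{
	for (size_t i = 0; i < n; i++) {
		if (i == target)
			continue;
		if (!fns[i].driver || strcmp(fns[i].driver, "pciback") != 0)
			return false;
	}
	return true;
}
```

For a GPU with an HDMI audio function, for example, both functions have to be handed to pciback before a bus reset of either one is permitted in this model.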

>> 
>> diff --git a/drivers/xen/xen-pciback/pci_stub.c
>> b/drivers/xen/xen-pciback/pci_stub.c
>> index 4e8ba38..5a03e63 100644
>> --- a/drivers/xen/xen-pciback/pci_stub.c
>> +++ b/drivers/xen/xen-pciback/pci_stub.c
>> @@ -14,6 +14,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -43,6 +44,7 @@ struct pcistub_device {
>>   struct kref kref;
>>   struct list_head dev_list;
>>   spinlock_t lock;
>> + bool created_reset_file;
>> 
>>   struct pci_dev *dev;
>>   struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use 
>> */
>> @@ -60,6 +62,114 @@ static LIST_HEAD(pcistub_devices);
>>  static int initialize_devices;
>>  static LIST_HEAD(seized_devices);
>> 
>> +/*
>> + * pci_reset_function() will only work if there is a mechanism to
>> + * reset that single function (e.g., FLR or a D-state transition).
>> + * For PCI hardware that has two or more functions but no per-function
>> + * reset, we can do a bus reset iff all the functions are co-assigned
>> + * to the same domain.
>> + *
>> + * If a function has no per-function reset mechanism the 'reset' sysfs
>> + * file that the toolstack uses to reset a function prior to assigning
>> + * the device will be missing.  In this case, pciback adds its own
>> + * which will try a bus reset.
>> + *
>> + * Note: pciback does not check for co-assigment before doing a bus
>> + * reset, only that the devices are bound to pciback.  The toolstack
>> + * is assumed to have done the right thing.
>> + */
>> +static int __pcistub_reset_function(struct pci_dev *dev)
>> +{
>> + struct pci_dev *pdev;
>> + u16 ctrl;
>> + int ret;
>> +
>> + ret = __pci_reset_function_locked(dev);
>> + if (ret == 0)
>> + return 0;
>> +
>> + if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
>> + return -ENOTTY;
>> +
>> + list_for_each_entry(pdev, &dev->bus->devices, bus_list) {
>> + if (pdev != dev && (!pdev->driver
>> + || strcmp(pdev->driver->name, "pciback")))
>> + return -ENOTTY;
>> + pci_save_state(pdev);
>> + }
>> +
>> + pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
>> + ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
>> + pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
>> + msleep(200);
>> +
>> + ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>> + pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
>> + msleep(200);
>> +
>> + list_for_each_entry(pdev, &dev->bus->devices, bus_list)
>> + pci_restore_state(pdev);
>> +
>> + return 0;
>> +}
>> +
>> +static int pcistub_reset_function(struct pci_dev *dev)
>> +{
>> + int ret;
>> +
>> + device_lock(&dev->dev);
>> + ret = __pcistub_reset_function(dev);
>> + device_unlock(&dev->dev);
>> +
>> + return ret;
>> +}
>> +
>> +static ssize_t pcistub_reset_store(struct device *dev,
>> +struct device_attribute *attr,
>> +const char *buf, size_t count)
>> +{
>> + struct pci_dev *pdev = to_pci_dev(dev);
>> + unsigned long val;
>> + ssize_t result = strict_strtoul(buf, 0, &val);
>> +
>> + if (result < 0)
>> + return result;
>> +
>> + if (val != 1)
>> + return -EINVAL;
>> +
>> + result = pcistub_reset_function(pdev);
>

Re: [Xen-devel] [RFC PATCH] Xen PCI back - do slot and bus reset (v0).

2013-12-16 Thread Sander Eikelenboom

Monday, December 16, 2013, 4:36:12 PM, you wrote:

> On Mon, Dec 16, 2013 at 04:23:53PM +0100, Sander Eikelenboom wrote:
>> 
>> Monday, December 16, 2013, 3:35:15 PM, you wrote:
>> 
>> > On Mon, Dec 16, 2013 at 10:59:01AM +, David Vrabel wrote:
>> >> On 13/12/13 16:09, Konrad Rzeszutek Wilk wrote:
>> >> > Hey,
>> >> > 
>> >> > While I was trying to narrow down the state of GPU passthrough
>> >> > (still not finished) and figuring what needs to be done I realized
>> >> > that Xen PCIback did not reset my GPU properly (when I crashed the
>> >> > Windows guest by mistake). It does an FLR reset or Power one - if
>> >> > the device supports it. But it seems that some of these GPUs
>> >> > are liars and actually don't do the power part properly.
>> >> 
>> >> In my experience the devices do not lie.  They correctly report that
>> >> they do not perform a reset in D3hot.
>> >> 
>> >> Here's the patch I'm using to solve this.  It does something similar.
>> >> i.e., a SBR if all devices on that bus are safe to be reset.
>> >> 
>> >> I prefer it because it provides the standard 'reset' sysfs file that the
>> >> toolstack/userspace can use.
>> 
>> > We can still add the 'reset' to SysFS
>> >> 
>> >> It does have some limitations:  a) It does not check whether a device is
>> >> in use (only if it is bound to pciback); and b) it hand rolls
>> >> pci_slot_reset() (because it didn't exist at the time).
>> 
>> > .. which can have those limiations removed and be based on this patchset.
>> > Meaning it won't do a bus-reset or device reset if the rest of the devices
>> > are _not_ assigned to pciback.
>> 
>> Perhaps there is something to learn from the steps vfio-pci takes to do this 
>> ?
>> (they sorted out quite some stuff around pci/vga passtrough)

> That is actually what I based it on :-)

OK, I was already suspecting that :-)

That reminds me to ask the Radeon maintainer whether there is a way to hook up a
sysfs reset entry for Radeon cards that completely cycles the card (including
potential BIOS voodoo). Since AMD is supporting the development of the
open-source driver now, there should be a chance, and it would be welcome for
both Xen and KVM, since those cards don't support FLR.
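
For reference, the standard per-function 'reset' attribute discussed in the quoted patch is driven from userspace by writing 1 to it. Below is a minimal sketch of a toolstack-side helper, assuming the usual /sys/bus/pci/devices/<BDF>/reset layout; the helper name and its root parameter are hypothetical (the parameter only exists so the helper can be exercised against a scratch directory).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes "1" to <root>/<bdf>/reset. On a real host
 * root would be "/sys/bus/pci/devices" and bdf e.g. "0000:01:00.0".
 * Returns 0 on success or a negative errno value; -ENOENT means there
 * is no 'reset' attribute for that device (or no such device). */
static int pci_sysfs_reset(const char *root, const char *bdf)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "%s/%s/reset", root, bdf);
	int fd, ret;
	ssize_t w;

	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* For sysfs, the reset runs (and any error is reported) in write(). */
	w = write(fd, "1", 1);
	if (w == 1)
		ret = 0;
	else
		ret = (w < 0) ? -errno : -EIO;

	close(fd);
	return ret;
}
```

Per the comment in the quoted patch, a function with no per-function reset mechanism has no 'reset' file, so a helper like this would return -ENOENT for such a card unless something like pciback provides its own attribute.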

>> 
>> >> 
>> >> diff --git a/drivers/xen/xen-pciback/pci_stub.c
>> >> b/drivers/xen/xen-pciback/pci_stub.c
>> >> index 4e8ba38..5a03e63 100644
>> >> --- a/drivers/xen/xen-pciback/pci_stub.c
>> >> +++ b/drivers/xen/xen-pciback/pci_stub.c
>> >> @@ -14,6 +14,7 @@
>> >>  #include 
>> >>  #include 
>> >>  #include 
>> >> +#include 
>> >>  #include 
>> >>  #include 
>> >>  #include 
>> >> @@ -43,6 +44,7 @@ struct pcistub_device {
>> >>   struct kref kref;
>> >>   struct list_head dev_list;
>> >>   spinlock_t lock;
>> >> + bool created_reset_file;
>> >> 
>> >>   struct pci_dev *dev;
>> >>   struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in 
>> >> use */
>> >> @@ -60,6 +62,114 @@ static LIST_HEAD(pcistub_devices);
>> >>  static int initialize_devices;
>> >>  static LIST_HEAD(seized_devices);
>> >> 
>> >> +/*
>> >> + * pci_reset_function() will only work if there is a mechanism to
>> >> + * reset that single function (e.g., FLR or a D-state transition).
>> >> + * For PCI hardware that has two or more functions but no per-function
>> >> + * reset, we can do a bus reset iff all the functions are co-assigned
>> >> + * to the same domain.
>> >> + *
>> >> + * If a function has no per-function reset mechanism the 'reset' sysfs
>> >> + * file that the toolstack uses to reset a function prior to assigning
>> >> + * the device will be missing.  In this case, pciback adds its own
>> >> + * which will try a bus reset.
>> >> + *
>> >> >> + * Note: pciback does not check for co-assignment before doing a bus
>> >> + * reset, only that the devices are bound to pciback.  The toolstack
>> >> + * is assumed to have done the right thing.
>> >> + */
>> >> +static int __pcistub_reset_function(struct pci_dev *dev)
>> >> +{
>> >> + struct pci_d
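The reset policy described in the patch comment above (use a per-function reset such as FLR when the function has one, otherwise fall back to a bus reset only if every function on that bus is safe to reset) can be modelled in a few lines. This is a hypothetical userspace sketch, not the patch's code: struct fn, choose_reset() and the domain field are illustrative names. It checks both the pciback binding and co-assignment to the same domain, whereas the quoted patch notes that it only checks the binding and leaves co-assignment to the toolstack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one PCI function (not a kernel type). */
struct fn {
    int bus;                 /* bus number the function sits on */
    bool has_function_reset; /* e.g. FLR, or a D3hot transition that resets */
    bool bound_to_pciback;   /* currently owned by pciback */
    int domain;              /* guest it is assigned to, -1 for none */
};

enum reset_kind { RESET_NONE, RESET_FUNCTION, RESET_BUS };

/*
 * Pick a reset method for fns[target]:
 *  - prefer a per-function reset when the function has one;
 *  - otherwise allow a bus reset only if every function on the same bus
 *    is bound to pciback and co-assigned to the same domain;
 *  - otherwise refuse, since a bus reset would disturb other users.
 */
static enum reset_kind choose_reset(const struct fn *fns, size_t n,
                                    size_t target)
{
    assert(target < n);
    const struct fn *t = &fns[target];

    if (t->has_function_reset)
        return RESET_FUNCTION;

    for (size_t i = 0; i < n; i++) {
        if (fns[i].bus != t->bus)
            continue;
        if (!fns[i].bound_to_pciback || fns[i].domain != t->domain)
            return RESET_NONE;
    }
    return RESET_BUS;
}
```

For a GPU at function 0 without FLR and its audio function on the same bus, both bound to pciback and assigned to the same guest, the sketch chooses a bus reset; unbinding the audio function from pciback makes it refuse.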

Re: [Xen-devel] [RFC PATCH] Xen PCI back - do slot and bus reset (v0).

2013-12-16 Thread Sander Eikelenboom

Monday, December 16, 2013, 4:36:12 PM, you wrote:

> On Mon, Dec 16, 2013 at 04:23:53PM +0100, Sander Eikelenboom wrote:
>> 
>> Monday, December 16, 2013, 3:35:15 PM, you wrote:
>> 
>> > On Mon, Dec 16, 2013 at 10:59:01AM +, David Vrabel wrote:
>> >> On 13/12/13 16:09, Konrad Rzeszutek Wilk wrote:
>> >> > Hey,
>> >> > 
>> >> > While I was trying to narrow down the state of GPU passthrough
>> >> > (still not finished) and figuring what needs to be done I realized
>> >> > that Xen PCIback did not reset my GPU properly (when I crashed the
>> >> > Windows guest by mistake). It does an FLR reset or Power one - if
>> >> > the device supports it. But it seems that some of these GPUs
>> >> > are liars and actually don't do the power part properly.
>> >> 
>> >> In my experience the devices do not lie.  They correctly report that
>> >> they do not perform a reset in D3hot.
>> >> 
>> >> >> Here's the patch I'm using to solve this.  It does something similar,
>> >> >> i.e., an SBR if all devices on that bus are safe to be reset.
>> >> 
>> >> I prefer it because it provides the standard 'reset' sysfs file that the
>> >> toolstack/userspace can use.
>> 
>> > We can still add the 'reset' to SysFS
>> >> 
>> >> It does have some limitations:  a) It does not check whether a device is
>> >> in use (only if it is bound to pciback); and b) it hand rolls
>> >> pci_slot_reset() (because it didn't exist at the time).
>> 
>> > .. which can have those limitations removed and be based on this patchset.
>> > Meaning it won't do a bus-reset or device reset if the rest of the devices
>> > are _not_ assigned to pciback.
>> 
>> Perhaps there is something to learn from the steps vfio-pci takes to do this?
>> (they sorted out quite a lot of stuff around pci/vga passthrough)

> That is actually what I based it on :-)

Perhaps noteworthy then: [PATCH] pci: Add "try" reset interfaces
http://lkml.indiana.edu/hypermail/linux/kernel/1312.2/00577.html

>> 
>> >> 
>> >> >> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
>> >> index 4e8ba38..5a03e63 100644
>> >> --- a/drivers/xen/xen-pciback/pci_stub.c
>> >> +++ b/drivers/xen/xen-pciback/pci_stub.c
>> >> @@ -14,6 +14,7 @@
>> >>  #include 
>> >>  #include 
>> >>  #include 
>> >> +#include 
>> >>  #include 
>> >>  #include 
>> >>  #include 
>> >> @@ -43,6 +44,7 @@ struct pcistub_device {
>> >>   struct kref kref;
>> >>   struct list_head dev_list;
>> >>   spinlock_t lock;
>> >> + bool created_reset_file;
>> >> 
>> >>   struct pci_dev *dev;
>> >>   struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in 
>> >> use */
>> >> @@ -60,6 +62,114 @@ static LIST_HEAD(pcistub_devices);
>> >>  static int initialize_devices;
>> >>  static LIST_HEAD(seized_devices);
>> >> 
>> >> +/*
>> >> + * pci_reset_function() will only work if there is a mechanism to
>> >> + * reset that single function (e.g., FLR or a D-state transition).
>> >> + * For PCI hardware that has two or more functions but no per-function
>> >> + * reset, we can do a bus reset iff all the functions are co-assigned
>> >> + * to the same domain.
>> >> + *
>> >> + * If a function has no per-function reset mechanism the 'reset' sysfs
>> >> + * file that the toolstack uses to reset a function prior to assigning
>> >> + * the device will be missing.  In this case, pciback adds its own
>> >> + * which will try a bus reset.
>> >> + *
>> >> >> + * Note: pciback does not check for co-assignment before doing a bus
>> >> + * reset, only that the devices are bound to pciback.  The toolstack
>> >> + * is assumed to have done the right thing.
>> >> + */
>> >> +static int __pcistub_reset_function(struct pci_dev *dev)
>> >> +{
>> >> + struct pci_dev *pdev;
>> >> + u16 ctrl;
>> >> + int ret;
>> >> +
>> >> + ret = __pci_reset_function_locked(dev);
>> >> + if (ret == 0)
>> >> + return 0;

Re: [cfg80211 / iwlwifi] setting wireless regulatory domain doesn't work.

2013-12-17 Thread Sander Eikelenboom

Tuesday, December 17, 2013, 3:17:50 AM, you wrote:

> Hi Sander,

> On Mon, Dec 16, 2013 at 11:56 PM, Sander Eikelenboom wrote:
>>
>> Monday, December 16, 2013, 12:37:47 PM, you wrote:
>>
>>> On 12/16/2013 12:22 PM, Sander Eikelenboom wrote:
>>>>
>>>> Wednesday, December 11, 2013, 7:38:50 PM, you wrote:
>>>>
>>>>> On Wed, Dec 11, 2013 at 7:11 PM, Sander Eikelenboom wrote:
>>>>>>
>>>>>> Wednesday, December 11, 2013, 6:53:07 PM, you wrote:
>>>>>>
>>>>>>> The best way to address all this is by automatic region awareness and
>>>>>>> doing the right thing on devices, this however requires good
>>>>>>> architecture / calibration data  / etc and all that needs to be
>>>>>>> verified by the system integrators, and finally they need to be
>>>>>>> certified. If you want to hack your firmware and software go at it,
>>>>>>> just be aware there are reasons for things.
>>>>>>
>>>>>> Well, the general problem seems to be "we don't trust the user", so we
>>>>>> FORCE him to the lowest common denominator (without a way to overrule
>>>>>> that), so he is forced to operate *well* within the law.
>>>>
>>>>> It's simply stupid to have the user be involved, period. The fact that
>>>>> a user would be involved should only be for testing or helping
>>>>> compliance for a busted device, development, research and obviously
>>>>> hacking. Linux allows all these but by default a device with firmware
>>>>> and a custom regdomain that will barf if you try to use a channel that
>>>>> is not allowed is a restriction in firmware. Feel free to reverse
>>>>> engineer that if you don't like it but it just won't be supported or
>>>>> go upstream. Now, the common denominator is generally optimized for
>>>>> best performance as well so you shouldn't have to do anything, and for
>>>>> APs -- this is typically carefully crafted for a region, also highly
>>>>> optimized.
>>>>
>>>>>>>>> It doesn't seem like you are getting your original requests getting
>>>>>>>>> processed, so I don't think CRDA is passing it. Can you verify running
>>>>>>>>> from CRDA code:
>>>>>>>>
>>>>>>>> They don't get processed unless I remove the return from the code as I
>>>>>>>> indicated. If I remove that return, it processes the request.
>>>>>>>>
>>>>>>>>> ./regdbdump /usr/lib/crda/regulatory.bin
>>>>>>>>
>>>>>>>> Although it's in a different location on Debian (/lib/crda/regulatory.bin),
>>>>>>>> the dump seems fine.
>>>>>>
>>>>>>> OK thanks. Can you send a patch of what exact change you made, it was
>>>>>>> unclear from the paste you made.
>>>>>>
>>>>>>> diff -u file.c.orig file.c
>>>>>>
>>>>>> Well, I just did a pull from wireless-next to try Avinash Patil's patch.
>>>>>> net/wireless/reg.c had already changed so much that I couldn't apply his
>>>>>> patch without it.
>>>>>>
>>>>>> With his patch the regulatory domain gets set, although, as now expected,
>>>>>> I still cannot use channels 12 and 13, probably due to those firmware
>>>>>> restrictions.
>>>>
>>>>> It's unclear what results you got, and yeah, if the device is restricted
>>>>> then it's just the fw telling the driver its channels and you can't use
>>>>> them. That's it. You won't be able to override information then unless
>>>>> you hack the firmware
>>>>
>>>> Ping?
>>>>
>>>> Is there any more information you need to *fix* the problem?
>>
>>> Maybe you did not get the essence of the response from Luis: There is
>>> *no* problem to be fixed.
>>
>> *sigh* ..
>>
>> Let's start from scratch then ...
>>
>>
>> a) Isn't the point of the whole regulatory domain system that I can select
>> (and restrict)

[RC6 Bell Chime] Re: [PATCH 00/24] rfcomm fixes

2014-03-10 Thread Sander Eikelenboom
Hi all,

Since:
- 3.14-RC6 has been cut
- this regression is known and reported since the merge window
- the fix (revert of 3 patches) is known for over a month now
- but it's still not in mainline
- my polite ping from last week seems to have provoked exactly 0 (zero)
  responses.

IT'S TIME TO CHIME SOME BELLS :-)

Hopefully that WILL be heard somewhere ...

--
Sander

PS. On the informative side, the 3 commits to be reverted are:

f86772af6a0f643d3e13eb3f4f9213ae0c333ee4 Bluetooth: Remove rfcomm_carrier_raised()
4a2fb3ecc7467c775b154813861f25a0ddc11aa0 Bluetooth: Always wait for a connection on RFCOMM open()
e228b63390536f5b737056059a9a04ea016b1abf Bluetooth: Move rfcomm_get_device() before rfcomm_dev_activate()


Monday, March 3, 2014, 8:38:53 PM, you wrote:


> Wednesday, February 12, 2014, 12:06:44 PM, you wrote:

>> Monday, February 10, 2014, 11:09:38 PM, you wrote:

>>> Hi Peter,

>>>> This patch series addresses a number of previously unknown issues
>>>> with the RFCOMM tty device implementation, in addition to
>>>> addressing the locking regression recently reported [1].
>>>>
>>>> As Gianluca suggested and I agree, this series first reverts
>>>> 3 of the 4 patches of 3.14-rc1 for bluetooth/rfcomm/tty.c.

>>> So for 3.14 we should revert 3 patches, and the other 21 are intended for
>>> the 3.15 merge window.

>>> I realize that we still have to deal with some breakage, but we do not want
>>> regressions, and I am clearly not going to take 24 patches for 3.14 at this
>>> point in time.

>>> What I can do is take all 24 patches into bluetooth-next, let them sit for
>>> 1 week and have people test them. Then we go ahead with reverting 3 patches
>>> from 3.14. Does that make sense?

>> Reverting those 3 patches works for me.

>> --
>> Sander

>>> Regards

>>> Marcel

> Hi Marcel,

> Ping... it seems these 3 reverts to fix the regressions are still not in
> 3.14-rc5?

> --
> Sander



-- 
Best regards,
 Sander  mailto:li...@eikelenboom.it

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.14-mw regression: rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe

2014-02-13 Thread Sander Eikelenboom

Thursday, February 13, 2014, 9:14:47 PM, you wrote:

> On Tue, 2014-02-11 at 20:17 -0800, Eric Dumazet wrote:
>> On Tue, 2014-02-11 at 18:07 -0800, Dan Williams wrote:
>> 
>> > The overlap granularity is too large.  Multiple dma_map_single
>> > mappings are allowed to a given page as long as they don't collide on
>> > the same cache line.
>> > 
>> 
>> I am not sure why you track the number of mappings of a page.
>> 
>> Try launching 100 concurrent netperf -t TCP_SENDFILE
>> 
>> Same page might be mapped more than 100 times, more than 1 times in
>> some cases.

> Thanks for that test case.

> I updated the fix patch with the following.

> diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> index 42b12740940b..611010df1e9c 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -513,6 +513,13 @@ static int active_cln_insert(struct dma_debug_entry *entry)
> unsigned long flags;
> int rc;
>  
> +   /* If the device is not writing memory then we don't have any
> +    * concerns about the cpu consuming stale data.  This mitigates
> +    * legitimate usages of overlapping mappings.
> +    */
> +   if (entry->direction == DMA_TO_DEVICE)
> +           return 0;
> +
> spin_lock_irqsave(&radix_lock, flags);
> rc = radix_tree_insert(&dma_active_cacheline, to_cln(entry), entry);
> if (rc == -EEXIST)
> @@ -526,6 +533,10 @@ static void active_cln_remove(struct dma_debug_entry *entry)
>  {
> unsigned long flags;
>  
> +   /* ...mirror the insert case */
> +   if (entry->direction == DMA_TO_DEVICE)
> +           return;
> +
> spin_lock_irqsave(&radix_lock, flags);
> /* since we are counting overlaps the final put of the
>  * cacheline will occur when the overlap count is 0.


> Sander, barring a negative test result from you I'll send the attached
> patch to Andrew.

Hi Dan,

That seems to effectively suppress the warning, thanks and:

Tested-by: Sander Eikelenboom

--
Sander

> --
> Dan
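The effect of Dan's change can be illustrated with a toy model of the per-cacheline overlap counter. This is a hypothetical sketch, not lib/dma-debug.c: the array-based bookkeeping, cln_insert()/cln_remove() and the limit of 7 (taken from the warning text) are assumptions. As in the patch, mappings the device only reads (DMA_TO_DEVICE) are not tracked at all, so many concurrent transmit mappings of one page no longer trip the limit, while device-writable mappings still do.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the kernel's enum dma_data_direction (subset). */
enum dma_dir { DMA_BIDIRECTIONAL = 0, DMA_TO_DEVICE = 1, DMA_FROM_DEVICE = 2 };

#define MAX_OVERLAP   7   /* assumption: the "exceeded 7" limit from the warning */
#define NR_CACHELINES 64  /* toy address space */

static bool active[NR_CACHELINES];  /* cacheline has at least one mapping */
static int overlap[NR_CACHELINES];  /* mappings beyond the first one */
static int warnings;                /* times the limit was exceeded */

static void cln_insert(unsigned int cln, enum dma_dir dir)
{
    /* Device only reads memory: no stale-data risk, so do not track. */
    if (dir == DMA_TO_DEVICE)
        return;
    if (!active[cln]) {
        active[cln] = true;
        return;
    }
    if (++overlap[cln] > MAX_OVERLAP)
        warnings++;
}

static void cln_remove(unsigned int cln, enum dma_dir dir)
{
    /* Mirror the insert case. */
    if (dir == DMA_TO_DEVICE)
        return;
    if (overlap[cln] > 0)
        overlap[cln]--;
    else
        active[cln] = false;
}

/* Map the same cacheline n times, unmap it again, report the warnings. */
static int warnings_for(enum dma_dir dir, int n)
{
    warnings = 0;
    for (int i = 0; i < n; i++)
        cln_insert(3, dir);
    for (int i = 0; i < n; i++)
        cln_remove(3, dir);
    assert(!active[3] && overlap[3] == 0);
    return warnings;
}
```

In this model 100 overlapping DMA_TO_DEVICE mappings produce no warning, 8 DMA_FROM_DEVICE mappings stay within the limit, and a 9th one exceeds it.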




[PATCH] Sound USB: Prevent printk ratelimiting from spamming kernel log while DEBUG not defined

2014-05-02 Thread Sander Eikelenboom
This (widely used) construction:

if (printk_ratelimit())
        dev_dbg()

Causes the ratelimiting to spam the kernel log with the "callbacks suppressed"
message below, even though the dev_dbg() it is supposed to rate limit wouldn't
print anything because DEBUG is not defined for this device.

[  533.803964] retire_playback_urb: 852 callbacks suppressed
[  538.807930] retire_playback_urb: 852 callbacks suppressed
[  543.811897] retire_playback_urb: 852 callbacks suppressed
[  548.815745] retire_playback_urb: 852 callbacks suppressed
[  553.819826] retire_playback_urb: 852 callbacks suppressed

So use dev_dbg_ratelimited() instead of this construction.

Signed-off-by: Sander Eikelenboom 
---
 sound/usb/pcm.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
index 131336d..c62a165 100644
--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -1501,9 +1501,8 @@ static void retire_playback_urb(struct snd_usb_substream *subs,
 * The error should be lower than 2ms since the estimate relies
 * on two reads of a counter updated every ms.
 */
-   if (printk_ratelimit() &&
-   abs(est_delay - subs->last_delay) * 1000 > runtime->rate * 2)
-   dev_dbg(&subs->dev->dev,
+   if (abs(est_delay - subs->last_delay) * 1000 > runtime->rate * 2)
+   dev_dbg_ratelimited(&subs->dev->dev,
"delay: estimated %d, actual %d\n",
est_delay, subs->last_delay);
 
-- 
1.7.10.4
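Why the old construction still logs when debugging is off can be shown with a simplified model of a rate limiter. This is an illustrative sketch, not the kernel's ratelimit code: struct rl_state, its fields, the burst of 10 and the single "callbacks suppressed" line per interval are assumptions chosen to mirror the log above. The old pattern consumes the ratelimit budget before it is known whether the debug message is enabled; the new pattern, like dev_dbg_ratelimited(), only consults the limiter when the message would actually be printed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy ratelimit state: `burst` messages per interval, the rest are counted. */
struct rl_state {
    int burst;    /* messages allowed per interval */
    int allowed;  /* messages allowed so far in this interval */
    int missed;   /* calls suppressed so far in this interval */
};

static int log_lines;  /* lines that reached the simulated kernel log */

static bool ratelimit(struct rl_state *rs)
{
    if (rs->allowed < rs->burst) {
        rs->allowed++;
        return true;
    }
    rs->missed++;
    return false;
}

/* End of an interval: report suppressed calls as one log line, then reset. */
static void end_interval(struct rl_state *rs)
{
    if (rs->missed > 0)
        log_lines++;  /* "<func>: N callbacks suppressed" */
    rs->allowed = 0;
    rs->missed = 0;
}

/* Old pattern: the budget is consumed even if the debug print is a no-op. */
static void old_pattern(struct rl_state *rs, bool debug_enabled)
{
    if (ratelimit(rs) && debug_enabled)
        log_lines++;
}

/* New pattern: only consult the limiter when the message would be printed. */
static void new_pattern(struct rl_state *rs, bool debug_enabled)
{
    if (debug_enabled && ratelimit(rs))
        log_lines++;
}

static int simulate(bool use_old, bool debug_enabled, int intervals, int calls)
{
    struct rl_state rs = { .burst = 10 };

    assert(intervals >= 0 && calls >= 0);
    log_lines = 0;
    for (int i = 0; i < intervals; i++) {
        for (int c = 0; c < calls; c++) {
            if (use_old)
                old_pattern(&rs, debug_enabled);
            else
                new_pattern(&rs, debug_enabled);
        }
        end_interval(&rs);
    }
    return log_lines;
}
```

With debugging disabled and 1000 calls in each of 5 intervals, the old pattern still writes 5 "callbacks suppressed" lines while the new one writes none; with debugging enabled both write the same 55 lines.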



[PATCH] Prevent printk ratelimiting from spamming kernel log while DEBUG not defined

2014-05-02 Thread Sander Eikelenboom
Hi All,

This patch is for USB sound, but the construction is widely used in the kernel,
so this could pop up in more places.

Greg, Julia,

Would it be worthwhile to sweep this tree-wide?
And would this be something for Coccinelle?
This probably also applies to the other log levels and the "net" variant.


Sander Eikelenboom (1):
  Sound USB: Prevent printk ratelimiting from spamming kernel log while
DEBUG not defined

 sound/usb/pcm.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

-- 
1.7.10.4



Re: [PATCH] Sound USB: Prevent printk ratelimiting from spamming kernel log while DEBUG not defined

2014-05-02 Thread Sander Eikelenboom

Friday, May 2, 2014, 6:13:09 PM, you wrote:

> At Fri,  2 May 2014 15:09:27 +0200,
> Sander Eikelenboom wrote:
>> 
>> This (widely used) construction:
>> 
>> if(printk_ratelimit())
>>   dev_dbg()
>> 
>> Causes the ratelimiting to spam the kernel log with the "callbacks suppressed"
>> message below, even while the dev_dbg it is supposed to rate limit wouldn't
>> print anything because DEBUG is not defined for this device.
>> 
>> [  533.803964] retire_playback_urb: 852 callbacks suppressed
>> [  538.807930] retire_playback_urb: 852 callbacks suppressed
>> [  543.811897] retire_playback_urb: 852 callbacks suppressed
>> [  548.815745] retire_playback_urb: 852 callbacks suppressed
>> [  553.819826] retire_playback_urb: 852 callbacks suppressed
>> 
>> So use dev_dbg_ratelimited() instead of this construction.
>> 
>> Signed-off-by: Sander Eikelenboom 

> Thanks, applied.  This is a result of the recent rewrite to dev_dbg()
> from plain printk(), I suppose.

Yes, that patch (https://lkml.org/lkml/2014/4/9/457) prevented spamming the
kernel log when debugging was enabled ..
but now the rate limiting code starts spamming the kernel log instead :-)

I must say I wasn't very aware of that effect either, and I have also used this
construction in debug patches.


> Takashi

>> ---
>>  sound/usb/pcm.c |5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
>> index 131336d..c62a165 100644
>> --- a/sound/usb/pcm.c
>> +++ b/sound/usb/pcm.c
>> @@ -1501,9 +1501,8 @@ static void retire_playback_urb(struct snd_usb_substream *subs,
>>* The error should be lower than 2ms since the estimate relies
>>* on two reads of a counter updated every ms.
>>*/
>> - if (printk_ratelimit() &&
>> - abs(est_delay - subs->last_delay) * 1000 > runtime->rate * 2)
>> - dev_dbg(&subs->dev->dev,
>> + if (abs(est_delay - subs->last_delay) * 1000 > runtime->rate * 2)
>> + dev_dbg_ratelimited(&subs->dev->dev,
>>   "delay: estimated %d, actual %d\n",
>>   est_delay, subs->last_delay);
>>  
>> -- 
>> 1.7.10.4
>> 




3.15-mw: Oops Workqueue: writeback bdi_writeback_workfn (flush-8:16) RIP: e030:[] [] kobject_put+0x11/0x70

2014-04-12 Thread Sander Eikelenboom
Hi,

I just ran into the oops below after some uptime.

--
Sander

[175753.946560] IP: [] kobject_put+0x11/0x70
[175753.964484] PGD 0 
[175753.982157] Oops:  [#1] SMP 
[175753.999575] Modules linked in:
[175754.016705] CPU: 4 PID: 23869 Comm: kworker/u12:3 Not tainted 3.14.0-mw-20140409a+ #1
[175754.033879] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[175754.050839] Workqueue: writeback bdi_writeback_workfn (flush-8:16)
[175754.067560] task: 88000f1d91e0 ti: 8800046c6000 task.ti: 8800046c6000
[175754.084258] RIP: e030:[]  [] kobject_put+0x11/0x70
[175754.100764] RSP: e02b:8800046c76f8  EFLAGS: 00010002
[175754.117096] RAX: 0001 RBX: 01d0 RCX: 010f5319
[175754.133166] RDX: 010f5318 RSI: 88005f717e20 RDI: 01d0
[175754.149101] RBP: 8800046c7708 R08: 00017e20 R09: 880057588718
[175754.164706] R10:  R11:  R12: 0002
[175754.180080] R13: 0020 R14: 0008 R15: 88002ad154c8
[175754.195262] FS:  7f1ac8154700() GS:88005f70() knlGS:
[175754.210318] CS:  e033 DS:  ES:  CR0: 8005003b
[175754.225097] CR2: 020c CR3: 0221 CR4: 0660
[175754.239767] Stack:
[175754.254060]  88000835cc40 88000835cc40 8800046c7718 816e04b7
[175754.268330]  8800046c7768 8171002c 8800591589c0 8800571da000
[175754.282367]  8800046c7768 880059158800 88002ad154c8
[175754.296180] Call Trace:
[175754.309706]  [] put_device+0x17/0x20
[175754.323129]  [] scsi_init_io+0x10c/0x170
[175754.336275]  [] scsi_setup_fs_cmnd+0x66/0xa0
[175754.349247]  [] sd_prep_fn+0x2a4/0xd30
[175754.362023]  [] ? frontend_changed+0x2dd/0x3e0
[175754.374461]  [] blk_peek_request+0xbe/0x220
[175754.386674]  [] ? scsi_request_fn+0x2dd/0x470
[175754.398681]  [] scsi_request_fn+0x41/0x470
[175754.410403]  [] ? lock_acquire+0xe5/0x150
[175754.421913]  [] __blk_run_queue+0x37/0x50
[175754.433239]  [] queue_unplugged+0x39/0xb0
[175754.444271]  [] blk_flush_plug_list+0x1fb/0x280
[175754.455069]  [] blk_finish_plug+0x18/0x50
[175754.465622]  [] ext4_writepages+0x446/0xd20
[175754.476014]  [] ? __lock_acquire+0x516/0x2210
[175754.486096]  [] do_writepages+0x21/0x50
[175754.495922]  [] __writeback_single_inode+0x40/0x220
[175754.505550]  [] writeback_sb_inodes+0x291/0x440
[175754.515007]  [] __writeback_inodes_wb+0x9f/0xd0
[175754.524158]  [] wb_writeback+0x243/0x2c0
[175754.533058]  [] bdi_writeback_workfn+0x118/0x480
[175754.541760]  [] ? process_one_work+0x15b/0x490
[175754.550297]  [] process_one_work+0x1c5/0x490
[175754.558544]  [] ? process_one_work+0x15b/0x490
[175754.566551]  [] worker_thread+0x11b/0x370
[175754.574323]  [] ? trace_hardirqs_on+0xd/0x10
[175754.581861]  [] ? manage_workers.isra.21+0x2b0/0x2b0
[175754.589261]  [] kthread+0xe4/0x100
[175754.596332]  [] ? __init_kthread_worker+0x70/0x70
[175754.603198]  [] ret_from_fork+0x7c/0xb0
[175754.609811]  [] ? __init_kthread_worker+0x70/0x70
[175754.616232] Code: 89 f7 e8 03 19 00 00 0f b6 43 04 eb a2 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 0d  47 3c 01 74 29 f0 83 6b 38 01 74 0a 48 83 c4 08 5b 5d c3 0f
[175754.629397] RIP  [] kobject_put+0x11/0x70
[175754.635534]  RSP 
[175754.641515] CR2: 020c
[175754.647383] ---[ end trace 8a401ccf86be679c ]---



Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-26 Thread Sander Eikelenboom
Paul,

You have been awfully silent for this whole thread, while this is a regression
caused by a patch of yours (ca2f09f2b2c6c25047cfc545d057c4edfcfe561c, as clearly
stated much earlier in this thread).

The commit message states:
"net_rx_action() is the place where we could do with an accurate predicition but,
since that has proven tricky to calculate, a cheap worse-case (but not too bad)
estimate is all we really need since the only thing we *must* prevent is
xenvif_gop_skb() consuming more slots than are available."

Your "worst-case" calculation stated in the commit message is clearly not the
worst case, since it doesn't take calls to "get_next_rx_buffer" into account.

The problem is that a true worst-case calculation would probably mean reverting
to the old calculation, and the problems this patch was trying to solve would
reappear; but introducing new regressions isn't very useful either. And since it
seems such a tricky and fragile thing to determine, it would probably be best
split into a distinct function with a comment explaining the rationale used.

Since this doesn't seem to be progressing very fast, I've CC'ed some more
folks .. you never know ..

--
Sander


Tuesday, March 25, 2014, 4:29:42 PM, you wrote:


> Tuesday, March 25, 2014, 4:15:39 PM, you wrote:

>> On Sat, Mar 22, 2014 at 07:28:34PM +0100, Sander Eikelenboom wrote:
>> [...]
>>> > Yes there is only one frag .. but it seems to be much larger than
>>> > PAGE_SIZE .. and xenvif_gop_frag_copy breaks that frag down into smaller
>>> > bits .. hence the calculation in xenvif_rx_action determining the slots
>>> > needed by doing:
>>> 
>>> > for (i = 0; i < nr_frags; i++) {
>>> > unsigned int size;
>>> > size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
>>> > max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE);
>>> > }
>>> 
>>> > But the code in xenvif_gop_frag_copy .. seems to be needing one more slot
>>> > (from the empirical test) .. and calling "get_next_rx_buffer" seems
>>> > involved in that ..
>>> 
>>> Hmm, looked again .. and it seems this is it .. when your frags are large
>>> enough you have the chance of running into this.
>>> 

>> get_next_rx_buffer is guarded by start_new_rx_buffer. Do you see any
>> problem with that implementation?
> In general no, but "get_next_rx_buffer" ups cons .. and the calculations
> done in "xenvif_rx_action" for max_slots_needed to prevent the overrun
> don't take this possibility into account. So it's not the guarding by
> "start_new_rx_buffer" that is at fault, it's the calculations early in
> "xenvif_rx_action": the ones that were changed by Paul's patch from
> MAX_SKB_FRAGS to a calculated value that should be a "slim fit".

> The problem is in determining upfront in "xenvif_rx_action" when and how
> often the "get_next_rx_buffer" path will be taken.
> Unless there are other indirect restrictions (from a size point of view), it
> can be called multiple times per frag per skb.

>>> Problem is .. I don't see an easy fix; the "one more slot" of the empirical
>>> test doesn't seem to be the worst case either (I think):
>>>
>>> - In my case the packets that hit this only have 1 frag, but I could have
>>>   had more frags.
>>>   I also think you can't rule out the possibility of doing the
>>>   "get_next_rx_buffer" for multiple subsequent frags from one packet,
>>>   so in the worst case (and perhaps even from a single frag, since it's
>>>   looped over a split of it in what seem to be PAGE_SIZE pieces).
>>>   - So an exact calculation of how many slots we are going to need for
>>>     hitting this "get_next_rx_buffer" upfront in "xenvif_rx_action" seems
>>>     unfeasible.
>>>   - A worst-case gamble doesn't seem possible either .. if you take multiple
>>>     frags * multiple times the "get_next_rx_buffer" ... you would probably
>>>     be back at just setting the needed_slots to MAX_SKB_FRAGS.
>>>
>>> - Another option would be checking for the available slots before doing the
>>>   "get_next_rx_buffer" .. however .. we don't account for how many slots we
>>>   still need to just process the remaining frags.
>>> 

>> We've done a worst case estimation for whole SKB (linear area + all
>> frag

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-26 Thread Sander Eikelenboom

Wednesday, March 26, 2014, 3:44:42 PM, you wrote:

>> -Original Message-
>> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> Sent: 26 March 2014 11:11
>> To: Paul Durrant
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell;
>> linux-kernel; net...@vger.kernel.org
>> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
>> troubles "bisected"
>> 
>> Paul,
>> 
>> You have been awfully silent for this whole thread while this is a regression
>> caused by a patch of you
>> (ca2f09f2b2c6c25047cfc545d057c4edfcfe561c as clearly stated much earlier in
>> this thread).
>> 

> Sorry, I've been distracted...

>> The commit messages states:
>> "net_rx_action() is the place where we could do with an accurate
>> predicition but,
>> since that has proven tricky to calculate, a cheap worse-case (but not too bad)
>> estimate is all we really need since the only thing we *must* prevent is
>> xenvif_gop_skb()
>> consuming more slots than are available."
>> 
>> Your "worst-case" calculation stated in the commit message is clearly not the
>> worst case,
>> since it doesn't take calls to "get_next_rx_buffer" into account.
>> 

> It should be taking into account the behaviour of start_new_rx_buffer(), 
> which should be true if a slot is full or a frag will overflow the current 
> slot and doesn't require splitting.
> The code in net_rx_action() makes the assumption that each frag will require 
> as many slots as its size requires, i.e. it assumes no packing of multiple 
> frags into a single slot, so it should be a worst case.
> Did I miss something in that logic?

Yes.
In "xenvif_gop_skb()" this loop:

for (i = 0; i < nr_frags; i++) {
xenvif_gop_frag_copy(vif, skb, npo,
 skb_frag_page(&skb_shinfo(skb)->frags[i]),
 skb_frag_size(&skb_shinfo(skb)->frags[i]),
 skb_shinfo(skb)->frags[i].page_offset,
 &head);
}

is capable of using up (at least) 1 slot more than is anticipated in
"net_rx_action()" by this code:

for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
unsigned int size;
size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE);
}

This happens when it calls "get_next_rx_buffer()" from
"xenvif_gop_frag_copy()" while it's breaking down the frag.

Ultimately this results in bad grant reference warnings (and packets marked as
"errors" in the interface statistics).

In my case it always seems to be an skb with 1 frag which is broken down into 5
or 6 pieces ..

So "get_next_rx_buffer()" is called once .. and I'm overrunning the ring by 1
slot, but I'm not sure whether that's a coincidence, since the code seems to
have no explicit limit on how often this code path is taken. Perhaps it's
implicitly limited, since packets and frags can't be arbitrarily large compared
with PAGE_SIZE, but that's not something I'm capable of figuring out :-)
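To see how one frag can open more slots than DIV_ROUND_UP(size, PAGE_SIZE), the chunking can be replayed in a small userspace model. It is a sketch under stated assumptions, not netback code: frag_new_slots() and start_new_slot() are hypothetical helpers, one ring slot is assumed to hold PAGE_SIZE bytes, chunks are assumed never to cross a source page boundary, and the new-slot rule follows Paul's description of start_new_rx_buffer() (the slot is full, or the chunk would overflow a non-empty slot but fits in a fresh one). The head buffer, GSO and the linear area are ignored.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL
#define MAX_BUFFER_OFFSET PAGE_SIZE  /* assumption: one page per ring slot */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Start a new slot if the current one is full, or if this chunk would
 * overflow it but fits in a fresh slot and the current slot is non-empty
 * (the head-buffer exception is ignored: this models frags only). */
static int start_new_slot(unsigned long copy_off, unsigned long bytes)
{
    if (copy_off == MAX_BUFFER_OFFSET)
        return 1;
    return copy_off + bytes > MAX_BUFFER_OFFSET &&
           bytes <= MAX_BUFFER_OFFSET && copy_off != 0;
}

/* Count the extra slots opened while copying one frag of `size` bytes that
 * starts `offset` bytes into its page, given `copy_off` bytes already in the
 * current slot. Hypothetical helper, not netback code. */
static unsigned long frag_new_slots(unsigned long copy_off,
                                    unsigned long offset, unsigned long size)
{
    unsigned long new_slots = 0;

    offset %= PAGE_SIZE;  /* only the offset within the first page matters */
    while (size > 0) {
        /* A chunk never crosses a source page boundary. */
        unsigned long bytes = PAGE_SIZE - offset;
        if (bytes > size)
            bytes = size;

        if (start_new_slot(copy_off, bytes)) {
            new_slots++;  /* the get_next_rx_buffer() case */
            copy_off = 0;
        }
        if (copy_off + bytes > MAX_BUFFER_OFFSET)
            bytes = MAX_BUFFER_OFFSET - copy_off;
        assert(bytes > 0);

        copy_off += bytes;
        offset = (offset + bytes) % PAGE_SIZE;
        size -= bytes;
    }
    return new_slots;
}
```

With 1500 bytes already in the current slot and an 8192-byte frag starting 1000 bytes into its page, the chunks are 3096, 4096 and 1000 bytes, each lands in its own slot, and frag_new_slots(1500, 1000, 8192) returns 3, while DIV_ROUND_UP(8192, PAGE_SIZE) is 2. Folding the in-page offset into the estimate, DIV_ROUND_UP(1000 + 8192, PAGE_SIZE) = 3, covers this example; the sketch does not show that it is a true worst case.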



>   Paul

>> Problem is that a worst case calculation would probably be reverting to the
>> old calculation, and the problems this patch was trying to solve would
>> reappear, but introducing new regressions isn't very useful either. And since
>> it seems such a tricky and fragile thing to determine, it would probably be
>> best to be split into a distinct function with a comment to explain the
>> rationale used.
>> 
>> Since this doesn't seem to progress very fast .. CC'ed some more folks .. you
>> never know ..
>> 
>> --
>> Sander
>> 
>> 
>> Tuesday, March 25, 2014, 4:29:42 PM, you wrote:
>> 
>> 
>> > Tuesday, March 25, 2014, 4:15:39 PM, you wrote:
>> 
>> >> On Sat, Mar 22, 2014 at 07:28:34PM +0100, Sander Eikelenboom wrote:
>> >> [...]
>> >>> > Yes there is only one frag .. but it seems to be much larger than
>> >>> > PAGE_SIZE .. and xenvif_gop_frag_copy breaks that frag down into smaller
>> >>> > bits .. hence the calculation in xenvif_rx_action determining the slots
>> >>> > needed by doing:
>> >>>
>> >>> > for (i = 0; i < nr_frags; i++) {
>> >>> >

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-26 Thread Sander Eikelenboom

Wednesday, March 26, 2014, 4:50:30 PM, you wrote:

>> -Original Message-
>> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> Sent: 26 March 2014 15:23
>> To: Paul Durrant
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell;
>> linux-kernel; net...@vger.kernel.org
>> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
>> troubles "bisected"
>> 
>> 
>> Wednesday, March 26, 2014, 3:44:42 PM, you wrote:
>> 
>> >> -Original Message-
>> >> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> >> Sent: 26 March 2014 11:11
>> >> To: Paul Durrant
>> >> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell;
>> >> linux-kernel; net...@vger.kernel.org
>> >> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network
>> >> troubles "bisected"
>> >>
>> >> Paul,
>> >>
>> >> You have been awfully silent for this whole thread while this is a
>> >> regression caused by a patch of you
>> >> (ca2f09f2b2c6c25047cfc545d057c4edfcfe561c as clearly stated much earlier in
>> >> this thread).
>> >>
>> 
>> > Sorry, I've been distracted...
>> 
>> >> The commit message states:
>> >> "net_rx_action() is the place where we could do with an accurate
>> >> predicition but, since that has proven tricky to calculate, a cheap
>> >> worse-case (but not too bad) estimate is all we really need since the
>> >> only thing we *must* prevent is xenvif_gop_skb() consuming more slots
>> >> than are available."
>> >>
>> >> Your "worst-case" calculation stated in the commit message is clearly not
>> >> the worst case, since it doesn't take calls to "get_next_rx_buffer" into
>> >> account.
>> >>
>> 
>> > It should be taking into account the behaviour of start_new_rx_buffer(),
>> > which should be true if a slot is full or a frag will overflow the current
>> > slot and doesn't require splitting.
>> > The code in net_rx_action() makes the assumption that each frag will
>> > require as many slots as its size requires, i.e. it assumes no packing of
>> > multiple frags into a single slot, so it should be a worst case.
>> > Did I miss something in that logic?
>> 
>> Yes.
>> In "xenvif_gop_skb()" this loop:
>> 
>> for (i = 0; i < nr_frags; i++) {
>> 	xenvif_gop_frag_copy(vif, skb, npo,
>> 			     skb_frag_page(&skb_shinfo(skb)->frags[i]),
>> 			     skb_frag_size(&skb_shinfo(skb)->frags[i]),
>> 			     skb_shinfo(skb)->frags[i].page_offset,
>> 			     &head);
>> }
>> 
>> is capable of using up (at least) 1 slot more than is anticipated for in
>> "net_rx_action()" by this code:
>> 
>> for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
>> 	unsigned int size;
>> 	size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
>> 	max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE);
>> }
>> 
>> And this happens when it calls "get_next_rx_buffer()" from
>> "xenvif_gop_frag_copy()" where it's breaking down the frag.
>> 

> The function that determines whether to consume another slot is 
> start_new_rx_buffer() and for each frag I don't see why this would return 
> true more than DIV_ROUND_UP(size, PAGE_SIZE) times.
> It may be called more times than that since the code in 
> xenvif_gop_frag_copy() must also allow for the offset of the frag but should 
> not return true in all cases.
> So, I still cannot see why a frag would ever consume more than 
> DIV_ROUND_UP(size, PAGE_SIZE) slots.

Well, here is a case where a frag is broken down into 2 pieces:

[ 1156.870372] vif vif-7-0 vif7.0: ?!? xenvif_gop_frag_copy Me here 1  
npo->meta_prod:39 vif->rx.sring->req_prod:2105867 vif->rx.req_cons:2105867 
npo->copy_gref:760  npo->copy_off:4096  MAX_BUFFER_OFFSET:4096 bytes:560 
size:560  offset:0 head:1273462060 i:2 vif->rx.sring->req_event:2104275 
estimated_slots_needed:4 reserved_slots_left:0
[ 1

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-26 Thread Sander Eikelenboom

Wednesday, March 26, 2014, 5:25:21 PM, you wrote:

>> -Original Message-
>> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> Sent: 26 March 2014 16:07
>> To: Paul Durrant
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; linux-kernel; net...@vger.kernel.org
>> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
>> 

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-26 Thread Sander Eikelenboom

Hi Paul,

Seems your last mail arrived in pretty bad shape (truncated) in my mailbox ..

--
Sander

Wednesday, March 26, 2014, 6:16:49 PM, you wrote:

>> -Original Message-
>> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> Sent: 26 March 2014 16:54
>> To: Paul Durrant
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; linux-kernel; net...@vger.kernel.org
>> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
>> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-26 Thread Sander Eikelenboom

Wednesday, March 26, 2014, 6:46:06 PM, you wrote:

> Re-send shortened version...

>> -Original Message-
>> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> Sent: 26 March 2014 16:54
>> To: Paul Durrant
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; linux-kernel; net...@vger.kernel.org
>> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
>> 
> [snip]
>> >>
>> >> - When processing an SKB we end up in "xenvif_gop_frag_copy" while
>> >>   prod == cons ... but we still have bytes and size left ..
>> >> - start_new_rx_buffer() has returned true ..
>> >> - so we end up in get_next_rx_buffer
>> >> - this does a RING_GET_REQUEST and ups cons ..
>> >> - and we end up with a bad grant reference.
>> >>
>> >> Sometimes we are saved by the bell .. since additional slots have become
>> >> free (you see cons become > prod in "get_next_rx_buffer" but shortly
>> >> after that prod is increased ..
>> >> just in time to not cause an overrun).
>> >>
>> 
>> > Ah, but hang on... There's a BUG_ON meta_slots_used > max_slots_needed,
>> > so if we are overflowing the worst-case calculation then why is that
>> > BUG_ON not firing?
>> 
>> You mean:
>> sco = (struct skb_cb_overlay *)skb->cb;
>> sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
>> BUG_ON(sco->meta_slots_used > max_slots_needed);
>> 
>> in "get_next_rx_buffer" ?
>> 

> That code excerpt is from net_rx_action(), isn't it?

Yes

>> I don't know .. at least now it doesn't crash dom0 and therefore not my
>> complete machine and since tcp is recovering from a failed packet  :-)
>> 

> Well, if the code calculating max_slots_needed were underestimating then the 
> BUG_ON() should fire. If it is not firing in your case then this suggests 
> your problem lies elsewhere, or that meta_slots_used is not equal to the 
> number of ring slots consumed.

It seems to be the latter ..

[ 1157.188908] vif vif-7-0 vif7.0: ?!? xenvif_gop_skb Me here 5 
npo->meta_prod:40 old_meta_prod:36 vif->rx.sring->req_prod:2105867 
vif->rx.req_cons:2105868 meta->gso_type:1 meta->gso_size:1448 nr_frags:1 
req->gref:657 req->id:7 estimated_slots_needed:4 j(data):1 
reserved_slots_left:-1used in funcstart: 0 + 1 .. used_dataloop:1 .. 
used_fragloop:3
[ 1157.244975] vif vif-7-0 vif7.0: ?!? xenvif_rx_action me here 2 ..  
vif->rx.sring->req_prod:2105867 vif->rx.req_cons:2105868 sco->meta_slots_used:4 
max_upped_gso:1 skb_is_gso(skb):1 max_slots_needed:4 j:6 is_gso:1 nr_frags:1 
firstpart:1 secondpart:2 reserved_slots_left:-1

net_rx_action() calculated we would need 4 slots .. and sco->meta_slots_used ==
4 when we return, so it doesn't trigger your BUG_ON ..

The 4 slots we calculated are:
  1 slot for the data part: DIV_ROUND_UP(offset_in_page(skb->data) + 
skb_headlen(skb), PAGE_SIZE)
  2 slots for the single frag in this SKB from: DIV_ROUND_UP(size, PAGE_SIZE)
  1 slot since GSO

In the debug code I annotated all cons++, and the code uses 1 slot to process
the data from the SKB as expected but uses 3 slots in the frag chopping loop.
And when it reaches the state where cons > prod, it is always in
"get_next_rx_buffer".

>> But probably because "npo->copy_prod++" seems to be used for the frags ..
>> and it isn't added to  npo->meta_prod ?
>> 

> meta_slots_used is calculated as the value of meta_prod at return (from
> xenvif_gop_skb()) minus the value on entry,
> and if you look back up the code then you can see that meta_prod is 
> incremented every time RING_GET_REQUEST() is evaluated.
> So, we must be consuming a slot without evaluating RING_GET_REQUEST() and I 
> think that's exactly what's happening...
> Right at the bottom of xenvif_gop_frag_copy() req_cons is simply incremented 
> in the case of a GSO. So the BUG_ON() is indeed off by one.

That is probably only done on the first iteration / frag?
Because I don't see my warning there trigger .. but it could be that's because at
that moment we still have cons <= prod.


>   Paul





Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-26 Thread Sander Eikelenboom

Wednesday, March 26, 2014, 6:48:15 PM, you wrote:

>> -Original Message-
>> From: Paul Durrant
>> Sent: 26 March 2014 17:47
>> To: 'Sander Eikelenboom'
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; linux-kernel; net...@vger.kernel.org
>> Subject: RE: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
>> 

> Can you re-test with the following patch applied?

>   Paul

> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback
> index 438d0c0..4f24220 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -482,6 +482,8 @@ static void xenvif_rx_action(struct xenvif *vif)

> while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> RING_IDX max_slots_needed;
> +   RING_IDX old_req_cons;
> +   RING_IDX ring_slots_used;
> int i;

> /* We need a cheap worse case estimate for the number of
> @@ -511,8 +513,12 @@ static void xenvif_rx_action(struct xenvif *vif)
> vif->rx_last_skb_slots = 0;

> sco = (struct skb_cb_overlay *)skb->cb;
> +
> +   old_req_cons = vif->rx.req_cons;
> sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> -   BUG_ON(sco->meta_slots_used > max_slots_needed);
> +   ring_slots_used = vif->

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-26 Thread Sander Eikelenboom

Wednesday, March 26, 2014, 7:15:30 PM, you wrote:

>> -Original Message-
>> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> Sent: 26 March 2014 18:08
>> To: Paul Durrant
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; linux-kernel; net...@vger.kernel.org
>> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
>> 

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-27 Thread Sander Eikelenboom

Thursday, March 27, 2014, 10:47:02 AM, you wrote:

>> -Original Message-
>> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> Sent: 26 March 2014 19:57
>> To: Paul Durrant
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; linux-kernel; net...@vger.kernel.org
>> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
>> 

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"

2014-03-27 Thread Sander Eikelenboom
Hrmm, I don't know if it's your mailer or my mailer .. but I seem to get a lot
of your mails truncated somehow :S
Though the xen-devel list archive seems to have them in complete form .. so it's
probably my mailer tripping over something.

> I'll come up with some patches shortly.

OK, will test them ASAP.


Thursday, March 27, 2014, 10:54:09 AM, you wrote:

>> -Original Message-
>> From: Sander Eikelenboom [mailto:li...@eikelenboom.it]
>> Sent: 26 March 2014 20:18
>> To: Paul Durrant
>> Cc: Wei Liu; annie li; Zoltan Kiss; xen-de...@lists.xen.org; Ian Campbell; linux-kernel; net...@vger.kernel.org
>> Subject: Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"
>> 

Re: 3.15-mw: Oops Workqueue: writeback bdi_writeback_workfn (flush-8:16) RIP: e030:[] [] kobject_put+0x11/0x70

2014-04-14 Thread Sander Eikelenboom

Monday, April 14, 2014, 1:30:15 PM, you wrote:

> On Sat, Apr 12, 2014 at 01:34:31PM +0200, Sander Eikelenboom wrote:
>> Hi,
>> 
>> I just ran into the oops below after some uptime.

> Classic use after free introduced by my recent changes, sorry.

> This should fix it:

Thx !

> ---
> From: Christoph Hellwig 
> Subject: scsi: don't reference freed command in scsi_init_sgtable

> When scsi_init_io fails we have to release our device reference, but
> we do this trying to reference the just freed command.  Add a local
> scsi_device pointer to fix this.

> Reported-by: Sander Eikelenboom 
> Signed-off-by: Christoph Hellwig 

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 65a123d..54eff6a 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1044,6 +1044,7 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
>   */
>  int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask)
>  {
> +   struct scsi_device *sdev = cmd->device;
> struct request *rq = cmd->request;
>  
> int error = scsi_init_sgtable(rq, &cmd->sdb, gfp_mask);
> @@ -1091,7 +1092,7 @@ err_exit:
> scsi_release_buffers(cmd);
> cmd->request->special = NULL;
> scsi_put_command(cmd);
> -   put_device(&cmd->device->sdev_gendev);
> +   put_device(&sdev->sdev_gendev);
> return error;
>  }
>  EXPORT_SYMBOL(scsi_init_io);




Re: [Xen-devel] [PATCH v4] PCI back fixes for 3.17.

2014-07-14 Thread Sander Eikelenboom

Friday, July 11, 2014, 10:08:47 PM, you wrote:

> Please see this set of patches which are fixes to Xen pciback
> for 3.17. They are also located at:

>  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/pciback-3.17.v4

> These patches do not include the PCI bus reset/slot code as we are still
> discussing that.

> Konrad Rzeszutek Wilk (5):
>   xen-pciback: Document the various parameters and attributes in SysFS
>   xen/pciback: Don't deadlock when unbinding.
>   xen/pciback: Include the domain id if removing the device whilst still in use
>   xen/pciback: Print out the domain owning the device.
>   xen/pciback: Remove tons of dereferences

>  Documentation/ABI/testing/sysfs-driver-pciback |   25 +++
>  drivers/xen/xen-pciback/passthrough.c          |    9 -
>  drivers/xen/xen-pciback/pci_stub.c             |   41 +
>  drivers/xen/xen-pciback/pciback.h              |    7 ++--
>  drivers/xen/xen-pciback/vpci.c                 |    9 -
>  drivers/xen/xen-pciback/xenbus.c               |    4 +-
>  6 files changed, 67 insertions(+), 28 deletions(-)

Hi Konrad / David,

Thanks for the fixes in this series. I just tested this series and noticed (as
somewhat expected :-) ) that it's still lacking a fix for the bug where pciback
doesn't properly free / disown a device from an HVM guest.

This only happens when using "xl pci-detach domain BDF"
  AND
when the guest has more than one PCI device attached and you remove only ONE of 
them.

The pci-detach doesn't report an error, and the device is removed from the 
guest, but it does not seem to be freed internally in pciback.

Below are the sequence and outcome (with some added printks) for two situations:
A) The guest has two PCI devices passed through, and one of them is then 
removed. After that, the second is removed as well.

B) The guest has only one PCI device passed through, which is then removed. As 
you can see, it is freed in pciback as well.

From what I recall, Konrad thought it could be due to the guest being HVM, 
which means the signalling is different from a PV guest (no pci-front 
confirming the removal via xenbus). But since the device does get freed when 
it is the last one attached to the guest, I don't know if that's entirely true.
It is now semi-fixed by doing an 'xl pci-assignable-remove', which seems to 
force unregistering the device owner. However, something still doesn't seem 
right. The second "xl pci-detach" in (A), which removes the last device, takes 
about a minute instead of at most a second.

Also note that when calling "xl pci-assignable-remove" for the last (or only) 
passed-through device of a guest, you don't get the warning messages about the 
device being 'in-use'.

At the very bottom is a diff of the added printks.

--

Sander


Ad A)

root@dom0:~# xl pci-list router
Vdev Device
05.0 :02:00.0
06.0 :00:1b.0

root@dom0:~# xl pci-assignable-list

root@dom0:~# xl pci-detach router 00:1b.0

dmesg shows:
[  434.839156] pciback :00:1b.0: restoring config space at offset 0x10 (was 0x4, writing 0xf7d30004)
[  434.841745] pciback :00:1b.0: restoring config space at offset 0xc (was 0x0, writing 0x10)
[  434.844205] pciback :00:1b.0: restoring config space at offset 0x4 (was 0x10, writing 0x16)

xl dmesg shows:
(XEN) [2014-07-14 16:02:27] memory_map:remove: dom1 gfn=f3070 mfn=f7d30 nr=4
(XEN) [2014-07-14 16:02:27] io.c:322: d1: unbind: m_gsi=22 g_gsi=40 dev=00:00.6 intx=0
(XEN) [2014-07-14 16:02:27] io.c:390: d1 unmap: m_irq=22 dev=00:00.6 intx=0
(XEN) [2014-07-14 16:02:27] [VT-D]iommu.c:1579: d1:PCIe: unmap :00:1b.0
(XEN) [2014-07-14 16:02:27] [VT-D]iommu.c:1440: d0:PCIe: map :00:1b.0

root@dom0:~# xl pci-assignable-list
:00:1b.0

root@dom0:~# xl pci-assignable-remove 00:1b.0
dmesg shows:
[  609.246406] xen_pciback: ** removing device :00:1b.0 while still in-use by domain 1! **
[  609.248985] xen_pciback: ** driver domain may still access this device's i/o resources!
[  609.251344] xen_pciback: ** shutdown driver domain before binding device
[  609.253291] xen_pciback: ** to other drivers or domains
[  609.355083] xen: xen_unregister_device_domain_owner
[  609.356448] xen: xen_unregister_device_domain_owner
[  609.357789] xen: xen_unregister_device_domain_owner: ENODEV
[  609.463125] pciback :00:1b.0: restoring config space at offset 0x10 (was 0x4, writing 0xf7d30004)
[  609.465692] pciback :00:1b.0: restoring config space at offset 0xc (was 0x0, writing 0x10)
[  609.468137] pciback :00:1b.0: restoring config space at offset 0x4 (was 0x10, writing 0x16)
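In the dmesg output above, xen_unregister_device_domain_owner appears to be logged twice, and the last line reports ENODEV. As far as I can tell from arch/x86/pci/xen.c, the kernel keeps a list of (device, owning domain) entries. Unregistering a device that has no entry returns -ENODEV.

The following is a hypothetical user-space model of that bookkeeping, not the kernel's code. A fixed-size array replaces the spinlock-protected list, devices are keyed by a BDF string, and register_owner()/unregister_owner() are made-up names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_OWNERS 16

/* Hypothetical device -> owning-domain table (illustration only). */
struct owner_entry {
	char bdf[16];	/* e.g. "0000:00:1b.0" */
	int domid;
	int in_use;
};

static struct owner_entry owners[MAX_OWNERS];

static struct owner_entry *find_owner(const char *bdf)
{
	assert(bdf != NULL);
	for (size_t i = 0; i < MAX_OWNERS; i++)
		if (owners[i].in_use && strcmp(owners[i].bdf, bdf) == 0)
			return &owners[i];
	return NULL;
}

/* Returns 0 on success, -EEXIST if already owned, -ENOSPC if the table is full. */
static int register_owner(const char *bdf, int domid)
{
	if (find_owner(bdf))
		return -EEXIST;
	for (size_t i = 0; i < MAX_OWNERS; i++) {
		if (!owners[i].in_use) {
			strncpy(owners[i].bdf, bdf, sizeof(owners[i].bdf) - 1);
			owners[i].bdf[sizeof(owners[i].bdf) - 1] = '\0';
			owners[i].domid = domid;
			owners[i].in_use = 1;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Returns 0 if an owner entry was removed, -ENODEV if the device had none. */
static int unregister_owner(const char *bdf)
{
	struct owner_entry *e = find_owner(bdf);

	if (!e)
		return -ENODEV;
	e->in_use = 0;
	return 0;
}
```

In this model, after registering a device for domain 1, the first unregister returns 0. A second unregister, or unregistering a device that was never registered, returns -ENODEV. That matches the shape of the log lines, but it says nothing about why pciback kept the owner after the first pci-detach.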

root@dom0:~# xl pci-list router
Vdev Device
05.0 :02:00.0

root@dom0:~# xl pci-detach router 02:00.0
dmesg shows:
[  930.571300] pciback :02:00.0: restoring config space at offset 0x3c (was 0x100, writing 0x104)
[  930.574140] pciback :02:00.0: restoring config space at offset 0x10 (was 0x4, writing 0xf7c4)
[  930

Re: [Xen-devel] [PATCH v4] PCI back fixes for 3.17.

2014-07-14 Thread Sander Eikelenboom

Monday, July 14, 2014, 7:22:25 PM, you wrote:

> On Mon, Jul 14, 2014 at 06:37:55PM +0200, Sander Eikelenboom wrote:
>> 
>> Friday, July 11, 2014, 10:08:47 PM, you wrote:
>> 
>> > Please see this set of patches which are fixes to Xen pciback
>> > for 3.17. They are also located at:
>> 
>> >  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
>> > devel/pciback-3.17.v4 
>> 
>> > These patches do not include the PCI bus reset/slot code as we are still
>> > discussing that.
>> 
>> > Konrad Rzeszutek Wilk (5):
>> >   xen-pciback: Document the various parameters and attributes in SysFS
>> >   xen/pciback: Don't deadlock when unbinding.
>> >   xen/pciback: Include the domain id if removing the device whilst 
>> > still in use
>> >   xen/pciback: Print out the domain owning the device.
>> >   xen/pciback: Remove tons of dereferences
>> 
>> >  Documentation/ABI/testing/sysfs-driver-pciback |   25 +++
>> >  drivers/xen/xen-pciback/passthrough.c  |9 -
>> >  drivers/xen/xen-pciback/pci_stub.c |   41 
>> > +
>> >  drivers/xen/xen-pciback/pciback.h  |7 ++--
>> >  drivers/xen/xen-pciback/vpci.c |9 -
>> >  drivers/xen/xen-pciback/xenbus.c   |4 +-
>> >  6 files changed, 67 insertions(+), 28 deletions(-)
>> 
>> Hi Konrad / David,
>> 
>> Thanks for the fixes in this series, i just tested this series and noticed 
>> (as 
>> somewhat expected :-) ) it's still lacking a fix for the bug that pciback 
>> doesn't properly free / disown a device from a HVM guest.
>> 
>> This only happens when using "xl pci-detach domain BDF"
>>   AND
>> when the guest has more than one pci device attached and you remove only ONE 
>> of 
>> them.
>> 
>> The pci-detach doesn't show an error, the device is removed from the guest, 
>> but 
>> it seems it is not internally freed in pciback.
>> 
>> Below is the sequence and outcome (with some added printk's) for two 
>> situations:
>> A) Guest has only one pci-device passed through which is then removed, as 
>> you can see 
>> it's freed in pciback as well.
>> 
>> B) Guest has two pci-devices passed through from which one is then removed. 
>> After that the second is removed as well.
>> 
>> >From what i recall Konrad thought it could be due to the guest being a HVM 
>> >and thus the signaling is different from a
>> PV-Guest (no pci-front confirming the removal via xenbus). But since it does 
>> get freed when it is the last device attached to the guest
>> i don't know if that's completely true.
>> It's now semi-fixed when doing the 'xl pci-assignable-remove', that seems to 
>> force unregistring the device owner, however something still doesn't seem
>> right see the second "xl pci-detach" removing the last device from (A), it 
>> takes about a minute instead of not more than a second.
>> 
>> Also note that when on calling "xl pci-assignable-remove" for the last (or 
>> only) passed through device for a guest, you don't get the warning messages 
>> about
>> the device being 'in-use'
>> 
>> Completely below an diff of the added printk's.
>> 
>> --
>> 
>> Sander
>> 
>> 
>> Ad A)
>> 
>> root@dom0:~# xl pci-list router
>> Vdev Device
>> 05.0 :02:00.0
>> 06.0 :00:1b.0
>> 
>> root@dom0:~# xl pci-assignable-list
>> 
>> root@dom0:~# xl pci-detach router 00:1b.0
>> 
>> dmesg shows:
>> [  434.839156] pciback :00:1b.0: restoring config space at offset 0x10 
>> (was 0x4, writing 0xf7d30004)
>> [  434.841745] pciback :00:1b.0: restoring config space at offset 0xc 
>> (was 0x0, writing 0x10)
>> [  434.844205] pciback :00:1b.0: restoring config space at offset 0x4 
>> (was 0x10, writing 0x16)
>> 
>> xl dmesg shows:
>> (XEN) [2014-07-14 16:02:27] memory_map:remove: dom1 gfn=f3070 mfn=f7d30 nr=4
>> (XEN) [2014-07-14 16:02:27] io.c:322: d1: unbind: m_gsi=22 g_gsi=40 
>> dev=00:00.6 intx=0
>> (XEN) [2014-07-14 16:02:27] io.c:390: d1 unmap: m_irq=22 dev=00:00.6 intx=0
>> (XEN) [2014-07-14 16:02:27] [VT-D]iommu.c:1579: d1:PCIe: unmap :00:1b.0
>> (XEN) [2014-07-14 16:02:27] [VT-D]iommu.c:1440: d0:PCIe: map :00:1b.0
>> 
>> root@dom0:~# xl p

Re: [Xen-devel] [PATCH v4] PCI back fixes for 3.17.

2014-07-14 Thread Sander Eikelenboom

Monday, July 14, 2014, 7:37:53 PM, you wrote:

>> >> Ad B)
>> >> 
>> >> root@dom0:~# xl pci-list router
>> >> Vdev Device
>> >> 05.0 :00:1b.0
>> >> 
>> >> root@dom0:~# xl pci-assignable-list
>> >> :02:00.0
>> >> 
>> >> root@dom0:~# xl pci-detach router 00:1b.0
>> >> dmesg shows:
>> >> [  199.742668] pciback :00:1b.0: restoring config space at offset 
>> >> 0x10 (was 0x4, writing 0xf7d30004)
>> >> [  199.743527] pciback :00:1b.0: restoring config space at offset 0xc 
>> >> (was 0x0, writing 0x10)
>> >> [  199.744321] pciback :00:1b.0: restoring config space at offset 0x4 
>> >> (was 0x10, writing 0x16)
>> >> [  199.757184] xen-pciback pci-1-0: xen_pcibk_xenbus_remove freeing pdev 
>> >> @ 0x8800589fce40
>> >> [  199.758139] xen-pciback pci-1-0: xen_pcibk_disconnect pdev @ 
>> >> 0x8800589fce40
>> >> [  199.862595] xen: xen_unregister_device_domain_owner
>> >> 
>> >> xl dmesg shows:
>> >> (XEN) [2014-07-14 16:28:29] memory_map:remove: dom1 gfn=f3070 mfn=f7d30 
>> >> nr=4
>> >> (XEN) [2014-07-14 16:28:29] io.c:322: d1: unbind: m_gsi=22 g_gsi=36 
>> >> dev=00:00.5 intx=0
>> >> (XEN) [2014-07-14 16:28:29] io.c:390: d1 unmap: m_irq=22 dev=00:00.5 
>> >> intx=0
>> >> (XEN) [2014-07-14 16:28:29] [VT-D]iommu.c:1579: d1:PCIe: unmap 
>> >> :00:1b.0
>> >> (XEN) [2014-07-14 16:28:29] [VT-D]iommu.c:1440: d0:PCIe: map :00:1b.0
>> >> 
>> >> root@dom0:~# xl pci-list router
>> >> root@dom0:~# xl pci-assignable-list
>> >> :00:1b.0
>> >> :02:00.0
>> >> 
>> >> root@dom0:~# xl pci-assignable-remove 00:1b.0
>> >> dmesg shows:
>> >> [  318.827415] xen: xen_unregister_device_domain_owner
>> >> [  318.828771] xen: xen_unregister_device_domain_owner: ENODEV
>> >> [  318.930869] pciback :00:1b.0: restoring config space at offset 
>> >> 0x10 (was 0x4, writing 0xf7d30004)
>> >> [  318.933435] pciback :00:1b.0: restoring config space at offset 0xc 
>> >> (was 0x0, writing 0x10)
>> >> [  318.935877] pciback :00:1b.0: restoring config space at offset 0x4 
>> >> (was 0x10, writing 0x16)
>> >> 
>> >> root@dom0:~# xl pci-list router
>> >> root@dom0:~# xl pci-assignable-list
>> >> :02:00.0
>> >> 
>> >> 
>> 
>> > And if you do:
>> 
>> > # xl pci-detach router 02:00.0
>> 

> Err, I meant
> # xl pci-assignable-remove 02:00.0

Ah OK .. so that means removing a device from pciback that has never been 
assigned to any guest.

I will give that a go as well, although it probably won't be a problem.

>> > Do you see it being cleared from pciback? And what do you
>> > see in /sys/bus/pci/drivers/pciback ?
>> 
>> Hmm good point .. i also had the plan to look into xenstore what was in 
>> there .. 
>> but forgot .. will post both right away :-) 

> Thank you!
>> 
>> > Thanks!
>> 
>> 




Re: [Xen-devel] [PATCH v4] PCI back fixes for 3.17.

2014-07-14 Thread Sander Eikelenboom

Monday, July 14, 2014, 7:45:25 PM, you wrote:

> On Mon, Jul 14, 2014 at 07:43:04PM +0200, Sander Eikelenboom wrote:
>> 
>> Monday, July 14, 2014, 7:37:53 PM, you wrote:
>> 
>> >> >> Ad B)
>> >> >> 
>> >> >> root@dom0:~# xl pci-list router
>> >> >> Vdev Device
>> >> >> 05.0 :00:1b.0
>> >> >> 
>> >> >> root@dom0:~# xl pci-assignable-list
>> >> >> :02:00.0
>> >> >> 
>> >> >> root@dom0:~# xl pci-detach router 00:1b.0
>> >> >> dmesg shows:
>> >> >> [  199.742668] pciback :00:1b.0: restoring config space at offset 
>> >> >> 0x10 (was 0x4, writing 0xf7d30004)
>> >> >> [  199.743527] pciback :00:1b.0: restoring config space at offset 
>> >> >> 0xc (was 0x0, writing 0x10)
>> >> >> [  199.744321] pciback :00:1b.0: restoring config space at offset 
>> >> >> 0x4 (was 0x10, writing 0x16)
>> >> >> [  199.757184] xen-pciback pci-1-0: xen_pcibk_xenbus_remove freeing 
>> >> >> pdev @ 0x8800589fce40
>> >> >> [  199.758139] xen-pciback pci-1-0: xen_pcibk_disconnect pdev @ 
>> >> >> 0x8800589fce40
>> >> >> [  199.862595] xen: xen_unregister_device_domain_owner
>> >> >> 
>> >> >> xl dmesg shows:
>> >> >> (XEN) [2014-07-14 16:28:29] memory_map:remove: dom1 gfn=f3070 
>> >> >> mfn=f7d30 nr=4
>> >> >> (XEN) [2014-07-14 16:28:29] io.c:322: d1: unbind: m_gsi=22 g_gsi=36 
>> >> >> dev=00:00.5 intx=0
>> >> >> (XEN) [2014-07-14 16:28:29] io.c:390: d1 unmap: m_irq=22 dev=00:00.5 
>> >> >> intx=0
>> >> >> (XEN) [2014-07-14 16:28:29] [VT-D]iommu.c:1579: d1:PCIe: unmap 
>> >> >> :00:1b.0
>> >> >> (XEN) [2014-07-14 16:28:29] [VT-D]iommu.c:1440: d0:PCIe: map 
>> >> >> :00:1b.0
>> >> >> 
>> >> >> root@dom0:~# xl pci-list router
>> >> >> root@dom0:~# xl pci-assignable-list
>> >> >> :00:1b.0
>> >> >> :02:00.0
>> >> >> 
>> >> >> root@dom0:~# xl pci-assignable-remove 00:1b.0
>> >> >> dmesg shows:
>> >> >> [  318.827415] xen: xen_unregister_device_domain_owner
>> >> >> [  318.828771] xen: xen_unregister_device_domain_owner: ENODEV
>> >> >> [  318.930869] pciback :00:1b.0: restoring config space at offset 
>> >> >> 0x10 (was 0x4, writing 0xf7d30004)
>> >> >> [  318.933435] pciback :00:1b.0: restoring config space at offset 
>> >> >> 0xc (was 0x0, writing 0x10)
>> >> >> [  318.935877] pciback :00:1b.0: restoring config space at offset 
>> >> >> 0x4 (was 0x10, writing 0x16)
>> >> >> 
>> >> >> root@dom0:~# xl pci-list router
>> >> >> root@dom0:~# xl pci-assignable-list
>> >> >> :02:00.0
>> >> >> 
>> >> >> 
>> >> 
>> >> > And if you do:
>> >> 
>> >> > # xl pci-detach router 02:00.0
>> >> 
>> 
>> > Err, I meant
>> > # xl pci-assignable-remove 02:00.0
>> 
>> Ah ok .. so that is:
>> remove a device from pciback that has never been assigned to any guest .. 
>> 
>> will also give that a go .. although that probably won't be a problem. 

> Right. So I think it all works as expected? That is if you have
> two PCI devices assigned to a guest and want to re-use them you
> have to do the 'pci-detach' twice and then follow that with
>  'pci-assignable-remove' twice as well?

Well, except that's not what I want :-) .. that's about the same as shutting 
down the guest .. and indeed that works :-)

What I want to do is remove just *one* device and leave the other one in the 
guest.

I would like to be able to make the removed device non-assignable again (rebind 
it in dom0) and/or assign it to another guest.

That fails because pciback still thinks the guest owns the device, while libxl 
thinks it doesn't (see the output of xl pci-assignable-list).
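The failure described above can be expressed as a simple consistency check between the two layers. This is a hypothetical sketch, not libxl or pciback code. struct dev_view, safe_to_reassign() and views_disagree() are made-up names. The two fields stand for what xl pci-assignable-list reports and for whether pciback still records an owning domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical snapshot of one device as seen by the two layers discussed
 * in this thread. Not libxl or pciback code.
 */
struct dev_view {
	bool toolstack_assignable; /* listed by "xl pci-assignable-list" */
	int pciback_owner;         /* owning domid still recorded, or -1 for none */
};

/* Safe to rebind in dom0 or give to another guest only when both layers
 * agree that no guest owns the device. */
static bool safe_to_reassign(const struct dev_view *v)
{
	assert(v != NULL);
	return v->toolstack_assignable && v->pciback_owner < 0;
}

/* The inconsistent state: the toolstack considers the device free, but an
 * owning domain is still recorded. */
static bool views_disagree(const struct dev_view *v)
{
	assert(v != NULL);
	return v->toolstack_assignable && v->pciback_owner >= 0;
}
```

Consider 00:1b.0 after a single pci-detach in the earlier (A) run. The reported state corresponds roughly to { toolstack_assignable = true, pciback_owner = 1 }. For that state, views_disagree() is true and safe_to_reassign() is false.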

When doing:

# xl pci-assignable-list
:02:00.0
# xl pci-assignable-remove 02:00.0
dmesg shows:   
[  443.292951] xen: xen_unregister_device_domain_owner
[  443.294308] xen: xen_unregister_device_domain_owner: ENODEV
[  443.3988

Re: [Xen-devel] [PATCH v4] PCI back fixes for 3.17.

2014-07-14 Thread Sander Eikelenboom

Monday, July 14, 2014, 8:45:47 PM, you wrote:

> On Mon, Jul 14, 2014 at 08:24:33PM +0200, Sander Eikelenboom wrote:
>> 
>> Monday, July 14, 2014, 7:45:25 PM, you wrote:
>> 
>> > On Mon, Jul 14, 2014 at 07:43:04PM +0200, Sander Eikelenboom wrote:
>> >> 
>> >> Monday, July 14, 2014, 7:37:53 PM, you wrote:
>> >> 
>> >> >> >> Ad B)
>> >> >> >> 
>> >> >> >> root@dom0:~# xl pci-list router
>> >> >> >> Vdev Device
>> >> >> >> 05.0 :00:1b.0
>> >> >> >> 
>> >> >> >> root@dom0:~# xl pci-assignable-list
>> >> >> >> :02:00.0
>> >> >> >> 
>> >> >> >> root@dom0:~# xl pci-detach router 00:1b.0
>> >> >> >> dmesg shows:
>> >> >> >> [  199.742668] pciback :00:1b.0: restoring config space at 
>> >> >> >> offset 0x10 (was 0x4, writing 0xf7d30004)
>> >> >> >> [  199.743527] pciback :00:1b.0: restoring config space at 
>> >> >> >> offset 0xc (was 0x0, writing 0x10)
>> >> >> >> [  199.744321] pciback :00:1b.0: restoring config space at 
>> >> >> >> offset 0x4 (was 0x10, writing 0x16)
>> >> >> >> [  199.757184] xen-pciback pci-1-0: xen_pcibk_xenbus_remove freeing 
>> >> >> >> pdev @ 0x8800589fce40
>> >> >> >> [  199.758139] xen-pciback pci-1-0: xen_pcibk_disconnect pdev @ 
>> >> >> >> 0x8800589fce40
>> >> >> >> [  199.862595] xen: xen_unregister_device_domain_owner
>> >> >> >> 
>> >> >> >> xl dmesg shows:
>> >> >> >> (XEN) [2014-07-14 16:28:29] memory_map:remove: dom1 gfn=f3070 
>> >> >> >> mfn=f7d30 nr=4
>> >> >> >> (XEN) [2014-07-14 16:28:29] io.c:322: d1: unbind: m_gsi=22 g_gsi=36 
>> >> >> >> dev=00:00.5 intx=0
>> >> >> >> (XEN) [2014-07-14 16:28:29] io.c:390: d1 unmap: m_irq=22 
>> >> >> >> dev=00:00.5 intx=0
>> >> >> >> (XEN) [2014-07-14 16:28:29] [VT-D]iommu.c:1579: d1:PCIe: unmap 
>> >> >> >> :00:1b.0
>> >> >> >> (XEN) [2014-07-14 16:28:29] [VT-D]iommu.c:1440: d0:PCIe: map 
>> >> >> >> :00:1b.0
>> >> >> >> 
>> >> >> >> root@dom0:~# xl pci-list router
>> >> >> >> root@dom0:~# xl pci-assignable-list
>> >> >> >> :00:1b.0
>> >> >> >> :02:00.0
>> >> >> >> 
>> >> >> >> root@dom0:~# xl pci-assignable-remove 00:1b.0
>> >> >> >> dmesg shows:
>> >> >> >> [  318.827415] xen: xen_unregister_device_domain_owner
>> >> >> >> [  318.828771] xen: xen_unregister_device_domain_owner: ENODEV
>> >> >> >> [  318.930869] pciback :00:1b.0: restoring config space at 
>> >> >> >> offset 0x10 (was 0x4, writing 0xf7d30004)
>> >> >> >> [  318.933435] pciback :00:1b.0: restoring config space at 
>> >> >> >> offset 0xc (was 0x0, writing 0x10)
>> >> >> >> [  318.935877] pciback :00:1b.0: restoring config space at 
>> >> >> >> offset 0x4 (was 0x10, writing 0x16)
>> >> >> >> 
>> >> >> >> root@dom0:~# xl pci-list router
>> >> >> >> root@dom0:~# xl pci-assignable-list
>> >> >> >> :02:00.0
>> >> >> >> 
>> >> >> >> 
>> >> >> 
>> >> >> > And if you do:
>> >> >> 
>> >> >> > # xl pci-detach router 02:00.0
>> >> >> 
>> >> 
>> >> > Err, I meant
>> >> > # xl pci-assignable-remove 02:00.0
>> >> 
>> >> Ah ok .. so that is:
>> >> remove a device from pciback that has never been assigned to any guest .. 
>> >> 
>> >> will also give that a go .. although that probably won't be a problem. 
>> 
>> > Right. So I think it all works as expected? That is if you have
>> > two PCI devices assigned to a guest and want to re-use them you
>> > have to do the 'pci-detach' twice

Re: [Xen-devel] [PATCH v4] PCI back fixes for 3.17.

2014-07-14 Thread Sander Eikelenboom

Monday, July 14, 2014, 9:01:29 PM, you wrote:


> Monday, July 14, 2014, 8:45:47 PM, you wrote:

>> On Mon, Jul 14, 2014 at 08:24:33PM +0200, Sander Eikelenboom wrote:
>>> 
>>> Monday, July 14, 2014, 7:45:25 PM, you wrote:
>>> 
>>> > On Mon, Jul 14, 2014 at 07:43:04PM +0200, Sander Eikelenboom wrote:
>>> >> 
>>> >> Monday, July 14, 2014, 7:37:53 PM, you wrote:
>>> >> 
>>> >> >> >> Ad B)
>>> >> >> >> 
>>> >> >> >> root@dom0:~# xl pci-list router
>>> >> >> >> Vdev Device
>>> >> >> >> 05.0 :00:1b.0
>>> >> >> >> 
>>> >> >> >> root@dom0:~# xl pci-assignable-list
>>> >> >> >> :02:00.0
>>> >> >> >> 
>>> >> >> >> root@dom0:~# xl pci-detach router 00:1b.0
>>> >> >> >> dmesg shows:
>>> >> >> >> [  199.742668] pciback :00:1b.0: restoring config space at 
>>> >> >> >> offset 0x10 (was 0x4, writing 0xf7d30004)
>>> >> >> >> [  199.743527] pciback :00:1b.0: restoring config space at 
>>> >> >> >> offset 0xc (was 0x0, writing 0x10)
>>> >> >> >> [  199.744321] pciback :00:1b.0: restoring config space at 
>>> >> >> >> offset 0x4 (was 0x10, writing 0x16)
>>> >> >> >> [  199.757184] xen-pciback pci-1-0: xen_pcibk_xenbus_remove 
>>> >> >> >> freeing pdev @ 0x8800589fce40
>>> >> >> >> [  199.758139] xen-pciback pci-1-0: xen_pcibk_disconnect pdev @ 
>>> >> >> >> 0x8800589fce40
>>> >> >> >> [  199.862595] xen: xen_unregister_device_domain_owner
>>> >> >> >> 
>>> >> >> >> xl dmesg shows:
>>> >> >> >> (XEN) [2014-07-14 16:28:29] memory_map:remove: dom1 gfn=f3070 
>>> >> >> >> mfn=f7d30 nr=4
>>> >> >> >> (XEN) [2014-07-14 16:28:29] io.c:322: d1: unbind: m_gsi=22 
>>> >> >> >> g_gsi=36 dev=00:00.5 intx=0
>>> >> >> >> (XEN) [2014-07-14 16:28:29] io.c:390: d1 unmap: m_irq=22 
>>> >> >> >> dev=00:00.5 intx=0
>>> >> >> >> (XEN) [2014-07-14 16:28:29] [VT-D]iommu.c:1579: d1:PCIe: unmap 
>>> >> >> >> :00:1b.0
>>> >> >> >> (XEN) [2014-07-14 16:28:29] [VT-D]iommu.c:1440: d0:PCIe: map 
>>> >> >> >> :00:1b.0
>>> >> >> >> 
>>> >> >> >> root@dom0:~# xl pci-list router
>>> >> >> >> root@dom0:~# xl pci-assignable-list
>>> >> >> >> :00:1b.0
>>> >> >> >> :02:00.0
>>> >> >> >> 
>>> >> >> >> root@dom0:~# xl pci-assignable-remove 00:1b.0
>>> >> >> >> dmesg shows:
>>> >> >> >> [  318.827415] xen: xen_unregister_device_domain_owner
>>> >> >> >> [  318.828771] xen: xen_unregister_device_domain_owner: ENODEV
>>> >> >> >> [  318.930869] pciback :00:1b.0: restoring config space at 
>>> >> >> >> offset 0x10 (was 0x4, writing 0xf7d30004)
>>> >> >> >> [  318.933435] pciback :00:1b.0: restoring config space at 
>>> >> >> >> offset 0xc (was 0x0, writing 0x10)
>>> >> >> >> [  318.935877] pciback :00:1b.0: restoring config space at 
>>> >> >> >> offset 0x4 (was 0x10, writing 0x16)
>>> >> >> >> 
>>> >> >> >> root@dom0:~# xl pci-list router
>>> >> >> >> root@dom0:~# xl pci-assignable-list
>>> >> >> >> :02:00.0
>>> >> >> >> 
>>> >> >> >> 
>>> >> >> 
>>> >> >> > And if you do:
>>> >> >> 
>>> >> >> > # xl pci-detach router 02:00.0
>>> >> >> 
>>> >> 
>>> >> > Err, I meant
>>> >> > # xl pci-assignable-remove 02:00.0
>>> >> 
>>> >> Ah ok .. so that is:
>>> >> remove a device from pciback that has never been assigned to a

Re: [Xen-devel] [PATCH v4] PCI back fixes for 3.17.

2014-07-14 Thread Sander Eikelenboom

Monday, July 14, 2014, 9:54:05 PM, you wrote:

> On Mon, Jul 14, 2014 at 09:01:29PM +0200, Sander Eikelenboom wrote:
>> 
>> Monday, July 14, 2014, 8:45:47 PM, you wrote:
>> 
>> > On Mon, Jul 14, 2014 at 08:24:33PM +0200, Sander Eikelenboom wrote:
>> >> 
>> >> Monday, July 14, 2014, 7:45:25 PM, you wrote:
>> >> 
>> >> > On Mon, Jul 14, 2014 at 07:43:04PM +0200, Sander Eikelenboom wrote:
>> >> >> 
>> >> >> Monday, July 14, 2014, 7:37:53 PM, you wrote:
>> >> >> 
>> >> >> >> >> Ad B)
>> >> >> >> >> 
>> >> >> >> >> root@dom0:~# xl pci-list router
>> >> >> >> >> Vdev Device
>> >> >> >> >> 05.0 :00:1b.0
>> >> >> >> >> 
>> >> >> >> >> root@dom0:~# xl pci-assignable-list
>> >> >> >> >> :02:00.0
>> >> >> >> >> 
>> >> >> >> >> root@dom0:~# xl pci-detach router 00:1b.0
>> >> >> >> >> dmesg shows:
>> >> >> >> >> [  199.742668] pciback :00:1b.0: restoring config space at 
>> >> >> >> >> offset 0x10 (was 0x4, writing 0xf7d30004)
>> >> >> >> >> [  199.743527] pciback :00:1b.0: restoring config space at 
>> >> >> >> >> offset 0xc (was 0x0, writing 0x10)
>> >> >> >> >> [  199.744321] pciback :00:1b.0: restoring config space at 
>> >> >> >> >> offset 0x4 (was 0x10, writing 0x16)
>> >> >> >> >> [  199.757184] xen-pciback pci-1-0: xen_pcibk_xenbus_remove 
>> >> >> >> >> freeing pdev @ 0x8800589fce40
>> >> >> >> >> [  199.758139] xen-pciback pci-1-0: xen_pcibk_disconnect pdev @ 
>> >> >> >> >> 0x8800589fce40
>> >> >> >> >> [  199.862595] xen: xen_unregister_device_domain_owner
>> >> >> >> >> 
>> >> >> >> >> xl dmesg shows:
>> >> >> >> >> (XEN) [2014-07-14 16:28:29] memory_map:remove: dom1 gfn=f3070 
>> >> >> >> >> mfn=f7d30 nr=4
>> >> >> >> >> (XEN) [2014-07-14 16:28:29] io.c:322: d1: unbind: m_gsi=22 
>> >> >> >> >> g_gsi=36 dev=00:00.5 intx=0
>> >> >> >> >> (XEN) [2014-07-14 16:28:29] io.c:390: d1 unmap: m_irq=22 
>> >> >> >> >> dev=00:00.5 intx=0
>> >> >> >> >> (XEN) [2014-07-14 16:28:29] [VT-D]iommu.c:1579: d1:PCIe: unmap 
>> >> >> >> >> :00:1b.0
>> >> >> >> >> (XEN) [2014-07-14 16:28:29] [VT-D]iommu.c:1440: d0:PCIe: map 
>> >> >> >> >> :00:1b.0
>> >> >> >> >> 
>> >> >> >> >> root@dom0:~# xl pci-list router
>> >> >> >> >> root@dom0:~# xl pci-assignable-list
>> >> >> >> >> :00:1b.0
>> >> >> >> >> :02:00.0
>> >> >> >> >> 
>> >> >> >> >> root@dom0:~# xl pci-assignable-remove 00:1b.0
>> >> >> >> >> dmesg shows:
>> >> >> >> >> [  318.827415] xen: xen_unregister_device_domain_owner
>> >> >> >> >> [  318.828771] xen: xen_unregister_device_domain_owner: ENODEV
>> >> >> >> >> [  318.930869] pciback :00:1b.0: restoring config space at 
>> >> >> >> >> offset 0x10 (was 0x4, writing 0xf7d30004)
>> >> >> >> >> [  318.933435] pciback :00:1b.0: restoring config space at 
>> >> >> >> >> offset 0xc (was 0x0, writing 0x10)
>> >> >> >> >> [  318.935877] pciback :00:1b.0: restoring config space at 
>> >> >> >> >> offset 0x4 (was 0x10, writing 0x16)
>> >> >> >> >> 
>> >> >> >> >> root@dom0:~# xl pci-list router
>> >> >> >> >> root@dom0:~# xl pci-assignable-list
>> >> >> >> >> :02:00.0
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> 
>> >> >> >> 
