On Wed, Jul 24, 2019 at 03:27:51PM +0200, Maik Stohn wrote:
> KERNEL CRASH when using XHCI devices (affects any architecture, any USB 
> device) 
> 
> This was already reported as a kernel bug in bugzilla 
> (https://bugzilla.kernel.org/show_bug.cgi?id=204257) but I got told to report 
> it here since it is usb related...
> 
> Affected kernels: 5.2, 5.2.1, 5.2.2, 5.3-rc1, ...
> 
> This bug is already causing real world problems with existing software and 
> devices using SCSI BOT with raw SCSI commands and libusb software.
> 
> Reproduce (tested on several different machines with 5.2,5.2.1,5.2.2,5.3rc1):
> 
>  - usb flash drive attached to XHCI controller (e.g. USB3.0 flash drive 
> attached to USB3.0 port)
>  - generic scsi module loaded (e.g. /dev/sg0 comes up when attaching the 
> flash drive)
>  - command line tool "sg_raw" from "sg3-utils"
>  - execute: and press a key + return (-s1 sends one byte which is read from 
> stdin)
>    $ sudo sg_raw -s1 /dev/sg0 00 00 00 00 00 00 00 00 00 00
>    
>  -> KERNEL Oops
>  
>  - same for -s2, -s3, ... up to -s8   (sending 1 to 8 bytes, exactly the 
> maximum of bytes on my 64 bit machine where the "DMA bypass optimization / 
> IDT" kicks in, see below)
>  
> Since this can be triggered by any normal user (without any special USB 
> device needed) I think it is important enough to fix it for the existing 5.2 
> kernel as well.
> 
> ---
> 
> Patch introducing the crash:  https://patchwork.kernel.org/patch/10919167 / 
> commit 33e39350ebd20fe6a77a51b8c21c3aa6b4a208cf - "usb: xhci: add Immediate 
> Data Transfer support"
> 
> Reason: NULL pointer dereference
> 
> ---
> 
> I took me quite some time to find the cause of this.
> 
> I narrowed the crash down to the place inside of "xhci_queue_bulk_tx" in 
> "xhci-ring.c" where the next SG is loaded
> 
> ...
>     while (sg && sent_len >= block_len) {
>       /* New sg entry */
>       --num_sgs;
>       sent_len -= block_len;
>       if (num_sgs != 0) {
>         sg = sg_next(sg);
>         block_len = sg_dma_len(sg);           <================= CRASH
>                                                                  The comment 
> of "sg_dma_len" clearly states "These macros should be used after a 
> dma_map_sg call has been done..." - which was 
>                                                                  omitted by 
> the new "xhci_map_urb_for_dma" function since the transfer was considered 
> suitable for IDT.
>         addr = (u64) sg_dma_address(sg);
>         addr += sent_len;
>       }
>     }
>     block_len -= sent_len;
>     send_addr = addr;
> ...
> 
> This only happens if the transfer was cosnideres suitable for IDT. 
> When I patched the function "xhci_urb_suitable_for_idt" to always return 
> false (nothing suitable for IDT) everything was working fine. 
> 
> 
> Unfortunately I'm not deep enough into the inner workings of the kernel usb 
> host driver to find a solution for this other than reverting the patch for 
> IDT.

What patch did you find that caused this regression?  We can revert it
if that is the easiest thing to do.

thanks,

greg k-h

Reply via email to