External USB drives become unresponsive after few hours.
I have tested the following kernel versions:
- 3.18.4, 3.18.6, 3.18.7, 3.19.4 [all affected]
- 3.17.1 [unaffected]
- 3.17.8 [probably the last unaffected version; I'm currently using it]

I've also been using the very same configuration (hardware) with 2.6.x, 3.2.x, 3.4.x and 3.10.x, and have never encountered such behavior before.

The problem: when at least one external drive is plugged in AND mounted, after ~2-4 hours the following occurs (at timestamp 11315.681561):

[ 5570.110523] usb 2-1.2: new high-speed USB device number 5 using ehci-pci
[ 5570.852917] usb 2-1.2: New USB device found, idVendor=1058, idProduct=0730
[ 5570.852923] usb 2-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 5570.852927] usb 2-1.2: Product: My Passport 0730
[ 5570.852930] usb 2-1.2: Manufacturer: Western Digital
[ 5570.852933] usb 2-1.2: SerialNumber:
[ 5570.853517] usb-storage 2-1.2:1.0: USB Mass Storage device detected
[ 5570.853691] scsi host8: usb-storage 2-1.2:1.0
[ 5572.932659] scsi 8:0:0:0: Direct-Access WD My Passport 0730 1012 PQ: 0 ANSI: 6
[ 5572.933013] sd 8:0:0:0: Attached scsi generic sg5 type 0
[ 5575.306801] scsi 8:0:0:1: Enclosure WD SES Device 1012 PQ: 0 ANSI: 6
[ 5575.307160] sd 8:0:0:0: [sdc] 976707584 512-byte logical blocks: (500 GB/465 GiB)
[ 5575.308405] sd 8:0:0:0: [sdc] Write Protect is off
[ 5575.308416] sd 8:0:0:0: [sdc] Mode Sense: 47 00 10 08
[ 5575.309772] sd 8:0:0:0: [sdc] No Caching mode page found
[ 5575.309776] sd 8:0:0:0: [sdc] Assuming drive cache: write through
[ 5575.311176] scsi 8:0:0:1: Attached scsi generic sg6 type 13
[ 5575.328540] sdc: sdc1
[ 5575.331026] sd 8:0:0:0: [sdc] Attached SCSI disk
[11315.681561] ehci-pci :00:1d.0: swiotlb buffer is full (sz: 32768 bytes)
[11315.681565] DMA: Out of SW-IOMMU space for 32768 bytes at device :00:1d.0
[11315.681874] ehci-pci :00:1d.0: swiotlb buffer is full (sz: 32768 bytes)
[11315.681876] DMA: Out of SW-IOMMU space for 32768 bytes at device :00:1d.0
[11315.682171] ehci-pci :00:1d.0: swiotlb buffer is full (sz: 32768 bytes)
[11315.682174] DMA: Out of SW-IOMMU space for 32768 bytes at device :00:1d.0
[...and so on...]

The number of bytes may vary, e.g.:

DMA: Out of SW-IOMMU space for 65536 bytes at device :00:1d.0

Also, a *usb-storage* process saturates one of the CPU cores and can't be killed, even with -9.

When the above occurs, the drive becomes inaccessible and cannot be unmounted. The only way out is to unplug it (the usb-storage process terminates at that point). A reboot is also necessary, because error messages keep flooding the log and the network (eth0) may also misbehave: it remains usable, but slows down significantly (e.g. when loading a webpage).

My equipment: Toshiba L505-138 laptop (USB 2.0 only) + 2 external 'WD My Passport' USB drives (750GB: USB 2.0 & 500GB: USB 2.0/3.0).
My system: Fatdog64-700 (a 64-bit distro inspired by Puppy Linux and built from LFS). http://distro.ibiblio.org/fatdog/web/

Here's where I first posted about the issue: http://www.murga-linux.com/puppy/viewtopic.php?p=828168#828168 (so far no one else has reported a similar issue).

Attached is the 'lspci -k' output. Just let me know if you need more/specific info.

Best regards,
Jake (SFR)

00:00.0 Host bridge: Intel Corporation Core Processor DRAM Controller (rev 02)
	Subsystem: Toshiba America Info Systems Device ff00
00:01.0 PCI bridge: Intel Corporation Core Processor PCI Express x16 Root Port (rev 02)
	Kernel driver in use: pcieport
00:16.0 Communication controller: Intel Corporation 5 Series/3400 Series Chipset HECI Controller (rev 06)
	Subsystem: Toshiba America Info Systems Device ff00
00:1a.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
	Subsystem: Toshiba America Info Systems Device ff00
	Kernel driver in use: ehci-pci
00:1b.0 Audio device: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio (rev 05)
	Subsystem: Toshiba America Info Systems Device ff00
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 05)
	Kernel driver in use: pcieport
00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 05)
	Kernel driver in use: pcieport
00:1c.2 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 3 (rev 05)
	Kernel driver in use: pcieport
00:1c.3 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 (rev 05)
	Kernel driver in use: pcieport
00:1c.4 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 (rev 05)
	Kernel driver in use: pcieport
00:1c.5 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 6 (rev 05)
	Kernel driver in use: pcie
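To locate the onset of the SWIOTLB exhaustion in a saved log rather than by eye, a minimal POSIX shell sketch could look like the following. The helper names `first_swiotlb_full` and `count_swiotlb_errors` are hypothetical, and the sketch assumes the log was saved with the bracketed `[seconds.micros]` dmesg timestamps shown in the report above.

```shell
#!/bin/sh
# Hypothetical helpers for a kernel log saved with `dmesg > kernel.log`.
# They assume the bracketed "[seconds.micros]" timestamp prefix shown above.

# Print the timestamp of the first "swiotlb buffer is full" message.
first_swiotlb_full() {
    grep 'swiotlb buffer is full' "$1" \
        | head -n 1 \
        | sed -n 's/^\[ *\([0-9.]*\)].*/\1/p'
}

# Print how many "Out of SW-IOMMU space" messages the log contains.
count_swiotlb_errors() {
    grep -c 'DMA: Out of SW-IOMMU space' "$1"
}
```

For the excerpt in the report, `first_swiotlb_full kernel.log` would print `11315.681561`; the count shows how fast the log is being flooded afterwards.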
Re: Error: DMA: Out of SW-IOMMU space [was: External USB drives become unresponsive after few hours.]
On 16 April 2015 at 16:15, Alan Stern wrote:
> This appears to be a problem with the IOMMU or SWIOTLB subsystems, not
> the USB subsystem. I have CC'ed the appropriate mailing lists.

Thanks. I'm far from being a kernel expert, so I expected it could be the wrong subsystem.

On 16 April 2015 at 16:24, Suman Tripathi wrote:
> Try increasing the SWIOTLB size to 128MB. Default is 64MB.

Ok, so I'm back to k3.18.7 (the default in the latest Fatdog), although I'm not sure what the exact value of the swiotlb boot param should be. I got totally mixed results from uncle Google: some say the unit is MiB, some that it's 4k pages, and another that 128MiB = 65536, so I played it safe and used swiotlb=131072.
Is this correct?
It may take a few days, but I'll let you know if it worked (or for how long, if not).

On 16 April 2015 at 16:54, Alexander Duyck wrote:
> More likely would be a device driver that is DMA mapping memory but not
> unmapping it after it is done, resulting in the bounce buffer pool being
> depleted.
> You might want to dump the list of drivers loaded on the system with lsmod,
> and then possibly look at doing a git bisect for something introduced
> between 3.17 and 3.18 since that seems to be when you started seeing
> this issue.

Ok, I'll (try to) look at this, but as I said, I'm not a kernel (or git) expert. Anyway, I guess I'm going to start with this: https://wiki.gentoo.org/wiki/Kernel_git-bisect
Who knows... perhaps I'll find something.

Thank you all for the replies.
Jake

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
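On the unit question: the integer form of the `swiotlb=` boot parameter is documented in the kernel's `Documentation/kernel-parameters.txt` as a number of I/O TLB slabs. The sketch below additionally assumes the 3.x-era swiotlb code, where each slab is 2 KiB (`IO_TLB_SHIFT` = 11) and the slab count is rounded up to a multiple of 128 (`IO_TLB_SEGSIZE`). Under those assumptions, a small POSIX shell helper converts between slabs and MiB; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the swiotlb= arithmetic. ASSUMPTIONS (3.x-era swiotlb code):
# the integer form of swiotlb= counts I/O TLB slabs, each slab is 2 KiB
# (IO_TLB_SHIFT = 11), and the count is rounded up to a multiple of 128
# slabs (IO_TLB_SEGSIZE).
SLAB_BYTES=2048
SEG_SLABS=128

# Round a slab count up to the next multiple of SEG_SLABS.
align_slabs() {
    echo $(( ($1 + SEG_SLABS - 1) / SEG_SLABS * SEG_SLABS ))
}

# swiotlb=<slabs> -> bounce-buffer size in MiB.
swiotlb_slabs_to_mib() {
    slabs=$(align_slabs "$1")
    echo $(( slabs * SLAB_BYTES / 1024 / 1024 ))
}

# Desired bounce-buffer size in MiB -> value for swiotlb=<slabs>.
swiotlb_mib_to_slabs() {
    echo $(( $1 * 1024 * 1024 / SLAB_BYTES ))
}
```

With these assumptions the 64 MiB default corresponds to 32768 slabs, the suggested 128 MiB to `swiotlb=65536`, and `swiotlb=131072` requests 256 MiB, i.e. twice the suggested size.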
Re: Error: DMA: Out of SW-IOMMU space [was: External USB drives become unresponsive after few hours.]
On 16 April 2015 at 20:42, Konrad Rzeszutek Wilk wrote:
> An easier way is to compile the kernel with CONFIG_DMA_API_DEBUG
> and then load the attached module.
>
> That should tell you who and what else is holding on the buffers.

Thanks, this will be my next step then, right after I'm done testing the increased SWIOTLB.

Jake
Re: Error: DMA: Out of SW-IOMMU space [was: External USB drives become unresponsive after few hours.]
On 16 April 2015 at 18:57, Dorian Gray wrote:
> On 16 April 2015 at 16:24, Suman Tripathi wrote:
>> Try increasing the SWIOTLB size to 128MB. Default is 64MB.
>
> [...] so I played it safe and used swiotlb=131072.
> Is this correct?
> It may take a few days, but I'll let you know if it worked (or for how
> long, if not).

I ran 3.18.7 + swiotlb=131072 with 2 external drives plugged in and mounted for about 18 hours straight. The error didn't show up. I would have run it a little longer, but I had to restart X and, while doing so, the system crashed for an unknown reason.

Anyway, this seems to be a fairly reliable workaround: at least I can _use_ kernels newer than 3.17.8, because that bug, popping up after a couple of hours of uptime, was a total showstopper for me.

Thanks!
Jake
Re: Error: DMA: Out of SW-IOMMU space [was: External USB drives become unresponsive after few hours.]
On 16 April 2015 at 20:42, Konrad Rzeszutek Wilk wrote:
> An easier way is to compile the kernel with CONFIG_DMA_API_DEBUG
> and then load the attached module.
>
> That should tell you who and what else is holding on the buffers.

Ok, I have compiled 3.19.4 with CONFIG_DMA_API_DEBUG=y + the module you sent me.
Now, I'm not sure if I've done it right: I waited until the error occurred and then modprobe'd dump_dma.
I have attached the kernel log, but it doesn't tell me much, if anything...

Thanks again.
Jake

[Attachment: dump_dma.log.tar.bz2 (BZip2 compressed data)]
Re: Error: DMA: Out of SW-IOMMU space [was: External USB drives become unresponsive after few hours.]
On 17 April 2015 at 22:06, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 17, 2015 at 05:14:20PM +0200, Dorian Gray wrote:
>> [...]
>> I have attached the kernel log, but it doesn't tell me much, if anything...
>
> The network driver is quite hungry for DMA. Did it do the same thing
> in the earlier kernels?
>
> Thanks.

Yeah, you're right:

# grep rtl8192se dump_dma_k3.19.4.log | wc -l
6789
# grep rtl8192se dump_dma_k3.17.8.log | wc -l
162

So the WLAN driver would be the real culprit then? I would never have thought...

I guess I'm going to test 3.19.4 once more (just to be sure) with rtl8192se removed and see what happens.

Thanks!
Jake

[Attachment: dump_dma_logs.tar.bz2 (BZip2 compressed data)]
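The two `grep ... | wc -l` runs above can be folded into a single comparison. A minimal POSIX shell sketch; `compare_driver` is a hypothetical helper name, and like `grep | wc -l` it counts matching lines, not individual occurrences.

```shell
#!/bin/sh
# Hypothetical helper: count the lines mentioning a driver in an "old" and a
# "new" dump_dma log and report how many times more entries the new one holds.
compare_driver() {
    driver=$1 old_log=$2 new_log=$3
    # grep -c prints the number of matching lines (0 if none match).
    old=$(grep -c -- "$driver" "$old_log")
    new=$(grep -c -- "$driver" "$new_log")
    # Guard against division by zero when the old log has no entries.
    awk -v d="$driver" -v o="$old" -v n="$new" \
        'BEGIN { printf "%s: %d -> %d (x%.1f)\n", d, o, n, (o ? n / o : 0) }'
}
```

Run as `compare_driver rtl8192se dump_dma_k3.17.8.log dump_dma_k3.19.4.log`; with the counts reported above it would print `rtl8192se: 162 -> 6789 (x41.9)`.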
Re: Error: DMA: Out of SW-IOMMU space [was: External USB drives become unresponsive after few hours.]
[update]

Ok, 6 hours of uptime (3.19.4 + blacklisted rtl8192se) and everything was fine...
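For reference, blacklisting rtl8192se on a system using modprobe (module-init-tools/kmod) can be expressed as a modprobe.d fragment like the one below. The file name is a hypothetical choice (only the `.conf` suffix matters). `blacklist` only makes modprobe ignore the module's aliases, so it prevents auto-loading; an already-loaded rtl8192se still has to be unloaded (e.g. `modprobe -r rtl8192se`) or the machine rebooted.

```
# /etc/modprobe.d/rtl8192se-blacklist.conf  (hypothetical file name)
# Ignore the aliases of the Realtek RTL8192SE WLAN driver so it is not
# auto-loaded while testing whether it is the one leaking DMA mappings.
blacklist rtl8192se
```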
However, I was checking periodically and noticed that the 'radeon' count also tends to grow continuously over time, whereas the Ethernet driver (r8169) stays, more or less, in the same range:

# uname -r
3.19.4
# grep -Eo 'radeon|r8169' L1.log | sort | uniq -c
     62 r8169
   4183 radeon
# grep -Eo 'radeon|r8169' L2.log | sort | uniq -c
     33 r8169
   5582 radeon
# grep -Eo 'radeon|r8169' L3.log | sort | uniq -c
     54 r8169
   7007 radeon
# grep -Eo 'radeon|r8169' L4.log | sort | uniq -c
     49 r8169
   7429 radeon
# grep -Eo 'radeon|r8169' L5.log | sort | uniq -c
     34 r8169
   9360 radeon

It doesn't grow that much in 3.17.8:

# uname -r
3.17.8
# grep -Eo 'radeon|r8169|rtl8192se' L1.log | sort | uniq -c
    265 r8169
   1229 radeon
    142 rtl8192se
# grep -Eo 'radeon|r8169|rtl8192se' L2.log | sort | uniq -c
    187 r8169
   3159 radeon
    124 rtl8192se
# grep -Eo 'radeon|r8169|rtl8192se' L3.log | sort | uniq -c
     41 r8169
   1894 radeon
     39 rtl8192se
# grep -Eo 'radeon|r8169|rtl8192se' L4.log | sort | uniq -c
     64 r8169
   3370 radeon
     77 rtl8192se
# grep -Eo 'radeon|r8169|rtl8192se' L5.log | sort | uniq -c
     52 r8169
   2597 radeon
     49 rtl8192se

By the way, at some point (3.19.4) I encountered this:
[21631.181909] DMA-API: debugging out of memory - disabling

Jake
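The per-snapshot `grep -Eo ... | sort | uniq -c` runs above can be laid out as one table, which makes the radeon growth easier to see. A minimal POSIX shell sketch; `tally_drivers` is a hypothetical helper, and it assumes the driver names do not overlap (as with r8169, radeon and rtl8192se), so counting each name separately matches the combined `grep -Eo` pattern.

```shell
#!/bin/sh
# Hypothetical helper: print one row per dump_dma snapshot (L1.log, L2.log,
# ... in the thread) with the number of times each driver name occurs, so
# that steady growth of one driver stands out.
# Like `grep -Eo ... | sort | uniq -c`, it counts every occurrence, not lines.
tally_drivers() {
    drivers=$1          # space-separated driver names, e.g. 'r8169 radeon'
    shift               # remaining arguments are the snapshot log files
    printf '%-10s' snapshot
    for d in $drivers; do
        printf ' %10s' "$d"
    done
    printf '\n'
    for log in "$@"; do
        printf '%-10s' "$(basename "$log")"
        for d in $drivers; do
            # grep -o prints each match on its own line; wc -l counts them.
            printf ' %10d' "$(grep -o -- "$d" "$log" | wc -l)"
        done
        printf '\n'
    done
}
```

For example, `tally_drivers 'r8169 radeon' L1.log L2.log L3.log L4.log L5.log` on the 3.19.4 snapshots would show the radeon column climbing from 4183 to 9360 while r8169 stays between 33 and 62.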
Re: Error: DMA: Out of SW-IOMMU space [was: External USB drives become unresponsive after few hours.]
I think the case is closed.

Now that I know it's not USB but the wireless driver, I looked through the changelog of the new k3.19.5 and saw this:

commit b943e69d33fac1e5f6db57868e061096b0aae67a
Author: Larry Finger
Date:   Sat Mar 21 15:16:05 2015 -0500

    rtlwifi: Fix IOMMU mapping leak in AP mode

    commit be0b5e635883678bfbc695889772fed545f3427d upstream.

    Transmission of an AP beacon does not call the TX interrupt service
    routine, which usually does the cleanup. Instead, cleanup is handled
    in a tasklet completion routine. Unfortunately, this routine has a
    serious bug in that it does not release the DMA mapping before it
    frees the skb, thus one IOMMU mapping is leaked for each beacon. The
    test system failed with no free IOMMU mapping slots approximately one
    hour after hostapd was used to start an AP.

    This issue was reported and tested at
    https://github.com/lwfinger/rtlwifi_new/issues/30.

    Reported-and-tested-by: Kevin Mullican
    Cc: Kevin Mullican
    Signed-off-by: Shao Fu
    Signed-off-by: Larry Finger
    Signed-off-by: Kalle Valo
    Signed-off-by: Greg Kroah-Hartman

It looks very related, especially because my wireless card is also always in AP mode. However, I haven't actually been using it lately, which is probably why I didn't notice anything related to it (and kept focusing on USB) until I used dump_dma.

Well, given my minimal knowledge of the kernel's internals I can't be 100% sure that this was it, but so far 3.19.5 has been running stably (uptime 6 hours and counting).

Thank you, Konrad (and everyone else involved), for helping me pinpoint the actual culprit.

Jake