Reproduceable SATA lockup on 3.7.8 with SSD
Howdy,

I seem to have the same problem (or similar) as Mathieu Desnoyers in
https://lkml.org/lkml/2013/2/22/437

I can reliably get my SSD to drop from the SATA bus given the right workload
on linux.

How can I tell if it's linux's fault or the drive's fault?

Thanks,
Marc

----- Forwarded message from Marc MERLIN -----

From: Marc MERLIN
To: linux-...@vger.kernel.org

Hopefully this is the right list. I know that IDE!=SATA, but I can't find a
SATA list. Please redirect me if needed.

Hardware: Lenovo T530, 64bit kernel and userland.
Hardware details are shown below: there are 2 drives, one SSD (OCZ-VERTEX4)
and one HD (Hitachi HTS54101).

The SSD will lock up reliably if I run a specific mencoder command that reads
MP4 files and rewrites them to another file in the same directory.

The log of what happens is shown below; the drive is eventually taken off the
bus. Once I reboot, it is back, as if nothing happened.

If I run the same command on the HD, it works, but of course timings will be
different since the HD is slower.

How can I tell if it's the SSD's firmware's fault, or the linux SATA/AHCI
code that is buggy?

Thanks,
Marc

Failure log:
ata1.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x6 frozen
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/00:00:00:38:13/04:00:33:00:00/40 tag 0 ncq 524288 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/00:08:00:3c:13/04:00:33:00:00/40 tag 1 ncq 524288 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
(snipped)
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/00:e8:00:30:13/04:00:33:00:00/40 tag 29 ncq 524288 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/00:f0:00:34:13/04:00:33:00:00/40 tag 30 ncq 524288 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: link is slow to respond, please be patient (ready=0)
ata1: COMRESET failed (errno=-16)
ata1: hard resetting link
ata1: link is slow to respond, please be patient (ready=0)
ata1: COMRESET failed (errno=-16)
ata1: hard resetting link
ata1: link is slow to respond, please be patient (ready=0)
ata1: COMRESET failed (errno=-16)
ata1: limiting SATA link speed to 3.0 Gbps
ata1: hard resetting link
ata1: COMRESET failed (errno=-16)
ata1: reset failed, giving up
ata1.00: disabled
ata1.00: device reported invalid CHS sector 0
(...)
ata1.00: device reported invalid CHS sector 0
ata1: EH complete
sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 33 13 34 00 00 04 00 00
end_request: I/O error, dev sda, sector 856896512
sd 0:0:0:0: [sda] Unhandled error code

Boot shows:
ahci :00:1f.2: version 3.0
ahci :00:1f.2: irq 42 for MSI/MSI-X
ahci: SSS flag set, parallel bus scan disabled
ahci :00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x13 impl SATA mode
ahci :00:1f.2: flags: 64bit ncq ilck stag pm led clo pio slum part ems sxs apst
ahci :00:1f.2: setting latency timer to 64
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
scsi4 : ahci
scsi5 : ahci
ata1: SATA max UDMA/133 abar m2048@0xf2538000 port 0xf2538100 irq 42
ata2: SATA max UDMA/133 abar m2048@0xf2538000 port 0xf2538180 irq 42
ata3: DUMMY
ata4: DUMMY
ata5: SATA max UDMA/133 abar m2048@0xf2538000 port 0xf2538300 irq 42
ata6: DUMMY
scsi6 : pata_legacy
ata7: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
ata1.00: ATA-9: OCZ-VERTEX4, 1.5, max UDMA/133
ata1.00: 1000215216 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA OCZ-VERTEX4 1.5 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 1000215216 512-byte logical blocks: (512 GB/476 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: [sda] Attached SCSI disk
ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
ata2.00: ATA-8: Hitachi HTS541010A9E680, JA0OA480, max UDMA/133
ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
ata2.00: ACPI cmd f5
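
The failure log can be cross-checked by hand: the Write(10) CDB and the last NCQ taskfile should describe the same request that end_request reports. The sketch below decodes both. The function names are made up for this note, and the taskfile field order (command/feature:nsect:lbal:lbam:lbah/hob fields/device) is an assumption about how libata prints the "cmd" line; the Write(10) layout follows the SCSI block command set.

```python
def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


def decode_ncq_taskfile(cmd_field):
    """Decode the 'cmd' field of an NCQ command as printed in the log.

    Assumed layout (not taken from the thread):
    command/feature:nsect:lbal:lbam:lbah/
    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    groups = [[int(b, 16) for b in g.split(":")] for g in cmd_field.split("/")]
    command = groups[0][0]
    feature, nsect, lbal, lbam, lbah = groups[1]
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = groups[2]
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    sectors = hob_feature << 8 | feature  # NCQ carries the sector count in FEATURE
    tag = nsect >> 3                      # NCQ carries the tag in bits 7:3 of COUNT
    return command, lba, sectors, tag


if __name__ == "__main__":
    lba, blocks = decode_write10("2a 00 33 13 34 00 00 04 00 00")
    print(lba, blocks, blocks * 512)  # 856896512 1024 524288
    print(decode_ncq_taskfile("61/00:f0:00:34:13/04:00:33:00:00/40"))
    # (97, 856896512, 1024, 30)
```

Both decodes land on sector 856896512 with a 1024-sector (524288-byte) transfer, which matches the "ncq 524288 out" and "end_request: I/O error, dev sda, sector 856896512" lines, and the tag field comes out as 30 as printed.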
Re: Reproduceable SATA lockup on 3.7.8 with SSD
On Tue, Feb 26, 2013 at 10:29:59AM -0500, Jeff Garzik wrote:
> On 02/25/2013 07:27 PM, Marc MERLIN wrote:
> >Howdy,
> >
> >I seem to have the same problem (or similar) as Mathieu Desnoyers in
> >https://lkml.org/lkml/2013/2/22/437
> >
> >I can reliably get my SSD to drop from the SATA bus given the right
> >workload on linux.
> >
> >How can I tell if it's linux's fault or the drive's fault?
>
> Manually force speed to 3.0 Gbps, then 1.5 Gbps, and see what happens.
>
> Try module/kernel parameter libata.force=1.5Gbps or libata.force=3.0Gbps

Ok, so by reading my log at time of failure, you saw that speed was
flipping between the two? (I couldn't see that, but I'm not good at reading
it).

Also, just to make sure, you're not saying that you want me to change the
speed at runtime, but
1) boot once with speed forced at 3Gbps and try to reproduce
2) boot a 2nd time with speed forced at 1.5Gbps and try to reproduce

If libata is not a module in my kernel, I can still put libata.force=1.5Gbps
on the lilo/grub command line, correct?

Thanks,
Marc

On Mon, Feb 25, 2013 at 08:02:32PM -0500, Mathieu Desnoyers wrote:
> - try diagnostic tools from your drive vendor, if it reports your drive
>   as bad, then it might just be your drive failing,

Good point, drive is brand new (just replaced).

> - try to run a SMART test from smartmontools,

Unfortunately, OCZ does not support SMART.

> - try to reproduce your issue with a simple test-case (trying my test
>   program might help) that clearly fails quickly, and all the time, on
>   your problematic hardware,

My test fails 100% on my hardware too. Very easy to reproduce.
I think it's basically a large amount of reads/writes that causes it.

> - find out if there are known firmware upgrades for your drive provided
>   by your vendor, try them out,

Did that, I have the latest.

> - find out if there are known BIOS upgrades for your machine provided by
>   your vendor, try them out,
> - try test-case on various kernel versions,
> - try test-case on various distributions (just in case),
> - try test-case with power management disabled in your machine's BIOS,
> - try test-case with other SSD drives of the exact same model as
>   yours, so you can see if it's just your own drive failing,
> - try moving your drive to a different machine (same model, different
>   model), and see if the test-case still fails,
> - try with other SSD drives (from other vendors) on your machine,
> - check if your partition mount options enable TRIM or not, try to
>   disable TRIM explicitly (see mount(8), discard/nodiscard option),
> - try using a different filesystem (just in case),
> - try using a different block I/O scheduler,
> - try using your drive vendor's SSD eraser, to reinitialize your entire
>   disk (yes, you will lose your entire data). This might be useful if
>   TRIM handling has changed after a firmware upgrade for instance.

Those will take a while :) especially without spare hardware.
I'll try older kernels first when I have a chance though.

Thanks for your reply.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
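
One item on the checklist above, checking whether any mount enables TRIM, is easy to script. This is a minimal sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options). The helper name and the sample text are made up for illustration, and only the plain "discard" option is matched.

```python
def mounts_with_discard(mounts_text):
    """Return (device, mountpoint) pairs whose mount options include 'discard'."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        if "discard" in options.split(","):
            hits.append((device, mountpoint))
    return hits


# Hypothetical sample in /proc/mounts format (not taken from the thread).
SAMPLE = """\
/dev/sda1 / ext4 rw,noatime,discard 0 0
/dev/sdb1 /data ext4 rw,relatime,nodiscard 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
"""

if __name__ == "__main__":
    print(mounts_with_discard(SAMPLE))  # [('/dev/sda1', '/')]
```

On a live system the same function could be fed the contents of /proc/mounts. Splitting the options on commas avoids a false match on "nodiscard".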
Re: iwl3945: order 5 allocation during ifconfig up; vm problem?
On Wed, Sep 12, 2012 at 07:16:28AM +0200, Eric Dumazet wrote:
> On Tue, 2012-09-11 at 16:25 -0700, Andrew Morton wrote:
>
> > Asking for a 256k allocation is pretty crazy - this is an operating
> > system kernel, not a userspace application.
> >
> > I'm wondering if this is due to a recent change, but I'm having trouble
> > working out where the allocation call site is.
> > --
>
> (Adding Marc Merlin to CC, since he reported same problem)
>
> That's the firmware loading in iwlwifi driver. Not sure if it can use SG.
>
> drivers/net/wireless/iwlwifi/iwl-drv.c
>
> iwl_alloc_ucode() -> iwl_alloc_fw_desc() -> dma_alloc_coherent()
>
> It seems some sections of /lib/firmware/iwlwifi*.ucode files are above
> 128 Kbytes, so dma_alloc_coherent() tries order-5 allocations

Thanks for looping me in, yes, this looks very familiar to me :)

In the other thread, Johannes Berg gave me this patch which is supposed to
help: http://p.sipsolutions.net/11ea33b376a5bac5.txt

Unfortunately due to very long work days, I haven't had the time to try it
out yet, but I will soon.

Would that help in this case too?

And to answer David Rientjes, I also have compaction on:
gandalfthegreat:~# zgrep CONFIG_COMPACTION /proc/config.gz
CONFIG_COMPACTION=y

Full config:
http://marc.merlins.org/tmp/config-3.5.2-amd64-preempt-noide-20120731

If that helps for comparison, my thread is here:
http://www.spinics.net/lists/linux-wireless/msg96438.html

Thanks,
Marc
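
For readers unfamiliar with allocation orders: an order-n request asks the page allocator for 2^n physically contiguous pages. The sketch below computes the order needed for a buffer of a given size. It assumes 4 KiB pages (PAGE_SIZE here is an assumption, not something stated in the thread), and allocation_order is a made-up helper for this note, not a kernel function.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def allocation_order(size_bytes, page_size=PAGE_SIZE):
    """Smallest order such that page_size * 2**order >= size_bytes."""
    pages = -(-size_bytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


if __name__ == "__main__":
    for size in (4096, 128 * 1024, 256 * 1024):
        print(size, allocation_order(size))
```

Under that assumption a 128 KiB buffer needs order 5 (32 contiguous pages) and a 256 KiB buffer needs order 6, which is why large firmware sections are hard to satisfy once memory is fragmented.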
Fairly reproduceable crash in 3.7.1 Null pointer rb_erase+0xc4/0x292
Google only shows hits pointing to an ext4 patch that didn't go in 3.7
proper.

http://marc.merlins.org/tmp/crash.jpg

My call trace doesn't look complete, but shows "fatal exception in interrupt"
and:
timerqueue_del
__remove_hrtimer
__run_hrtimer
hrtimer_interrupt
smp_apic_timer_interrupt
paravirt_read_tpe
intel_idle
intel_idle
cpuidle_enter

I had repeated crashes when plugging power back into my running laptop, but
the display just freezes and I can't get a dump.

For the crash here, I did: suspend to RAM, plug power back in, wake up.
Laptop crashed about 3 seconds after wakeup.

I'm on vacation with no hardware to get a proper crash dump or even a serial
console, but I have a bad screenshot. Boy do I wish this could be saved in
some kind of NVRAM instead, like on android.

Marc
Re: Fairly reproduceable crash in 3.7.1 Null pointer rb_erase+0xc4/0x292
On Wed, Jan 02, 2013 at 02:27:57PM -0800, Marc MERLIN wrote:
> Google only shows hits pointing to an ext4 patch that didn't go in 3.7
> proper.
>
> http://marc.merlins.org/tmp/crash.jpg

Grumble, I kind of forgot to add the link to my .config, sorry about that:
http://marc.merlins.org/tmp/.config-3.7.1-amd64-preempt-20121226

While my crash picture is lame (sorry), I'm happy to provide what else I can
before I go home and can provide a proper serial console crash dump.

Marc
Re: Fairly reproduceable crash in 3.7.1 Null pointer rb_erase+0xc4/0x292
On Thu, Jan 03, 2013 at 08:12:18AM +0100, Romain Francoise wrote:
> Marc MERLIN writes:
>
> > I had pretty repeated crashes when plugging power back into my running
> > laptop, but the display just freezes and I can't get a dump.
>
> > For the crash here, I did: suspend to RAM, plug power back in, wake up.
> > Laptop crashed about 3 seconds after wakeup.
>
> Sounds like https://bugzilla.kernel.org/show_bug.cgi?id=51661 which is
> fixed by 3935e89505a1c3ab3f3b0c7ef0eae54124f48905 ("watchdog: Fix
> disable/enable regression"), expect that in 3.7.2...

That looks like a good match, thank you.
Hopefully this quick thread will help google steer folks who do get a crash
to the right page and fix.

Cheers,
Marc
Re: Supporting SYSRQ on broken laptops like the thinkpad T530
On Wed, Jan 09, 2013 at 03:36:44AM +0100, Roland Eggner wrote:
> On 2013-01-08 Tuesday at 15:09 -0800 Marc MERLIN wrote:
> > In its infinite wisdom, lenovo has removed the sysrq key on the latest
> > thinkpads, and replaced it with a stupid ALT+FN+S key combination, which
> > doesn't really work for doing sysrq from the console (nor do I know how the
> > genius who did that intended for SYSRQ-S to work).
> > http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430-s-T530-Where-are-the-shortcut-function-keys-break-Pause-etc/ta-p/781749
> >
> > I realize that one solution is to throw my laptop window at a suitable high
> > floor and replace it with one from a vendor that doesn't randomly remove keys
> > from the keyboard.
> > That said, I was wondering if there were other solutions, especially
> > considering that thinkpads used to be the better linux laptops.
>
> My Dell “Precision M4500” notebook suffers similar (same?) problem. So far
> I could not find a solution better than this: e.g. Alt-Fn-SysRq-s
>
> press and hold Alt
> press and hold Fn
> press and leave F10|SysRq
> leave Fn
> press and leave s
> leave Alt

Just for the sake of the archives: it turns out that on the lenovo T430 and
T530 you should ignore the Lenovo documentation I quoted above. You can
indeed use the PrtSc key between Right Alt and Right Ctrl; that key works
just fine for Sysrq.

I have no idea why Lenovo felt they had to document some complicated
alternate software sysrq with Fn+S.

Anyway, hope this helps someone.
Marc
Re: Reproduceable SATA lockup on 3.7.8 with SSD
On Tue, Feb 26, 2013 at 08:50:04AM -0800, Marc MERLIN wrote:
> On Tue, Feb 26, 2013 at 10:29:59AM -0500, Jeff Garzik wrote:
> > On 02/25/2013 07:27 PM, Marc MERLIN wrote:
> > >Howdy,
> > >
> > >I seem to have the same problem (or similar) as Mathieu Desnoyers in
> > >https://lkml.org/lkml/2013/2/22/437
> > >
> > >I can reliably get my SSD to drop from the SATA bus given the right
> > >workload on linux.
> > >
> > >How can I tell if it's linux's fault of the drive's fault?
> >
> > Manually force speed to 3.0 Gbps, then 1.5 Gbps, and see what happens.
> >
> > Try module/kernel parameter libata.force=1.5Gbps or libata.force=3.0Gbps
>
> Ok, so by reading my log at time of failure, you saw that speed was
> flipping between the two? (I couldn't see that, but I'm not good at reading
> it).
>
> Also, just to make sure, you're not saying that you want me to change the
> speed at runtime, but
> 1) boot once with speed forced at 3Gbps and try and reproduce
> 2) boot a 2nd time with speed forced at 1.5Gbps and try and reproduce
>
> If libata is not a module in my kernel, I can still put
> libata.force=1.5Gbps on the lilo/grub command line, correct?

Jeff, could you clear up what you'd like me to try out?

Thanks,
Marc
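
When running the two forced-speed boots discussed above, it helps to confirm after each boot that the setting actually took effect before trying to reproduce. A minimal sketch, assuming the parameter was passed on the kernel command line and that the kernel log contains "SATA link up" lines like the ones in the boot log earlier in this thread. Both helper names and the sample command line are hypothetical.

```python
import re


def forced_libata_settings(cmdline):
    """Return the comma-separated values of libata.force= on a kernel command line."""
    for token in cmdline.split():
        if token.startswith("libata.force="):
            return token.split("=", 1)[1].split(",")
    return []


LINK_UP = re.compile(r"(ata\d+): SATA link up ([\d.]+) Gbps")


def negotiated_speeds(dmesg_text):
    """Map each port (e.g. 'ata1') to the last link speed it reported, in Gbps."""
    return {port: float(speed) for port, speed in LINK_UP.findall(dmesg_text)}


if __name__ == "__main__":
    # Hypothetical command line; on a live system read /proc/cmdline instead.
    print(forced_libata_settings("root=/dev/sda1 ro libata.force=1.5Gbps"))
    # Line copied from the boot log in the original report.
    print(negotiated_speeds("ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"))
```

If the forced boot still reports 6.0 Gbps for ata1, the parameter was not picked up and the test run says nothing about link speed.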
Re: Supporting SYSRQ on broken laptops like the thinkpad T530
On Sat, Mar 30, 2013 at 06:56:28PM +0100, Pavel Machek wrote:
> Sometimes it works, sometimes it does not. Don't blame lenovo for
> that.
>
> Maybe it should be modified to take sysrq and _then_ key?
>
> Or maybe we should use something like lshift+rshift+lalt+ralt+key?

It can't hurt to add alternatives like the one you suggested. They don't
have to be convenient, although the one you suggest takes 5 fingers at the
same time :)

Is there anything that uses shift+ctrl+alt + key in userspace?
I checked enlightenment 17, they have crazy key bindings, but nothing that
uses all 3 modifier keys at the same time.

If that's not safe, feel free to add one or 2 more just to be safe.

Thanks for suggesting this.
Marc
Supporting SYSRQ on broken laptops like the thinkpad T530
In its infinite wisdom, lenovo has removed the sysrq key on the latest
thinkpads, and replaced it with a stupid ALT+FN+S key combination, which
doesn't really work for doing sysrq from the console (nor do I know how the
genius who did that intended for SYSRQ-S to work).
http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430-s-T530-Where-are-the-shortcut-function-keys-break-Pause-etc/ta-p/781749

I realize that one solution is to throw my laptop out a window at a suitably
high floor and replace it with one from a vendor that doesn't randomly remove
keys from the keyboard.
That said, I was wondering if there were other solutions, especially
considering that thinkpads used to be the better linux laptops.

Thanks,
Marc
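
One keyboard-independent alternative, for as long as userspace is still responsive, is the kernel's /proc/sysrq-trigger file: writing a single command character to it as root invokes the same handler as the key combination. This is a minimal sketch, not something proposed in the thread. trigger_sysrq is a made-up helper, and the demo deliberately writes to a temporary file rather than the real trigger.

```python
import os
import tempfile


def trigger_sysrq(key, path="/proc/sysrq-trigger"):
    """Write one SysRq command character (e.g. 's' for sync) to the trigger file."""
    if len(key) != 1 or not key.isalnum():
        raise ValueError("SysRq command must be a single letter or digit")
    with open(path, "w") as trigger:
        trigger.write(key)


if __name__ == "__main__":
    # Demo against a scratch file so that nothing is actually triggered.
    with tempfile.TemporaryDirectory() as scratch:
        fake = os.path.join(scratch, "sysrq-trigger")
        trigger_sysrq("s", path=fake)
        with open(fake) as check:
            print(check.read())  # s
```

This does not help once the machine is hard-hung, which is the case where the keyboard combination matters most.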
Re: Supporting SYSRQ on broken laptops like the thinkpad T530
On Wed, Jan 09, 2013 at 03:36:44AM +0100, Roland Eggner wrote:
> On 2013-01-08 Tuesday at 15:09 -0800 Marc MERLIN wrote:
> > In its infinite wisdom, lenovo has removed the sysrq key on the latest
> > thinkpads, and replaced it with a stupid ALT+FN+S key combination, which
> > doesn't really work for doing sysrq from the console (nor do I know how the
> > genius who did that intended for SYSRQ-S to work).
> > http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430-s-T530-Where-are-the-shortcut-function-keys-break-Pause-etc/ta-p/781749
> >
> > I realize that one solution is to throw my laptop window at a suitable high
> > floor and replace it with one from a vendor that doesn't randomly remove keys
> > from the keyboard.
> > That said, I was wondering if there were other solutions, especially
> > considering that thinkpads used to be the better linux laptops.
>
> My Dell “Precision M4500” notebook suffers similar (same?) problem. So far
> I could not find a solution better than this: e.g. Alt-Fn-SysRq-s
>
> press and hold Alt
> press and hold Fn
> press and leave F10|SysRq
> leave Fn
> press and leave s
> leave Alt

Holy crap. That works for me too.
If only lenovo could have been bothered to document it properly.

It's still a pity to have to type and remember the exact hold and release
key sequences, but it's better than nothing. Thanks much.

> Several months ago a LKML user claimed, his cat had managed to press
> Alt-Fn-SysRq-c on his Dell Latitude notebook with similar keyboard, and
> provided photos showing the kernel crash message ;)

Yeah, but my cat is not nearly smart enough for that :)

Thanks for your help again,
Marc
Re: [patch] 2.4 version of my duplicate IP and MAC detection patch
On Sat, Sep 23, 2000 at 02:02:24PM +, Julian Anastasov wrote:
> > I didn't receive any negative comments, except for Alexey who believed the
> > check should be done in user space.
>
> Now you receive another negative comment, for the 2.2 version :)

Thanks for the feedback, it is appreciated.

> Currently, in Linux 2.2 there is a device flag "hidden" which
> is based on this statement: many host can configure same IP address
> but it is assumed that only one is advertised. Your patch now will

Yes, I know LVS and arp_invisible, later renamed arp_hidden

> print messages for all these hidden addresses. They are not advertised
> and there is no problem caused from duplication.

I thought about that, but isn't the shared IP just an IP alias and not the
primary IP?
As far as I know, the machines which share the IP have a primary IP and put
that one in their ARP packets, so my patch should not complain.
That said, adding a flag that lets you disable the duplicate IP detection on
a per-interface basis wouldn't be a bad idea, I'll look into this.

> - sip=127.0.0.0/8, this address is shared but we "assume" it is not
> advertised from the neighbours

Are you saying that some machines would ARP with a source IP of localhost?
That'd be pretty broken, wouldn't it?
Or are you talking about a kind of DOS that would trigger warnings on all
the machines? (the dupe check could ignore that)

> - you work with ifa_address and not with ifa_local and ifa_mask.

I'll look into this too.

Thanks for your feedback.
Marc
-- 
Microsoft is to software what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ (friendly to non IE browsers)
Finger [EMAIL PROTECTED] for PGP key and other contact information
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
[patch] 2.4 version of my duplicate IP and MAC detection patch
I updated my duplicate IP detection patch to work with 2.4.

I announced the 2.2 version here last year, and several people expressed
interest in it, but it never made it into the kernel unfortunately. I asked
a few times and eventually gave up as I didn't want to appear overly pushy
and it got included in the kernels that I use (kernels from VA).

I didn't receive any negative comments, except for Alexey who believed the
check should be done in user space.

The patches (2.2/2.4) and discussion can be found here:
http://marc.merlins.org/linux/arppatch/

Here are two excerpts:
----
What does the patch do?

It looks at all the broadcast ARP requests and checks that the source IP of
the request is different from the interface's IP. This will catch a machine
that is using your IP and is trying to talk to a machine on your net for the
first time or the first time in a while.
The big plus of this approach is that it's passive

It will get your system to output this:
Uh Oh, MAC address 00:A0:C9:EE:9C:8A claims to have our IP addresses
(192.168.205.9) (duplicate IP conflict likely)

or this:
Uh Oh, I received an ARP packet claiming to be from our MAC address
00:80:C8:47:37:72, but with an IP I don't own (192.168.205.1). Someone has
apparently stolen our MAC address
----

----
But then why not write the whole thing in user space?

Well, the line has to be drawn somewhere... The whole IP stack could be in
user space if we wanted... In this case, the actual added code (I'm not
talking about the existing code which I turned into a function) is about 20
lines, it's trivial and it uses much less resources on a slow machine (386)
than a user space solution which forces context switches, system calls, and
memory for that user process.
Also, not that others are always right, but do you know any OS that does
duplicate IP checking by inspecting ARP requests in user space?
----

I'm attaching the 2.4 version which I'd really like to see included in the
main tree. While I don't see a good reason to disable this, if what it takes
is a config option, experimental or not, disabled by default or not (I'd
rather have it non experimental since it's a year old, and enabled by
default, but I'll settle), I'll do what it takes.

I'm attaching the 2.4 version and you can find the 2.2 version, as well as
more info on my page:
http://marc.merlins.org/linux/arppatch/

Thanks,
Marc

diff -urN linux-2.2.4-test5/net/ipv4/arp.c linux-2.2.4-test5-detectarpdupe/net/ipv4/arp.c
--- linux-2.2.4-test5/net/ipv4/arp.c	Fri Jul 21 21:54:29 2000
+++ linux-2.2.4-test5-detectarpdupe/net/ipv4/arp.c	Sun Sep 17 19:19:49 2000
@@ -65,6 +65,8 @@
  *				clean up the APFDDI & gen. FDDI bits.
  *		Alexey Kuznetsov:	new arp state machine;
  *				now it is in net/core/neighbour.c.
+ *		Marc Merlin	:	Added duplicate IP and MAC address
+ *				detection (2000/09/17)
  */
 
 /* RFC1122 Status:
@@ -121,6 +123,8 @@
 
 #include
 #include
+
+#undef IDONTRECEIVEMYOWNPACKETSBACK
 
 #if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
 static char *ax2asc2(ax25_address *a, char *buf);
@@ -135,6 +139,7 @@
 static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb);
 static void arp_error_report(struct neighbour *neigh, struct sk_buff *skb);
 static void parp_redo(struct sk_buff *skb);
+static char *mac2asc(unsigned char *sha, unsigned char addr_len);
 
 static struct neigh_ops arp_generic_ops =
 {
@@ -716,6 +721,55 @@
 		goto out;
 	}
 
+	if (!memcmp(sha,dev->dev_addr,dev->addr_len))
+	{
+		char ourip=0;
+		struct in_device *idev=dev->ip_ptr;
+		struct in_ifaddr *adlist=idev->ifa_list;
+
+		while (adlist != NULL)
+		{
+			if (adlist->ifa_address == sip) {
+
+				ourip=1;
+				break;
+			}
+			adlist=adlist->ifa_next;
+		}
+
+		if (net_ratelimit()) {
+			if (ourip) {
+#ifdef IDONTRECEIVEMYOWNPACKETSBACK
+/* This is an attempt at detecting that someone stole your MAC and your IP, but
+ * in some network configurations and with some switches, you will get your
+ * own packets back, so this warning would be triggered by error for too m
Re: [patch] 2.4 version of my duplicate IP and MAC detection patch
On Fri, Sep 22, 2000 at 01:31:06AM +0200, Andi Kleen wrote:
> You added a linear IP search to fast path ARP processing. The people running
> thousands of IP aliases will surely love you. You could at least use the
> ip_route_input output instead that arp_rcv computes anyways and check
> for RTN_LOCAL.

While you actually don't get broadcast ARP requests very often (more than a
few per minute is rare), even on a busy net, making it faster doesn't hurt.
I'll write a new patch, thanks.

> BTW, the idea of doing it in user space is not to have a daemon running
> but just to do DAD once when you configure the ip address, like most other
> OSes do [as easily done with arping and a small script, see ipcfg from
> iproute2].

I know about this. It only helps you not to steal someone else's IP, it
doesn't help when someone else just stole your IP.
Take a Solaris box, an IRIX one, or windows (these are the only ones I have
access to for testing) and they'll all complain and notice if I steal their
IP.

I find it useful that a server syslogs the fact that its IP was stolen. I
can use that info to bring up a temporary DHCP IP, and send a message to a
central network monitor which will trace the culprit MAC address and
optionally turn off the switched port it came from.

Marc
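
The decision logic under discussion can be sketched outside the kernel. The snippet below is an illustration only: it is not the code from the patch, classify_arp_sender and its return labels are made up, and the second address in the example set is hypothetical. It keeps the local addresses in a set so the lookup does not walk a list, which is the cost Andi Kleen raises for hosts with thousands of aliases.

```python
def classify_arp_sender(sender_mac, sender_ip, our_mac, our_ips):
    """Classify the sender fields of a received ARP request.

    our_ips should be a set so membership tests stay cheap even with many aliases.
    """
    mac_is_ours = sender_mac.lower() == our_mac.lower()
    ip_is_ours = sender_ip in our_ips
    if ip_is_ours and not mac_is_ours:
        return "duplicate-ip"    # another MAC claims one of our addresses
    if mac_is_ours and not ip_is_ours:
        return "duplicate-mac"   # our MAC shows up with an IP we do not own
    if mac_is_ours and ip_is_ours:
        return "echo-or-clone"   # our own packet echoed back, or both were copied
    return "ok"


if __name__ == "__main__":
    ours = {"192.168.205.9", "192.168.205.10"}  # second address is hypothetical
    print(classify_arp_sender("00:A0:C9:EE:9C:8A", "192.168.205.9",
                              "00:80:C8:47:37:72", ours))  # duplicate-ip
    print(classify_arp_sender("00:80:C8:47:37:72", "192.168.205.1",
                              "00:80:C8:47:37:72", ours))  # duplicate-mac
```

The two calls correspond to the two "Uh Oh" messages quoted in the patch announcement.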
Re: [patch] 2.4 version of my duplicate IP and MAC detection patch
On Fri, Sep 22, 2000 at 01:25:54AM -0700, David S. Miller wrote: > You've made the foo-address to ascii string routines non-reentrant. > The hbuffer[] was on the local stack for a very good reason. You are right, fixed. http://marc.merlins.org/linux/arppatch/arp-patch-2.4_v1.3 (that part of the patch is a year old, and I honestly don't remember why hbuffer became a static, as it is obviously wrong) > Why can't you write a userspace daemon that listens on one of the > lower level raw'ish sockets for arp packets and do the same checks > there. You can. > I don't like this change at all, I think it can be done completely > in user space. The existence of a working tcpdump is proof of this > fact. :-) Whether it can be done efficiently is another issue. That was my original point. http://marc.merlins.org/linux/arppatch/ But then why not write the whole thing in user space? Well, the line has to be drawn somewhere... The whole IP stack could be in user space if we wanted... In this case, the actual added code (I'm not talking about the existing code which I turned into a function) is about 20 lines, it's trivial and it uses much less resources on a slow machine (386) than a user space solution which forces a context switches, system calls, and memory for that user process. Also, not that others are always right, but do you know any OS that does duplicate IP checking by inspecting ARP requests in user space? > Making it possible to do this efficiently would be the kernel change > which might result from your work on a userspace variant, so have at > it. You're saying that you'd rather have a hook to do this from user space? I guess I didn't see the point since the kernel change is so small. > Even failing that, I would prefer something like a special "arp > netlink socket" which would allow a privileged userspace program > to hear all arp traffic the machine can hear. 
I guess I can see why you'd want that, but it will be more code and overhead than the current solution (by quite a bit actually, and Andi seemed concerned about not impacting the fast path, which this will, and in an significant way). Again, everyone else isn't always right, but all the other systems I know check for dupe IP by looking at ARP packets, and do it in the kernel, since it's a simple check. On Fri, Sep 22, 2000 at 01:19:30PM +0200, Andi Kleen wrote: > On Fri, Sep 22, 2000 at 01:25:54AM -0700, David S. Miller wrote: > > I don't like this change at all, I think it can be done completely > > in user space. The existence of a working tcpdump is proof of this > > fact. :-) Whether it can be done efficiently is another issue. > > I agree. I think DAD once during IP configuration should be enough. Come on, Andi, it's not. You do DAD, you get your IP, I plug my laptop, use your IP, you don't even know it. My patch lets you know. The reason I wrote it is that I've seen this happen too many times already. On Fri, Sep 22, 2000 at 04:10:53AM -0700, David S. Miller wrote: >That already exists in form of a packet socket bound to the ARP >IEEE protocol. Marc is probably right though that running an arp >daemon all the time just for that would be a bit of overkill >though. > > Then it stands to reason that it's _really_ overkill to have this kind > of stuff in the kernel too :-) It's not the same. It's overkill do to this in userspace because you need to be looking at the packets a second time, with context switches and all, while in the kernel, you already have the ARP packet in hand, you just take a quick extra peek at it. But going back to the original point, passively checking the from addresses of ARP packets you are already receiving is useful and induces just about no extra load. 
I can fix the patch, but if you're really against the concept, you can let me know and I'll leave you alone :-)
Regardless though, linux is one of the few well known TCP/IP capable OSes that doesn't say a word when its IP is being used by someone else, and this has to be fixed some way or another. I simply believe my way is the simplest and the lightest, but you're more than welcome to write your own and prove me wrong :-)

Marc
--
Microsoft is to software what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ (friendly to non IE browsers)
Finger [EMAIL PROTECTED] for PGP key and other contact information
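The passive check argued for in this message (look at the sender address of ARP packets you already receive, and complain if someone else is using your IP) can be sketched as follows. This is only an illustration in Python of the comparison itself, not the code from the arp-patch: the names `is_duplicate_ip` and `build_arp` and the sample addresses are made up for the example, and it assumes the standard 28-byte Ethernet/IPv4 ARP payload.

```python
import socket
import struct

# Ethernet/IPv4 ARP payload: htype, ptype, hlen, plen, op,
# sender MAC, sender IP, target MAC, target IP (28 bytes, network order).
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_LEN = struct.calcsize(ARP_FORMAT)


def is_duplicate_ip(arp_payload, my_ip, my_mac):
    """Return True if another MAC address claims my_ip as its sender IP."""
    if len(arp_payload) < ARP_LEN:
        return False
    (htype, ptype, hlen, plen, _op,
     sender_mac, sender_ip, _target_mac, _target_ip) = struct.unpack(
        ARP_FORMAT, arp_payload[:ARP_LEN])
    # Only handle Ethernet (1) carrying IPv4 (0x0800) addresses.
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return False
    return sender_ip == socket.inet_aton(my_ip) and sender_mac != my_mac


def build_arp(op, sender_mac, sender_ip, target_mac, target_ip):
    """Hypothetical helper: build an ARP payload to exercise the check."""
    return struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, op,
                       sender_mac, socket.inet_aton(sender_ip),
                       target_mac, socket.inet_aton(target_ip))
```

The check is one comparison per received ARP packet, which is the "quick extra peek" described above; a userspace version would additionally need a raw or packet socket to obtain the payload in the first place.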
NVME regression in all kernels after 4.4.x for NVME in M2 slot for laptop?
I've been stuck on 4.4.x for a while (currently 4.4.5) because any subsequent kernel would fail to suspend or resume (S3 sleep) on my Thinkpad P70.

Due to lack of time, I only got around to doing a git bisect now (sorry), and did it between 4.4.0 and 4.5.0.
It's my first bisect, but I hope I did it right, apart from the fact that my kernel wasn't exactly the same each time because my .config file changed depending on which kernel I ended up on.
However, the patch found by bisect looks like a plausible culprit.

I use an NVME 512GB SSD in my laptop, and I guess very few people use those, which could be why I'm the first/only person to report this.

Sadly, because NVME changed a lot between 4.4 and 4.5 and I'm not a kernel hacker, I can't just reverse apply the patch to 4.5 and see if it works, because I'd have to unroll a bunch of other changes too, and that's a bit beyond my expertise and time at hand right now.

Would this patch make sense as being the reason why I can't S3 sleep anymore, and would you have a test patch against 4.5, 4.6, or 4.7 I can try to see if it fixes the problem?

Symptom is that my red LED (the dot of the i in thinkpad on the back cover) goes flashing in weird ways when I shut the lid, but not always the same pattern; however, none are the normal on/off gentle pulsing that indicates proper S3 sleep.
The caps lock key LED also flashes rapidly when I open the lid, and the laptop is stone dead at this point.
Boot logs on 4.4.5 kernel where sleep works fine: [1.245549] ahci :00:17.0: version 3.0 [1.245733] ahci :00:17.0: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0xc impl SATA mode [1.245771] ahci :00:17.0: flags: 64bit ncq sntf pm led clo only pio slum part ems deso sadm sds apst [1.251140] scsi host0: ahci [1.251587] scsi host1: ahci [1.251972] scsi host2: ahci [1.252360] scsi host3: ahci [1.252437] ata1: DUMMY [1.252449] ata2: DUMMY [1.252462] ata3: SATA max UDMA/133 abar m2048@0xd584c000 port 0xd584c200 irq 122 [1.252499] ata4: SATA max UDMA/133 abar m2048@0xd584c000 port 0xd584c280 irq 122 [1.253374] scsi host4: pata_legacy [1.253439] ata5: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14 [1.355385] nvme0n1: p1 p2 p3 p4 p5 p6 p7 p8 [1.570804] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [1.570877] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [1.573097] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded [1.573101] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out [1.573690] ata3.00: supports DRM functions and may not be fully accessible [1.574399] ata3.00: disabling queued TRIM support [1.574402] ata3.00: ATA-9: Samsung SSD 850 EVO 2TB, EMT01B6Q, max UDMA/133 [1.574435] ata3.00: 3907029168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA [1.575954] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded [1.575958] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out [1.576550] ata3.00: supports DRM functions and may not be fully accessible [1.577209] ata3.00: disabling queued TRIM support [1.578007] ata3.00: configured for UDMA/133 [1.578037] ata4.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded [1.578040] ata4.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out Patch found by bisect, attached Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. 
Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 saruman:/usr/src/linux# git bisect good 25646264e15af96c5c630fc742708b1eb3339222 is the first bad commit commit 25646264e15af96c5c630fc742708b1eb3339222 Author: Keith Busch Date: Mon Jan 4 09:10:57 2016 -0700 NVMe: Remove queue freezing on resets NVMe submits all commands through the block layer now. This means we can let requests queue at the blk-mq hardware context since there is no path that bypasses this anymore so we don't need to freeze the queues anymore. The driver can simply stop the h/w queues from running during a reset instead. This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen with requeued requests. Signed-off-by: Keith Busch Signed-off-by: Jens Axboe diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index e31a256..8da4a8a 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1372,12 +1372,14 @@ out: return ret; } -void nvme_stop_queues(struct nvme_ctrl *ctrl) +void nvme_freeze_queues(struct nvme_ctrl *ctrl) { struct nvme_ns *ns; mutex_lock(&ctrl->namespaces_mutex); list_for_each_entry(ns, &ctrl->namespaces, list) { + blk_mq_freeze_queue_start(ns->queue); + spin_lock_irq(ns->queue->queue_lock); queue_flag_set
Re: [PATCH 4.14 095/140] bcache: fix crashes in duplicate cache device register
On Tue, Mar 13, 2018 at 04:24:58PM +0100, Greg Kroah-Hartman wrote: > 4.14-stable review patch. If anyone has any objections, please let me know. Just in case someone is considering whether it's important to merge, the bug did crash my kernel of course, but I'm virtually certain it was also responsible for corrupting my existing bcache device enough that I had to restore it from backup. Thanks again to Tang for fixing it. > -- > > From: Tang Junhui > > commit cc40daf91bdddbba72a4a8cd0860640e06668309 upstream. > > Kernel crashed when register a duplicate cache device, the call trace is > bellow: > [ 417.643790] CPU: 1 PID: 16886 Comm: bcache-register Tainted: G >W OE4.15.5-amd64-preempt-sysrq-20171018 #2 > [ 417.643861] Hardware name: LENOVO 20ERCTO1WW/20ERCTO1WW, BIOS > N1DET41W (1.15 ) 12/31/2015 > [ 417.643870] RIP: 0010:bdevname+0x13/0x1e > [ 417.643876] RSP: 0018:a3aa9138fd38 EFLAGS: 00010282 > [ 417.643884] RAX: RBX: 8c8f2f2f8000 RCX: d6701f8 > c7edf > [ 417.643890] RDX: a3aa9138fd88 RSI: a3aa9138fd88 RDI: 000 > 0 > [ 417.643895] RBP: a3aa9138fde0 R08: a3aa9138fae8 R09: 000 > 1850e > [ 417.643901] R10: 8c8eed34b271 R11: 8c8eed34b250 R12: 000 > 0 > [ 417.643906] R13: d6701f78f940 R14: 8c8f38f8 R15: 8c8ea7d > 9 > [ 417.643913] FS: 7fde7e66f500() GS:8c8f6144() knlGS: > > [ 417.643919] CS: 0010 DS: ES: CR0: 80050033 > [ 417.643925] CR2: 0314 CR3: 0007e6fa0001 CR4: 003 > 606e0 > [ 417.643931] DR0: DR1: DR2: 000 > 0 > [ 417.643938] DR3: DR6: fffe0ff0 DR7: 000 > 00400 > [ 417.643946] Call Trace: > [ 417.643978] register_bcache+0x1117/0x1270 [bcache] > [ 417.643994] ? slab_pre_alloc_hook+0x15/0x3c > [ 417.644001] ? slab_post_alloc_hook.isra.44+0xa/0x1a > [ 417.644013] ? kernfs_fop_write+0xf6/0x138 > [ 417.644020] kernfs_fop_write+0xf6/0x138 > [ 417.644031] __vfs_write+0x31/0xcc > [ 417.644043] ? current_kernel_time64+0x10/0x36 > [ 417.644115] ? 
__audit_syscall_entry+0xbf/0xe3 > [ 417.644124] vfs_write+0xa5/0xe2 > [ 417.644133] SyS_write+0x5c/0x9f > [ 417.644144] do_syscall_64+0x72/0x81 > [ 417.644161] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > [ 417.644169] RIP: 0033:0x7fde7e1c1974 > [ 417.644175] RSP: 002b:7fff13009a38 EFLAGS: 0246 ORIG_RAX: 000 > 1 > [ 417.644183] RAX: ffda RBX: 01658280 RCX: 7fde7e1c > 1974 > [ 417.644188] RDX: 000a RSI: 01658280 RDI: > 0001 > [ 417.644193] RBP: 000a R08: 0003 R09: > 0077 > [ 417.644198] R10: 089e R11: 0246 R12: > 0001 > [ 417.644203] R13: 000a R14: 7fff R15: > > [ 417.644213] Code: c7 c2 83 6f ee 98 be 20 00 00 00 48 89 df e8 6c 27 3b 0 > 0 48 89 d8 5b c3 0f 1f 44 00 00 48 8b 47 70 48 89 f2 48 8b bf 80 00 00 00 <8 > b> b0 14 03 00 00 e9 73 ff ff ff 0f 1f 44 00 00 48 8b 47 40 39 > [ 417.644302] RIP: bdevname+0x13/0x1e RSP: a3aa9138fd38 > [ 417.644306] CR2: 0314 > > When registering duplicate cache device in register_cache(), after failure > on calling register_cache_set(), bch_cache_release() will be called, then > bdev will be freed, so bdevname(bdev, name) caused kernel crash. > > Since bch_cache_release() will free bdev, so in this patch we make sure > bdev being freed if register_cache() fail, and do not free bdev again in > register_bcache() when register_cache() fail. 
> > Signed-off-by: Tang Junhui > Reported-by: Marc MERLIN > Tested-by: Michael Lyle > Reviewed-by: Michael Lyle > Cc: > Signed-off-by: Jens Axboe > Signed-off-by: Greg Kroah-Hartman > > --- > drivers/md/bcache/super.c | 16 ++-- > 1 file changed, 10 insertions(+), 6 deletions(-) > > --- a/drivers/md/bcache/super.c > +++ b/drivers/md/bcache/super.c > @@ -1181,7 +1181,7 @@ static void register_bdev(struct cache_s > > return; > err: > - pr_notice("error opening %s: %s", bdevname(bdev, name), err); > + pr_notice("error %s: %s", bdevname(bdev, name), err); > bcache_device_stop(&dc->disk); > } > > @@ -1849,6 +1849,8 @@ static int register_cache(struct cache_s > const char *err = NULL; /* must be set for any error case */ > int ret = 0; > > + bdevname(bdev, name); > + > memcpy(&ca->sb, sb, sizeof(struct cache_sb)); >
Re: [PATCH 4.14 095/140] bcache: fix crashes in duplicate cache device register
[linux-kernel to bcc, moving back to bcache list]

On Tue, Mar 13, 2018 at 10:26:33AM -0700, Michael Lyle wrote:
> Though note you're still not safe from -that-. If there's duplicate
> UUIDs around because you've duplicated devices, there's just no sane
> way to tell which is the "right one" to attach to.

Thanks for clearing that up, Mike.
So, what happened to me was
1) I dd'ed drive1 to drive2 (raw device)
2) while that was going on, I ran fdisk on drive2 to fix a partition type
3) saving fdisk caused drive2 to be rescanned by the kernel
4) udev said, oh, a bcache partition, yummy, let me register that
5) instead I got a kernel crash that got fixed by this patch
6) tried to reboot a few times, and each time the kernel would crash early, until I found out it was bcache, removed drive2, system came back up
7) by then, my bcache filesystem was heavily corrupted and unusable

If there is a duplicate cache device UUID, wouldn't bcache just use the first one it sees and ignore the 2nd one?
In my case this would have been the safe thing, and I'm guessing that in most cases, whatever device the UUID got duplicated on will come 2nd in the boot order, and is therefore safer to ignore, even if the duplicate situation isn't safe per se.
What do you think?

Thanks,
Marc

> Mike
>
> On Tue, Mar 13, 2018 at 9:19 AM, Marc MERLIN wrote:
> > On Tue, Mar 13, 2018 at 04:24:58PM +0100, Greg Kroah-Hartman wrote:
> >> 4.14-stable review patch. If anyone has any objections, please let me
> >> know.
> >
> > Just in case someone is considering whether it's important to merge, the
> > bug did crash my kernel of course, but I'm virtually certain it was also
> > responsible for corrupting my existing bcache device enough that I had
> > to restore it from backup.
> >
> > Thanks again to Tang for fixing it.
> >
> >
> >> --
> >>
> >> From: Tang Junhui
> >>
> >> commit cc40daf91bdddbba72a4a8cd0860640e06668309 upstream.
> >> > >> Kernel crashed when register a duplicate cache device, the call trace is > >> bellow: > >> [ 417.643790] CPU: 1 PID: 16886 Comm: bcache-register Tainted: G > >>W OE4.15.5-amd64-preempt-sysrq-20171018 #2 > >> [ 417.643861] Hardware name: LENOVO 20ERCTO1WW/20ERCTO1WW, BIOS > >> N1DET41W (1.15 ) 12/31/2015 > >> [ 417.643870] RIP: 0010:bdevname+0x13/0x1e > >> [ 417.643876] RSP: 0018:a3aa9138fd38 EFLAGS: 00010282 > >> [ 417.643884] RAX: RBX: 8c8f2f2f8000 RCX: d6701f8 > >> c7edf > >> [ 417.643890] RDX: a3aa9138fd88 RSI: a3aa9138fd88 RDI: 000 > >> 0 > >> [ 417.643895] RBP: a3aa9138fde0 R08: a3aa9138fae8 R09: 000 > >> 1850e > >> [ 417.643901] R10: 8c8eed34b271 R11: 8c8eed34b250 R12: 000 > >> 0 > >> [ 417.643906] R13: d6701f78f940 R14: 8c8f38f8 R15: 8c8ea7d > >> 9 > >> [ 417.643913] FS: 7fde7e66f500() GS:8c8f6144() knlGS: > >> > >> [ 417.643919] CS: 0010 DS: ES: CR0: 80050033 > >> [ 417.643925] CR2: 0314 CR3: 0007e6fa0001 CR4: 003 > >> 606e0 > >> [ 417.643931] DR0: DR1: DR2: 000 > >> 0 > >> [ 417.643938] DR3: DR6: fffe0ff0 DR7: 000 > >> 00400 > >> [ 417.643946] Call Trace: > >> [ 417.643978] register_bcache+0x1117/0x1270 [bcache] > >> [ 417.643994] ? slab_pre_alloc_hook+0x15/0x3c > >> [ 417.644001] ? slab_post_alloc_hook.isra.44+0xa/0x1a > >> [ 417.644013] ? kernfs_fop_write+0xf6/0x138 > >> [ 417.644020] kernfs_fop_write+0xf6/0x138 > >> [ 417.644031] __vfs_write+0x31/0xcc > >> [ 417.644043] ? current_kernel_time64+0x10/0x36 > >> [ 417.644115] ? __audit_syscall_entry+0xbf/0xe3 > >> [ 417.644124] vfs_write+0xa5/0xe2 > >> [ 417.644133] SyS_write+0x5c/0x9f > >> [ 417.644144] do_syscall_64+0x72/0x81 > >> [ 417.644161] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > >> [ 417.644169] RIP: 0033:0x7fde7e1c1974 > >> [ 417.644175] RSP: 002b:7fff13009a38 EFLAGS: 0246 ORIG_RAX: > >> 000 > >> 1 > >> [ 417.644183] RAX: ffda RBX: 01658280 RCX: > >> 7fde7e1c > >> 1974 > >> [ 417.644188] RDX: 000a RSI: 01658280 RDI: > >> > >> 0001 > >> [ 417.644193] RBP: 000a R
Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
On Wed, Jan 27, 2021 at 03:33:00PM -0600, Bjorn Helgaas wrote:
> Hi Marc, I appreciate your persistence on this. I am frankly
> surprised that you've put up with this so long.

Well, been using linux for 27 years, but also it's not like I have much of a choice outside of switching to windows, as tempting as it's getting sometimes ;)

> > after boot, when it gets the right trigger (not sure which ones), it
> > loops on this evern 2 seconds, mostly forever.
> >
> > I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or
> > something else.
>
> IIUC there are basically two problems:
>
> 1) A 2 minute delay during boot
> Another random thought: is there any chance the boot delay could be
> related to crypto waiting for entropy?

So, the 2mn hang went away after I added the nouveau firmware in initrd.
The only problem is that the nouveau driver does not give a very good clue as to what's going on and what to do.
For comparison, the intel iwlwifi driver is very clear about the firmware it's trying to load, whether it failed to load it, and what exact firmware (filename) you need to find on the internet.

> 2) Some sort of event every 2 seconds that kills your battery life
> Your machine doesn't sound unusual, and I haven't seen a flood of
> similar reports, so maybe there's something unusual about your config.
> But I really don't have any guesses for either one.

Honestly, there are not too many thinkpad P73 running linux out there. I wouldn't be surprised if it's only a handful or two.

> It sounds like v5.5 worked fine and you first noticed the slow boot
> problem in v5.8. We *could* try to bisect it, but I know that's a lot
> of work on your part.

I've done that in the past. To be honest, now that it works after I added the firmware that nouveau started needing (and didn't need before), the hang at boot is gone for sure.
The PCI PM wakeup issues on battery still happen sometimes, but they are much rarer now.
> Grasping for any ideas for the boot delay; could you boot with
> "initcall_debug" and collect your "lsmod" output? I notice async_tx
> in some of your logs, but I have no idea what it is. It's from
> crypto, so possibly somewhat unusual?

Is this still needed? I think if nouveau did a better job of helping the user correct the issue when firmware is missing (I think intel even gives a URL in printk), that would probably be what's needed for the most part.

[ 12.832547] async_tx: api initialized (async)
comes from ./crypto/async_tx/async_tx.c

Thanks for your answer, let me know if there is anything else useful I can give. I think I'm otherwise mostly ok now.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
On Fri, Jan 29, 2021 at 03:20:32PM -0600, Bjorn Helgaas wrote:
> > For comparison the intel iwlwifi driver is very clear about firmware
> > it's trying to load, if it can't and what exact firmware you need to
> > find on the internet (filename)
>
> I guess you're referring to this in iwl_request_firmware()?
>
> IWL_ERR(drv, "check
> git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git\n");
>

Yes :)

> How can we fix this in nouveau so we don't have the debug this again?
> I don't really know how firmware loading works, but "git grep -A5
> request_firmware drivers/gpu/drm/nouveau/" shows that we generally
> print something when request_firmware() fails.

Well, have a look at https://pastebin.com/dX19aCpj
Do you see any warning whatsoever?

> But I didn't notice those messages in your logs, so I'm probably
> barking up the wrong tree.

You're not.
It seems that newer kernels are a bit better:
[ 189.304662] nouveau :01:00.0: pmu: firmware unavailable
[ 189.312455] nouveau :01:00.0: disp: destroy running...
[ 189.316552] nouveau :01:00.0: disp: destroy completed in 1us
[ 189.320326] nouveau :01:00.0: disp ctor failed, -12
[ 189.324214] nouveau: probe of :01:00.0 failed with error -12

So, it probably got better, but that message only got displayed after the 2mn hang, which having the firmware prevents.
Any developer with the right hardware can probably easily reproduce this by removing the firmware and looking at the boot messages.
At the very least, it should print something clearer, like "driver will not function properly", and a URL to where one can get the driver would be awesome.

> So maybe the wakeups are related to having vs not having the nouveau
> firmware? I'm still curious about that, and it smells like a bug to
> me, but probably something to do with nouveau where I have no hope of
> debugging it.

Right.
Honestly, given the time I've lost with this, and now that it seems gone with the firmware, I'm happy to leave well enough alone :)
I'm not sure how you are involved with the driver, but are you able to help improve the dmesg output?

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
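Until the driver output improves, one way to spot this situation is to scan a saved dmesg for the two messages quoted above. The following is a rough Python sketch under that assumption: the function name `firmware_hints` and the choice of substrings are mine, taken from the log lines in this message rather than from any list of messages the nouveau driver is documented to emit.

```python
import re

# Matches "[ 189.304662] message text" style dmesg lines.
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# Substrings taken from the log excerpt in this message (an assumption,
# not an exhaustive list of what the driver can print).
PATTERNS = ("firmware unavailable", "failed with error")


def firmware_hints(dmesg_lines):
    """Return (timestamp, message) pairs that hint at a firmware problem."""
    hits = []
    for line in dmesg_lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a timestamped kernel log line
        msg = match.group("msg")
        if any(pattern in msg for pattern in PATTERNS):
            hits.append((float(match.group("ts")), msg))
    return hits
```

Feeding it the five lines quoted above would report the "pmu: firmware unavailable" and "probe ... failed with error -12" entries, which are the only places the log mentions the actual cause.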
Re: [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay()
On Fri, Oct 04, 2019 at 03:39:46PM +0300, Mika Westerberg wrote:
> This is otherwise similar to pcie_wait_for_link() but allows passing
> custom activation delay in milliseconds.
>
> Signed-off-by: Mika Westerberg
> ---
> drivers/pci/pci.c | 21 ++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e7982af9a5d8..bfd92e018925 100644

Hi Mika,

So, I have a thinkpad P73 with thunderbolt, and while I don't boot often, my last boots have been unreliable at best (was only able to boot 5.7 once, and 5.8 did not succeed either). 5.6 was working for a while, but couldn't boot it either this morning, so I had to go back to 5.5. This does not mean 5.5 does not have the problem, just that it booted this morning, while 5.6 didn't when I tried.
Once the kernel is booted, the problem does not seem to occur much, or at all.

Basically, I'm getting the same thing as this person with a P53 (which is a mostly identical lenovo thinkpad to mine):
kernel: pcieport :00:01.0: PME: Spurious native interrupt!
kernel: pcieport :00:01.0: PME: Spurious native interrupt!
kernel: pcieport :00:01.0: PME: Spurious native interrupt!
kernel: pcieport :00:01.0: PME: Spurious native interrupt!
kernel: pcieport :00:01.0: PME: Spurious native interrupt!
https://bbs.archlinux.org/viewtopic.php?id=250658

The kernel boots eventually, but it takes minutes, and everything is so slow that I just can't reasonably use the machine.

This shows similar issues with 5.3, 5.4:
https://forum.proxmox.com/threads/pme-spurious-native-interrupt-kernel-meldungen.62850/
Another report here with 5.6:
https://bugzilla.redhat.com/show_bug.cgi?id=1831899

My current kernel is running your patch above, and I haven't done a lot of research yet to confirm whether going back to a kernel before it was merged fixes the problem.
Unfortunately the problem is not consistent, so it makes things harder to test/debug, especially on my main laptop that I do all my work on :) I noticed this older patch of yours: http://patchwork.ozlabs.org/project/linux-pci/patch/0113014581dbe2d1f938813f1783905bd81b79db.1560079442.git.lu...@wunner.de/ This patch is not in my kernel, is it worth adding? Can I get you more info to help debug this? If that helps: sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lspci 00:00.0 Host bridge: Intel Corporation Device 3e20 (rev 0d) 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d) 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02) 00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0d) 00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model 00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10) 00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10) 00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10) 00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10) 00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10) 00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10) 00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10) 00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0) 00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0) 00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0) 00:1c.7 PCI bridge: Intel 
Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0) 00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10) 00:1f.0 ISA bridge: Intel Corporation Cannon Lake LPC Controller (rev 10) 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10) 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10) 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10) 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10) 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1) 01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1) 01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1) 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1) 02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 04:00.0 PCI bridge: Intel Cor
Re: [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay()
I forgot to add that my mostly hanging boots look like this: https://photos.app.goo.gl/HJvTraYYZbiNTNE39 Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Home page: http://marc.merlins.org/
Re: [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay()
On Sat, Aug 08, 2020 at 01:22:02PM -0700, Marc MERLIN wrote:
> Basically, I'm getting the same thing than this person with a P53 (which
> is a mostly identical lenovo thinkpad, to mine)
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> https://bbs.archlinux.org/viewtopic.php?id=250658

I had to reboot today and tried my 5.7.11 kernel 6 times.
It never booted, and each time got stuck on
pcieport :00:01.0: PME: Spurious native interrupt!

This is the nvidia chip, claimed by nouveau (I don't use nvidia graphics, but I'm forced to use nouveau to turn the nvidia chip down so that it doesn't drain my batteries).

I ended up being able to boot the 7th time, after removing the yubikey from my USB-C port, which is also thunderbolt.

PME messages shown below. Let me know if you'd like further data.
Thanks, Marc [4.142484] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability LTR DPC] [4.151715] pci :00:01.0: PME# supported from D0 D3hot D3cold [4.151727] pci :00:01.0: PME# disabled [4.165979] pci :00:14.0: PME# supported from D3hot D3cold [4.166000] pci :00:14.0: PME# disabled [4.177746] pci :00:16.0: PME# supported from D3hot [4.177767] pci :00:16.0: PME# disabled [4.180850] pci :00:17.0: PME# supported from D3hot [4.180862] pci :00:17.0: PME# disabled [4.183830] pci :00:1b.0: PME# supported from D0 D3hot D3cold [4.183847] pci :00:1b.0: PME# disabled [4.189643] pci :00:1c.0: PME# supported from D0 D3hot D3cold [4.189660] pci :00:1c.0: PME# disabled [4.193085] pci :00:1c.5: PME# supported from D0 D3hot D3cold [4.193101] pci :00:1c.5: PME# disabled [4.196462] pci :00:1c.7: PME# supported from D0 D3hot D3cold [4.196478] pci :00:1c.7: PME# disabled [4.206057] pci :00:1f.3: PME# supported from D3hot D3cold [4.206079] pci :00:1f.3: PME# disabled [4.214993] pci :00:1f.6: PME# supported from D0 D3hot D3cold [4.215015] pci :00:1f.6: PME# disabled [4.217978] pci :01:00.0: PME# supported from D0 D3hot [4.217991] pci :01:00.0: PME# disabled [4.219129] pci :01:00.2: PME# supported from D0 D3hot [4.219142] pci :01:00.2: PME# disabled [4.219578] pci :01:00.3: PME# supported from D0 D3hot [4.219591] pci :01:00.3: PME# disabled [4.221398] pci :04:00.0: PME# supported from D0 D1 D2 D3hot D3cold [4.221433] pci :04:00.0: PME# disabled [4.82] pci :05:00.0: PME# supported from D0 D1 D2 D3hot D3cold [4.97] pci :05:00.0: PME# disabled [4.222792] pci :05:01.0: PME# supported from D0 D1 D2 D3hot D3cold [4.222806] pci :05:01.0: PME# disabled [4.223289] pci :05:02.0: PME# supported from D0 D1 D2 D3hot D3cold [4.223304] pci :05:02.0: PME# disabled [4.223839] pci :05:04.0: PME# supported from D0 D1 D2 D3hot D3cold [4.223854] pci :05:04.0: PME# disabled [4.224645] pci :06:00.0: PME# supported from D0 D1 D2 D3hot D3cold [4.224661] pci :06:00.0: PME# disabled [4.225644] pci 
:2c:00.0: PME# supported from D0 D1 D2 D3hot D3cold [4.225661] pci :2c:00.0: PME# disabled [4.227557] pci :52:00.0: PME# supported from D0 D3hot D3cold [4.227708] pci :52:00.0: PME# disabled [4.229139] pci :54:00.0: PME# supported from D1 D2 D3hot D3cold [4.229155] pci :54:00.0: PME# disabled [7.238126] pcieport :00:01.0: PME: Signaling with IRQ 122 [7.239208] pcieport :00:1b.0: PME: Signaling with IRQ 123 [7.239861] pcieport :00:1c.0: PME: Signaling with IRQ 124 [7.241522] pcieport :00:1c.5: PME: Signaling with IRQ 125 [7.242499] pcieport :00:1c.7: PME: Signaling with IRQ 126 [7.401422] pcieport :05:01.0: PME# enabled [7.401868] pcieport :05:04.0: PME# enabled [8.985668] xhci_hcd :01:00.2: PME# enabled [8.988738] xhci_hcd :2c:00.0: PME# enabled [9.008649] pcieport :05:02.0: PME# enabled [ 12.378450] nvidia-gpu :01:00.3: PME# enabled [ 25.610848] thunderbolt :06:00.0: PME# enabled [ 25.628766] pcieport :05:00.0: PME# enabled [ 25.648762] pcieport :04:00.0: PME# enabled [ 25.668889] pcieport :00:1c.0: PME# enabled [ 179.608847] nvidia-gpu :01:00.3: PME# disabled [ 179.608873] pcieport :00:01.0: PME: Spurious native interrupt! [ 183.359454] nvidia-gpu :01:00.3: PME# enabled [ 183.396832] nvidia-gpu :01:00.3: PME# disabled [ 183.396859] pcieport :00:01.0: PME: Spurious native interrupt! [ 187.147398] nvidia-gpu :01:00.3: PME# enabled [ 1
Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
On Sat, Dec 26, 2020 at 03:12:09AM -0800, Ilia Mirkin wrote:
> > after boot, when it gets the right trigger (not sure which ones), it
> > loops on this evern 2 seconds, mostly forever.
>
> The gpu suspends with runtime pm. And then gets woken up for some
> reason (could be something quite silly, like lspci, or could be
> something explicitly checking connectors, etc). Repeat.

Ah, fair point. Could it be powertop even? How would I go about tracing that?
Sounds like this would be a problem with all chips if userspace is able to wake them up every second or two with a probe. Now I wonder what broken userspace I have that could be doing this.

> Display offload usually requires acceleration -- the copies are done
> using the DMA engine. Please make sure that you have firmware
> available (and a new enough mesa). The errors suggest that you don't
> have firmware available at the time that nouveau loads. Depending on
> your setup, that might mean the firmware has to be built into the
> kernel, or available in initramfs. (Or just regular filesystem if you
> don't use a complicated boot sequence. But many people go with distro
> defaults, which do have this complexity.)

Hi Ilia, thanks for your answer.
Do you think that could be a reason why the boot would hang for 2 full minutes at every boot ever since I upgraded to 5.5?

Also, without wanting to sound like a full newbie, where is that firmware you're talking about? In my kernel source?
Here's what I do have: sauron:/usr/local/bin# dpkggrep nouveau libdrm-nouveau2:amd64 install xserver-xorg-video-nouveau install no nouveau-firmware package in debian: sauron:/usr/local/bin# apt-cache search nouveau bumblebee - NVIDIA Optimus support for Linux libdrm-nouveau2 - Userspace interface to nouveau-specific kernel DRM services -- runtime xfonts-jmk - Jim Knoble's character-cell fonts for X xserver-xorg-video-nouveau - X.Org X server -- Nouveau display driver No firmware file on my disk: sauron:/usr/local/bin# find /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/ /lib/firmware/ |grep nouveau /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko sauron:/usr/local/bin# The kernel module is in my initrd: sauron:/usr/local/bin# dd if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528 skip=1 | gunzip | cpio -tdv | grep nouveau drwxr-xr-x 1 root root0 Nov 30 15:40 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau -rw-r--r-- 1 root root 3691385 Nov 30 15:35 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko 17+1 records in 17+1 records out 52566778 bytes (53 MB, 50 MiB) copied, 1.69708 s, 31.0 MB/s What am I supposed to do/check next? Note that ultimately I only need nouveau not to hang my boot 2mn and do PM so that the nvidia chip goes to sleep since I don't use it. Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
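The checks above (is the module in the initrd, is any firmware next to it) can be expressed as a small filter over a file listing such as the `cpio -t` output shown in this message. This is a sketch in Python with made-up names (`split_listing`, `initrd_has_firmware`). It also rests on an assumption worth verifying on your own system: as far as I know, linux-firmware ships the blobs nouveau loads under an `nvidia/` directory rather than a `nouveau/` one, so a grep for "nouveau" under /lib/firmware can come back empty even when they are installed.

```python
def split_listing(paths, firmware_dir="lib/firmware/nvidia/"):
    """Split a file listing into nouveau module entries and firmware entries.

    firmware_dir is an assumed location, not something confirmed in this
    thread; pass a different value if your distribution uses another layout.
    """
    modules = [p for p in paths if "/nouveau.ko" in p]
    firmware = [p for p in paths if firmware_dir in p]
    return modules, firmware


def initrd_has_firmware(paths, firmware_dir="lib/firmware/nvidia/"):
    """True only if the listing holds both the module and some firmware."""
    modules, firmware = split_listing(paths, firmware_dir)
    return bool(modules) and bool(firmware)
```

With the initrd listing quoted above (module present, nothing under a firmware directory) `initrd_has_firmware` returns False, which matches the situation where nouveau loads early but cannot find its firmware.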
5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
This started with 5.5 and hasn't gotten better since then, despite some reports I tried to send.

As per my previous message:
I have a Thinkpad P70 with hybrid graphics.
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] (rev a2)
That one works fine: I can use i915 for the main screen, and nouveau to display on the external ports (external ports are only wired to the nvidia chip, so it's impossible to use them without turning the nvidia chip on).

I now got a newer P73, also with the same hybrid graphics (set up as such in the bios). It runs fine with i915, and I don't need to use an external display with nouveau for now (it almost works, but I only see the mouse cursor on the external screen, no window or anything else can get displayed, very weird).
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)

After boot, when it gets the right trigger (not sure which ones), it loops on this every 2 seconds, mostly forever.

I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or something else.
Boot hangs look like this: [ 10.659209] Console: switching to colour frame buffer device 240x67 [ 10.732353] i915 :00:02.0: [drm] fb0: i915drmfb frame buffer device [ 12.101203] nvidia-gpu :01:00.3: saving config space at offset 0x0 (reading 0x1ad910de) [ 12.101212] nvidia-gpu :01:00.3: saving config space at offset 0x4 (reading 0x100406) [ 12.101217] nvidia-gpu :01:00.3: saving config space at offset 0x8 (reading 0xc8000a1) [ 12.101223] nvidia-gpu :01:00.3: saving config space at offset 0xc (reading 0x80) [ 12.101228] nvidia-gpu :01:00.3: saving config space at offset 0x10 (reading 0xce054000) [ 12.101234] nvidia-gpu :01:00.3: saving config space at offset 0x14 (reading 0x0) [ 12.101239] nvidia-gpu :01:00.3: saving config space at offset 0x18 (reading 0x0) [ 12.101244] nvidia-gpu :01:00.3: saving config space at offset 0x1c (reading 0x0) [ 12.101249] nvidia-gpu :01:00.3: saving config space at offset 0x20 (reading 0x0) [ 12.101254] nvidia-gpu :01:00.3: saving config space at offset 0x24 (reading 0x0) [ 12.101259] nvidia-gpu :01:00.3: saving config space at offset 0x28 (reading 0x0) [ 12.101265] nvidia-gpu :01:00.3: saving config space at offset 0x2c (reading 0x229b17aa) [ 12.101270] nvidia-gpu :01:00.3: saving config space at offset 0x30 (reading 0x0) [ 12.101275] nvidia-gpu :01:00.3: saving config space at offset 0x34 (reading 0x68) [ 12.101280] nvidia-gpu :01:00.3: saving config space at offset 0x38 (reading 0x0) [ 12.101285] nvidia-gpu :01:00.3: saving config space at offset 0x3c (reading 0x4ff) [ 12.101333] nvidia-gpu :01:00.3: PME# enabled [ 25.151246] thunderbolt :06:00.0: saving config space at offset 0x0 (reading 0x15eb8086) [ 25.151260] thunderbolt :06:00.0: saving config space at offset 0x4 (reading 0x100406) [ 25.151265] thunderbolt :06:00.0: saving config space at offset 0x8 (reading 0x886) [ 25.151270] thunderbolt :06:00.0: saving config space at offset 0xc (reading 0x20) [ 25.151276] thunderbolt :06:00.0: saving config space at offset 0x10 (reading 
0xcc10) [ 25.151281] thunderbolt :06:00.0: saving config space at offset 0x14 (reading 0xcc14) [ 25.151286] thunderbolt :06:00.0: saving config space at offset 0x18 (reading 0x0) [ 25.151291] thunderbolt :06:00.0: saving config space at offset 0x1c (reading 0x0) [ 25.151296] thunderbolt :06:00.0: saving config space at offset 0x20 (reading 0x0) [ 25.151301] thunderbolt :06:00.0: saving config space at offset 0x24 (reading 0x0) [ 25.151306] thunderbolt :06:00.0: saving config space at offset 0x28 (reading 0x0) [ 25.151311] thunderbolt :06:00.0: saving config space at offset 0x2c (reading 0x229b17aa) [ 25.151316] thunderbolt :06:00.0: saving config space at offset 0x30 (reading 0x0) [ 25.151322] thunderbolt :06:00.0: saving config space at offset 0x34 (reading 0x80) [ 25.151327] thunderbolt :06:00.0: saving config space at offset 0x38 (reading 0x0) [ 25.151332] thunderbolt :06:00.0: saving config space at offset 0x3c (reading 0x1ff) [ 25.151416] thunderbolt :06:00.0: PME# enabled [ 25.169204] pcieport :05:00.0: saving config space at offset 0x0 (reading 0x15ea8086) [ 25.169214] pcieport :05:00.0: saving config space at offset 0x4 (reading 0x100407) [ 25.169219] pcieport :05:00.0: saving config space at offset 0x8 (reading 0x6040006) [ 25.169224] pcieport :05:00.0: saving config space at offset 0xc (reading 0x10020) [ 25.169229] pcieport :05:00.0: saving config space at offset 0x10 (reading 0x0) [ 25.169233] pcieport :05:00.0: saving config space at offset 0x14 (reading 0x0) [ 25.169238] pcieport :05:00.0: saving config space at offset 0x18 (reading 0x60605) [ 25.1692
Re: btrfs-rmw-2: page allocation failure: order:1, mode:0x8020
+linux-kernel since I got no answer. Hi, I see you are maintainers of/contributors to drivers/scsi/mvsas. The btrfs folks pointed out that the problem below is due to the MVS driver, namely: From: Chris Mason This is an order 1 atomic allocation from the mvs driver, we really should not be depending on that to get IO done. A quick search and it looks like we're allocating MVS_SLOT_BUF_SZ (8192) bytes. You could try bumping the lowmem reserves. -chris Would you be able to modify the driver to avoid these low memory problems? Thanks, Marc - Forwarded message from Marc MERLIN - From: Marc MERLIN To: linux-bt...@vger.kernel.org My server died last night during a btrfs send/receive to a btrfs raid5 array. Here are the logs. Is this a known problem, or is there a possible workaround? Thanks, Marc btrfs-rmw-2: page allocation failure: order:1, mode:0x8020 CPU: 1 PID: 12499 Comm: btrfs-rmw-2 Not tainted 3.14.0-rc5-amd64-i915-preempt-20140216c #1 Hardware name: System manufacturer P5KC/P5KC, BIOS 050205/24/2007 88000549d780 816090b3 88000549d808 811037b0 0001fffe 88007ff7ce00 0002 0030 88007ff7ce00 Call Trace: [] dump_stack+0x4e/0x7a [] warn_alloc_failed+0x111/0x125 [] __alloc_pages_nodemask+0x707/0x854 [] ? dma_generic_alloc_coherent+0xa7/0x11c [] dma_generic_alloc_coherent+0xa7/0x11c [] dma_pool_alloc+0x10a/0x1cb [] mvs_task_prep+0x192/0xa42 [mvsas] [] ? blkg_path.isra.80.constprop.90+0x17/0x38 [] ? cache_alloc+0x1c/0x29b [] mvs_task_exec.isra.9+0x5d/0xc9 [mvsas] [] mvs_queue_command+0x3d/0x29b [mvsas] [] ? kmem_cache_alloc+0xe3/0x161 [] sas_ata_qc_issue+0x1cd/0x235 [libsas] [] ata_qc_issue+0x291/0x2f1 [] ? ata_scsiop_mode_sense+0x29c/0x29c [] __ata_scsi_queuecmd+0x184/0x1e0 [] ata_sas_queuecmd+0x31/0x4d [] sas_queuecommand+0x98/0x1fe [libsas] [] scsi_dispatch_cmd+0x14f/0x22e [] scsi_request_fn+0x4da/0x507 [] ?
blk_recount_segments+0x1e/0x2e [] __blk_run_queue_uncond+0x22/0x2b [] __blk_run_queue+0x19/0x1b [] blk_queue_bio+0x23f/0x256 [] generic_make_request+0x9c/0xdb [] submit_bio+0x112/0x131 [] rmw_work+0x112/0x162 [] worker_loop+0x168/0x4d8 [] ? btrfs_queue_worker+0x283/0x283 [] kthread+0xae/0xb6 [] ? __kthread_parkme+0x61/0x61 [] ret_from_fork+0x7c/0xb0 [] ? __kthread_parkme+0x61/0x61 Mem-Info: Node 0 DMA per-cpu: CPU0: hi:0, btch: 1 usd: 0 CPU1: hi:0, btch: 1 usd: 0 Node 0 DMA32 per-cpu: CPU0: hi: 186, btch: 31 usd: 171 CPU1: hi: 186, btch: 31 usd: 190 active_anon:17298 inactive_anon:21061 isolated_anon:0 active_file:67491 inactive_file:94189 isolated_file:32 unevictable:1260 dirty:38914 writeback:49596 unstable:0 free:15999 slab_reclaimable:8198 slab_unreclaimable:9741 mapped:12981 shmem:1661 pagetables:2711 bounce:0 free_cma:0 Node 0 DMA free:8084kB min:348kB low:432kB high:520kB active_anon:360kB inactive_anon:764kB active_file:288kB inactive_file:2040kB unevictable:100kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:100kB dirty:0kB writeback:1272kB mapped:252kB shmem:8kB slab_reclaimable:168kB slab_unreclaimable:336kB kernel_stack:88kB pagetables:128kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 1987 1987 1987 Node 0 DMA32 free:56080kB min:44704kB low:55880kB high:67056kB active_anon:68832kB inactive_anon:83480kB active_file:269676kB inactive_file:374588kB unevictable:4940kB isolated(anon):0kB isolated(file):128kB present:2080256kB managed:2039064kB mlocked:4940kB dirty:155668kB writeback:197112kB mapped:51672kB shmem:6636kB slab_reclaimable:32624kB slab_unreclaimable:38628kB kernel_stack:2912kB pagetables:10716kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? 
no lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 85*4kB (UEM) 22*8kB (UEM) 62*16kB (UEM) 6*32kB (UM) 2*64kB (UE) 5*128kB (UEM) 6*256kB (UEM) 4*512kB (EM) 0*1024kB 1*2048kB (R) 0*4096kB = 8100kB Node 0 DMA32: 13004*4kB (M) 16*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 56240kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB 164139 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 9255932kB Total swap = 9255932kB 524058 pages RAM 0 pages HighMem/MovableOnly 10298 pages reserved 0 pages hwpoisoned mvsas :01:00.0: mvsas prep failed[0]! btrfs-rmw-2: page allocation failure: order:1, mode:0x8020 CPU: 1 PID: 12499 Comm: btrfs-rmw-2 Not tainted 3.14.0-rc5-amd64-i915-preempt-20140216c #1 Hardware name: System manufacturer P5KC/P5KC, BIOS 050205/24/2007 88000549d690 816
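Chris Mason's remark about an order-1 atomic allocation can be checked against the free lists in the log above. What follows is only a rough Python sketch, not anything taken from the mvsas driver: `parse_free_list` and `order_for_bytes` are hypothetical helper names, and a 4096-byte page size is assumed. It parses the `Node 0 DMA32` free-list line and shows how much of the free memory sits in single 4kB pages.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 kB pages, as on x86-64

# Free-list line copied from the "Node 0 DMA32" entry in the log.
DMA32_LINE = ("13004*4kB (M) 16*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB "
              "0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 56240kB")


def parse_free_list(line):
    """Map block size in kB to the number of free blocks of that size."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def order_for_bytes(nbytes, page_size=PAGE_SIZE):
    """Smallest order whose block (page_size << order) can hold nbytes."""
    order = 0
    while (page_size << order) < nbytes:
        order += 1
    return order


free = parse_free_list(DMA32_LINE)
total_kb = sum(size * count for size, count in free.items())
single_page_kb = free[4] * 4

print("order needed for an 8192-byte buffer:", order_for_bytes(8192))
print("total free kB:", total_kb)
print("kB held in single 4kB pages:", single_page_kb)
print("free 8kB blocks:", free[8])
```

The computed total agrees with the `= 56240kB` the kernel printed, and almost all of it is order-0 pages, which is consistent with a fragmented zone where an atomic request for two contiguous pages has very little to draw on.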
Deleting pstore data causes immediate hang of 4.15.5 on Lenovo P70 with upgraded bios
Howdy, I have a thinkpad P70 which started to fail resuming from S3 sleep after any kernel past 4.12 (sometimes it would work, sometimes the HD led would come on when trying to resume, but nothing else). After much debugging trying to figure out what was causing it and coming up short, I decided to upgrade the very old firmware/bios on that laptop, since it likely had many bugs. The firmware update from a boot CD was weird, long, and worrisome. It looks like after 1h or so (very long procedure), I got the latest firmware now, but it won't boot my NVME M2 drive anymore, it shows in the boot menu, but just hangs if I use it to boot. However, I can get it to boot my M2 SATA drive. The nvme drive shows up fine and works once linux has booted. So, I figured I'd try a new bootmgr entry saruman:~# efibootmgr -v -c -d /dev/nvme0n1 -p 1 -L "GrubNVME" -l '\EFI\debian\grubx64.efi' Could not prepare Boot variable: No space left on device <<< Ok, this brought me to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=845023 and https://mjg59.dreamwidth.org/23554.html Sure enough, saruman:~# df /sys/fs/pstore/ Filesystem 1K-blocks Used Available Use% Mounted on pstore 0 0 0 - /sys/fs/pstore it's full of files, and I'm assuming the variable storage is full of crap (see below). The problem is that trying to delete any file in there causes an immediate hang of the kernel. Any idea how to get around this problem? I realize it may be the bios that's crashing/hanging and not linux. At least filling up the space did not brick my machine, the way Matthew points out some firmware does when it's full ( https://mjg59.dreamwidth.org/23554.html ) Is there any way to clear all this space, maybe from inside the bios by resetting everything to default, or some other way?
saruman:~# l /sys/fs/pstore/ | wc -l 151 saruman:~# l /sys/fs/pstore/ | head total 0 drwxr-x--- 2 root root0 Mar 1 22:00 ./ drwxr-xr-x 10 root root0 Mar 1 22:02 ../ -r--r--r-- 1 root root 983 Feb 16 2016 dmesg-efi-145565830401001 -r--r--r-- 1 root root 1744 Feb 16 2016 dmesg-efi-145565830401002 -r--r--r-- 1 root root 952 Feb 16 2016 dmesg-efi-145565830402001 -r--r--r-- 1 root root 1636 Feb 16 2016 dmesg-efi-145565830402002 -r--r--r-- 1 root root 1014 Feb 16 2016 dmesg-efi-145565830403001 -r--r--r-- 1 root root 1781 Feb 16 2016 dmesg-efi-145565830403002 -r--r--r-- 1 root root 351 Feb 16 2016 dmesg-efi-145565830404001 saruman:~# cat /sys/fs/pstore/dmesg-efi-145565830401001 Oops#1 Part1 <4>[ 4508.389437] [] do_execveat_common.isra.26+0x450/0x5fd <4>[ 4508.389495] [] do_execve+0x23/0x25 <4>[ 4508.389541] [] SyS_execve+0x2a/0x2e <4>[ 4508.389582] [] stub_execve+0x5/0x5 <4>[ 4508.389624] [] ? entry_SYSCALL_64_fastpath+0x16/0x75 <4>[ 4508.389682] Code: 45 31 e4 48 8b 47 78 4c 8b 30 48 8d 58 f0 48 8d 47 78 48 89 45 d0 49 83 ee 10 48 8d 43 10 48 39 45 d0 74 6f 4c 8b 6b 08 4c 89 e7 <49> 8b 75 00 e8 3a f0 ff ff 49 8d 75 40 48 89 df 49 89 c4 e8 6b <1>[ 4508.390025] RIP [] unlink_anon_vmas+0x41/0x13e <4>[ 4508.390086] RSP <4>[ 4508.390119] CR2: 00fb <7>[ 4508.390339] pci_bus :3b: busn_res: [bus 3b] is released <7>[ 4508.390468] pci_bus :3c: busn_res: [bus 3c-6f] is released <7>[ 4508.390605] pci_bus :06: busn_res: [bus 06-6f] is released <4>[ 4508.470221] ---[ end trace e21f39de184e5ef4 ]--- Yeah, there is another issue that I have something that kept writing here until it filled up, and nothing that ever emptied it. I guess my old bios didn't care and the new one is having issues with this. If I'm unlucky, this may even have caused the firmware upgrade to fail partially? 
Handle 0x000E, DMI type 0, 24 bytes BIOS Information Vendor: LENOVO Version: N1DET95W (2.21 ) Release Date: 12/13/2017 Runtime Size: 128 kB ROM Size: 16384 kB BIOS Revision: 2.21 Firmware Revision: 1.17 Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/
Re: Deleting pstore data causes immediate hang of 4.15.5 on Lenovo P70 with upgraded bios
[+linux-efi and fixed Matthew's Email] As an update, I got my NVME drive to boot once at least, it seems that I need to wait about 2 minutes for the bios to do whatever, hang, recover and then finally continue booting. If I take over and force a boot on the M2 SATA drive instead, then it boots near instantly. After 2 hours on the phone with Lenovo and finally getting someone with a clue, apparently removing the CMOS battery may clear that pstore storage and help with my issue. Obviously it will also kill my efibootmgr entries and all my settings, although I could recover from that if needed. Before I go through all that trouble though, it'd be great to figure out why linux is causing hangs when deleting pstore data, and if it's only a bios bug we can do nothing about, or maybe an issue on the linux side. Is there any other way to delete from /sys/fs/pstore/ besides rm, which causes an instant hang? Well, how about that, truncating the files seems to work, and now efibootmgr is able to make a new entry with the space I just freed. pstore is still full of files, but they're now 0 sized, so I'm likely only wasting the space for the filenames now. Now, I probably have to also find what is writing to pstore and kill that job, given that deleting from pstore seems not possible on my machine, and filling it up causes the bios to get upset. Marc
Re: Deleting pstore data causes immediate hang of 4.15.5 on Lenovo P70 with upgraded bios
Sigh, and now I was just able to do this: saruman:/sys/fs/pstore# \rm * saruman:/sys/fs/pstore# l total 0 drwxr-x--- 2 root root 0 Mar 2 11:28 ./ drwxr-xr-x 10 root root 0 Mar 2 10:20 ../ Ok, so forget linux, I think it's just a stupid EFI bios. If I were to venture a guess: 1) I went into setup and reset to defaults, which deleted my efibootmgr entries 2) some EFI space got freed as a result 3) truncating pstore files worked, because of #1 or not 4) now that the storage fronted by pstore wasn't full anymore, deleting files just worked 5) I had to recreate my efibootmgr entries, and now that there is space, that worked fine. I'm going to guess that the EFI bios needs some space to delete files and without any, it just hangs. Oh well, sorry for the noise, and if maybe someone hits this problem in the future, they'll be able to find this post with the solution.
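For anyone landing on this thread with the same symptom, the truncation workaround mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions: `truncate_files` is a hypothetical helper, the directory is passed in explicitly rather than hard-coded to /sys/fs/pstore, and the thread itself leaves open whether truncating or the bios reset is what actually freed the EFI space.

```python
import os


def truncate_files(directory):
    """Truncate every regular file in `directory` to zero bytes.

    Returns the sorted list of file names that were truncated.
    """
    truncated = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Leave subdirectories and symlinks alone; only touch regular files.
        if os.path.isfile(path) and not os.path.islink(path):
            os.truncate(path, 0)
            truncated.append(name)
    return truncated


# Example use (as root, at your own risk):
#     truncate_files("/sys/fs/pstore")
```

Nothing is unlinked here, so the file names stay behind, which matches what was observed: the entries remain but are 0 sized.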
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Mon, Nov 21, 2016 at 01:56:39PM -0800, Marc MERLIN wrote: > On Mon, Nov 21, 2016 at 10:50:20PM +0100, Vlastimil Babka wrote: > > > 4.9rc5 however seems to be doing better, and is still running after 18 > > > hours. However, I got a few page allocation failures as per below, but the > > > system seems to recover. > > > Vlastimil, do you want me to continue the copy on 4.9 (may take 3-5 days) > > > or is that good enough, and i should go back to 4.8.8 with that patch > > > applied? > > > https://marc.info/?l=linux-mm&m=147423605024993 > > > > Hi, I think it's enough for 4.9 for now and I would appreciate trying > > 4.8 with that patch, yeah. > > So the good news is that it's been running for almost 5H and so far so good. And the better news is that the copy is still going strong, 4.4TB and going. So 4.8.8 is fixed with that one single patch as far as I'm concerned. So thanks for that, looks good to me to merge. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Tue, Nov 22, 2016 at 05:25:44PM +0100, Michal Hocko wrote: > currently AFAIR. I hate that Marc is not falling into that category but > is it really problem for you to run with 4.9? If we have more users Don't do anything just on my account. I had a problem, it's been fixed in 2 different ways: 4.8+patch, or 4.9rc5 For me this was a 100% regression from 4.6, there was just no way I could copy my data at all with 4.8, it not only failed, but killed all the services on my machine until it randomly killed the shell that was doing the copy. Personally, I'll stick with 4.8 + this patch, and switch to 4.9 when it's out (I'm a bit wary of RC kernels on a production server, especially when I'm in the middle of trying to get my only good backup to work again) But at the same time, what I'm doing is probably not common (btrfs on top of dmcrypt, on top of bcache, on top of swraid5, for both source and destination), so I can't comment on whether the fix I just put on my 4.8 kernel does not cause other regressions or problems for other people. Either way, I'm personally ok again now, so I thank you all for your help, and will leave the hard decisions to you :) Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: [PATCH-RFC]: sysrq-a: graceful reboot via kernel_restart(), similar to sysrq-o
On Thu, Mar 10, 2016 at 09:13:13PM -0800, Marc MERLIN wrote: > On Fri, Mar 11, 2016 at 04:35:21AM +, Eric Wheeler wrote: > > Hello all, > > > > We were having a discussion on the bcache list about the safest reboot > > options via sysrq here: > > http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3559/focus=3586 > > > > The result of the discussion ended up in a patch for sysrq-a to call > > kernel_restart much in the same way as sysrq-o calls kernel_power_off. > > > > Please comment on the patch and suggest any appropriate changes. > > Thanks Eric. > > The quick rationale is that sysrq-b is not desirable to use if you're using > bcache, or software raid since it will reboot without giving them a > chance to properly sync their buffers and get into a clean state. > > I've been using sysrq-o to get a clean shutdown, but of course that > actually powers off the server, and you then need to rely on something > like WOL to bring the machine back up, which isn't always easy or > possible. > > This new reboot with proper flushing (kernel_restart) allows for safe > reboots that don't upset bcache or software raid. Just updated to 4.6 and re-applied Eric's sysrq patch. It's saved me many times already. I absolutely need to do clean reboots for both my software raid and bcache, and when the system is not doing well, sysrq-o does the graceful shutdown, but also powers off my server, which is not what I want. I've been using the new sysrq-x Eric wrote and it's been working great. Any chance we can get this into standard kernels? I can't be the only person who benefits from this... Any suggestion on who might be a good person to review/critique/integrate this patch? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Mon, Nov 28, 2016 at 08:23:15AM +0100, Michal Hocko wrote: > Marc, could you try this patch please? I think it should be pretty clear > it should help you but running it through your use case would be more > than welcome before I ask Greg to take this to the 4.8 stable tree. This will take a little while, the whole copy took 5 days to finish and I'm a bit hesitant about blowing it away and starting over :) Let me see if I can come up with maybe another disk array for another test. For now, as a reminder, I'm running the attached patch, and it works fine. I'll report back as soon as I can. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a2214c64ed3c..9b3b3a79c58a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3347,17 +3347,24 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 					ac->nodemask) {
 		unsigned long available;
 		unsigned long reclaimable;
+		int check_order = order;
+		unsigned long watermark = min_wmark_pages(zone);
 
 		available = reclaimable = zone_reclaimable_pages(zone);
 		available -= DIV_ROUND_UP(no_progress_loops * available,
 					  MAX_RECLAIM_RETRIES);
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 
+		if (order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER) {
+			check_order = 0;
+			watermark += 1UL << order;
+		}
+
 		/*
 		 * Would the allocation succeed if we reclaimed the whole
 		 * available?
 		 */
-		if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
+		if (__zone_watermark_ok(zone, check_order, watermark,
 				ac_classzone_idx(ac), alloc_flags, available)) {
 			/*
 			 * If we didn't make any progress and have a lot of
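To make the effect of that hunk easier to see, here is a small Python model of just the two values it changes before the watermark check. It is a sketch, not kernel code: `retry_check_params` is a made-up name, and the model leaves out everything else `__zone_watermark_ok()` looks at.

```python
PAGE_ALLOC_COSTLY_ORDER = 3  # the kernel's threshold for "costly" orders


def retry_check_params(order, min_watermark_pages):
    """Return (check_order, watermark) as the patched hunk computes them."""
    check_order = order
    watermark = min_watermark_pages
    if 0 < order <= PAGE_ALLOC_COSTLY_ORDER:
        # Non-costly high-order request: test it as an order-0 request,
        # but require room for the 2**order pages being asked for.
        check_order = 0
        watermark += 1 << order
    return check_order, watermark


for order in (0, 1, 3, 4):
    print(order, retry_check_params(order, min_watermark_pages=1000))
```

In other words, for orders 1 to 3 the retry decision only asks whether enough pages could be reclaimed in total, not whether a contiguous block of that order is already free. Order 0 and costly orders are checked as before.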
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Mon, Nov 28, 2016 at 08:23:15AM +0100, Michal Hocko wrote: > Marc, could you try this patch please? I think it should be pretty clear > it should help you but running it through your use case would be more > than welcome before I ask Greg to take this to the 4.8 stable tree. I ran it overnight and copied 1.4TB with it before it failed because there wasn't enough disk space on the other side, so I think it fixes the problem too. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Mon, Nov 28, 2016 at 08:23:15AM +0100, Michal Hocko wrote: > Marc, could you try this patch please? I think it should be pretty clear > it should help you but running it through your use case would be more > than welcome before I ask Greg to take this to the 4.8 stable tree. > > Thanks! > > On Wed 23-11-16 07:34:10, Michal Hocko wrote: > [...] > > commit b2ccdcb731b666aa28f86483656c39c5e53828c7 > > Author: Michal Hocko > > Date: Wed Nov 23 07:26:30 2016 +0100 > > > > mm, oom: stop pre-mature high-order OOM killer invocations > > > > 31e49bfda184 ("mm, oom: protect !costly allocations some more for > > !CONFIG_COMPACTION") was an attempt to reduce chances of pre-mature OOM > > killer invocation for high order requests. It seemed to work for most > > users just fine but it is far from bullet proof and obviously not > > sufficient for Marc who has reported pre-mature OOM killer invocations > > with 4.8 based kernels. 4.9 will all the compaction improvements seems > > to be behaving much better but that would be too intrusive to backport > > to 4.8 stable kernels. Instead this patch simply never declares OOM for > > !costly high order requests. We rely on order-0 requests to do that in > > case we are really out of memory. Order-0 requests are much more common > > and so a risk of a livelock without any way forward is highly unlikely. > > > > Reported-by: Marc MERLIN > > Signed-off-by: Michal Hocko Tested-by: Marc MERLIN Marc > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index a2214c64ed3c..7401e996009a 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -3161,6 +3161,16 @@ should_compact_retry(struct alloc_context *ac, > > unsigned int order, int alloc_fla > > if (!order || order > PAGE_ALLOC_COSTLY_ORDER) > > return false; > > > > +#ifdef CONFIG_COMPACTION > > + /* > > +* This is a gross workaround to compensate a lack of reliable > > compaction > > +* operation. 
We cannot simply go OOM with the current state of the > > compaction > > +* code because this can lead to pre mature OOM declaration. > > +*/ > > + if (order <= PAGE_ALLOC_COSTLY_ORDER) > > + return true; > > +#endif > > + > > /* > > * There are setups with compaction disabled which would prefer to loop > > * inside the allocator rather than hit the oom killer prematurely. > > -- > > Michal Hocko > > SUSE Labs > > -- > Michal Hocko > SUSE Labs > -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Tue, Nov 29, 2016 at 05:07:51PM +0100, Michal Hocko wrote: > On Tue 29-11-16 07:55:37, Marc MERLIN wrote: > > On Mon, Nov 28, 2016 at 08:23:15AM +0100, Michal Hocko wrote: > > > Marc, could you try this patch please? I think it should be pretty clear > > > it should help you but running it through your use case would be more > > > than welcome before I ask Greg to take this to the 4.8 stable tree. > > > > I ran it overnight and copied 1.4TB with it before it failed because > > there wasn't enough disk space on the other side, so I think it fixes > > the problem too. > > Can I add your Tested-by? Done. Now, probably unrelated, but hard to be sure, doing those big copies causes massive hangs on my system. I hit a few of the 120s hangs, but more generally lots of things hang, including shells, my DNS server, monitoring reading from USB and timing out, and so forth. Examples below. I have a hard time telling what is at fault, but is there a chance it might be memory allocation pressure? I already have a preempt kernel, so I can't make it more preempt than that. Now, to be fair, this is not a new problem, it's just varying degrees of bad and usually only happens when I do a lot of I/O with btrfs. That said, btrfs may very well just be suffering from memory allocation issues and hanging as a result, with everything else on my system also hanging for similar reasons until the memory pressure goes away once the copy or scrub has finished. What do you think? [28034.954435] INFO: task btrfs:5618 blocked for more than 120 seconds. [28034.975471] Tainted: G U 4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12 [28035.000964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[28035.025429] btrfs D 91154d33fc70 0 5618 5372 0x0080 [28035.047717] 91154d33fc70 00200246 911842f880c0 9115a4cf01c0 [28035.071020] 91154d33fc58 91154d34 91165493bca0 9115623773f0 [28035.094252] 1000 0001 91154d33fc88 b86cf1a6 [28035.117538] Call Trace: [28035.125791] [] schedule+0x8b/0xa3 [28035.141550] [] btrfs_start_ordered_extent+0xce/0x122 [28035.162457] [] ? wake_up_atomic_t+0x2c/0x2c [28035.180891] [] btrfs_wait_ordered_range+0xa9/0x10d [28035.201723] [] btrfs_truncate+0x40/0x24b [28035.219269] [] btrfs_setattr+0x1da/0x2d7 [28035.237032] [] notify_change+0x252/0x39c [28035.254566] [] do_truncate+0x81/0xb4 [28035.271057] [] vfs_truncate+0xd9/0xf9 [28035.287782] [] do_sys_truncate+0x63/0xa7 I get other hangs like: [10338.968912] perf: interrupt took too long (3927 > 3917), lowering kernel.perf_event_max_sample_rate to 50750 [12971.047705] ftdi_sio ttyUSB15: usb_serial_generic_read_bulk_callback - urb stopped: -32 [17761.122238] usb 4-1.4: USB disconnect, device number 39 [17761.141063] usb 4-1.4: usbfs: USBDEVFS_CONTROL failed cmd hub-ctrl rqt 160 rq 6 len 1024 ret -108 [17761.263252] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd [17761.938575] usb 4-1.4: new SuperSpeed USB device number 40 using xhci_hcd [24130.574425] hpet1: lost 2306 rtc interrupts [24156.034950] hpet1: lost 1628 rtc interrupts [24173.314738] hpet1: lost 1104 rtc interrupts [24180.129950] hpet1: lost 436 rtc interrupts [24257.557955] hpet1: lost 4954 rtc interrupts [24267.522656] hpet1: lost 637 rtc interrupts Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
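For anyone trying to correlate hangs like these with a copy or scrub, a small script can pull the hung-task reports out of a saved kernel log. This is only a sketch: the function name `find_hung_tasks` and the regular expression are illustrative assumptions built around the "blocked for more than N seconds" line shown above, not part of any kernel tool.

```python
import re

# Matches the hung-task report line as it appears in the log above, e.g.
# "INFO: task btrfs:5618 blocked for more than 120 seconds."
# (Hypothetical helper pattern, written for this illustration only.)
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (command, pid, seconds) for each hung-task line."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


# Sample line copied from the report above.
sample = "[28034.954435] INFO: task btrfs:5618 blocked for more than 120 seconds."
print(find_hung_tasks(sample))
```

Feeding it the text of a saved `dmesg` capture gives a quick list of which tasks stalled and for how long.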
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
Thanks for the reply and suggestions.

On Tue, Nov 29, 2016 at 09:07:03AM -0800, Linus Torvalds wrote:
> On Tue, Nov 29, 2016 at 8:34 AM, Marc MERLIN wrote:
> > Now, to be fair, this is not a new problem, it's just varying degrees of
> > bad and usually only happens when I do a lot of I/O with btrfs.
>
> One situation where I've seen something like this happen is
>
>  (a) lots and lots of dirty data queued up
>  (b) horribly slow storage

In my case, it is a 5x 4TB HDD with
software raid 5 < bcache < dmcrypt < btrfs
bcache is currently half disabled (as in I removed the actual cache) or
too many bcache requests pile up, and the kernel dies when too many
workqueues have piled up.
I'm just kind of worried that since I'm going through 4 subsystems
before my data can hit disk, that's a lot of memory allocations and
places where data can accumulate and cause bottlenecks if the next
subsystem isn't as fast.

But this shouldn't be "horribly slow", should it? (it does copy a few
terabytes per day, not fast, but not horrible, about 30MB/s or so)

> Sadly, our defaults for "how much dirty data do we allow" are somewhat
> buggered. The global defaults are in "percent of memory", and are
> generally _much_ too high for big-memory machines:
>
>     [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_ratio
>     20
>     [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_background_ratio
>     10

I can confirm I have the same.

> says that it only starts really throttling writes when you hit 20% of
> all memory used. You don't say how much memory you have in that
> machine, but if it's the same one you talked about earlier, it was
> 24GB. So you can have 4GB of dirty data waiting to be flushed out.

Correct, 24GB and 4GB.

> And we *try* to do this per-device backing-dev congestion thing to
> make things work better, but it generally seems to not work very well.
> Possibly because of inconsistent write speeds (ie _sometimes_ the SSD
> does really well, and we want to open up, and then it shuts down).
>
> One thing you can try is to just make the global limits much lower. As in
>
>    echo 2 > /proc/sys/vm/dirty_ratio
>    echo 1 > /proc/sys/vm/dirty_background_ratio

I will give that a shot, thank you.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
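The arithmetic behind the "percent of memory" limits discussed above can be sketched as follows. This is an approximation under a stated assumption: the kernel applies the ratios to dirtyable memory (roughly free plus reclaimable pages) rather than to total RAM, so using the full 24GB gives an upper bound, which is consistent with the round "4GB" figure quoted in the thread. The helper name `dirty_limit_bytes` is made up for this illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def dirty_limit_bytes(mem_bytes, ratio_percent):
    """Upper bound on dirty data for a given memory size and ratio (in percent)."""
    return mem_bytes * ratio_percent // 100


mem = 24 * GIB  # machine size mentioned in the thread

# Defaults quoted in the thread (20/10) versus the suggested values (2/1).
for label, ratio in (
    ("dirty_ratio=20", 20),
    ("dirty_background_ratio=10", 10),
    ("dirty_ratio=2", 2),
    ("dirty_background_ratio=1", 1),
):
    limit = dirty_limit_bytes(mem, ratio)
    print(f"{label}: up to {limit / MIB:.0f} MiB of dirty data")
```

At the roughly 30MB/s write rate mentioned above, several gigabytes of queued dirty data would take minutes to drain, while a few hundred megabytes would drain in seconds, which is the point of lowering the ratios.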
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Tue, Nov 29, 2016 at 09:40:19AM -0800, Marc MERLIN wrote:
> Thanks for the reply and suggestions.
>
> On Tue, Nov 29, 2016 at 09:07:03AM -0800, Linus Torvalds wrote:
> > On Tue, Nov 29, 2016 at 8:34 AM, Marc MERLIN wrote:
> > > Now, to be fair, this is not a new problem, it's just varying degrees of
> > > bad and usually only happens when I do a lot of I/O with btrfs.
> >
> > One situation where I've seen something like this happen is
> >
> >  (a) lots and lots of dirty data queued up
> >  (b) horribly slow storage
>
> In my case, it is a 5x 4TB HDD with
> software raid 5 < bcache < dmcrypt < btrfs
> bcache is currently half disabled (as in I removed the actual cache) or
> too many bcache requests pile up, and the kernel dies when too many
> workqueues have piled up.
> I'm just kind of worried that since I'm going through 4 subsystems
> before my data can hit disk, that's a lot of memory allocations and
> places where data can accumulate and cause bottlenecks if the next
> subsystem isn't as fast.
>
> But this shouldn't be "horribly slow", should it? (it does copy a few
> terabytes per day, not fast, but not horrible, about 30MB/s or so)
>
> > Sadly, our defaults for "how much dirty data do we allow" are somewhat
> > buggered. The global defaults are in "percent of memory", and are
> > generally _much_ too high for big-memory machines:
> >
> >     [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_ratio
> >     20
> >     [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_background_ratio
> >     10
>
> I can confirm I have the same.
>
> > says that it only starts really throttling writes when you hit 20% of
> > all memory used. You don't say how much memory you have in that
> > machine, but if it's the same one you talked about earlier, it was
> > 24GB. So you can have 4GB of dirty data waiting to be flushed out.
>
> Correct, 24GB and 4GB.
>
> > And we *try* to do this per-device backing-dev congestion thing to
> > make things work better, but it generally seems to not work very well.
> > Possibly because of inconsistent write speeds (ie _sometimes_ the SSD
> > does really well, and we want to open up, and then it shuts down).
> >
> > One thing you can try is to just make the global limits much lower. As in
> >
> >    echo 2 > /proc/sys/vm/dirty_ratio
> >    echo 1 > /proc/sys/vm/dirty_background_ratio
>
> I will give that a shot, thank you.

And, after 5H of copying, not a single hang, or USB disconnect, or
anything.

Obviously this seems to point to other problems in the code, and I have
no idea which layer is a culprit here, but reducing the buffers
absolutely helped a lot.

Thanks much,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
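One way to check whether lowered limits are actually keeping the backlog small is to watch the `Dirty` and `Writeback` fields of `/proc/meminfo` during a copy. The sketch below parses meminfo-style text; the function names are illustrative and the sample numbers are made up for the example, not taken from the machine in this thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Name:   value kB' lines into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_and_writeback_kb(text):
    """Return (Dirty, Writeback) in kB, or 0 for a missing field."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0), info.get("Writeback", 0)


# Made-up sample in the /proc/meminfo layout, for illustration only.
sample = """MemTotal:       24685012 kB
Dirty:             14008 kB
Writeback:        386111 kB
"""
print(dirty_and_writeback_kb(sample))
```

On a live system the same function can be given the contents of `/proc/meminfo`, read in a loop every few seconds while the copy runs.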
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Tue, Nov 29, 2016 at 10:01:10AM -0800, Linus Torvalds wrote:
> On Tue, Nov 29, 2016 at 9:40 AM, Marc MERLIN wrote:
> >
> > In my case, it is a 5x 4TB HDD with
> > software raid 5 < bcache < dmcrypt < btrfs
>
> It doesn't sound like the nasty situations I have seen (particularly
> with large USB flash storage - often high momentary speed for
> benchmarks, but slows down to a crawl after you've written a bit to
> it, and doesn't have the smart garbage collection that modern "real"
> SSDs have).

I gave it a thought again, I think it is exactly the nasty situation you
described.
bcache takes I/O quickly while sending to SSD cache. SSD fills up, now
bcache can't handle IO as quickly and has to hang until the SSD has been
flushed to spinning rust drives.
This actually is exactly the same as filling up the cache on a USB key
and now you're waiting for slow writes to flash, is it not?

With your dirty ratio workaround, I was able to re-enable bcache and
have it not fall over, but only barely. I recorded over a hundred
workqueues in flight during the copy at some point (just not enough
to actually kill the kernel this time).

I've started a bcache follow-up on this here:
http://marc.info/?l=linux-bcache&m=148052441423532&w=2
http://marc.info/?l=linux-bcache&m=148052620524162&w=2

This message shows the huge pileup of workqueues in bcache
just before the kernel dies with
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
task: 9ee0c2fa4180 task.stack: 9ee0c2fa8000
RIP: 0010:[] [] cpuidle_enter_state+0x119/0x171
RSP: :9ee0c2fabea0 EFLAGS: 0246
RAX: 9ee0de3d90c0 RBX: 0004 RCX: 001f
RDX: RSI: 0007 RDI:
RBP: 9ee0c2fabed0 R08: 0f92 R09: 0f42
R10: 9ee0c2fabe50 R11: 071c71c71c71c71c R12: e047bfdcb200
R13: 0af626899577 R14: 0004 R15: 0af6264cc557
FS: () GS:9ee0de3c() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0898b000 CR3: 00045cc06000 CR4: 001406e0
Stack:
 0f40 e047bfdcb200 bbccc060 9ee0c2fac000
 9ee0c2fa8000 9ee0c2fac000 9ee0c2fabee0 bb57a1ac
 9ee0c2fabf30 bb09238d 9ee0c2fa8000 00070004
Call Trace:
 [] cpuidle_enter+0x17/0x19
 [] cpu_startup_entry+0x210/0x28b
 [] start_secondary+0x13e/0x140
Code: 00 00 00 48 c7 c7 cd ae b2 bb c6 05 4b 8e 7a 00 01 e8 17 6c ae ff fa 66 0f 1f 44 00 00 31 ff e8 75 60 b4 44 00 00 <4c> 89 e8 b9 e8 03 00 00 4c 29 f8 48 99 48 f7 f9 ba ff ff ff 7f
Kernel panic - not syncing: Hard LOCKUP

A full traceback showing the pileup of requests is here:
http://marc.info/?l=linux-bcache&m=147949497808483&w=2
and there:
http://pastebin.com/rJ5RKUVm
(2 different ones but mostly the same result)

We can probably follow up on the bcache thread I Cc'ed you on since I'm
not sure if the fault here lies with bcache or the VM subsystem anymore.

Thanks.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Wed, Nov 30, 2016 at 10:14:50AM -0800, Linus Torvalds wrote:
> Anyway, none of this seems new per se. I'm adding Kent and Jens to the
> cc (Tejun already was), in the hope that maybe they have some idea how
> to control the nasty worst-case behavior wrt workqueue lockup (it's
> not really a "lockup", it looks like it's just hundreds of workqueues
> all waiting for IO to complete and much too deep IO queues).

I'll take your word for it, all I got in the end was
Kernel panic - not syncing: Hard LOCKUP
and the system stone dead when I woke up hours later.

> And I think your NMI watchdog then turns the "system is no longer
> responsive" into an actual kernel panic.

Ah, I see.

Thanks for the reply, and sorry for bringing in that separate thread
from the btrfs mailing list, which effectively was a suggestion similar
to what you're saying here too.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Mon, Nov 21, 2016 at 10:50:20PM +0100, Vlastimil Babka wrote: > > 4.9rc5 however seems to be doing better, and is still running after 18 > > hours. However, I got a few page allocation failures as per below, but the > > system seems to recover. > > Vlastimil, do you want me to continue the copy on 4.9 (may take 3-5 days) > > or is that good enough, and i should go back to 4.8.8 with that patch > > applied? > > https://marc.info/?l=linux-mm&m=147423605024993 > > Hi, I think it's enough for 4.9 for now and I would appreciate trying > 4.8 with that patch, yeah. So the good news is that it's been running for almost 5H and so far so good. > The failures below are in a GFP_NOWAIT context, which cannot do any > reclaim so it's not affected by OOM rewrite. If it's a regression, it > has to be caused by something else. But it seems the code in > cfq_get_queue() intentionally doesn't want to reclaim or use any atomic > reserves, and has a fallback scenario for allocation failure, in which > case I would argue that it should add __GFP_NOWARN, as these warnings > can't help anyone. CCing Tejun as author of commit d4aad7ff0. No, that's not a regression, I get those on occasion. The good news is that they're not fatal. Just got another one with 4.8.8. No idea if they're actual errors I should worry about, or just warnings that spam the console a bit, but things retry, recover and succeed, so I can ignore them. Another one from 4.8.8 below. I'll report back tomorrow to see if this has run for a day and if so, I'll call your patch a fix for my problem (but at this point, it's already looking very good). 
Thanks, Marc cron: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK) CPU: 4 PID: 9748 Comm: cron Tainted: G U 4.8.8-amd64-volpreempt-sysrq-20161108vb2 #9 Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013 a1e37429f6d0 9a36a0bb a1e37429f768 9a1359d4 022040009f5e8d00 0012 9a140770 Call Trace: [] dump_stack+0x61/0x7d [] warn_alloc_failed+0x11c/0x132 [] ? wakeup_kswapd+0x8e/0x153 [] __alloc_pages_nodemask+0x87b/0xb02 [] ? __alloc_pages_nodemask+0x87b/0xb02 [] cache_grow_begin+0xb2/0x30b [] fallback_alloc+0x137/0x19f [] cache_alloc_node+0xd3/0xde [] kmem_cache_alloc_node+0x8e/0x163 [] cfq_get_queue+0x162/0x29d [] ? kmem_cache_alloc+0xd7/0x14b [] ? slab_post_alloc_hook+0x5b/0x66 [] cfq_set_request+0x141/0x2be [] ? timekeeping_get_ns+0x1e/0x32 [] ? ktime_get+0x41/0x52 [] ? ktime_get_ns+0x9/0xb [] ? cfq_init_icq+0x12/0x19 [] elv_set_request+0x1f/0x24 [] get_request+0x324/0x5aa [] ? wake_up_atomic_t+0x2c/0x2c [] blk_queue_bio+0x19f/0x28c [] generic_make_request+0xbd/0x160 [] submit_bio+0x100/0x11d [] ? map_swap_page+0x12/0x14 [] ? get_swap_bio+0x57/0x6c [] swap_readpage+0x110/0x118 [] read_swap_cache_async+0x26/0x2d [] swapin_readahead+0x11a/0x16a [] do_swap_page+0x9c/0x431 [] ? do_swap_page+0x9c/0x431 [] handle_mm_fault+0xa4d/0xb3d [] ? 
vfs_getattr_nosec+0x26/0x37 [] __do_page_fault+0x267/0x43d [] do_page_fault+0x25/0x27 [] page_fault+0x28/0x30 Mem-Info: active_anon:532194 inactive_anon:133376 isolated_anon:0 active_file:4118244 inactive_file:382010 isolated_file:0 unevictable:1687 dirty:3502 writeback:386111 unstable:0 slab_reclaimable:41767 slab_unreclaimable:106595 mapped:512496 shmem:582026 pagetables:5352 bounce:0 free:92092 free_pcp:176 free_cma:2072 Node 0 active_anon:2128776kB inactive_anon:533504kB active_file:16472976kB inactive_file:1528040kB unevictable:6748kB isolated(anon):0kB isolated(file):0kB mapped:2049984kB dirty:14008kB writeback:154kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2328104kB writeback_tmp:0kB unstable:0kB pages_scanned:1 all_unreclaimable? no Node 0 DMA free:15884kB min:168kB low:208kB high:248kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15892kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 3200 23767 23767 23767 Node 0 DMA32 free:117580kB min:35424kB low:44280kB high:53136kB active_anon:3980kB inactive_anon:400kB active_file:2632672kB inactive_file:286956kB unevictable:0kB writepending:288296kB present:3362068kB managed:3296500kB mlocked:0kB slab_reclaimable:41632kB slab_unreclaimable:19512kB kernel_stack:880kB pagetables:676kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 0 20567 20567 20567 Node 0 Normal free:234904kB min:226544kB low:283180kB high:339816kB active_anon:2124796kB inactive_anon:533104kB active_file:13840304kB inactive_file:1241268kB unevictable:6748kB writepending:1270156kB present:21485568kB managed:21080636kB mlocked:6748kB slab_reclaimable:125436kB sl
Re: [PATCH] objtool: fix CONFIG_STACK_VALIDATION warning for out-of-tree modules
On Wed, Feb 15, 2017 at 12:21:17PM -0600, Josh Poimboeuf wrote: > When building a CONFIG_STACK_VALIDATION enabled kernel without the > libelf devel package installed, the Makefile prints a warning: > > "Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, > libelf-devel or elfutils-libelf-devel" > > But when building an out-of-tree module, the warning doesn't show. > Instead it tries to use objtool, and the build fails with: > > /bin/sh: ./tools/objtool/objtool: No such file or directory > > Make sure the warning and the disabling of objtool occur in all cases, > by moving the CONFIG_STACK_VALIDATION checks outside the 'ifeq > ($(KBUILD_EXTMOD),)' block in the Makefile. > > Reported-by: Marc MERLIN > Suggested-by: Jessica Yu > Fixes: 3b27a0c85d70 ("objtool: Detect and warn if libelf is missing and don't > break the build") > Signed-off-by: Josh Poimboeuf Tested-By: Marc MERLIN saruman:/usr/src/linux-block# dpkg --remove libelf-dev saruman:/usr/src/linux-block/tools/objtool# make clean saruman:/usr/src/linux-block# dkms install bbswitch/0.8 Kernel preparation unnecessary for this kernel. Skipping... Building module: cleaning build area... make -j8 KERNELRELEASE=4.10.0-rc7-mm3kb1+ KVERSION=4.10.0-rc7-mm3kb1+...(bad exit status: 2) Error! Bad return status for module build on kernel: 4.10.0-rc7-mm3kb1+ (x86_64) saruman:/usr/src/linux-block# patch -p1 -s < objtool.patch saruman:/usr/src/linux-block# dkms install bbswitch/0.8 Kernel preparation unnecessary for this kernel. Skipping... Building module: cleaning build area... make -j8 KERNELRELEASE=4.10.0-rc7-mm3kb1+ KVERSION=4.10.0-rc7-mm3kb1+... cleaning build area... DKMS: build completed. bbswitch.ko: Running module version sanity check. - Original module - No original module exists within this kernel - Installation - Installing to /lib/modules/4.10.0-rc7-mm3kb1+/updates/dkms/ depmod... DKMS: install completed. All good, thank you. 
Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Please turn "Cannot use CONFIG_STACK_VALIDATION" into build error
Hi Josh,

I'll start with the story as to why. I've lost more hours than I care to
list, because I was unable to build the virtualbox kernel driver with
newer kernels. Sadly, it gives no useful debug info outside of

make[1]: *** No rule to make target '/tmp/vbox.0/linux/SUPDrv-linux.o', needed by '/tmp/vbox.0/vboxdrv.o'. Stop.

It took some pretty deep debugging to finally see this:

Trying rule prerequisite 'tools/objtool/objtool'.
Looking for a rule with intermediate file 'tools/objtool/objtool'.
Avoiding implicit rule recursion.

which look quite innocuous and don't look like errors at all.

When I filed a bug with the vbox folks, they were unable to find out why
the module refused to build on my kernel, and I was stuck with older
kernels as a result.

Then, I had another module, bbswitch, to turn off the nvidia chip on my
laptop to save battery. That one also failed to build with newer
kernels, but thankfully made it more clear that the problem was related
to tools/objtool/objtool missing.

But why was it missing? No idea...
I trace that down to CONFIG_STACK_VALIDATION which there seems to be no
menu option for, so I manually disable it in .config, rebuild, and it's
automatically re-enabled. Gah.

More hair pulling, and finally I make a typo

saruman:/usr/src/linux-block# make xonfig
Makefile:1044: "Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
scripts/kconfig/conf --silentoldconfig Kconfig
Makefile:1044: "Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
make: *** No rule to make target 'xonfig'. Stop.

Sure enough, this was my problem, but I never saw the error message
because I build kernels with

make-kpkg --revision 1gandalf kernel-image

which does other stuff and hid that warning, which really should have
been a fatal error in my opinion.

Given that
1) CONFIG_STACK_VALIDATION seems silently auto enabled.
2) without libelf-dev, the kernel will build but will leave a tree
   missing objtool, which in turn causes (all?) 3rd party modules to fail
   building.
3) and that it's kind of non trivial to find out why if that happens,

Would you consider making
"Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
a build error as opposed to a warning?

This sure would have saved me countless hours of debugging the wrong
things.

Thank you
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
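The failure mode described in this message (option enabled in `.config`, but no `tools/objtool/objtool` binary left in the tree) can be detected before attempting an out-of-tree module build. The following is a minimal sketch with hypothetical helper names; it only inspects the `.config` text and the path named in the message, and makes no claim about how the kernel Makefile itself performs the check.

```python
import os


def stack_validation_enabled(config_text):
    """True if the .config text contains an active CONFIG_STACK_VALIDATION=y line."""
    return any(
        line.strip() == "CONFIG_STACK_VALIDATION=y"
        for line in config_text.splitlines()
    )


def objtool_built(kernel_tree):
    """True if tools/objtool/objtool exists as a file under kernel_tree."""
    return os.path.isfile(os.path.join(kernel_tree, "tools", "objtool", "objtool"))


def check_tree(config_text, kernel_tree):
    """Return a diagnostic string if module builds are likely to fail, else None."""
    if stack_validation_enabled(config_text) and not objtool_built(kernel_tree):
        return (
            "CONFIG_STACK_VALIDATION=y but tools/objtool/objtool is missing; "
            "install libelf-dev, libelf-devel or elfutils-libelf-devel and rebuild"
        )
    return None


# Illustration with a path that is assumed not to exist on this machine.
print(check_tree("CONFIG_STACK_VALIDATION=y\n", "/nonexistent-kernel-tree"))
print(check_tree("# CONFIG_STACK_VALIDATION is not set\n", "/nonexistent-kernel-tree"))
```

Run against a real tree, `check_tree` would be given the contents of that tree's `.config` and its top-level directory.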
Re: Please turn "Cannot use CONFIG_STACK_VALIDATION" into build error
On Mon, Feb 13, 2017 at 12:41:06PM -0600, Josh Poimboeuf wrote:
> Hm, that doesn't sound right. Nothing automatically enables
> CONFIG_STACK_VALIDATION. It should be disabled unless manually enabled.
> Maybe you got it confused with CONFIG_HAVE_STACK_VALIDATION, which is
> always enabled?

I did mean CONFIG_STACK_VALIDATION, which is what requires objtool.
It's very possible I enabled it myself during a make oldconfig some time
back, what I meant is that disabling it from the config file doesn't
make it go away, it comes back on its own (see below)

> BTW, there is a config option for it in the menu:
>
>   Kernel hacking
>     Compile-time checks and compiler options
>       Compile-time stack metadata validation

Thanks, I had a hard time finding it since it was not in the same place
as the other options around it. To be honest, I never quite know how to
find where a .config option is located in an xconfig menu, so I looked
around other ones above and below it in .config, and turns out it was
the wrong place.

Anyway, after not finding it in xconfig, I edited .config, and did:
# CONFIG_STACK_VALIDATION is not set
save .config
and the next build re-enabled the option.
That's what caught me by surprise. Did I do something wrong, or is there
an issue there?

> > 2) without libelf-dev, the kernel will build but will leave a tree
> > missing objtool, which in turn causes (all?) 3rd party modules to fail
> > building.
>
> Yes, this is a bug.

Obviously the fix is to make sure objtool builds, but is there a way to
make things better if it doesn't build? (apparently yes, as you replied
below)

> Correct me if I'm wrong, but it sounds like make-kpkg suppressed stderr?
> If so, that should be fixed.

It does not, but it adds lines of output before the build starts, and
since the error with libelf-dev missing is not colorized, it was
effectively invisible (one line amongst hundreds scrolling on the
screen).
Now that I know what the error is and how to look for it, I can see it, but as a diagnosis that things were wrong and that things should be fixed, or 3rd party modules would fail to build in weird ways, it was unfortunately useless. > When I try to build an OOT module with CONFIG_STACK_VALIDATION enabled > and elfutils-libelf-devel missing (on Fedora), I get: > > make: Entering directory '/home/jpoimboe/git/linux' > make[1]: Entering directory '/home/jpoimboe/ktest/output' > CC [M] /home/jpoimboe/livepatch-test/1/livepatch2.o > /bin/sh: ./tools/objtool/objtool: No such file or directory > /home/jpoimboe/git/linux/scripts/Makefile.build:300: recipe for target > '/home/jpoimboe/livepatch-test/1/livepatch2.o' failed > make[2]: *** [/home/jpoimboe/livepatch-test/1/livepatch2.o] Error 1 > /home/jpoimboe/git/linux/Makefile:1490: recipe for target > '_module_/home/jpoimboe/livepatch-test/1' failed > make[1]: *** [_module_/home/jpoimboe/livepatch-test/1] Error 2 > make[1]: Leaving directory '/home/jpoimboe/ktest/output' > Makefile:150: recipe for target 'sub-make' failed > make: *** [sub-make] Error 2 > make: Leaving directory '/home/jpoimboe/git/linux' > > It's not a perfect error message, but the > '/bin/sh: ./tools/objtool/objtool: No such file or directory' > is at least a big clue. I'm curious why you didn't see that. In the virtualbox build, it just doesn't show up at all, even in the debug log :( It's only after spending many many hours trying to find why virtualbox was not working, that I realized that my bbswitch module wasn't building either, and that one did point to objtool as a culprit. But even after I found this, it was non trivial to link this to libelf-dev missing, given that the message wasn't that visible in a kernel build. > Anyway, the above libelf-dev warning is just a warning and not a build > error because CONFIG_STACK_VALIDATION is enabled for allyesconfig, and > it's not a severe enough problem to warrant breaking the build. Understood. 
> Ideally the same warning should be printed when building OOT modules. > I'll try to figure out if there's a way to do that it. This would help, although in that case you can even make the warning an error since objtool missing seems to be fatal? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/
Re: Please turn "Cannot use CONFIG_STACK_VALIDATION" into build error
On Mon, Feb 13, 2017 at 04:00:02PM -0600, Josh Poimboeuf wrote: > On Mon, Feb 13, 2017 at 01:31:32PM -0800, Marc MERLIN wrote: > > Anyway, after not finding it in xconfig, I editted .config, and did: > > # CONFIG_STACK_VALIDATION is not set > > save .config > > and the next build re-enabled the option. > > That's what caught me by surprise. Did I do something wrong, or is there an > > issue there? > > I really don't see how it would be possible for it to come back by > itself, as it's disabled by default, and no other options select it. > When I remove it, it stays disabled. Mmmh, you are correct. I have no idea why/how it got re-enabled yesterday. I'm not seeing this again today. > > This would help, although in that case you can even make the warning an > > error since objtool missing seems to be fatal? > > It doesn't need to be fatal though. It should just be a warning and the > build should succeed, like it does when building the kernel. Agreed, that would be even better. Thanks for looking at that. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/
Re: [PATCH-RFC]: sysrq-a: graceful reboot via kernel_restart(), similar to sysrq-o
On Fri, Mar 11, 2016 at 04:35:21AM +, Eric Wheeler wrote:
> Hello all,
>
> We were having a discussion on the bcache list about the safest reboot
> options via sysrq here:
> http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3559/focus=3586
>
> The result of the discussion ended up in a patch for sysrq-a to call
> kernel_restart much in the same way as sysrq-o calls kernel_power_off.
>
> Please comment on the patch and suggest any appropriate changes.

Thanks Eric.

The quick rationale is that sysrq-b is not desirable to use if you're
using bcache, or software raid since it will reboot without giving them
a chance to properly sync their buffers and get into a clean state.

I've been using sysrq-o to get a clean shutdown, but of course that
actually powers off the server, and you then need to rely on something
like WOL to bring the machine back up, which isn't always easy or
possible.

This new reboot with proper flushing (kernel_restart) allows for safe
reboots that don't upset bcache or software raid.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
port :00:01.0: saving config space at offset 0x20 (reading 0xce00cd00)
[6.724050] pcieport :00:01.0: saving config space at offset 0x24 (reading 0xb1f1a001)
[6.724054] pcieport :00:01.0: saving config space at offset 0x28 (reading 0x0)
[6.724058] pcieport :00:01.0: saving config space at offset 0x2c (reading 0x0)
[6.724062] pcieport :00:01.0: saving config space at offset 0x30 (reading 0x0)
[6.724066] pcieport :00:01.0: saving config space at offset 0x34 (reading 0x88)
[6.724070] pcieport :00:01.0: saving config space at offset 0x38 (reading 0x0)
[6.724074] pcieport :00:01.0: saving config space at offset 0x3c (reading 0x201ff)
[6.724129] pcieport :00:1b.0: runtime IRQ mapping not provided by arch
[6.724650] pcieport :00:1b.0: PME: Signaling with IRQ 123
[6.725021] pcieport :00:1b.0: saving config space at offset 0x0 (reading 0xa3408086)
[6.725026] pcieport :00:1b.0: saving config space at offset 0x4 (reading 0x100407)
[6.725031] pcieport :00:1b.0: saving config space at offset 0x8 (reading 0x60400f0)
[6.725035] pcieport :00:1b.0: saving config space at offset 0xc (reading 0x81)
[6.725040] pcieport :00:1b.0: saving config space at offset 0x10 (reading 0x0)
[6.725044] pcieport :00:1b.0: saving config space at offset 0x14 (reading 0x0)
[6.725049] pcieport :00:1b.0: saving config space at offset 0x18 (reading 0x20200)
[6.725053] pcieport :00:1b.0: saving config space at offset 0x1c (reading 0x20f0)
[6.725058] pcieport :00:1b.0: saving config space at offset 0x20 (reading 0xce30ce30)
[6.725062] pcieport :00:1b.0: saving config space at offset 0x24 (reading 0x1fff1)
[6.725067] pcieport :00:1b.0: saving config space at offset 0x28 (reading 0x0)
[6.725071] pcieport :00:1b.0: saving config space at offset 0x2c (reading 0x0)
[6.725075] pcieport :00:1b.0: saving config space at offset 0x30 (reading 0x0)
[6.725080] pcieport :00:1b.0: saving config space at offset 0x34 (reading 0x40)
[6.725084] pcieport :00:1b.0: saving config space at offset 0x38 (reading 0x0)
[6.725089] pcieport :00:1b.0: saving config space at offset 0x3c (reading 0x201ff)
[6.725154] pcieport :00:1c.0: runtime IRQ mapping not provided by arch
[6.725284] pcieport :00:1c.0: PME: Signaling with IRQ 124
[6.725580] pcieport :00:1c.0: pciehp: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
[6.726086] pci_bus :04: dev 00, created physical slot 0

Any idea what's going on?

Thanks,
Marc

On Sat, Aug 08, 2020 at 01:22:02PM -0700, Marc MERLIN wrote:
> On Fri, Oct 04, 2019 at 03:39:46PM +0300, Mika Westerberg wrote:
> > This is otherwise similar to pcie_wait_for_link() but allows passing
> > custom activation delay in milliseconds.
> >
> > Signed-off-by: Mika Westerberg
> > ---
> > drivers/pci/pci.c | 21 ++---
> > 1 file changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index e7982af9a5d8..bfd92e018925 100644
>
> Hi Mika,
>
> So, I have a thinkpad P73 with thunderbolt, and while I don't boot
> often, my last boots have been unreliable at best (was only able to boot
> 5.7 once, and 5.8 did not succeed either).
>
> 5.6 was working for a while, but couldn't boot it either this morning,
> so I had to go back to 5.5. This does not mean 5.5 does not have the
> problem, just that it booted this morning, while 5.6 didn't when I
> tried.
> Once the kernel is booted, the problem does not seem to occur much, or
> at all.
>
> Basically, I'm getting the same thing than this person with a P53 (which
> is a mostly identical lenovo thinkpad, to mine)
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> https://bbs.archlinux.org/viewtopic.php?id=250658
>
> The kernel boots eventually, but it takes minutes, and everything is so
> super slow, that I just can't reasonably use the machine.
>
> This shows similar issues with 5.3, 5.4.
> https://forum.proxmox.com/threads/pme-spurious-native-interrupt-kernel-meldungen.62850/
>
> Another report here with 5.6:
> https://bugzilla.redhat.com/show_bug.cgi?id=1831899
>
> My current kernel is running your patch above, and I haven't done a lot
> of research yet to confirm whether going back to a kernel before it was
> merged, fixes
Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
On Mon, Sep 07, 2020 at 09:14:03PM +0200, Karol Herbst wrote:
> > - changes in the nouveau driver. Mika told me the PCIe regression
> > "pcieport :00:01.0: PME: Spurious native interrupt!" is supposed
> > to be fixed in 5.8, but I still get a 4mn hang or so during boot and
> > with 5.8, removing the USB key, didn't help make the boot faster
> >
> that's the root port the GPU is attached to, no? I saw that message on
> the Thinkpad P1G2 when runtime resuming the Nvidia GPU, but it does
> seem to come from the root port.

Hi Karol, thanks for your answer.

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)

> Well, you'd also need it when attaching external displays.

Indeed. I just don't need that on this laptop, but I'm familiar with the
not so seamless procedure to turn on both GPUs, and mirror the intel one
into the nvidia one for external output.

> > [ 11.262985] nvidia-gpu :01:00.3: PME# enabled
> > [ 11.303060] nvidia-gpu :01:00.3: PME# disabled
> >
> mhh, interesting. I heard some random comments that the Nvidia
> USB-C/UCSI driver is a bit broken and can cause various issues. Mind
> blacklisting i2c-nvidia-gpu and typec_nvidia (and verify they don't
> get loaded) and see if that helps?

Right, this one:
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)

Sure, I'll blacklist it.

Ok, just did that, removed from initrd, rebooted, and it was no better.

From initrd (before root gets mounted), I have this:
nouveau   1961984  0
mxm_wmi     16384  1 nouveau
hwmon       32768  1 nouveau
ttm        102400  1 nouveau
wmi         32768  2 nouveau,mxm_wmi

I still got a 2mn hang, and a nouveau probe error:
[ 189.124530] nouveau: probe of :01:00.0 failed with error -12

Here's what it looks like:
[9.693230] hid: raw HID events driver (C) Jiri Kosina
[9.694988] usbcore: registered new interface driver usbhid
[9.694989] usbhid: USB HID core driver
[9.696700] hid-generic 0003:1050:0200.0001: hiddev0,hidraw0: USB HID v1.00 Device [Yubico Yubico Gnubby (gnubby1)] on usb-:00:14.0-2/input0
[9.784456] Console: switching to colour frame buffer device 240x67
[9.816297] i915 :00:02.0: fb0: i915drmfb frame buffer device
[ 25.087400] thunderbolt :06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)
[ 25.087414] thunderbolt :06:00.0: saving config space at offset 0x4 (reading 0x100406)
[ 25.087419] thunderbolt :06:00.0: saving config space at offset 0x8 (reading 0x886)
[ 25.087424] thunderbolt :06:00.0: saving config space at offset 0xc (reading 0x20)
[ 25.087430] thunderbolt :06:00.0: saving config space at offset 0x10 (reading 0xcc10)
[ 25.087435] thunderbolt :06:00.0: saving config space at offset 0x14 (reading 0xcc14)
[ 25.087440] thunderbolt :06:00.0: saving config space at offset 0x18 (reading 0x0)
[ 25.087445] thunderbolt :06:00.0: saving config space at offset 0x1c (reading 0x0)
[ 25.087450] thunderbolt :06:00.0: saving config space at offset 0x20 (reading 0x0)
[ 25.087455] thunderbolt :06:00.0: saving config space at offset 0x24 (reading 0x0)
[ 25.087460] thunderbolt :06:00.0: saving config space at offset 0x28 (reading 0x0)
[ 25.087466] thunderbolt :06:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[ 25.087471] thunderbolt :06:00.0: saving config space at offset 0x30 (reading 0x0)
[ 25.087476] thunderbolt :06:00.0: saving config space at offset 0x34 (reading 0x80)
[ 25.087481] thunderbolt :06:00.0: saving config space at offset 0x38 (reading 0x0)
[ 25.087486] thunderbolt :06:00.0: saving config space at offset 0x3c (reading 0x1ff)
[ 25.087571] thunderbolt :06:00.0: PME# enabled
[ 25.105353] pcieport :05:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[ 25.105364] pcieport :05:00.0: saving config space at offset 0x4 (reading 0x100407)
[ 25.105370] pcieport :05:00.0: saving config space at offset 0x8 (reading 0x6040006)
[ 25.105375] pcieport :05:00.0: saving config space at offset 0xc (reading 0x10020)
[ 25.105380] pcieport :05:00.0: saving config space at offset 0x10 (reading 0x0)
[ 25.105384] pcieport :05:00.0: saving config space at offset 0x14 (reading 0x0)
[ 25.105389] pcieport :05:00.0: saving config space at offset 0x18 (reading 0x60605)
[ 25.105394] pcieport :05:00.0: saving config space at offset 0x1c (reading 0x1f1)
[ 25.105399] pcieport :05:00.0: saving config space at offset 0x20 (reading 0xcc10cc10)
[ 25.105404] pcieport :05:00.0: saving config space at offset 0x24 (reading 0x1fff1)
[ 25.105409] pcieport :05:00.0: saving config space
Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
On Tue, Sep 08, 2020 at 01:51:19AM +0200, Karol Herbst wrote:
> oh, I somehow missed that "disp ctor failed" message. I think that
> might explain why things are a bit hanging. From the top of my head I
> am not sure if that's something known or something new. But just in
> case I CCed Lyude and Ben. And I think booting with
> nouveau.debug=disp=trace could already show something relevant.

Thanks.
I've added that to my boot for next time I reboot.

I'm moving some folks to Bcc now, and let's remove the lists other than
nouveau on followups (lkml and pci). I'm just putting a warning here
so that it shows up in other list archives and anyone finding this
later knows that they should look in the nouveau archives for further
updates/resolution.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
On Mon, Sep 07, 2020 at 05:29:35PM -0700, Marc MERLIN wrote:
> On Tue, Sep 08, 2020 at 01:51:19AM +0200, Karol Herbst wrote:
> > oh, I somehow missed that "disp ctor failed" message. I think that
> > might explain why things are a bit hanging. From the top of my head I
> > am not sure if that's something known or something new. But just in
> > case I CCed Lyude and Ben. And I think booting with
> > nouveau.debug=disp=trace could already show something relevant.
>
> Thanks.
> I've added that to my boot for next time I reboot.
>
> I'm moving some folks to Bcc now, and let's remove the lists other than
> nouveau on followups (lkml and pci). I'm just putting a warning here
> so that it shows up in other list archives and anyone finding this
> later knows that they should look in the nouveau archives for further
> updates/resolution.

Hi,

I didn't hear back on this issue. Did you need the
nouveau.debug=disp=trace or are you already working on the
"disp ctor failed" issue?

Thanks
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
Howdy,

Well, sadly, the problem is more or less back in 4.11.0. The system doesn't
really crash but it goes into an infinite loop with
[34776.826800] BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 stuck for 33s!
More logs: https://pastebin.com/YqE4riw0

(I upgraded from 4.8 with custom patches you gave me, and went to 4.11.0.)

gargamel:~# cat /proc/sys/vm/dirty_ratio
2
gargamel:~# cat /proc/sys/vm/dirty_background_ratio
1
gargamel:~# free
             total       used       free     shared    buffers     cached
Mem:      24392600   16362660    8029940          0       8884   13739000
-/+ buffers/cache:    2614776   21777824
Swap:     15616764          0   15616764

And yet, when I was doing a btrfs check repair on a busy filesystem, it
triggered the workqueue lockup within 40mn or so.

gargamel:~# grep CONFIG_COMPACTION /boot/config-4.11.0-amd64-preempt-sysrq-20170406
CONFIG_COMPACTION=y

kernel config file: https://pastebin.com/7Tajse6L

To be fair, I didn't try to run btrfs check on 4.8 and now I'm busy trying
to recover a filesystem that apparently got corrupted by a bad SAS driver in
4.8 which caused a lot of I/O errors and corruption.
This is just to say that btrfs on top of dmcrypt on top of bcache may have
been enough layers to hang on btrfs check on 4.8 too, but I can't really go
back to check right now due to the driver corruption issues.

Any idea what I should do next?

Thanks,
Marc

On Tue, Nov 29, 2016 at 03:01:35PM -0800, Marc MERLIN wrote:
> On Tue, Nov 29, 2016 at 09:40:19AM -0800, Marc MERLIN wrote:
> > Thanks for the reply and suggestions.
> >
> > On Tue, Nov 29, 2016 at 09:07:03AM -0800, Linus Torvalds wrote:
> > > On Tue, Nov 29, 2016 at 8:34 AM, Marc MERLIN wrote:
> > > > Now, to be fair, this is not a new problem, it's just varying degrees of
> > > > bad and usually only happens when I do a lot of I/O with btrfs.
> > >
> > > One situation where I've seen something like this happen is
> > >
> > > (a) lots and lots of dirty data queued up
> > > (b) horribly slow storage
> >
> > In my case, it is a 5x 4TB HDD with
> > software raid 5 < bcache < dmcrypt < btrfs
> > bcache is currently half disabled (as in I removed the actual cache) or
> > too many bcache requests pile up, and the kernel dies when too many
> > workqueues have piled up.
> > I'm just kind of worried that since I'm going through 4 subsystems
> > before my data can hit disk, that's a lot of memory allocations and
> > places where data can accumulate and cause bottlenecks if the next
> > subsystem isn't as fast.
> >
> > But this shouldn't be "horribly slow", should it? (it does copy a few
> > terabytes per day, not fast, but not horrible, about 30MB/s or so)
> >
> > > Sadly, our defaults for "how much dirty data do we allow" are somewhat
> > > buggered. The global defaults are in "percent of memory", and are
> > > generally _much_ too high for big-memory machines:
> > >
> > > [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_ratio
> > > 20
> > > [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_background_ratio
> > > 10
> >
> > I can confirm I have the same.
> >
> > > says that it only starts really throttling writes when you hit 20% of
> > > all memory used. You don't say how much memory you have in that
> > > machine, but if it's the same one you talked about earlier, it was
> > > 24GB. So you can have 4GB of dirty data waiting to be flushed out.
> >
> > Correct, 24GB and 4GB.
> >
> > > And we *try* to do this per-device backing-dev congestion thing to
> > > make things work better, but it generally seems to not work very well.
> > > Possibly because of inconsistent write speeds (ie _sometimes_ the SSD
> > > does really well, and we want to open up, and then it shuts down).
> > >
> > > One thing you can try is to just make the global limits much lower. As in
> > >
> > >    echo 2 > /proc/sys/vm/dirty_ratio
> > >    echo 1 > /proc/sys/vm/dirty_background_ratio
> >
> > I will give that a shot, thank you.
>
> And, after 5H of copying, not a single hang, or USB disconnect, or anything.
> Obviously this seems to point to other problems in the code, and I have no
> idea which layer is a culprit here, but reducing the buffers absolutely
> helped a lot.

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
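
[Archive note] The percentages discussed in the message above can be turned into rough byte counts. The sketch below is only an approximation under a stated assumption: it applies the ratios to total RAM, as the quoted explanation does, whereas the kernel is understood to apply them to a smaller "dirtyable" amount of memory. The function name dirty_limits_bytes is made up for this illustration.

```python
GIB = 1024 ** 3


def dirty_limits_bytes(mem_bytes, dirty_ratio, dirty_background_ratio):
    """Return (background, hard) dirty-data thresholds in bytes.

    Assumption: the ratios are applied to mem_bytes directly. The real
    kernel computation uses dirtyable memory, so these are upper bounds.
    """
    background = mem_bytes * dirty_background_ratio // 100
    hard = mem_bytes * dirty_ratio // 100
    return background, hard


if __name__ == "__main__":
    mem = 24 * GIB  # the 24GB machine from the thread
    # (dirty_ratio, dirty_background_ratio): the 20/10 defaults, then 2/1
    for ratio, bg_ratio in ((20, 10), (2, 1)):
        background, hard = dirty_limits_bytes(mem, ratio, bg_ratio)
        print(f"dirty_ratio={ratio} dirty_background_ratio={bg_ratio}: "
              f"background flush from ~{background / GIB:.2f} GiB, "
              f"writers throttled at ~{hard / GIB:.2f} GiB")
```

With the 20/10 defaults this lands in the same ballpark as the "4GB of dirty data" quoted above, and the 2/1 setting cuts both thresholds by a factor of ten.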
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Tue, May 02, 2017 at 09:44:33AM +0200, Michal Hocko wrote:
> On Mon 01-05-17 21:12:35, Marc MERLIN wrote:
> > Howdy,
> >
> > Well, sadly, the problem is more or less back in 4.11.0. The system doesn't
> > really
> > crash but it goes into an infinite loop with
> > [34776.826800] BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0
> > stuck for 33s!
> > More logs: https://pastebin.com/YqE4riw0
>
> I am seeing a lot of traces where tasks is waiting for an IO. I do not
> see any OOM report there. Why do you believe this is an OOM killer
> issue?

Good question. This is a followup of the problem I had in 4.8.8 until I
got a patch to fix the issue. Then, it used to OOM and later, to pile up
I/O tasks like this.
Now it doesn't OOM anymore, but tasks still pile up.
I temporarily fixed the issue by doing this:
gargamel:~# echo 0 > /proc/sys/vm/dirty_ratio
gargamel:~# echo 0 > /proc/sys/vm/dirty_background_ratio

Of course my performance is abysmal now, but I can at least run btrfs
scrub without piling up enough IO to deadlock the system.

On Tue, May 02, 2017 at 07:44:47PM +0900, Tetsuo Handa wrote:
> > Any idea what I should do next?
>
> Maybe you can try collecting list of all in-flight allocations with backtraces
> using kmallocwd patches at
> http://lkml.kernel.org/r/1489578541-81526-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp
> and
> http://lkml.kernel.org/r/201704272019.jeh26057.shfotmljoov...@i-love.sakura.ne.jp
> which also tracks mempool allocations.
> (Well, the
>
> - cond_resched();
> + //cond_resched();
>
> change in the latter patch would not be preferable.)

Thanks. I can give that a shot as soon as my current scrub is done, it may
take another 12 to 24H at this rate.

In the meantime, as explained above, not allowing any dirty VM has worked
around the problem (Linus pointed out to me in the original thread that on a
lightly loaded 24GB system, even 1 or 2% could still be a lot of memory for
requests to pile up in and cause issues in degenerate cases like mine).

Now I'm still curious what changed between 4.8.8 + custom patches and 4.11
to cause this.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
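
[Archive note] One way to watch whether dirty data is piling up while experimenting with the dirty_ratio settings discussed above is to read the Dirty and Writeback counters from /proc/meminfo. The sketch below is a minimal illustration, not something used in the thread: the helper name dirty_and_writeback_kib is invented, and the sample text carries made-up numbers purely to show the "Name: value kB" layout of that file.

```python
def dirty_and_writeback_kib(meminfo_text):
    """Extract the Dirty and Writeback counters (in kB) from meminfo text.

    Fields that are absent from the text are reported as 0.
    """
    values = {"Dirty": 0, "Writeback": 0}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        # Exact name match, so "WritebackTmp" is not mistaken for "Writeback".
        if sep and name in values:
            values[name] = int(rest.split()[0])
    return values


# Illustrative sample only; these numbers are invented, not from the thread.
SAMPLE_MEMINFO = """\
MemTotal:       24392600 kB
Dirty:              1024 kB
Writeback:           256 kB
WritebackTmp:          0 kB
"""

if __name__ == "__main__":
    print(dirty_and_writeback_kib(SAMPLE_MEMINFO))
    # On a Linux machine the live counters can be read the same way:
    #   with open("/proc/meminfo") as f:
    #       print(dirty_and_writeback_kib(f.read()))
```

Sampling these two counters every few seconds during a scrub or a large copy would show whether lowering the ratios actually keeps the backlog small.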