Reproduceable SATA lockup on 3.7.8 with SSD

2013-02-25 Thread Marc MERLIN
Howdy,

I seem to have the same problem (or similar) as Mathieu Desnoyers in
https://lkml.org/lkml/2013/2/22/437

I can reliably get my SSD to drop from the SATA bus given the right workload
on linux.

How can I tell if it's Linux's fault or the drive's fault?

Thanks,
Marc

----- Forwarded message from Marc MERLIN -----

From: Marc MERLIN 
To: linux-...@vger.kernel.org

Hopefully this is the right list. I know that IDE!=SATA, but I can't find
a SATA list.
Please redirect me if needed.

Hardware:
Lenovo T530, 64-bit kernel and userland.
Hardware is shown below: 2 drives, one SSD (OCZ-VERTEX4) and one HD (Hitachi
HTS54101).

The SSD locks up reliably when I run a specific mencoder command that reads MP4
files and rewrites them to another file in the same directory.

The log of what happens is shown below; the drive is eventually taken off the
bus.
Once I reboot, it is back, as if nothing happened.
If I run the same command on the HD, it works, but of course timings will be
different since the HD is slower.

How can I tell if it's the SSD's firmware's fault, or the linux SATA/AHCI code
that is buggy?

Thanks,
Marc

Failure log:
ata1.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x6 frozen
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/00:00:00:38:13/04:00:33:00:00/40 tag 0 ncq 524288 out
 res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/00:08:00:3c:13/04:00:33:00:00/40 tag 1 ncq 524288 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
(snipped)
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/00:e8:00:30:13/04:00:33:00:00/40 tag 29 ncq 524288 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/00:f0:00:34:13/04:00:33:00:00/40 tag 30 ncq 524288 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: link is slow to respond, please be patient (ready=0)
ata1: COMRESET failed (errno=-16)
ata1: hard resetting link
ata1: link is slow to respond, please be patient (ready=0)
ata1: COMRESET failed (errno=-16)
ata1: hard resetting link
ata1: link is slow to respond, please be patient (ready=0)
ata1: COMRESET failed (errno=-16)
ata1: limiting SATA link speed to 3.0 Gbps
ata1: hard resetting link
ata1: COMRESET failed (errno=-16)
ata1: reset failed, giving up
ata1.00: disabled
ata1.00: device reported invalid CHS sector 0
(...)
ata1.00: device reported invalid CHS sector 0
ata1: EH complete
sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda]  
Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] CDB: 
Write(10): 2a 00 33 13 34 00 00 04 00 00
end_request: I/O error, dev sda, sector 856896512
sd 0:0:0:0: [sda] Unhandled error code
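
One quick consistency check on the failure log is to decode the Write(10) CDB by hand. The sketch below assumes the standard SCSI WRITE(10) layout (opcode 0x2a, a 4-byte big-endian LBA in bytes 2-5, a 2-byte big-endian transfer length in bytes 7-8) and the 512-byte logical block size reported in the boot log. `decode_write10` is a hypothetical helper name, not part of any existing tool.

```python
def decode_write10(cdb_hex: str, block_size: int = 512) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # starting logical block
    blocks = int.from_bytes(cdb[7:9], "big")   # transfer length in blocks
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


# CDB copied from the failure log above.
print(decode_write10("2a 00 33 13 34 00 00 04 00 00"))
```

The decoded LBA should agree with the sector number in the end_request line, and the byte count with the size printed next to each failed NCQ write, which suggests the failing request is one of the queued 512 KiB writes rather than something unrelated.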


Boot shows:
ahci :00:1f.2: version 3.0
ahci :00:1f.2: irq 42 for MSI/MSI-X
ahci: SSS flag set, parallel bus scan disabled
ahci :00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x13 impl SATA mode
ahci :00:1f.2: flags: 64bit ncq ilck stag pm led clo pio slum part ems sxs apst
ahci :00:1f.2: setting latency timer to 64
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
scsi4 : ahci
scsi5 : ahci
ata1: SATA max UDMA/133 abar m2048@0xf2538000 port 0xf2538100 irq 42
ata2: SATA max UDMA/133 abar m2048@0xf2538000 port 0xf2538180 irq 42
ata3: DUMMY
ata4: DUMMY
ata5: SATA max UDMA/133 abar m2048@0xf2538000 port 0xf2538300 irq 42
ata6: DUMMY
scsi6 : pata_legacy
ata7: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
ata1.00: ATA-9: OCZ-VERTEX4, 1.5, max UDMA/133
ata1.00: 1000215216 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA  OCZ-VERTEX4  1.5  PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 1000215216 512-byte logical blocks: (512 GB/476 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: [sda] Attached SCSI disk
ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
ata2.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
ata2.00: ATA-8: Hitachi HTS541010A9E680, JA0OA480, max UDMA/133
ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
ata2.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
ata2.00: ACPI cmd f5

Re: Reproduceable SATA lockup on 3.7.8 with SSD

2013-02-26 Thread Marc MERLIN
On Tue, Feb 26, 2013 at 10:29:59AM -0500, Jeff Garzik wrote:
> On 02/25/2013 07:27 PM, Marc MERLIN wrote:
> >Howdy,
> >
> >I seem to have the same problem (or similar) as Mathieu Desnoyers in
> >https://lkml.org/lkml/2013/2/22/437
> >
> >I can reliably get my SSD to drop from the SATA bus given the right 
> >workload
> >on linux.
> >
> >How can I tell if it's linux's fault of the drive's fault?
> 
> Manually force speed to 3.0 Gbps, then 1.5 Gbps, and see what happens.
> 
> Try module/kernel parameter libata.force=1.5Gbps or libata.force=3.0Gbps

Ok, so by reading my log at time of failure, you saw that speed was
flipping between the two? (I couldn't see that, but I'm not good at reading
it).

Also, just to make sure, you're not saying that you want me to change the
speed at runtime, but 
1) boot once with speed forced at 3Gbps and try and reproduce
2) boot a 2nd time with speed forced at 1.5Gbps and try and reproduce

If libata is not a module in my kernel, I can still put libata.force=1.5Gbps
on the lilo/grub command line, correct?

Thanks,
Marc
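
When testing with a forced link speed, it helps to confirm from the kernel log that the setting actually took effect. A minimal sketch, assuming the `ataN: SATA link up X Gbps (SStatus ... SControl ...)` message format seen in the boot log of the original report; `link_speeds` is a hypothetical helper and the sample lines are copied from that boot log.

```python
import re

# Matches e.g. "ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
# with or without a leading dmesg timestamp.
LINK_UP = re.compile(r"(ata\d+): SATA link up (\d+(?:\.\d+)?) Gbps")


def link_speeds(dmesg_text: str) -> dict:
    """Map each ATA port to the last negotiated link speed (Gbps) in the log."""
    speeds = {}
    for line in dmesg_text.splitlines():
        match = LINK_UP.search(line)
        if match:
            speeds[match.group(1)] = float(match.group(2))
    return speeds


sample = """\
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
"""
print(link_speeds(sample))
```

After booting with libata.force=3.0Gbps or libata.force=1.5Gbps, the expectation would be that the speed reported for the affected port drops accordingly; if it still shows 6.0, the parameter probably was not picked up.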

On Mon, Feb 25, 2013 at 08:02:32PM -0500, Mathieu Desnoyers wrote:
> - try diagnostic tools from your drive vendor, if it reports your drive
>   as bad, then it might just be your drive failing,

Good point, drive is brand new (just replaced).

> - try to run a SMART test from smartmontools,

Unfortunately, OCZ does not support SMART.

> - try to reproduce your issue with a simple test-case (trying my test
>   program might help) that clearly fails quickly, and all the time, on
>   your problematic hardware,

My test fails 100% of the time on my hardware too. Very easy to reproduce.
I think it's basically a large volume of reads/writes that causes it.

> - find out if there are known firmware upgrades for your drive provided
>   by your vendor, try them out,

Did that, I have the latest.

> - find out if there are known BIOS upgrades for your machine provided by
>   your vendor, try them out,
> - try test-case on various kernel versions,
> - try test-case on various distributions (just in case),
> - try test-case with power management disabled in your machine's BIOS,
> - try test-case with other SSD drives of the exact same model as
>   yours, so you can see if it's just you own drive failing,
> - try moving your drive to a different machine (same model, different
>   model), and see if the test-case still fails,
> - try with other SSD drives (from other vendors) on your machine,
> - check if you partition mount options enable TRIM or not, try to
>   disable TRIM explicitly (see mount(8), discard/nodiscard option),
> - try using a different filesystem (just in case),
> - try using a different block I/O scheduler,
> - try using your drive vendor's SSD eraser, to reinitialize your entire
>   disk (yes, you will lose you entire data). This might be useful if
>   TRIM handling has changed after a firmware upgrade for instance.
 
Those will take a while :) especially without spare hardware.

I'll try older kernels first when I have a chance though.

Thanks for your reply.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: iwl3945: order 5 allocation during ifconfig up; vm problem?

2012-09-11 Thread Marc MERLIN
On Wed, Sep 12, 2012 at 07:16:28AM +0200, Eric Dumazet wrote:
> On Tue, 2012-09-11 at 16:25 -0700, Andrew Morton wrote:
> 
> > Asking for a 256k allocation is pretty crazy - this is an operating
> > system kernel, not a userspace application.
> > 
> > I'm wondering if this is due to a recent change, but I'm having trouble
> > working out where the allocation call site is.
> > --
> 
> (Adding Marc Merlin to CC, since he reported same problem)
> 
> Thats the firmware loading in iwlwifi driver. Not sure if it can use SG.
> 
> drivers/net/wireless/iwlwifi/iwl-drv.c
> 
> iwl_alloc_ucode() -> iwl_alloc_fw_desc() -> dma_alloc_coherent()
> 
> It seems some sections of /lib/firmware/iwlwifi*.ucode files are above
> 128 Kbytes, so dma_alloc_coherent() try order-5 allocations

Thanks for looping me in, yes, this looks very familiar to me :)
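
For reference, the allocation-order arithmetic behind the quoted discussion can be sketched as follows. This assumes 4 KiB pages (typical for x86-64) and rounds a size up to the smallest power-of-two number of pages, in the spirit of the kernel's get_order(); `alloc_order` is a hypothetical name used only for illustration.

```python
def alloc_order(size_bytes: int, page_size: int = 4096) -> int:
    """Smallest order such that page_size << order >= size_bytes."""
    if size_bytes <= 0:
        raise ValueError("size must be positive")
    pages = -(-size_bytes // page_size)   # ceiling division
    return (pages - 1).bit_length()       # round pages up to a power of two


for kib in (64, 128, 129, 256):
    order = alloc_order(kib * 1024)
    block_kib = (4096 << order) // 1024
    print(f"{kib} KiB request -> order {order} ({block_kib} KiB contiguous block)")
```

Under these assumptions an order-5 block is 128 KiB and an order-6 block is 256 KiB, so which one a firmware section needs depends on its exact size, which is not given in this thread.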

In the other thread, Johannes Berg gave me this patch which is supposed to
help: http://p.sipsolutions.net/11ea33b376a5bac5.txt

Unfortunately due to very long work days, I haven't had the time to try it
out yet, but I will soon.

Would that help in this case too?

And to answer David Rientjes, I also have compaction on:
gandalfthegreat:~# zgrep CONFIG_COMPACTION /proc/config.gz 
CONFIG_COMPACTION=y

Full config:
http://marc.merlins.org/tmp/config-3.5.2-amd64-preempt-noide-20120731

If that helps for comparison, my thread is here:
http://www.spinics.net/lists/linux-wireless/msg96438.html

Thanks,
Marc


Fairly reproduceable crash in 3.7.1 Null pointer rb_erase+0xc4/0x292

2013-01-02 Thread Marc MERLIN
Google only shows hits pointing to an ext4 patch that didn't go in 3.7
proper.

http://marc.merlins.org/tmp/crash.jpg

My call trace doesn't look complete, but shows "fatal exception in interrupt"
and:
timerqueue_del
__remove_hrtimer
__run_hrtimer
hrtimer_interrupt
smp_apic_timer_interrupt

paravirt_read_tpe
intel_idle
intel_idle
cpuidle_enter

I have had fairly frequent crashes when plugging power back into my running
laptop, but the display just freezes and I can't get a dump.

For the crash here, I did: suspend to RAM, plug power back in, wake up.
Laptop crashed about 3 seconds after wakeup.

I'm on vacation with no hardware to get a proper crash dump or even
serial console, but I have a bad screenshot.

Boy do I wish this could be saved in some kind of NVRAM instead like on
android.


Marc


Re: Fairly reproduceable crash in 3.7.1 Null pointer rb_erase+0xc4/0x292

2013-01-02 Thread Marc MERLIN
On Wed, Jan 02, 2013 at 02:27:57PM -0800, Marc MERLIN wrote:
> Google only shows hits pointing to an ext4 patch that didn't go in 3.7
> proper.
> 
> http://marc.merlins.org/tmp/crash.jpg

Grumble, I kind of forgot to add the link to my .config, sorry about
that:
http://marc.merlins.org/tmp/.config-3.7.1-amd64-preempt-20121226

While my crash picture is lame (sorry), I'm happy to provide what else I
can before I go home and can provide a proper serial console crash dump.

Marc


Re: Fairly reproduceable crash in 3.7.1 Null pointer rb_erase+0xc4/0x292

2013-01-03 Thread Marc MERLIN
On Thu, Jan 03, 2013 at 08:12:18AM +0100, Romain Francoise wrote:
> Marc MERLIN  writes:
> 
> > I had pretty repeated crashes when plugging power back into my running
> > laptop, but the display just freezes and I can't get a dump.
> 
> > For the crash here, I did: suspend to RAM, plug power back in, wake up.
> > Laptop crashed about 3 seconds after wakeup.
> 
> Sounds like https://bugzilla.kernel.org/show_bug.cgi?id=51661 which is
> fixed by 3935e89505a1c3ab3f3b0c7ef0eae54124f48905 ("watchdog: Fix
> disable/enable regression"), expect that in 3.7.2...

That looks like a good match, thank you. 
Hopefully this quick thread will help google steer folks who do get a
crash to the right page and fix.

Cheers,
Marc


Re: Supporting SYSRQ on broken laptops like the thinkpad T530

2013-03-22 Thread Marc MERLIN
On Wed, Jan 09, 2013 at 03:36:44AM +0100, Roland Eggner wrote:
> On 2013-01-08 Tuesday at 15:09 -0800 Marc MERLIN wrote:
> > In its infinite wisdom, lenovo has removed the sysrq key on the latest
> > thinkpads, and replaced it with a stupid ALT+FN+S key combination, which
> > doesn't really work for doing sysrq from the console (nor do I know how the
> > genius who did that intended for SYSRQ-S to work).
> > http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430-s-T530-Where-are-the-shortcut-function-keys-break-Pause-etc/ta-p/781749
> > 
> > I realize that one solution is to throw my laptop window at a suitable high
> > floorand replace it with one from a vendor that doesn't randomly remove keys
> > from the keyboard.
> > That said, I was wondering if there were other solutions, especially
> > considering that thinkpads used to be the better linux laptops.
> 
> My Dell “Precision M4500” notebook suffers similar (same?) problem.  So far 
> I could not find a solution better than this:  e.g. Alt-Fn-SysRq-s
> 
> press and hold Alt
> press and hold Fn
> press and leave F10|SysRq
> leave Fn
> press and leave s
> leave Alt

Just for the sake of the archives: it turns out that on the Lenovo T430 and
T530 you should ignore the Lenovo documentation I quoted above. You can
indeed use the PrtSc key between Right Alt and Right Ctrl; that key works
just fine for SysRq.

I have no idea why Lenovo felt they had to document some complicated
alternate software SysRq with Fn+S.

Anyway, hope this helps someone.

Marc


Re: Reproduceable SATA lockup on 3.7.8 with SSD

2013-02-28 Thread Marc MERLIN
On Tue, Feb 26, 2013 at 08:50:04AM -0800, Marc MERLIN wrote:
> On Tue, Feb 26, 2013 at 10:29:59AM -0500, Jeff Garzik wrote:
> > On 02/25/2013 07:27 PM, Marc MERLIN wrote:
> > >Howdy,
> > >
> > >I seem to have the same problem (or similar) as Mathieu Desnoyers in
> > >https://lkml.org/lkml/2013/2/22/437
> > >
> > >I can reliably get my SSD to drop from the SATA bus given the right 
> > >workload
> > >on linux.
> > >
> > >How can I tell if it's linux's fault of the drive's fault?
> > 
> > Manually force speed to 3.0 Gbps, then 1.5 Gbps, and see what happens.
> > 
> > Try module/kernel parameter libata.force=1.5Gbps or libata.force=3.0Gbps
> 
> Ok, so by reading my log at time of failure, you saw that speed was
> flipping between the two? (I couldn't see that, but I'm not good at reading
> it).
> 
> Also, just to make sure, you're not saying that you want me to change the
> speed at runtime, but 
> 1) boot once with speed forced at 3Gbps and try and reproduce
> 2) boot a 2nd time with speed forced at 1.5Gbps and try and reproduce
> 
> If libata is not a module in my kernel, I can still put 
> libata.force=1.5Gbps 
> on the lilo/grub command line, correct?

Jeff, could you clear up what you'd like me to try out?

Thanks,
Marc


Re: Supporting SYSRQ on broken laptops like the thinkpad T530

2013-03-30 Thread Marc MERLIN
On Sat, Mar 30, 2013 at 06:56:28PM +0100, Pavel Machek wrote:
> Sometimes it works, sometimes it does not. Don't blame lenovo for
> that.
> 
> Maybe it should be modified to take sysrq and _then_ key?
> 
> Or maybe we should use something like lshift+rshift+lalt+ralt+key?

It can't hurt to add alternatives like the one you suggested. They don't
have to be convenient, although the one you suggest takes 5 fingers at
the same time :)

Is there anything that uses
shift+ctrl+alt + key 
in userspace?

I checked enlightenment 17, they have crazy key bindings, but nothing
that uses all 3 modifier keys at the same time.
If that's not safe, feel free to add one or 2 more just to be safe.

Thanks for suggesting this.

Marc


Supporting SYSRQ on broken laptops like the thinkpad T530

2013-01-08 Thread Marc MERLIN
In its infinite wisdom, lenovo has removed the sysrq key on the latest
thinkpads, and replaced it with a stupid ALT+FN+S key combination, which
doesn't really work for doing sysrq from the console (nor do I know how the
genius who did that intended for SYSRQ-S to work).
http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430-s-T530-Where-are-the-shortcut-function-keys-break-Pause-etc/ta-p/781749

I realize that one solution is to throw my laptop out of a window on a suitably
high floor and replace it with one from a vendor that doesn't randomly remove
keys from the keyboard.
That said, I was wondering if there were other solutions, especially
considering that thinkpads used to be the better linux laptops.

Thanks,
Marc


Re: Supporting SYSRQ on broken laptops like the thinkpad T530

2013-01-08 Thread Marc MERLIN
On Wed, Jan 09, 2013 at 03:36:44AM +0100, Roland Eggner wrote:
> On 2013-01-08 Tuesday at 15:09 -0800 Marc MERLIN wrote:
> > In its infinite wisdom, lenovo has removed the sysrq key on the latest
> > thinkpads, and replaced it with a stupid ALT+FN+S key combination, which
> > doesn't really work for doing sysrq from the console (nor do I know how the
> > genius who did that intended for SYSRQ-S to work).
> > http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430-s-T530-Where-are-the-shortcut-function-keys-break-Pause-etc/ta-p/781749
> > 
> > I realize that one solution is to throw my laptop window at a suitable high
> > floorand replace it with one from a vendor that doesn't randomly remove keys
> > from the keyboard.
> > That said, I was wondering if there were other solutions, especially
> > considering that thinkpads used to be the better linux laptops.
> 
> My Dell “Precision M4500” notebook suffers similar (same?) problem.  So far 
> I could not find a solution better than this:  e.g. Alt-Fn-SysRq-s
> 
> press and hold Alt
> press and hold Fn
> press and leave F10|SysRq
> leave Fn
> press and leave s
> leave Alt

Holy crap. That works for me too. If only Lenovo could have been bothered to
document it properly. It's still a pity to have to remember and type the exact
hold and release key sequence, but it's better than nothing.

Thanks much.

> Several months ago a LKML user claimed, his cat had managed to press 
> Alt-Fn-SysRq-c on his Dell Latitude notebook with similar keyboard, and 
> provided 
> photos showing the kernel crash message ;)

Yeah, but my cat is not nearly smart enough for that :)

Thanks for your help again,
Marc


Re: [patch] 2.4 version of my duplicate IP and MAC detection patch

2000-09-27 Thread Marc MERLIN

On Sat, Sep 23, 2000 at 02:02:24PM +, Julian Anastasov wrote:
> > I didn't receive  any negative comments, except for Alexey  who believed the
> > check should be done in user space.
> 
>   Now you receive another negative comment, for the 2.2 version :)
 
Thanks for the feedback, it is appreciated.
 
>   Currently, in Linux 2.2 there is a device flag "hidden" which
> is based on this statement: many host can configure same IP address
> but it is assumed that only one is advertised. Your patch now will

Yes, I know LVS and arp_invisible, later renamed arp_hidden

> print messages for all these hidden addresses. They are not advertised
> and there is no problem caused from duplication.

I thought about that,  but isn't the shared IP just an IP  alias and not the
primary IP? As far as I know, the machines which share the IP have a primary
IP and put that one in their ARP packets, so my patch should not complain.

That said, adding a flag that lets you disable the duplicate IP detection on
an interface basis wouldn't be a bad idea, I'll look into this.

> - sip=127.0.0.0/8, this address is shared but we "assume" it is not
> advertised from the neighbours
 
Are you saying that  some machines would ARP with a  source IP of localhost?
That'd be  pretty broken, wouldn't  it? Or are you talking about a kind of DoS
that would trigger warnings on all the machines?
(the dupe check could ignore that)
 
> - you work with ifa_address and not with ifa_local and ifa_mask.

I'll look into this too.

Thanks for your feedback.
Marc
-- 
Microsoft is to software what McDonalds is to gourmet cooking
 
Home page: http://marc.merlins.org/ (friendly to non IE browsers)
Finger [EMAIL PROTECTED] for PGP key and other contact information
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



[patch] 2.4 version of my duplicate IP and MAC detection patch

2000-09-21 Thread Marc MERLIN

I updated my duplicate IP detection patch to work with 2.4.

I announced  the 2.2 version  here last  year, and several  people expressed
interest in it, but it never  made it into the kernel unfortunately. I asked
a few times and  eventually gave up as I didn't want  to appear overly pushy
and it got included in the kernels that I use (kernels from VA).

I didn't receive  any negative comments, except for Alexey  who believed the
check should be done in user space.

The patches (2.2/2.4) and discussion can be found here:
http://marc.merlins.org/linux/arppatch/

Here are two excerpts:
----
What does the patch do?
It looks at all the broadcast ARP  requests and checks that the source IP of
the request is different from the  interface's IP. This will catch a machine
that is using your IP and is trying to talk to a machine on your net for the
first time or the first time in a while.
The big plus of this approach is that it's passive

It will get your system to output this:
Uh Oh, MAC address 00:A0:C9:EE:9C:8A claims to have our IP addresses
(192.168.205.9) (duplicate IP conflict likely)
or this:
Uh Oh, I received an ARP packet claiming to be from our MAC address
00:80:C8:47:37:72, but with an IP I don't own (192.168.205.1). Someone has
apparently stolen our MAC address
----
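
The check described in this excerpt can be sketched outside the kernel as a small decision function. This is only an illustration under stated assumptions, not the patch itself: `classify_arp_sender` and its return labels are hypothetical, addresses are plain strings, and the "own-packet" case reflects the situation mentioned in the patch comments where some switches hand you your own packets back.

```python
def classify_arp_sender(sender_mac: str, sender_ip: str,
                        our_mac: str, our_ips: set) -> str:
    """Classify the sender fields of a broadcast ARP request seen on our interface."""
    mac_is_ours = sender_mac.lower() == our_mac.lower()
    ip_is_ours = sender_ip in our_ips
    if ip_is_ours and not mac_is_ours:
        return "duplicate-ip"   # another MAC claims one of our IP addresses
    if mac_is_ours and not ip_is_ours:
        return "stolen-mac"     # our MAC shows up with an IP we do not own
    if mac_is_ours and ip_is_ours:
        return "own-packet"     # most likely our own request echoed back
    return "ok"


# Addresses taken from the two sample messages; our own MAC in the first
# call is made up for the example.
print(classify_arp_sender("00:A0:C9:EE:9C:8A", "192.168.205.9",
                          "00:11:22:33:44:55", {"192.168.205.9"}))
print(classify_arp_sender("00:80:C8:47:37:72", "192.168.205.1",
                          "00:80:c8:47:37:72", {"192.168.205.9"}))
```

Using a set for the local addresses keeps the membership test cheap even with many aliases, which is the concern raised about a linear search in the ARP receive path.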

----
But then why not write the whole thing in user space?

Well, the line has  to be drawn somewhere... The whole IP  stack could be in
user space  if we  wanted... In this  case, the actual  added code  (I'm not
talking about the existing code which I  turned into a function) is about 20
lines, it's trivial and it uses much  less resources on a slow machine (386)
than a  user space solution which  forces a context switches,  system calls,
and memory for that user process.
Also, not  that others are always  right, but do  you know any OS  that does
duplicate IP checking by inspecting ARP requests in user space?
----

I'm attaching the 2.4  version which I'd really like to  see included in the
main tree. While I don't see a good reason to disable this, if what it takes
is a  config option, experimental  or not, disabled  by default or  not (I'd
rather  have it  non experimental  since  it's a  year old,  and enabled  by
default, but I'll settle), I'll do what it takes.

I'm attaching the 2.4  version and you can find the 2.2  version, as well as
more info on my page:
http://marc.merlins.org/linux/arppatch/

Thanks,
Marc

diff -urN linux-2.2.4-test5/net/ipv4/arp.c linux-2.2.4-test5-detectarpdupe/net/ipv4/arp.c
--- linux-2.2.4-test5/net/ipv4/arp.c  Fri Jul 21 21:54:29 2000
+++ linux-2.2.4-test5-detectarpdupe/net/ipv4/arp.c  Sun Sep 17 19:19:49 2000
@@ -65,6 +65,8 @@
  * clean up the APFDDI & gen. FDDI bits.
  * Alexey Kuznetsov:   new arp state machine;
  *     now it is in net/core/neighbour.c.
+ * Marc Merlin :   Added duplicate IP and MAC address
+ * detection (2000/09/17)
  */
 
 /* RFC1122 Status:
@@ -121,6 +123,8 @@
 
 #include 
 #include 
+
+#undef IDONTRECEIVEMYOWNPACKETSBACK
 
 #if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
 static char *ax2asc2(ax25_address *a, char *buf);
@@ -135,6 +139,7 @@
 static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb);
 static void arp_error_report(struct neighbour *neigh, struct sk_buff *skb);
 static void parp_redo(struct sk_buff *skb);
+static char *mac2asc(unsigned char *sha, unsigned char addr_len);
 
 static struct neigh_ops arp_generic_ops =
 {
@@ -716,6 +721,55 @@
goto out;
}
 
+   if (!memcmp(sha,dev->dev_addr,dev->addr_len))
+   {
+   char ourip=0;
+   struct in_device *idev=dev->ip_ptr;
+   struct in_ifaddr *adlist=idev->ifa_list;
+
+   while (adlist != NULL)
+   {
+   if (adlist->ifa_address == sip) {
+
+   ourip=1;
+   break;
+   }
+   adlist=adlist->ifa_next;
+   }
+
+   if (net_ratelimit()) {
+   if (ourip) {
+#ifdef IDONTRECEIVEMYOWNPACKETSBACK
+/* This is an attempt at detecting that someone stole your MAC and your IP, but
+ * in some network configurations and with some switches, you will get your
+ * own packets back, so this warning would be triggered by error for too m

Re: [patch] 2.4 version of my duplicate IP and MAC detection patch

2000-09-21 Thread Marc MERLIN

On Fri, Sep 22, 2000 at 01:31:06AM +0200, Andi Kleen wrote:
> You added a linear IP search to fast path ARP processing. The people running 
> thousands of IP aliases will surely love you. You could at least use the
> ip_route_input output instead that arp_rcv computes anyways and check
> for RTN_LOCAL. 
 
While you actually  don't get broadcast ARP requests very often (more than a
few per minute is rare), even on a busy net, making it faster doesn't hurt.
I'll write a new patch, thanks.
 
> BTW, the idea  of doing it in user  space is not to have  a daemon running
> but just to do DAD once when you configure the ip address, like most other
> OSes do  [as easily done  with arping and a  small script, see  ipcfg from
> iproute2].
 
I know  about this. It  only helps you  not to steal  someone else's  IP, it
doesn't help when someone else just stole your IP.
Take a Solaris box, an IRIX one, or  windows (these are the only ones I have
access to for testing) and they'll all  complain and notice if I steal their
IP.
I find it  useful that a server  syslogs the fact that its  IP was stolen. I
can use that info  to bring up a temporary DHCP IP, and  send a message to a
central  network  monitor which  will  trace  the  culprit MAC  address  and
optionally turn off the switched port it came from.
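
A monitoring script that reacts to the syslog entry could start from something like the sketch below. It assumes the warning arrives as a single line with the wording quoted in the patch announcement (the real kernel message may differ in detail), and `parse_dup_ip_warning` is a hypothetical helper name.

```python
import re

# Wording taken from the sample message in the patch announcement.
DUP_IP = re.compile(
    r"MAC address ((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}) "
    r"claims to have our IP addresses \((\d+\.\d+\.\d+\.\d+)\)"
)


def parse_dup_ip_warning(line: str):
    """Return (offending_mac, contested_ip) or None if the line is not a warning."""
    match = DUP_IP.search(line)
    return (match.group(1), match.group(2)) if match else None


line = ("Uh Oh, MAC address 00:A0:C9:EE:9C:8A claims to have our IP addresses "
        "(192.168.205.9) (duplicate IP conflict likely)")
print(parse_dup_ip_warning(line))
```

The extracted MAC address is what a central monitor would need in order to trace the offending switch port.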
 
Marc



Re: [patch] 2.4 version of my duplicate IP and MAC detection patch

2000-09-22 Thread Marc MERLIN

On Fri, Sep 22, 2000 at 01:25:54AM -0700, David S. Miller wrote:
> You've made the foo-address to ascii string routines non-reentrant.
> The hbuffer[] was on the local stack for a very good reason.
 
You are right, fixed.
http://marc.merlins.org/linux/arppatch/arp-patch-2.4_v1.3
(that part of the patch is a year old, and I honestly don't remember why
hbuffer became a static, as it is obviously wrong)
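
For readers unfamiliar with the problem David points out, the hazard of formatting into one shared static buffer can be mimicked in a few lines. This is a Python stand-in for the C situation, not the kernel code: the module-level bytearray plays the role of a `static char hbuffer[]`, and both function names are hypothetical.

```python
_static_buf = bytearray(17)  # stands in for a C `static char hbuffer[]`


def mac2asc_static(mac: bytes) -> bytearray:
    """Format into the one shared buffer; every caller gets the same object."""
    _static_buf[:] = ":".join(f"{b:02X}" for b in mac).encode("ascii")
    return _static_buf


def mac2asc_local(mac: bytes) -> str:
    """Format into a fresh string per call, like a buffer on the caller's stack."""
    return ":".join(f"{b:02X}" for b in mac)


first = mac2asc_static(bytes.fromhex("00A0C9EE9C8A"))
second = mac2asc_static(bytes.fromhex("0080C8473772"))
# The second call overwrote what `first` refers to: both names alias one buffer.
print(first is second, first.decode())

# With per-call storage the two results stay independent.
print(mac2asc_local(bytes.fromhex("00A0C9EE9C8A")),
      mac2asc_local(bytes.fromhex("0080C8473772")))
```

In the kernel the same clobbering can happen when two contexts format addresses concurrently, which is why the buffer belongs on the caller's stack.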
 
> Why can't you write a userspace daemon that listens on one of the
> lower level raw'ish sockets for arp packets and do the same checks
> there.

You can.

> I don't like this change at all, I think it can be done completely
> in user space.  The existence of a working tcpdump is proof of this
> fact. :-)  Whether it can be done efficiently is another issue.

That was my original point.
http://marc.merlins.org/linux/arppatch/

But then why not write the whole thing in user space?
Well, the line has to be drawn somewhere... The whole IP stack could be in
user space if we wanted... In this case, the actual added code (I'm not
talking about the existing code which I turned into a function) is about 20
lines, it's trivial, and it uses far fewer resources on a slow machine (386)
than a user space solution, which forces context switches, system calls,
and memory for that user process.
Also, not that others are always right, but do you know of any OS that does
duplicate IP checking by inspecting ARP requests in user space?


> Making it possible to do this efficiently would be the kernel change
> which might result from your work on a userspace variant, so have at
> it.

You're saying that you'd rather have a hook to do this from user space?
I guess I didn't see the point since the kernel change is so small.

> Even failing that, I would prefer something like a special "arp
> netlink socket" which would allow a privileged userspace program
> to hear all arp traffic the machine can hear.

I guess I can see why you'd want that, but it will be more code and overhead
than the current solution (by quite a bit actually, and Andi seemed
concerned about not impacting the fast path, which this will, and in a
significant way).

Again, everyone  else isn't always right,  but all the other  systems I know
check for dupe IP by looking at ARP  packets, and do it in the kernel, since
it's a simple check.

On Fri, Sep 22, 2000 at 01:19:30PM +0200, Andi Kleen wrote:
> On Fri, Sep 22, 2000 at 01:25:54AM -0700, David S. Miller wrote:
> > I don't like this change at all, I think it can be done completely
> > in user space.  The existence of a working tcpdump is proof of this
> > fact. :-)  Whether it can be done efficiently is another issue.
> 
> I agree. I think DAD once during IP configuration should be enough.
 
Come on, Andi, it's not. You do DAD, you get your IP, I plug my laptop, use
your IP, you don't even know it. My patch lets you know.
The reason I wrote it is that I've seen this happen too many times already.

 
On Fri, Sep 22, 2000 at 04:10:53AM -0700, David S. Miller wrote:
>That already exists in form of a packet socket bound to the ARP
>IEEE protocol. Marc is probably right though that running an arp
>daemon all the time just for that would be a bit of overkill
>though.
> 
> Then it stands to reason that it's _really_ overkill to have this kind
> of stuff in the kernel too :-)

It's not the same. It's overkill to do this in userspace because you need to
be looking at the packets a second time, with context switches and all,
while in the kernel, you already have the ARP packet in hand, you just take
a quick extra peek at it.

But going back to the original  point, passively checking the from addresses
of ARP packets you are already receiving is useful and induces just about no
extra load.
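
A rough sketch of the passive check described above, written as standalone
user-space C and not taken from the patch. The struct arp_sender type, the
arp_verdict enum and check_arp_sender() are hypothetical names; the logic
assumes that "duplicate IP" means an ARP packet whose sender IP is ours but
whose sender MAC is not, and "duplicate MAC" the reverse.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Sender fields of a received ARP packet (hypothetical, simplified layout). */
struct arp_sender {
    uint8_t  mac[ETH_ALEN];
    uint32_t ip;
};

enum arp_verdict { ARP_OK, ARP_DUP_IP, ARP_DUP_MAC };

/* Compare the sender fields of an ARP packet we already received against
 * our own address pair. No extra packets are sent: this only inspects
 * data that is already in hand. */
enum arp_verdict check_arp_sender(const struct arp_sender *snd,
                                  const uint8_t our_mac[ETH_ALEN],
                                  uint32_t our_ip)
{
    int same_mac, same_ip;

    assert(snd != NULL && our_mac != NULL);

    same_mac = memcmp(snd->mac, our_mac, ETH_ALEN) == 0;
    same_ip = snd->ip == our_ip;

    if (same_ip && !same_mac)
        return ARP_DUP_IP;   /* someone else claims our IP */
    if (same_mac && !same_ip)
        return ARP_DUP_MAC;  /* someone else uses our MAC */
    return ARP_OK;
}
```

A caller would log the offending MAC address on ARP_DUP_IP, which is the
information a network monitor needs to trace the culprit.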

I can fix the  patch, but if you're really against the  concept, you can let
me know and I'll leave you alone :-)
Regardless though, linux is one of the few well known TCP/IP capable OSes
that doesn't say a word when its IP is being used by someone else, and this
has to be fixed one way or another. I simply believe my way is the simplest
and the lightest, but you're more than welcome to write your own and prove me
wrong :-)

Marc
-- 
Microsoft is to software what McDonalds is to gourmet cooking
 
Home page: http://marc.merlins.org/ (friendly to non IE browsers)
Finger [EMAIL PROTECTED] for PGP key and other contact information



NVME regression in all kernels after 4.4.x for NVME in M2 slot for laptop?

2016-08-05 Thread Marc MERLIN
I've been stuck on 4.4.x for a while (currently 4.4.5) because any
subsequent kernel would fail to suspend or resume (S3 sleep) on my
Thinkpad P70.

Due to lack of time, I only got around to doing a git bisect now
(sorry), and did it between 4.4.0 and 4.5.0.
It's my first bisect, but I hope I did it right, apart from the fact that
my kernel wasn't exactly the same each time, because my .config file
changed depending on which kernel I ended up on.

However, the commit found by bisect looks like a plausible culprit.
I use an NVME 512GB SSD in my laptop, and I guess very few people use those
which could be why I'm the first/only person to report this.

Sadly because NVME changed a lot between 4.4 and 4.5 and I'm not a
kernel hacker, I can't just reverse apply the patch to 4.5 and see if it
works because I'd have to unroll a bunch of other changes too, and
that's a bit beyond my expertise and time at hand right now.

Would this patch make sense as being the reason why I can't S3 sleep
anymore and would you have a test patch against 4.5, 4.6, or 4.7 I can
try to see if it fixes the problem?
The symptom is that my red LED (the dot of the i in thinkpad on the back
cover) flashes in weird ways when I shut the lid, not always in
the same pattern, but never the normal gentle on/off pulsing that
indicates proper S3 sleep.
The caps lock key LED also flashes rapidly when I open the lid, and the
laptop is stone dead at this point.

Boot logs on 4.4.5 kernel where sleep works fine:
[1.245549] ahci :00:17.0: version 3.0
[1.245733] ahci :00:17.0: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0xc impl SATA mode
[1.245771] ahci :00:17.0: flags: 64bit ncq sntf pm led clo only pio slum part ems deso sadm sds apst
[1.251140] scsi host0: ahci
[1.251587] scsi host1: ahci
[1.251972] scsi host2: ahci
[1.252360] scsi host3: ahci
[1.252437] ata1: DUMMY
[1.252449] ata2: DUMMY
[1.252462] ata3: SATA max UDMA/133 abar m2048@0xd584c000 port 0xd584c200 irq 122
[1.252499] ata4: SATA max UDMA/133 abar m2048@0xd584c000 port 0xd584c280 irq 122
[1.253374] scsi host4: pata_legacy
[1.253439] ata5: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14
[1.355385]  nvme0n1: p1 p2 p3 p4 p5 p6 p7 p8
[1.570804] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[1.570877] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[1.573097] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[1.573101] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[1.573690] ata3.00: supports DRM functions and may not be fully accessible
[1.574399] ata3.00: disabling queued TRIM support
[1.574402] ata3.00: ATA-9: Samsung SSD 850 EVO 2TB, EMT01B6Q, max UDMA/133
[1.574435] ata3.00: 3907029168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[1.575954] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[1.575958] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[1.576550] ata3.00: supports DRM functions and may not be fully accessible
[1.577209] ata3.00: disabling queued TRIM support
[1.578007] ata3.00: configured for UDMA/133
[1.578037] ata4.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[1.578040] ata4.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out


Patch found by bisect, attached

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
saruman:/usr/src/linux# git bisect good
25646264e15af96c5c630fc742708b1eb3339222 is the first bad commit
commit 25646264e15af96c5c630fc742708b1eb3339222
Author: Keith Busch 
Date:   Mon Jan 4 09:10:57 2016 -0700

NVMe: Remove queue freezing on resets

NVMe submits all commands through the block layer now. This means we
can let requests queue at the blk-mq hardware context since there is no
path that bypasses this anymore so we don't need to freeze the queues
anymore. The driver can simply stop the h/w queues from running during
a reset instead.

This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen
with requeued requests.

Signed-off-by: Keith Busch 
Signed-off-by: Jens Axboe 


diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e31a256..8da4a8a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1372,12 +1372,14 @@ out:
 	return ret;
 }
 
-void nvme_stop_queues(struct nvme_ctrl *ctrl)
+void nvme_freeze_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		blk_mq_freeze_queue_start(ns->queue);
+
 		spin_lock_irq(ns->queue->queue_lock);
 		queue_flag_set

Re: [PATCH 4.14 095/140] bcache: fix crashes in duplicate cache device register

2018-03-13 Thread Marc MERLIN
On Tue, Mar 13, 2018 at 04:24:58PM +0100, Greg Kroah-Hartman wrote:
> 4.14-stable review patch.  If anyone has any objections, please let me know.
 
Just in case someone is considering whether it's important to merge, the
bug did crash my kernel of course, but I'm virtually certain it was also
responsible for corrupting my existing bcache device enough that I had
to restore it from backup.

Thanks again to Tang for fixing it.


> --
> 
> From: Tang Junhui 
> 
> commit cc40daf91bdddbba72a4a8cd0860640e06668309 upstream.
> 
> Kernel crashed when register a duplicate cache device, the call trace is
> bellow:
> [  417.643790] CPU: 1 PID: 16886 Comm: bcache-register Tainted: G
>W  OE4.15.5-amd64-preempt-sysrq-20171018 #2
> [  417.643861] Hardware name: LENOVO 20ERCTO1WW/20ERCTO1WW, BIOS
> N1DET41W (1.15 ) 12/31/2015
> [  417.643870] RIP: 0010:bdevname+0x13/0x1e
> [  417.643876] RSP: 0018:a3aa9138fd38 EFLAGS: 00010282
> [  417.643884] RAX:  RBX: 8c8f2f2f8000 RCX: d6701f8
> c7edf
> [  417.643890] RDX: a3aa9138fd88 RSI: a3aa9138fd88 RDI: 000
> 0
> [  417.643895] RBP: a3aa9138fde0 R08: a3aa9138fae8 R09: 000
> 1850e
> [  417.643901] R10: 8c8eed34b271 R11: 8c8eed34b250 R12: 000
> 0
> [  417.643906] R13: d6701f78f940 R14: 8c8f38f8 R15: 8c8ea7d
> 9
> [  417.643913] FS:  7fde7e66f500() GS:8c8f6144() knlGS:
> 
> [  417.643919] CS:  0010 DS:  ES:  CR0: 80050033
> [  417.643925] CR2: 0314 CR3: 0007e6fa0001 CR4: 003
> 606e0
> [  417.643931] DR0:  DR1:  DR2: 000
> 0
> [  417.643938] DR3:  DR6: fffe0ff0 DR7: 000
> 00400
> [  417.643946] Call Trace:
> [  417.643978]  register_bcache+0x1117/0x1270 [bcache]
> [  417.643994]  ? slab_pre_alloc_hook+0x15/0x3c
> [  417.644001]  ? slab_post_alloc_hook.isra.44+0xa/0x1a
> [  417.644013]  ? kernfs_fop_write+0xf6/0x138
> [  417.644020]  kernfs_fop_write+0xf6/0x138
> [  417.644031]  __vfs_write+0x31/0xcc
> [  417.644043]  ? current_kernel_time64+0x10/0x36
> [  417.644115]  ? __audit_syscall_entry+0xbf/0xe3
> [  417.644124]  vfs_write+0xa5/0xe2
> [  417.644133]  SyS_write+0x5c/0x9f
> [  417.644144]  do_syscall_64+0x72/0x81
> [  417.644161]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [  417.644169] RIP: 0033:0x7fde7e1c1974
> [  417.644175] RSP: 002b:7fff13009a38 EFLAGS: 0246 ORIG_RAX: 000
> 1
> [  417.644183] RAX: ffda RBX: 01658280 RCX: 7fde7e1c
> 1974
> [  417.644188] RDX: 000a RSI: 01658280 RDI: 
> 0001
> [  417.644193] RBP: 000a R08: 0003 R09: 
> 0077
> [  417.644198] R10: 089e R11: 0246 R12: 
> 0001
> [  417.644203] R13: 000a R14: 7fff R15: 
> 
> [  417.644213] Code: c7 c2 83 6f ee 98 be 20 00 00 00 48 89 df e8 6c 27 3b 0
> 0 48 89 d8 5b c3 0f 1f 44 00 00 48 8b 47 70 48 89 f2 48 8b bf 80 00 00 00 <8
> b> b0 14 03 00 00 e9 73 ff ff ff 0f 1f 44 00 00 48 8b 47 40 39
> [  417.644302] RIP: bdevname+0x13/0x1e RSP: a3aa9138fd38
> [  417.644306] CR2: 0314
> 
> When registering duplicate cache device in register_cache(), after failure
> on calling register_cache_set(), bch_cache_release() will be called, then
> bdev will be freed, so bdevname(bdev, name) caused kernel crash.
> 
> Since bch_cache_release() will free bdev, so in this patch we make sure
> bdev being freed if register_cache() fail, and do not free bdev again in
> register_bcache() when register_cache() fail.
> 
> Signed-off-by: Tang Junhui 
> Reported-by: Marc MERLIN 
> Tested-by: Michael Lyle 
> Reviewed-by: Michael Lyle 
> Cc: 
> Signed-off-by: Jens Axboe 
> Signed-off-by: Greg Kroah-Hartman 
> 
> ---
>  drivers/md/bcache/super.c |   16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1181,7 +1181,7 @@ static void register_bdev(struct cache_s
>  
>   return;
>  err:
> - pr_notice("error opening %s: %s", bdevname(bdev, name), err);
> + pr_notice("error %s: %s", bdevname(bdev, name), err);
>   bcache_device_stop(&dc->disk);
>  }
>  
> @@ -1849,6 +1849,8 @@ static int register_cache(struct cache_s
>   const char *err = NULL; /* must be set for any error case */
>   int ret = 0;
>  
> + bdevname(bdev, name);
> +
>   memcpy(&ca->sb, sb, sizeof(struct cache_sb));
>   
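
The crash described in the quoted commit message is a use-after-free: the
device is released on the failure path, and its name is then read for the
error message. Here is a minimal user-space sketch of the same pattern and
of the "save the name first" approach the patch takes with
bdevname(bdev, name). The struct fake_bdev type and the register_dev() and
try_register() functions are hypothetical stand-ins, not bcache code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 32

/* Hypothetical stand-in for a block device; not the kernel's struct. */
struct fake_bdev {
    char name[NAME_LEN];
};

/* Hypothetical registration step: on failure it releases the device
 * itself, as the commit message describes for bch_cache_release(). */
static int register_dev(struct fake_bdev *bdev, int fail)
{
    if (fail) {
        free(bdev);
        return -1;
    }
    return 0;
}

int try_register(const char *devname, int fail, char *msg, size_t msglen)
{
    char name[NAME_LEN];
    struct fake_bdev *bdev;

    assert(devname != NULL && msg != NULL && strlen(devname) < NAME_LEN);

    bdev = malloc(sizeof(*bdev));
    if (bdev == NULL) {
        snprintf(msg, msglen, "error %s: out of memory", devname);
        return -1;
    }
    snprintf(bdev->name, sizeof(bdev->name), "%s", devname);

    /* Copy the name while bdev is still valid. */
    snprintf(name, sizeof(name), "%s", bdev->name);

    if (register_dev(bdev, fail) < 0) {
        /* bdev has already been freed: use only the saved copy,
         * and do not free it a second time. */
        snprintf(msg, msglen, "error %s: register failed", name);
        return -1;
    }

    snprintf(msg, msglen, "registered %s", name);
    free(bdev);
    return 0;
}
```

The key design choice is that the error path never touches bdev after the
callee has taken responsibility for freeing it.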

Re: [PATCH 4.14 095/140] bcache: fix crashes in duplicate cache device register

2018-03-13 Thread Marc MERLIN
[linux-kernel to bcc, moving back to bcache list]

On Tue, Mar 13, 2018 at 10:26:33AM -0700, Michael Lyle wrote:
> Though note you're still not safe from -that-.  If there's duplicate
> UUIDs around because you've duplicated devices, there's just no sane
> way to tell which is the "right one" to attach to.

Thanks for clearing that up, Mike.

So, what happened to me was
1) I dd'ed drive1 to drive2 (raw device)
2) while that was going on, I ran fdisk on drive2 to fix a partition type
3) saving fdisk caused drive2 to be rescanned by the kernel
4) udev said, oh, a bcache partition, yummy, let me register that
5) instead I got a kernel crash that got fixed by this patch
6) tried to reboot a few times, and each time the kernel would crash
early, until I found out it was bcache, removed drive2, system came back
up
7) by then, my bcache filesystem was heavily corrupted and unusable

If there is a duplicate cache device UUID, wouldn't bcache just use the
first one it sees and ignore the 2nd one? 
In my case this would have been the safe thing and I'm guessing in most
cases, whatever device the UUID got duplicated on, will come 2nd in the
boot order, and therefore is safer to ignore, even if the duplicate
situation isn't safe per se.

What do you think?
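
To make the question concrete, here is a sketch of the "first one seen wins"
policy I'm asking about, in standalone C. It is only an illustration of the
idea, not how bcache behaves, and struct registry, registry_add() and the
REG_* codes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16
#define MAX_DEVS 8

/* Hypothetical table of already-registered cache device UUIDs. */
struct registry {
    unsigned char uuid[MAX_DEVS][UUID_LEN];
    size_t count;
};

enum reg_result { REG_OK = 0, REG_DUPLICATE = -1, REG_FULL = -2 };

/* Register a UUID unless it is already known. A duplicate is refused with
 * an error code instead of being attached, so the first device seen keeps
 * its registration. */
enum reg_result registry_add(struct registry *r,
                             const unsigned char uuid[UUID_LEN])
{
    size_t i;

    assert(r != NULL && uuid != NULL);

    for (i = 0; i < r->count; i++) {
        if (memcmp(r->uuid[i], uuid, UUID_LEN) == 0)
            return REG_DUPLICATE;  /* keep the first device, ignore this one */
    }
    if (r->count == MAX_DEVS)
        return REG_FULL;

    memcpy(r->uuid[r->count], uuid, UUID_LEN);
    r->count++;
    return REG_OK;
}
```

This does not answer Mike's point that there is no sane way to know which
copy is the right one; it only avoids attaching the second copy.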

Thanks,
Marc


Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)

2021-01-28 Thread Marc MERLIN
On Wed, Jan 27, 2021 at 03:33:00PM -0600, Bjorn Helgaas wrote:
> Hi Marc, I appreciate your persistence on this.  I am frankly
> surprised that you've put up with this so long.
 
Well, been using linux for 27 years, but also it's not like I have much
of a choice outside of switching to windows, as tempting as it's getting
sometimes ;)

> > after boot, when it gets the right trigger (not sure which ones), it
> > loops on this every 2 seconds, mostly forever.
> > 
> > I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or 
> > something else.
> 
> IIUC there are basically two problems:
> 
>   1) A 2 minute delay during boot
> Another random thought: is there any chance the boot delay could be
> related to crypto waiting for entropy?

So, the 2mn hang went away after I added the nouveau firmware to the initrd.
The only problem is that the nouveau driver does not give a very good
clue as to what's going on and what to do.
For comparison, the intel iwlwifi driver is very clear about which firmware
it's trying to load, whether it failed to, and exactly which firmware file
you need to find on the internet (filename).

>   2) Some sort of event every 2 seconds that kills your battery life
> Your machine doesn't sound unusual, and I haven't seen a flood of
> similar reports, so maybe there's something unusual about your config.
> But I really don't have any guesses for either one.

Honestly, there are not too many thinkpad P73s running linux out there. I
wouldn't be surprised if it's only a handful or two.

> It sounds like v5.5 worked fine and you first noticed the slow boot
> problem in v5.8.  We *could* try to bisect it, but I know that's a lot
> of work on your part.

I've done that in the past. To be honest, now that I've added the
firmware that nouveau started needing (and didn't need before), the
hang at boot is gone for sure.
The PCI PM wakeup issues on battery still happen sometimes, but they
are much rarer now.

> Grasping for any ideas for the boot delay; could you boot with
> "initcall_debug" and collect your "lsmod" output?  I notice async_tx
> in some of your logs, but I have no idea what it is.  It's from
> crypto, so possibly somewhat unusual?

Is this still needed? I think that if nouveau did a better job of helping
the user correct the issue when firmware is missing (I think intel even
gives a URL in printk), that would probably be what's needed for the
most part.

[   12.832547] async_tx: api initialized (async) comes from ./crypto/async_tx/async_tx.c

Thanks for your answer, let me know if there is anything else useful I
can give, I think I'm otherwise mostly ok now.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)

2021-01-30 Thread Marc MERLIN
On Fri, Jan 29, 2021 at 03:20:32PM -0600, Bjorn Helgaas wrote:
> > For comparison the intel iwlwifi driver is very clear about firmware
> > it's trying to load, if it can't and what exact firmware you need to
> > find on the internet (filename)
> 
> I guess you're referring to this in iwl_request_firmware()?
> 
>   IWL_ERR(drv, "check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git\n");
> 
 
Yes :)

> How can we fix this in nouveau so we don't have the debug this again?
> I don't really know how firmware loading works, but "git grep -A5
> request_firmware drivers/gpu/drm/nouveau/" shows that we generally
> print something when request_firmware() fails.

Well, have a look at https://pastebin.com/dX19aCpj
do you see any warning whatsoever?

> But I didn't notice those messages in your logs, so I'm probably
> barking up the wrong tree.

You're not. It seems that newer kernels are a bit better:
[  189.304662] nouveau :01:00.0: pmu: firmware unavailable
[  189.312455] nouveau :01:00.0: disp: destroy running...
[  189.316552] nouveau :01:00.0: disp: destroy completed in 1us
[  189.320326] nouveau :01:00.0: disp ctor failed, -12
[  189.324214] nouveau: probe of :01:00.0 failed with error -12

So, it probably got better, but that message got displayed after the 2mn
hang, which having the firmware prevents.

Any developer with the right hardware can probably easily
reproduce this by removing the firmware and looking at the boot
messages.

At the very least, it should print something clearer, like "driver will not
function properly", and a URL to where one can get the driver would be
awesome.

> So maybe the wakeups are related to having vs not having the nouveau
> firmware?  I'm still curious about that, and it smells like a bug to
> me, but probably something to do with nouveau where I have no hope of
> debugging it.
 
Right. Honestly, given the time I've lost with this, and now that it
seems gone with the firmware, I'm happy to leave well enough alone :)

I'm not sure how you are involved with the driver, but are you able to
help improve the dmesg output?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay()

2020-08-08 Thread Marc MERLIN
On Fri, Oct 04, 2019 at 03:39:46PM +0300, Mika Westerberg wrote:
> This is otherwise similar to pcie_wait_for_link() but allows passing
> custom activation delay in milliseconds.
> 
> Signed-off-by: Mika Westerberg 
> ---
>  drivers/pci/pci.c | 21 ++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e7982af9a5d8..bfd92e018925 100644

Hi Mika,

So, I have a thinkpad P73 with thunderbolt, and while I don't boot
often, my last boots have been unreliable at best (was only able to boot
5.7 once, and 5.8 did not succeed either).

5.6 was working for a while, but couldn't boot it either this morning,
so I had to go back to 5.5. This does not mean 5.5 does not have the
problem, just that it booted this morning, while 5.6 didn't when I
tried.
Once the kernel is booted, the problem does not seem to occur much, or
at all.

Basically, I'm getting the same thing as this person with a P53 (which
is a lenovo thinkpad mostly identical to mine):
kernel: pcieport :00:01.0: PME: Spurious native interrupt!
kernel: pcieport :00:01.0: PME: Spurious native interrupt!
kernel: pcieport :00:01.0: PME: Spurious native interrupt!
kernel: pcieport :00:01.0: PME: Spurious native interrupt!
kernel: pcieport :00:01.0: PME: Spurious native interrupt!
https://bbs.archlinux.org/viewtopic.php?id=250658

The kernel boots eventually, but it takes minutes, and everything is so
super slow, that I just can't reasonably use the machine.

This shows similar issues with 5.3, 5.4.
https://forum.proxmox.com/threads/pme-spurious-native-interrupt-kernel-meldungen.62850/

Another report here with 5.6:
https://bugzilla.redhat.com/show_bug.cgi?id=1831899

My current kernel is running your patch above, and I haven't done a lot
of research yet to confirm whether going back to a kernel from before it was
merged fixes the problem. Unfortunately the problem is not consistent,
which makes things harder to test/debug, especially on my main laptop
that I do all my work on :)

I noticed this older patch of yours:
http://patchwork.ozlabs.org/project/linux-pci/patch/0113014581dbe2d1f938813f1783905bd81b79db.1560079442.git.lu...@wunner.de/
This patch is not in my kernel, is it worth adding?

Can I get you more info to help debug this?

If that helps:
sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lspci
00:00.0 Host bridge: Intel Corporation Device 3e20 (rev 0d)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0d)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
00:1f.0 ISA bridge: Intel Corporation Cannon Lake LPC Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
04:00.0 PCI bridge: Intel Cor

Re: [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay()

2020-08-08 Thread Marc MERLIN
I forgot to add that my mostly hanging boots look like this:
https://photos.app.goo.gl/HJvTraYYZbiNTNE39

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  


Re: [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay()

2020-08-09 Thread Marc MERLIN
On Sat, Aug 08, 2020 at 01:22:02PM -0700, Marc MERLIN wrote:
> Basically, I'm getting the same thing than this person with a P53 (which
> is a mostly identical lenovo thinkpad, to mine)
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> kernel: pcieport :00:01.0: PME: Spurious native interrupt!
> https://bbs.archlinux.org/viewtopic.php?id=250658
 
I had to reboot today and tried my 5.7.11 kernel 6 times.
It never booted and each time got stuck on 
pcieport :00:01.0: PME: Spurious native interrupt!

This is the nvidia device, claimed by nouveau (I don't use nvidia graphics,
but I'm forced to load nouveau to power the nvidia chip down so that it
doesn't drain my batteries).

I ended up being able to boot the 7th time after removing the yubikey in my
USB-C port, which is also thunderbolt.
PME messages shown below. Let me know if you'd like further data.

Thanks,
Marc

[4.142484] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability LTR DPC]
[4.151715] pci :00:01.0: PME# supported from D0 D3hot D3cold
[4.151727] pci :00:01.0: PME# disabled
[4.165979] pci :00:14.0: PME# supported from D3hot D3cold
[4.166000] pci :00:14.0: PME# disabled
[4.177746] pci :00:16.0: PME# supported from D3hot
[4.177767] pci :00:16.0: PME# disabled
[4.180850] pci :00:17.0: PME# supported from D3hot
[4.180862] pci :00:17.0: PME# disabled
[4.183830] pci :00:1b.0: PME# supported from D0 D3hot D3cold
[4.183847] pci :00:1b.0: PME# disabled
[4.189643] pci :00:1c.0: PME# supported from D0 D3hot D3cold
[4.189660] pci :00:1c.0: PME# disabled
[4.193085] pci :00:1c.5: PME# supported from D0 D3hot D3cold
[4.193101] pci :00:1c.5: PME# disabled
[4.196462] pci :00:1c.7: PME# supported from D0 D3hot D3cold
[4.196478] pci :00:1c.7: PME# disabled
[4.206057] pci :00:1f.3: PME# supported from D3hot D3cold
[4.206079] pci :00:1f.3: PME# disabled
[4.214993] pci :00:1f.6: PME# supported from D0 D3hot D3cold
[4.215015] pci :00:1f.6: PME# disabled
[4.217978] pci :01:00.0: PME# supported from D0 D3hot
[4.217991] pci :01:00.0: PME# disabled
[4.219129] pci :01:00.2: PME# supported from D0 D3hot
[4.219142] pci :01:00.2: PME# disabled
[4.219578] pci :01:00.3: PME# supported from D0 D3hot
[4.219591] pci :01:00.3: PME# disabled
[4.221398] pci :04:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[4.221433] pci :04:00.0: PME# disabled
[4.82] pci :05:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[4.97] pci :05:00.0: PME# disabled
[4.222792] pci :05:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[4.222806] pci :05:01.0: PME# disabled
[4.223289] pci :05:02.0: PME# supported from D0 D1 D2 D3hot D3cold
[4.223304] pci :05:02.0: PME# disabled
[4.223839] pci :05:04.0: PME# supported from D0 D1 D2 D3hot D3cold
[4.223854] pci :05:04.0: PME# disabled
[4.224645] pci :06:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[4.224661] pci :06:00.0: PME# disabled
[4.225644] pci :2c:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[4.225661] pci :2c:00.0: PME# disabled
[4.227557] pci :52:00.0: PME# supported from D0 D3hot D3cold
[4.227708] pci :52:00.0: PME# disabled
[4.229139] pci :54:00.0: PME# supported from D1 D2 D3hot D3cold
[4.229155] pci :54:00.0: PME# disabled
[7.238126] pcieport :00:01.0: PME: Signaling with IRQ 122
[7.239208] pcieport :00:1b.0: PME: Signaling with IRQ 123
[7.239861] pcieport :00:1c.0: PME: Signaling with IRQ 124
[7.241522] pcieport :00:1c.5: PME: Signaling with IRQ 125
[7.242499] pcieport :00:1c.7: PME: Signaling with IRQ 126
[7.401422] pcieport :05:01.0: PME# enabled
[7.401868] pcieport :05:04.0: PME# enabled
[8.985668] xhci_hcd :01:00.2: PME# enabled
[8.988738] xhci_hcd :2c:00.0: PME# enabled
[9.008649] pcieport :05:02.0: PME# enabled
[   12.378450] nvidia-gpu :01:00.3: PME# enabled
[   25.610848] thunderbolt :06:00.0: PME# enabled
[   25.628766] pcieport :05:00.0: PME# enabled
[   25.648762] pcieport :04:00.0: PME# enabled
[   25.668889] pcieport :00:1c.0: PME# enabled
[  179.608847] nvidia-gpu :01:00.3: PME# disabled
[  179.608873] pcieport :00:01.0: PME: Spurious native interrupt!
[  183.359454] nvidia-gpu :01:00.3: PME# enabled
[  183.396832] nvidia-gpu :01:00.3: PME# disabled
[  183.396859] pcieport :00:01.0: PME: Spurious native interrupt!
[  187.147398] nvidia-gpu :01:00.3: PME# enabled
[  1
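
The enable/disable loop at the end of the log above can be quantified with a
small helper that reads dmesg lines and averages the time between
successive "PME# enabled" messages for one device. This is only a sketch:
pme_enabled_at() and mean_wakeup_interval() are hypothetical names, and it
assumes each line starts with a bracketed timestamp in seconds, as in the
excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return 1 and store the timestamp if the line is a "PME# enabled"
 * message mentioning dev; return 0 otherwise. */
int pme_enabled_at(const char *line, const char *dev, double *ts)
{
    double t;

    assert(line != NULL && dev != NULL && ts != NULL);

    /* %lf skips the padding spaces that follow the opening bracket. */
    if (sscanf(line, "[%lf]", &t) != 1)
        return 0;
    if (strstr(line, dev) == NULL || strstr(line, "PME# enabled") == NULL)
        return 0;

    *ts = t;
    return 1;
}

/* Mean number of seconds between successive "PME# enabled" lines for dev,
 * or -1.0 if fewer than two such lines are present. */
double mean_wakeup_interval(const char *const lines[], size_t n,
                            const char *dev)
{
    double first = 0.0, last = 0.0, ts;
    size_t hits = 0, i;

    for (i = 0; i < n; i++) {
        if (pme_enabled_at(lines[i], dev, &ts)) {
            if (hits == 0)
                first = ts;
            last = ts;
            hits++;
        }
    }
    if (hits < 2)
        return -1.0;
    return (last - first) / (double)(hits - 1);
}
```

Feeding it the saved output of dmesg, one line per array entry, with
"nvidia-gpu" as the device string gives a single number to compare between
kernels or configurations.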

Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)

2020-12-29 Thread Marc MERLIN
On Sat, Dec 26, 2020 at 03:12:09AM -0800, Ilia Mirkin wrote:
> > after boot, when it gets the right trigger (not sure which ones), it
> > loops on this every 2 seconds, mostly forever.
> 
> The gpu suspends with runtime pm. And then gets woken up for some
> reason (could be something quite silly, like lspci, or could be
> something explicitly checking connectors, etc). Repeat.

Ah, fair point.  Could it be powertop even?
How would I go about tracing that?
Sounds like this would be a problem with all chips if userspace is able
to wake them up every second or two with a probe. Now I wonder what
broken userspace I have that could be doing this.
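
One rough way to see which devices keep bouncing in and out of runtime suspend
is to count the "PME# enabled" / "PME# disabled" lines per device in the kernel
log. This is only a minimal sketch, assuming dmesg-style lines like the ones
quoted in this thread; the function name and the regex are illustrative, and it
does not identify which userspace process caused the wakeup.

```python
import re
from collections import Counter

# Matches lines such as:
#   [  183.359454] nvidia-gpu 0000:01:00.3: PME# enabled
PME_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<drv>\S+)\s+(?P<dev>[0-9a-f:.]+):\s+"
    r"PME# (?P<state>enabled|disabled)"
)

def count_pme_toggles(lines):
    """Count PME# enabled/disabled messages per (driver, device, state)."""
    counts = Counter()
    for line in lines:
        m = PME_RE.match(line)
        if m:
            counts[(m["drv"], m["dev"], m["state"])] += 1
    return counts
```

Feeding it the output of dmesg taken a minute apart would show whether only the
nvidia-gpu function is toggling or whether its parent port is as well.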
 
> Display offload usually requires acceleration -- the copies are done
> using the DMA engine. Please make sure that you have firmware
> available (and a new enough mesa). The errors suggest that you don't
> have firmware available at the time that nouveau loads. Depending on
> your setup, that might mean the firmware has to be built into the
> kernel, or available in initramfs. (Or just regular filesystem if you
> don't use a complicated boot sequence. But many people go with distro
> defaults, which do have this complexity.)

Hi Ilia, thanks for your answer.

Do you think that could be a reason why the boot would hang for 2 full minutes
at every boot ever since I upgraded to 5.5?

Also, without wanting to sound like a full newbie, where is that
firmware you're talking about? In my kernel source?

Here's what I do have:
sauron:/usr/local/bin# dpkggrep nouveau
libdrm-nouveau2:amd64   install
xserver-xorg-video-nouveau  install

no nouveau-firmware package in debian:
sauron:/usr/local/bin# apt-cache search nouveau
bumblebee - NVIDIA Optimus support for Linux
libdrm-nouveau2 - Userspace interface to nouveau-specific kernel DRM services -- runtime
xfonts-jmk - Jim Knoble's character-cell fonts for X
xserver-xorg-video-nouveau - X.Org X server -- Nouveau display driver

No firmware file on my disk:
sauron:/usr/local/bin# find /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/ /lib/firmware/ |grep nouveau
/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
sauron:/usr/local/bin# 
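
Note that grepping for "nouveau" may miss the files in question: nouveau
firmware is usually shipped by linux-firmware under /lib/firmware/nvidia/<chip>/
rather than under a path containing "nouveau" (an assumption worth verifying on
your distro). A minimal sketch of a search by an arbitrary substring; the helper
name is hypothetical.

```python
import os

def find_firmware(root, needle):
    """Return sorted file paths (relative to root) whose path contains needle."""
    hits = []
    needle = needle.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if needle in rel.lower():
                hits.append(rel)
    return sorted(hits)
```

For example, find_firmware("/lib/firmware", "nvidia") lists whatever NVIDIA
firmware blobs are installed; an empty list would be consistent with the
"no firmware at load time" diagnosis.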

The kernel module is in my initrd:
sauron:/usr/local/bin# dd if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528 skip=1 | gunzip | cpio -tdv | grep nouveau
drwxr-xr-x   1 root root        0 Nov 30 15:40 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
-rw-r--r--   1 root root  3691385 Nov 30 15:35 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
17+1 records in
17+1 records out
52566778 bytes (53 MB, 50 MiB) copied, 1.69708 s, 31.0 MB/s

What am I supposed to do/check next?

Note that ultimately I only need nouveau not to hang my boot 2mn and do
PM so that the nvidia chip goes to sleep since I don't use it.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)

2020-12-27 Thread Marc MERLIN
This started with 5.5 and hasn't gotten better since then, despite some reports
I tried to send.

As per my previous message:
I have a Thinkpad P70 with hybrid graphics.
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] (rev a2)
that one works fine, I can use i915 for the main screen, and nouveau to
display on the external ports (external ports are only wired to nvidia
chip, so it's impossible to use them without turning the nvidia chip
on).
 
I now got a newer P73 also with the same hybrid graphics (set up as such
in the bios). It runs fine with i915, and I don't need to use external
display with nouveau for now (it almost works, but I only see the mouse
cursor on the external screen, no window or anything else can get
displayed, very weird).
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
 

After boot, when it gets the right trigger (not sure which one), it
loops on this every 2 seconds, mostly forever.

I'm not sure if it's nouveau's fault, the kernel's PCI PME code's fault, or
something else.

Boot hangs look like this:
[   10.659209] Console: switching to colour frame buffer device 240x67
[   10.732353] i915 :00:02.0: [drm] fb0: i915drmfb frame buffer device
[   12.101203] nvidia-gpu :01:00.3: saving config space at offset 0x0 (reading 0x1ad910de)
[   12.101212] nvidia-gpu :01:00.3: saving config space at offset 0x4 (reading 0x100406)
[   12.101217] nvidia-gpu :01:00.3: saving config space at offset 0x8 (reading 0xc8000a1)
[   12.101223] nvidia-gpu :01:00.3: saving config space at offset 0xc (reading 0x80)
[   12.101228] nvidia-gpu :01:00.3: saving config space at offset 0x10 (reading 0xce054000)
[   12.101234] nvidia-gpu :01:00.3: saving config space at offset 0x14 (reading 0x0)
[   12.101239] nvidia-gpu :01:00.3: saving config space at offset 0x18 (reading 0x0)
[   12.101244] nvidia-gpu :01:00.3: saving config space at offset 0x1c (reading 0x0)
[   12.101249] nvidia-gpu :01:00.3: saving config space at offset 0x20 (reading 0x0)
[   12.101254] nvidia-gpu :01:00.3: saving config space at offset 0x24 (reading 0x0)
[   12.101259] nvidia-gpu :01:00.3: saving config space at offset 0x28 (reading 0x0)
[   12.101265] nvidia-gpu :01:00.3: saving config space at offset 0x2c (reading 0x229b17aa)
[   12.101270] nvidia-gpu :01:00.3: saving config space at offset 0x30 (reading 0x0)
[   12.101275] nvidia-gpu :01:00.3: saving config space at offset 0x34 (reading 0x68)
[   12.101280] nvidia-gpu :01:00.3: saving config space at offset 0x38 (reading 0x0)
[   12.101285] nvidia-gpu :01:00.3: saving config space at offset 0x3c (reading 0x4ff)
[   12.101333] nvidia-gpu :01:00.3: PME# enabled
[   25.151246] thunderbolt :06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)
[   25.151260] thunderbolt :06:00.0: saving config space at offset 0x4 (reading 0x100406)
[   25.151265] thunderbolt :06:00.0: saving config space at offset 0x8 (reading 0x886)
[   25.151270] thunderbolt :06:00.0: saving config space at offset 0xc (reading 0x20)
[   25.151276] thunderbolt :06:00.0: saving config space at offset 0x10 (reading 0xcc10)
[   25.151281] thunderbolt :06:00.0: saving config space at offset 0x14 (reading 0xcc14)
[   25.151286] thunderbolt :06:00.0: saving config space at offset 0x18 (reading 0x0)
[   25.151291] thunderbolt :06:00.0: saving config space at offset 0x1c (reading 0x0)
[   25.151296] thunderbolt :06:00.0: saving config space at offset 0x20 (reading 0x0)
[   25.151301] thunderbolt :06:00.0: saving config space at offset 0x24 (reading 0x0)
[   25.151306] thunderbolt :06:00.0: saving config space at offset 0x28 (reading 0x0)
[   25.151311] thunderbolt :06:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[   25.151316] thunderbolt :06:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.151322] thunderbolt :06:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.151327] thunderbolt :06:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.151332] thunderbolt :06:00.0: saving config space at offset 0x3c (reading 0x1ff)
[   25.151416] thunderbolt :06:00.0: PME# enabled
[   25.169204] pcieport :05:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[   25.169214] pcieport :05:00.0: saving config space at offset 0x4 (reading 0x100407)
[   25.169219] pcieport :05:00.0: saving config space at offset 0x8 (reading 0x6040006)
[   25.169224] pcieport :05:00.0: saving config space at offset 0xc (reading 0x10020)
[   25.169229] pcieport :05:00.0: saving config space at offset 0x10 (reading 0x0)
[   25.169233] pcieport :05:00.0: saving config space at offset 0x14 (reading 0x0)
[   25.169238] pcieport :05:00.0: saving config space at offset 0x18 (reading 0x60605)
[   25.1692

Re: btrfs-rmw-2: page allocation failure: order:1, mode:0x8020

2014-03-29 Thread Marc MERLIN
+linux-kernel since I got no answer.

Hi,

I see you are maintainers of/contributors to drivers/scsi/mvsas

The btrfs folks pointed out that the problem below is due to the MVS driver,
namely:

From: Chris Mason 

This is an order 1 atomic allocation from the mvs driver, we really
should not be depending on that to get IO done.  A quick search and it
looks like we're allocating MVS_SLOT_BUF_SZ (8192) bytes.

You could try bumping the lowmem reserves.

-chris
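
To make the "order 1" remark concrete: the buddy allocator hands out blocks of
2^order contiguous pages, so an 8192-byte buffer on a system with 4 KiB pages
needs two physically contiguous pages, i.e. order 1. A minimal sketch of that
arithmetic, assuming a 4096-byte page size; the helper name is illustrative and
only mimics what the kernel's get_order() computes.

```python
def alloc_order(size_bytes, page_size=4096):
    """Smallest order such that (page_size << order) >= size_bytes."""
    pages = -(-size_bytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order

# MVS_SLOT_BUF_SZ is quoted above as 8192 bytes.
print(alloc_order(8192))
```

An order-0 request can be satisfied by any free page, while an order-1 atomic
request needs an already-free contiguous pair, which is why it is more fragile
under memory pressure.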


Would you be able to modify the driver to avoid these low memory problems?

Thanks,
Marc


- Forwarded message from Marc MERLIN  -

From: Marc MERLIN 
To: linux-bt...@vger.kernel.org

My server died last night during a btrfs send/receive to a btrfs raid5 array.

Here are the logs. Is this anything known or with a possible workaround?

Thanks,
Marc

btrfs-rmw-2: page allocation failure: order:1, mode:0x8020
CPU: 1 PID: 12499 Comm: btrfs-rmw-2 Not tainted 3.14.0-rc5-amd64-i915-preempt-20140216c #1
Hardware name: System manufacturer P5KC/P5KC, BIOS 0502 05/24/2007
  88000549d780 816090b3 
 88000549d808 811037b0 0001fffe 88007ff7ce00
  0002 0030 88007ff7ce00
Call Trace:
 [] dump_stack+0x4e/0x7a
 [] warn_alloc_failed+0x111/0x125
 [] __alloc_pages_nodemask+0x707/0x854
 [] ? dma_generic_alloc_coherent+0xa7/0x11c
 [] dma_generic_alloc_coherent+0xa7/0x11c
 [] dma_pool_alloc+0x10a/0x1cb
 [] mvs_task_prep+0x192/0xa42 [mvsas]
 [] ? blkg_path.isra.80.constprop.90+0x17/0x38
 [] ? cache_alloc+0x1c/0x29b
 [] mvs_task_exec.isra.9+0x5d/0xc9 [mvsas]
 [] mvs_queue_command+0x3d/0x29b [mvsas]
 [] ? kmem_cache_alloc+0xe3/0x161
 [] sas_ata_qc_issue+0x1cd/0x235 [libsas]
 [] ata_qc_issue+0x291/0x2f1
 [] ? ata_scsiop_mode_sense+0x29c/0x29c
 [] __ata_scsi_queuecmd+0x184/0x1e0
 [] ata_sas_queuecmd+0x31/0x4d
 [] sas_queuecommand+0x98/0x1fe [libsas]
 [] scsi_dispatch_cmd+0x14f/0x22e
 [] scsi_request_fn+0x4da/0x507
 [] ? blk_recount_segments+0x1e/0x2e
 [] __blk_run_queue_uncond+0x22/0x2b
 [] __blk_run_queue+0x19/0x1b
 [] blk_queue_bio+0x23f/0x256
 [] generic_make_request+0x9c/0xdb
 [] submit_bio+0x112/0x131
 [] rmw_work+0x112/0x162
 [] worker_loop+0x168/0x4d8
 [] ? btrfs_queue_worker+0x283/0x283
 [] kthread+0xae/0xb6
 [] ? __kthread_parkme+0x61/0x61
 [] ret_from_fork+0x7c/0xb0
 [] ? __kthread_parkme+0x61/0x61
Mem-Info:
Node 0 DMA per-cpu:
CPU0: hi:0, btch:   1 usd:   0
CPU1: hi:0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU0: hi:  186, btch:  31 usd: 171
CPU1: hi:  186, btch:  31 usd: 190
active_anon:17298 inactive_anon:21061 isolated_anon:0
 active_file:67491 inactive_file:94189 isolated_file:32
 unevictable:1260 dirty:38914 writeback:49596 unstable:0
 free:15999 slab_reclaimable:8198 slab_unreclaimable:9741
 mapped:12981 shmem:1661 pagetables:2711 bounce:0
 free_cma:0
Node 0 DMA free:8084kB min:348kB low:432kB high:520kB active_anon:360kB 
inactive_anon:764kB active_file:288kB inactive_file:2040kB unevictable:100kB 
isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB 
mlocked:100kB dirty:0kB writeback:1272kB mapped:252kB shmem:8kB 
slab_reclaimable:168kB slab_unreclaimable:336kB kernel_stack:88kB 
pagetables:128kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1987 1987 1987
Node 0 DMA32 free:56080kB min:44704kB low:55880kB high:67056kB 
active_anon:68832kB inactive_anon:83480kB active_file:269676kB 
inactive_file:374588kB unevictable:4940kB isolated(anon):0kB 
isolated(file):128kB present:2080256kB managed:2039064kB mlocked:4940kB 
dirty:155668kB writeback:197112kB mapped:51672kB shmem:6636kB 
slab_reclaimable:32624kB slab_unreclaimable:38628kB kernel_stack:2912kB 
pagetables:10716kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:32 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 85*4kB (UEM) 22*8kB (UEM) 62*16kB (UEM) 6*32kB (UM) 2*64kB (UE) 
5*128kB (UEM) 6*256kB (UEM) 4*512kB (EM) 0*1024kB 1*2048kB (R) 0*4096kB = 8100kB
Node 0 DMA32: 13004*4kB (M) 16*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 
0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 56240kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
164139 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 9255932kB
Total swap = 9255932kB
524058 pages RAM
0 pages HighMem/MovableOnly
10298 pages reserved
0 pages hwpoisoned
mvsas :01:00.0: mvsas prep failed[0]!
btrfs-rmw-2: page allocation failure: order:1, mode:0x8020
CPU: 1 PID: 12499 Comm: btrfs-rmw-2 Not tainted 3.14.0-rc5-amd64-i915-preempt-20140216c #1
Hardware name: System manufacturer P5KC/P5KC, BIOS 0502 05/24/2007
  88000549d690 816
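
The per-zone buddy lists in the Mem-Info dump above ("Node 0 DMA32: 13004*4kB
(M) 16*8kB (M) ...") can be tallied to see how little of the free memory sits in
blocks large enough for an order-1 (8kB) request. A minimal sketch, assuming
that line format; the helper names are illustrative, and it ignores watermarks
and migrate types, which also decide whether an atomic allocation succeeds.

```python
import re

def parse_buddy_line(line):
    """Map block size in kB to free block count for one 'Node N ZONE:' line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}

def total_free_kb(free):
    """Total free memory in kB across all block sizes."""
    return sum(size * count for size, count in free.items())

def blocks_at_least(free, min_kb):
    """Number of free blocks of at least min_kb each."""
    return sum(count for size, count in free.items() if size >= min_kb)

dma32 = parse_buddy_line(
    "Node 0 DMA32: 13004*4kB (M) 16*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB "
    "0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 56240kB"
)
print(total_free_kb(dma32), blocks_at_least(dma32, 8))
```

For the DMA32 line this gives 56240 kB free in total, but only 17 blocks of 8kB
or larger; nearly everything else is single 4kB pages.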

Deleting pstore data causes immediate hang of 4.15.5 on Lenovo P70 with upgraded bios

2018-03-01 Thread Marc MERLIN
Howdy,

I have a thinkpad P70 which started to fail resuming from S3 sleep after any
kernel past 4.12 (sometimes it would work, sometimes the HD led would come
on when trying to resume, but nothing else).
After much debugging trying to figure what was causing it and coming up short,
I decided to upgrade the very old firmware/bios on that laptop, since it likely
had many bugs.

The firmware update from a boot CD was weird, long, and worrisome. It looks
like after 1h or so (very long procedure), I got the latest firmware now,
but it won't boot my NVME M2 drive anymore, it shows in the boot menu, but
just hangs if I use it to boot.
However, I can get it to boot my M2 SATA drive. The nvme drive shows up fine
and works once linux has booted.

So, I figured I'd try a new bootmgr entry
saruman:~# efibootmgr -v -c -d /dev/nvme0n1 -p 1 -L "GrubNVME" -l '\EFI\debian\grubx64.efi'
Could not prepare Boot variable: No space left on device <<<

Ok, this brought me to
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=845023
and
https://mjg59.dreamwidth.org/23554.html

Sure enough,
saruman:~# df /sys/fs/pstore/
Filesystem     1K-blocks  Used Available Use% Mounted on
pstore                 0     0         0    - /sys/fs/pstore
it's full of files, and I'm assuming the variable storage is full of crap
(see below)

The problem is that trying to delete any file in there causes an immediate hang
of the kernel.

Any idea how to get around this problem?  I realize it may be the bios
that's crashing/hanging and not linux.
At least filling up the space did not brick my machine; Matthew points out that
some firmware crashes when it's full ( https://mjg59.dreamwidth.org/23554.html )

Is there any way to clear all this space, maybe from inside the bios by
resetting everything to default, or some other way?
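
Since df reports 0 for pstore, a quick way to see how much is actually stored
there is to add up the file sizes. A minimal sketch; the function name is
illustrative, and the reported sizes are what the pstore filesystem exposes,
not necessarily the exact space consumed in the firmware's variable store.

```python
import os

def pstore_usage(directory="/sys/fs/pstore"):
    """Return (file_count, total_bytes) for regular files directly in directory."""
    count = 0
    total = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                count += 1
                total += entry.stat(follow_symlinks=False).st_size
    return count, total
```

This only reads metadata, so it should be safe to run even on a machine where
deleting from pstore hangs.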

saruman:~# l /sys/fs/pstore/ | wc -l
151
saruman:~# l /sys/fs/pstore/ | head
total 0
drwxr-x---  2 root root    0 Mar  1 22:00 ./
drwxr-xr-x 10 root root    0 Mar  1 22:02 ../
-r--r--r--  1 root root  983 Feb 16  2016 dmesg-efi-145565830401001
-r--r--r--  1 root root 1744 Feb 16  2016 dmesg-efi-145565830401002
-r--r--r--  1 root root  952 Feb 16  2016 dmesg-efi-145565830402001
-r--r--r--  1 root root 1636 Feb 16  2016 dmesg-efi-145565830402002
-r--r--r--  1 root root 1014 Feb 16  2016 dmesg-efi-145565830403001
-r--r--r--  1 root root 1781 Feb 16  2016 dmesg-efi-145565830403002
-r--r--r--  1 root root  351 Feb 16  2016 dmesg-efi-145565830404001
saruman:~# cat /sys/fs/pstore/dmesg-efi-145565830401001
Oops#1 Part1
<4>[ 4508.389437]  [] do_execveat_common.isra.26+0x450/0x5fd
<4>[ 4508.389495]  [] do_execve+0x23/0x25
<4>[ 4508.389541]  [] SyS_execve+0x2a/0x2e
<4>[ 4508.389582]  [] stub_execve+0x5/0x5
<4>[ 4508.389624]  [] ? entry_SYSCALL_64_fastpath+0x16/0x75
<4>[ 4508.389682] Code: 45 31 e4 48 8b 47 78 4c 8b 30 48 8d 58 f0 48 8d 47 78 
48 89 45 d0 49 83 ee 10 48 8d 43 10 48 39 45 d0 74 6f 4c 8b 6b 08 4c 89 e7 <49> 
8b 75 00 e8 3a f0 ff ff 49 8d 75 40 48 89 df 49 89 c4 e8 6b
<1>[ 4508.390025] RIP  [] unlink_anon_vmas+0x41/0x13e
<4>[ 4508.390086]  RSP 
<4>[ 4508.390119] CR2: 00fb
<7>[ 4508.390339] pci_bus :3b: busn_res: [bus 3b] is released
<7>[ 4508.390468] pci_bus :3c: busn_res: [bus 3c-6f] is released
<7>[ 4508.390605] pci_bus :06: busn_res: [bus 06-6f] is released
<4>[ 4508.470221] ---[ end trace e21f39de184e5ef4 ]---

Yeah, there is another issue: something kept writing here until it filled up,
and nothing ever emptied it. I guess my old bios didn't care and the new one is
having issues with this.
If I'm unlucky, this may even have caused the firmware upgrade to fail partially?

Handle 0x000E, DMI type 0, 24 bytes
BIOS Information
Vendor: LENOVO
Version: N1DET95W (2.21 )
Release Date: 12/13/2017
Runtime Size: 128 kB
ROM Size: 16384 kB
BIOS Revision: 2.21
Firmware Revision: 1.17

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/


Re: Deleting pstore data causes immediate hang of 4.15.5 on Lenovo P70 with upgraded bios

2018-03-02 Thread Marc MERLIN
[+linux-efi and fixed Matthew's Email]

As an update, I got my NVME drive to boot once at least, it seems that I need
to wait about 2mn for the bios to do whatever, hang, recover and then
finally continue booting.
If I take over and force a boot on the M2 Sata drive instead, then it boots
near instantly.

After 2H on the phone with Lenovo and finally getting someone with a clue,
apparently removing the CMOS battery may clear that pstore storage and help
with my issue.
Obviously it will also kill my efibootmgr entries and all my settings,
although I could recover from that if needed.
Before I go through all that trouble though, it'd be great to figure out why 
linux is causing hangs when deleting pstore data, and if it's only a bios
bug we can do nothing about, or maybe an issue on the linux side.

Is there any other way to delete from /sys/fs/pstore/ besides rm which
causes an instant hang?
Well, how about that, truncating the files seems to work, and now efibootmgr 
is able to make a new entry with the space I just freed.
pstore is still full of files, but they're now 0 sized, so I'm likely only
wasting the space for the filenames now.

Now, I probably have to also find what is writing to pstore and
kill that job given that deleting from pstore seems not possible on my
machine, and filling it up causes the bios to get upset.

Marc


Re: Deleting pstore data causes immediate hang of 4.15.5 on Lenovo P70 with upgraded bios

2018-03-02 Thread Marc MERLIN
Sigh, and now I was just able to do this:
saruman:/sys/fs/pstore# \rm *
saruman:/sys/fs/pstore# l
total 0
drwxr-x---  2 root root 0 Mar  2 11:28 ./
drwxr-xr-x 10 root root 0 Mar  2 10:20 ../

Ok, so forget linux, I think it's just a stupid EFI bios.

If I were to venture a guess:
1) I went into setup, reset to default, that deleted my efibootmgr entries
2) some EFI space got freed as a result
3) truncating pstore files worked, because of #1 or not
4) now that the storage fronted by pstore wasn't full anymore, deleting
files just worked.
5) I had to recreate my efibootmgr entries, and now that there is space,
that worked fine.

I'm going to guess that the EFI bios needs some space to delete files and
without any, it just hangs.

Oh well, sorry for the noise, and if maybe someone hits this problem in the
future, they'll be able to find this post with the solution.


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-22 Thread Marc MERLIN
On Mon, Nov 21, 2016 at 01:56:39PM -0800, Marc MERLIN wrote:
> On Mon, Nov 21, 2016 at 10:50:20PM +0100, Vlastimil Babka wrote:
> > > 4.9rc5 however seems to be doing better, and is still running after 18
> > > hours. However, I got a few page allocation failures as per below, but the
> > > system seems to recover.
> > > Vlastimil, do you want me to continue the copy on 4.9 (may take 3-5 days)
> > > or is that good enough, and I should go back to 4.8.8 with that patch
> > > applied?
> > > https://marc.info/?l=linux-mm&m=147423605024993
> > 
> > Hi, I think it's enough for 4.9 for now and I would appreciate trying
> > 4.8 with that patch, yeah.
> 
> So the good news is that it's been running for almost 5H and so far so good.

And the better news is that the copy is still going strong, 4.4TB and
going. So 4.8.8 is fixed with that one single patch as far as I'm
concerned.

So thanks for that, looks good to me to merge.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-22 Thread Marc MERLIN
On Tue, Nov 22, 2016 at 05:25:44PM +0100, Michal Hocko wrote:
> currently AFAIR. I hate that Marc is not falling into that category but
> is it really problem for you to run with 4.9? If we have more users

Don't do anything just on my account. I had a problem, it's been fixed
in 2 different ways: 4.8+patch, or 4.9rc5

For me this was a 100% regression from 4.6, there was just no way I
could copy my data at all with 4.8, it not only failed, but killed all
the services on my machine until it randomly killed the shell that was
doing the copy.
Personally, I'll stick with 4.8 + this patch, and switch to 4.9 when
it's out (I'm a bit wary of RC kernels on a production server,
especially when I'm in the middle of trying to get my only good backup
to work again)

But at the same time, what I'm doing is probably not common (btrfs on
top of dmcrypt, on top of bcache, on top of swraid5, for both source and
destination), so I can't comment on whether the fix I just put on my 4.8
kernel does not cause other regressions or problems for other people.

Either way, I'm personally ok again now, so I thank you all for your
help, and will leave the hard decisions to you :)

Marc


Re: [PATCH-RFC]: sysrq-a: graceful reboot via kernel_restart(), similar to sysrq-o

2016-05-30 Thread Marc MERLIN
On Thu, Mar 10, 2016 at 09:13:13PM -0800, Marc MERLIN wrote:
> On Fri, Mar 11, 2016 at 04:35:21AM +, Eric Wheeler wrote:
> > Hello all,
> > 
> > We were having a discussion on the bcache list about the safest reboot 
> > options via sysrq here:
> >   http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3559/focus=3586
> > 
> > The result of the discussion ended up in a patch for sysrq-a to call
> > kernel_restart much in the same way as sysrq-o calls kernel_power_off.
> > 
> > Please comment on the patch and suggest any appropriate changes.  
> 
> Thanks Eric.
> 
> The quick rationale is that sysrq-b is not desirable to use if you're using
> bcache, or software raid since it will reboot without giving them a
> chance to properly sync their buffers and get into a clean state.
> 
> I've been using sysrq-o to get a clean shutdown, but of course that
> actually powers off the server, and you then need to rely on something
> like WOL to bring the machine back up, which isn't always easy or
> possible.
> 
> This new reboot with proper flushing (kernel_restart) allows for safe
> reboots that don't upset bcache or software raid.

Just updated to 4.6 and re-applied Eric's sysrq patch.
It's saved me many times already. I absolutely need to do clean reboots
for both my software raid and bcache, and when the system is not doing
well, sysrq-o does the graceful shutdown, but also powers off my server,
which is not what I want.
I've been using the new sysrq-x Eric wrote and it's been working great.
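
For reference, the stock sysrq actions discussed here can also be triggered
without a keyboard by writing the key to /proc/sysrq-trigger (as root). A
minimal sketch with a dry-run default; the helper name and the key table are
illustrative, and the patched graceful-reboot key from this thread is
deliberately not listed since it is not part of mainline.

```python
# Well-known mainline sysrq keys relevant to this discussion.
STOCK_KEYS = {
    "s": "sync all mounted filesystems",
    "u": "remount filesystems read-only",
    "b": "immediate reboot (no sync, no unmount)",
    "o": "power off",
}

def sysrq(key, trigger="/proc/sysrq-trigger", dry_run=True):
    """Write one sysrq key to the trigger file, or just describe it if dry_run."""
    if key not in STOCK_KEYS:
        raise ValueError(f"unknown or unlisted sysrq key: {key!r}")
    if dry_run:
        return f"would write {key!r} to {trigger} ({STOCK_KEYS[key]})"
    with open(trigger, "w") as f:
        f.write(key)
    return f"wrote {key!r} to {trigger}"
```

Calling sysrq("s") only reports what would happen; passing dry_run=False
performs the action, which for "b" and "o" takes the machine down immediately.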

Any chance we can get this into standard kernels? I can't be the only
person who benefits from this...
Any suggestion on who might be a good person to
review/critique/integrate this patch?

Thanks,
Marc


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-28 Thread Marc MERLIN
On Mon, Nov 28, 2016 at 08:23:15AM +0100, Michal Hocko wrote:
> Marc, could you try this patch please? I think it should be pretty clear
> it should help you but running it through your use case would be more
> than welcome before I ask Greg to take this to the 4.8 stable tree.
 
This will take a little while, the whole copy took 5 days to finish and I'm a
bit hesitant about blowing it away and starting over :)
Let me see if I can come up with maybe another disk array for another test.

For now, as a reminder, I'm running the attached patch, and it works fine.
I'll report back as soon as I can.

Marc
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a2214c64ed3c..9b3b3a79c58a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3347,17 +3347,24 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 					ac->nodemask) {
 		unsigned long available;
 		unsigned long reclaimable;
+		int check_order = order;
+		unsigned long watermark = min_wmark_pages(zone);
 
 		available = reclaimable = zone_reclaimable_pages(zone);
 		available -= DIV_ROUND_UP(no_progress_loops * available,
 					  MAX_RECLAIM_RETRIES);
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 
+		if (order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER) {
+			check_order = 0;
+			watermark += 1UL << order;
+		}
+
 		/*
 		 * Would the allocation succeed if we reclaimed the whole
 		 * available?
 		 */
-		if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
+		if (__zone_watermark_ok(zone, check_order, watermark,
 				ac_classzone_idx(ac), alloc_flags, available)) {
 			/*
 			 * If we didn't make any progress and have a lot of
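
For readers following the hunk above, here is a minimal Python sketch of the
parameter selection it adds to should_reclaim_retry(). This is a userspace
illustration only, not kernel code: watermark_check_params is a hypothetical
helper name, and PAGE_ALLOC_COSTLY_ORDER is assumed to be 3, its usual kernel
value. For a non-zero order up to PAGE_ALLOC_COSTLY_ORDER, the watermark test
is done as an order-0 test against a watermark raised by 2^order pages.

```python
# Assumed value; this is the usual definition in the kernel headers.
PAGE_ALLOC_COSTLY_ORDER = 3


def watermark_check_params(order, min_wmark_pages):
    """Return (check_order, watermark) as the patched hunk would pick them.

    Hypothetical helper: it mirrors only the few lines added by the diff,
    not the rest of should_reclaim_retry().
    """
    check_order = order
    watermark = min_wmark_pages
    if 0 < order <= PAGE_ALLOC_COSTLY_ORDER:
        # Non-costly high-order request: test as if it were order-0,
        # but ask for 2**order extra pages above the min watermark.
        check_order = 0
        watermark += 1 << order
    return check_order, watermark


if __name__ == "__main__":
    for order in (0, 2, 4):
        print(order, watermark_check_params(order, 1000))
```

Orders 0 and anything above PAGE_ALLOC_COSTLY_ORDER are left untouched by
this part of the patch, which matches the unchanged behaviour in the diff.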



Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-29 Thread Marc MERLIN
On Mon, Nov 28, 2016 at 08:23:15AM +0100, Michal Hocko wrote:
> Marc, could you try this patch please? I think it should be pretty clear
> it should help you but running it through your use case would be more
> than welcome before I ask Greg to take this to the 4.8 stable tree.

I ran it overnight and copied 1.4TB with it before it failed because
there wasn't enough disk space on the other side, so I think it fixes
the problem too.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-29 Thread Marc MERLIN
On Mon, Nov 28, 2016 at 08:23:15AM +0100, Michal Hocko wrote:
> Marc, could you try this patch please? I think it should be pretty clear
> it should help you but running it through your use case would be more
> than welcome before I ask Greg to take this to the 4.8 stable tree.
> 
> Thanks!
> 
> On Wed 23-11-16 07:34:10, Michal Hocko wrote:
> [...]
> > commit b2ccdcb731b666aa28f86483656c39c5e53828c7
> > Author: Michal Hocko 
> > Date:   Wed Nov 23 07:26:30 2016 +0100
> > 
> > mm, oom: stop pre-mature high-order OOM killer invocations
> > 
> > 31e49bfda184 ("mm, oom: protect !costly allocations some more for
> > !CONFIG_COMPACTION") was an attempt to reduce chances of pre-mature OOM
> > killer invocation for high order requests. It seemed to work for most
> > users just fine but it is far from bullet proof and obviously not
> > sufficient for Marc who has reported pre-mature OOM killer invocations
> > with 4.8 based kernels. 4.9 with all the compaction improvements seems
> > to be behaving much better but that would be too intrusive to backport
> > to 4.8 stable kernels. Instead this patch simply never declares OOM for
> > !costly high order requests. We rely on order-0 requests to do that in
> > case we are really out of memory. Order-0 requests are much more common
> > and so a risk of a livelock without any way forward is highly unlikely.
> > 
> > Reported-by: Marc MERLIN 
> > Signed-off-by: Michal Hocko 

Tested-by: Marc MERLIN 

Marc

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a2214c64ed3c..7401e996009a 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3161,6 +3161,16 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
> >  	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
> >  		return false;
> >  
> > +#ifdef CONFIG_COMPACTION
> > +	/*
> > +	 * This is a gross workaround to compensate a lack of reliable compaction
> > +	 * operation. We cannot simply go OOM with the current state of the compaction
> > +	 * code because this can lead to pre mature OOM declaration.
> > +	 */
> > +	if (order <= PAGE_ALLOC_COSTLY_ORDER)
> > +		return true;
> > +#endif
> > +
> >  	/*
> >  	 * There are setups with compaction disabled which would prefer to loop
> >  	 * inside the allocator rather than hit the oom killer prematurely.
> > -- 
> > Michal Hocko
> > SUSE Labs
> 
> -- 
> Michal Hocko
> SUSE Labs
> 
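
The early returns in the quoted should_compact_retry() hunk can be modelled
roughly as below. This is a hedged userspace sketch in Python, not the kernel
function: compact_retry_early_decision and its compaction_configured flag are
hypothetical names standing in for the CONFIG_COMPACTION #ifdef, and
PAGE_ALLOC_COSTLY_ORDER is assumed to be 3. It returns None where the real
function would fall through to the code that follows the hunk.

```python
# Assumed value; this is the usual definition in the kernel headers.
PAGE_ALLOC_COSTLY_ORDER = 3


def compact_retry_early_decision(order, compaction_configured=True):
    """Model only the early returns visible in the quoted hunk.

    Returns False (do not retry), True (keep retrying), or None when the
    real function would continue into code not shown in the hunk.
    """
    if order == 0 or order > PAGE_ALLOC_COSTLY_ORDER:
        return False
    if compaction_configured and order <= PAGE_ALLOC_COSTLY_ORDER:
        # The workaround: never give up on non-costly high-order requests.
        return True
    return None


if __name__ == "__main__":
    for order in range(0, 5):
        print(order, compact_retry_early_decision(order))
```

With compaction configured, every order from 1 to PAGE_ALLOC_COSTLY_ORDER
keeps retrying, which is how the commit message's "never declares OOM for
!costly high order requests" reads in code.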

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-29 Thread Marc MERLIN
On Tue, Nov 29, 2016 at 05:07:51PM +0100, Michal Hocko wrote:
> On Tue 29-11-16 07:55:37, Marc MERLIN wrote:
> > On Mon, Nov 28, 2016 at 08:23:15AM +0100, Michal Hocko wrote:
> > > Marc, could you try this patch please? I think it should be pretty clear
> > > it should help you but running it through your use case would be more
> > > than welcome before I ask Greg to take this to the 4.8 stable tree.
> > 
> > I ran it overnight and copied 1.4TB with it before it failed because
> > there wasn't enough disk space on the other side, so I think it fixes
> > the problem too.
> 
> Can I add your Tested-by?

Done.

Now, probably unrelated, but hard to be sure, doing those big copies
causes massive hangs on my system. I hit a few of the 120s hangs,
but more generally lots of things hang, including shells, my DNS server,
monitoring reading from USB and timing out, and so forth.
Examples below. 
I have a hard time telling what is at fault, but is there a chance it
might be memory allocation pressure?
I already have a preempt kernel, so I can't make it more preempt than
that.
Now, to be fair, this is not a new problem, it's just varying degrees of
bad and usually only happens when I do a lot of I/O with btrfs.
That said, btrfs may very well just be suffering from memory allocation
issues and hanging as a result, with everything else on my system also
hanging for similar reasons until the memory pressure goes away when the
copy or scrub finishes.

What do you think?

[28034.954435] INFO: task btrfs:5618 blocked for more than 120 seconds.
[28034.975471]   Tainted: G U  4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
[28035.000964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[28035.025429] btrfs   D 91154d33fc70 0  5618   5372 0x0080
[28035.047717]  91154d33fc70 00200246 911842f880c0 9115a4cf01c0
[28035.071020]  91154d33fc58 91154d34 91165493bca0 9115623773f0
[28035.094252]  1000 0001 91154d33fc88 b86cf1a6
[28035.117538] Call Trace:
[28035.125791]  [] schedule+0x8b/0xa3
[28035.141550]  [] btrfs_start_ordered_extent+0xce/0x122
[28035.162457]  [] ? wake_up_atomic_t+0x2c/0x2c
[28035.180891]  [] btrfs_wait_ordered_range+0xa9/0x10d
[28035.201723]  [] btrfs_truncate+0x40/0x24b
[28035.219269]  [] btrfs_setattr+0x1da/0x2d7
[28035.237032]  [] notify_change+0x252/0x39c
[28035.254566]  [] do_truncate+0x81/0xb4
[28035.271057]  [] vfs_truncate+0xd9/0xf9
[28035.287782]  [] do_sys_truncate+0x63/0xa7

I get other hangs like:

[10338.968912] perf: interrupt took too long (3927 > 3917), lowering kernel.perf_event_max_sample_rate to 50750

[12971.047705] ftdi_sio ttyUSB15: usb_serial_generic_read_bulk_callback - urb stopped: -32

[17761.122238] usb 4-1.4: USB disconnect, device number 39
[17761.141063] usb 4-1.4: usbfs: USBDEVFS_CONTROL failed cmd hub-ctrl rqt 160 rq 6 len 1024 ret -108
[17761.263252] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd
[17761.938575] usb 4-1.4: new SuperSpeed USB device number 40 using xhci_hcd

[24130.574425] hpet1: lost 2306 rtc interrupts
[24156.034950] hpet1: lost 1628 rtc interrupts
[24173.314738] hpet1: lost 1104 rtc interrupts
[24180.129950] hpet1: lost 436 rtc interrupts
[24257.557955] hpet1: lost 4954 rtc interrupts
[24267.522656] hpet1: lost 637 rtc interrupts

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-29 Thread Marc MERLIN
Thanks for the reply and suggestions.

On Tue, Nov 29, 2016 at 09:07:03AM -0800, Linus Torvalds wrote:
> On Tue, Nov 29, 2016 at 8:34 AM, Marc MERLIN  wrote:
> > Now, to be fair, this is not a new problem, it's just varying degrees of
> > bad and usually only happens when I do a lot of I/O with btrfs.
> 
> One situation where I've seen something like this happen is
> 
>  (a) lots and lots of dirty data queued up
>  (b) horribly slow storage

In my case, it is a 5x 4TB HDD with 
software raid 5 < bcache < dmcrypt < btrfs
bcache is currently half disabled (as in I removed the actual cache);
otherwise too many bcache requests pile up, and the kernel dies when too
many workqueues have piled up.
I'm just kind of worried that since I'm going through 4 subsystems
before my data can hit disk, that's a lot of memory allocations and
places where data can accumulate and cause bottlenecks if the next
subsystem isn't as fast.

But this shouldn't be "horribly slow", should it? (it does copy a few
terabytes per day, not fast, but not horrible, about 30MB/s or so)

> Sadly, our defaults for "how much dirty data do we allow" are somewhat
> buggered. The global defaults are in "percent of memory", and are
> generally _much_ too high for big-memory machines:
> 
> [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_ratio
> 20
> [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_background_ratio
> 10

I can confirm I have the same.

> says that it only starts really throttling writes when you hit 20% of
> all memory used. You don't say how much memory you have in that
> machine, but if it's the same one you talked about earlier, it was
> 24GB. So you can have 4GB of dirty data waiting to be flushed out.

Correct, 24GB and 4GB.

> And we *try* to do this per-device backing-dev congestion thing to
> make things work better, but it generally seems to not work very well.
> Possibly because of inconsistent write speeds (ie _sometimes_ the SSD
> does really well, and we want to open up, and then it shuts down).
> 
> One thing you can try is to just make the global limits much lower. As in
> 
>echo 2 > /proc/sys/vm/dirty_ratio
>echo 1 > /proc/sys/vm/dirty_background_ratio

I will give that a shot, thank you.
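
To put rough numbers on the above, here is a small Python sketch of the
arithmetic. It makes simplifying assumptions: the percentages are applied to
the full 24GB, whereas the kernel applies them to available (free plus
reclaimable) memory, so these are upper bounds; the 30MB/s figure is the
sustained write rate quoted earlier in this message; dirty_limits and
drain_seconds are hypothetical helper names.

```python
GB = 1000 ** 3
MB = 1000 ** 2


def dirty_limits(mem_bytes, dirty_ratio, dirty_background_ratio):
    """Approximate (throttle, background) thresholds in bytes.

    Simplification: uses total memory, not the kernel's "available" figure.
    """
    return (mem_bytes * dirty_ratio // 100,
            mem_bytes * dirty_background_ratio // 100)


def drain_seconds(dirty_bytes, write_bytes_per_sec):
    """Time to flush dirty_bytes at a steady write rate."""
    return dirty_bytes / write_bytes_per_sec


if __name__ == "__main__":
    for ratio, background in ((20, 10), (2, 1)):
        limit, bg = dirty_limits(24 * GB, ratio, background)
        print(f"dirty_ratio={ratio}: up to {limit / GB:.2f} GB dirty, "
              f"about {drain_seconds(limit, 30 * MB):.0f} s to drain at 30 MB/s")
```

Under these assumptions the defaults allow up to about 4.8GB of dirty data,
which would take on the order of 160 seconds to flush at 30MB/s, longer than
the 120 second hung task threshold. With 2 and 1 the ceiling drops to roughly
0.48GB, or about 16 seconds.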

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-29 Thread Marc MERLIN
On Tue, Nov 29, 2016 at 09:40:19AM -0800, Marc MERLIN wrote:
> Thanks for the reply and suggestions.
> 
> On Tue, Nov 29, 2016 at 09:07:03AM -0800, Linus Torvalds wrote:
> > On Tue, Nov 29, 2016 at 8:34 AM, Marc MERLIN  wrote:
> > > Now, to be fair, this is not a new problem, it's just varying degrees of
> > > bad and usually only happens when I do a lot of I/O with btrfs.
> > 
> > One situation where I've seen something like this happen is
> > 
> >  (a) lots and lots of dirty data queued up
> >  (b) horribly slow storage
> 
> In my case, it is a 5x 4TB HDD with 
> software raid 5 < bcache < dmcrypt < btrfs
> bcache is currently half disabled (as in I removed the actual cache);
> otherwise too many bcache requests pile up, and the kernel dies when too
> many workqueues have piled up.
> I'm just kind of worried that since I'm going through 4 subsystems
> before my data can hit disk, that's a lot of memory allocations and
> places where data can accumulate and cause bottlenecks if the next
> subsystem isn't as fast.
> 
> But this shouldn't be "horribly slow", should it? (it does copy a few
> terabytes per day, not fast, but not horrible, about 30MB/s or so)
> 
> > Sadly, our defaults for "how much dirty data do we allow" are somewhat
> > buggered. The global defaults are in "percent of memory", and are
> > generally _much_ too high for big-memory machines:
> > 
> > [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_ratio
> > 20
> > [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_background_ratio
> > 10
> 
> I can confirm I have the same.
> 
> > says that it only starts really throttling writes when you hit 20% of
> > all memory used. You don't say how much memory you have in that
> > machine, but if it's the same one you talked about earlier, it was
> > 24GB. So you can have 4GB of dirty data waiting to be flushed out.
> 
> Correct, 24GB and 4GB.
> 
> > And we *try* to do this per-device backing-dev congestion thing to
> > make things work better, but it generally seems to not work very well.
> > Possibly because of inconsistent write speeds (ie _sometimes_ the SSD
> > does really well, and we want to open up, and then it shuts down).
> > 
> > One thing you can try is to just make the global limits much lower. As in
> > 
> >echo 2 > /proc/sys/vm/dirty_ratio
> >echo 1 > /proc/sys/vm/dirty_background_ratio
> 
> I will give that a shot, thank you.

And, after 5H of copying, not a single hang, or USB disconnect, or anything.
Obviously this seems to point to other problems in the code, and I have no
idea which layer is the culprit here, but reducing the buffers absolutely
helped a lot.

Thanks much,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-30 Thread Marc MERLIN
On Tue, Nov 29, 2016 at 10:01:10AM -0800, Linus Torvalds wrote:
> On Tue, Nov 29, 2016 at 9:40 AM, Marc MERLIN  wrote:
> >
> > In my case, it is a 5x 4TB HDD with
> > software raid 5 < bcache < dmcrypt < btrfs
> 
> It doesn't sound like the nasty situations I have seen (particularly
> with large USB flash storage - often high momentary speed for
> benchmarks, but slows down to a crawl after you've written a bit to
> it, and doesn't have the smart garbage collection that modern "real"
> SSDs have).

I gave it some more thought, and I think it is exactly the nasty situation you
described.
bcache takes I/O quickly while sending to SSD cache. SSD fills up, now
bcache can't handle IO as quickly and has to hang until the SSD has been
flushed to spinning rust drives.
This actually is exactly the same as filling up the cache on a USB key
and now you're waiting for slow writes to flash, is it not?
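
As a toy model of that behaviour (not a description of how bcache actually
schedules writeback), the sketch below estimates how long a write-back cache
absorbs writes at full speed before it fills and throughput drops to the
backing store's rate. The function names and the cache size and ingest rate
in the example are made-up illustrative values; only the 30MB/s drain rate
comes from this thread.

```python
def seconds_until_cache_full(cache_bytes, ingest_bps, drain_bps):
    """Time for an initially empty write-back cache to fill.

    Returns None if the backing store drains at least as fast as data
    arrives, in which case the cache never fills in this model.
    """
    if ingest_bps <= drain_bps:
        return None
    return cache_bytes / (ingest_bps - drain_bps)


def sustained_rate(ingest_bps, drain_bps, cache_full):
    """Rate the writer sees: full speed until the cache is full."""
    return min(ingest_bps, drain_bps) if cache_full else ingest_bps


if __name__ == "__main__":
    MB = 1000 ** 2
    GB = 1000 ** 3
    # Hypothetical numbers: 100 GB cache, 200 MB/s incoming writes.
    t = seconds_until_cache_full(100 * GB, 200 * MB, 30 * MB)
    print(f"cache full after about {t:.0f} s")
    print("rate once full:", sustained_rate(200 * MB, 30 * MB, True) // MB, "MB/s")
```

The point of the model is only that once the fast tier is full, the writer is
effectively throttled to the slow tier's speed, the same shape as the USB
flash case.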

With your dirty ratio workaround, I was able to re-enable bcache and
have it not fall over, but only barely. I recorded over a hundred
workqueues in flight during the copy at some point (just not enough
to actually kill the kernel this time).

I've started a bcache followup on this here:
http://marc.info/?l=linux-bcache&m=148052441423532&w=2
http://marc.info/?l=linux-bcache&m=148052620524162&w=2

This message shows the huge pileup of workqueues in bcache
just before the kernel dies with
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
task: 9ee0c2fa4180 task.stack: 9ee0c2fa8000
RIP: 0010:[]  [] cpuidle_enter_state+0x119/0x171
RSP: :9ee0c2fabea0  EFLAGS: 0246
RAX: 9ee0de3d90c0 RBX: 0004 RCX: 001f
RDX:  RSI: 0007 RDI: 
RBP: 9ee0c2fabed0 R08: 0f92 R09: 0f42
R10: 9ee0c2fabe50 R11: 071c71c71c71c71c R12: e047bfdcb200
R13: 0af626899577 R14: 0004 R15: 0af6264cc557
FS:  () GS:9ee0de3c() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0898b000 CR3: 00045cc06000 CR4: 001406e0
Stack:
 0f40 e047bfdcb200 bbccc060 9ee0c2fac000
 9ee0c2fa8000 9ee0c2fac000 9ee0c2fabee0 bb57a1ac
 9ee0c2fabf30 bb09238d 9ee0c2fa8000 00070004
Call Trace:
 [] cpuidle_enter+0x17/0x19
 [] cpu_startup_entry+0x210/0x28b
 [] start_secondary+0x13e/0x140
Code: 00 00 00 48 c7 c7 cd ae b2 bb c6 05 4b 8e 7a 00 01 e8 17 6c ae ff fa 66 0f 1f 44 00 00 31 ff e8 75 60 b4 44 00 00 <4c> 89 e8 b9 e8 03 00 00 4c 29 f8 48 99 48 f7 f9 ba ff ff ff 7f
Kernel panic - not syncing: Hard LOCKUP

A full traceback showing the pileup of requests is here:
http://marc.info/?l=linux-bcache&m=147949497808483&w=2

and there:
http://pastebin.com/rJ5RKUVm
(2 different ones but mostly the same result)

We can probably follow up on the bcache thread I Cc'ed you on since I'm
not sure if the fault here lies with bcache or the VM subsystem anymore.

Thanks.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-30 Thread Marc MERLIN
On Wed, Nov 30, 2016 at 10:14:50AM -0800, Linus Torvalds wrote:
> Anyway, none of this seems new per se. I'm adding Kent and Jens to the
> cc (Tejun already was), in the hope that maybe they have some idea how
> to control the nasty worst-case behavior wrt workqueue lockup (it's
> not really a "lockup", it looks like it's just hundreds of workqueues
> all waiting for IO to complete and much too deep IO queues).
 
I'll take your word for it, all I got in the end was
Kernel panic - not syncing: Hard LOCKUP
and the system stone dead when I woke up hours later.

> And I think your NMI watchdog then turns the "system is no longer
> responsive" into an actual kernel panic.

Ah, I see.

Thanks for the reply, and sorry for bringing in that separate thread
from the btrfs mailing list, which effectively was a suggestion similar
to what you're saying here too.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2016-11-21 Thread Marc MERLIN
On Mon, Nov 21, 2016 at 10:50:20PM +0100, Vlastimil Babka wrote:
> > 4.9rc5 however seems to be doing better, and is still running after 18
> > hours. However, I got a few page allocation failures as per below, but the
> > system seems to recover.
> > Vlastimil, do you want me to continue the copy on 4.9 (may take 3-5 days) 
> > or is that good enough, and i should go back to 4.8.8 with that patch 
> > applied?
> > https://marc.info/?l=linux-mm&m=147423605024993
> 
> Hi, I think it's enough for 4.9 for now and I would appreciate trying
> 4.8 with that patch, yeah.

So the good news is that it's been running for almost 5H and so far so good.

> The failures below are in a GFP_NOWAIT context, which cannot do any
> reclaim so it's not affected by OOM rewrite. If it's a regression, it
> has to be caused by something else. But it seems the code in
> cfq_get_queue() intentionally doesn't want to reclaim or use any atomic
> reserves, and has a fallback scenario for allocation failure, in which
> case I would argue that it should add __GFP_NOWARN, as these warnings
> can't help anyone. CCing Tejun as author of commit d4aad7ff0.

No, that's not a regression, I get those on occasion. The good news is that
they're not fatal. Just got another one with 4.8.8.
No idea if they're actual errors I should worry about, or just warnings that
spam the console a bit, but things retry, recover and succeed, so I can ignore
them.

Another one from 4.8.8 below. I'll report back tomorrow to see if this has run
for a day and if so, I'll call your patch a fix for my problem (but at this
point, it's already looking very good).

Thanks, Marc

cron: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
CPU: 4 PID: 9748 Comm: cron Tainted: G U  4.8.8-amd64-volpreempt-sysrq-20161108vb2 #9
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
  a1e37429f6d0 9a36a0bb 
  a1e37429f768 9a1359d4 022040009f5e8d00
 0012   9a140770
Call Trace:
 [] dump_stack+0x61/0x7d
 [] warn_alloc_failed+0x11c/0x132
 [] ? wakeup_kswapd+0x8e/0x153
 [] __alloc_pages_nodemask+0x87b/0xb02
 [] ? __alloc_pages_nodemask+0x87b/0xb02
 [] cache_grow_begin+0xb2/0x30b
 [] fallback_alloc+0x137/0x19f
 [] cache_alloc_node+0xd3/0xde
 [] kmem_cache_alloc_node+0x8e/0x163
 [] cfq_get_queue+0x162/0x29d
 [] ? kmem_cache_alloc+0xd7/0x14b
 [] ? slab_post_alloc_hook+0x5b/0x66
 [] cfq_set_request+0x141/0x2be
 [] ? timekeeping_get_ns+0x1e/0x32
 [] ? ktime_get+0x41/0x52
 [] ? ktime_get_ns+0x9/0xb
 [] ? cfq_init_icq+0x12/0x19
 [] elv_set_request+0x1f/0x24
 [] get_request+0x324/0x5aa
 [] ? wake_up_atomic_t+0x2c/0x2c
 [] blk_queue_bio+0x19f/0x28c
 [] generic_make_request+0xbd/0x160
 [] submit_bio+0x100/0x11d
 [] ? map_swap_page+0x12/0x14
 [] ? get_swap_bio+0x57/0x6c
 [] swap_readpage+0x110/0x118
 [] read_swap_cache_async+0x26/0x2d
 [] swapin_readahead+0x11a/0x16a
 [] do_swap_page+0x9c/0x431
 [] ? do_swap_page+0x9c/0x431
 [] handle_mm_fault+0xa4d/0xb3d
 [] ? vfs_getattr_nosec+0x26/0x37
 [] __do_page_fault+0x267/0x43d
 [] do_page_fault+0x25/0x27
 [] page_fault+0x28/0x30
Mem-Info:
active_anon:532194 inactive_anon:133376 isolated_anon:0
 active_file:4118244 inactive_file:382010 isolated_file:0
 unevictable:1687 dirty:3502 writeback:386111 unstable:0
 slab_reclaimable:41767 slab_unreclaimable:106595
 mapped:512496 shmem:582026 pagetables:5352 bounce:0
 free:92092 free_pcp:176 free_cma:2072
Node 0 active_anon:2128776kB inactive_anon:533504kB active_file:16472976kB inactive_file:1528040kB unevictable:6748kB isolated(anon):0kB isolated(file):0kB mapped:2049984kB dirty:14008kB writeback:154kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2328104kB writeback_tmp:0kB unstable:0kB pages_scanned:1 all_unreclaimable? no
Node 0 DMA free:15884kB min:168kB low:208kB high:248kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15892kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 3200 23767 23767 23767
Node 0 DMA32 free:117580kB min:35424kB low:44280kB high:53136kB active_anon:3980kB inactive_anon:400kB active_file:2632672kB inactive_file:286956kB unevictable:0kB writepending:288296kB present:3362068kB managed:3296500kB mlocked:0kB slab_reclaimable:41632kB slab_unreclaimable:19512kB kernel_stack:880kB pagetables:676kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 20567 20567 20567
Node 0 Normal free:234904kB min:226544kB low:283180kB high:339816kB active_anon:2124796kB inactive_anon:533104kB active_file:13840304kB inactive_file:1241268kB unevictable:6748kB writepending:1270156kB present:21485568kB managed:21080636kB mlocked:6748kB slab_reclaimable:125436kB sl

Re: [PATCH] objtool: fix CONFIG_STACK_VALIDATION warning for out-of-tree modules

2017-02-15 Thread Marc MERLIN
On Wed, Feb 15, 2017 at 12:21:17PM -0600, Josh Poimboeuf wrote:
> When building a CONFIG_STACK_VALIDATION enabled kernel without the
> libelf devel package installed, the Makefile prints a warning:
> 
>   "Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
> 
> But when building an out-of-tree module, the warning doesn't show.
> Instead it tries to use objtool, and the build fails with:
> 
>   /bin/sh: ./tools/objtool/objtool: No such file or directory
> 
> Make sure the warning and the disabling of objtool occur in all cases,
> by moving the CONFIG_STACK_VALIDATION checks outside the 'ifeq
> ($(KBUILD_EXTMOD),)' block in the Makefile.
> 
> Reported-by: Marc MERLIN 
> Suggested-by: Jessica Yu 
> Fixes: 3b27a0c85d70 ("objtool: Detect and warn if libelf is missing and don't break the build")
> Signed-off-by: Josh Poimboeuf 

Tested-by: Marc MERLIN 

saruman:/usr/src/linux-block# dpkg --remove libelf-dev
saruman:/usr/src/linux-block/tools/objtool# make clean
saruman:/usr/src/linux-block# dkms install  bbswitch/0.8

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
make -j8 KERNELRELEASE=4.10.0-rc7-mm3kb1+ KVERSION=4.10.0-rc7-mm3kb1+...(bad exit status: 2)
Error! Bad return status for module build on kernel: 4.10.0-rc7-mm3kb1+ (x86_64)

saruman:/usr/src/linux-block# patch -p1 -s < objtool.patch 

saruman:/usr/src/linux-block# dkms install  bbswitch/0.8

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
make -j8 KERNELRELEASE=4.10.0-rc7-mm3kb1+ KVERSION=4.10.0-rc7-mm3kb1+...
cleaning build area...

DKMS: build completed.

bbswitch.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.10.0-rc7-mm3kb1+/updates/dkms/

depmod...

DKMS: install completed.


All good, thank you.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Please turn "Cannot use CONFIG_STACK_VALIDATION" into build error

2017-02-13 Thread Marc MERLIN
Hi Josh,

I'll start with the story as to why.
I've lost more hours than I care to list, because I was unable to build
the virtualbox kernel driver with newer kernels.
Sadly, it gives no useful debug info outside of
make[1]: *** No rule to make target '/tmp/vbox.0/linux/SUPDrv-linux.o', needed by '/tmp/vbox.0/vboxdrv.o'.  Stop.

It took some pretty deep debugging to finally see this:
 Trying rule prerequisite 'tools/objtool/objtool'.
 Looking for a rule with intermediate file 'tools/objtool/objtool'.
  Avoiding implicit rule recursion.
which look quite innocuous and don't look like errors at all.
When I filed a bug with the vbox folks, they were unable to find out why
the module refused to build on my kernel, and I was stuck with older
kernels as a result.

Then, I had another module, bbswitch, to turn off the nvidia chip on my
laptop to save battery. That one also failed to build with newer
kernels, but thankfully made it more clear that the problem was related
to tools/objtool/objtool missing.

But why was it missing? No idea...
I traced that down to CONFIG_STACK_VALIDATION, which there seems to be no
menu option for, so I manually disabled it in .config, rebuilt, and it was
automatically re-enabled. Gah.

More hair pulling, and finally I made a typo:
saruman:/usr/src/linux-block# make xonfig
Makefile:1044: "Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
scripts/kconfig/conf  --silentoldconfig Kconfig
Makefile:1044: "Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
make: *** No rule to make target 'xonfig'.  Stop.

Sure enough, this was my problem, but I never saw the error message
because I build kernels with 
make-kpkg --revision 1gandalf kernel-image
which does other stuff and hid that warning, which really should have
been a fatal error in my opinion.

Given that 
1) CONFIG_STACK_VALIDATION seems silently auto enabled.
2) without libelf-dev, the kernel will build but will leave a tree
missing objtool, which in turn causes (all?) 3rd party modules to fail
building.
3) and that it's kind of non trivial to find out why if that happens, 

Would you consider making
"Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
a build error as opposed to a warning?
This sure would have saved me countless hours of debugging the wrong
things.
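
In the meantime, the failure mode in points 1) to 3) can be detected before
attempting a third party module build. The following is a minimal Python
sketch under stated assumptions: it only looks for the literal line
CONFIG_STACK_VALIDATION=y in the tree's .config and for the file
tools/objtool/objtool, both taken from this message; the function names are
hypothetical and this is not part of kbuild or dkms.

```python
from pathlib import Path


def stack_validation_enabled(config_text):
    """True if the .config text enables CONFIG_STACK_VALIDATION."""
    return any(line.strip() == "CONFIG_STACK_VALIDATION=y"
               for line in config_text.splitlines())


def objtool_missing(kernel_tree):
    """True if the tree wants stack validation but has no objtool binary.

    Returns False when there is no .config or the option is off, since
    objtool is then not needed for module builds.
    """
    tree = Path(kernel_tree)
    config = tree / ".config"
    if not config.is_file():
        return False
    if not stack_validation_enabled(config.read_text()):
        return False
    return not (tree / "tools" / "objtool" / "objtool").is_file()


if __name__ == "__main__":
    import sys
    tree = sys.argv[1] if len(sys.argv) > 1 else "."
    if objtool_missing(tree):
        print("CONFIG_STACK_VALIDATION=y but tools/objtool/objtool is missing;")
        print("check that the libelf development package was installed.")
```

Run against the kernel source or build tree (for example /usr/src/linux-block
here) before building out-of-tree modules.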

Thank you
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: Please turn "Cannot use CONFIG_STACK_VALIDATION" into build error

2017-02-13 Thread Marc MERLIN
On Mon, Feb 13, 2017 at 12:41:06PM -0600, Josh Poimboeuf wrote:
> Hm, that doesn't sound right.  Nothing automatically enables
> CONFIG_STACK_VALIDATION.  It should be disabled unless manually enabled.
> Maybe you got it confused with CONFIG_HAVE_STACK_VALIDATION, which is
> always enabled?
 
I did mean CONFIG_STACK_VALIDATION, which is what requires objtool.
It's very possible I enabled it myself during a make oldconfig some time
back, what I meant is that disabling it from the config file doesn't make it
go away, it comes back on its own (see below)

> BTW, there is a config option for it in the menu:
> 
>   Kernel hacking
> Compile-time checks and compiler options
>   Compile-time stack metadata validation
 
Thanks, I had a hard time finding it since it was not in the same place as
the other options around it. To be honest, I never quite know how to find
where a .config option is located in an xconfig menu, so I looked around
other ones above and below it in .config, and turns out it was the wrong
place.

Anyway, after not finding it in xconfig, I edited .config, and did:
# CONFIG_STACK_VALIDATION is not set
save .config
and the next build re-enabled the option.
That's what caught me by surprise. Did I do something wrong, or is there an
issue there?

> > 2) without libelf-dev, the kernel will build but will leave a tree
> > missing objtool, which in turn causes (all?) 3rd party modules to fail
> > building.
> 
> Yes, this is a bug.
 
Obviously the fix is to make sure objtool builds, but is there a way to make
things better if it doesn't build?
(apparently yes, as you replied below)

> Correct me if I'm wrong, but it sounds like make-kpkg suppressed stderr?
> If so, that should be fixed.
 
It does not, but it adds lines of output before the build starts, and since
the error with libelf-dev missing is not colorized, it was effectively
invisible (one line amongst hundreds scrolling on the screen).
Now that I know what the error is and how to look for it, I can see it, but
as a diagnosis that things were wrong and that things should be fixed, or
3rd party modules would fail to build in weird ways, it was unfortunately
useless.

> When I try to build an OOT module with CONFIG_STACK_VALIDATION enabled
> and elfutils-libelf-devel missing (on Fedora), I get:
> 
>   make: Entering directory '/home/jpoimboe/git/linux'
>   make[1]: Entering directory '/home/jpoimboe/ktest/output'
> CC [M]  /home/jpoimboe/livepatch-test/1/livepatch2.o
>   /bin/sh: ./tools/objtool/objtool: No such file or directory
>   /home/jpoimboe/git/linux/scripts/Makefile.build:300: recipe for target '/home/jpoimboe/livepatch-test/1/livepatch2.o' failed
>   make[2]: *** [/home/jpoimboe/livepatch-test/1/livepatch2.o] Error 1
>   /home/jpoimboe/git/linux/Makefile:1490: recipe for target '_module_/home/jpoimboe/livepatch-test/1' failed
>   make[1]: *** [_module_/home/jpoimboe/livepatch-test/1] Error 2
>   make[1]: Leaving directory '/home/jpoimboe/ktest/output'
>   Makefile:150: recipe for target 'sub-make' failed
>   make: *** [sub-make] Error 2
>   make: Leaving directory '/home/jpoimboe/git/linux'
> 
> It's not a perfect error message, but the
>   '/bin/sh: ./tools/objtool/objtool: No such file or directory'
> is at least a big clue.  I'm curious why you didn't see that.
 
In the virtualbox build, it just doesn't show up at all, even in the debug
log :(
It's only after spending many many hours trying to find why virtualbox was
not working, that I realized that my bbswitch module wasn't building either,
and that one did point to objtool as a culprit.
But even after I found this, it was non trivial to link this to libelf-dev
missing, given that the message wasn't that visible in a kernel build.

> Anyway, the above libelf-dev warning is just a warning and not a build
> error because CONFIG_STACK_VALIDATION is enabled for allyesconfig, and
> it's not a severe enough problem to warrant breaking the build.

Understood.

> Ideally the same warning should be printed when building OOT modules.
> I'll try to figure out if there's a way to do that.

This would help, although in that case you can even make the warning an
error since objtool missing seems to be fatal?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: Please turn "Cannot use CONFIG_STACK_VALIDATION" into build error

2017-02-13 Thread Marc MERLIN
On Mon, Feb 13, 2017 at 04:00:02PM -0600, Josh Poimboeuf wrote:
> On Mon, Feb 13, 2017 at 01:31:32PM -0800, Marc MERLIN wrote:
> > Anyway, after not finding it in xconfig, I edited .config, and did:
> > # CONFIG_STACK_VALIDATION is not set
> > save .config
> > and the next build re-enabled the option.
> > That's what caught me by surprise. Did I do something wrong, or is there an
> > issue there?
> 
> I really don't see how it would be possible for it to come back by
> itself, as it's disabled by default, and no other options select it.
> When I remove it, it stays disabled.
 
Mmmh, you are correct.
I have no idea why/how it got re-enabled yesterday. I'm not seeing this
again today.
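
In case it helps anyone else chasing an option that seems to come back, here
is a minimal sketch of how to check the state of an option in a .config file.
It relies only on the two forms Kconfig writes ("CONFIG_FOO=value" and
"# CONFIG_FOO is not set"). The function name kconfig_state and the sample
text are made up for illustration.

```python
import re


def kconfig_state(config_text, option):
    """Return the option's value ('y', 'm', ...), 'not set', or None if absent.

    If the option appears more than once, the last occurrence wins.
    """
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_line = "# %s is not set" % option
    state = None
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset_line:
            state = "not set"
            continue
        match = set_re.match(line)
        if match:
            state = match.group(1)
    return state


# Hypothetical two-line excerpt of a .config file.
sample = """\
CONFIG_COMPACTION=y
# CONFIG_STACK_VALIDATION is not set
"""

print(kconfig_state(sample, "CONFIG_STACK_VALIDATION"))
```

Running it against the real .config before and after a build would show
whether the build itself flips the option.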

> > This would help, although in that case you can even make the warning an
> > error since objtool missing seems to be fatal?
> 
> It doesn't need to be fatal though.  It should just be a warning and the
> build should succeed, like it does when building the kernel.

Agreed, that would be even better.

Thanks for looking at that.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: [PATCH-RFC]: sysrq-a: graceful reboot via kernel_restart(), similar to sysrq-o

2016-03-10 Thread Marc MERLIN
On Fri, Mar 11, 2016 at 04:35:21AM +0000, Eric Wheeler wrote:
> Hello all,
> 
> We were having a discussion on the bcache list about the safest reboot 
> options via sysrq here:
>   http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3559/focus=3586
> 
> The result of the discussion ended up in a patch for sysrq-a to call 
> kernel_restart much in the same way as sysrq-o calls kernel_power_off.
> 
> Please comment on the patch and suggest any appropriate changes.  

Thanks Eric.

The quick rationale is that sysrq-b is not desirable if you're using
bcache or software raid, since it reboots immediately without giving them
a chance to properly sync their buffers and get into a clean state.

I've been using sysrq-o to get a clean shutdown, but of course that
actually powers off the server, and you then need to rely on something
like WOL to bring the machine back up, which isn't always easy or
possible.

This new reboot with proper flushing (kernel_restart) allows for safe
reboots that don't upset bcache or software raid.
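
For anyone who wants to script this, here is a small sketch of triggering a
sysrq action through /proc/sysrq-trigger instead of the keyboard. The 's',
'u', 'b' and 'o' keys are the standard ones; 'a' is only what this RFC patch
proposes and is an assumption here, not something an unpatched kernel does.
The helper name trigger_sysrq and the dry_run default are made up for
illustration, and a real write needs root.

```python
SYSRQ_ACTIONS = {
    "s": "sync all mounted filesystems",
    "u": "remount all filesystems read-only",
    "b": "reboot immediately, without syncing or unmounting",
    "o": "power off (kernel_power_off)",
    # Assumption: only valid on a kernel carrying the RFC patch discussed here.
    "a": "PROPOSED: reboot via kernel_restart()",
}


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger", dry_run=True):
    """Describe a sysrq action and, unless dry_run is set, request it.

    The kernel interface takes a single character written to the trigger file.
    """
    if key not in SYSRQ_ACTIONS:
        raise ValueError("unknown sysrq key: %r" % key)
    if not dry_run:
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
    return SYSRQ_ACTIONS[key]


# Dry run only: prints what would be requested without touching /proc.
print(trigger_sysrq("o"))
```

Keeping dry_run=True by default is deliberate, since a stray call with 'b' or
'o' takes the machine down on the spot.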

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)

2020-09-06 Thread Marc MERLIN
port 0000:00:01.0: saving config space at offset 0x20 (reading 0xce00cd00)
[6.724050] pcieport 0000:00:01.0: saving config space at offset 0x24 (reading 0xb1f1a001)
[6.724054] pcieport 0000:00:01.0: saving config space at offset 0x28 (reading 0x0)
[6.724058] pcieport 0000:00:01.0: saving config space at offset 0x2c (reading 0x0)
[6.724062] pcieport 0000:00:01.0: saving config space at offset 0x30 (reading 0x0)
[6.724066] pcieport 0000:00:01.0: saving config space at offset 0x34 (reading 0x88)
[6.724070] pcieport 0000:00:01.0: saving config space at offset 0x38 (reading 0x0)
[6.724074] pcieport 0000:00:01.0: saving config space at offset 0x3c (reading 0x201ff)
[6.724129] pcieport 0000:00:1b.0: runtime IRQ mapping not provided by arch
[6.724650] pcieport 0000:00:1b.0: PME: Signaling with IRQ 123
[6.725021] pcieport 0000:00:1b.0: saving config space at offset 0x0 (reading 0xa3408086)
[6.725026] pcieport 0000:00:1b.0: saving config space at offset 0x4 (reading 0x100407)
[6.725031] pcieport 0000:00:1b.0: saving config space at offset 0x8 (reading 0x60400f0)
[6.725035] pcieport 0000:00:1b.0: saving config space at offset 0xc (reading 0x81)
[6.725040] pcieport 0000:00:1b.0: saving config space at offset 0x10 (reading 0x0)
[6.725044] pcieport 0000:00:1b.0: saving config space at offset 0x14 (reading 0x0)
[6.725049] pcieport 0000:00:1b.0: saving config space at offset 0x18 (reading 0x20200)
[6.725053] pcieport 0000:00:1b.0: saving config space at offset 0x1c (reading 0x20f0)
[6.725058] pcieport 0000:00:1b.0: saving config space at offset 0x20 (reading 0xce30ce30)
[6.725062] pcieport 0000:00:1b.0: saving config space at offset 0x24 (reading 0x1fff1)
[6.725067] pcieport 0000:00:1b.0: saving config space at offset 0x28 (reading 0x0)
[6.725071] pcieport 0000:00:1b.0: saving config space at offset 0x2c (reading 0x0)
[6.725075] pcieport 0000:00:1b.0: saving config space at offset 0x30 (reading 0x0)
[6.725080] pcieport 0000:00:1b.0: saving config space at offset 0x34 (reading 0x40)
[6.725084] pcieport 0000:00:1b.0: saving config space at offset 0x38 (reading 0x0)
[6.725089] pcieport 0000:00:1b.0: saving config space at offset 0x3c (reading 0x201ff)
[6.725154] pcieport 0000:00:1c.0: runtime IRQ mapping not provided by arch
[6.725284] pcieport 0000:00:1c.0: PME: Signaling with IRQ 124
[6.725580] pcieport 0000:00:1c.0: pciehp: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
[6.726086] pci_bus 0000:04: dev 00, created physical slot 0
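
For readers trying to make sense of the "saving config space" lines, here is
a rough sketch of how a few of those dwords decode, following the standard
PCI configuration header layout. The function names are made up for
illustration; the sample values are the ones logged above for the 00:1b.0
root port at offsets 0x0, 0x8 and 0x18.

```python
def decode_id_dword(value):
    """Offset 0x00: vendor ID in the low 16 bits, device ID in the high 16."""
    return {"vendor": value & 0xFFFF, "device": (value >> 16) & 0xFFFF}


def decode_class_dword(value):
    """Offset 0x08: revision ID in the low byte, class code in the upper 24 bits."""
    return {"revision": value & 0xFF, "class_code": (value >> 8) & 0xFFFFFF}


def decode_bus_dword(value):
    """Offset 0x18 of a bridge (type 1) header: primary/secondary/subordinate bus."""
    return {
        "primary": value & 0xFF,
        "secondary": (value >> 8) & 0xFF,
        "subordinate": (value >> 16) & 0xFF,
    }


ids = decode_id_dword(0xA3408086)
cls = decode_class_dword(0x060400F0)
bus = decode_bus_dword(0x00020200)

print("vendor 0x%04x device 0x%04x" % (ids["vendor"], ids["device"]))
print("class 0x%06x rev 0x%02x" % (cls["class_code"], cls["revision"]))
print("buses: primary %(primary)d secondary %(secondary)d subordinate %(subordinate)d" % bus)
```

Vendor 0x8086 is Intel and class 0x060400 is a PCI-to-PCI bridge, which
matches what lspci reports for a root port.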

Any idea what's going on?

Thanks,
Marc

On Sat, Aug 08, 2020 at 01:22:02PM -0700, Marc MERLIN wrote:
> On Fri, Oct 04, 2019 at 03:39:46PM +0300, Mika Westerberg wrote:
> > This is otherwise similar to pcie_wait_for_link() but allows passing
> > custom activation delay in milliseconds.
> > 
> > Signed-off-by: Mika Westerberg 
> > ---
> >  drivers/pci/pci.c | 21 ++---
> >  1 file changed, 18 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index e7982af9a5d8..bfd92e018925 100644
> 
> Hi Mika,
> 
> So, I have a thinkpad P73 with thunderbolt, and while I don't boot
> often, my last boots have been unreliable at best (was only able to boot
> 5.7 once, and 5.8 did not succeed either).
> 
> 5.6 was working for a while, but couldn't boot it either this morning,
> so I had to go back to 5.5. This does not mean 5.5 does not have the
> problem, just that it booted this morning, while 5.6 didn't when I
> tried.
> Once the kernel is booted, the problem does not seem to occur much, or
> at all.
> 
> Basically, I'm getting the same thing as this person with a P53 (which
> is a mostly identical lenovo thinkpad to mine)
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> https://bbs.archlinux.org/viewtopic.php?id=250658
> 
> The kernel boots eventually, but it takes minutes, and everything is so
> super slow, that I just can't reasonably use the machine.
> 
> This shows similar issues with 5.3, 5.4.
> https://forum.proxmox.com/threads/pme-spurious-native-interrupt-kernel-meldungen.62850/
> 
> Another report here with 5.6:
> https://bugzilla.redhat.com/show_bug.cgi?id=1831899
> 
> My current kernel is running your patch above, and I haven't done a lot
> of research yet to confirm whether going back to a kernel before it was
> merged, fixes

Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)

2020-09-07 Thread Marc MERLIN
On Mon, Sep 07, 2020 at 09:14:03PM +0200, Karol Herbst wrote:
> > - changes in the nouveau driver. Mika told me the PCIe regression
> >   "pcieport 0000:00:01.0: PME: Spurious native interrupt!" is supposed
> >   to be fixed in 5.8, but I still get a 4mn hang or so during boot and
> >   with 5.8, removing the USB key, didn't help make the boot faster
> 
> that's the root port the GPU is attached to, no? I saw that message on
> the Thinkpad P1G2 when runtime resuming the Nvidia GPU, but it does
> seem to come from the root port.

Hi Karol, thanks for your answer.
 
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)

> Well, you'd also need it when attaching external displays.
 
Indeed. I just don't need that on this laptop, but I'm familiar with the
not-so-seamless procedure to turn on both GPUs and mirror the intel one into
the nvidia one for external output.

> > [   11.262985] nvidia-gpu 0000:01:00.3: PME# enabled
> > [   11.303060] nvidia-gpu 0000:01:00.3: PME# disabled
> 
> mhh, interesting. I heard some random comments that the Nvidia
> USB-C/UCSI driver is a bit broken and can cause various issues. Mind
> blacklisting i2c-nvidia-gpu and typec_nvidia (and verify they don't
> get loaded) and see if that helps?

Right, this one:
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
Sure, I'll blacklist it. Ok, just did that, removed from initrd,
rebooted, and it was no better.

From initrd (before root gets mounted), I have this:
nouveau              1961984  0
mxm_wmi                16384  1 nouveau
hwmon                  32768  1 nouveau
ttm                   102400  1 nouveau
wmi                    32768  2 nouveau,mxm_wmi
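
To verify that the blacklisted modules really are not loaded, a quick sketch
like the following can be run against lsmod output. It assumes the usual
"name size used-by" column layout; the function names and the sample listing
are for illustration only. Note that the kernel reports module names with
underscores, so i2c-nvidia-gpu shows up as i2c_nvidia_gpu.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def still_loaded(lsmod_text, blacklisted):
    """Return the blacklisted modules (underscore form) that are still loaded."""
    loaded = loaded_modules(lsmod_text)
    wanted = (name.replace("-", "_") for name in blacklisted)
    return sorted(name for name in wanted if name in loaded)


# Illustrative listing in normal lsmod layout.
sample = """\
Module                  Size  Used by
nouveau              1961984  0
mxm_wmi                16384  1 nouveau
wmi                    32768  2 nouveau,mxm_wmi
"""

print(still_loaded(sample, ["i2c-nvidia-gpu", "typec_nvidia"]))
```

An empty list means neither module made it into the running kernel.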

I still got a 2mn hang, and a nouveau probe error:
[  189.124530] nouveau: probe of 0000:01:00.0 failed with error -12


Here's what it looks like:
[9.693230] hid: raw HID events driver (C) Jiri Kosina
[9.694988] usbcore: registered new interface driver usbhid
[9.694989] usbhid: USB HID core driver
[9.696700] hid-generic 0003:1050:0200.0001: hiddev0,hidraw0: USB HID v1.00 Device [Yubico Yubico Gnubby (gnubby1)] on usb-0000:00:14.0-2/input0
[9.784456] Console: switching to colour frame buffer device 240x67
[9.816297] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
[   25.087400] thunderbolt 0000:06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)
[   25.087414] thunderbolt 0000:06:00.0: saving config space at offset 0x4 (reading 0x100406)
[   25.087419] thunderbolt 0000:06:00.0: saving config space at offset 0x8 (reading 0x886)
[   25.087424] thunderbolt 0000:06:00.0: saving config space at offset 0xc (reading 0x20)
[   25.087430] thunderbolt 0000:06:00.0: saving config space at offset 0x10 (reading 0xcc10)
[   25.087435] thunderbolt 0000:06:00.0: saving config space at offset 0x14 (reading 0xcc14)
[   25.087440] thunderbolt 0000:06:00.0: saving config space at offset 0x18 (reading 0x0)
[   25.087445] thunderbolt 0000:06:00.0: saving config space at offset 0x1c (reading 0x0)
[   25.087450] thunderbolt 0000:06:00.0: saving config space at offset 0x20 (reading 0x0)
[   25.087455] thunderbolt 0000:06:00.0: saving config space at offset 0x24 (reading 0x0)
[   25.087460] thunderbolt 0000:06:00.0: saving config space at offset 0x28 (reading 0x0)
[   25.087466] thunderbolt 0000:06:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[   25.087471] thunderbolt 0000:06:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.087476] thunderbolt 0000:06:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.087481] thunderbolt 0000:06:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.087486] thunderbolt 0000:06:00.0: saving config space at offset 0x3c (reading 0x1ff)
[   25.087571] thunderbolt 0000:06:00.0: PME# enabled
[   25.105353] pcieport 0000:05:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[   25.105364] pcieport 0000:05:00.0: saving config space at offset 0x4 (reading 0x100407)
[   25.105370] pcieport 0000:05:00.0: saving config space at offset 0x8 (reading 0x6040006)
[   25.105375] pcieport 0000:05:00.0: saving config space at offset 0xc (reading 0x10020)
[   25.105380] pcieport 0000:05:00.0: saving config space at offset 0x10 (reading 0x0)
[   25.105384] pcieport 0000:05:00.0: saving config space at offset 0x14 (reading 0x0)
[   25.105389] pcieport 0000:05:00.0: saving config space at offset 0x18 (reading 0x60605)
[   25.105394] pcieport 0000:05:00.0: saving config space at offset 0x1c (reading 0x1f1)
[   25.105399] pcieport 0000:05:00.0: saving config space at offset 0x20 (reading 0xcc10cc10)
[   25.105404] pcieport 0000:05:00.0: saving config space at offset 0x24 (reading 0x1fff1)
[   25.105409] pcieport 0000:05:00.0: saving config space
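
A quick way to spot where the boot stalls in a log like the one above is to
look for the largest jumps between consecutive dmesg timestamps. The sketch
below is only an illustration (the name largest_gaps is made up). Lines
without a leading "[seconds]" stamp, such as wrapped continuation lines, are
simply skipped.

```python
import re

_STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def largest_gaps(dmesg_text, top=3):
    """Return up to `top` (gap_seconds, line_before, line_after) tuples."""
    stamped = []
    for line in dmesg_text.splitlines():
        match = _STAMP.match(line)
        if match:
            stamped.append((float(match.group(1)), line))
    gaps = [
        (later[0] - earlier[0], earlier[1], later[1])
        for earlier, later in zip(stamped, stamped[1:])
    ]
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


# Three lines taken from the log above, with the PCI domain written out.
sample = """\
[9.784456] Console: switching to colour frame buffer device 240x67
[9.816297] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
[   25.087400] thunderbolt 0000:06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)
"""

for gap, before, after in largest_gaps(sample, top=1):
    print("%.3fs between:\n  %s\n  %s" % (gap, before, after))
```

On this excerpt the biggest jump sits between the i915 framebuffer line and
the first thunderbolt line, roughly 15 seconds in which nothing is logged.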

Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)

2020-09-07 Thread Marc MERLIN
On Tue, Sep 08, 2020 at 01:51:19AM +0200, Karol Herbst wrote:
> oh, I somehow missed that "disp ctor failed" message. I think that
> might explain why things are a bit hanging. From the top of my head I
> am not sure if that's something known or something new. But just in
> case I CCed Lyude and Ben. And I think booting with
> nouveau.debug=disp=trace could already show something relevant.

Thanks.
I've added that to my boot for next time I reboot.
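
To confirm after the reboot that the option really reached the kernel, the
command line can be checked with a sketch along these lines. The function
name cmdline_params and the sample command line are hypothetical, and the
parser is deliberately naive (it does not handle quoted values containing
spaces).

```python
def cmdline_params(cmdline):
    """Split a kernel command line into a {name: value-or-None} dict.

    Only the first '=' separates name from value, so a value such as
    'disp=trace' is kept intact.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Hypothetical command line; on a live system read /proc/cmdline instead.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro nouveau.debug=disp=trace"

print(cmdline_params(sample).get("nouveau.debug"))
```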

I'm moving some folks to Bcc now, and let's remove the lists other than
nouveau on followups (lkml and pci). I'm just putting a warning here
so that it shows up in other list archives and anyone finding this
later knows that they should look in the nouveau archives for further
updates/resolution.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)

2020-09-13 Thread Marc MERLIN
On Mon, Sep 07, 2020 at 05:29:35PM -0700, Marc MERLIN wrote:
> On Tue, Sep 08, 2020 at 01:51:19AM +0200, Karol Herbst wrote:
> > oh, I somehow missed that "disp ctor failed" message. I think that
> > might explain why things are a bit hanging. From the top of my head I
> > am not sure if that's something known or something new. But just in
> > case I CCed Lyude and Ben. And I think booting with
> > nouveau.debug=disp=trace could already show something relevant.
> 
> Thanks.
> I've added that to my boot for next time I reboot.
> 
> I'm moving some folks to Bcc now, and let's remove the lists other than
> nouveau on followups (lkml and pci). I'm just putting a warning here
> so that it shows up in other list archives and anyone finding this
> later knows that they should look in the nouveau archives for further
> updates/resolution.

Hi, I didn't hear back on this issue. Did you need the nouveau.debug=disp=trace
or are you already working on the "disp ctor failed" issue?

Thanks
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2017-05-01 Thread Marc MERLIN
Howdy,

Well, sadly, the problem is more or less back in 4.11.0. The system doesn't
really crash, but it goes into an infinite loop with
[34776.826800] BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 stuck for 33s!
More logs: https://pastebin.com/YqE4riw0

(I upgraded from 4.8 with custom patches you gave me, and went to 4.11.0)

gargamel:~# cat /proc/sys/vm/dirty_ratio
2
gargamel:~# cat /proc/sys/vm/dirty_background_ratio
1
gargamel:~# free
             total       used       free     shared    buffers     cached
Mem:      24392600   16362660    8029940          0       8884   13739000
-/+ buffers/cache:    2614776   21777824
Swap:     15616764          0   15616764

And yet, I was doing a btrfs check repair on a busy filesystem, and within
40mn or so it triggered the workqueue lockup.
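
As a sanity check on the memory numbers above, here is a small sketch that
redoes the "-/+ buffers/cache" arithmetic from the "Mem:" row of old-style
free output (values in KiB). The function names are made up for illustration
and the sample is the output shown above with its columns separated.

```python
def parse_free(text):
    """Return the 'Mem:' row of old-style `free` output as a dict of ints."""
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "total":
            header = fields
        elif fields[0] == "Mem:" and header:
            return dict(zip(header, (int(v) for v in fields[1:])))
    return {}


def used_excluding_cache(mem):
    """Memory used by applications: used minus buffers and page cache."""
    return mem["used"] - mem["buffers"] - mem["cached"]


def free_including_cache(mem):
    """Memory that is free or reclaimable from buffers and page cache."""
    return mem["free"] + mem["buffers"] + mem["cached"]


sample = """\
             total       used       free     shared    buffers     cached
Mem:      24392600   16362660    8029940          0       8884   13739000
"""

mem = parse_free(sample)
print(used_excluding_cache(mem), free_including_cache(mem))
```

Both results match the "-/+ buffers/cache" row, so only about 2.6GB of the
24GB is actually held by applications; the rest is free or page cache.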

gargamel:~# grep CONFIG_COMPACTION /boot/config-4.11.0-amd64-preempt-sysrq-20170406
CONFIG_COMPACTION=y

kernel config file: https://pastebin.com/7Tajse6L

To be fair, I didn't try to run btrfs check on 4.8 and now I'm busy
trying to recover a filesystem that apparently got corrupted by a bad
SAS driver in 4.8 which caused a lot of I/O errors and corruption.
This is just to say that btrfs on top of dmcrypt on top of bcache may
have been enough layers to hang on btrfs check on 4.8 too, but I can't
really go back to check right now due to the driver corruption issues.

Any idea what I should do next?

Thanks,
Marc

On Tue, Nov 29, 2016 at 03:01:35PM -0800, Marc MERLIN wrote:
> On Tue, Nov 29, 2016 at 09:40:19AM -0800, Marc MERLIN wrote:
> > Thanks for the reply and suggestions.
> > 
> > On Tue, Nov 29, 2016 at 09:07:03AM -0800, Linus Torvalds wrote:
> > > On Tue, Nov 29, 2016 at 8:34 AM, Marc MERLIN  wrote:
> > > > Now, to be fair, this is not a new problem, it's just varying degrees of
> > > > bad and usually only happens when I do a lot of I/O with btrfs.
> > > 
> > > One situation where I've seen something like this happen is
> > > 
> > >  (a) lots and lots of dirty data queued up
> > >  (b) horribly slow storage
> > 
> > In my case, it is a 5x 4TB HDD with 
> > software raid 5 < bcache < dmcrypt < btrfs
> > bcache is currently half disabled (as in I removed the actual cache) or
> > too many bcache requests pile up, and the kernel dies when too many
> > workqueues have piled up.
> > I'm just kind of worried that since I'm going through 4 subsystems
> > before my data can hit disk, that's a lot of memory allocations and
> > places where data can accumulate and cause bottlenecks if the next
> > subsystem isn't as fast.
> > 
> > But this shouldn't be "horribly slow", should it? (it does copy a few
> > terabytes per day, not fast, but not horrible, about 30MB/s or so)
> > 
> > > Sadly, our defaults for "how much dirty data do we allow" are somewhat
> > > buggered. The global defaults are in "percent of memory", and are
> > > generally _much_ too high for big-memory machines:
> > > 
> > > [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_ratio
> > > 20
> > > [torvalds@i7 linux]$ cat /proc/sys/vm/dirty_background_ratio
> > > 10
> > 
> > I can confirm I have the same.
> > 
> > > says that it only starts really throttling writes when you hit 20% of
> > > all memory used. You don't say how much memory you have in that
> > > machine, but if it's the same one you talked about earlier, it was
> > > 24GB. So you can have 4GB of dirty data waiting to be flushed out.
> > 
> > Correct, 24GB and 4GB.
> > 
> > > And we *try* to do this per-device backing-dev congestion thing to
> > > make things work better, but it generally seems to not work very well.
> > > Possibly because of inconsistent write speeds (ie _sometimes_ the SSD
> > > does really well, and we want to open up, and then it shuts down).
> > > 
> > > One thing you can try is to just make the global limits much lower. As in
> > > 
> > >echo 2 > /proc/sys/vm/dirty_ratio
> > >echo 1 > /proc/sys/vm/dirty_background_ratio
> > 
> > I will give that a shot, thank you.
> 
> And, after 5H of copying, not a single hang, or USB disconnect, or anything.
> Obviously this seems to point to other problems in the code, and I have no
> idea which layer is a culprit here, but reducing the buffers absolutely
> helped a lot.
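
For reference, the arithmetic behind the ratios discussed in the quoted
thread can be sketched as below. This is a simplification: it applies the
percentage to total RAM, whereas the kernel applies it to dirtyable memory
(free plus reclaimable pages), so the real limits are somewhat lower. The
function name is made up; 24392600 KiB is the total reported by free on this
machine.

```python
def dirty_limit_kib(mem_kib, ratio_percent):
    """Approximate dirty-page limit in KiB for a vm.dirty_*ratio percentage."""
    return mem_kib * ratio_percent // 100


TOTAL_KIB = 24392600  # total RAM on this machine, per `free`

for name, ratio in (
    ("dirty_ratio=20 (default)", 20),
    ("dirty_background_ratio=10 (default)", 10),
    ("dirty_ratio=2", 2),
    ("dirty_background_ratio=1", 1),
):
    limit = dirty_limit_kib(TOTAL_KIB, ratio)
    print("%-38s ~%d KiB (%.2f GiB)" % (name, limit, limit / 1048576.0))
```

Even at 2%, a few hundred megabytes of dirty data can queue up before
writers get throttled, which is a lot when the layers underneath are slow.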

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

2017-05-02 Thread Marc MERLIN
On Tue, May 02, 2017 at 09:44:33AM +0200, Michal Hocko wrote:
> On Mon 01-05-17 21:12:35, Marc MERLIN wrote:
> > Howdy,
> > 
> > Well, sadly, the problem is more or less back in 4.11.0. The system
> > doesn't really crash, but it goes into an infinite loop with
> > [34776.826800] BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 stuck for 33s!
> > More logs: https://pastebin.com/YqE4riw0
> 
> I am seeing a lot of traces where tasks are waiting for IO. I do not
> see any OOM report there. Why do you believe this is an OOM killer
> issue?

Good question. This is a followup of the problem I had in 4.8.8 until I
got a patch to fix the issue. Back then, it used to OOM and, later, to pile
up I/O tasks like this.
Now it doesn't OOM anymore, but tasks still pile up.
I temporarily fixed the issue by doing this:
gargamel:~# echo 0 > /proc/sys/vm/dirty_ratio
gargamel:~# echo 0 > /proc/sys/vm/dirty_background_ratio

of course my performance is abysmal now, but I can at least run btrfs
scrub without piling up enough IO to deadlock the system.

On Tue, May 02, 2017 at 07:44:47PM +0900, Tetsuo Handa wrote:
> > Any idea what I should do next?
> 
> Maybe you can try collecting list of all in-flight allocations with backtraces
> using kmallocwd patches at
> http://lkml.kernel.org/r/1489578541-81526-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp
> and 
> http://lkml.kernel.org/r/201704272019.jeh26057.shfotmljoov...@i-love.sakura.ne.jp
> which also tracks mempool allocations.
> (Well, the
> 
> - cond_resched();
> + //cond_resched();
> 
> change in the latter patch would not be preferable.)

Thanks. I can give that a shot as soon as my current scrub is done, it
may take another 12 to 24H at this rate.
In the meantime, as explained above, not allowing any dirty VM has
worked around the problem (Linus pointed out to me in the original
thread that on a lightly loaded 24GB system, even 1 or 2% could still be
a lot of memory for requests to pile up in and cause issues in
degenerate cases like mine).
Now I'm still curious what changed between 4.8.8 + custom patches and 4.11
to cause this.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901