On Tue, 18 Mar 2025, Gleb Smirnoff wrote:

On Tue, Mar 18, 2025 at 08:14:31AM -0700, David Wolfskill wrote:
D> It completed successfully:
D>
D> g1-48(15.0-C)[1] uname -aUK
D> FreeBSD g1-48.catwhisker.org 15.0-CURRENT FreeBSD 15.0-CURRENT #262 main-n275998-82589f926b52: Tue Mar 18 14:17:34 UTC 2025 r...@g1-48.catwhisker.org:/common/S3/obj/usr/src/amd64.amd64/sys/CANARY amd64 1500034 1500034
D>
D> Specifically:
D> * I used the slice on the laptop where I had done the "git bisect"
D> * I first issued "git bisect reset"
D> * Then "git pull" to bring /usr/src up to main-n275998-82589f926b52
D> * Then "git revert 19df0c5abcb9d4e951e610b6de98d4d8a00bd5f9"
D> * Then the usual buildworld, kernel, installworld stuff
D> * Reboot

This needs to be fixed ASAP, it blocks FreeBSD CURRENT usage on laptops.

If this is not fixed by the weekend, I will push a revert of
19df0c5abcb9d4e951e610b6de98d4d8a00bd5f9, to get the tree into good shape
before the beginning of the stabweek.

Just to follow up on this.  David has been fantastic doing kernel
debugging via email in ddb> with a blank screen in front of him,
and he got me a core dump.  (*)

He's hitting a ... somewhere in i915kms.ko (here are the two instances I
have):
REDZONE: Buffer underflow detected. 16 bytes corrupted before 0xfffffe089bc65000 (262148 bytes allocated).
REDZONE: Buffer underflow detected. 16 bytes corrupted before 0xfffffe08a7e70000 (262148 bytes allocated).
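
For anyone not familiar with what such a message means, here is a
minimal userland sketch of the general redzone idea: guard bytes with a
known pattern are placed before and after the buffer, and any guard byte
that no longer holds the pattern counts as corrupted.  This is only an
illustration and not the kernel's implementation; the names rz_alloc,
rz_check_under, rz_check_over and rz_free, the fill pattern and the
header layout are all made up for the example, and the guard size of 16
is an assumption chosen to match the "16 bytes" in the messages above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RZ_SIZE 16    /* guard bytes on each side (assumed for the example) */
#define RZ_FILL 0x42  /* arbitrary guard pattern (hypothetical) */
#define RZ_HDR  16    /* header space holding the requested size */

/* Layout: [header: size][guard before][user data][guard after] */
void *
rz_alloc(size_t size)
{
	unsigned char *raw = malloc(RZ_HDR + RZ_SIZE + size + RZ_SIZE);

	if (raw == NULL)
		return (NULL);
	memcpy(raw, &size, sizeof(size));                        /* remember size */
	memset(raw + RZ_HDR, RZ_FILL, RZ_SIZE);                  /* guard before */
	memset(raw + RZ_HDR + RZ_SIZE + size, RZ_FILL, RZ_SIZE); /* guard after */
	return (raw + RZ_HDR + RZ_SIZE);
}

/* Count guard bytes which no longer hold the fill pattern. */
static size_t
rz_count_bad(const unsigned char *guard)
{
	size_t bad = 0;

	for (size_t i = 0; i < RZ_SIZE; i++)
		if (guard[i] != RZ_FILL)
			bad++;
	return (bad);
}

/* Number of corrupted bytes directly before the buffer (underflow). */
size_t
rz_check_under(const void *user)
{
	return (rz_count_bad((const unsigned char *)user - RZ_SIZE));
}

/* Number of corrupted bytes directly after the buffer (overflow). */
size_t
rz_check_over(const void *user)
{
	const unsigned char *p = user;
	size_t size;

	memcpy(&size, p - RZ_SIZE - RZ_HDR, sizeof(size));
	return (rz_count_bad(p + size));
}

void
rz_free(void *user)
{
	assert(user != NULL);
	free((unsigned char *)user - RZ_SIZE - RZ_HDR);
}
```

In this sketch a write to the 16 bytes just below the returned pointer
makes rz_check_under() return a non-zero count, which is the same kind
of situation the "Buffer underflow detected" lines report.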

From what I have gathered so far it is "generation specific", so
depending on the chipset/model/age of the graphics chip there are
different function pointers.
That likely also explains why other people who tested these
malloc changes have not seen this.
I cannot yet say if/which are affected, but I am preparing
some debugging changes locally for him and am already seeing four
different calls through that bit during init (module loading).

I also build drm-kmod differently from him (I still use the GitHub
checkout in /usr/local/sys/, while he's building the port along with the
kernel).  Also, there seems to be some problem loading firmware.

I assume we'll keep debugging it to a point that we can either have a
fix for drm-kmod-6.1 or at least write an intelligent bug report for his
case.

I can't say if a non-debug kernel would "just work" by accident (it
likely has for months), but these things are likely elsewhere too and are
the reason for the occasional "stuck in X with a dead laptop" (while
actually sitting in ddb or having gone through a panic) that people have
been seeing.

While this one is possibly a side effect of the commit (contigmalloc
instead of malloc), the bug is elsewhere, and the two changes which went
in and the further one which is coming may actually help us to make
drm-kmod (amongst other LinuxKPI consumers) more reliable.
I would hope that some DMA problems in wireless land also go away,
especially on arm64.  All painful but helpful.  So I see little reason
to back this change out at this point; we should get drm-kmod fixed
instead.

Lots of health,
Bjoern


(*) we should write some of this down for people as it may help in a lot
of situations.

--
Bjoern A. Zeeb                                                     r15:7
