[Bug 268276] Regression: Black screen on resume caused by commit 9e007a88d65b

2022-12-09 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268276

            Bug ID: 268276
           Summary: Regression: Black screen on resume caused by commit
                    9e007a88d65b
           Product: Base System
           Version: CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: b...@freebsd.org
          Reporter: asha...@badland.io

I've finally narrowed down the cause of suspend/resume breaking on my Ryzen
system for the past year. Commit 9e007a88d65b changed the polling rate of
atkbd, which for some reason causes the GPU to disappear off the PCI bus on
resume, leaving the screen black.

Author: Alexander Motin 
Date:   Wed Jan 5 11:32:44 2022 -0500

atkbd: Reduce polling rate from 10Hz to ~1Hz.

In my understanding this is only needed to workaround lost interrupts.
I was thinking to remove it completely, but the comment about edge-
triggered interrupt may be true and needs deeper investigation.  ~1Hz
should be often enough to handle the supposedly rare loss cases, but
rare enough to not appear in top.  Add sysctl hw.atkbd.hz to tune it.

MFC after:  1 month

The workaround is to set hw.atkbd.hz=10 in /boot/loader.conf.
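
As a minimal sketch, the loader.conf entry would look like the fragment below.
It assumes hw.atkbd.hz is accepted as a loader tunable, as the workaround
implies; 10 restores the polling rate used before the commit.

```
# /boot/loader.conf
# Assumption: hw.atkbd.hz is read as a loader tunable at boot.
# Restore the pre-9e007a88d65b atkbd polling rate of 10Hz.
hw.atkbd.hz=10
```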

The system is an AMD Ryzen 9 5900X with a TUF Gaming B550-PLUS motherboard and
an NVIDIA GTX 960. I did update the motherboard firmware, but that didn't help.

Usually after resuming you can still ssh into the machine, but if you try to
do anything graphical the following is logged:

Dec  9 02:12:32 mick kernel: NVRM: GPU at PCI::07:00: GPU-8293a5fd-a5ed-570d-283f-675298ebf38c
Dec  9 02:12:32 mick kernel: NVRM: Xid (PCI::07:00): 79, pid='', name=, GPU has fallen off the bus.
Dec  9 02:12:32 mick kernel: NVRM: GPU :07:00.0: GPU has fallen off the bus.
Dec  9 02:12:32 mick devd[384]: notify_clients: send() failed; dropping unresponsive client
Dec  9 02:12:32 mick kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
Dec  9 02:12:32 mick syslogd: last message repeated 2 times
Dec  9 02:12:32 mick kernel: nvidia-modeset: ERROR: GPU:0: Failure reading maximum pixel clock value for display device HDMI-0.
Dec  9 02:12:32 mick kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices


I first noticed this on GhostBSD, and for some reason couldn't reproduce the
bisect range on FreeBSD kernels. I had to bisect between GhostBSD's 21.12.24
and 22.3.16 kernel releases to find this commit. Then I could apply the sysctl
workaround to a FreeBSD CURRENT kernel and have suspend/resume working again.

Why was this change made? Is there some performance reason why we don't want to
be polling atkbd so often? I'm not sure why this would affect the entire PCI
bus, but since it breaks suspend/resume on certain machines, it would be nice
to get a fix into base so things work out of the box again without the sysctl
workaround.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 268276] Regression: Black screen on resume caused by commit 9e007a88d65b

2022-12-09 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268276

--- Comment #1 from Alexander Motin  ---
The change was made to save CPU power by not waking up 10 times per second for
no good reason.  I have no idea why it could affect the GPU.  Maybe we could
explicitly poll the keyboard during resume, if that helps this situation
somehow.



[Bug 268276] Regression: Black screen on resume caused by commit 9e007a88d65b

2022-12-09 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268276

--- Comment #2 from Austin Shafer  ---
I don't think it's directly affecting the GPU per se, but instead is causing
something wacky to happen with the PCI bus. Then the GPU can't be found on the
bus, so the nvidia driver bails and the screen stays black. The GPU fans fully
spin up, so it's not like the GPU doesn't have power.

Thanks, saving CPU makes sense, and polling on resume is an interesting idea. I
wonder if not polling makes us do something different that causes us to miss an
interrupt or something. It's hard to know without more info. What should I do
to get better data?

---

Also I feel like I should mention this (probably unrelated) issue:
https://github.com/amshafer/nvidia-driver/issues/1

This predates commit 9e007a88d65b and therefore has a different root cause, but
I'm linking it because the symptoms are the same. In that case it looked like
ACPI/buggy firmware, but I find it interesting as a second data point. Maybe
the atkbd polling causes us or the firmware to hit a situation like this for
some reason.



[Bug 268186] Kerberos authentication fails with a Linux/FreeIPA KDC

2022-12-09 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268186

--- Comment #14 from amend...@gmail.com ---
I have confirmed that I have the right packages/options to use SSSD with MIT
Kerberos. I also tried configuring pam_krb5 as you suggested, and it had no
effect.

Is it possible that SSH is rejecting the ticket before it ever hands it off to
PAM for authentication?



[Bug 267028] General protection fault kernel panic immediately after kldload amdgpu

2022-12-09 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028

--- Comment #19 from George Mitchell  ---
Created attachment 238668
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=238668&action=edit
Another crash summary; looks like all the earlier ones

Contrary to my comment #17, I got this same crash this morning, even waiting
five seconds after loading amdgpu.ko before proceeding.  So the delay doesn't
prevent the crash.



[Bug 268186] Kerberos authentication fails with a Linux/FreeIPA KDC

2022-12-09 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268186

--- Comment #15 from Cy Schubert  ---
(In reply to amendlik from comment #14)
Possibly. Can you post the ssh -vvv output, please?

It may be accepting the ticket but refusing to allow the client because one
side doesn't support the ciphers of the other. We at $JOB have a lot of these
problems with Windows PuTTY connections because clients are running older
versions of PuTTY or are using deprecated ciphers no longer supported by sshd.



[Bug 267871] /usr/bin/rs compile fails after udate to c++

2022-12-09 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267871

John Baldwin  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |s...@freebsd.org

--- Comment #6 from John Baldwin  ---
(In reply to devel from comment #5)
Doing a fresh git clone doesn't ensure that /usr/obj is empty.  The only way
make(1) could possibly know to look for rs.c is if it found a generated .depend
file in /usr/obj that contained a reference to rs.c.

One question, though: are you using meta mode by chance?  I don't know if
somehow the implicit SRCS breaks.  bsd.prog.mk has logic to assume .cc
instead of .c for the default value of SRCS if PROG_CXX is used instead of
PROG.  I see a somewhat dubious check for PROG_CXX (seems like it should be
checking something else?) in local.dirdeps.mk, but aside from that I can't see
any way that make would think the implicit source was rs.c instead of rs.cc.



[Bug 254595] 13.0-RC[23]: /usr/home not linked to /home

2022-12-09 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254595

Tomasz "CeDeROM" CEDRO  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|Open                        |Closed

--- Comment #1 from Tomasz "CeDeROM" CEDRO  ---
13.1 is out and 13.2 incoming.



[Bug 268265] 13.1-STABLE make buildworld fail

2022-12-09 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268265

Tomasz "CeDeROM" CEDRO  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Open                        |Closed
         Resolution|---                         |Works As Intended

--- Comment #3 from Tomasz "CeDeROM" CEDRO  ---
I pulled today and it builds as expected. The failure might also have been
caused by my new hardware setup. Sorry for the noise :-)
