amdgpu, 6.16-rc7+, WARNING: ./include/linux/sched.h:2175 at __ww_mutex_lock.constprop.0+0xec3/0x1ab0, CPU#5: kworker/5:1/122

2025-07-22 Thread Borislav Petkov
Hi, I see this on latest Linus + tip/master from today. Something about clearing blocked tasks' relationships with the same mutex held... [5.222437] [drm] amdgpu kernel modesetting enabled. [5.227168] input: HDA Digital PCBeep as /devices/pci:00/:00:08.1/:03:00.6/sound/card1/

WARNING: drivers/gpu/drm/drm_gem.c:286 at drm_gem_object_handle_put_unlocked+0xb1/0xf0 [drm]

2025-07-08 Thread Borislav Petkov
Hi all, I see the below on -rc5 + tip, on a RN machine. --- [5.592468] cdc_ncm 2-2:2.0 eth0: register 'cdc_ncm' at usb-:03:00.3-2, CDC NCM (NO ZLP), f8:e4:3b:33:37:71 [5.593133] usbcore: registered new interface driver cdc_ncm [5.597944] usbcore: registered new interface driver

Re: amdgpu RENOIR funky complaints in dmesg

2025-05-13 Thread Borislav Petkov
On Mon, May 12, 2025 at 01:22:01PM +, Lin, Wayne wrote: > It's due to a newly merged patch which adds more logs indicating exceptions > while doing AUX transactions. These exceptions might be temporary state with > the DPRx. > Will give another patch to adjust the log. Sorry for any inconvenien

amdgpu RENOIR funky complaints in dmesg

2025-05-12 Thread Borislav Petkov
Hey folks, this is rc6 + tip/master on a Zen2 RN laptop. Needless to say, the complaints are brand new. Thx. [0.875804] ACPI: bus type drm_connector registered [0.877903] [drm] Initialized vgem 1.0.0 for vgem on minor 0 [0.880430] [drm] Initialized vkms 1.0.0 for vkms on minor 1 [

Re: Display Port handling errors out when monitor is slow to wake up

2025-04-30 Thread Borislav Petkov
+ amdgpu folks. On Tue, Apr 29, 2025 at 02:51:21PM +0200, Marcus Rückert wrote: > Hardware: > - ASUS ROG Swift OLED PG27AQDP > - XFX Mercury Radeon RX 9070 XT OC Gaming Edition with RGB, 16GB GDDR6, HDMI, > 3x DP RX-97TRGBBB9 > > Kernel: > - kernel-default-6.15~rc4-1.1.g62ec7c7.x86_64 from >

amdgpu: Reproducible soft lockups when playing games

2025-04-30 Thread Borislav Petkov
+ amdgpu folks On Tue, Apr 29, 2025 at 02:51:56PM +0200, Marcus Rückert wrote: > Hardware: > - ASUS ROG Swift OLED PG27AQDP @ 480 Hz > - LG 27GL850-B @ 144Hz > - XFX Mercury Radeon RX 9070 XT OC Gaming Edition with RGB, 16GB GDDR6, HDMI, > 3x DP RX-97TRGBBB9 > - Ryzen 9 9950X3D on ASUS ProArt X8

Re: kmemleak: Found object by alias at 0xffff888107b65918

2025-01-11 Thread Borislav Petkov
On Thu, Jan 09, 2025 at 03:40:59PM -0500, Alex Deucher wrote: > Possibly fixed by this patch? > https://lore.kernel.org/lkml/CAJZ5v0i=ap+w4QZ8f2DsaHY6D=XUEuSNjyQ-2_=dgolfzjd...@mail.gmail.com/T/ Yap, it does. You can add Reported-by: Borislav Petkov (AMD) Tested-by: Borislav Petkov (AMD

kmemleak: Found object by alias at 0xffff888107b65918

2025-01-10 Thread Borislav Petkov
Hi folks, this is rc6 + tip/master, machine is Carrizo laptop. full dmesg attached. Thx. ... [ 13.271015] [drm] DM_PPLIB:level : 8 [ 13.271658] [drm] Display Core v3.2.310 initialized on DCE 11.0 [ 13.351651] kmemleak: Found object by alias at 0xffff888107b65918 [ 13.35236

Re: [PATCH] drm/amd: Use a constant format string for amdgpu_ucode_request

2024-10-28 Thread Borislav Petkov
On Mon, Aug 05, 2024 at 04:12:48PM -0400, Alex Deucher wrote: > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c > > index fbc2852278e1..6162582d0aa2 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c > > +++ b/drivers/gpu/dr

Re: Error in amd driver?

2024-05-06 Thread Borislav Petkov
+ amd-gfx@lists.freedesktop.org On Sun, May 05, 2024 at 09:59:22PM +0300, Tranton Baddy wrote: > I have this in my dmesg since version 6.8.6, not sure when it appeared. Is > amdgpu driver has bug? > [ 64.253144] > == > [ 64.2531

amdgpu kmemleaks

2024-02-28 Thread Borislav Petkov
Hi folks, anyone interested in a bunch of amdgpu kmemleak reports from latest Linus tree + tip? GPU is: [ 11.317312] [drm] amdgpu kernel modesetting enabled. [ 11.363627] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x103C:0x807E 0xC4). [ 11.364077] [drm] register mmio bas

Re: [PATCH 06/20] x86/mce/amd: Use helper for GPU UMC bank type checks

2023-11-27 Thread Borislav Petkov
On Sat, Nov 18, 2023 at 01:32:34PM -0600, Yazen Ghannam wrote: > +/* GPU UMCs have MCATYPE=0x1.*/ > +bool smca_gpu_umc_bank_type(u64 ipid) > +{ > + if (!smca_umc_bank_type(ipid)) > + return false; > + > + return FIELD_GET(MCI_IPID_MCATYPE, ipid) == 0x1; > +} And now this tells
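The quoted helper boils down to "extract a bit field from the IPID value and compare it to 0x1". A minimal userspace sketch of that pattern follows. Everything prefixed `demo_`/`DEMO_` is hypothetical: the mask value is an illustrative assumption rather than something taken from the patch, and the "is this a UMC bank" result is passed in as a flag because smca_umc_bank_type() is not shown in the snippet.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: illustrative mask (bits 63:48); not taken from the patch. */
#define DEMO_IPID_MCATYPE_MASK 0xFFFF000000000000ULL

/* Hypothetical stand-in for FIELD_GET(): mask the value, then shift the
 * field down so its lowest bit lands at bit 0. */
static uint64_t demo_field_get(uint64_t mask, uint64_t val)
{
	unsigned int shift = 0;

	if (!mask)
		return 0;
	while (!((mask >> shift) & 1))
		shift++;
	return (val & mask) >> shift;
}

/* Hypothetical mirror of the quoted check: only UMC banks qualify, and
 * among those only the ones whose type field reads 0x1. */
static bool demo_gpu_umc_bank_type(bool is_umc_bank, uint64_t ipid)
{
	if (!is_umc_bank)
		return false;
	return demo_field_get(DEMO_IPID_MCATYPE_MASK, ipid) == 0x1;
}
```

The early return keeps the field comparison from being applied to banks that are not UMCs at all, which is the same ordering the quoted helper uses.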

Re: [PATCH 05/20] x86/mce/amd: Use helper for UMC bank type check

2023-11-27 Thread Borislav Petkov
On Sat, Nov 18, 2023 at 01:32:33PM -0600, Yazen Ghannam wrote: > @@ -714,14 +721,10 @@ static bool legacy_mce_is_memory_error(struct mce *m) > */ > static bool smca_mce_is_memory_error(struct mce *m) > { > - enum smca_bank_types bank_type; > - > if (XEC(m->status, 0x3f)) >

Re: [PATCH 03/20] x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error()

2023-11-22 Thread Borislav Petkov
On Sat, Nov 18, 2023 at 01:32:31PM -0600, Yazen Ghannam wrote: > Current AMD systems may report MCA errors using the ACPI Boot Error > Record Table (BERT). The BERT entries for MCA errors will be an x86 > Common Platform Error Record (CPER) with an MSR register context that > matches the MCAX/SMCA

Re: [PATCH 02/20] x86/mce: Define mce_setup() helpers for global and per-CPU fields

2023-11-22 Thread Borislav Petkov
On Sat, Nov 18, 2023 at 01:32:30PM -0600, Yazen Ghannam wrote: > +void mce_setup_global(struct mce *m) We usually call those things "common": mce_setup_common(). > +{ > + memset(m, 0, sizeof(struct mce)); > + > + m->cpuid= cpuid_eax(1); > + m->cpuvendor= boot_cpu_data.x86

Re: [PATCH] drm/radeon: Disable outputs when releasing fbdev client

2023-06-09 Thread Borislav Petkov
nd the modesetting > code when the framebuffer got displayed. It only got unpinned once by > the fbdev helper radeon_fbdev_destroy_pinned_object(). Hence TTM's BO- > release function complains about the pin counter. Forcing the outputs > off also undoes the modesettings pin incre

WARNING: CPU: 5 PID: 1464 at drivers/gpu/drm/ttm/ttm_bo.c:326 ttm_bo_release+0x27e/0x2d0 [ttm]

2023-06-05 Thread Borislav Petkov
Hi, this below triggers with the latest Linus tree: 51f269a6ecc7 ("Merge tag 'probes-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace") ... [ 16.173593] [drm] radeon kernel modesetting enabled. [ 16.173743] radeon :29:00.0: vgaarb: deactivate vga console

Re: amdgpu refcount saturation

2022-12-23 Thread Borislav Petkov
On Thu, Dec 22, 2022 at 10:20:37PM +0100, Michal Kubecek wrote: > Unfortunately, just like Boris, I always seem to have multiple stack > traces tangled together. See if this fixes it: https://lore.kernel.org/r/20221219104718.21677-1-christian.koe...@amd.com Thx. -- Regards/Gruss, Boris. h

Re: [PATCH] drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency

2022-12-19 Thread Borislav Petkov
u_vm_sdma.c | 2 ++ > 1 file changed, 2 insertions(+) Thanks, that fixes it. Reported-by: Borislav Petkov (AMD) Tested-by: Borislav Petkov (AMD) -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

amdgpu refcount saturation

2022-12-18 Thread Borislav Petkov
Hi folks, this is with Linus' tree from Wed: 041fae9c105a ("Merge tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs") on a CZ laptop: [7.782901] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x103C:0x807E 0xC4) The splat is kinda messy:

Re: RIP: 0010:radeon_vm_fini+0x15/0x220 [radeon]

2022-01-17 Thread Borislav Petkov
On Mon, Jan 17, 2022 at 08:16:09AM +0100, Christian König wrote: > Interesting to see that even that old stuff is still used. Well, "used" is a stretch. This is my way of testing on K8 as pretty much all the big K8 boxes to which I had access got decommissioned so this baby is the only K8 rea

Re: [PATCH] drm/amd/pm: avoid duplicate powergate/ungate setting

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 09:51:03AM +0100, Paul Menzel wrote: > Please elaborate the kind of issues. It fails to reboot on Carrizo-based laptops. Whoever commits this, pls add Link: https://lore.kernel.org/r/yv81vidwqlwva...@zn.tnic so that it is clear what the whole story was. Thx. -- Regard

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-11-05 Thread Borislav Petkov
On Fri, Nov 05, 2021 at 08:05:41AM +, Quan, Evan wrote: > I'm wondering are you able to give the attached patch(alone) a try. Yap, looks good. Tested-by: Borislav Petkov -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH] drm/amdgpu: fix the hang observed on Carrizo due to UVD suspend failure

2021-10-18 Thread Borislav Petkov
On Mon, Oct 18, 2021 at 03:34:32PM +0800, Evan Quan wrote: > It's confirmed that on some APUs the interaction with SMU(about DPM > disablement) > will power off the UVD. That will make the succeeding interactions with UVD > on the > suspend path impossible. And the system will hang due to that. T

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-14 Thread Borislav Petkov
On Thu, Oct 14, 2021 at 02:02:48AM +, Quan, Evan wrote: > [Quan, Evan] Yes, but not(apply them) at the same time. One by one as you did > before. > - try the patch1 first Ok, first patch worked fine. > - undo the changes of patch1 and try patch2 Did that, worked fine too except after the fi

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-13 Thread Borislav Petkov
On Wed, Oct 13, 2021 at 09:19:45AM +, Quan, Evan wrote: > So, I need your help to confirm the last two patches(I sent you) do not > affect the fix for the bug above. > Please follow the steps below to verify it: > 1. Launch a video playing > 2. open another terminal and issue "sudo pm-suspend"

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-11 Thread Borislav Petkov
On Mon, Oct 11, 2021 at 08:03:51AM +, Quan, Evan wrote: > OK... Then forget about previous patches. Let's try to narrow down the > issue first. Please try the attached patch1 first. If it works, It does. > please undo the changes of patch1 and try patch2 to narrow down further. It does too.

[PATCH -v2] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically

2021-10-11 Thread Borislav Petkov
t to have SME enabled, will need to either enable it in their config or use "mem_encrypt=on" on the kernel command line. [ tlendacky: Generalize commit message. ] Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support") Reported-by: Paul Menzel Signed-off-by: Borislav P

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-11 Thread Borislav Petkov
On Mon, Oct 11, 2021 at 03:05:33PM +0200, Paul Menzel wrote: > I think, the IOMMU is enabled on the MSI B350M MORTAR, but otherwise, yes > this looks fine. The help text could also be updated to mention problems > with AMD Raven devices. This is not only about Raven GPUs but, as Alex explained, pr

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-11 Thread Borislav Petkov
On Sat, Oct 09, 2021 at 09:54:13AM +, Quan, Evan wrote: > Oops, I just found some necessary changes are missing from the patch of the > link below. > https://lists.freedesktop.org/archives/amd-gfx/2021-September/069006.html > > Could you try the patch from the link above + the attached patch?

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-11 Thread Borislav Petkov
On Sat, Oct 09, 2021 at 01:20:39AM +, Quan, Evan wrote: > Maybe the change below can address your issue. > https://lists.freedesktop.org/archives/amd-gfx/2021-September/069006.html Nope, that one doesn't change anything. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/note

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-08 Thread Borislav Petkov
On Fri, Oct 08, 2021 at 11:12:35AM -0400, Alex Deucher wrote: > Can you try swapping the order of > amdgpu_device_ip_set_powergating_state() and > amdgpu_device_ip_set_clockgating_state() in the patch? Nope, the diff below didn't change things. Should I comment them out one by one and see whether

bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-07 Thread Borislav Petkov
Hi folks, commit in $Subject breaks rebooting an HP laptop here with a Carrizo chipset: after typing "reboot" and pressing Enter, it powers off the machine up to a certain point but the fans remain on, screen goes black and nothing happens anymore. No reboot. I have to power it off by holding the

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:21:40PM -0400, Alex Deucher wrote: > And just another general comment, swiotlb + bounce buffers isn't > really useful on GPUs. You may have 10-100s of MBs of memory mapped > long term into the GPU's address space for random access. E.g., you > may have buffers in system

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:36:56PM -0400, Alex Deucher wrote: > From the x86 model and family info? I think Raven has different > families from other Zen based CPUs. Yeah, I'd like to avoid a f/m/s mapping table, if possible. Those things should be a last resort and they always need adjustment wh

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:10:30PM -0400, Alex Deucher wrote: > This is not limited to Raven. That's what the innocuous "a.o." wanted to state. :) > All GPUs (and quite a few other > devices) have a limited DMA mask. AMD GPUs have between 32 and 48 > bits of DMA depending on what generation the

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
Ok, so I sat down and wrote something and tried to capture all the stuff we talked about so that it is clear in the future why we did it. Thoughts? --- From: Borislav Petkov Date: Wed, 6 Oct 2021 19:34:55 +0200 Subject: [PATCH] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 09:23:22AM -0400, Alex Deucher wrote: > There could be some OEM systems that disable the IOMMU on the platform > and don't provide a switch in the bios to enable it. The GPU driver > will still work in that case, it will just not be able to enable KFD > support for ROCm com

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Tue, Oct 05, 2021 at 10:48:15AM -0400, Alex Deucher wrote: > It's not incompatible per se, but SME requires the IOMMU be enabled > because the C bit used for encryption is beyond the dma_mask of most > devices. If the C bit is not set, the en/decryption for DMA doesn't > occur. So you need IOM
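The point about the C bit lying beyond a device's dma_mask can be checked with plain arithmetic. The sketch below is a userspace illustration only: the `demo_` helpers are hypothetical, and the C-bit position is passed as a parameter because the thread does not state it (bit 47 is used in the usage note purely as an assumed example value).

```c
#include <stdbool.h>
#include <stdint.h>

/* Low n bits set, in the same shape as the kernel's DMA_BIT_MASK(n). */
static uint64_t demo_dma_bit_mask(unsigned int n)
{
	return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Hypothetical check: can a device that drives only dma_bits address
 * bits reach phys once the encryption (C) bit is set in the address?
 * c_bit must be below 64. */
static bool demo_encrypted_addr_fits(uint64_t phys, unsigned int c_bit,
				     unsigned int dma_bits)
{
	uint64_t encrypted = phys | (1ULL << c_bit);

	/* Any bit above the mask means the device cannot emit this address. */
	return (encrypted & ~demo_dma_bit_mask(dma_bits)) == 0;
}
```

With an assumed C bit of 47, a device limited to 44 address bits cannot produce the encrypted address while one with 48 bits can, which is why a device with a narrower mask would need something like an IOMMU in between to have the C bit applied on its behalf.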

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-05 Thread Borislav Petkov
On Tue, Oct 05, 2021 at 04:29:41PM +0200, Paul Menzel wrote: > Selecting the symbol `AMD_MEM_ENCRYPT` – as > done in Debian 5.13.9-1~exp1 [1] – also selects > `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT`, as it defaults to yes, I'm assuming that "selecting" is done automatically: alldefconfig, olddefconfig

Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 02:01:57PM -0700, Kuppuswamy, Sathyanarayanan wrote: > Yes. But, since the check is related to TDX, I just want to confirm whether > you are fine with naming the function as intel_*(). Why is this such a big deal?! There's amd_cc_platform_has() and intel_cc_platform_h

Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 01:48:46PM -0700, Kuppuswamy, Sathyanarayanan wrote: > Just read it. If you want to use cpuid_has_tdx_guest() directly in > cc_platform_has(), then you want to rename intel_cc_platform_has() to > tdx_cc_platform_has()? Why? You simply do: if (cpuid_has_tdx_guest()

Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 12:19:49PM -0700, Kuppuswamy, Sathyanarayanan wrote: > Intel CC support patch is not included in this series. You want me > to address the issue raised by Joerg before merging it? Did you not see my email to you today: https://lkml.kernel.org/r/yvl4zughfsh1q...@zn.tnic ?

[PATCH 8/8] treewide: Replace the use of mem_encrypt_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
implementation of mem_encrypt_active(), cc_platform_has() does not need to be implemented in s390 (the config option ARCH_HAS_CC_PLATFORM is not set). Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/powerpc/include/asm/mem_encrypt.h | 5 - arch/powerpc/platforms/pseries/svm.c| 5

[PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
From: Borislav Petkov Hi all, here's v4 of the cc_platform_has() patchset with feedback incorporated. I'm going to route this through tip if there are no objections. Thx. Tom Lendacky (8): x86/ioremap: Selectively build arch override encryption functions arch/cc: Introduce a f

[PATCH 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-28 Thread Borislav Petkov
: Borislav Petkov Acked-by: Michael Ellerman --- arch/powerpc/platforms/pseries/Kconfig | 1 + arch/powerpc/platforms/pseries/Makefile | 2 ++ arch/powerpc/platforms/pseries/cc_platform.c | 26 3 files changed, 29 insertions(+) create mode 100644 arch/powerpc/platforms

[PATCH 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
sev_active() that are really geared towards detecting if SME is active. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/include/asm/kexec.h | 2 +- arch/x86/include/asm/mem_encrypt.h | 2 -- arch/x86/kernel/machine_kexec_64.c | 15 --- arch/x86/kernel

[PATCH 1/8] x86/ioremap: Selectively build arch override encryption functions

2021-09-28 Thread Borislav Petkov
nally, phys_mem_access_encrypted() is conditionally built as well, but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is not set. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/include/asm/io.h | 8 arch/x86/mm/ioremap.c | 2 +- 2

[PATCH 3/8] x86/sev: Add an x86 version of cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky Introduce an x86 version of the cc_platform_has() function. This will be used to replace vendor specific calls like sme_active(), sev_active(), etc. Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/x86/Kconfig | 1 + arch/x86/include

[PATCH 6/8] x86/sev: Replace occurrences of sev_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
-by: Borislav Petkov --- arch/x86/include/asm/mem_encrypt.h | 2 -- arch/x86/kernel/crash_dump_64.c| 4 +++- arch/x86/kernel/kvm.c | 3 ++- arch/x86/kernel/kvmclock.c | 4 ++-- arch/x86/kernel/machine_kexec_64.c | 4 ++-- arch/x86/kvm/svm/svm.c | 3

[PATCH 2/8] arch/cc: Introduce a function to check for confidential computing features

2021-09-28 Thread Borislav Petkov
: Tom Lendacky Signed-off-by: Borislav Petkov --- arch/Kconfig| 3 ++ include/linux/cc_platform.h | 88 + 2 files changed, 91 insertions(+) create mode 100644 include/linux/cc_platform.h diff --git a/arch/Kconfig b/arch/Kconfig index

[PATCH 7/8] x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
Signed-off-by: Borislav Petkov --- arch/x86/include/asm/mem_encrypt.h | 2 -- arch/x86/kernel/sev.c | 6 +++--- arch/x86/mm/mem_encrypt.c | 24 +++- arch/x86/realmode/init.c | 3 +-- 4 files changed, 7 insertions(+), 28 deletions(-) diff --git

Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-27 Thread Borislav Petkov
On Fri, Sep 24, 2021 at 07:46:10PM +, Yazen Ghannam wrote: > I agree with you in general. But this device isn't really a GPU. And > users of this device seem to want to count *every* error, at least for > now. Aha, so something accelerator-y where they do general purpose computation. So what'

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-24 Thread Borislav Petkov
On Fri, Sep 24, 2021 at 12:41:32PM +0300, Kirill A. Shutemov wrote: > On Thu, Sep 23, 2021 at 08:21:03PM +0200, Borislav Petkov wrote: > > On Thu, Sep 23, 2021 at 12:05:58AM +0300, Kirill A. Shutemov wrote: > > > Unless we find other way to guarantee RIP-relative a

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-23 Thread Borislav Petkov
On Thu, Sep 23, 2021 at 12:05:58AM +0300, Kirill A. Shutemov wrote: > Unless we find other way to guarantee RIP-relative access, we must use > fixup_pointer() to access any global variables. Yah, I've asked compiler folks about any guarantees we have wrt rip-relative addresses but it doesn't look

Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-23 Thread Borislav Petkov
On Thu, Sep 23, 2021 at 05:23:21PM +, Yazen Ghannam wrote: > Shouldn't the error still be reported to EDAC for decoding and counting? I > think users want this. You know what happens with users getting ECCs reported, right? They think immediately their hw is going bad and start wanting to repl

Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-23 Thread Borislav Petkov
On Thu, Sep 23, 2021 at 02:29:07PM +, Yazen Ghannam wrote: > > + /* > > +* If the error was generated in UMC_V2, which belongs to GPU UMCs, > > +* and error occurred in DramECC (Extended error code = 0) then only > > +* process the error, else bail out. > > +*/ > > + if (!m

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-22 Thread Borislav Petkov
On Wed, Sep 22, 2021 at 05:30:15PM +0300, Kirill A. Shutemov wrote: > Not fine, but waiting to blowup with random build environment change. Why is it not fine? Are you suspecting that the compiler might generate something else and not a rip-relative access? -- Regards/Gruss, Boris. https:/

Re: [PATCHv2 1/2] x86/MCE/AMD: Export smca_get_bank_type symbol

2021-09-22 Thread Borislav Petkov
; Want me to ACK this and you can carry it through your tree along with the > > second patch? > > That would be great. Thanks! Ok, with the above changelog removed: Acked-by: Borislav Petkov Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCHv2 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS

2021-09-22 Thread Borislav Petkov
On Sun, Sep 12, 2021 at 10:13:11PM -0400, Mukul Joshi wrote: > On Aldebaran, GPU driver will handle bad page retirement > even though UMC is host managed. As a result, register a > bad page retirement handler on the mce notifier chain to > retire bad pages on Aldebaran. > > v1->v2: > - Use smca_ge

Re: [PATCHv2 1/2] x86/MCE/AMD: Export smca_get_bank_type symbol

2021-09-22 Thread Borislav Petkov
On Sun, Sep 12, 2021 at 10:13:10PM -0400, Mukul Joshi wrote: > Export smca_get_bank_type for use in the AMD GPU > driver to determine MCA bank while handling correctable > and uncorrectable errors in GPU UMC. > > v1->v2: > - Drop the function is_smca_umc_v2(). > - Drop the patch to introduce a new

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Borislav Petkov
On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote: > I still believe calling cc_platform_has() from __startup_64() is totally > broken as it lacks proper wrapping while accessing global variables. Well, one of the issues on the AMD side was using boot_cpu_data too early and the In

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Borislav Petkov
On Tue, Sep 21, 2021 at 12:04:58PM -0500, Tom Lendacky wrote: > Looks like instrumentation during early boot. I worked with Boris offline to > exclude arch/x86/kernel/cc_platform.c from some of the instrumentation and > that allowed an allyesconfig to boot. And here's the lineup I have so far, I'd

Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function

2021-09-16 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 10:26:06AM -0700, Kuppuswamy, Sathyanarayanan wrote: > I have a Intel variant patch (please check following patch). But it includes > TDX changes as well. Shall I move TDX changes to different patch and just > create a separate patch for adding intel_cc_platform_has()? Yes,

Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-15 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 07:18:34PM +0200, Christophe Leroy wrote: > Could you please provide more explicit explanation why inlining such an > helper is considered as bad practice and messy ? Tom already told you to look at the previous threads. Let's read them together. This one, for example: htt

Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function

2021-09-15 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:31PM -0500, Tom Lendacky wrote: > This patch series provides a generic helper function, cc_platform_has(), > to replace the sme_active(), sev_active(), sev_es_active() and > mem_encrypt_active() functions. > > It is expected that as new confidential computing technolo
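The idea of one attribute-based query replacing several vendor-specific predicates can be sketched in userspace as follows. This is not the kernel implementation: the `demo_` names, the attribute list, and the mapping from attributes to the SME/SEV/SEV-ES state are all simplifying assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical attributes, loosely modelled on the CC_ATTR_* naming
 * that appears elsewhere in the thread. */
enum demo_cc_attr {
	DEMO_CC_ATTR_MEM_ENCRYPT,         /* any memory encryption active */
	DEMO_CC_ATTR_HOST_MEM_ENCRYPT,    /* host-side encryption */
	DEMO_CC_ATTR_GUEST_MEM_ENCRYPT,   /* guest memory encryption */
	DEMO_CC_ATTR_GUEST_STATE_ENCRYPT, /* guest register state encryption */
};

/* Hypothetical platform state standing in for what sme_active(),
 * sev_active() and sev_es_active() used to report individually. */
struct demo_platform {
	bool sme;
	bool sev;
	bool sev_es;
};

/* One entry point: callers ask about a capability, not about a vendor
 * feature. ASSUMPTION: this mapping is a simplification. */
static bool demo_cc_platform_has(const struct demo_platform *p,
				 enum demo_cc_attr attr)
{
	switch (attr) {
	case DEMO_CC_ATTR_MEM_ENCRYPT:
		return p->sme || p->sev;
	case DEMO_CC_ATTR_HOST_MEM_ENCRYPT:
		return p->sme && !p->sev;
	case DEMO_CC_ATTR_GUEST_MEM_ENCRYPT:
		return p->sev;
	case DEMO_CC_ATTR_GUEST_STATE_ENCRYPT:
		return p->sev_es;
	}
	return false;
}
```

A caller that previously needed to know about a specific vendor predicate would instead ask for, say, `DEMO_CC_ATTR_GUEST_MEM_ENCRYPT`, so a new technology only needs to extend the mapping rather than every call site.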

Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-15 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 10:28:59AM +1000, Michael Ellerman wrote: > I don't love it, a new C file and an out-of-line call to then call back > to a static inline that for most configuration will return false ... but > whatever :) Yeah, hch thinks it'll cause a big mess otherwise: https://lore.kern

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-14 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:36PM -0500, Tom Lendacky wrote: > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c > index 18fe19916bc3..4b54a2377821 100644 > --- a/arch/x86/mm/mem_encrypt.c > +++ b/arch/x86/mm/mem_encrypt.c > @@ -144,7 +144,7 @@ void __init sme_unmap_bootdata(char

Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-14 Thread Borislav Petkov
On Tue, Sep 14, 2021 at 04:47:41PM +0200, Christophe Leroy wrote: > Yes, see > https://lore.kernel.org/linuxppc-dev/20210914123919.58203...@canb.auug.org.au/T/#t Aha, more compiler magic stuff ;-\ Oh well, I guess that fix will land upstream soon. Thx. -- Regards/Gruss, Boris. https://pe

Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-14 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:35PM -0500, Tom Lendacky wrote: > Introduce a powerpc version of the cc_platform_has() function. This will > be used to replace the powerpc mem_encrypt_active() implementation, so > the implementation will initially only support the CC_ATTR_MEM_ENCRYPT > attribute. >

Re: [PATCH v3 3/8] x86/sev: Add an x86 version of cc_platform_has()

2021-09-13 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:34PM -0500, Tom Lendacky wrote: > diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c > new file mode 100644 > index ..3c9bacd3c3f3 > --- /dev/null > +++ b/arch/x86/kernel/cc_platform.c > @@ -0,0 +1,21 @@ > +// SPDX-License-Identifier

Re: [PATCH v3 2/8] mm: Introduce a function to check for confidential computing features

2021-09-13 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:33PM -0500, Tom Lendacky wrote: > In prep for other confidential computing technologies, introduce a generic > helper function, cc_platform_has(), that can be used to check for specific > active confidential computing attributes, like memory encryption. T

Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
On Tue, Aug 24, 2021 at 07:22:46PM +0530, Lazar, Lijo wrote: > 'pm_suspend_target_state' is only available when CONFIG_PM_SLEEP > is set/enabled. pm_suspend_target_state is available only when CONFIG_SUSPEND is enabled. The extern thing is only a forward declaration. > OTOH, when both SUSPEND and

Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
On Tue, Aug 24, 2021 at 06:38:41PM +0530, Lazar, Lijo wrote: > Without CONFIG_PM_SLEEP and with CONFIG_SUSPEND Can you even create such a .config? > I remember giving a reviewed-by for this one, looks like it never got in. > https://www.spinics.net/lists/amd-gfx/msg66166.html A better version of

[PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
From: Borislav Petkov Building a randconfig here triggered: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined! because the module export of that symbol happens in kernel/power/suspend.c which is enabled with CONFIG_SUSPEND. The ifdef
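The failure mode described here, a reference to a symbol that only exists under one config option, is usually avoided by keying the user on that same option and providing a stub otherwise. The sketch below shows the pattern in isolation; the `demo_` names are hypothetical stand-ins, and it assumes the guard is keyed on CONFIG_SUSPEND because that is the option the message names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the suspend state type. */
enum demo_suspend_state {
	DEMO_SUSPEND_ON,
	DEMO_SUSPEND_TO_IDLE,
	DEMO_SUSPEND_MEM,
};

#ifdef CONFIG_SUSPEND
/* The variable is only defined (and exported) when CONFIG_SUSPEND is
 * enabled, so it is only referenced under the same guard. */
extern enum demo_suspend_state demo_suspend_target_state;

static inline bool demo_is_s2idle_target(void)
{
	return demo_suspend_target_state == DEMO_SUSPEND_TO_IDLE;
}
#else
/* Without CONFIG_SUSPEND there is nothing to link against; the stub
 * keeps callers building and reports "not suspending to idle". */
static inline bool demo_is_s2idle_target(void)
{
	return false;
}
#endif
```

Guarding on a different option than the one that builds the defining file is what lets a randconfig reach a combination where the reference survives but the definition does not.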

Re: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

2021-08-23 Thread Borislav Petkov
On Mon, Aug 23, 2021 at 04:31:42PM -0400, Alex Deucher wrote: > Thanks. I think that should do the trick. Care to send that as a > formal patch? Sure, but let me run it through the randconfigs tests first to make sure nothing else breaks. It is late here so if I don't manage now I'll send you a fo

Re: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!

2021-08-23 Thread Borislav Petkov
On Mon, Aug 23, 2021 at 03:49:39PM -0400, Alex Deucher wrote: > Maybe fixed with this patch? > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5706cb3c910cc8283f344bc37a889a8d523a2c6d Nope, this one is already in: $ git tag --contains 5706cb3c910cc8283f344bc37a889a8d

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-19 Thread Borislav Petkov
On Thu, Aug 19, 2021 at 10:52:53AM +0100, Christoph Hellwig wrote: > Which suggest that the name is not good to start with. Maybe protected > hardware, system or platform might be a better choice? Yah, coming up with a proper name here hasn't been easy. prot_guest_has() is not the first variant.

Re: [PATCH v2 06/12] x86/sev: Replace occurrences of sev_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 10:26:18AM -0500, Tom Lendacky wrote: > >>/* > >> - * If SME is active we need to be sure that kexec pages are > >> - * not encrypted because when we boot to the new kernel the > >> + * If host memory encryption is active we need to be sure that kexec > >> + * pa

Re: [PATCH v2 05/12] x86/sme: Replace occurrences of sme_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 09:46:58AM -0500, Tom Lendacky wrote: > I'm ok with letting the TDX folks make changes to these calls to be SME or > SEV specific, if necessary, later. Yap, exactly. Let's add the specific stuff only when really needed. Thx. -- Regards/Gruss, Boris. https://people.k

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 10:22:52AM -0500, Tom Lendacky wrote: > I can change it to be an AMD/HYGON check... although, I'll have to check > to see if any (very) early use of the function will work with that. We can always change it later if really needed. It is just that I'm not a fan of such "pre

Re: [PATCH v2 09/12] mm: Remove the now unused mem_encrypt_active() function

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:28AM -0500, Tom Lendacky wrote: > The mem_encrypt_active() function has been replaced by prot_guest_has(), > so remove the implementation. > > Reviewed-by: Joerg Roedel > Signed-off-by: Tom Lendacky > --- > include/linux/mem_encrypt.h | 4 > 1 file changed, 4

Re: [PATCH v2 07/12] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
ture support is added for other memory encryption > technologies, the use of PATTR_GUEST_PROT_STATE can be updated, as > required, to specifically use PATTR_SEV_ES. > > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Borislav Petkov > Signed-off-by: Tom Lendacky > --- >

Re: [PATCH v2 04/12] powerpc/pseries/svm: Add a powerpc version of prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:23AM -0500, Tom Lendacky wrote: > Introduce a powerpc version of the prot_guest_has() function. This will > be used to replace the powerpc mem_encrypt_active() implementation, so > the implementation will initially only support the PATTR_MEM_ENCRYPT > attribute. > > C

Re: [PATCH v2 06/12] x86/sev: Replace occurrences of sev_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:25AM -0500, Tom Lendacky wrote: > diff --git a/arch/x86/kernel/machine_kexec_64.c > b/arch/x86/kernel/machine_kexec_64.c > index 8e7b517ad738..66ff788b79c9 100644 > --- a/arch/x86/kernel/machine_kexec_64.c > +++ b/arch/x86/kernel/machine_kexec_64.c > @@ -167,7 +167,7

Re: [PATCH v2 05/12] x86/sme: Replace occurrences of sme_active() with prot_guest_has()

2021-08-17 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:24AM -0500, Tom Lendacky wrote: > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c > index edc67ddf065d..5635ca9a1fbe 100644 > --- a/arch/x86/mm/mem_encrypt.c > +++ b/arch/x86/mm/mem_encrypt.c > @@ -144,7 +144,7 @@ void __init sme_unmap_bootdata(char

Re: [PATCH v2 09/12] mm: Remove the now unused mem_encrypt_active() function

2021-08-17 Thread Borislav Petkov
On Tue, Aug 17, 2021 at 12:22:33PM +0200, Borislav Petkov wrote: > This one wants to be part of the previous patch. ... and the three following patches too - the treewide patch does a single atomic :) replacement and that's it. -- Regards/Gruss, Boris. https://people.kernel.org/tg

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-16 Thread Borislav Petkov
On Sun, Aug 15, 2021 at 08:53:31AM -0500, Tom Lendacky wrote: > It's not a cross-vendor thing as opposed to a KVM or other hypervisor > thing where the family doesn't have to be reported as AMD or HYGON. What would be the use case? A HV starts a guest which is supposed to be encrypted using the AM

Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features

2021-08-16 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote: > In prep for other protected virtualization technologies, introduce a > generic helper function, prot_guest_has(), that can be used to check > for specific protection attributes, like memory encryption. This is > intended to eliminate h
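
For readers skimming this thread, the "generic helper" idea can be illustrated with a small userspace sketch. This is not the patch's code: the attribute names PATTR_MEM_ENCRYPT, PATTR_GUEST_PROT_STATE and PATTR_SEV_ES appear in the series, but their values, the struct platform_state_sketch type and the prot_guest_has_sketch() function below are assumptions made purely for illustration.

```c
#include <stdbool.h>

/*
 * Attribute names as mentioned in the thread; the numeric values
 * chosen by the enum are arbitrary and NOT taken from the patch.
 */
enum pattr_sketch {
	PATTR_MEM_ENCRYPT,
	PATTR_GUEST_PROT_STATE,
	PATTR_SEV_ES,
};

/* Hypothetical stand-in for whatever the architecture detects at boot. */
struct platform_state_sketch {
	bool mem_encrypt;	/* some form of memory encryption is active */
	bool sev_es;		/* guest register state is protected */
};

/*
 * One entry point that callers query with an attribute, instead of
 * calling vendor-specific helpers such as sme_active() or sev_active().
 */
static bool prot_guest_has_sketch(const struct platform_state_sketch *st,
				  enum pattr_sketch attr)
{
	switch (attr) {
	case PATTR_MEM_ENCRYPT:
		return st->mem_encrypt;
	case PATTR_GUEST_PROT_STATE:
	case PATTR_SEV_ES:
		return st->sev_es;
	default:
		/* Unknown attributes are reported as absent. */
		return false;
	}
}
```

A caller would then write something like prot_guest_has_sketch(&state, PATTR_MEM_ENCRYPT) rather than testing a vendor-specific predicate. This single multiplexed entry point is the design that Christoph Hellwig objects to elsewhere in the listing, preferring one well-documented helper per feature.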

Re: [PATCH v2 01/12] x86/ioremap: Selectively build arch override encryption functions

2021-08-16 Thread Borislav Petkov
ecrypted() > - memremap_is_efi_data() > - memremap_is_setup_data() > - early_memremap_is_setup_data() > > And finally, phys_mem_access_encrypted() is conditionally built as well, > but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is > not set. > > Cc: Thomas Gleixne

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-16 Thread Borislav Petkov
On Fri, Aug 13, 2021 at 11:59:22AM -0500, Tom Lendacky wrote: > diff --git a/arch/x86/include/asm/protected_guest.h > b/arch/x86/include/asm/protected_guest.h > new file mode 100644 > index ..51e4eefd9542 > --- /dev/null > +++ b/arch/x86/include/asm/protected_guest.h > @@ -0,0 +1,29 @@

Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features

2021-07-28 Thread Borislav Petkov
On Wed, Jul 28, 2021 at 02:17:27PM +0100, Christoph Hellwig wrote: > So common checks obviously make sense, but I really hate the stupid > multiplexer. Having one well-documented helper per feature is much > easier to follow. We had that in x86 - it was called cpu_has_<xxx> where xxx is the feature bi

Re: [5.13-rc1][bug] often hangs for no reason

2021-05-17 Thread Borislav Petkov
On Mon, May 17, 2021 at 03:27:23AM +0500, Mikhail Gavrilov wrote: > Hi folks. > 5.13-rc1 after 5.13-rc0 is a disaster because it hangs and hangs again > after reboot. > What all the hangs have in common is that they all happen in the > smp_call_function_many_cond function (I compared all trace [1], [2], > [3]

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-14 Thread Borislav Petkov
On Fri, May 14, 2021 at 01:06:33PM +0000, Joshi, Mukul wrote: > We have RAS functionality in other ASICs that is not dependent on > CONFIG_X86_MCE_AMD. So, I don't think we would want to do that just > for one ASIC. Lemme try again: you said that those errors do get reported through a deferred int

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-14 Thread Borislav Petkov
On Thu, May 13, 2021 at 11:10:34PM +0000, Joshi, Mukul wrote: > That's probably not the best example to look at. Oh, it is the *perfect* example but... > smca_get_long_name() is used in drivers/edac/mce_amd.c and this file > doesn't get compiled when CONFIG_X86_MCE_AMD is not defined. > > And amd

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-14 Thread Borislav Petkov
On Thu, May 13, 2021 at 11:14:30PM +0000, Joshi, Mukul wrote: > Are you OK with a new MCE priority (MCE_PRIO_ACCEL) or do you want us to use > something else? I still don't know why a separate priority is needed. Maybe this still needs answering: > It is a deferred interrupt that generates an MCE

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-13 Thread Borislav Petkov
On Thu, May 13, 2021 at 10:32:45AM -0400, Alex Deucher wrote: > Right. The sys admin can query the bad page count and decide when to > retire the card. Yap, although the driver should actively "tell" the sysadmin when some critical counts of retired VRAM pages are reached because I doubt all admi

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-13 Thread Borislav Petkov
On Thu, May 13, 2021 at 10:17:47AM -0400, Alex Deucher wrote: > The bad pages are stored in an EEPROM on the board and the next time > the driver loads it reads the EEPROM so that it can reserve the bad > pages at init time so they don't get used again. And that works automagically on the next boo

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-13 Thread Borislav Petkov
On Thu, May 13, 2021 at 03:20:36AM +0000, Joshi, Mukul wrote: > Exporting smca_get_bank_type() works fine when CONFIG_X86_MCE_AMD is defined. > I would need to put #ifdef CONFIG_X86_MCE_AMD in my code to compile the amdgpu > driver when CONFIG_X86_MCE_AMD is not defined. > I can avoid all that by u
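
As background to the #ifdef concern raised here: a common kernel convention is to keep the conditional in the header by providing a static inline stub, so that callers stay free of #ifdef. The reply in this entry is truncated, so the following is only a sketch of that general convention and not necessarily what was proposed. Every name in it (CONFIG_X86_MCE_AMD_SKETCH, bank_type_sketch, get_bank_type_sketch(), is_gpu_umc_error_sketch()) is hypothetical and does not reflect the real MCE interfaces or their signatures.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for a Kconfig symbol. Leave it undefined to
 * model a kernel built without the feature.
 */
/* #define CONFIG_X86_MCE_AMD_SKETCH 1 */

enum bank_type_sketch {
	BANK_UMC_SKETCH,
	BANK_OTHER_SKETCH,
	BANK_NONE_SKETCH,	/* "no such bank / feature not built" */
};

#ifdef CONFIG_X86_MCE_AMD_SKETCH
/* Real implementation lives elsewhere when the feature is enabled. */
enum bank_type_sketch get_bank_type_sketch(unsigned int bank);
#else
/* Stub: compiles away to a constant when the feature is disabled. */
static inline enum bank_type_sketch get_bank_type_sketch(unsigned int bank)
{
	(void)bank;
	return BANK_NONE_SKETCH;
}
#endif

/* Driver-side caller: no #ifdef needed here. */
static bool is_gpu_umc_error_sketch(unsigned int bank)
{
	return get_bank_type_sketch(bank) == BANK_UMC_SKETCH;
}
```

With the symbol undefined, is_gpu_umc_error_sketch() always returns false and the driver still builds, which is the property the stub is meant to provide.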

Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

2021-05-12 Thread Borislav Petkov
On Wed, May 12, 2021 at 07:00:58PM +0000, Joshi, Mukul wrote: > SMCA UMCv2 corresponds to GPU's UMC MCA bank and the GPU driver is > only interested in errors on GPU UMC. So that thing should be called SMCA_GPU_UMC not SMCA_UMC_V2. > We cannot know this without is_smca_umc_v2. You don't need it
