Re: BUG: KASAN: vmalloc-out-of-bounds in copy_to_kernel_nofault+0xd8/0x1c8 (v6.13-rc6, PowerMac G4)
Le 22/01/2025 à 00:21, Erhard Furtner a écrit :

On Tue, 21 Jan 2025 23:07:25 +0100 Christophe Leroy wrote:

Meanwhile I bisected the bug. Offending commit is:

# git bisect good
32913f348229c9f72dda45fc2c08c6d9dfcd3d6d is the first bad commit
commit 32913f348229c9f72dda45fc2c08c6d9dfcd3d6d
Author: Linus Torvalds
Date: Mon Dec 9 10:00:25 2024 -0800

    futex: fix user access on powerpc

    The powerpc user access code is special, and unlike other architectures
    distinguishes between user access for reading and writing. And commit
    43a43faf5376 ("futex: improve user space accesses") messed that up. It
    went undetected elsewhere, but caused ppc32 to fail early during boot,
    because the user access had been started with user_read_access_begin(),
    but then finished off with just a plain "user_access_end()".

    Note that the address-masking user access helpers don't even have that
    read-vs-write distinction, so if powerpc ever wants to do address
    masking tricks, we'll have to do some extra work for it.

    [ Make sure to also do it for the EFAULT case, as pointed out by
      Christophe Leroy ]

    Reported-by: Andreas Schwab
    Cc: Christophe Leroy
    Link: https://lore.kernel.org/all/87bjxl6b0i.fsf@igel.home/
    Signed-off-by: Linus Torvalds

 kernel/futex/futex.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Indeed, reverting 32913f348229c9f72dda45fc2c08c6d9dfcd3d6d on top of v6.13 makes the KASAN hit disappear.

That looks terribly odd.

On G4, user_read_access_begin() and user_read_access_end() are no-ops because book3s/32 can only protect user access by the kernel against write. Read is always granted.
So the bug must be an indirect side effect of what user_access_end() does. user_access_end() does a sync. Would the lack of sync (once user_access_end() is replaced by user_read_access_end()) lead to some odd re-ordering?

Or another possibility is that user_access_end() is called on some kernel address (I see in the description of commit 43a43faf5376 ("futex: improve user space accesses") that the replaced __get_user() was expected to work on kernel addresses). Calling user_access_begin() and user_access_end() on a kernel address is unexpected and there is no guard, so it could lead to strange segment settings which hide a KASAN hit. But once we fix the issue, the KASAN hit resurfaces. Could this be the problem?

Do you have a way to reproduce the bug on QEMU? It would enable me to investigate it further.

Attached v6.13 .config plays nicely with qemu ttyS0 (forgot to disable SERIAL_8250 and set SERIAL_PMACZILOG + SERIAL_PMACZILOG_CONSOLE instead as I prefer the PCI Serial card in my G4). The KASAN hit also shows up on qemu 8.2.7 via:

qemu-system-ppc -machine mac99,via=pmu -cpu 7450 -m 2G -nographic -append console=ttyS0 -kernel vmlinux-6.13.0-PMacG4 -hda Debian-VM_g4.img

I was able to reproduce it with v6.13 with QEMU when loading the test_bpf module. On my side, the problem doesn't disappear when reverting commit 32913f348229 ("futex: fix user access on powerpc"). I bisected it to commit e4137f08816b ("mm, kasan, kmsan: instrument copy_from/to_kernel_nofault"), which makes a lot more sense to me. It might be a problem in the way patch_instruction() is implemented on powerpc, to be investigated.

Christophe
Re: Re: [PATCH v2] treewide: const qualify ctl_tables where applicable
On Tue, Jan 21, 2025 at 02:40:16PM +0100, Alexander Gordeev wrote:
> On Fri, Jan 10, 2025 at 03:16:08PM +0100, Joel Granados wrote:
>
> Hi Joel,
>
> > Add the const qualifier to all the ctl_tables in the tree except for
> > watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
> > loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
> > drivers/infiniband dirs). These are special cases as they use a
> > registration function with a non-const qualified ctl_table argument or
> > modify the arrays before passing them on to the registration function.
> >
> > Constifying ctl_table structs will prevent the modification of
> > proc_handler function pointers as the arrays would reside in .rodata.
> > This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
> > constify the ctl_table argument of proc_handlers") constified all the
> > proc_handlers.
>
> I could identify at least these occurrences in s390 code as well:

Hey Alexander

Thx for bringing these to my attention. I had completely missed them as the spatch only deals with ctl_tables outside functions.

Short answer: These should not be included in the current patch because they are a different pattern from how sysctl tables are usually used. So I will not include them.

With that said, I think it might be interesting to look closer at them as they seem to be complicating the proc_handler (I have to look at them closer). I see that they are defining a ctl_table struct within the functions and just using the data (from the incoming ctl_table) to forward things down to proc_do{u,}intvec_* functions. This is very odd and I have only seen it done in order to change the incoming ctl_table (which is not what is being done here).

I will take a closer look after the merge window and circle back with more info. Might take me a while as I'm not very familiar with s390 code; any additional information on why those are being used inside the functions would be helpful.
Best

> diff --git a/arch/s390/appldata/appldata_base.c b/arch/s390/appldata/appldata_base.c
> index dd7ba7587dd5..9b83c318f919 100644
> --- a/arch/s390/appldata/appldata_base.c
> +++ b/arch/s390/appldata/appldata_base.c
> @@ -204,7 +204,7 @@ appldata_timer_handler(const struct ctl_table *ctl, int write,
>  {
>  	int timer_active = appldata_timer_active;
>  	int rc;
> -	struct ctl_table ctl_entry = {
> +	const struct ctl_table ctl_entry = {
>  		.procname = ctl->procname,
>  		.data = &timer_active,
>  		.maxlen = sizeof(int),
> @@ -237,7 +237,7 @@ appldata_interval_handler(const struct ctl_table *ctl, int write,
>  {
>  	int interval = appldata_interval;
>  	int rc;
> -	struct ctl_table ctl_entry = {
> +	const struct ctl_table ctl_entry = {
>  		.procname = ctl->procname,
>  		.data = &interval,
>  		.maxlen = sizeof(int),
> @@ -269,7 +269,7 @@ appldata_generic_handler(const struct ctl_table *ctl, int write,
>  	struct list_head *lh;
>  	int rc, found;
>  	int active;
> -	struct ctl_table ctl_entry = {
> +	const struct ctl_table ctl_entry = {
>  		.data = &active,
>  		.maxlen = sizeof(int),
>  		.extra1 = SYSCTL_ZERO,
> diff --git a/arch/s390/kernel/hiperdispatch.c b/arch/s390/kernel/hiperdispatch.c
> index 7857a7e8e56c..7d0ba16085c1 100644
> --- a/arch/s390/kernel/hiperdispatch.c
> +++ b/arch/s390/kernel/hiperdispatch.c
> @@ -273,7 +273,7 @@ static int hiperdispatch_ctl_handler(const struct ctl_table *ctl, int write,
>  {
>  	int hiperdispatch;
>  	int rc;
> -	struct ctl_table ctl_entry = {
> +	const struct ctl_table ctl_entry = {
>  		.procname = ctl->procname,
>  		.data = &hiperdispatch,
>  		.maxlen = sizeof(int),
> diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
> index 6691808bf50a..26e50de83d80 100644
> --- a/arch/s390/kernel/topology.c
> +++ b/arch/s390/kernel/topology.c
> @@ -629,7 +629,7 @@ static int topology_ctl_handler(const struct ctl_table *ctl, int write,
>  	int enabled = topology_is_enabled();
>  	int new_mode;
>  	int rc;
> -	struct ctl_table ctl_entry = {
> +	const struct ctl_table ctl_entry = {
>  		.procname = ctl->procname,
>  		.data = &enabled,
>  		.maxlen = sizeof(int),
> @@ -658,7 +658,7 @@ static int polarization_ctl_handler(const struct ctl_table *ctl, int write,
>  {
>  	int polarization;
>  	int rc;
> -	struct ctl_table ctl_entry = {
> +	const struct ctl_table ctl_entry = {
>  		.procname = ctl->procname,
>  		.data = &polarization,
>  		.maxlen = sizeof(int),
> diff --git a/arch/s390/mm/cmm.c b/arch/s390/mm/cmm.c
> index 939e3bec2db7..8
Re: [PATCH v2 0/2] PCI/AER: Report fatal errors of RCiEP and EP if link recoverd
Hi, all,

Gentle ping.

Best Regards,
Shuai

在 2024/12/24 19:03, Shuai Xue 写道:

在 2024/11/12 21:54, Shuai Xue 写道:

changes since v1:
- rewrite commit log per Bjorn
- refactor aer_get_device_error_info to reduce duplication per Keith
- fix to avoid reporting fatal errors twice for root and downstream ports per Keith

The AER driver has historically avoided reading the configuration space of an endpoint or RCiEP that reported a fatal error, considering the link to that device unreliable. Consequently, when a fatal error occurs, the AER and DPC drivers do not report specific error types, resulting in logs like:

pcieport :30:03.0: EDR: EDR event received
pcieport :30:03.0: DPC: containment event, status:0x0005 source:0x3400
pcieport :30:03.0: DPC: ERR_FATAL detected
pcieport :30:03.0: AER: broadcast error_detected message
nvme nvme0: frozen state error detected, reset controller
nvme :34:00.0: ready 0ms after DPC
pcieport :30:03.0: AER: broadcast slot_reset message

AER status registers are sticky and Write-1-to-clear. If the link recovered after hot reset, we can still safely access the AER status of the error device. In such a case, report fatal errors, which helps to figure out the error root cause.

- Patch 1/2 identifies the error device by the SOURCE ID register
- Patch 2/2 reports the AER status if the link recovered.
After this patch set, the logs look like:

pcieport :30:03.0: EDR: EDR event received
pcieport :30:03.0: DPC: containment event, status:0x0005 source:0x3400
pcieport :30:03.0: DPC: ERR_FATAL detected
pcieport :30:03.0: AER: broadcast error_detected message
nvme nvme0: frozen state error detected, reset controller
pcieport :30:03.0: waiting 100 ms for downstream link, after activation
nvme :34:00.0: ready 0ms after DPC
nvme :34:00.0: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Data Link Layer, (Receiver ID)
nvme :34:00.0: device [144d:a804] error status/mask=0010/00504000
nvme :34:00.0: [ 4] DLP (First)
pcieport :30:03.0: AER: broadcast slot_reset message

Shuai Xue (2):
  PCI/DPC: Run recovery on device that detected the error
  PCI/AER: Report fatal errors of RCiEP and EP if link recoverd

 drivers/pci/pci.h      |  5 +++--
 drivers/pci/pcie/aer.c | 11 +++
 drivers/pci/pcie/dpc.c | 32 +---
 drivers/pci/pcie/edr.c | 35 ++-
 drivers/pci/pcie/err.c |  9 +
 5 files changed, 62 insertions(+), 30 deletions(-)

Hi, all,

Gentle ping.

Best Regards,
Shuai
Re: Re: [PATCH v2] treewide: const qualify ctl_tables where applicable
On Wed, 22 Jan 2025 at 13:25, Joel Granados wrote:
>
> On Tue, Jan 21, 2025 at 02:40:16PM +0100, Alexander Gordeev wrote:
> > On Fri, Jan 10, 2025 at 03:16:08PM +0100, Joel Granados wrote:
> >
> > Hi Joel,
> >
> > > Add the const qualifier to all the ctl_tables in the tree except for
> > > watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
> > > loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
> > > drivers/infiniband dirs). These are special cases as they use a
> > > registration function with a non-const qualified ctl_table argument or
> > > modify the arrays before passing them on to the registration function.
> > >
> > > Constifying ctl_table structs will prevent the modification of
> > > proc_handler function pointers as the arrays would reside in .rodata.
> > > This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
> > > constify the ctl_table argument of proc_handlers") constified all the
> > > proc_handlers.
> >
> > I could identify at least these occurrences in s390 code as well:
>
> Hey Alexander
>
> Thx for bringing these to my attention. I had completely missed them as
> the spatch only deals with ctl_tables outside functions.
>
> Short answer:
> These should not be included in the current patch because they are a
> different pattern from how sysctl tables are usually used. So I will not
> include them.
>
> With that said, I think it might be interesting to look closer at them
> as they seem to be complicating the proc_handler (I have to look at them
> closer).
>
> I see that they are defining a ctl_table struct within the functions and
> just using the data (from the incoming ctl_table) to forward things down
> to proc_do{u,}intvec_* functions. This is very odd and I have only seen
> it done in order to change the incoming ctl_table (which is not what is
> being done here).
>
> I will take a closer look after the merge window and circle back with
> more info.
> Might take me a while as I'm not very familiar with s390
> code; any additional information on why those are being used inside the
> functions would be helpful.
>

Using const data on the stack is not as useful, because the stack is always mapped writable. Global data structures marked 'const' will be moved into an ELF section that is typically mapped read-only in its entirety, and so the data cannot be modified by writing to it directly. No such protection is possible for the stack, and so the constness there is only enforced at compile time.
Re: [PATCH] powerpc/pseries/iommu: Don't unset window if it was never set
On Mon, 13 Jan 2025 03:48:55 +, Shivaprasad G Bhat wrote: > On pSeries, when user attempts to use the same vfio container used by > different iommu group, the spapr_tce_set_window() returns -EPERM > and the subsequent cleanup leads to the below crash. > >Kernel attempted to read user page (308) - exploit attempt? >BUG: Kernel NULL pointer dereference on read at 0x0308 >Faulting instruction address: 0xc01ce358 >Oops: Kernel access of bad area, sig: 11 [#1] >NIP: c01ce358 LR: c01ce05c CTR: c005add0 > >NIP [c01ce358] spapr_tce_unset_window+0x3b8/0x510 >LR [c01ce05c] spapr_tce_unset_window+0xbc/0x510 >Call Trace: > spapr_tce_unset_window+0xbc/0x510 (unreliable) > tce_iommu_attach_group+0x24c/0x340 [vfio_iommu_spapr_tce] > vfio_container_attach_group+0xec/0x240 [vfio] > vfio_group_fops_unl_ioctl+0x548/0xb00 [vfio] > sys_ioctl+0x754/0x1580 > system_call_exception+0x13c/0x330 > system_call_vectored_common+0x15c/0x2ec > >--- interrupt: 3000 > > [...] Applied to powerpc/next. [1/1] powerpc/pseries/iommu: Don't unset window if it was never set https://git.kernel.org/powerpc/c/17391cb2613b82f8c405570fea605af3255ff8d2 Thanks
Re: [PATCH v4 RESEND] powerpc/pseries/eeh: Fix get PE state translation
On Thu, 16 Jan 2025 04:39:54 -0600, Narayana Murty N wrote: > The PE Reset State "0" returned by RTAS calls > "ibm_read_slot_reset_[state|state2]" indicates that the reset is > deactivated and the PE is in a state where MMIO and DMA are allowed. > However, the current implementation of "pseries_eeh_get_state()" does > not reflect this, causing drivers to incorrectly assume that MMIO and > DMA operations cannot be resumed. > > [...] Applied to powerpc/next. [1/1] powerpc/pseries/eeh: Fix get PE state translation https://git.kernel.org/powerpc/c/11b93559000c686ad7e5ab0547e76f21cc143844 Thanks
Re: [PATCH v2] powerpc: increase MIN RMA size for CAS negotiation
On Fri, 06 Dec 2024 12:25:45 +0530, Avnish Chouhan wrote:
> Change RMA size from 512 MB to 768 MB which will result
> in more RMA at boot time for PowerPC. When a PowerPC LPAR uses vTPM,
> Secure Boot or FADump, the 512 MB RMA memory is not sufficient for
> booting. With this 512 MB RMA, GRUB2 runs out of memory and is unable
> to load the necessary modules. Sometimes even usage of a CDROM, which
> requires more memory for installation along with the options mentioned
> above, strains the boot memory and results in boot failures. Increasing
> the RMA size resolves multiple out-of-memory issues observed on PowerPC.
>
> [...]

Applied to powerpc/next.

[1/1] powerpc: increase MIN RMA size for CAS negotiation
      https://git.kernel.org/powerpc/c/ae908b87b6bb32c170e9baf5858f2a7553cacc06

Thanks
[PATCH v2] fs: introduce getfsxattrat and setfsxattrat syscalls
From: Andrey Albershteyn

Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode extended attributes/flags. The syscalls take a parent directory FD and the path to the child, together with struct fsxattr.

This is an alternative to the FS_IOC_FSSETXATTR ioctl, with the difference that the file doesn't need to be opened. By having this we can manipulate inode extended attributes not only on regular files but also on special ones. This is not possible with the FS_IOC_FSSETXATTR ioctl, as opening special files returns the VFS special inode instead of the underlying filesystem one.

This patch adds two new syscalls which allow userspace to set extended inode attributes on special files by using the parent directory to open the FS inode. Also, as vfs_fileattr_set() will now be called on special files too, let's forbid any other attributes except projid and nextents (a symlink can have an extent).

CC: linux-...@vger.kernel.org
CC: linux-fsde...@vger.kernel.org
CC: linux-...@vger.kernel.org
Signed-off-by: Andrey Albershteyn
---
v1: https://lore.kernel.org/linuxppc-dev/20250109174540.893098-1-aalbe...@kernel.org/
Previous discussion: https://lore.kernel.org/linux-xfs/20240520164624.665269-2-aalbe...@redhat.com/

XFS has project quotas which can be attached to a directory. All new inodes in these directories inherit the project ID set on the parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with an empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. Moreover, in the case when special files are created in a directory with an already existing project quota, these inodes inherit the extended attributes. This then leaves them with these attributes without the possibility to clear them out. This, in turn, prevents userspace from re-creating the quota project on these existing files.
---
 arch/alpha/kernel/syscalls/syscall.tbl      |  2 +
 arch/arm/tools/syscall.tbl                  |  2 +
 arch/arm64/tools/syscall_32.tbl             |  2 +
 arch/m68k/kernel/syscalls/syscall.tbl       |  2 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  2 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  2 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |  2 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  2 +
 arch/parisc/kernel/syscalls/syscall.tbl     |  2 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |  2 +
 arch/s390/kernel/syscalls/syscall.tbl       |  2 +
 arch/sh/kernel/syscalls/syscall.tbl         |  2 +
 arch/sparc/kernel/syscalls/syscall.tbl      |  2 +
 arch/x86/entry/syscalls/syscall_32.tbl      |  2 +
 arch/x86/entry/syscalls/syscall_64.tbl      |  2 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |  2 +
 fs/inode.c                                  | 99 +
 fs/ioctl.c                                  | 16 -
 include/linux/fileattr.h                    |  1 +
 include/linux/syscalls.h                    |  4 ++
 include/uapi/asm-generic/unistd.h           |  8 ++-
 21 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index c59d53d6d3f3490f976ca179ddfe02e69265ae4d..4b9e687494c16b60c6fd6ca1dc4d6564706a7e25 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -506,3 +506,5 @@
 574	common	getxattrat	sys_getxattrat
 575	common	listxattrat	sys_listxattrat
 576	common	removexattrat	sys_removexattrat
+577	common	getfsxattrat	sys_getfsxattrat
+578	common	setfsxattrat	sys_setfsxattrat
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 49eeb2ad8dbd8e074c6240417693f23fb328afa8..66466257f3c2debb3e2299f0b608c6740c98cab2 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -481,3 +481,5 @@
 464	common	getxattrat	sys_getxattrat
 465	common	listxattrat	sys_listxattrat
 466	common	removexattrat	sys_removexattrat
+467	common	getfsxattrat	sys_getfsxattrat
+468	common	setfsxattrat	sys_setfsxattrat
diff --git a/arch/arm64/tools/syscall_32.tbl b/arch/arm64/tools/syscall_32.tbl
index 69a829912a05eb8a3e21ed701d1030e31c0148bc..9c516118b154811d8d11d5696f32817430320dbf 100644
--- a/arch/arm64/tools/syscall_32.tbl
+++ b/arch/arm64/tools/syscall_32.tbl
@@ -478,3 +478,5 @@
 464	common	getxattrat	sys_getxattrat
 465	common	listxattrat	sys_listxattrat
 466	common	removexattrat	sys_removexattrat
+467	common	getfsxattrat	sys_getfsxattrat
+468	common	setfsxattrat	sys_setfsxattrat
diff --git a/arch/m68k/kernel/syscalls/
Re: [PATCH] mm/hugetlb: bring gigantic page allocation under hugepages_supported()
On Tue, 21 Jan 2025 20:34:19 +0530 Sourabh Jain wrote: > Despite having kernel arguments to enable gigantic hugepages, this > provides a way for the architecture to disable gigantic hugepages on the > fly, similar to what we do for hugepages. > > Components like fadump (PowerPC-specific) need this functionality to > disable gigantic hugepages when the kernel is booted solely to collect > the kernel core dump. > > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Borislav Petkov > Cc: Heiko Carstens > Cc: Vasily Gorbik > Cc: Muchun Song > Cc: Madhavan Srinivasan > Cc: Michael Ellerman > Cc: linux...@kvack.org > Cc: linux-ker...@vger.kernel.org > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Sourabh Jain > --- > > To evaluate the impact of this change on architectures other than > PowerPC, I did the following analysis: > > For architectures where hugepages_supported() is not redefined, it > depends on HPAGE_SHIFT, which is found to be a constant. It is mostly > initialized to PMD_SHIFT. > > Architecture : HPAGE_SHIFT initialized with > > ARC: PMD_SHIFT (constant) > ARM: PMD_SHIFT (constant) > ARM64: PMD_SHIFT (constant) > Hexagon: 22 (constant) > LoongArch: (PAGE_SHIFT + PAGE_SHIFT - 3) (appears to be constant) > MIPS: (PAGE_SHIFT + PAGE_SHIFT - 3) (appears to be constant) > PARISC: PMD_SHIFT (appears to be constant) > RISC-V: PMD_SHIFT (constant) > SH: 16 | 18 | 20 | 22 | 26 (constant) > SPARC: 23 (constant) > > So seems like this change shouldn't have any impact on above > architectures. > > On the S390 and X86 architectures, hugepages_supported() is redefined, > and I am uncertain at what point it is safe to call > hugepages_supported(). For s390, hugepages_supported() checks EDAT1 machine flag, which is initialized long before any initcalls. So it is safe to be called here. My common code hugetlb skills got a little rusty, but shouldn't arch_hugetlb_valid_size() already prevent getting here for gigantic hugepages, in case they are not supported? 
And could you not use that for your purpose?

BTW, please also add the arch mailing list for such questions.

> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index cec4b121193f..48b42b8d26b4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4629,7 +4629,7 @@ static int __init hugepages_setup(char *s)
>  	 * But we need to allocate gigantic hstates here early to still
>  	 * use the bootmem allocator.
>  	 */
> -	if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
> +	if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate) &&
> +	    hugepages_supported())
>  		hugetlb_hstate_alloc_pages(parsed_hstate);
>
>  	last_mhp = mhp;
Re: [PATCH 5.10] ibmvnic: Add tx check to prevent skb leak
On 1/20/2025 4:46 AM, Denis Arefev wrote:
> From: Nick Child
>
> commit 0983d288caf984de0202c66641577b739caad561 upstream.
>
> Below is a summary of how the driver stores a reference to an skb during
> transmit:
>   tx_buff[free_map[consumer_index]]->skb = new_skb;
>   free_map[consumer_index] = IBMVNIC_INVALID_MAP;
>   consumer_index ++;
> Where variable data looks like this:
>   free_map == [4, IBMVNIC_INVALID_MAP, IBMVNIC_INVALID_MAP, 0, 3]
>                   consumer_index^
>   tx_buff == [skb=null, skb=, skb=, skb=null, skb=null]
>
> The driver has checks to ensure that free_map[consumer_index] pointed to
> a valid index but there was no check to ensure that this index pointed
> to an unused/null skb address. So, if, by some chance, our free_map and
> tx_buff lists become out of sync then we were previously risking an
> skb memory leak. This could then cause tcp congestion control to stop
> sending packets, eventually leading to ETIMEDOUT.
>
> Therefore, add a conditional to ensure that the skb address is null. If
> not then warn the user (because this is still a bug that should be
> patched) and free the old pointer to prevent memleak/tcp problems.
>
> Signed-off-by: Nick Child
> Signed-off-by: Paolo Abeni
> [Denis: minor fix to resolve merge conflict.]
> Signed-off-by: Denis Arefev
> ---

I thought the process asked to have the stable tag, i.e. Cc: # 5.10.x

Anyways, this looks good to me, and seems like a good candidate for backporting.
Reviewed-by: Jacob Keller

Thanks,
Jake

> Backport fix for CVE-2024-41066
> Link: https://nvd.nist.gov/vuln/detail/CVE-2024-41066
> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 84da6ccaf339..439796975cbf 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1625,6 +1625,18 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
>  		(tx_pool->consumer_index + 1) % tx_pool->num_buffers;
>
>  	tx_buff = &tx_pool->tx_buff[index];
> +
> +	/* Sanity checks on our free map to make sure it points to an index
> +	 * that is not being occupied by another skb. If skb memory is
> +	 * not freed then we see congestion control kick in and halt tx.
> +	 */
> +	if (unlikely(tx_buff->skb)) {
> +		dev_warn_ratelimited(dev, "TX free map points to untracked skb (%s %d idx=%d)\n",
> +				     skb_is_gso(skb) ? "tso_pool" : "tx_pool",
> +				     queue_num, bufidx);
> +		dev_kfree_skb_any(tx_buff->skb);
> +	}
> +
>  	tx_buff->skb = skb;
>  	tx_buff->data_dma[0] = data_dma_addr;
>  	tx_buff->data_len[0] = skb->len;
Re: BUG: KASAN: vmalloc-out-of-bounds in copy_to_kernel_nofault+0xd8/0x1c8 (v6.13-rc6, PowerMac G4)
Le 22/01/2025 à 16:32, Christophe Leroy a écrit :

Le 22/01/2025 à 00:21, Erhard Furtner a écrit :

[...]

I was able to reproduce it with v6.13 with QEMU when loading the test_bpf module. On my side, the problem doesn't disappear when reverting commit 32913f348229 ("futex: fix user access on powerpc"). I bisected it to commit e4137f08816b ("mm, kasan, kmsan: instrument copy_from/to_kernel_nofault"), which makes a lot more sense to me. It might be a problem in the way patch_instruction() is implemented on powerpc, to be investigated.
I think the problem is commit 37bc3e5fd764 ("powerpc/lib/code-patching: Use alternate map for patch_instruction()") Can you try the change below: diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index af97fbb3c257..8a378fc19074 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -108,7 +108,7 @@ static int text_area_cpu_up(unsigned int cpu) unsigned long addr; int err; - area = get_vm_area(PAGE_SIZE, VM_ALLOC); + area = get_vm_area(PAGE_SIZE, 0); if (!area) { WARN_ONCE(1, "Failed to create text area for cpu %d\n", cpu);
Re: [PATCH v2] powerpc: increase MIN RMA size for CAS negotiation
Hello Maddy,

On 22/01/25 18:13, Madhavan Srinivasan wrote:

On Fri, 06 Dec 2024 12:25:45 +0530, Avnish Chouhan wrote:

[...]

Applied to powerpc/next.

[1/1] powerpc: increase MIN RMA size for CAS negotiation
      https://git.kernel.org/powerpc/c/ae908b87b6bb32c170e9baf5858f2a7553cacc06

The above changes break the fadump additional parameter feature on the HASH MMU. Here is the proposed fix:

https://lore.kernel.org/all/20250120173501.1147236-1-sourabhj...@linux.ibm.com/

The proposed fix ensures that the memory reserved for the additional parameter for fadump on the HASH MMU does not overlap with GRUB. Please share your opinion on the fix.

Thanks,
Sourabh Jain
Re: [PATCH v2 1/2] PCI/DPC: Run recovery on device that detected the error
On 11/12/24 5:54 AM, Shuai Xue wrote:
> The current implementation of pcie_do_recovery() assumes that the
> recovery process is executed on the device that detected the error.
> However, the DPC driver currently passes the error port that
> experienced the DPC event to pcie_do_recovery(). Use the SOURCE ID
> register to correctly identify the device that detected the error. By
> passing this error device to pcie_do_recovery(), subsequent patches
> will be able to accurately access AER status of the error device.

When passing the error device, I assume pcie_do_recovery() will find
the upstream bridge and run the recovery logic.

> Signed-off-by: Shuai Xue
> ---

IMO, moving the "err_port" rename to a separate patch will make this
change more clear. But it is up to you.

>  drivers/pci/pci.h      |  2 +-
>  drivers/pci/pcie/dpc.c | 30 --
>  drivers/pci/pcie/edr.c | 35 ++-
>  3 files changed, 43 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 14d00ce45bfa..0866f79aec54 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -521,7 +521,7 @@ struct rcec_ea {
>  void pci_save_dpc_state(struct pci_dev *dev);
>  void pci_restore_dpc_state(struct pci_dev *dev);
>  void pci_dpc_init(struct pci_dev *pdev);
> -void dpc_process_error(struct pci_dev *pdev);
> +struct pci_dev *dpc_process_error(struct pci_dev *pdev);
>  pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
>  bool pci_dpc_recovered(struct pci_dev *pdev);
>  #else
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 2b6ef7efa3c1..62a68cde4364 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -257,10 +257,17 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
>      return 1;
>  }
>
> -void dpc_process_error(struct pci_dev *pdev)
> +/**
> + * dpc_process_error - handle the DPC error status

Handling the DPC error status has nothing to do with finding the error
source. Why not add a new helper function?

> + * @pdev: the port that experienced the containment event
> + *
> + * Return the device that experienced the error.

detected the error?

> + */
> +struct pci_dev *dpc_process_error(struct pci_dev *pdev)
>  {
>      u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
>      struct aer_err_info info;
> +    struct pci_dev *err_dev = NULL;

I don't think you need NULL initialization here.

>      pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
>      pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
> @@ -283,6 +290,13 @@ void dpc_process_error(struct pci_dev *pdev)
>               "software trigger" : "reserved error");
>
> +    if (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE ||
> +        reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE)
> +        err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
> +                                              PCI_BUS_NUM(source), source & 0xff);
> +    else
> +        err_dev = pci_dev_get(pdev);
> +
>      /* show RP PIO error detail information */
>      if (pdev->dpc_rp_extensions &&
>          reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT &&
> @@ -295,6 +309,8 @@ void dpc_process_error(struct pci_dev *pdev)
>          pci_aer_clear_nonfatal_status(pdev);
>          pci_aer_clear_fatal_status(pdev);
>      }
> +
> +    return err_dev;
>  }
>
>  static void pci_clear_surpdn_errors(struct pci_dev *pdev)
> @@ -350,21 +366,23 @@ static bool dpc_is_surprise_removal(struct pci_dev *pdev)
>  static irqreturn_t dpc_handler(int irq, void *context)
>  {
> -    struct pci_dev *pdev = context;
> +    struct pci_dev *err_port = context, *err_dev = NULL;

NULL initialization is not needed.

>      /*
>       * According to PCIe r6.0 sec 6.7.6, errors are an expected side effect
>       * of async removal and should be ignored by software.
>       */
> -    if (dpc_is_surprise_removal(pdev)) {
> -        dpc_handle_surprise_removal(pdev);
> +    if (dpc_is_surprise_removal(err_port)) {
> +        dpc_handle_surprise_removal(err_port);
>          return IRQ_HANDLED;
>      }
>
> -    dpc_process_error(pdev);
> +    err_dev = dpc_process_error(err_port);
>
>      /* We configure DPC so it only triggers on ERR_FATAL */
> -    pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link);
> +    pcie_do_recovery(err_dev, pci_channel_io_frozen, dpc_reset_link);
> +
> +    pci_dev_put(err_dev);
>
>      return IRQ_HANDLED;
>  }
> diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c
> index e86298dbbcff..6ac95e5e001b 100644
> --- a/drivers/pci/pcie/edr.c
> +++ b/drivers/pci/pcie/edr.c
> @@ -150,7 +150,7 @@ static int acpi_send_edr_status(struct pci_dev *pdev, struct pci_dev *edev,
>  static void edr_handle_event(acpi_handle handle, u32 event, void *data)
>  {
> -    struct pci_dev *pdev = data, *edev;
> +    struct pci_dev *pdev = data, *err_port, *err_dev = NULL;
>      pci_ers_result_t estate = PCI_ERS_RESULT_D
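The lookup in the dpc.c hunk above splits the 16-bit DPC Error Source ID (a Requester ID) into a bus number and devfn before calling pci_get_domain_bus_and_slot(). A standalone sketch of that decomposition (the helper names here are illustrative, not the kernel's, though the bit layout matches PCI_BUS_NUM/PCI_SLOT/PCI_FUNC):

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit PCI Requester ID carries the bus number in the upper byte
 * and devfn (5-bit device, 3-bit function) in the lower byte. */
static inline uint8_t rid_bus(uint16_t source)   { return source >> 8; }   /* PCI_BUS_NUM() */
static inline uint8_t rid_devfn(uint16_t source) { return source & 0xff; }
static inline uint8_t devfn_slot(uint8_t devfn)  { return devfn >> 3; }    /* PCI_SLOT() */
static inline uint8_t devfn_func(uint8_t devfn)  { return devfn & 0x07; }  /* PCI_FUNC() */
```

pci_get_domain_bus_and_slot() takes exactly the (bus, devfn) pair produced here, with the domain number supplied separately from the port's bus.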
Re: [PATCH] mm/hugetlb: bring gigantic page allocation under hugepages_supported()
Hello Gerald,

On 22/01/25 19:36, Gerald Schaefer wrote:
> On Tue, 21 Jan 2025 20:34:19 +0530 Sourabh Jain wrote:
>> Despite having kernel arguments to enable gigantic hugepages, this
>> provides a way for the architecture to disable gigantic hugepages on
>> the fly, similar to what we do for hugepages. Components like fadump
>> (PowerPC-specific) need this functionality to disable gigantic
>> hugepages when the kernel is booted solely to collect the kernel core
>> dump.
>>
>> Cc: Thomas Gleixner
>> Cc: Ingo Molnar
>> Cc: Borislav Petkov
>> Cc: Heiko Carstens
>> Cc: Vasily Gorbik
>> Cc: Muchun Song
>> Cc: Madhavan Srinivasan
>> Cc: Michael Ellerman
>> Cc: linux...@kvack.org
>> Cc: linux-ker...@vger.kernel.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Sourabh Jain
>> ---
>> To evaluate the impact of this change on architectures other than
>> PowerPC, I did the following analysis:
>>
>> For architectures where hugepages_supported() is not redefined, it
>> depends on HPAGE_SHIFT, which is found to be a constant. It is mostly
>> initialized to PMD_SHIFT.
>>
>> Architecture: HPAGE_SHIFT initialized with
>>   ARC:       PMD_SHIFT (constant)
>>   ARM:       PMD_SHIFT (constant)
>>   ARM64:     PMD_SHIFT (constant)
>>   Hexagon:   22 (constant)
>>   LoongArch: (PAGE_SHIFT + PAGE_SHIFT - 3) (appears to be constant)
>>   MIPS:      (PAGE_SHIFT + PAGE_SHIFT - 3) (appears to be constant)
>>   PARISC:    PMD_SHIFT (appears to be constant)
>>   RISC-V:    PMD_SHIFT (constant)
>>   SH:        16 | 18 | 20 | 22 | 26 (constant)
>>   SPARC:     23 (constant)
>>
>> So it seems this change shouldn't have any impact on the above
>> architectures.
>>
>> On the s390 and x86 architectures, hugepages_supported() is redefined,
>> and I am uncertain at what point it is safe to call
>> hugepages_supported().
>
> For s390, hugepages_supported() checks the EDAT1 machine flag, which is
> initialized long before any initcalls. So it is safe to be called here.

Thanks for the info.

> My common code hugetlb skills got a little rusty, but shouldn't
> arch_hugetlb_valid_size() already prevent getting here for gigantic
> hugepages, in case they are not supported?
> And could you not use that for your purpose?

Yes, handling this in arch_hugetlb_valid_size() is even better. That
way, we can avoid initializing data structures to hold the hstate,
which is not required anyway.

Thanks for the review and suggestion. I will handle this in the
architecture-specific code.

- Sourabh Jain
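The approach agreed on above can be sketched in isolation: rejecting a gigantic size in arch_hugetlb_valid_size() means the generic hugetlb code never sets up an hstate for it. Everything below is a hypothetical model for illustration (2 MB PMD hugepage, 1 GB gigantic page, and the disable flag are assumptions), not the real powerpc implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMD_HUGE_SIZE  (2ULL << 20)   /* 2 MB, typical PMD-level hugepage */
#define GIGANTIC_SIZE  (1ULL << 30)   /* 1 GB gigantic page */

/* Would be set very early, e.g. when booting a fadump capture kernel. */
static bool gigantic_pages_disabled;

/* Model of the arch hook: a size is valid only if the platform supports
 * it; gigantic pages are refused outright when disabled, so no hstate
 * is ever allocated for them. */
static bool arch_hugetlb_valid_size(uint64_t size)
{
	if (size == PMD_HUGE_SIZE)
		return true;
	if (size == GIGANTIC_SIZE)
		return !gigantic_pages_disabled;
	return false;
}
```

The benefit over patching the generic allocator is that the filtering happens at command-line parsing time, before any per-hstate bookkeeping exists.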
Re: [PATCH 2/2] powerpc/fadump: fix additional param memory reservation for HASH MMU
On 2025-01-20 23:05:00 Mon, Sourabh Jain wrote:
> Commit 683eab94da75bc ("powerpc/fadump: setup additional parameters for
> dump capture kernel") introduced the additional parameter feature in
> fadump for HASH MMU with the understanding that GRUB does not use the
> memory area between 640MB and 768MB for its operation.
>
> However, the patch ("powerpc: increase MIN RMA size for CAS
> negotiation") changes the MIN RMA size to 768MB, allowing GRUB to use
> memory up to 768MB. This makes the fadump reservation for the additional
> parameter feature for HASH MMU unreliable.

I see. Please include the above patch in a single patch series, with
this change as the first patch.

> To address this, adjust the memory range for the additional parameter in
> fadump for HASH MMU. This will ensure that GRUB does not overwrite the
> memory reserved for fadump's additional parameter in HASH MMU.
>
> The new policy for the memory range for the additional parameter in HASH
> MMU is that the first memory block must be larger than the MIN_RMA size,
> as the bootloader can use memory up to the MIN_RMA size. The range
> should be between MIN_RMA and the RMA size (ppc64_rma_size), and it must
> not overlap with the fadump reserved area.
>
> Cc: Avnish Chouhan
> Cc: Brian King
> Cc: Hari Bathini
> Cc: Madhavan Srinivasan
> Cc: Michael Ellerman
> Cc: Mahesh Salgaonkar
> Signed-off-by: Sourabh Jain
> ---
>  arch/powerpc/kernel/fadump.c | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 4b371c738213..5831f3ec8561 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -33,6 +33,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  /*
>   * The CPU who acquired the lock to trigger the fadump crash should
> @@ -1764,19 +1765,19 @@ void __init fadump_setup_param_area(void)
>          range_end = memblock_end_of_DRAM();
>      } else {
>          /*
> -         * Passing additional parameters is supported for hash MMU only
> -         * if the first memory block size is 768MB or higher.
> +         * Memory range for passing additional parameters for HASH MMU
> +         * must meet the following conditions:
> +         * 1. The first memory block size must be higher than the
> +         *    minimum RMA (MIN_RMA) size. Bootloader can use memory
> +         *    up to RMA size. So it should be avoided.
> +         * 2. The range should be between MIN_RMA and RMA size (ppc64_rma_size)
> +         * 3. It must not overlap with the fadump reserved area.
>           */
> -        if (ppc64_rma_size < 0x3000)
> +        if (ppc64_rma_size < MIN_RMA*1024*1024)
>              return;
>
> -        /*
> -         * 640 MB to 768 MB is not used by PFW/bootloader. So, try reserving
> -         * memory for passing additional parameters in this range to avoid
> -         * being stomped on by PFW/bootloader.
> -         */
> -        range_start = 0x2A00;
> -        range_end = range_start + 0x400;
> +        range_start = MIN_RMA * 1024 * 1024;
> +        range_end = min(ppc64_rma_size, fw_dump.boot_mem_top);

Please update the fadump documentation to make this restriction clear.

Rest looks good to me.

Reviewed-by: Mahesh Salgaonkar

Thanks,
-Mahesh
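The range policy in the hunk above can be sketched as a standalone function. This is an illustrative model, not the kernel code: MIN_RMA is assumed to be expressed in MB (768, per the CAS negotiation patch discussed in this thread), and the function names and error convention are mine:

```c
#include <assert.h>
#include <stdint.h>

#define MIN_RMA 768ULL   /* MB; assumed value, from the CAS negotiation patch */

static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Compute the additional-parameter range for HASH MMU.
 * Returns 0 on success, -1 when the RMA is too small to leave room
 * above what the bootloader may use. */
static int fadump_param_range(uint64_t ppc64_rma_size,
			      uint64_t boot_mem_top,
			      uint64_t *range_start, uint64_t *range_end)
{
	/* The bootloader can use memory up to MIN_RMA, so skip it. */
	if (ppc64_rma_size < MIN_RMA * 1024 * 1024)
		return -1;

	*range_start = MIN_RMA * 1024 * 1024;
	/* Stay within both the RMA and the fadump boot memory. */
	*range_end = min_u64(ppc64_rma_size, boot_mem_top);
	return 0;
}
```

With a 1 GB RMA this yields the 768 MB..1 GB window (capped by boot_mem_top), and a 512 MB RMA is rejected, matching the policy described in the commit message.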
Re: [PATCH v2 1/2] PCI/DPC: Run recovery on device that detected the error
On 2025/1/23 12:53, Sathyanarayanan Kuppuswamy wrote:
> On 11/12/24 5:54 AM, Shuai Xue wrote:
>> The current implementation of pcie_do_recovery() assumes that the
>> recovery process is executed on the device that detected the error.
>> However, the DPC driver currently passes the error port that
>> experienced the DPC event to pcie_do_recovery(). Use the SOURCE ID
>> register to correctly identify the device that detected the error. By
>> passing this error device to pcie_do_recovery(), subsequent patches
>> will be able to accurately access AER status of the error device.
>
> When passing the error device, I assume pcie_do_recovery() will find
> the upstream bridge and run the recovery logic.

Yes, pcie_do_recovery() will find the upstream bridge and walk the
bridges potentially affected by AER.

>> Signed-off-by: Shuai Xue
>> ---
>
> IMO, moving the "err_port" rename to a separate patch will make this
> change more clear. But it is up to you.

I see, I will add a separate patch.

>>  drivers/pci/pci.h      |  2 +-
>>  drivers/pci/pcie/dpc.c | 30 --
>>  drivers/pci/pcie/edr.c | 35 ++-
>>  3 files changed, 43 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index 14d00ce45bfa..0866f79aec54 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -521,7 +521,7 @@ struct rcec_ea {
>>  void pci_save_dpc_state(struct pci_dev *dev);
>>  void pci_restore_dpc_state(struct pci_dev *dev);
>>  void pci_dpc_init(struct pci_dev *pdev);
>> -void dpc_process_error(struct pci_dev *pdev);
>> +struct pci_dev *dpc_process_error(struct pci_dev *pdev);
>>  pci_ers_result_t dpc_reset_link(struct pci_dev *pdev);
>>  bool pci_dpc_recovered(struct pci_dev *pdev);
>>  #else
>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>> index 2b6ef7efa3c1..62a68cde4364 100644
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -257,10 +257,17 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
>>      return 1;
>>  }
>>
>> -void dpc_process_error(struct pci_dev *pdev)
>> +/**
>> + * dpc_process_error - handle the DPC error status
>
> Handling the DPC error status has nothing to do with finding the error
> source. Why not add a new helper function?

Per the PCIe spec, DPC Error Source ID: when the DPC Trigger Reason
field indicates that DPC was triggered due to the reception of an
ERR_NONFATAL or ERR_FATAL, this register contains the Requester ID of
the received Message. Otherwise, the value of this register is
undefined.

To find the error source, we need to:
- check the error reason from PCI_EXP_DPC_STATUS,
- identify the error device via PCI_EXP_DPC_SOURCE_ID for the
  ERR_NONFATAL and ERR_FATAL reasons.

That code would duplicate dpc_process_error(), therefore I directly
reused dpc_process_error().

>> + * @pdev: the port that experienced the containment event
>> + *
>> + * Return the device that experienced the error.
>
> detected the error?

Will change it.

>> + */
>> +struct pci_dev *dpc_process_error(struct pci_dev *pdev)
>>  {
>>      u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
>>      struct aer_err_info info;
>> +    struct pci_dev *err_dev = NULL;
>
> I don't think you need NULL initialization here.

Will remove it.

>>      pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
>>      pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
>> @@ -283,6 +290,13 @@ void dpc_process_error(struct pci_dev *pdev)
>>               "software trigger" : "reserved error");
>>
>> +    if (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE ||
>> +        reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE)
>> +        err_dev = pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
>> +                                              PCI_BUS_NUM(source), source & 0xff);
>> +    else
>> +        err_dev = pci_dev_get(pdev);
>> +
>>      /* show RP PIO error detail information */
>>      if (pdev->dpc_rp_extensions &&
>>          reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT &&
>> @@ -295,6 +309,8 @@ void dpc_process_error(struct pci_dev *pdev)
>>          pci_aer_clear_nonfatal_status(pdev);
>>          pci_aer_clear_fatal_status(pdev);
>>      }
>> +
>> +    return err_dev;
>>  }
>>
>>  static void pci_clear_surpdn_errors(struct pci_dev *pdev)
>> @@ -350,21 +366,23 @@ static bool dpc_is_surprise_removal(struct pci_dev *pdev)
>>  static irqreturn_t dpc_handler(int irq, void *context)
>>  {
>> -    struct pci_dev *pdev = context;
>> +    struct pci_dev *err_port = context, *err_dev = NULL;
>
> NULL initialization is not needed.

Will remove it.

Thanks for the valuable comments.

Best Regards,
Shuai
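The reason-based selection Shuai describes above — the Source ID is only meaningful for ERR_NONFATAL/ERR_FATAL triggers, otherwise fall back to the port itself — can be modeled in a few lines. This is a sketch under stated assumptions: the enum values merely mirror the kernel's PCI_EXP_DPC_STATUS_TRIGGER_RSN_* names and are not the real register encodings:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed trigger reasons, named after the kernel constants. */
enum dpc_trigger_reason {
	RSN_UNCOR  = 0,   /* unmasked uncorrectable error */
	RSN_NFE    = 1,   /* ERR_NONFATAL message received */
	RSN_FE     = 2,   /* ERR_FATAL message received */
	RSN_IN_EXT = 3,   /* RP PIO / extensions */
};

/* Pick the Requester ID to run recovery on: the Error Source ID is
 * defined only for received ERR_NONFATAL/ERR_FATAL messages; for every
 * other reason it is undefined, so use the port's own ID. */
static uint16_t dpc_error_requester(enum dpc_trigger_reason reason,
				    uint16_t port_id, uint16_t source_id)
{
	if (reason == RSN_NFE || reason == RSN_FE)
		return source_id;
	return port_id;
}
```

This mirrors why the patch takes the pci_get_domain_bus_and_slot() branch only for the NFE/FE reasons and pins the port itself otherwise.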