Re: [PATCH 2/5] uaccess: Add speculation barrier to copy_from_user_iter()
On Mon, 23 Jun 2025 at 22:49, Christophe Leroy wrote:
> > (Although I also suspect that when we added ITER_UBUF we might have
> > created cases where those user addresses aren't checked at iter
> > creation time any more).
>
> Let's take the following path as an example:
>
>   snd_pcm_ioctl(SNDRV_PCM_IOCTL_WRITEI_FRAMES)
>     snd_pcm_common_ioctl()
>       snd_pcm_xferi_frames_ioctl()
>         snd_pcm_lib_write()
>           __snd_pcm_lib_xfer()
>             default_write_copy()
>               copy_from_iter()
>                 _copy_from_iter()
>                   __copy_from_iter()
>                     iterate_and_advance()
>                       iterate_and_advance2()
>                         iterate_iovec()
>                           copy_from_user_iter()
>
> As far as I can see, none of those functions check the accessibility of
> the iovec. Am I missing something ?

So we still do this checking at creation time (see import_iovec ->
__import_iovec, and import_ubuf).

In the path you give as an example, the check happens at that
"do_transfer()" stage when it does

	err = import_ubuf(type, (__force void __user *)data, bytes, &iter);

but yeah, it's very non-obvious (see __snd_pcm_lib_xfer(), which calls
writer() which is either interleaved_copy() or noninterleaved_copy(),
and then they do that do_transfer() thing which does that import_ubuf()
thing).

So *because* you were supposed to have checked your iov_iters
beforehand, the actual iter code itself at some point just used
__copy_to_user() directly with no checking at all.

And that all was really *much* too subtle, and Al fixed this a few years
ago (see commit 09fc68dc66f7: "iov_iter: saner checks on
copyin/copyout").

	Linus
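For reference, the creation-time check being discussed is roughly the
following pattern - a simplified sketch of import_ubuf() from
lib/iov_iter.c, not the verbatim source (details vary by kernel
version):

	/* The user pointer is range-checked once, when the iov_iter is
	 * created, so the copy routines themselves don't re-check it. */
	int import_ubuf(int rw, void __user *buf, size_t len, struct iov_iter *i)
	{
		if (len > MAX_RW_COUNT)
			len = MAX_RW_COUNT;
		if (!access_ok(buf, len))	/* the creation-time check */
			return -EFAULT;

		iov_iter_ubuf(i, rw, buf, len);
		return 0;
	}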
Re: [PATCH 0/5] powerpc: Implement masked user access
Hi!

On Tue, Jun 24, 2025 at 05:50:01PM +0100, David Laight wrote:
> On Tue, 24 Jun 2025 08:17:14 -0500
> Segher Boessenkool wrote:
> > On Tue, Jun 24, 2025 at 07:27:47AM +0200, Christophe Leroy wrote:
> > > Ah ok, I overlooked that, I didn't know the cmove instruction, seems
> > > similar to the isel instruction on powerpc e500.
> >
> > cmove does a move (register or memory) when some condition is true.
>
> The destination of x86 'cmov' is always a register (only the source can be
> memory - and is probably always read).

Both source operands can be mem, right? But probably not both at the
same time.

> It is also a computational instruction.

Terminology...

x86 is not a RISC architecture, or more generally, a load/store
architecture. A computational instruction is one that doesn't touch
memory or do a branch, or some system function, some supervisor or
hypervisor instruction maybe.

x86 does not have many computational insns, most insns can touch
memory :-)

(The important thing is that most computational insns do not ever cause
exceptions, the only exceptions are if you divide by zero or
similar :-) )

> It may well always do the register write - hard to detect.
>
> There is a planned new instruction that would do a conditional write
> to memory - but not on any cpu yet.

Interesting! Instructions like the atomic store insns we got for p9,
maybe? They can do minimum/maximum and various kinds of more generic
reductions and similar.

> > isel (which is base PowerPC, not something "e500" only) is a
> > computational instruction, it copies one of two registers to a third,
> > which of the two is decided by any bit in the condition register.
>
> Does that mean it could be used for all the ppc cpu variants?

No, only things that implement architecture version 2.03 or later.
That is from 2006, so essentially everything that is still made
implements it :-) But ancient things do not. Both 970 (Apple G5) and
Cell BE do not yet have it (they are ISA 2.01 and 2.02 respectively).
And the older p5's do not have it yet either, but the newer ones do.

And all classic PowerPC is ISA 1.xx of course. Medieval CPUs :-)

> > But sure, seen from very far off both isel and cmove can be used to
> > implement the ternary operator ("?:"), are similar in that way :-)
>
> Which is exactly what you want in order to avoid speculation.

There are cheaper / simpler / more effective / better ways to get that,
but sure, everything is better than a conditional branch, always :-)


Segher
Re: [PATCH 0/5] powerpc: Implement masked user access
On Tue, 24 Jun 2025 13:25:05 -0500
Segher Boessenkool wrote:

> Hi!
>
> On Tue, Jun 24, 2025 at 05:50:01PM +0100, David Laight wrote:
> > On Tue, 24 Jun 2025 08:17:14 -0500
> > Segher Boessenkool wrote:
> > > On Tue, Jun 24, 2025 at 07:27:47AM +0200, Christophe Leroy wrote:
> > > > Ah ok, I overlooked that, I didn't know the cmove instruction, seems
> > > > similar to the isel instruction on powerpc e500.
> > >
> > > cmove does a move (register or memory) when some condition is true.
> >
> > The destination of x86 'cmov' is always a register (only the source can be
> > memory - and is probably always read).
>
> Both source operands can be mem, right? But probably not both at the
> same time.

It only has one 'real' source, but the implementation could easily read
the destination register and then decide which value to write back -
rather than doing a conditional write to the register file.
A conditional write would be a right PITA for the alu result forwarding
logic.

> > It is also a computational instruction.
>
> Terminology...
>
> x86 is not a RISC architecture, or more generally, a load/store
> architecture.

It sort of is these days. The memory transfers are separate u-ops, so a
'reg += mem' instruction is split into two by the decoder.
Although some u-ops get merged together and executed in one clock; an
obvious example is some 'compare+branch' pairs.

> A computational instruction is one that doesn't touch memory or do a
> branch, or some system function, some supervisor or hypervisor
> instruction maybe.
>
> x86 does not have many computational insns, most insns can touch
> memory :-)

Except that the memory 'bit' is executed separately from any alu 'stuff'.
So for a 'reg += mem' instruction the memory read can be started as soon
as the registers that contain the address are valid; the 'add' requires
that the memory read have completed and that the instruction that
generated the old value of 'reg' have completed - which could be waiting
on all sorts of things (like a divide).
Once both values are ready the 'add' can be executed (provided a
suitable alu is available).

> (The important thing is that most computational insns do not ever cause
> exceptions, the only exceptions are if you divide by zero or
> similar :-) )
>
> > It may well always do the register write - hard to detect.
> >
> > There is a planned new instruction that would do a conditional write
> > to memory - but not on any cpu yet.
>
> Interesting! Instructions like the atomic store insns we got for p9,
> maybe? They can do minimum/maximum and various kinds of more generic
> reductions and similar.

I think they are only conditional stores. But they do save a conditional
branch.
A late disable of a memory write is far less problematic than a disabled
register file write. No one minds (too much) about slight delays between
writes and reads of the same location (reduced by a store-to-load
forwarder), but you don't want to lose clocks between adjacent simple
alu instructions.

For my sins I re-implemented a soft cpu last year...
Which doesn't have a 'cmov' :-(

> > > isel (which is base PowerPC, not something "e500" only) is a
> > > computational instruction, it copies one of two registers to a third,
> > > which of the two is decided by any bit in the condition register.
> >
> > Does that mean it could be used for all the ppc cpu variants?
>
> No, only things that implement architecture version 2.03 or later.
> That is from 2006, so essentially everything that is still made
> implements it :-)
>
> But ancient things do not. Both 970 (Apple G5) and Cell BE do not yet
> have it (they are ISA 2.01 and 2.02 respectively). And the older p5's
> do not have it yet either, but the newer ones do.
>
> And all classic PowerPC is ISA 1.xx of course. Medieval CPUs :-)

That makes more sense than the list in patch 5/5.

> > > But sure, seen from very far off both isel and cmove can be used to
> > > implement the ternary operator ("?:"), are similar in that way :-)
> >
> > Which is exactly what you want in order to avoid speculation.
>
> There are cheaper / simpler / more effective / better ways to get that,
> but sure, everything is better than a conditional branch, always :-)

Everything except a TLB miss :-)

And for access_ok() avoiding the conditional is a good enough reason to
use a 'conditional move' instruction. Avoiding speculation is actually
free.

> Segher
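To illustrate the access_ok() point, here is a hypothetical sketch of
the address-masking idea behind masked user access (not the actual
powerpc or x86 implementation): instead of branching on the range
check, an out-of-range pointer is forced to a guaranteed-faulting
value, which the compiler can lower to cmov/isel.

	/* Hypothetical sketch; TASK_SIZE_MAX is assumed to be the upper
	 * bound of the user address range on the architecture. */
	static inline void __user *mask_user_ptr(const void __user *ptr)
	{
		unsigned long addr = (unsigned long)ptr;

		/* Select between 'addr' and an all-ones (always-faulting)
		 * address without a conditional branch: this is a natural
		 * cmov (x86) or isel (PowerPC) candidate, so there is no
		 * branch for the CPU to speculate past. */
		return (void __user *)(addr > TASK_SIZE_MAX ? ~0UL : addr);
	}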
Re: [PATCH 0/5] powerpc: Implement masked user access
Hi!

On Tue, Jun 24, 2025 at 09:32:58AM +0100, David Laight wrote:
> > So GCC uses the 'unlikely' variant of the branch instruction to force
> > the correct prediction, doesn't it ?
>
> Nope...
> Most architectures don't have likely/unlikely variants of branches.

In GCC, "likely" means 80%. "Very likely" means 99.95%. Most branches
get something more appropriate predicted than such coarse values.

Most of the time GCC uses these predicted branch probabilities to lay
out code in such a way that the fall-through path is the expected one.
Target backends can do special things with it as well, but usually that
isn't necessary.

There are many different predictors. GCC usually can predict things
reasonably well by just looking at the shape of the code, using various
heuristics. Things like profile-guided optimisation allow using a
profile from an actual execution to optimise the code so that it runs
faster (assuming that future executions of the code behave similarly!)

You also can use __builtin_expect() in the source code, to put coarse
static prediction in. That is what the kernel "{un,}likely" macros do.

If the compiler knows some branch is not very predictable, it can
optimise the code knowing that. Like, it could use other strategies
than conditional branches.

On old CPUs something like "this branch is taken 50% of the time" makes
it a totally unpredictable branch. But if, say, it branches exactly
every second time, it is 100% predicted correctly by more advanced
predictors, not just 50%. To properly model modern branch predictors we
need to record a "how predictable is this branch" score as well for
every branch, not just a "how often does it branch instead of falling
through" score. We're not there yet.


Segher
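For reference, the kernel macros mentioned are thin wrappers around
__builtin_expect(); roughly (see include/linux/compiler.h):

	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

	/* typical use: mark the failure path cold so the compiler lays
	 * out the fall-through path as the expected one */
	if (unlikely(copy_from_user(kbuf, ubuf, len)))
		return -EFAULT;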
[powerpc:merge] BUILD SUCCESS c8e9cecda99751688799297fd00694d73bfd3197
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: c8e9cecda99751688799297fd00694d73bfd3197  Automatic merge of 'next' into merge (2025-06-23 18:21)

elapsed time: 1453m

configs tested: 62
configs skipped: 1

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha        allnoconfig                            gcc-15.1.0
alpha        allyesconfig                           gcc-15.1.0
arc          allnoconfig                            gcc-15.1.0
arc          randconfig-001-20250624                gcc-12.4.0
arc          randconfig-002-20250624                gcc-8.5.0
arm          allnoconfig                            clang-21
arm          randconfig-001-20250624                gcc-13.3.0
arm          randconfig-002-20250624                gcc-8.5.0
arm          randconfig-003-20250624                gcc-12.4.0
arm          randconfig-004-20250624                clang-17
arm64        allnoconfig                            gcc-15.1.0
arm64        randconfig-001-20250624                clang-21
arm64        randconfig-002-20250624                gcc-10.5.0
arm64        randconfig-003-20250624                clang-21
arm64        randconfig-004-20250624                clang-21
csky         allnoconfig                            gcc-15.1.0
hexagon      allmodconfig                           clang-17
hexagon      allnoconfig                            clang-21
hexagon      allyesconfig                           clang-21
i386         allmodconfig                           gcc-12
i386         allnoconfig                            gcc-12
i386         allyesconfig                           gcc-12
i386         buildonly-randconfig-001-20250624      clang-20
i386         buildonly-randconfig-002-20250624      gcc-12
i386         buildonly-randconfig-003-20250624      clang-20
i386         buildonly-randconfig-004-20250624      clang-20
i386         buildonly-randconfig-005-20250624      clang-20
i386         buildonly-randconfig-006-20250624      gcc-12
i386         defconfig                              clang-20
loongarch    allnoconfig                            gcc-15.1.0
m68k         allmodconfig                           gcc-15.1.0
m68k         allnoconfig                            gcc-15.1.0
m68k         allyesconfig                           gcc-15.1.0
microblaze   allnoconfig                            gcc-15.1.0
mips         allnoconfig                            gcc-15.1.0
nios2        allnoconfig                            gcc-14.2.0
openrisc     allnoconfig                            gcc-15.1.0
parisc       allnoconfig                            gcc-15.1.0
powerpc      allnoconfig                            gcc-15.1.0
riscv        allnoconfig                            gcc-15.1.0
s390         allmodconfig                           clang-18
s390         allnoconfig                            clang-21
s390         allyesconfig                           gcc-15.1.0
sh           allmodconfig                           gcc-15.1.0
sh           allnoconfig                            gcc-15.1.0
sh           allyesconfig                           gcc-15.1.0
sparc        allmodconfig                           gcc-15.1.0
sparc        allnoconfig                            gcc-15.1.0
um           allmodconfig                           clang-19
um           allnoconfig                            clang-21
um           allyesconfig                           gcc-12
x86_64       allnoconfig                            clang-20
x86_64       allyesconfig                           clang-20
x86_64       buildonly-randconfig-001-20250624      clang-20
x86_64       buildonly-randconfig-002-20250624      gcc-12
x86_64       buildonly-randconfig-003-20250624      clang-20
x86_64       buildonly-randconfig-004-20250624      clang-20
x86_64       buildonly-randconfig-005-20250624      clang-20
x86_64       buildonly-randconfig-006-20250624      gcc-12
x86_64       defconfig                              gcc-11
x86_64       rhel-9.4-rust                          clang-18
xtensa       allnoconfig                            gcc-15.1.0

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH 3/3] mm: update architecture and driver code to use vm_flags_t
On Wed, Jun 18, 2025 at 08:42:54PM +0100, Lorenzo Stoakes wrote:
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -279,7 +279,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
>
>  static struct sgx_encl_page *sgx_encl_load_page_in_vma(struct sgx_encl *encl,
>  							unsigned long addr,
> -							unsigned long vm_flags)
> +							vm_flags_t vm_flags)
>  {
>  	unsigned long vm_prot_bits = vm_flags & VM_ACCESS_FLAGS;
>  	struct sgx_encl_page *entry;
> @@ -520,9 +520,9 @@ static void sgx_vma_open(struct vm_area_struct *vma)
>   * Return: 0 on success, -EACCES otherwise
>   */
>  int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
> -		     unsigned long end, unsigned long vm_flags)
> +		     unsigned long end, vm_flags_t vm_flags)
>  {
> -	unsigned long vm_prot_bits = vm_flags & VM_ACCESS_FLAGS;
> +	vm_flags_t vm_prot_bits = vm_flags & VM_ACCESS_FLAGS;
>  	struct sgx_encl_page *page;
>  	unsigned long count = 0;
>  	int ret = 0;
> @@ -605,7 +605,7 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
>   */
>  static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
>  						   unsigned long addr,
> -						   unsigned long vm_flags)
> +						   vm_flags_t vm_flags)
>  {
>  	struct sgx_encl_page *entry;
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index f94ff14c9486..8ff47f6652b9 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -101,7 +101,7 @@ static inline int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
>  }
>
>  int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
> -		     unsigned long end, unsigned long vm_flags);
> +		     unsigned long end, vm_flags_t vm_flags);
>
>  bool current_is_ksgxd(void);
>  void sgx_encl_release(struct kref *ref);

Reviewed-by: Jarkko Sakkinen

BR, Jarkko
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
On Wed, Jun 18, 2025 at 07:37:54PM -0500, Timothy Pearson wrote:
> ----- Original Message -----
> > From: "Bjorn Helgaas"
> > To: "Timothy Pearson"
> > Cc: "linuxppc-dev", "linux-kernel", "linux-pci", "Madhavan Srinivasan",
> > "Michael Ellerman", "christophe leroy", "Naveen N Rao",
> > "Bjorn Helgaas", "Shawn Anastasio"
> > Sent: Wednesday, June 18, 2025 2:01:46 PM
> > Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
>
> > On Wed, Jun 18, 2025 at 11:58:59AM -0500, Timothy Pearson wrote:
> >> state
> >
> > Weird wrapping of last word of subject to here.
>
> I'll need to see what's up with my git format-patch setup. Apologies
> for that across the multiple series.

No worries. If you can figure out how to make your mailer use the
normal "On xxx, somebody wrote:" attribution instead of duplicating all
those headers, that would be far more useful :)

> >> +static int pnv_php_get_raw_indicator_status(struct hotplug_slot *slot, u8 *state)
> >> +{
> >> +	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
> >> +	struct pci_dev *bridge = php_slot->pdev;
> >> +	u16 status;
> >> +
> >> +	pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &status);
> >> +	*state = (status & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;
> >
> > Should be able to do this with FIELD_GET().
>
> I used the same overall structure as the pciehp_hpc driver here. Do
> you want me to also fix up that driver with FIELD_GET()?

Nope, I think it's fine to keep this looking like pciehp for now. If
somebody wants to use FIELD_GET() in pciehp, I'd probably be OK with
that, but no need for you to open that can of worms.

> > Is the PCI_EXP_SLTCTL_PIC part needed? It wasn't there before, the
> > commit log doesn't mention it, and as far as I can tell, this would
> > be the only driver to do that. Most expose only the attention status
> > (0=off, 1=on, 2=identify/blink).
> >
> >> +	return 0;
> >> +}
> >> +
> >> +
> >>  static int pnv_php_get_attention_state(struct hotplug_slot *slot, u8 *state)
> >>  {
> >>  	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
> >>
> >> +	pnv_php_get_raw_indicator_status(slot, &php_slot->attention_state);
> >
> > This is a change worth noting. Previously we didn't read the AIC
> > state from PCI_EXP_SLTCTL at all; we used php_slot->attention_state
> > to keep track of whatever had been previously set via
> > pnv_php_set_attention_state().
> >
> > Now we read the current state from PCI_EXP_SLTCTL. It's not clear
> > that php_slot->attention_state is still needed at all.
>
> It probably isn't. It's unclear why IBM took this path at all,
> given pciehp's attention handlers predate pnv-php's by many years.
>
> > Previously, the user could write any value at all to the sysfs
> > "attention" file and then read that same value back. After this
> > patch, the user can still write anything, but reads will only return
> > values with PCI_EXP_SLTCTL_AIC and PCI_EXP_SLTCTL_PIC.
> >
> >>  	*state = php_slot->attention_state;
> >>  	return 0;
> >>  }
> >> @@ -461,7 +474,7 @@ static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
> >>  	mask = PCI_EXP_SLTCTL_AIC;
> >>
> >>  	if (state)
> >> -		new = PCI_EXP_SLTCTL_ATTN_IND_ON;
> >> +		new = FIELD_PREP(PCI_EXP_SLTCTL_AIC, state);
> >
> > This changes the behavior in some cases:
> >
> >   write 0: previously turned indicator off, now writes reserved value
> >   write 2: previously turned indicator on, now sets to blink
> >   write 3: previously turned indicator on, now turns it off
>
> If we're looking at normalizing with pciehp with an eye toward
> eventually deprecating / removing pnv-php, I can't think of a better
> time to change this behavior. I suspect we're the only major user
> of this code path at the moment, with most software expecting to see
> pciehp-style handling. Thoughts?

I'm OK with changing this, but I do think it would be worth calling out
the different behavior in the commit log.

Bjorn
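For reference, the FIELD_GET()/FIELD_PREP() version suggested above
would look something like this sketch (using linux/bitfield.h;
PCI_EXP_SLTCTL_AIC covers bits 7:6 of Slot Control):

	#include <linux/bitfield.h>

	u16 sltctl;
	u8 aic;

	pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &sltctl);
	/* extract the Attention Indicator Control field without the
	 * open-coded ">> 6" shift */
	aic = FIELD_GET(PCI_EXP_SLTCTL_AIC, sltctl);

	/* and the inverse when writing, as in the hunk quoted above */
	sltctl = FIELD_PREP(PCI_EXP_SLTCTL_AIC, state);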
Re: [PATCH 0/5] powerpc: Implement masked user access
On Tue, Jun 24, 2025 at 07:27:47AM +0200, Christophe Leroy wrote:
> Ah ok, I overlooked that, I didn't know the cmove instruction, seems
> similar to the isel instruction on powerpc e500.

cmove does a move (register or memory) when some condition is true.

isel (which is base PowerPC, not something "e500" only) is a
computational instruction, it copies one of two registers to a third,
which of the two is decided by any bit in the condition register.

But sure, seen from very far off both isel and cmove can be used to
implement the ternary operator ("?:"), are similar in that way :-)


Segher
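A minimal illustration of that last point: a compiler may lower the
following without any branch, using cmov on x86 or isel on PowerPC
(whether it actually does depends on the target and its cost model):

	/* branch-free select: a natural cmov/isel candidate */
	static unsigned long sel(unsigned long a, unsigned long b, int cond)
	{
		return cond ? a : b;
	}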
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
----- Original Message -----
> From: "Krishna Kumar"
> To: "Timothy Pearson"
> Cc: "linuxppc-dev", "Shawn Anastasio", "linux-kernel", "linux-pci",
> "Madhavan Srinivasan", "Michael Ellerman", "christophe leroy",
> "Naveen N Rao", "Bjorn Helgaas"
> Sent: Tuesday, June 24, 2025 2:07:30 AM
> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

> On 6/21/25 8:35 PM, Timothy Pearson wrote:
>>
>> ----- Original Message -----
>>> From: "Krishna Kumar"
>>> To: "linuxppc-dev", "Timothy Pearson", "Shawn Anastasio"
>>> Cc: "linuxppc-dev", "linux-kernel", "linux-pci", "Madhavan Srinivasan",
>>> "Michael Ellerman", "christophe leroy", "Naveen N Rao",
>>> "Bjorn Helgaas", "Shawn Anastasio"
>>> Sent: Friday, June 20, 2025 4:26:51 AM
>>> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

>>> Shawn, Timothy:
>>>
>>> Thanks for posting the series of patches. A few things I wanted to
>>> clarify to better understand the Raptor problem -
>>>
>>> 1. Last time my two patches solved all the hot-unplug operation and
>>> kernel crash issues except the NVMe case. It did not work with NVMe
>>> since DPC support was not there. I was not able to do that due to
>>> being occupied with some other work.
>>
>> With the current series all hotplug is working correctly, including
>> not only NVMe on root ports and bridge ports, but also surprise plug
>> of the entire PCIe switch at the root port. The lack of DPC support
>> *might* be related to the PE freeze, but in any case we prefer the
>> hotplug driver to be able to recover from a PE freeze (e.g. if a
>> bridge card is faulty and needs to be replaced) without also requiring
>> a reboot, so I would consider DPC implementation orthogonal to this
>> patch set.
>
> Sounds Good !!
>
>>> 2. I want to understand the delta from last year's problem to this
>>> problem. Does the PHB freeze or hot-unplug failure happen only for a
>>> particular Microsemi switch, or does it happen with all the switches?
>>> When did this problem start appearing? Until last year
>>
>> Hotplug has never worked reliably for us; if it worked at all it was
>> always rolling the dice on whether the kernel would oops and take down
>> the host. Even if the kernel didn't oops, surprise plug and auto-add /
>> auto-remove never worked beyond one remove operation.
>
> I would like to see this problem, maybe during our zoom/teams meeting.
> Though I have not tested surprise plug/unplug and only tested via
> sysfs, you may be correct, but I want to have a look at this problem.
>
>>> it was not there. Is it specific to particular hardware? Can I get
>>> your setup to test this problem and your patch?
>>
>> Because you will need to be able to physically plug and unplug cards
>> and drives this may be a bit tricky. Do you have access to a POWER9
>> host system with a x16 PCIe slot? If so, all you need is a Supermicro
>> SLC-AO3G-8E2P card and some random U.2 NVMe drives -- these cards are
>> readily available and provide relatively standardized OCuLink access
>> to a Switchtec bridge.
>>
>> If you don't have access to a POWER9 host, we can set you up with
>> remote access, but it won't show all of the crashing and problems that
>> occur with surprise plug unless we set up a live debug session (video
>> call or similar).
>
> Video Call should be fine. During the call I will have a look at the
> existing problem and the fix, along with driver/kernel logs.

Sounds good. We'll set up a machine in the DMZ for this session so you
can also have access. For anyone interested in logging on to the box
for logs, can you send over an SSH public key to my Email address
directly? Will get everyone added with root access to the test box
prior to the call start.

>>> 3. To me, hot unplug operation --> AER triggering --> DPC support,
>>> this flow should mask the error from reaching the root port/cpu, and
>>> it should solve the PHB freeze / hot-unplug failure. If there are
>>> AER/EEH related synchronization issues we need to solve them. Can you
>>> pls list the issues, I will pass them to the EEH/AER team. But yeah,
>>> to me if the AER implementation is correct and we add DPC support,
>>> all the errors will be contained by the switch itself. The PHB/root
>>> port/cpu will not be impacted by this and there should not be any
>>> freeze.
>>
>> While this is a good goal to work toward, it only solves one possible
>> fault mode. The patch series posted here will handle the general case
>> of a PE freeze without requiring a host reboot, which is great for
>> high-reliability systems where there might be a desire to replace the
>> entire switch card (this has been tested with the patch series and
>> works perfectly).
>
> You may be correct on this and this is possible. If the driver and
> AER/EEH errors/events are properly handled then
Re: [PATCH 0/5] powerpc: Implement masked user access
On Tue, 24 Jun 2025 08:17:14 -0500
Segher Boessenkool wrote:

> On Tue, Jun 24, 2025 at 07:27:47AM +0200, Christophe Leroy wrote:
> > Ah ok, I overlooked that, I didn't know the cmove instruction, seems
> > similar to the isel instruction on powerpc e500.
>
> cmove does a move (register or memory) when some condition is true.

The destination of x86 'cmov' is always a register (only the source can
be memory - and is probably always read).
It is also a computational instruction.
It may well always do the register write - hard to detect.

There is a planned new instruction that would do a conditional write
to memory - but not on any cpu yet.

> isel (which is base PowerPC, not something "e500" only) is a
> computational instruction, it copies one of two registers to a third,
> which of the two is decided by any bit in the condition register.

Does that mean it could be used for all the ppc cpu variants?

> But sure, seen from very far off both isel and cmove can be used to
> implement the ternary operator ("?:"), are similar in that way :-)

Which is exactly what you want in order to avoid speculation.

	David

> Segher
[PATCH] usb: gadget: fsl: Use usb_endpoint_type() rather than duplicating its implementation
From: Markus Elfring
Date: Tue, 24 Jun 2025 16:33:53 +0200

Reuse existing functionality from usb_endpoint_type() instead of
keeping duplicate source code.

The source code was transformed by using the Coccinelle software.

Signed-off-by: Markus Elfring
---
 drivers/usb/gadget/udc/fsl_qe_udc.c   | 4 ++--
 drivers/usb/gadget/udc/fsl_udc_core.c | 6 ++----
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/usb/gadget/udc/fsl_qe_udc.c b/drivers/usb/gadget/udc/fsl_qe_udc.c
index aacfde06387c..6ee3da32cc4e 100644
--- a/drivers/usb/gadget/udc/fsl_qe_udc.c
+++ b/drivers/usb/gadget/udc/fsl_qe_udc.c
@@ -533,7 +533,7 @@ static int qe_ep_init(struct qe_udc *udc,

 	/* Refer to USB2.0 spec table 9-13,
 	*/
 	if (pipe_num != 0) {
-		switch (desc->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) {
+		switch (usb_endpoint_type(desc)) {
 		case USB_ENDPOINT_XFER_BULK:
 			if (strstr(ep->ep.name, "-iso")
 					|| strstr(ep->ep.name, "-int"))
@@ -636,7 +636,7 @@ static int qe_ep_init(struct qe_udc *udc,

 	/* initialize ep structure */
 	ep->ep.maxpacket = max;
-	ep->tm = (u8)(desc->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK);
+	ep->tm = (u8) usb_endpoint_type(desc);
 	ep->ep.desc = desc;
 	ep->stopped = 0;
 	ep->init = 1;
diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c
index 4dea8bc30cf6..19c74ba82e16 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -599,16 +599,14 @@ static int fsl_ep_enable(struct usb_ep *_ep,
 	struct_ep_qh_setup(udc, (unsigned char) ep_index(ep),
 			(unsigned char) ((desc->bEndpointAddress & USB_DIR_IN)
 					?  USB_SEND : USB_RECV),
-			(unsigned char) (desc->bmAttributes
-					& USB_ENDPOINT_XFERTYPE_MASK),
+			(unsigned char) usb_endpoint_type(desc),
 			max, zlt, mult);

 	/* Init endpoint ctrl register */
 	dr_ep_setup((unsigned char) ep_index(ep),
 			(unsigned char) ((desc->bEndpointAddress & USB_DIR_IN)
 					? USB_SEND : USB_RECV),
-			(unsigned char) (desc->bmAttributes
-					& USB_ENDPOINT_XFERTYPE_MASK));
+			(unsigned char) usb_endpoint_type(desc));

 	spin_unlock_irqrestore(&udc->lock, flags);
 	retval = 0;
--
2.50.0
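For context, usb_endpoint_type() in include/linux/usb/ch9.h is, as far
as I can tell, exactly the masked expression being replaced (it may
differ slightly between kernel versions):

	static inline int usb_endpoint_type(const struct usb_endpoint_descriptor *epd)
	{
		return epd->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK;
	}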
[PATCH v15 01/13] arm64/mm: Add addr parameter to __set_{ptes_anysz,ptes,pmds,puds}()
To provide support for page table check on powerpc, we need to
reinstate the address parameter in several functions, including
page_table_check_{ptes,pmds,puds}_set().

In preparation for this, add the addr parameter to arm64's
__set_ptes_anysz() and its callers, __set_ptes(), __set_pmds() and
__set_puds().

While this parameter won't (at present) be used on arm64, this will
keep the usage of the page table check interfaces consistent.

Signed-off-by: Andrew Donnellan

---
v15: new patch
---
 arch/arm64/include/asm/pgtable.h | 19 ++++++++-----------
 arch/arm64/mm/hugetlbpage.c      |  8 ++++----
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 192d86e1cc76..acbcb5e883ce 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -712,8 +712,8 @@ static inline pgprot_t pud_pgprot(pud_t pud)
 	return __pgprot(pud_val(pfn_pud(pfn, __pgprot(0))) ^ pud_val(pud));
 }

-static inline void __set_ptes_anysz(struct mm_struct *mm, pte_t *ptep,
-				    pte_t pte, unsigned int nr,
+static inline void __set_ptes_anysz(struct mm_struct *mm, unsigned long addr,
+				    pte_t *ptep, pte_t pte, unsigned int nr,
 				    unsigned long pgsize)
 {
 	unsigned long stride = pgsize >> PAGE_SHIFT;
@@ -748,26 +748,23 @@ static inline void __set_ptes_anysz(struct mm_struct *mm, pte_t *ptep,
 	__set_pte_complete(pte);
 }

-static inline void __set_ptes(struct mm_struct *mm,
-			      unsigned long __always_unused addr,
+static inline void __set_ptes(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	__set_ptes_anysz(mm, ptep, pte, nr, PAGE_SIZE);
+	__set_ptes_anysz(mm, addr, ptep, pte, nr, PAGE_SIZE);
 }

-static inline void __set_pmds(struct mm_struct *mm,
-			      unsigned long __always_unused addr,
+static inline void __set_pmds(struct mm_struct *mm, unsigned long addr,
 			      pmd_t *pmdp, pmd_t pmd, unsigned int nr)
 {
-	__set_ptes_anysz(mm, (pte_t *)pmdp, pmd_pte(pmd), nr, PMD_SIZE);
+	__set_ptes_anysz(mm, addr, (pte_t *)pmdp, pmd_pte(pmd), nr, PMD_SIZE);
 }
 #define set_pmd_at(mm, addr, pmdp, pmd) __set_pmds(mm, addr, pmdp, pmd, 1)

-static inline void __set_puds(struct mm_struct *mm,
-			      unsigned long __always_unused addr,
+static inline void __set_puds(struct mm_struct *mm, unsigned long addr,
 			      pud_t *pudp, pud_t pud, unsigned int nr)
 {
-	__set_ptes_anysz(mm, (pte_t *)pudp, pud_pte(pud), nr, PUD_SIZE);
+	__set_ptes_anysz(mm, addr, (pte_t *)pudp, pud_pte(pud), nr, PUD_SIZE);
 }
 #define set_pud_at(mm, addr, pudp, pud) __set_puds(mm, addr, pudp, pud, 1)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 0c8737f4f2ce..1003b5020752 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -226,7 +226,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,

 	if (!pte_present(pte)) {
 		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
-			__set_ptes_anysz(mm, ptep, pte, 1, pgsize);
+			__set_ptes_anysz(mm, addr, ptep, pte, 1, pgsize);
 		return;
 	}

@@ -234,7 +234,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	if (pte_cont(pte) && pte_valid(__ptep_get(ptep)))
 		clear_flush(mm, addr, ptep, pgsize, ncontig);

-	__set_ptes_anysz(mm, ptep, pte, ncontig, pgsize);
+	__set_ptes_anysz(mm, addr, ptep, pte, ncontig, pgsize);
 }

 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
@@ -449,7 +449,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	if (pte_young(orig_pte))
 		pte = pte_mkyoung(pte);

-	__set_ptes_anysz(mm, ptep, pte, ncontig, pgsize);
+	__set_ptes_anysz(mm, addr, ptep, pte, ncontig, pgsize);
 	return 1;
 }

@@ -473,7 +473,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	pte = get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
 	pte = pte_wrprotect(pte);

-	__set_ptes_anysz(mm, ptep, pte, ncontig, pgsize);
+	__set_ptes_anysz(mm, addr, ptep, pte, ncontig, pgsize);
 }

 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
--
2.49.0
[PATCH v15 11/13] powerpc: mm: Implement *_user_accessible_page() for ptes
From: Rohan McLure

Page table checking depends on architectures providing an
implementation of p{te,md,ud}_user_accessible_page. With
refactorisations made on powerpc/mm, the pte_access_permitted() and
similar methods verify whether a userland page is accessible with the
required permissions.

Since page table checking is the only user of
p{te,md,ud}_user_accessible_page(), implement these for all platforms,
using some of the same preliminary checks taken by
pte_access_permitted() on that platform.

Since commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by
pte_read()") pte_user() is no longer required to be present on all
platforms as it may be equivalent to or implied by pte_read(). Hence
implementations of pte_user_accessible_page() are specialised.

[a...@linux.ibm.com: rebase and fix commit message]
Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
Signed-off-by: Andrew Donnellan
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +++++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +++++++++++++++++
 arch/powerpc/include/asm/nohash/pgtable.h    |  5 +++++
 arch/powerpc/include/asm/pgtable.h           |  8 ++++++++
 4 files changed, 35 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 92d21c6faf1e..b225967f85ea 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -437,6 +437,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
 	return true;
 }

+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+	return pte_present(pte) && !is_kernel_addr(addr);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index a2ddcbb3fcb9..5de04302c6f4 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -540,6 +540,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
 	return arch_pte_access_permitted(pte_val(pte), write, 0);
 }

+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+	return pte_present(pte) && pte_user(pte);
+}
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -1430,5 +1435,17 @@ static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_val)
 	return false;
 }

+#define pmd_user_accessible_page pmd_user_accessible_page
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
+{
+	return pmd_leaf(pmd) && pte_user_accessible_page(pmd_pte(pmd), addr);
+}
+
+#define pud_user_accessible_page pud_user_accessible_page
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
+{
+	return pud_leaf(pud) && pte_user_accessible_page(pud_pte(pud), addr);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 7d6b9e5b286e..a8bc4f24beb1 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -243,6 +243,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
 	return true;
 }

+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+	return pte_present(pte) && !is_kernel_addr(addr);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 0f73a9ade0ed..d0938e9c33fb 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -223,6 +223,14 @@ static inline int pud_pfn(pud_t pud)
 }
 #endif

+#ifndef pmd_user_accessible_page
+#define pmd_user_accessible_page(pmd, addr)	false
+#endif
+
+#ifndef pud_user_accessible_page
+#define pud_user_accessible_page(pud, addr)	false
+#endif
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_PGTABLE_H */
--
2.49.0
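To illustrate how page table check consumes these predicates, this is
roughly the pattern on the consumer side (a sketch based on the
mm/page_table_check.c hunks later in this series; only user-accessible
entries are tracked, so kernel mappings are skipped entirely):

	if (pte_user_accessible_page(pte, addr))
		page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);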
[PATCH v15 12/13] powerpc: mm: Use set_pte_at_unchecked() for internal usages
From: Rohan McLure

In the new set_ptes() API, set_pte_at() (a special case of set_ptes())
is intended to be instrumented by the page table check facility. There
are however several other routines that constitute the API for setting
page table entries, including set_pmd_at() among others. Such routines
are themselves implemented in terms of set_ptes_at().

A future patch providing support for page table checking on powerpc
must take care to avoid duplicate calls to
page_table_check_p{te,md,ud}_set(). Allow for assignment of pte entries
without instrumentation through the set_pte_at_unchecked() routine
introduced in this patch.

Cause API-facing routines that call set_pte_at() to instead call
set_pte_at_unchecked(), which will remain uninstrumented by page table
check. set_ptes() is itself implemented by calls to __set_pte_at(), so
this eliminates redundant code.

[a...@linux.ibm.com: don't change to unchecked for early boot/kernel mappings]
Signed-off-by: Rohan McLure
Signed-off-by: Andrew Donnellan

---
v13: don't use the unchecked version for early-boot kernel mappings (Pasha)
---
 arch/powerpc/include/asm/pgtable.h       | 2 ++
 arch/powerpc/mm/book3s64/pgtable.c       | 6 +++---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 6 +++---
 arch/powerpc/mm/pgtable.c                | 8 ++++++++
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index d0938e9c33fb..4fe970992147 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -46,6 +46,8 @@ struct mm_struct;
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 	      pte_t pte, unsigned int nr);
 #define set_ptes set_ptes
+void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
+			  pte_t *ptep, pte_t pte);
 #define update_mmu_cache(vma, addr, ptep) \
 	update_mmu_cache_range(NULL, vma, addr, ptep, 1)

diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 0db01e10a3f8..1468a815fa5e 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -128,7 +128,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 	WARN_ON(!(pmd_leaf(pmd)));
 #endif
 	trace_hugepage_set_pmd(addr, pmd_val(pmd));
-	return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
+	return set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
 }

 void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -145,7 +145,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
 	WARN_ON(!(pud_leaf(pud)));
 #endif
 	trace_hugepage_set_pud(addr, pud_val(pud));
-	return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));
+	return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud));
 }

 static void do_serialize(void *arg)
@@ -551,7 +551,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 	if (radix_enabled())
 		return radix__ptep_modify_prot_commit(vma, addr,
 						      ptep, old_pte, pte);
-	set_pte_at(vma->vm_mm, addr, ptep, pte);
+	set_pte_at_unchecked(vma->vm_mm, addr, ptep, pte);
 }

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 9f764bc42b8c..2a52089018a5 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1600,7 +1600,7 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
 	    (atomic_read(&mm->context.copros) > 0))
 		radix__flush_tlb_page(vma, addr);

-	set_pte_at(mm, addr, ptep, pte);
+	set_pte_at_unchecked(mm, addr, ptep, pte);
 }

 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
@@ -1611,7 +1611,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 	if (!radix_enabled())
 		return 0;

-	set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pud);
+	set_pte_at_unchecked(&init_mm, 0 /* radix unused */, ptep, new_pud);
 	return 1;
 }

@@ -1658,7 +1658,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 	if (!radix_enabled())
 		return 0;

-	set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pmd);
+	set_pte_at_unchecked(&init_mm, 0 /* radix unused */, ptep, new_pmd);
 	return 1;
 }

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 61df5aed7989..4cc9af7961ca 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -224,6 +224,14 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 	}
 }

+void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
+			  pte_t *ptep, pte_t pte)
+{
+	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone
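The resulting split, in sketch form: the instrumented setters remain
the external API, while the unchecked helper serves internal callers
whose instrumentation happens at a higher level (names as introduced by
this patch):

	/* external API: instrumented by page table check */
	set_pte_at(mm, addr, ptep, pte);

	/* internal: set_pmd_at() is instrumented as a pmd, so its
	 * underlying pte store must not be counted a second time */
	set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));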
[PATCH v15 09/13] mm: Provide address parameter to p{te,md,ud}_user_accessible_page()
From: Rohan McLure

On several powerpc platforms, a page table entry may not imply whether
the relevant mapping is for userspace or kernelspace. Instead, such
platforms infer this from the address which is being accessed.

Add an additional address argument to each of these routines in order
to provide support for page table check on powerpc.

[a...@linux.ibm.com: rebase on arm64 changes]
Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
Acked-by: Ingo Molnar # x86
Signed-off-by: Andrew Donnellan

---
v15: rebase
---
 arch/arm64/include/asm/pgtable.h |  6 +++---
 arch/riscv/include/asm/pgtable.h |  6 +++---
 arch/x86/include/asm/pgtable.h   |  6 +++---
 mm/page_table_check.c            | 12 ++++++------
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4dae6fd51792..804606e4ef0e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1314,17 +1314,17 @@ static inline int pgd_devmap(pgd_t pgd)
 #endif

 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
 	return pte_valid(pte) && (pte_user(pte) || pte_user_exec(pte));
 }

-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
 	return pmd_valid(pmd) && !pmd_table(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
 }

-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
 	return pud_valid(pud) && !pud_table(pud) && (pud_user(pud) || pud_user_exec(pud));
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 99f8a05c595b..29126d47cf8d 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -834,17 +834,17 @@ static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
 }

 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
 	return pte_present(pte) && pte_user(pte);
 }

-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
 	return pmd_leaf(pmd) && pmd_user(pmd);
 }

-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
 	return pud_leaf(pud) && pud_user(pud);
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 8de92004f3fb..60523dd7f3a9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1726,17 +1726,17 @@ static inline bool arch_has_hw_nonleaf_pmd_young(void)
 #endif

 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
 	return (pte_val(pte) & _PAGE_PRESENT) && (pte_val(pte) & _PAGE_USER);
 }

-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
 	return pmd_leaf(pmd) && (pmd_val(pmd) & _PAGE_PRESENT) && (pmd_val(pmd) & _PAGE_USER);
 }

-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
 	return pud_leaf(pud) && (pud_val(pud) & _PAGE_PRESENT) && (pud_val(pud) & _PAGE_USER);
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 1c33439b9c0b..abc2232ceb39 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -151,7 +151,7 @@ void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
 	if (&init_mm == mm)
 		return;

-	if (pte_user_accessible_page(pte)) {
+	if (pte_user_accessible_page(pte, addr)) {
 		page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
 	}
 }
@@ -163,7 +163,7 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
 	if (&init_mm == mm)
 		return;

-	if (pmd_user_accessible_page(pmd)) {
+	if (pmd_user_accessible_page(pmd, addr)) {
 		page_table_check_clear(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT);
 	}
 }
@@ -175,7 +175,7 @@ void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
 	if (&init_mm == mm)
 		return;

-	if (pud_user_accessible_page(pud)) {
+	if (pud_user_accessible_page(pud, addr)) {
 		page_table_check_clear(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT);
 	}
 }
@@ -208,7 +208,7 @@ void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,

 	for (i = 0; i < nr; i++)
 		__page_table_check_pte_clear(mm, ad
[PATCH v15 08/13] mm/page_table_check: Reinstate address parameter in [__]page_table_check_pte_clear()
From: Rohan McLure

This reverts commit aa232204c468 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pte_clear").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

[a...@linux.ibm.com: rebase, fix additional occurrence and loop handling]
Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
Acked-by: Ingo Molnar # x86
Signed-off-by: Andrew Donnellan

---
v13: fix an additional occurrence
v15: rebase, fix loop handling
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++++++----
 include/linux/pgtable.h          |  4 ++--
 mm/page_table_check.c            |  7 ++++---
 6 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index db8bab350a2d..4dae6fd51792 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1391,7 +1391,7 @@ static inline pte_t __ptep_get_and_clear_anysz(struct mm_struct *mm,

 	switch (pgsize) {
 	case PAGE_SIZE:
-		page_table_check_pte_clear(mm, pte);
+		page_table_check_pte_clear(mm, address, pte);
 		break;
 	case PMD_SIZE:
 		page_table_check_pmd_clear(mm, address, pte_pmd(pte));
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index cc3c690f6c93..99f8a05c595b 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -603,7 +603,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
 	pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));

-	page_table_check_pte_clear(mm, pte);
+	page_table_check_pte_clear(mm, address, pte);

 	return pte;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 50d76a38ab9f..8de92004f3fb 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1298,7 +1298,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 			       pte_t *ptep)
 {
 	pte_t pte = native_ptep_get_and_clear(ptep);
-	page_table_check_pte_clear(mm, pte);
+	page_table_check_pte_clear(mm, addr, pte);
 	return pte;
 }

@@ -1314,7 +1314,7 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 		 * care about updates and native needs no locking
 		 */
 		pte = native_local_ptep_get_and_clear(ptep);
-		page_table_check_pte_clear(mm, pte);
+		page_table_check_pte_clear(mm, addr, pte);
 	} else {
 		pte = ptep_get_and_clear(mm, addr, ptep);
 	}
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 3973b69ae294..12268a32e8be 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -14,7 +14,8 @@ extern struct static_key_true page_table_check_disabled;
 extern struct page_ext_operations page_table_check_ops;

 void __page_table_check_zero(struct page *page, unsigned int order);
-void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
+void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
+				  pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
 				  pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
@@ -45,12 +46,13 @@ static inline void page_table_check_free(struct page *page, unsigned int order)
 	__page_table_check_zero(page, order);
 }

-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+					      unsigned long addr, pte_t pte)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;

-	__page_table_check_pte_clear(mm, pte);
+	__page_table_check_pte_clear(mm, addr, pte);
 }

 static inline void page_table_check_pmd_clear(struct mm_struct *mm,
@@ -119,7 +121,8 @@ static inline void page_table_check_free(struct page *page, unsigned int order)
 {
 }

-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+					      unsigned long addr, pte_t pte)
 {
 }

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 13e9336bf20e..3e7c2e260279 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -483,7 +483,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
 	pte_t pte = ptep_get(p
[PATCH v15 06/13] mm/page_table_check: Reinstate address parameter in [__]page_table_check_pud_clear()
From: Rohan McLure

This reverts commit 931c38e16499 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pud_clear").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

[a...@linux.ibm.com: rebase on arm64 changes]
Signed-off-by: Rohan McLure
Reviewed-by: Pasha Tatashin
Acked-by: Ingo Molnar # x86
Signed-off-by: Andrew Donnellan

---
v15: rebase
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++++++----
 include/linux/pgtable.h          |  2 +-
 mm/page_table_check.c            |  5 +++--
 5 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 383c01f9cae9..72ccbd810b13 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1398,7 +1398,7 @@ static inline pte_t __ptep_get_and_clear_anysz(struct mm_struct *mm,
 		break;
 #ifndef __PAGETABLE_PMD_FOLDED
 	case PUD_SIZE:
-		page_table_check_pud_clear(mm, pte_pud(pte));
+		page_table_check_pud_clear(mm, address, pte_pud(pte));
 		break;
 #endif
 	default:
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 995ca7e6e0e9..4d6da32e77a6 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1376,7 +1376,7 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 {
 	pud_t pud = native_pudp_get_and_clear(pudp);

-	page_table_check_pud_clear(mm, pud);
+	page_table_check_pud_clear(mm, addr, pud);

 	return pud;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 66e109238416..808cc3a48c28 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -16,7 +16,8 @@ extern struct page_ext_operations page_table_check_ops;
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+				  pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
 				 pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmds_set(struct mm_struct *mm, unsigned long addr,
@@ -59,12 +60,13 @@ static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
 	__page_table_check_pmd_clear(mm, pmd);
 }

-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+					      unsigned long addr, pud_t pud)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;

-	__page_table_check_pud_clear(mm, pud);
+	__page_table_check_pud_clear(mm, addr, pud);
 }

 static inline void page_table_check_ptes_set(struct mm_struct *mm,
@@ -123,7 +125,8 @@ static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
 {
 }

-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+					      unsigned long addr, pud_t pud)
 {
 }

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d06b5fec9540..765451864de3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -650,7 +650,7 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 	pud_t pud = *pudp;

 	pud_clear(pudp);
-	page_table_check_pud_clear(mm, pud);
+	page_table_check_pud_clear(mm, address, pud);

 	return pud;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 0957767a2940..bd1242087a35 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -167,7 +167,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_clear);

-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+				  pud_t pud)
 {
 	if (&init_mm == mm)
 		return;
@@ -246,7 +247,7 @@ void __page_table_check_puds_set(struct mm_struct *mm, unsigned long addr,
 		return;

 	for (i = 0; i < nr; i++)
-		__page_table_check_pud_clear(mm, *(pudp + i));
+		__page_table_check_pud_clear(mm, addr + PUD_SIZE * i, *(pudp + i));

 	if (pud_user_accessible_page(pud))
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
On 6/21/25 3:29 PM, Lukas Wunner wrote:
> On Fri, Jun 20, 2025 at 02:56:51PM +0530, Krishna Kumar wrote:
>> 5. If points 3 and 4 do not solve the problem, then only should we
>> move to pciehp.c. But AFAIK, PPC/PowerNV is DT based while pciehp.c
>> may only support ACPI (I have to check this). We need to provide PHB
>> related information via DTB and maintain the related topology
>> information via dtb, and then it can be doable.
>
> pciehp is not ACPI-specific. The PCIe port service driver in
> drivers/pci/pcie/portdrv.c binds to any PCIe port, examines the
> port's capabilities (e.g. hotplug, AER, DPC, ...) and instantiates
> sub-devices to which pciehp and the other drivers such as aer bind.

Good to know, and thanks for the info. As I already said, I have not
yet gone through pciehp.c and its internal working (since I did not
need it), so I can only comment in a concrete manner once I am done
with its code review and internal logic. My concern was the following -
you can comment on the points below if I am wrong (I will learn
something):

1. If, on x86, we get the PHB info from mmcfg via the ACPI table and
   create a root port from there with some address/entity, and this
   ACPI-provided entity is not present on PPC, then it can be a
   problem.

2. PPC is normally based on DTB entities and identifies PHBs and PCIe
   devices from there. If all this information is correctly mapped via
   portdrv.c then there is no problem, what you are telling me is
   correct and it will work.

3. But if point 2 is not handled correctly, we need to align with the
   port-related data structures to make it work. Not a big deal, but it
   needs thorough logic and testing effort. If it is already in place,
   we are good.

> How do you prevent pciehp from driving hotplug on PowerNV anyway?
> Do you disable CONFIG_HOTPLUG_PCI_PCIE?

I am not sure whether pciehp is used for PowerNV at present, and
whether PPC uses this config or has disabled it (since we rely on
pnv_php.c for our hotplug operation, I did not check it). If it works
on PPC by enabling the config and gives correct output, I don't see any
reason not to use it as an alternate driver where pnv_php.c fails. But
yeah, as I said, I can comment on this once I do some experiments
(going through the code and some testing).

But yeah, it's always good to hear from you. Thanks a bunch !!

> Thanks,
>
> Lukas

Best regards,
Krishna
Re: [PATCH 0/3] use vm_flags_t consistently
On 19/06/25 1:12 AM, Lorenzo Stoakes wrote:
> The VMA flags field vma->vm_flags is of type vm_flags_t. Right now this is
> exactly equivalent to unsigned long, but it should not be assumed to be.
>
> Much code that references vma->vm_flags already correctly uses vm_flags_t,
> but a fairly large chunk of code simply uses unsigned long and assumes that
> the two are equivalent.
>
> This series corrects that and has us use vm_flags_t consistently.
>
> This series is motivated by the desire to, in a future series, adjust
> vm_flags_t to be a u64 regardless of whether the kernel is 32-bit or 64-bit
> in order to deal with the VMA flag exhaustion issue and avoid all the
> various problems that arise from it (being unable to use certain features
> in 32-bit, being unable to add new flags except for 64-bit, etc.)
>
> This is therefore a critical first step towards that goal. At any rate,
> using the correct type is of value regardless.
>
> We additionally take the opportunity to refer to VMA flags as vm_flags
> where possible to make clear what we're referring to.
>
> Overall, this series does not introduce any functional change.
>
> Lorenzo Stoakes (3):
>   mm: change vm_get_page_prot() to accept vm_flags_t argument
>   mm: update core kernel code to use vm_flags_t consistently
>   mm: update architecture and driver code to use vm_flags_t

Hello Lorenzo,

Just wondering which tree/branch this series applies to? I tried all the
usual ones but could not apply the series cleanly.

v6.16-rc3
next-20250624
mm-stable
mm-unstable

b4 am cover.1750274467.git.lorenzo.stoa...@oracle.com
git am ./20250618_lorenzo_stoakes_use_vm_flags_t_consistently.mbx

- Anshuman
[PATCH] ASoC: fsl_asrc: use internal measured ratio for non-ideal ratio mode
When USRC=0, there is an underrun issue in the non-ideal ratio mode;
according to the reference manual, the internal measured ratio can be
used with USRC=1 and IDRC=0.

Fixes: d0250cf4f2ab ("ASoC: fsl_asrc: Add an option to select internal ratio mode")
Signed-off-by: Shengjiu Wang
---
 sound/soc/fsl/fsl_asrc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 677529916dc0..745532ccbdba 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -517,7 +517,8 @@ static int fsl_asrc_config_pair(struct fsl_asrc_pair *pair, bool use_ideal_rate)
 		regmap_update_bits(asrc->regmap, REG_ASRCTR,
 				   ASRCTR_ATSi_MASK(index), ASRCTR_ATS(index));
 		regmap_update_bits(asrc->regmap, REG_ASRCTR,
-				   ASRCTR_USRi_MASK(index), 0);
+				   ASRCTR_IDRi_MASK(index) | ASRCTR_USRi_MASK(index),
+				   ASRCTR_USR(index));
 
 	/* Set the input and output clock sources */
 	regmap_update_bits(asrc->regmap, REG_ASRCSR,
-- 
2.34.1
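For readers unfamiliar with the regmap API, a small stand-alone sketch
of the read-modify-write semantics the fix relies on:
regmap_update_bits(map, reg, mask, val) computes
new = (old & ~mask) | (val & mask). The bit positions below are
illustrative assumptions, not the real ASRCTR register layout.

#include <stdint.h>
#include <stdio.h>

/* Sketch of regmap_update_bits() bit math: new = (old & ~mask) | (val & mask). */
static uint32_t update_bits(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}

int main(void)
{
	/* Illustrative bit positions only. */
	const uint32_t USRC = 1u << 4, IDRC = 1u << 5;
	uint32_t asrctr = IDRC;	/* suppose IDRC was left set earlier */

	/* Old code: mask = USRC, val = 0 -> USRC cleared, stale IDRC kept. */
	printf("old: %#x\n", update_bits(asrctr, USRC, 0));

	/* Fixed code: mask = IDRC | USRC, val = USRC -> USRC=1, IDRC=0. */
	printf("new: %#x\n", update_bits(asrctr, IDRC | USRC, USRC));
	return 0;
}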
Re: [PATCH 1/3] mm: change vm_get_page_prot() to accept vm_flags_t argument
On 19/06/25 1:12 AM, Lorenzo Stoakes wrote:
> We abstract the type of the VMA flags to vm_flags_t, however in many places
> it is simply assumed this is unsigned long, which is simply incorrect.
>
> At the moment this is simply an incongruity, however in future we plan to
> change this type and therefore this change is a critical requirement for
> doing so.
>
> Overall, this patch does not introduce any functional change.
>
> Signed-off-by: Lorenzo Stoakes
> ---
>  arch/arm64/mm/mmap.c                       | 2 +-
>  arch/powerpc/include/asm/book3s/64/pkeys.h | 3 ++-
>  arch/sparc/mm/init_64.c                    | 2 +-
>  arch/x86/mm/pgprot.c                       | 2 +-
>  include/linux/mm.h                         | 4 ++--
>  include/linux/pgtable.h                    | 2 +-
>  tools/testing/vma/vma_internal.h           | 2 +-
>  7 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
> index c86c348857c4..08ee177432c2 100644
> --- a/arch/arm64/mm/mmap.c
> +++ b/arch/arm64/mm/mmap.c
> @@ -81,7 +81,7 @@ static int __init adjust_protection_map(void)
>  }
>  arch_initcall(adjust_protection_map);
>
> -pgprot_t vm_get_page_prot(unsigned long vm_flags)
> +pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
>  {
>  	ptdesc_t prot;
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h b/arch/powerpc/include/asm/book3s/64/pkeys.h
> index 5b178139f3c0..6f2075636591 100644
> --- a/arch/powerpc/include/asm/book3s/64/pkeys.h
> +++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
> @@ -4,8 +4,9 @@
>  #define _ASM_POWERPC_BOOK3S_64_PKEYS_H
>
>  #include
> +#include
>
> -static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
> +static inline u64 vmflag_to_pte_pkey_bits(vm_flags_t vm_flags)
>  {
>  	if (!mmu_has_feature(MMU_FTR_PKEY))
>  		return 0x0UL;
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 25ae4c897aae..7ed58bf3aaca 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -3201,7 +3201,7 @@ void copy_highpage(struct page *to, struct page *from)
>  }
>  EXPORT_SYMBOL(copy_highpage);
>
> -pgprot_t vm_get_page_prot(unsigned long vm_flags)
> +pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
>  {
>  	unsigned long prot = pgprot_val(protection_map[vm_flags &
>  				(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
> diff --git a/arch/x86/mm/pgprot.c b/arch/x86/mm/pgprot.c
> index c84bd9540b16..dc1afd5c839d 100644
> --- a/arch/x86/mm/pgprot.c
> +++ b/arch/x86/mm/pgprot.c
> @@ -32,7 +32,7 @@ void add_encrypt_protection_map(void)
>  		protection_map[i] = pgprot_encrypted(protection_map[i]);
>  }
>
> -pgprot_t vm_get_page_prot(unsigned long vm_flags)
> +pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
>  {
>  	unsigned long val = pgprot_val(protection_map[vm_flags &
>  				(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 98a606908307..7a7cd2e1b2af 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3487,10 +3487,10 @@ static inline bool range_in_vma(struct vm_area_struct *vma,
>  }
>
>  #ifdef CONFIG_MMU
> -pgprot_t vm_get_page_prot(unsigned long vm_flags);
> +pgprot_t vm_get_page_prot(vm_flags_t vm_flags);
>  void vma_set_page_prot(struct vm_area_struct *vma);
>  #else
> -static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
> +static inline pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
>  {
>  	return __pgprot(0);
>  }
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 1d4439499503..cf1515c163e2 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -2001,7 +2001,7 @@ typedef unsigned int pgtbl_mod_mask;
>   * x: (yes) yes
>   */
>  #define DECLARE_VM_GET_PAGE_PROT				\
> -pgprot_t vm_get_page_prot(unsigned long vm_flags)		\
> +pgprot_t vm_get_page_prot(vm_flags_t vm_flags)			\
>  {								\
>  	return protection_map[vm_flags &			\
>  		      (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];	\
> diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> index d7fea56e3bb3..4e3a2f1ac09e 100644
> --- a/tools/testing/vma/vma_internal.h
> +++ b/tools/testing/vma/vma_internal.h
> @@ -581,7 +581,7 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>  {
>  	return __pgprot(pgprot_val(oldprot) | pgprot_val(newprot));
>  }
>
> -static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
> +static inline pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
>  {
>  	return __pgprot(vm_flags);
>  }

LGTM

Reviewed-by: Anshuman Khandual
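To illustrate why a dedicated type matters, a minimal stand-alone sketch
follows; the uint64_t typedef reflects the series' stated plan, not the
current kernel definition, and VM_HYPOTHETICAL_HIGH is a made-up flag.

#include <stdint.h>

/* Assumption for illustration: vm_flags_t becomes 64-bit even on 32-bit. */
typedef uint64_t vm_flags_t;

#define VM_READ  ((vm_flags_t)1 << 0)
#define VM_WRITE ((vm_flags_t)1 << 1)
/* A flag above bit 31 silently truncates if callers pass unsigned long
 * on a 32-bit build - exactly what the type cleanup guards against. */
#define VM_HYPOTHETICAL_HIGH ((vm_flags_t)1 << 40)

static int vma_may_write(vm_flags_t vm_flags)
{
	return (vm_flags & VM_WRITE) != 0;
}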
Re: [PATCH 3/3] mm: update architecture and driver code to use vm_flags_t
On 19/06/25 1:12 AM, Lorenzo Stoakes wrote:
> In future we intend to change the vm_flags_t type, so it isn't correct for
> architecture and driver code to assume it is unsigned long. Correct this
> assumption across the board.
>
> Overall, this patch does not introduce any functional change.
>
> Signed-off-by: Lorenzo Stoakes
> ---
>  arch/arm/mm/fault.c                |  2 +-
>  arch/arm64/include/asm/mman.h      | 10 +++++-----
>  arch/arm64/mm/fault.c              |  2 +-
>  arch/arm64/mm/mmu.c                |  2 +-
>  arch/powerpc/include/asm/mman.h    |  2 +-
>  arch/powerpc/include/asm/pkeys.h   |  4 ++--
>  arch/powerpc/kvm/book3s_hv_uvmem.c |  2 +-
>  arch/sparc/include/asm/mman.h      |  4 ++--
>  arch/x86/kernel/cpu/sgx/encl.c     |  8 ++++----
>  arch/x86/kernel/cpu/sgx/encl.h     |  2 +-
>  tools/testing/vma/vma_internal.h   |  2 +-
>  11 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index ab01b51de559..46169fe42c61 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -268,7 +268,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
>  	int sig, code;
>  	vm_fault_t fault;
>  	unsigned int flags = FAULT_FLAG_DEFAULT;
> -	unsigned long vm_flags = VM_ACCESS_FLAGS;
> +	vm_flags_t vm_flags = VM_ACCESS_FLAGS;
>
>  	if (kprobe_page_fault(regs, fsr))
>  		return 0;
> diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
> index 21df8bbd2668..8770c7ee759f 100644
> --- a/arch/arm64/include/asm/mman.h
> +++ b/arch/arm64/include/asm/mman.h
> @@ -11,10 +11,10 @@
>  #include
>  #include
>
> -static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
> +static inline vm_flags_t arch_calc_vm_prot_bits(unsigned long prot,
>  						   unsigned long pkey)
>  {
> -	unsigned long ret = 0;
> +	vm_flags_t ret = 0;
>
>  	if (system_supports_bti() && (prot & PROT_BTI))
>  		ret |= VM_ARM64_BTI;
> @@ -34,8 +34,8 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
>  }
>  #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
>
> -static inline unsigned long arch_calc_vm_flag_bits(struct file *file,
> -						   unsigned long flags)
> +static inline vm_flags_t arch_calc_vm_flag_bits(struct file *file,
> +						unsigned long flags)
>  {
>  	/*
>  	 * Only allow MTE on anonymous mappings as these are guaranteed to be
> @@ -68,7 +68,7 @@ static inline bool arch_validate_prot(unsigned long prot,
>  }
>  #define arch_validate_prot(prot, addr) arch_validate_prot(prot, addr)
>
> -static inline bool arch_validate_flags(unsigned long vm_flags)
> +static inline bool arch_validate_flags(vm_flags_t vm_flags)
>  {
>  	if (system_supports_mte()) {
>  		/*
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index ec0a337891dd..24be3e632f79 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -549,7 +549,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>  	const struct fault_info *inf;
>  	struct mm_struct *mm = current->mm;
>  	vm_fault_t fault;
> -	unsigned long vm_flags;
> +	vm_flags_t vm_flags;
>  	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
>  	unsigned long addr = untagged_addr(far);
>  	struct vm_area_struct *vma;
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 8fcf59ba39db..248d96349fd0 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -720,7 +720,7 @@ void mark_rodata_ro(void)
>
>  static void __init declare_vma(struct vm_struct *vma,
>  			       void *va_start, void *va_end,
> -			       unsigned long vm_flags)
> +			       vm_flags_t vm_flags)
>  {
>  	phys_addr_t pa_start = __pa_symbol(va_start);
>  	unsigned long size = va_end - va_start;
>
> diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
> index 42a51a993d94..912f78a956a1 100644
> --- a/arch/powerpc/include/asm/mman.h
> +++ b/arch/powerpc/include/asm/mman.h
> @@ -14,7 +14,7 @@
>  #include
>  #include
>
> -static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
> +static inline vm_flags_t arch_calc_vm_prot_bits(unsigned long prot,
>  						unsigned long pkey)
>  {
>  #ifdef CONFIG_PPC_MEM_KEYS
> diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> index 59a2c7dbc78f..28e752138996 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -30,9 +30,9 @@ extern u32 reserved_allocation_mask; /* bits set for reserved keys */
>  #endif
>
>
> -static inline u64 pkey_to_vmflag_bits(u16 pkey)
> +static inline vm_flags_t pkey_to_vmflag_bits(u16 pkey)
>  {
> -	return (((u64)pkey << VM_PKEY_SHIFT) & ARCH_V
Re: [PATCH 2/3] mm: update core kernel code to use vm_flags_t consistently
On 19/06/25 1:12 AM, Lorenzo Stoakes wrote:
> The core kernel code is currently very inconsistent in its use of
> vm_flags_t vs. unsigned long. This prevents us from changing the type of
> vm_flags_t in the future and is simply not correct, so correct this.
>
> While this results in rather a lot of churn, it is a critical pre-requisite
> for a future planned change to VMA flag type.
>
> Additionally, update VMA userland tests to account for the changes.
>
> To make review easier and to break things into smaller parts, driver and
> architecture-specific changes are left for a subsequent commit.
>
> The code has been adjusted to cascade the changes across all calling code
> as far as is needed.
>
> We will adjust architecture-specific and driver code in a subsequent patch.
>
> Overall, this patch does not introduce any functional change.
>
> Signed-off-by: Lorenzo Stoakes
> ---
>  fs/exec.c                        |   2 +-
>  fs/userfaultfd.c                 |   2 +-
>  include/linux/coredump.h         |   2 +-
>  include/linux/huge_mm.h          |  12 +-
>  include/linux/khugepaged.h       |   4 +-
>  include/linux/ksm.h              |   4 +-
>  include/linux/memfd.h            |   4 +-
>  include/linux/mm.h               |   6 +-
>  include/linux/mm_types.h         |   2 +-
>  include/linux/mman.h             |   4 +-
>  include/linux/rmap.h             |   4 +-
>  include/linux/userfaultfd_k.h    |   4 +-
>  include/trace/events/fs_dax.h    |   6 +-
>  mm/debug.c                       |   2 +-
>  mm/execmem.c                     |   8 +-
>  mm/filemap.c                     |   2 +-
>  mm/gup.c                         |   2 +-
>  mm/huge_memory.c                 |   2 +-
>  mm/hugetlb.c                     |   4 +-
>  mm/internal.h                    |   4 +-
>  mm/khugepaged.c                  |   4 +-
>  mm/ksm.c                         |   2 +-
>  mm/madvise.c                     |   4 +-
>  mm/mapping_dirty_helpers.c       |   2 +-
>  mm/memfd.c                       |   8 +-
>  mm/memory.c                      |   4 +-
>  mm/mmap.c                        |  16 +-
>  mm/mprotect.c                    |   8 +-
>  mm/mremap.c                      |   2 +-
>  mm/nommu.c                       |  12 +-
>  mm/rmap.c                        |   4 +-
>  mm/shmem.c                       |   6 +-
>  mm/userfaultfd.c                 |  14 +-
>  mm/vma.c                         |  78 ++++++-----
>  mm/vma.h                         |  16 +-
>  mm/vmscan.c                      |   4 +-
>  tools/testing/vma/vma.c          | 266 +++++++++++++++--------------
>  tools/testing/vma/vma_internal.h |   8 +-
>  38 files changed, 269 insertions(+), 269 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 1f5fdd2e096e..d7aaf78c2a8f 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -601,7 +601,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
>  	struct mm_struct *mm = current->mm;
>  	struct vm_area_struct *vma = bprm->vma;
>  	struct vm_area_struct *prev = NULL;
> -	unsigned long vm_flags;
> +	vm_flags_t vm_flags;
>  	unsigned long stack_base;
>  	unsigned long stack_size;
>  	unsigned long stack_expand;
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index a8867508bef6..d8b2692a5072 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1242,7 +1242,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  	int ret;
>  	struct uffdio_register uffdio_register;
>  	struct uffdio_register __user *user_uffdio_register;
> -	unsigned long vm_flags;
> +	vm_flags_t vm_flags;
>  	bool found;
>  	bool basic_ioctls;
>  	unsigned long start, end;
> diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> index 76e41805b92d..c504b0faecc2 100644
> --- a/include/linux/coredump.h
> +++ b/include/linux/coredump.h
> @@ -10,7 +10,7 @@
>  #ifdef CONFIG_COREDUMP
>  struct core_vma_metadata {
>  	unsigned long start, end;
> -	unsigned long flags;
> +	vm_flags_t flags;
>  	unsigned long dump_size;
>  	unsigned long pgoff;
>  	struct file *file;
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 35e34e6a98a2..8f1b15213f61 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -263,7 +263,7 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
>  }
>
>  unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> -					 unsigned long vm_flags,
> +					 vm_flags_t vm_flags,
>  					 unsigned long tva_flags,
>  					 unsigned long orders);
>
> @@ -284,7 +284,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>   */
>  static inline
>  unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> -				       unsigned long vm_flags,
> +
Re: [PATCH 0/3] use vm_flags_t consistently
On Wed, 25 Jun 2025 08:25:35 +0530 Anshuman Khandual wrote:

> Just wondering which tree/branch this series applies to? I tried all
> the usual ones but could not apply the series cleanly.
>
> v6.16-rc3
> next-20250624
> mm-stable
> mm-unstable

It's now in mm-unstable if that helps.
Re: [PATCH 2/5] uaccess: Add speculation barrier to copy_from_user_iter()
Le 22/06/2025 à 18:57, Linus Torvalds a écrit :
> On Sun, 22 Jun 2025 at 02:52, Christophe Leroy wrote:
>>
>> The results of "access_ok()" can be mis-speculated.
>
> Hmm. This code is critical. I think it should be converted to use that
> masked address thing if we have to add it here.

Ok, I'll add it.

> And at some point this access_ok() didn't even exist, because we check
> the addresses at iter creation time. So this one might be a "belt and
> suspenders" check, rather than something critical.
>
> (Although I also suspect that when we added ITER_UBUF we might have
> created cases where those user addresses aren't checked at iter
> creation time any more).

Let's take the following path as an example:

snd_pcm_ioctl(SNDRV_PCM_IOCTL_WRITEI_FRAMES)
  snd_pcm_common_ioctl()
    snd_pcm_xferi_frames_ioctl()
      snd_pcm_lib_write()
        __snd_pcm_lib_xfer()
          default_write_copy()
            copy_from_iter()
              _copy_from_iter()
                __copy_from_iter()
                  iterate_and_advance()
                    iterate_and_advance2()
                      iterate_iovec()
                        copy_from_user_iter()

As far as I can see, none of those functions check the accessibility of
the iovec. Am I missing something ?

Christophe
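For reference, the creation-time check Linus points to sits in
import_ubuf(); a sketch paraphrased from lib/iov_iter.c follows (close
to, but not guaranteed to be, the exact upstream text):

int import_ubuf(int rw, void __user *buf, size_t len, struct iov_iter *i)
{
	if (len > MAX_RW_COUNT)
		len = MAX_RW_COUNT;
	if (unlikely(!access_ok(buf, len)))	/* the only access check */
		return -EFAULT;

	iov_iter_ubuf(i, rw, buf, len);		/* iter trusted from here on */
	return 0;
}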
Re: [PATCH 1/5] uaccess: Add masked_user_{read/write}_access_begin
Le 22/06/2025 à 18:35, David Laight a écrit :
> On Sun, 22 Jun 2025 11:52:39 +0200
> Christophe Leroy wrote:
>
>> Although masked_user_access_begin() seems to only be used when reading
>> data from user at the moment, introduce masked_user_read_access_begin()
>> and masked_user_write_access_begin() in order to match
>> user_read_access_begin() and user_write_access_begin().
>>
>> Have them default to masked_user_access_begin() when they are not
>> defined.
>>
>> Signed-off-by: Christophe Leroy
>> ---
>>  fs/select.c             | 2 +-
>>  include/linux/uaccess.h | 8 ++++++++
>>  kernel/futex/futex.h    | 4 ++--
>>  lib/strncpy_from_user.c | 2 +-
>>  lib/strnlen_user.c      | 2 +-
>>  5 files changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/select.c b/fs/select.c
>> index 9fb650d03d52..d8547bedf5eb 100644
>> --- a/fs/select.c
>> +++ b/fs/select.c
>> @@ -777,7 +777,7 @@ static inline int get_sigset_argpack(struct sigset_argpack *to,
>>  	// the path is hot enough for overhead of copy_from_user() to matter
>>  	if (from) {
>>  		if (can_do_masked_user_access())
>> -			from = masked_user_access_begin(from);
>> +			from = masked_user_read_access_begin(from);
>>  		else if (!user_read_access_begin(from, sizeof(*from)))
>>  			return -EFAULT;
>>  		unsafe_get_user(to->p, &from->p, Efault);
>> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
>> index 7c06f4795670..682a0cd2fe51 100644
>> --- a/include/linux/uaccess.h
>> +++ b/include/linux/uaccess.h
>> @@ -41,6 +41,14 @@
>>  #ifdef masked_user_access_begin
>>   #define can_do_masked_user_access() 1
>>  #else
>>   #define can_do_masked_user_access() 0
>>   #define masked_user_access_begin(src) NULL
>>   #define mask_user_address(src) (src)
>>  #endif
>> +#ifndef masked_user_write_access_begin
>> +#define masked_user_write_access_begin masked_user_access_begin
>> +#endif
>> +#ifndef masked_user_read_access_begin
>> +#define masked_user_read_access_begin masked_user_access_begin
>> +#endif

> I think that needs merging with the bit above.
> Perhaps generating something like:
>
> #ifdef masked_user_access_begin
> #define masked_user_read_access_begin masked_user_access_begin
> #define masked_user_write_access_begin masked_user_access_begin
> #endif
>
> #ifdef masked_user_read_access_begin
> #define can_do_masked_user_access() 1
> #else
> #define can_do_masked_user_access() 0
> #define masked_user_read_access_begin(src) NULL
> #define masked_user_write_access_begin(src) NULL
> #define mask_user_address(src) (src)
> #endif
>
> Otherwise you'll have to #define masked_user_access_begin even though
> it is never used.

I'm not sure I understand what you mean.

masked_user_access_begin() is used, for instance in
arch/x86/include/asm/futex.h, so it will remain.

masked_user_access_begin() is the analogue of user_access_begin(): it
starts a read-write user access, so it is worth keeping.

Two more patches could change x86-64 to define both and then remove the
'then unused' first check - but that has to be for later.

Christophe
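To show why the write-side variant is wanted even though only read-side
users exist today, here is a hedged sketch of what a write path would
look like, mirroring the fs/select.c hunk above; the struct and field
names are illustrative, not from the patch.

/* Hypothetical caller: struct result / ->value are made up for the sketch. */
static int put_result(struct result __user *to, u64 value)
{
	if (can_do_masked_user_access())
		to = masked_user_write_access_begin(to);
	else if (!user_write_access_begin(to, sizeof(*to)))
		return -EFAULT;

	unsafe_put_user(value, &to->value, Efault);
	user_write_access_end();
	return 0;

Efault:
	user_write_access_end();
	return -EFAULT;
}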
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
On 6/21/25 8:35 PM, Timothy Pearson wrote:
>
> - Original Message -
>> From: "Krishna Kumar"
>> To: "linuxppc-dev", "Timothy Pearson", "Shawn Anastasio"
>> Cc: "linuxppc-dev", "linux-kernel", "linux-pci", "Madhavan Srinivasan",
>> "Michael Ellerman", "christophe leroy", "Naveen N Rao",
>> "Bjorn Helgaas", "Shawn Anastasio"
>> Sent: Friday, June 20, 2025 4:26:51 AM
>> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
>>
>> Shawn, Timothy:
>>
>> Thanks for posting the patch series. A few things I wanted to ask to
>> better understand the Raptor problem -
>>
>> 1. Last time my two patches solved all the hot-unplug operation and
>> kernel crash issues except the NVMe case. It did not work with NVMe
>> since DPC support was not there. I was not able to do that due to
>> being occupied with some other work.
>
> With the current series all hotplug is working correctly, including not
> only NVMe on root port and bridge ports, but also surprise plug of the
> entire PCIe switch at the root port. The lack of DPC support *might* be
> related to the PE freeze, but in any case we prefer the hotplug driver
> to be able to recover from a PE freeze (e.g. if a bridge card is faulty
> and needs to be replaced) without also requiring a reboot, so I would
> consider DPC implementation orthogonal to this patch set.

Sounds good!

>> 2. I want to understand the delta from last year's problem to this
>> problem. Does the PHB freeze or hot-unplug failure happen only for a
>> particular Microsemi switch, or does it happen with all switches?
>> When did this problem start appearing? Until last year
>
> Hotplug has never worked reliably for us; if it worked at all it was
> always rolling the dice on whether the kernel would oops and take down
> the host. Even if the kernel didn't oops, surprise plug and auto-add /
> auto-remove never worked beyond one remove operation.

I would like to see this problem, maybe during our Zoom/Teams meeting.
Though I have not tested surprise plug/unplug and have only tested via
sysfs, you may be correct, but I want to have a look at this problem.

>> it was not there. Is it specific to particular hardware? Can I get
>> your setup to test this problem and your patch?
>
> Because you will need to be able to physically plug and unplug cards
> and drives this may be a bit tricky. Do you have access to a POWER9
> host system with a x16 PCIe slot? If so, all you need is a Supermicro
> SLC-AO3G-8E2P card and some random U.2 NVMe drives -- these cards are
> readily available and provide relatively standardized OCuLink access
> to a Switchtec bridge.
>
> If you don't have access to a POWER9 host, we can set you up with
> remote access, but it won't show all of the crashing and problems that
> occur with surprise plug unless we set up a live debug session (video
> call or similar).

A video call should be fine. During the call I will have a look at the
existing problem and fix, along with driver/kernel logs.

>> 3. To me, in the flow hot-unplug operation --> AER triggering --> DPC
>> support, DPC should mask the error from reaching the root port/CPU
>> and should prevent the PHB freeze / hot-unplug failure. If there are
>> AER/EEH-related synchronization issues, we need to solve them.
>> Can you please list the issues? I will pass them to the EEH/AER team.
>> But to me, if the AER implementation is correct and we add DPC
>> support, all the errors will be contained by the switch itself.
>> The PHB/root port/CPU will not be impacted by this and there should
>> not be any freeze.
>
> While this is a good goal to work toward, it only solves one possible
> fault mode. The patch series posted here will handle the general case
> of a PE freeze without requiring a host reboot, which is great for
> high-reliability systems where there might be a desire to replace the
> entire switch card (this has been tested with the patch series and
> works perfectly).

You may be correct on this, and this is possible. If the driver and the
AER/EEH errors/events are properly handled, then we may not need DPC in
all cases. The point of DPC was to absorb the error at the switch port
itself so that it does not reach the PHB/root port/CPU and avoid further
corruption. I was hoping that with DPC enabled we might not need an
explicit reboot for drives to come up in the case of surprise
hot-unplug. But we can compare this with the current results once this
support is enabled.

>> 4. Of course we can pick some of the fixes from the pciehp driver if
>> they are missing in pnv_php.c. At the same time, you have done some
>> cleanup in the hot-unplug path and fixed the attention-button-related
>> code. If these work fine, we can pick them up. But I want to test
>> them first. Please provide me the setup.
>>
>> 5. If point 3 and 4 does not solve the problem, then only we should
>> move to pciehp.c. But AFAI
Re: [PATCH 2/5] uaccess: Add speculation barrier to copy_from_user_iter()
On Tue, 24 Jun 2025 07:49:03 +0200 Christophe Leroy wrote:

> Le 22/06/2025 à 18:57, Linus Torvalds a écrit :
>> On Sun, 22 Jun 2025 at 02:52, Christophe Leroy wrote:
>>>
>>> The results of "access_ok()" can be mis-speculated.
>>
>> Hmm. This code is critical. I think it should be converted to use that
>> masked address thing if we have to add it here.
>
> Ok, I'll add it.
>
>> And at some point this access_ok() didn't even exist, because we check
>> the addresses at iter creation time. So this one might be a "belt and
>> suspenders" check, rather than something critical.
>>
>> (Although I also suspect that when we added ITER_UBUF we might have
>> created cases where those user addresses aren't checked at iter
>> creation time any more).
>
> Let's take the following path as an example:
>
> snd_pcm_ioctl(SNDRV_PCM_IOCTL_WRITEI_FRAMES)
>   snd_pcm_common_ioctl()
>     snd_pcm_xferi_frames_ioctl()
>       snd_pcm_lib_write()
>         __snd_pcm_lib_xfer()
>           default_write_copy()
>             copy_from_iter()
>               _copy_from_iter()
>                 __copy_from_iter()
>                   iterate_and_advance()
>                     iterate_and_advance2()
>                       iterate_iovec()
>                         copy_from_user_iter()
>
> As far as I can see, none of those functions check the accessibility of
> the iovec. Am I missing something ?

The import_ubuf() in do_transfer() ought to contain one.
But really you want the one in copy_from_user_iter() rather than the
outer one.

Mind you, that code is horrid.
The code only ever copies a single buffer, so it could be much shorter.
And is that deep call chain really needed for the very common case of
one buffer?

	David

> Christophe
Re: [PATCH 0/5] powerpc: Implement masked user access
Le 22/06/2025 à 18:20, David Laight a écrit :
> On Sun, 22 Jun 2025 11:52:38 +0200
> Christophe Leroy wrote:
>
>> Masked user access avoids the address/size verification by access_ok().
>> Although its main purpose is to skip the speculation in the
>> verification of user address and size, and hence avoid the need for
>> speculation mitigation, it also has the advantage of reducing the
>> number of instructions needed, so it also benefits platforms that
>> don't need speculation mitigation, especially when the size of the
>> copy is not known at build time.
>
> It also removes a conditional branch that is quite likely to be
> statically predicted 'the wrong way'.

But include/asm-generic/access_ok.h defines access_ok() as:

#define access_ok(addr, size) likely(__access_ok(addr, size))

So GCC uses the 'unlikely' variant of the branch instruction to force
the correct prediction, doesn't it ?

>> Unlike x86_64 which masks the address to 'all bits set' when the
>> user address is invalid, here the address is set to an address in
>> the gap. It avoids relying on the zero page to catch offset
>> accesses. On book3s/32 it makes sure the opening remains on the user
>> segment. The extra cost is a single instruction in the masking.
>
> That isn't true (any more).
> Linus changed the check to (approx):
> 	if (uaddr > TASK_SIZE)
> 		uaddr = TASK_SIZE;
> (Implemented with a conditional move)

Ah ok, I overlooked that, I didn't know the cmove instruction, it seems
similar to the isel instruction on powerpc e500.

Christophe
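For readers following along, the x86-64 clamp Linus switched to looks
roughly like this - a paraphrase of arch/x86/include/asm/uaccess_64.h,
shown as a sketch rather than a verbatim copy:

/*
 * Branch-free clamp (paraphrased): any pointer above the user/kernel
 * boundary is clamped to the boundary, so a bad pointer faults on a
 * guaranteed-invalid address with no branch for the CPU to mispredict
 * (and hence nothing to speculate past).
 */
static inline void __user *mask_user_address(const void __user *ptr)
{
	void __user *ret;

	asm("cmp %1,%0\n\t"
	    "cmova %1,%0"		/* conditional move, not a branch */
	    : "=r" (ret)
	    : "r" (USER_PTR_MAX), "0" (ptr));
	return ret;
}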
[powerpc:next-test] BUILD SUCCESS cf183c1730f2634245da35e9b5d53381b787d112
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: cf183c1730f2634245da35e9b5d53381b787d112  powerpc: floppy: Add missing checks after DMA map

elapsed time: 1020m

configs tested: 75
configs skipped: 70

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
        alpha         allnoconfig                          gcc-15.1.0
        arc           allmodconfig                         clang-19
        arc           allnoconfig                          gcc-15.1.0
        arc           allyesconfig                         clang-19
        arc           randconfig-001-20250624              gcc-8.5.0
        arc           randconfig-002-20250624              gcc-8.5.0
        arm           allmodconfig                         clang-19
        arm           allnoconfig                          gcc-15.1.0
        arm           allyesconfig                         clang-19
        arm           randconfig-001-20250624              gcc-8.5.0
        arm           randconfig-002-20250624              gcc-8.5.0
        arm           randconfig-003-20250624              gcc-8.5.0
        arm           randconfig-004-20250624              gcc-8.5.0
        arm64         allmodconfig                         clang-19
        arm64         allnoconfig                          gcc-15.1.0
        arm64         randconfig-001-20250624              gcc-8.5.0
        arm64         randconfig-002-20250624              gcc-8.5.0
        arm64         randconfig-003-20250624              gcc-8.5.0
        arm64         randconfig-004-20250624              gcc-8.5.0
        csky          allnoconfig                          gcc-15.1.0
        hexagon       allnoconfig                          gcc-15.1.0
        i386          allmodconfig                         clang-20
        i386          allnoconfig                          clang-20
        i386          allyesconfig                         clang-20
        i386          buildonly-randconfig-001-20250624    gcc-12
        i386          buildonly-randconfig-002-20250624    gcc-12
        i386          buildonly-randconfig-003-20250624    gcc-12
        i386          buildonly-randconfig-004-20250624    gcc-12
        i386          buildonly-randconfig-005-20250624    gcc-12
        i386          buildonly-randconfig-006-20250624    gcc-12
        i386          defconfig                            clang-20
        loongarch     allmodconfig                         gcc-15.1.0
        loongarch     allnoconfig                          gcc-15.1.0
        m68k          allmodconfig                         gcc-15.1.0
        m68k          allnoconfig                          gcc-15.1.0
        m68k          allyesconfig                         gcc-15.1.0
        microblaze    allmodconfig                         gcc-15.1.0
        microblaze    allnoconfig                          gcc-15.1.0
        microblaze    allyesconfig                         gcc-15.1.0
        mips          allnoconfig                          gcc-15.1.0
        nios2         allnoconfig                          gcc-15.1.0
        openrisc      allnoconfig                          clang-21
        openrisc      allyesconfig                         gcc-15.1.0
        parisc        allmodconfig                         gcc-15.1.0
        parisc        allnoconfig                          clang-21
        parisc        allyesconfig                         gcc-15.1.0
        powerpc       allmodconfig                         gcc-15.1.0
        powerpc       allnoconfig                          clang-21
        powerpc       allnoconfig                          gcc-15.1.0
        powerpc       allyesconfig                         gcc-15.1.0
        riscv         allmodconfig                         gcc-15.1.0
        riscv         allnoconfig                          clang-21
        riscv         allyesconfig                         gcc-15.1.0
        s390          allmodconfig                         gcc-15.1.0
        s390          allnoconfig                          clang-21
        s390          allyesconfig                         gcc-15.1.0
        sh            allmodconfig                         gcc-15.1.0
        sh            allnoconfig                          gcc-15.1.0
        sh            allyesconfig                         gcc-15.1.0
        sparc         allmodconfig                         gcc-15.1.0
        sparc         allnoconfig                          gcc-15.1.0
        um            allnoconfig                          clang-21
        x86_64        allnoconfig                          clang-20
        x86_64        allyesconfig                         clang-20
        x86_64        buildonly-randconfig-001-20250624    clang-20
        x86_64        buildonly-randconfig-002-20250624    clang-20
        x86_64        buildonly-randconfig-003-20250624    clang-20
        x86_64        buildonly-randconfig-004-20250624    clang-20
        x86_64        buildonly-randconfig-005-20250624    clang-20
        x86_64        buildonly-randconfig-006-20250624    clang-20
        x86_64        defconfig                            clang-20
        x86_64        kexec                                clang-20
        x86_64        rhel-9.4                             clang-20
        x86_64        rhel-9.4-rust                        clang-20
        xtensa        allnoconfig                          gcc-15.1.0

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH 0/5] powerpc: Implement masked user access
On Tue, 24 Jun 2025 07:27:47 +0200 Christophe Leroy wrote:

> Le 22/06/2025 à 18:20, David Laight a écrit :
>> On Sun, 22 Jun 2025 11:52:38 +0200
>> Christophe Leroy wrote:
>>
>>> Masked user access avoids the address/size verification by access_ok().
>>> Although its main purpose is to skip the speculation in the
>>> verification of user address and size, and hence avoid the need for
>>> speculation mitigation, it also has the advantage of reducing the
>>> number of instructions needed, so it also benefits platforms that
>>> don't need speculation mitigation, especially when the size of the
>>> copy is not known at build time.
>>
>> It also removes a conditional branch that is quite likely to be
>> statically predicted 'the wrong way'.
>
> But include/asm-generic/access_ok.h defines access_ok() as:
>
> #define access_ok(addr, size) likely(__access_ok(addr, size))
>
> So GCC uses the 'unlikely' variant of the branch instruction to force
> the correct prediction, doesn't it ?

Nope...

Most architectures don't have likely/unlikely variants of branches.
So all gcc can do is decide which path is the fall-through and whether
the branch is forwards or backwards.
Additionally, unless there is code in both the 'if' and 'else' clauses,
[un]likely seems to have no effect.

So on simple cpus that predict 'backwards branches taken' you can get
the desired effect - but it may need an 'asm comment' to force the
compiler to generate the required branches (e.g. a forwards branch
directly to a backwards unconditional jump).

On x86 it is all more complicated.
I think the pre-fetch code is likely to assume 'not taken' (but might
use stale info on the cache line).
The predictor itself never does 'static prediction' - it is always
based on the referenced branch prediction data structure.
So, unless you are in a loop (e.g. running a benchmark!) there is
pretty much a 50% chance of a branch mispredict.

I've been trying to benchmark different versions of the u64 * u64 / u64
function - and I think mispredicted branches make a big difference.
I need to sit down and sequence the test cases so that I can see the
effect of each branch!

>>> Unlike x86_64 which masks the address to 'all bits set' when the
>>> user address is invalid, here the address is set to an address in
>>> the gap. It avoids relying on the zero page to catch offset
>>> accesses. On book3s/32 it makes sure the opening remains on the user
>>> segment. The extra cost is a single instruction in the masking.
>>
>> That isn't true (any more).
>> Linus changed the check to (approx):
>> 	if (uaddr > TASK_SIZE)
>> 		uaddr = TASK_SIZE;
>> (Implemented with a conditional move)
>
> Ah ok, I overlooked that, I didn't know the cmove instruction, it seems
> similar to the isel instruction on powerpc e500.

It was added with the Pentium Pro - I learnt 8086 :-)
I suspect x86 got there first...

Although called 'conditional move' I very much suspect the write is
actually unconditional, so the hardware implementation is much the same
as 'add carry' except the ALU operation is a simple multiplex.
Which means it is unlikely to be speculative.

	David
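To make the likely()/code-placement point concrete, a small stand-alone
sketch (GCC/Clang; likely()/unlikely() here mirror the kernel's
definitions - on most ISAs the builtin only steers block placement,
there is no prediction-hint bit handed to the hardware):

#include <stddef.h>

/* Mirrors the kernel's definitions of these macros. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int checked_copy(const void *src, size_t len, size_t limit)
{
	/* The compiler keeps the expected path as the straight-line
	 * fall-through and moves the cold error path out of line. */
	if (unlikely(src == NULL || len > limit))
		return -1;	/* cold path, placed out of line */
	return 0;		/* hot fall-through path */
}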