Re: [PATCH v10 1/5] arm64: KVM: Prepare set virtual SEI syndrome value
Hi James, Thank you very much for this mail and your time to review this patch. Appreciate that. I will check it and reply. On 2018/3/16 4:37, James Morse wrote: > Hi Dongjiu Geng, > > On 03/03/18 16:09, Dongjiu Geng wrote: >> Export one API to specify virtual SEI syndrome value >> for guest, and add a helper to get the VSESR_EL2 value. > > This patch adds two helpers that nothing calls... its not big, please merge it > with the patch that uses these. > > >> diff --git a/arch/arm64/include/asm/kvm_emulate.h >> b/arch/arm64/include/asm/kvm_emulate.h >> index 413dc82..3294885 100644 >> --- a/arch/arm64/include/asm/kvm_emulate.h >> +++ b/arch/arm64/include/asm/kvm_emulate.h >> @@ -71,6 +71,11 @@ static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, >> unsigned long hcr) >> vcpu->arch.hcr_el2 = hcr; >> } >> >> +static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu) >> +{ >> +return vcpu->arch.vsesr_el2; >> +} >> + >> static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr) >> { >> vcpu->arch.vsesr_el2 = vsesr; >> diff --git a/arch/arm64/include/asm/kvm_host.h >> b/arch/arm64/include/asm/kvm_host.h >> index a73f63a..3dc49b7 100644 >> --- a/arch/arm64/include/asm/kvm_host.h >> +++ b/arch/arm64/include/asm/kvm_host.h >> @@ -354,6 +354,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct >> kvm_run *run, >> int kvm_perf_init(void); >> int kvm_perf_teardown(void); >> >> +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome); >> + >> struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr); >> >> static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, >> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c >> index 60666a0..78ecb28 100644 >> --- a/arch/arm64/kvm/inject_fault.c >> +++ b/arch/arm64/kvm/inject_fault.c >> @@ -186,3 +186,8 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu) >> { >> pend_guest_serror(vcpu, ESR_ELx_ISV); >> } >> + >> +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome) >> +{ >> +pend_guest_serror(vcpu, syndrome & ESR_ELx_ISS_MASK); > > If you move the ISS_MASK into pend_guest_serror(), you wouldn't need this at > all. > > It would be better if any validation were in the user-space helpers so we can > check user-space hasn't put something funny in the top bits. > >> +} >> > > > Thanks, > > James > > . > -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/8] Uprobe: Export vaddr <-> offset conversion functions
On 03/15/2018 09:57 PM, Steven Rostedt wrote: > On Tue, 13 Mar 2018 18:25:56 +0530 > Ravi Bangoria wrote: > >> No functionality changes. > Please add a detailed explanation why this patch is needed. All commits > should be self sufficient and stand on their own. If I were to come up > to this patch via a git blame, I would be clueless to why it was done. Sure Steve, Will add description it in next series. Thanks for the review, Ravi -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] Uprobe: Rename map_info to uprobe_map_info
On 03/15/2018 10:14 PM, Steven Rostedt wrote: > On Tue, 13 Mar 2018 18:25:58 +0530 > Ravi Bangoria wrote: >> -static inline struct map_info *free_map_info(struct map_info *info) >> +static inline struct uprobe_map_info * >> +uprobe_free_map_info(struct uprobe_map_info *info) >> { >> -struct map_info *next = info->next; >> +struct uprobe_map_info *next = info->next; >> kfree(info); >> return next; >> } >> >> -static struct map_info * >> -build_map_info(struct address_space *mapping, loff_t offset, bool >> is_register) >> +static struct uprobe_map_info * >> +uprobe_build_map_info(struct address_space *mapping, loff_t offset, > Also, as these functions have side effects (like you need to perform a > mmput(info->mm), you need to add kerneldoc type comments to these > functions, explaining how to use them. > > When you upgrade a function from static to use cases outside the file, > it requires documenting that function for future users. Sure, will add a comment here. Thanks for the review, Ravi -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/8] mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr()
On 03/15/2018 09:58 PM, Steven Rostedt wrote: > On Tue, 13 Mar 2018 18:25:57 +0530 > Ravi Bangoria wrote: > >> No functionality changes. > Again, please add an explanation to why this patch is done. Sure. Will add. Thanks for the review, Ravi > -- Steve > >> Signed-off-by: Ravi Bangoria -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/8] Uprobe: Export uprobe_map_info along with uprobe_{build/free}_map_info()
On 03/15/2018 10:02 PM, Steven Rostedt wrote: > On Tue, 13 Mar 2018 18:25:59 +0530 > Ravi Bangoria wrote: > >> These exported data structure and functions will be used by other >> files in later patches. > I'm reluctantly OK with the above. > >> No functionality changes. > Please remove this line. There are functionality changes. Turning a > static inline into an exported function *is* a functionality change. Sure. Will change it. Thanks for the review, Ravi -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/15/2018 10:18 PM, Steven Rostedt wrote: > On Tue, 13 Mar 2018 18:26:00 +0530 > Ravi Bangoria wrote: > >> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu) >> +{ >> +struct uprobe_map_info *info; >> +struct vm_area_struct *vma; >> +unsigned long vaddr; >> + >> +uprobe_start_dup_mmap(); > Please add a comment here that this function ups the mm ref count for > each info returned. Otherwise it's hard to know what that mmput() below > matches. Sure. Will add it. Thanks for the review, Ravi -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/15/2018 07:51 PM, Oleg Nesterov wrote: > On 03/13, Ravi Bangoria wrote: >> @@ -1053,6 +1056,9 @@ int uprobe_mmap(struct vm_area_struct *vma) >> struct uprobe *uprobe, *u; >> struct inode *inode; >> >> +if (uprobe_mmap_callback) >> +uprobe_mmap_callback(vma); >> + >> if (no_uprobe_events() || !valid_vma(vma, true)) >> return 0; > probe_event_enable() does > > uprobe_register(); > /* WINDOW */ > sdt_increment_ref_ctr(); > > what if uprobe_mmap() is called in between? The counter(s) in this vma > will be incremented twice, no? I guess, it's a valid issue with PATCH 5 but should be taken care by PATCH 6. > >> +static struct vm_area_struct * >> +sdt_find_vma(struct mm_struct *mm, struct trace_uprobe *tu) >> +{ >> +struct vm_area_struct *tmp; >> + >> +for (tmp = mm->mmap; tmp != NULL; tmp = tmp->vm_next) >> +if (sdt_valid_vma(tu, tmp)) >> +return tmp; >> + >> +return NULL; > I can't understand the logic... Lets ignore sdt_valid_vma() for now. > The caller has uprobe_map_info, why it can't simply do > vma = find_vma(uprobe_map_info->vaddr)? and then check sdt_valid_vma(). Yes. that should work. Will change it. Thanks for the review, Ravi -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/15/2018 08:00 PM, Oleg Nesterov wrote: > On 03/15, Oleg Nesterov wrote: >>> +static struct vm_area_struct * >>> +sdt_find_vma(struct mm_struct *mm, struct trace_uprobe *tu) >>> +{ >>> + struct vm_area_struct *tmp; >>> + >>> + for (tmp = mm->mmap; tmp != NULL; tmp = tmp->vm_next) >>> + if (sdt_valid_vma(tu, tmp)) >>> + return tmp; >>> + >>> + return NULL; >> I can't understand the logic... Lets ignore sdt_valid_vma() for now. >> The caller has uprobe_map_info, why it can't simply do >> vma = find_vma(uprobe_map_info->vaddr)? and then check sdt_valid_vma(). > Note to mention that sdt_find_vma() can return NULL but the callers do > vma_offset_to_vaddr(vma) without any check. If the "mm" we are passing to sdt_find_vma() is returned by uprobe_build_map_info(ref_ctr_offset), sdt_find_vma() must _not_ return NULL. Thanks for the review, Ravi -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/15/2018 08:31 PM, Oleg Nesterov wrote: > On 03/13, Ravi Bangoria wrote: >> +sdt_update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d) >> +{ >> +void *kaddr; >> +struct page *page; >> +struct vm_area_struct *vma; >> +int ret = 0; >> +unsigned short orig = 0; >> + >> +if (vaddr == 0) >> +return -EINVAL; >> + >> +ret = get_user_pages_remote(NULL, mm, vaddr, 1, >> +FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL); >> +if (ret <= 0) >> +return ret; >> + >> +kaddr = kmap_atomic(page); >> +memcpy(&orig, kaddr + (vaddr & ~PAGE_MASK), sizeof(orig)); >> +orig += d; >> +memcpy(kaddr + (vaddr & ~PAGE_MASK), &orig, sizeof(orig)); >> +kunmap_atomic(kaddr); > Hmm. Why memcpy? You could simply do > > kaddr = kmap_atomic(); > unsigned short *ptr = kaddr + (vaddr & ~PAGE_MASK); > *ptr += d; > kunmap_atomic(); Yes, that should work. Will change it. Thanks for the review, Ravi -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 8/8] trace_uprobe/sdt: Document about reference counter
On 03/15/2018 06:17 PM, Masami Hiramatsu wrote: > Hi Ravi, > > On Wed, 14 Mar 2018 20:52:59 +0530 > Ravi Bangoria wrote: > >> On 03/14/2018 07:20 PM, Masami Hiramatsu wrote: >>> On Tue, 13 Mar 2018 18:26:03 +0530 >>> Ravi Bangoria wrote: >>> No functionality changes. >>> Please consider to describe what is this change and why, here. >> Will add in next version. > Thanks, and could you also move this before perf-probe patch? > Also Could you make perf-probe check the tracing/README whether > the kernel supports reference counter syntax or not? > > perf-tool can be used on older (or stable) kernel. Sure, Will do that. Thanks, Ravi -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/16, Ravi Bangoria wrote: > > On 03/15/2018 08:00 PM, Oleg Nesterov wrote: > > Note to mention that sdt_find_vma() can return NULL but the callers do > > vma_offset_to_vaddr(vma) without any check. > > If the "mm" we are passing to sdt_find_vma() is returned by > uprobe_build_map_info(ref_ctr_offset), sdt_find_vma() must > _not_ return NULL. Not at all. Once build_map_info() returns any mapping can go away. Otherwise, why do you think the caller has to take ->mmap_sem and use find_vma()? If you were right, build_map_info() could just return the list of vma's instead of list of mm's. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/16/2018 05:09 PM, Oleg Nesterov wrote: > On 03/16, Ravi Bangoria wrote: >> On 03/15/2018 08:00 PM, Oleg Nesterov wrote: >>> Note to mention that sdt_find_vma() can return NULL but the callers do >>> vma_offset_to_vaddr(vma) without any check. >> If the "mm" we are passing to sdt_find_vma() is returned by >> uprobe_build_map_info(ref_ctr_offset), sdt_find_vma() must >> _not_ return NULL. > Not at all. > > Once build_map_info() returns any mapping can go away. Otherwise, why do > you think the caller has to take ->mmap_sem and use find_vma()? If you > were right, build_map_info() could just return the list of vma's instead > of list of mm's. Oh.. okay.. I was under wrong impression then. Will add a check there. Thanks for the review :) Ravi -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/8] trace_uprobe/sdt: Fix multiple update of same reference counter
On 03/15/2018 08:19 PM, Oleg Nesterov wrote: > On 03/13, Ravi Bangoria wrote: >> For tiny binaries/libraries, different mmap regions points to the >> same file portion. In such cases, we may increment reference counter >> multiple times. > Yes, > >> But while de-registration, reference counter will get >> decremented only by once > could you explain why this happens? sdt_increment_ref_ctr() and > sdt_decrement_ref_ctr() look symmetrical, _decrement_ should see > the same mappings? Sorry, I thought this happens only for tiny binaries. But that is not the case. This happens for binary / library of any length. Also, it's not a problem with sdt_increment_ref_ctr() / sdt_increment_ref_ctr(). The problem happens with trace_uprobe_mmap_callback(). To illustrate in detail, I'm adding a pr_info() in trace_uprobe_mmap_callback(): vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset); + pr_info("0x%lx-0x%lx : 0x%lx\n", vma->vm_start, vma->vm_end, vaddr); sdt_update_ref_ctr(vma->vm_mm, vaddr, 1); Ok now, libpython has SDT markers with reference counter: # readelf -n /usr/lib64/libpython2.7.so.1.0 | grep -A2 Provider Provider: python Name: function__entry ... Semaphore: 0x002899d8 Probing on that marker: # cd /sys/kernel/debug/tracing/ # echo "p:sdt_python/function__entry /usr/lib64/libpython2.7.so.1.0:0x16a4d4(0x2799d8)" > uprobe_events # echo 1 > events/sdt_python/function__entry/enable When I run python: # strace -o out python mmap(NULL, 2738968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fff9246 mmap(0x7fff926a, 327680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x23) = 0x7fff926a mprotect(0x7fff926a, 65536, PROT_READ) = 0 The first mmap() maps the whole library into one region. Second mmap() and third mprotect() split out the whole region into smaller vmas and sets appropriate protection flags. Now, in this case, trace_uprobe_mmap_callback() updates reference counter twice -- by second mmap() call and by third mprotect() call -- because both regions contain reference counter offset. This I can verify in dmesg: # dmesg | tail trace_kprobe: 0x7fff926a-0x7fff926f : 0x7fff926e99d8 trace_kprobe: 0x7fff926b-0x7fff926f : 0x7fff926e99d8 Final vmas of libpython: # cat /proc/`pgrep python`/maps | grep libpython 7fff9246-7fff926a r-xp 08:05 403934 /usr/lib64/libpython2.7.so.1.0 7fff926a-7fff926b r--p 0023 08:05 403934 /usr/lib64/libpython2.7.so.1.0 7fff926b-7fff926f rw-p 0024 08:05 403934 /usr/lib64/libpython2.7.so.1.0 I see similar problem with normal binary as well. I'm using Brendan Gregg's example[1]: # readelf -n /tmp/tick | grep -A2 Provider Provider: tick Name: loop2 ... Semaphore: 0x1005003c Probing that marker: # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)" > uprobe_events # echo 1 > events/sdt_tick/loop2/enable Now when I run the binary # /tmp/tick load_elf_binary() internally calls mmap() and I see trace_uprobe_mmap_callback() updating reference counter twice: # dmesg | tail trace_kprobe: 0x1001-0x1003 : 0x10020036 trace_kprobe: 0x1002-0x1003 : 0x10020036 proc//maps of the tick: # cat /proc/`pgrep tick`/maps 1000-1001 r-xp 08:05 1335712 /tmp/tick 1001-1002 r--p 08:05 1335712 /tmp/tick 1002-1003 rw-p 0001 08:05 1335712 /tmp/tick [1] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506 > Ether way, this patch doesn't look right at first glance... Just > for example, > >> +static bool sdt_check_mm_list(struct trace_uprobe *tu, struct mm_struct *mm) >> +{ >> +struct sdt_mm_list *tmp = tu->sml; >> + >> +if (!tu->sml || !mm) >> +return false; >> + >> +while (tmp) { >> +if (tmp->mm == mm) >> +return true; >> +tmp = tmp->next; >> +} >> + >> +return false; > ... > >> +} >> + >> +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm) >> +{ >> +struct sdt_mm_list *tmp; >> + >> +tmp = kzalloc(sizeof(*tmp), GFP_KERNEL); >> +if (!tmp) >> +return; >> + >> +tmp->mm = mm; >> +tmp->next = tu->sml; >> +tu->sml = tmp; >> +} >> + > ... > >> @@ -1020,8 +1104,16 @@ void trace_uprobe_mmap_callback(struct vm_area_struct >> *vma) >> !trace_probe_is_enabled(&tu->tp)) >> continue; >> >> +down_write(&tu->sml_rw_sem); >> +if (sdt_check_mm_list(tu, vma->vm_mm)) >> +goto cont; >> + >> vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset); >> -sdt_update_ref_ctr(vma->vm_mm, vaddr, 1); >> +if (!sdt_update_ref_ctr(vma->vm_mm, vaddr, 1)) >>
Re: [RESEND PATCH v5] input: pxrc: new driver for PhoenixRC Flight Controller Adapter
ping. I do not want to nag, but would someone please have a look at this? Thanks, Marcus Folkesson On Sun, Feb 18, 2018 at 05:17:46PM +0100, Marcus Folkesson wrote: > This driver let you plug in your RC controller to the adapter and > use it as input device in various RC simulators. > > Signed-off-by: Marcus Folkesson > --- > > v5: > - Drop autosuspend support > - Use pm_mutex instead of input_dev->mutex > - Use pxrc->is_open instead of input_dev->users > v4: > - Add call to usb_mark_last_busy() in irq > - Move code from pxrc_resume() to pxrc_reset_resume() > v3: > - Use RUDDER and MISC instead of TILT_X and TILT_Y > - Drop kref and anchor > - Rework URB handling > - Add PM support > v2: > - Change module license to GPLv2 to match SPDX tag > > > Documentation/input/devices/pxrc.rst | 57 +++ > drivers/input/joystick/Kconfig | 9 ++ > drivers/input/joystick/Makefile | 1 + > drivers/input/joystick/pxrc.c| 303 > +++ > 4 files changed, 370 insertions(+) > create mode 100644 Documentation/input/devices/pxrc.rst > create mode 100644 drivers/input/joystick/pxrc.c > > diff --git a/Documentation/input/devices/pxrc.rst > b/Documentation/input/devices/pxrc.rst > new file mode 100644 > index ..ca11f646bae8 > --- /dev/null > +++ b/Documentation/input/devices/pxrc.rst > @@ -0,0 +1,57 @@ > +=== > +pxrc - PhoenixRC Flight Controller Adapter > +=== > + > +:Author: Marcus Folkesson > + > +This driver let you use your own RC controller plugged into the > +adapter that comes with PhoenixRC [1]_ or other compatible adapters. > + > +The adapter supports 7 analog channels and 1 digital input switch. > + > +Notes > += > + > +Many RC controllers is able to configure which stick goes to which channel. > +This is also configurable in most simulators, so a matching is not necessary. > + > +The driver is generating the following input event for analog channels: > + > ++-++ > +| Channel | Event | > ++=++ > +| 1 | ABS_X | > ++-++ > +| 2 | ABS_Y | > ++-++ > +| 3 | ABS_RX| > ++-++ > +| 4 | ABS_RY| > ++-++ > +| 5 | ABS_RUDDER| > ++-++ > +| 6 | ABS_THROTTLE | > ++-++ > +| 7 | ABS_MISC | > ++-++ > + > +The digital input switch is generated as an `BTN_A` event. > + > +Manual Testing > +== > + > +To test this driver's functionality you may use `input-event` which is part > of > +the `input layer utilities` suite [2]_. > + > +For example:: > + > +> modprobe pxrc > +> input-events > + > +To print all input events from input `devnr`. > + > +References > +== > + > +.. [1] http://www.phoenix-sim.com/ > +.. [2] https://www.kraxel.org/cgit/input/ > diff --git a/drivers/input/joystick/Kconfig b/drivers/input/joystick/Kconfig > index f3c2f6ea8b44..332c0cc1b2ab 100644 > --- a/drivers/input/joystick/Kconfig > +++ b/drivers/input/joystick/Kconfig > @@ -351,4 +351,13 @@ config JOYSTICK_PSXPAD_SPI_FF > > To drive rumble motor a dedicated power supply is required. > > +config JOYSTICK_PXRC > + tristate "PhoenixRC Flight Controller Adapter" > + depends on USB_ARCH_HAS_HCD > + depends on USB > + help > + Say Y here if you want to use the PhoenixRC Flight Controller Adapter. > + > + To compile this driver as a module, choose M here: the > + module will be called pxrc. > endif > diff --git a/drivers/input/joystick/Makefile b/drivers/input/joystick/Makefile > index 67651efda2e1..dd0492ebbed7 100644 > --- a/drivers/input/joystick/Makefile > +++ b/drivers/input/joystick/Makefile > @@ -23,6 +23,7 @@ obj-$(CONFIG_JOYSTICK_JOYDUMP) += joydump.o > obj-$(CONFIG_JOYSTICK_MAGELLAN) += magellan.o > obj-$(CONFIG_JOYSTICK_MAPLE) += maplecontrol.o > obj-$(CONFIG_JOYSTICK_PSXPAD_SPI)+= psxpad-spi.o > +obj-$(CONFIG_JOYSTICK_PXRC) += pxrc.o > obj-$(CONFIG_JOYSTICK_SIDEWINDER)+= sidewinder.o > obj-$(CONFIG_JOYSTICK_SPACEBALL) += spaceball.o > obj-$(CONFIG_JOYSTICK_SPACEORB) += spaceorb.o > diff --git a/drivers/input/joystick/pxrc.c b/drivers/input/joystick/pxrc.c > new file mode 100644 > index ..07a0dbd3ced2 > --- /dev/null > +++ b/drivers/input/joystick/pxrc.c > @@ -0,0 +1,303 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Driver for Phoenix RC Flight Controller Adapter > + * > + * Copyright (C) 2018 Marcus Folkesson > + * > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#i
Re: [PATCH 6/8] trace_uprobe/sdt: Fix multiple update of same reference counter
On 03/16/2018 05:42 PM, Ravi Bangoria wrote: > > On 03/15/2018 08:19 PM, Oleg Nesterov wrote: >> On 03/13, Ravi Bangoria wrote: >>> For tiny binaries/libraries, different mmap regions points to the >>> same file portion. In such cases, we may increment reference counter >>> multiple times. >> Yes, >> >>> But while de-registration, reference counter will get >>> decremented only by once >> could you explain why this happens? sdt_increment_ref_ctr() and >> sdt_decrement_ref_ctr() look symmetrical, _decrement_ should see >> the same mappings? > Sorry, I thought this happens only for tiny binaries. But that is not the > case. > This happens for binary / library of any length. > > Also, it's not a problem with sdt_increment_ref_ctr() / > sdt_increment_ref_ctr(). > The problem happens with trace_uprobe_mmap_callback(). > > To illustrate in detail, I'm adding a pr_info() in > trace_uprobe_mmap_callback(): > > vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset); > + pr_info("0x%lx-0x%lx : 0x%lx\n", vma->vm_start, vma->vm_end, > vaddr); > sdt_update_ref_ctr(vma->vm_mm, vaddr, 1); > > > Ok now, libpython has SDT markers with reference counter: > > # readelf -n /usr/lib64/libpython2.7.so.1.0 | grep -A2 Provider > Provider: python > Name: function__entry > ... Semaphore: 0x002899d8 > > Probing on that marker: > > # cd /sys/kernel/debug/tracing/ > # echo "p:sdt_python/function__entry > /usr/lib64/libpython2.7.so.1.0:0x16a4d4(0x2799d8)" > uprobe_events > # echo 1 > events/sdt_python/function__entry/enable > > When I run python: > > # strace -o out python > mmap(NULL, 2738968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, > 0) = 0x7fff9246 > mmap(0x7fff926a, 327680, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x23) = 0x7fff926a > mprotect(0x7fff926a, 65536, PROT_READ) = 0 > > The first mmap() maps the whole library into one region. Second mmap() > and third mprotect() split out the whole region into smaller vmas and sets > appropriate protection flags. > > Now, in this case, trace_uprobe_mmap_callback() updates reference counter > twice -- by second mmap() call and by third mprotect() call -- because both > regions contain reference counter offset. This I can verify in dmesg: > > # dmesg | tail > trace_kprobe: 0x7fff926a-0x7fff926f : 0x7fff926e99d8 > trace_kprobe: 0x7fff926b-0x7fff926f : 0x7fff926e99d8 > > Final vmas of libpython: > > # cat /proc/`pgrep python`/maps | grep libpython > 7fff9246-7fff926a r-xp 08:05 403934 > /usr/lib64/libpython2.7.so.1.0 > 7fff926a-7fff926b r--p 0023 08:05 403934 > /usr/lib64/libpython2.7.so.1.0 > 7fff926b-7fff926f rw-p 0024 08:05 403934 > /usr/lib64/libpython2.7.so.1.0 > > > I see similar problem with normal binary as well. I'm using Brendan Gregg's > example[1]: > > # readelf -n /tmp/tick | grep -A2 Provider > Provider: tick > Name: loop2 > ... Semaphore: 0x1005003c > > Probing that marker: > > # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)" > uprobe_events > # echo 1 > events/sdt_tick/loop2/enable > > Now when I run the binary > > # /tmp/tick > > load_elf_binary() internally calls mmap() and I see > trace_uprobe_mmap_callback() > updating reference counter twice: > > # dmesg | tail > trace_kprobe: 0x1001-0x1003 : 0x10020036 > trace_kprobe: 0x1002-0x1003 : 0x10020036 > > proc//maps of the tick: > > # cat /proc/`pgrep tick`/maps > 1000-1001 r-xp 08:05 1335712 /tmp/tick > 1001-1002 r--p 08:05 1335712 /tmp/tick > 1002-1003 rw-p 0001 08:05 1335712 /tmp/tick > > [1] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506 Also, while de-registration, we look for all existing mms using uprobe_build_mmap_info() and decrement the counter in each of the mm. i.e. we decrement the counter only once. -Ravi -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Improve mutex documentation
On Thu, Mar 15, 2018 at 04:58:12AM -0700, Matthew Wilcox wrote: > On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote: > > My memory is weak and our documentation is awful. What does > > mutex_lock_killable() actually do and how does it differ from > > mutex_lock_interruptible()? > > From: Matthew Wilcox > > Add kernel-doc for mutex_lock_killable() and mutex_lock_io(). Reword the > kernel-doc for mutex_lock_interruptible(). > > Signed-off-by: Matthew Wilcox Thanks Matthew! -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 8/8] trace_uprobe/sdt: Document about reference counter
On Fri, 16 Mar 2018 15:12:38 +0530 Ravi Bangoria wrote: > On 03/15/2018 06:17 PM, Masami Hiramatsu wrote: > > Hi Ravi, > > > > On Wed, 14 Mar 2018 20:52:59 +0530 > > Ravi Bangoria wrote: > > > >> On 03/14/2018 07:20 PM, Masami Hiramatsu wrote: > >>> On Tue, 13 Mar 2018 18:26:03 +0530 > >>> Ravi Bangoria wrote: > >>> > No functionality changes. > >>> Please consider to describe what is this change and why, here. > >> Will add in next version. > > Thanks, and could you also move this before perf-probe patch? > > Also Could you make perf-probe check the tracing/README whether > > the kernel supports reference counter syntax or not? > > > > perf-tool can be used on older (or stable) kernel. > > Sure, Will do that. Please see scan_ftrace_readme@util/probe-file.c :) It is easy to expand the pattern table. Thank you, -- Masami Hiramatsu -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: doc - clarify hash callbacks state machine
On Mon, Mar 05, 2018 at 12:39:45PM +0200, Horia Geantă wrote: > Even though it doesn't make too much sense, it is perfectly legal to: > - call .init() and then (as many times) .update() > - subseqently _not_ call any of .final(), .finup() or .export() Actually it makes perfect sense, because there can be an arbitrary number of requests for a given tfm. There is no requirement that you must finalise the first request before submitting new ones. IOW there can be an arbitrary number of outstanding requests even without the user intentionally abandoning any hash request. So please modify your commit description. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] README: Improve documentation descriptions
"This file" indeed was moved once, but at some point "this file", the top-level README, becomes a file in itself. Now that time has come :) Let's describe how things are, and suggest reading "this file" first, "this file" simply being a the admin-guide README file, not a file that was once moved. Signed-off-by: Martin Kepplinger --- That's at least my opinion :) thanks martin README | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/README b/README index b2ba4aaa3a71..12b4674a483c 100644 --- a/README +++ b/README @@ -1,10 +1,9 @@ Linux kernel -This file was moved to Documentation/admin-guide/README.rst - -Please notice that there are several guides for kernel developers and users. -These guides can be rendered in a number of formats, like HTML and PDF. +There are several guides for kernel developers and users. These guides can +be rendered in a number of formats, like HTML and PDF. Please read +Documentation/admin-guide/README.rst first. In order to build the documentation, use ``make htmldocs`` or ``make pdfdocs``. -- 2.14.2 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/15, Steven Rostedt wrote: > > On Tue, 13 Mar 2018 18:26:00 +0530 > Ravi Bangoria wrote: > > > +static void sdt_increment_ref_ctr(struct trace_uprobe *tu) > > +{ > > + struct uprobe_map_info *info; > > + struct vm_area_struct *vma; > > + unsigned long vaddr; > > + > > + uprobe_start_dup_mmap(); > > Please add a comment here that this function ups the mm ref count for > each info returned. Otherwise it's hard to know what that mmput() below > matches. You meant uprobe_build_map_info(), not uprobe_start_dup_mmap(). Yes, and if it gets more callers perhaps we should move this mmput() into uprobe_free_map_info()... Oleg. --- x/kernel/events/uprobes.c +++ x/kernel/events/uprobes.c @@ -714,6 +714,7 @@ struct map_info { static inline struct map_info *free_map_info(struct map_info *info) { struct map_info *next = info->next; + mmput(info->mm); kfree(info); return next; } @@ -783,8 +784,11 @@ build_map_info(struct address_space *map goto again; out: - while (prev) - prev = free_map_info(prev); + while (prev) { + info = prev; + prev = prev->next; + kfree(info); + } return curr; } @@ -834,7 +838,6 @@ register_for_each_vma(struct uprobe *upr unlock: up_write(&mm->mmap_sem); free: - mmput(mm); info = free_map_info(info); } out: -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/8] trace_uprobe/sdt: Fix multiple update of same reference counter
On 03/16, Ravi Bangoria wrote: > > On 03/15/2018 08:19 PM, Oleg Nesterov wrote: > > On 03/13, Ravi Bangoria wrote: > >> For tiny binaries/libraries, different mmap regions points to the > >> same file portion. In such cases, we may increment reference counter > >> multiple times. > > Yes, > > > >> But while de-registration, reference counter will get > >> decremented only by once > > could you explain why this happens? sdt_increment_ref_ctr() and > > sdt_decrement_ref_ctr() look symmetrical, _decrement_ should see > > the same mappings? ... > # strace -o out python > mmap(NULL, 2738968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, > 0) = 0x7fff9246 > mmap(0x7fff926a, 327680, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x23) = 0x7fff926a > mprotect(0x7fff926a, 65536, PROT_READ) = 0 Ah, in this case everything is clear, thanks. I was confused by the changelog, I misinterpreted it as if inc/dec are not balanced in case of multiple mappings even if the application doesn't play with mmap/mprotect/etc. And it seems that you are trying to confuse yourself, not only me ;) Just suppose that an application does mmap+munmap in a loop and the mapped region contains uprobe but not the counter. And this all makes me think that we should do something else. Ideally, install_breakpoint() and remove_breakpoint() should inc/dec the counter if they do not fail... Btw, why do we need a counter, not a boolean? Who else can modify it? Or different uprobes can share the same counter? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 0/9] ipc: Clamp *mni to the real IPCMNI limit & increase that limit
v4->v5: - Revert the flags back to 16-bit so that there will be no change to the size of ctl_table. - Enhance the sysctl_check_flags() as requested by Luis to perform more checks to spot incorrect ctl_table entries. - Change the sysctl selftest to use dummy sysctls instead of production ones & enhance it to do more checks. - Add one more sysctl selftest for registration failure. - Add 2 ipc patches to add an extended mode to increase IPCMNI from 32k to 2M. - Miscellaneous change to incorporate feedback comments from reviewers. v3->v4: - Remove v3 patches 1 & 2 as they have been merged into the mm tree. - Change flags from uint16_t to unsigned int. - Remove CTL_FLAGS_OOR_WARNED and use pr_warn_ratelimited() instead. - Simplify the warning message code. - Add a new patch to fail the ctl_table registration with invalid flag. - Add a test case for range clamping in sysctl selftest. v2->v3: - Fix kdoc comment errors. - Incorporate comments and suggestions from Luis R. Rodriguez. - Add a patch to fix a typo error in fs/proc/proc_sysctl.c. v1->v2: - Add kdoc comments to the do_proc_do{u}intvec_minmax_conv_param structures. - Add a new flags field to the ctl_table structure for specifying whether range clamping should be activated instead of adding new sysctl parameter handlers. - Clamp the semmni value embedded in the multi-values sem parameter. v1 patch: https://lkml.org/lkml/2018/2/19/453 v2 patch: https://lkml.org/lkml/2018/2/27/627 v3 patch: https://lkml.org/lkml/2018/3/1/716 v4 patch: https://lkml.org/lkml/2018/3/12/867 The sysctl parameters msgmni, shmmni and semmni have an inherent limit of IPC_MNI (32k). However, users may not be aware of that because they can write a value much higher than that without getting any error or notification. Reading the parameters back will show the newly written values which are not real. Enforcing the limit by failing sysctl parameter write, however, may cause regressions if existing user setup scripts set those parameters above 32k as those scripts will now fail in this case. To address this delemma, a new flags field is introduced into the ctl_table. The value CTL_FLAGS_CLAMP_RANGE can be added to any ctl_table entries to enable a looser range clamping without returning any error. For example, .flags = CTL_FLAGS_CLAMP_RANGE, This flags value are now used for the range checking of shmmni, msgmni and semmni without breaking existing applications. If any out of range value is written to those sysctl parameters, the following warning will be printed instead. sysctl: "shmmni" was set out of range [0, 32768], clamped to 32768. Reading the values back will show 32768 instead of some fake values. New sysctl selftests are added to exercise new code added by this patchset. There are users out there requesting increase in the IPCMNI value. The last 2 patches attempt to do that by using a boot kernel parameter "ipcmni_extend" to increase the IPCMNI limit from 32k to 2M. Eric Biederman had posted an RFC patch to just scrap the IPCMNI limit and open up the whole positive integer space for IPC IDs. A major issue that I have with this approach is that SysV IPC had been in use for over 20 years. We just don't know if there are user applications that have dependency on the way that the IDs are built. So drastic change like this may have the potential of breaking some applications. I prefer a more conservative approach where users will observe no change in behavior unless they explictly opt in to enable the extended mode. I could open up the whole positive integer space in this case like what Eric did, but that will make the code more complex. So I just extend IPCMNI to 2M in this case and keep similar ID generation logic. Waiman Long (9): sysctl: Add flags to support min/max range clamping proc/sysctl: Provide additional ctl_table.flags checks sysctl: Warn when a clamped sysctl parameter is set out of range ipc: Clamp msgmni and shmmni to the real IPCMNI limit ipc: Clamp semmni to the real IPCMNI limit test_sysctl: Add range clamping test test_sysctl: Add ctl_table registration failure test ipc: Allow boot time extension of IPCMNI from 32k to 2M ipc: Conserve sequence numbers in extended IPCMNI mode Documentation/admin-guide/kernel-parameters.txt | 3 + fs/proc/proc_sysctl.c | 62 include/linux/ipc.h | 11 +++- include/linux/ipc_namespace.h | 1 + include/linux/sysctl.h | 32 +++ ipc/ipc_sysctl.c| 33 ++- ipc/sem.c | 25 ipc/util.c | 41 - ipc/util.h | 23 +--- kernel/sysctl.c | 76 ++--- lib/test_sysctl.c | 70
[PATCH v5 1/9] sysctl: Add flags to support min/max range clamping
When minimum/maximum values are specified for a sysctl parameter in the ctl_table structure with proc_dointvec_minmax() handler, update to that parameter will fail with error if the given value is outside of the required range. There are use cases where it may be better to clamp the value of the sysctl parameter to the given range without failing the update, especially if the users are not aware of the actual range limits. Reading the value back after the update will now be a good practice to see if the provided value exceeds the range limits. To provide this less restrictive form of range checking, a new flags field is added to the ctl_table structure. The new field is a 16-bit value that just fits into the hole left by the 16-bit umode_t field without increasing the size of the structure. When the CTL_FLAGS_CLAMP_RANGE flag is set in the ctl_table entry, any update from the userspace will be clamped to the given range without error if either the proc_dointvec_minmax() or the proc_douintvec_minmax() handlers is used. The clamped value is either the maximum or minimum value that is closest to the input value provided by the user. Signed-off-by: Waiman Long --- include/linux/sysctl.h | 20 kernel/sysctl.c| 48 +++- 2 files changed, 59 insertions(+), 9 deletions(-) diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index b769ecf..e446e1f 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -116,6 +116,7 @@ struct ctl_table void *data; int maxlen; umode_t mode; + uint16_t flags; struct ctl_table *child;/* Deprecated */ proc_handler *proc_handler; /* Callback for text formatting */ struct ctl_table_poll *poll; @@ -123,6 +124,25 @@ struct ctl_table void *extra2; } __randomize_layout; +/** + * enum ctl_table_flags - flags for the ctl table (struct ctl_table.flags) + * + * @CTL_FLAGS_CLAMP_RANGE: Set to indicate that the entry should be + * flexibly clamped to the provided min/max value in case the user + * provided a value outside of the given range. The clamped value is + * either the provided minimum or maximum value that is closest to + * the input value. No lower bound or upper bound checking will be + * done if the corresponding minimum or maximum value isn't provided. + * + * At most 16 different flags are allowed. + */ +enum ctl_table_flags { + CTL_FLAGS_CLAMP_RANGE = BIT(0), + __CTL_FLAGS_MAX = BIT(1), +}; + +#define CTL_TABLE_FLAGS_ALL(__CTL_FLAGS_MAX - 1) + struct ctl_node { struct rb_node node; struct ctl_table_header *header; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index d2aa6b4..af351ed 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2504,6 +2504,7 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, * struct do_proc_dointvec_minmax_conv_param - proc_dointvec_minmax() range checking structure * @min: pointer to minimum allowable value * @max: pointer to maximum allowable value + * @flags: pointer to flags * * The do_proc_dointvec_minmax_conv_param structure provides the * minimum and maximum values for doing range checking for those sysctl @@ -2512,6 +2513,7 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, struct do_proc_dointvec_minmax_conv_param { int *min; int *max; + uint16_t *flags; }; static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, @@ -2521,9 +2523,21 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, struct do_proc_dointvec_minmax_conv_param *param = data; if (write) { int val = *negp ? -*lvalp : *lvalp; - if ((param->min && *param->min > val) || - (param->max && *param->max < val)) - return -EINVAL; + bool clamp = param->flags && + (*param->flags & CTL_FLAGS_CLAMP_RANGE); + + if (param->min && *param->min > val) { + if (clamp) + val = *param->min; + else + return -EINVAL; + } + if (param->max && *param->max < val) { + if (clamp) + val = *param->max; + else + return -EINVAL; + } *valp = val; } else { int val = *valp; @@ -2552,7 +2566,8 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, * This routine will ensure the values are within the range specified by * table->extra1 (min) and table->extra2 (max). * - * Returns 0 on success or -EINVAL on write when the range check fails. + * Returns 0 on success or -EINVAL on writ
[PATCH v5 3/9] sysctl: Warn when a clamped sysctl parameter is set out of range
Even with clamped sysctl parameters, it is still not that straight forward to figure out the exact range of those parameters. One may try to write extreme parameter values to see if they get clamped. To make it easier, a warning with the expected range will now be printed into the kernel ring buffer when a clamped sysctl parameter receives an out of range value. The pr_warn_ratelimited() macro is used to limit the number of warning messages that can be printed within a given period of time. Signed-off-by: Waiman Long --- kernel/sysctl.c | 44 1 file changed, 36 insertions(+), 8 deletions(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index af351ed..a9e3ed4 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -17,6 +17,7 @@ * The list_for_each() macro wasn't appropriate for the sysctl loop. * Removed it and replaced it with older style, 03/23/00, Bill Wendling */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include #include @@ -2505,6 +2506,7 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, * @min: pointer to minimum allowable value * @max: pointer to maximum allowable value * @flags: pointer to flags + * @name: sysctl parameter name * * The do_proc_dointvec_minmax_conv_param structure provides the * minimum and maximum values for doing range checking for those sysctl @@ -2514,6 +2516,7 @@ struct do_proc_dointvec_minmax_conv_param { int *min; int *max; uint16_t *flags; + const char *name; }; static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, @@ -2521,24 +2524,35 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp, int write, void *data) { struct do_proc_dointvec_minmax_conv_param *param = data; + if (write) { int val = *negp ? -*lvalp : *lvalp; + bool clamped = false; bool clamp = param->flags && (*param->flags & CTL_FLAGS_CLAMP_RANGE); if (param->min && *param->min > val) { - if (clamp) + if (clamp) { val = *param->min; - else + clamped = true; + } else { return -EINVAL; + } } if (param->max && *param->max < val) { - if (clamp) + if (clamp) { val = *param->max; - else + clamped = true; + } else { return -EINVAL; + } } *valp = val; + if (clamped && param->name) + pr_warn_ratelimited("\"%s\" was set out of range [%d, %d], clamped to %d.\n", + param->name, + param->min ? *param->min : -INT_MAX, + param->max ? *param->max : INT_MAX, val); } else { int val = *valp; if (val < 0) { @@ -2576,6 +2590,7 @@ int proc_dointvec_minmax(struct ctl_table *table, int write, .min = (int *) table->extra1, .max = (int *) table->extra2, .flags = &table->flags, + .name = table->procname, }; return do_proc_dointvec(table, write, buffer, lenp, ppos, do_proc_dointvec_minmax_conv, ¶m); @@ -2586,6 +2601,7 @@ int proc_dointvec_minmax(struct ctl_table *table, int write, * @min: pointer to minimum allowable value * @max: pointer to maximum allowable value * @flags: pointer to flags + * @name: sysctl parameter name * * The do_proc_douintvec_minmax_conv_param structure provides the * minimum and maximum values for doing range checking for those sysctl @@ -2595,6 +2611,7 @@ struct do_proc_douintvec_minmax_conv_param { unsigned int *min; unsigned int *max; uint16_t *flags; + const char *name; }; static int do_proc_douintvec_minmax_conv(unsigned long *lvalp, @@ -2605,6 +2622,7 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp, if (write) { unsigned int val = *lvalp; + bool clamped = false; bool clamp = param->flags && (*param->flags & CTL_FLAGS_CLAMP_RANGE); @@ -2612,18 +2630,27 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp, return -EINVAL; if (param->min && *param->min > val) { - if (clamp) + if (clamp) { val = *param->min; - else + cla
[PATCH v5 2/9] proc/sysctl: Provide additional ctl_table.flags checks
Checking code is added to provide the following additional ctl_table.flags checks: 1) No unknown flag is allowed. 2) Minimum of a range cannot be larger than the maximum value. 3) The signed and unsigned flags are mutually exclusive. 4) The proc_handler should be consistent with the signed or unsigned flags. Two new flags are added to indicate if the min/max values are signed or unsigned - CTL_FLAGS_SIGNED_RANGE & CTL_FLAGS_UNSIGNED_RANGE. These 2 flags can be optionally enabled for range checking purpose. But either one of them must be set with CTL_FLAGS_CLAMP_RANGE. Signed-off-by: Waiman Long --- fs/proc/proc_sysctl.c | 62 ++ include/linux/sysctl.h | 16 +++-- 2 files changed, 76 insertions(+), 2 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 493c975..2863ea1 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1092,6 +1092,66 @@ static int sysctl_check_table_array(const char *path, struct ctl_table *table) return err; } +static int sysctl_check_flags(const char *path, struct ctl_table *table) +{ + int err = 0; + uint16_t sign_flags = CTL_FLAGS_SIGNED_RANGE|CTL_FLAGS_UNSIGNED_RANGE; + + if ((table->flags & ~CTL_TABLE_FLAGS_ALL) || + ((table->flags & sign_flags) == sign_flags)) + err = sysctl_err(path, table, "invalid flags"); + + if (table->flags & (CTL_FLAGS_CLAMP_RANGE | sign_flags)) { + int range_err = 0; + bool is_int = (table->maxlen == sizeof(int)); + + if (!is_int && (table->maxlen != sizeof(long))) { + range_err++; + } else if (!table->extra1 || !table->extra2) { + /* No min > max checking needed */ + } else if (table->flags & CTL_FLAGS_UNSIGNED_RANGE) { + unsigned long min, max; + + min = is_int ? *(unsigned int *)table->extra1 +: *(unsigned long *)table->extra1; + max = is_int ? *(unsigned int *)table->extra2 +: *(unsigned long *)table->extra2; + range_err += (min > max); + } else if (table->flags & CTL_FLAGS_SIGNED_RANGE) { + + long min, max; + + min = is_int ? *(int *)table->extra1 +: *(long *)table->extra1; + max = is_int ? *(int *)table->extra2 +: *(long *)table->extra2; + range_err += (min > max); + } else { + /* +* Either CTL_FLAGS_UNSIGNED_RANGE or +* CTL_FLAGS_SIGNED_RANGE should be set. +*/ + range_err++; + } + + /* +* proc_handler and flag consistency check. +*/ + if (((table->proc_handler == proc_douintvec_minmax) || +(table->proc_handler == proc_doulongvec_minmax)) && + !(table->flags & CTL_FLAGS_UNSIGNED_RANGE)) + range_err++; + + if ((table->proc_handler == proc_dointvec_minmax) && + !(table->flags & CTL_FLAGS_SIGNED_RANGE)) + range_err++; + + if (range_err) + err |= sysctl_err(path, table, "Invalid range"); + } + return err; +} + static int sysctl_check_table(const char *path, struct ctl_table *table) { int err = 0; @@ -,6 +1171,8 @@ static int sysctl_check_table(const char *path, struct ctl_table *table) (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) { if (!table->data) err |= sysctl_err(path, table, "No data"); + if (table->flags) + err |= sysctl_check_flags(path, table); if (!table->maxlen) err |= sysctl_err(path, table, "No maxlen"); else diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index e446e1f..088f032 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -134,14 +134,26 @@ struct ctl_table * the input value. No lower bound or upper bound checking will be * done if the corresponding minimum or maximum value isn't provided. * + * @CTL_FLAGS_SIGNED_RANGE: Set to indicate that the extra1 and extra2 + * fields are pointers to minimum and maximum signed values of + * an allowable range. + * + * @CTL_FLAGS_UNSIGNED_RANGE: Set to indicate that the extra1 and extra2 + * fields are pointers to minimum and maximum unsigned values of + * an allowable range. + * * At most 16 different flags are allowed. */ enum ctl_t
[PATCH v5 9/9] ipc: Conserve sequence numbers in extended IPCMNI mode
The mixing in of a sequence number into the IPC IDs is probably to avoid ID reuse in userspace as much as possible. With extended IPCMNI mode, the number of usable sequecne numbers is greatly reduced leading to higher chance of ID reuse. To address this issue, we need to conserve the sequence number space as much as possible. Right now, the sequence number is incremented for every new ID created. In reality, we only need to increment the sequence number when one or more IDs have been removed previously to make sure that those IDs will not be reused when a new one is built. This is being done in the extended IPCMNI mode, Signed-off-by: Waiman Long --- include/linux/ipc_namespace.h | 1 + ipc/ipc_sysctl.c | 2 ++ ipc/util.c| 29 ++--- ipc/util.h| 1 + 4 files changed, 26 insertions(+), 7 deletions(-) diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index b5630c8..9c86fd9 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -16,6 +16,7 @@ struct ipc_ids { int in_use; unsigned short seq; + unsigned short deleted; bool tables_initialized; struct rw_semaphore rwsem; struct idr ipcs_idr; diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index 5f7cfae..61a832d 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -111,6 +111,7 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write, static int int_max = INT_MAX; int ipc_mni __read_mostly = IPCMNI; int ipc_mni_shift __read_mostly = IPCMNI_SHIFT; +bool ipc_mni_extended __read_mostly; static struct ctl_table ipc_kern_table[] = { { @@ -243,6 +244,7 @@ static int __init ipc_mni_extend(char *str) { ipc_mni = IPCMNI_EXTEND; ipc_mni_shift = IPCMNI_EXTEND_SHIFT; + ipc_mni_extended = true; pr_info("IPCMNI extended to %d.\n", ipc_mni); return 0; } diff --git a/ipc/util.c b/ipc/util.c index daee305..8b38a6f 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -118,7 +118,8 @@ int ipc_init_ids(struct ipc_ids *ids) { int err; ids->in_use = 0; - ids->seq = 0; + ids->deleted = false; + ids->seq = ipc_mni_extended ? 0 : -1; /* seq # is pre-incremented */ init_rwsem(&ids->rwsem); err = rhashtable_init(&ids->key_ht, &ipc_kht_params); if (err) @@ -192,6 +193,11 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key) return NULL; } +/* + * To conserve sequence number space with extended ipc_mni when new ID + * is built, the sequence number is incremented only when one or more + * IDs have been removed previously. + */ #ifdef CONFIG_CHECKPOINT_RESTORE /* * Specify desired id for next allocated IPC object. @@ -205,9 +211,13 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids, struct kern_ipc_perm *new) { if (ids->next_id < 0) { /* default, behave as !CHECKPOINT_RESTORE */ - new->seq = ids->seq++; - if (ids->seq > IPCID_SEQ_MAX) - ids->seq = 0; + if (!ipc_mni_extended || ids->deleted) { + ids->seq++; + if (ids->seq > IPCID_SEQ_MAX) + ids->seq = 0; + ids->deleted = false; + } + new->seq = ids->seq; } else { new->seq = ipcid_to_seqx(ids->next_id); ids->next_id = -1; @@ -223,9 +233,13 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids, static inline int ipc_buildid(int id, struct ipc_ids *ids, struct kern_ipc_perm *new) { - new->seq = ids->seq++; - if (ids->seq > IPCID_SEQ_MAX) - ids->seq = 0; + if (!ipc_mni_extended || ids->deleted) { + ids->seq++; + if (ids->seq > IPCID_SEQ_MAX) + ids->seq = 0; + ids->deleted = false; + } + new->seq = ids->seq; return (new->seq << SEQ_SHIFT) + id; } @@ -435,6 +449,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp) idr_remove(&ids->ipcs_idr, lid); ipc_kht_remove(ids, ipcp); ids->in_use--; + ids->deleted = true; ipcp->deleted = true; if (unlikely(lid == ids->max_id)) { diff --git a/ipc/util.h b/ipc/util.h index 6871ca9..e6c2055 100644 --- a/ipc/util.h +++ b/ipc/util.h @@ -17,6 +17,7 @@ extern int ipc_mni; extern int ipc_mni_shift; +extern bool ipc_mni_extended; #define SEQ_SHIFT ipc_mni_shift #define SEQ_MASK ((1 << ipc_mni_shift) - 1) -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 6/9] test_sysctl: Add range clamping test
Add a range clamping test to verify that the input value will be clamped if it exceeds the builtin maximum or minimum value. Below is the expected test run result: Running test: sysctl_test_0006 - run #0 Checking range minimum clamping ... ok Checking range maximum clamping ... ok Checking range minimum clamping ... ok Checking range maximum clamping ... ok Signed-off-by: Waiman Long --- lib/test_sysctl.c| 29 ++ tools/testing/selftests/sysctl/sysctl.sh | 52 2 files changed, 81 insertions(+) diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c index 3dd801c..7bb4cf7 100644 --- a/lib/test_sysctl.c +++ b/lib/test_sysctl.c @@ -38,12 +38,18 @@ static int i_zero; static int i_one_hundred = 100; +static int signed_min = -10; +static int signed_max = 10; +static unsigned int unsigned_min = 10; +static unsigned int unsigned_max = 30; struct test_sysctl_data { int int_0001; int int_0002; int int_0003[4]; + int range_0001; + unsigned int urange_0001; unsigned int uint_0001; char string_0001[65]; @@ -58,6 +64,9 @@ struct test_sysctl_data { .int_0003[2] = 2, .int_0003[3] = 3, + .range_0001 = 0, + .urange_0001 = 20, + .uint_0001 = 314, .string_0001 = "(none)", @@ -102,6 +111,26 @@ struct test_sysctl_data { .mode = 0644, .proc_handler = proc_dostring, }, + { + .procname = "range_0001", + .data = &test_data.range_0001, + .maxlen = sizeof(test_data.range_0001), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .flags = CTL_FLAGS_CLAMP_RANGE_SIGNED, + .extra1 = &signed_min, + .extra2 = &signed_max, + }, + { + .procname = "urange_0001", + .data = &test_data.urange_0001, + .maxlen = sizeof(test_data.urange_0001), + .mode = 0644, + .proc_handler = proc_douintvec_minmax, + .flags = CTL_FLAGS_CLAMP_RANGE_UNSIGNED, + .extra1 = &unsigned_min, + .extra2 = &unsigned_max, + }, { } }; diff --git a/tools/testing/selftests/sysctl/sysctl.sh b/tools/testing/selftests/sysctl/sysctl.sh index ec232c3..1aa1bba 100755 --- a/tools/testing/selftests/sysctl/sysctl.sh +++ b/tools/testing/selftests/sysctl/sysctl.sh @@ -34,6 +34,7 @@ ALL_TESTS="$ALL_TESTS 0002:1:1" ALL_TESTS="$ALL_TESTS 0003:1:1" ALL_TESTS="$ALL_TESTS 0004:1:1" ALL_TESTS="$ALL_TESTS 0005:3:1" +ALL_TESTS="$ALL_TESTS 0006:1:1" test_modprobe() { @@ -543,6 +544,38 @@ run_stringtests() test_rc } +# TARGET, RANGE_MIN & RANGE_MAX need to be defined before running test. +run_range_clamping_test() +{ + rc=0 + + echo -n "Checking range minimum clamping ... " + VAL=$((RANGE_MIN - 1)) + echo -n $VAL > "${TARGET}" 2> /dev/null + EXITVAL=$? + NEWVAL=$(cat "${TARGET}") + if [[ $EXITVAL -ne 0 || $NEWVAL -ne $RANGE_MIN ]]; then + echo "FAIL" >&2 + rc=1 + else + echo "ok" + fi + + echo -n "Checking range maximum clamping ... " + VAL=$((RANGE_MAX + 1)) + echo -n $VAL > "${TARGET}" 2> /dev/null + EXITVAL=$? + NEWVAL=$(cat "${TARGET}") + if [[ $EXITVAL -ne 0 || $NEWVAL -ne $RANGE_MAX ]]; then + echo "FAIL" >&2 + rc=1 + else + echo "ok" + fi + + test_rc +} + sysctl_test_0001() { TARGET="${SYSCTL}/int_0001" @@ -600,6 +633,25 @@ sysctl_test_0005() run_limit_digit_int_array } +sysctl_test_0006() +{ + TARGET="${SYSCTL}/range_0001" + ORIG=$(cat "${TARGET}") + RANGE_MIN=-10 + RANGE_MAX=10 + + run_range_clamping_test + set_orig + + TARGET="${SYSCTL}/urange_0001" + ORIG=$(cat "${TARGET}") + RANGE_MIN=10 + RANGE_MAX=30 + + run_range_clamping_test + set_orig +} + list_tests() { echo "Test ID list:" -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 8/9] ipc: Allow boot time extension of IPCMNI from 32k to 2M
The maximum number of unique System V IPC identifiers was limited to 32k. That limit should be big enough for most use cases. However, there are some users out there requesting for more. To satisfy the need of those users, a new boot time kernel option "ipcmni_extend" is added to extend the IPCMNI value to 2M. This is a 64X increase which hopefully is big enough for them. This new option does have the side effect of reducing the maximum number of unique sequence numbers from 64k down to 1k. So it is a trade-off. Signed-off-by: Waiman Long --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ include/linux/ipc.h | 11 ++- ipc/ipc_sysctl.c| 12 +++- ipc/util.c | 12 ++-- ipc/util.h | 18 +++--- 5 files changed, 41 insertions(+), 15 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 1d1d53f..2be35a4 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1733,6 +1733,9 @@ ip= [IP_PNP] See Documentation/filesystems/nfs/nfsroot.txt. + ipcmni_extend [KNL] Extend the maximum number of unique System V + IPC identifiers from 32768 to 2097152. + irqaffinity=[SMP] Set the default irq affinity mask The argument is a cpu list, as described above. diff --git a/include/linux/ipc.h b/include/linux/ipc.h index 821b2f2..3ecd869 100644 --- a/include/linux/ipc.h +++ b/include/linux/ipc.h @@ -8,7 +8,16 @@ #include #include -#define IPCMNI 32768 /* <= MAX_INT limit for ipc arrays (including sysctl changes) */ +/* + * By default, the ipc arrays can have up to 32k (15 bits) entries. + * When IPCMNI extension mode is turned on, the ipc arrays can have up + * to 2M (21 bits) entries. However, the space for sequence number will + * be shrunk from 16 bits to 10 bits. + */ +#define IPCMNI_SHIFT 15 +#define IPCMNI_EXTEND_SHIFT21 +#define IPCMNI (1 << IPCMNI_SHIFT) +#define IPCMNI_EXTEND (1 << IPCMNI_EXTEND_SHIFT) /* used by in-kernel data structures */ struct kern_ipc_perm { diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index 0ad7088..5f7cfae 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -109,7 +109,8 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write, static int zero; static int one = 1; static int int_max = INT_MAX; -static int ipc_mni = IPCMNI; +int ipc_mni __read_mostly = IPCMNI; +int ipc_mni_shift __read_mostly = IPCMNI_SHIFT; static struct ctl_table ipc_kern_table[] = { { @@ -237,3 +238,12 @@ static int __init ipc_sysctl_init(void) } device_initcall(ipc_sysctl_init); + +static int __init ipc_mni_extend(char *str) +{ + ipc_mni = IPCMNI_EXTEND; + ipc_mni_shift = IPCMNI_EXTEND_SHIFT; + pr_info("IPCMNI extended to %d.\n", ipc_mni); + return 0; +} +early_param("ipcmni_extend", ipc_mni_extend); diff --git a/ipc/util.c b/ipc/util.c index 4ed5a17..daee305 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -112,7 +112,7 @@ static int __init ipc_init(void) * @ids: ipc identifier set * * Set up the sequence range to use for the ipc identifier range (limited - * below IPCMNI) then initialise the keys hashtable and ids idr. + * below ipc_mni) then initialise the keys hashtable and ids idr. */ int ipc_init_ids(struct ipc_ids *ids) { @@ -213,7 +213,7 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids, ids->next_id = -1; } - return SEQ_MULTIPLIER * new->seq + id; + return (new->seq << SEQ_SHIFT) + id; } #else @@ -227,7 +227,7 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids, if (ids->seq > IPCID_SEQ_MAX) ids->seq = 0; - return SEQ_MULTIPLIER * new->seq + id; + return (new->seq << SEQ_SHIFT) + id; } #endif /* CONFIG_CHECKPOINT_RESTORE */ @@ -251,8 +251,8 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit) kgid_t egid; int id, err; - if (limit > IPCMNI) - limit = IPCMNI; + if (limit > ipc_mni) + limit = ipc_mni; if (!ids->tables_initialized || ids->in_use >= limit) return -ENOSPC; @@ -769,7 +769,7 @@ static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t pos, if (total >= ids->in_use) return NULL; - for (; pos < IPCMNI; pos++) { + for (; pos < ipc_mni; pos++) { ipc = idr_find(&ids->ipcs_idr, pos); if (ipc != NULL) { *new_pos = pos + 1; diff --git a/ipc/util.h b/ipc/util.h index af57394..6871ca9 100644 --- a/ipc/util.h +++ b/ipc/util.h @@ -15,7 +15,11 @@ #include
[PATCH v5 7/9] test_sysctl: Add ctl_table registration failure test
Incorrect sysctl tables are constructed and fed to the register_sysctl_table() function in the test_sysctl kernel module. The function is supposed to fail the registration of those tables or an error will be printed if no failure is returned. The registration failures will cause other warning and error messages to be printed into the dmesg log, though. A new test is also added to the sysctl.sh to look for those failure messages in the dmesg log to see if anything unexpeced happens. Signed-off-by: Waiman Long --- lib/test_sysctl.c| 41 tools/testing/selftests/sysctl/sysctl.sh | 15 2 files changed, 56 insertions(+) diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c index 7bb4cf7..14853d5 100644 --- a/lib/test_sysctl.c +++ b/lib/test_sysctl.c @@ -154,13 +154,54 @@ struct test_sysctl_data { { } }; +static struct ctl_table fail_sysctl_table0[] = { + { + .procname = "failed_sysctl0", + .data = &test_data.range_0001, + .maxlen = sizeof(test_data.range_0001), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .flags = CTL_FLAGS_CLAMP_RANGE_SIGNED, + .extra1 = &signed_max, + .extra2 = &signed_min, + }, + { } +}; + +static struct ctl_table fail_sysctl_root_table[] = { + { + .procname = "debug", + .maxlen = 0, + .mode = 0555, + }, + { } +}; + +static struct ctl_table *fail_tables[] = { + fail_sysctl_table0, NULL, +}; + static struct ctl_table_header *test_sysctl_header; static int __init test_sysctl_init(void) { + struct ctl_table_header *fail_sysctl_header; + int i; + test_sysctl_header = register_sysctl_table(test_sysctl_root_table); if (!test_sysctl_header) return -ENOMEM; + + for (i = 0; fail_tables[i]; i++) { + fail_sysctl_root_table[0].child = fail_tables[i]; + fail_sysctl_header = register_sysctl_table(fail_sysctl_root_table); + if (fail_sysctl_header) { + pr_err("fail_tables[%d] registration check failed!\n", i); + unregister_sysctl_table(fail_sysctl_header); + break; + } + } + return 0; } late_initcall(test_sysctl_init); diff --git a/tools/testing/selftests/sysctl/sysctl.sh b/tools/testing/selftests/sysctl/sysctl.sh index 1aa1bba..23acdee 100755 --- a/tools/testing/selftests/sysctl/sysctl.sh +++ b/tools/testing/selftests/sysctl/sysctl.sh @@ -35,6 +35,7 @@ ALL_TESTS="$ALL_TESTS 0003:1:1" ALL_TESTS="$ALL_TESTS 0004:1:1" ALL_TESTS="$ALL_TESTS 0005:3:1" ALL_TESTS="$ALL_TESTS 0006:1:1" +ALL_TESTS="$ALL_TESTS 0007:1:1" test_modprobe() { @@ -652,6 +653,20 @@ sysctl_test_0006() set_orig } +sysctl_test_0007() +{ + echo "Checking test_sysctl module registration failure test ..." + dmesg | grep "sysctl.*fail_tables.*failed" + if [[ $? -eq 0 ]]; then + echo "FAIL" >&2 + rc=1 + else + echo "ok" + fi + + test_rc +} + list_tests() { echo "Test ID list:" -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 5/9] ipc: Clamp semmni to the real IPCMNI limit
For SysV semaphores, the semmni value is the last part of the 4-element sem number array. To make semmni behave in a similar way to msgmni and shmmni, we can't directly use the _minmax handler. Instead, a special sem specific handler is added to check the last argument to make sure that it is clamped to the [0, IPCMNI] range and prints a warning message once when an out-of-range value is being written. This does require duplicating some of the code in the _minmax handlers. Signed-off-by: Waiman Long --- ipc/ipc_sysctl.c | 12 +++- ipc/sem.c| 25 + ipc/util.h | 4 3 files changed, 40 insertions(+), 1 deletion(-) diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index 088721e..0ad7088 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -88,12 +88,22 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write, return proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos); } +static int proc_ipc_sem_dointvec(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + int ret = proc_ipc_dointvec(table, write, buffer, lenp, ppos); + + sem_check_semmni(table, current->nsproxy->ipc_ns); + return ret; +} + #else #define proc_ipc_doulongvec_minmax NULL #define proc_ipc_dointvec NULL #define proc_ipc_dointvec_minmax NULL #define proc_ipc_dointvec_minmax_orphans NULL #define proc_ipc_auto_msgmni NULL +#define proc_ipc_sem_dointvec NULL #endif static int zero; @@ -177,7 +187,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write, .data = &init_ipc_ns.sem_ctls, .maxlen = 4*sizeof(int), .mode = 0644, - .proc_handler = proc_ipc_dointvec, + .proc_handler = proc_ipc_sem_dointvec, }, #ifdef CONFIG_CHECKPOINT_RESTORE { diff --git a/ipc/sem.c b/ipc/sem.c index a4af049..faf2caa 100644 --- a/ipc/sem.c +++ b/ipc/sem.c @@ -2337,3 +2337,28 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it) return 0; } #endif + +#ifdef CONFIG_PROC_SYSCTL +/* + * Check to see if semmni is out of range and clamp it if necessary. + */ +void sem_check_semmni(struct ctl_table *table, struct ipc_namespace *ns) +{ + bool clamped = false; + + /* +* Clamp semmni to the range [0, IPCMNI]. +*/ + if (ns->sc_semmni < 0) { + ns->sc_semmni = 0; + clamped = true; + } + if (ns->sc_semmni > IPCMNI) { + ns->sc_semmni = IPCMNI; + clamped = true; + } + if (clamped) + pr_warn_ratelimited("sysctl: \"sem[3]\" was set out of range [%d, %d], clamped to %d.\n", +0, IPCMNI, ns->sc_semmni); +} +#endif diff --git a/ipc/util.h b/ipc/util.h index 89b8ec1..af57394 100644 --- a/ipc/util.h +++ b/ipc/util.h @@ -206,6 +206,10 @@ int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids, void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids, void (*free)(struct ipc_namespace *, struct kern_ipc_perm *)); +#ifdef CONFIG_PROC_SYSCTL +extern void sem_check_semmni(struct ctl_table *table, struct ipc_namespace *ns); +#endif + #ifdef CONFIG_COMPAT #include struct compat_ipc_perm { -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 4/9] ipc: Clamp msgmni and shmmni to the real IPCMNI limit
A user can write arbitrary integer values to msgmni and shmmni sysctl parameters without getting error, but the actual limit is really IPCMNI (32k). This can mislead users as they think they can get a value that is not real. Enforcing the limit by failing the sysctl parameter write, however, can break existing user applications. Instead, the range clamping flag is set to enforce the limit without failing existing user code. Users can easily figure out if the sysctl parameter value is out of range by either reading back the parameter value or checking the kernel ring buffer for warning. Signed-off-by: Waiman Long --- ipc/ipc_sysctl.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c index 8ad93c2..088721e 100644 --- a/ipc/ipc_sysctl.c +++ b/ipc/ipc_sysctl.c @@ -99,6 +99,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write, static int zero; static int one = 1; static int int_max = INT_MAX; +static int ipc_mni = IPCMNI; static struct ctl_table ipc_kern_table[] = { { @@ -120,7 +121,10 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write, .data = &init_ipc_ns.shm_ctlmni, .maxlen = sizeof(init_ipc_ns.shm_ctlmni), .mode = 0644, - .proc_handler = proc_ipc_dointvec, + .proc_handler = proc_ipc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &ipc_mni, + .flags = CTL_FLAGS_CLAMP_RANGE_SIGNED, }, { .procname = "shm_rmid_forced", @@ -147,7 +151,8 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write, .mode = 0644, .proc_handler = proc_ipc_dointvec_minmax, .extra1 = &zero, - .extra2 = &int_max, + .extra2 = &ipc_mni, + .flags = CTL_FLAGS_CLAMP_RANGE_SIGNED, }, { .procname = "auto_msgmni", -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 4/4] tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP
Hi Karthik, On Tue, Mar 13, 2018 at 1:16 PM Karthik Ramasubramanian < krama...@codeaurora.org> wrote: > On 3/2/2018 5:11 PM, Evan Green wrote: > >> + > >> +#ifdef CONFIG_CONSOLE_POLL > >> +#define RX_BYTES_PW 1 > >> +#else > >> +#define RX_BYTES_PW 4 > >> +#endif > > > > This seems fishy to me. Does either setting work? If so, why not just > > have one value? > Yes, either one works. In the interrupt driven mode, sometimes due to > increased interrupt latency the RX FIFO may overflow if we use only 1 > byte per FIFO word - given there are no flow control lines in the debug > uart. Hence using 4 bytes in the FIFO word will help to prevent the FIFO > overflow - especially in the case where commands are executed through > scripts. > In polling mode, using 1 byte per word helps to use the hardware to > buffer the data instead of software buffering especially when the > framework keeps reading one byte at a time. Ok, I think I understand. Let me paraphrase in case I'm wrong: Normally, you want all 4 bytes per word so that you use the hardware's full FIFO capability. This works out well since on receive you tell the system how many bytes you've received, so you're never stuck with leftover bytes. In polling mode, however, the system asks you for one byte, and the problem is with 4 bytes per FIFO word you may end up having read three additional bytes that you don't know what to do with. Configuring the UART to 1 byte per word allows you to skip coding up a couple of conditionals and an extra couple u32s in the device struct for saving those extra bytes, but divides the hardware FIFO size by 4. It seems a little cheesy to me just to avoid a bit of logic, but I'm not going to put my foot down about it. I guess it might get complicated when the console pulls in four bytes, but then only ends up eating one of them, and then we have to figure out how to get the other three into the normal ISR-based path. > > > >> +static bool qcom_geni_serial_poll_bit(struct uart_port *uport, > >> + int offset, int bit_field, bool set) > >> +{ > >> + u32 reg; > >> + struct qcom_geni_serial_port *port; > >> + unsigned int baud; > >> + unsigned int fifo_bits; > >> + unsigned long timeout_us = 2; > >> + > >> + /* Ensure polling is not re-ordered before the prior writes/reads */ > >> + mb(); > >> + > >> + if (uport->private_data) { > >> + port = to_dev_port(uport, uport); > >> + baud = port->cur_baud; > >> + if (!baud) > >> + baud = 115200; > >> + fifo_bits = port->tx_fifo_depth * port->tx_fifo_width; > >> + /* > >> +* Total polling iterations based on FIFO worth of bytes to be > >> +* sent at current baud .Add a little fluff to the wait. > >> +*/ > >> + timeout_us = ((fifo_bits * USEC_PER_SEC) / baud) + 500; > > > > This fluff is a little mysterious, can it be explained at all? Do you > > think the fluff factor is in units of time (as you have it) or bits? > > Time makes sense I guess if we're worried about clock source > > differences. > The fluff is in micro-seconds and can help with unforeseen delays in > emulation platforms. So emulated platforms go out to lunch, but that generally doesn't depend on baud rate or how many bits there are. Ok. Might be worth noting what that's for. > > > >> + > >> +static void qcom_geni_serial_console_write(struct console *co, const char *s, > >> + unsigned int count) > >> +{ > >> + struct uart_port *uport; > >> + struct qcom_geni_serial_port *port; > >> + bool locked = true; > >> + unsigned long flags; > >> + > >> + WARN_ON(co->index < 0 || co->index >= GENI_UART_CONS_PORTS); > >> + > >> + port = get_port_from_line(co->index); > >> + if (IS_ERR(port)) > >> + return; > >> + > >> + uport = &port->uport; > >> + if (oops_in_progress) > >> + locked = spin_trylock_irqsave(&uport->lock, flags); > >> + else > >> + spin_lock_irqsave(&uport->lock, flags); > >> + > >> + if (locked) { > >> + __qcom_geni_serial_console_write(uport, s, count); > >> + spin_unlock_irqrestore(&uport->lock, flags); > > > > I too am a little lost on the locking here. What exactly is the lock > > protecting? Looks like for the most part it's trying to synchronize > > with the ISR? What specifically in the ISR? I just wanted to go > > through and check to make sure whatever the shared resource is is > > appropriately protected. > The lock protects 2 simultaneous writers from putting the hardware in > the bad state. The output of the command entered in a shell can trigger > a write in the interrupt context while logging activity can trigger a > simultaneous write. Can you be any more specific here? What puts the hardware in a bad state?
Re: [PATCH v12 02/22] selftests/vm: rename all references to pkru to a generic name
On 02/21/2018 05:55 PM, Ram Pai wrote: > int pkey_set(int pkey, unsigned long rights, unsigned long flags) > { > u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE); > - u32 old_pkru = __rdpkru(); > - u32 new_pkru; > + u32 old_pkey_reg = __rdpkey_reg(); > + u32 new_pkey_reg; If we're not using the _actual_ instruction names ("rdpkru"), I think I'd rather this be something more readable, like: __read_pkey_reg(). But, it's OK-ish the way it is. Reviewed-by: Dave Hansen -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 04/22] selftests/vm: typecast the pkey register
On 02/21/2018 05:55 PM, Ram Pai wrote: > -static inline unsigned int _rdpkey_reg(int line) > +static inline pkey_reg_t _rdpkey_reg(int line) > { > - unsigned int pkey_reg = __rdpkey_reg(); > + pkey_reg_t pkey_reg = __rdpkey_reg(); > > - dprintf4("rdpkey_reg(line=%d) pkey_reg: %x shadow: %x\n", > + dprintf4("rdpkey_reg(line=%d) pkey_reg: %016lx shadow: %016lx\n", > line, pkey_reg, shadow_pkey_reg); > assert(pkey_reg == shadow_pkey_reg); Hmm. So we're using %lx for an int? Doesn't the compiler complain about this? -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 05/22] selftests/vm: generic function to handle shadow key register
On 02/21/2018 05:55 PM, Ram Pai wrote: > +static inline u32 pkey_to_shift(int pkey) > +{ > + return pkey * PKEY_BITS_PER_PKEY; > +} pkey_bit_position(), perhaps? > +static inline pkey_reg_t reset_bits(int pkey, pkey_reg_t bits) > +{ > + u32 shift = pkey_to_shift(pkey); > + > + return ~(bits << shift); > +} I'd prefer clear_pkey_flags() or maybe clear_pkey_bits(). "reset" can mean "reset to 0" or "reset to 1". Also, why the u32 here? Isn't an int more appropriate? > +static inline pkey_reg_t left_shift_bits(int pkey, pkey_reg_t bits) > +{ > + u32 shift = pkey_to_shift(pkey); > + > + return (bits << shift); > +} > + > +static inline pkey_reg_t right_shift_bits(int pkey, pkey_reg_t bits) > +{ > + u32 shift = pkey_to_shift(pkey); > + > + return (bits >> shift); > +} Some comments on these would be handy. Basically that this takes a per-key flags value and puts it at the right position so it can be shoved in the register. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 06/22] selftests/vm: fix the wrong assert in pkey_disable_set()
On 02/21/2018 05:55 PM, Ram Pai wrote: > If the flag is 0, no bits will be set. Hence we cant expect > the resulting bitmap to have a higher value than what it > was earlier. > > cc: Dave Hansen > cc: Florian Weimer > Signed-off-by: Ram Pai > --- > tools/testing/selftests/vm/protection_keys.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/tools/testing/selftests/vm/protection_keys.c > b/tools/testing/selftests/vm/protection_keys.c > index 83216c5..0109388 100644 > --- a/tools/testing/selftests/vm/protection_keys.c > +++ b/tools/testing/selftests/vm/protection_keys.c > @@ -443,7 +443,7 @@ void pkey_disable_set(int pkey, int flags) > dprintf1("%s(%d) pkey_reg: 0x%lx\n", > __func__, pkey, rdpkey_reg()); > if (flags) > - pkey_assert(rdpkey_reg() > orig_pkey_reg); > + pkey_assert(rdpkey_reg() >= orig_pkey_reg); > dprintf1("END<---%s(%d, 0x%x)\n", __func__, > pkey, flags); > } I'm not sure about this one. Did this cause a problem for you? Why would you call this and ask no bits to be set? -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 07/22] selftests/vm: fixed bugs in pkey_disable_clear()
On 02/21/2018 05:55 PM, Ram Pai wrote: > --- a/tools/testing/selftests/vm/protection_keys.c > +++ b/tools/testing/selftests/vm/protection_keys.c > @@ -461,7 +461,7 @@ void pkey_disable_clear(int pkey, int flags) > pkey, pkey, pkey_rights); > pkey_assert(pkey_rights >= 0); > > - pkey_rights |= flags; > + pkey_rights &= ~flags; > > ret = pkey_set(pkey, pkey_rights, 0); > /* pkey_reg and flags have the same format */ > @@ -475,7 +475,7 @@ void pkey_disable_clear(int pkey, int flags) > dprintf1("%s(%d) pkey_reg: 0x%016lx\n", __func__, > pkey, rdpkey_reg()); > if (flags) > - assert(rdpkey_reg() > orig_pkey_reg); > + assert(rdpkey_reg() < orig_pkey_reg); > } > > void pkey_write_allow(int pkey) This seems so horribly wrong that I wonder how it worked in the first place. Any idea? -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 08/22] selftests/vm: clear the bits in shadow reg when a pkey is freed.
On 02/21/2018 05:55 PM, Ram Pai wrote: > When a key is freed, the key is no more effective. > Clear the bits corresponding to the pkey in the shadow > register. Otherwise it will carry some spurious bits > which can trigger false-positive asserts. ... > diff --git a/tools/testing/selftests/vm/protection_keys.c > b/tools/testing/selftests/vm/protection_keys.c > index ca54a95..aaf9f09 100644 > --- a/tools/testing/selftests/vm/protection_keys.c > +++ b/tools/testing/selftests/vm/protection_keys.c > @@ -582,6 +582,9 @@ int alloc_pkey(void) > int sys_pkey_free(unsigned long pkey) > { > int ret = syscall(SYS_pkey_free, pkey); > + > + if (!ret) > + shadow_pkey_reg &= reset_bits(pkey, PKEY_DISABLE_ACCESS); > dprintf1("%s(pkey=%ld) syscall ret: %d\n", __func__, pkey, ret); > return ret; > } Did this cause problems for you in practice? On x86, sys_pkey_free() does not affect PKRU, so this isn't quite right. I'd much rather have the actual tests explicitly clear the PKRU bits and also in the process clear the shadow bits. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 09/22] selftests/vm: fix alloc_random_pkey() to make it really random
On 02/21/2018 05:55 PM, Ram Pai wrote: > alloc_random_pkey() was allocating the same pkey every time. > Not all pkeys were geting tested. fixed it. ... > @@ -602,13 +603,15 @@ int alloc_random_pkey(void) > int alloced_pkeys[NR_PKEYS]; > int nr_alloced = 0; > int random_index; > + > memset(alloced_pkeys, 0, sizeof(alloced_pkeys)); > + srand((unsigned int)time(NULL)); > > /* allocate every possible key and make a note of which ones we got */ > max_nr_pkey_allocs = NR_PKEYS; > - max_nr_pkey_allocs = 1; > for (i = 0; i < max_nr_pkey_allocs; i++) { > int new_pkey = alloc_pkey(); The srand() is probably useful, but won't this always just do a single alloc_pkey() now? That seems like it will mean we always use the first one the kernel gives us, which isn't random. > - dprintf1("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%x\n", __func__, > - __LINE__, ret, __rdpkey_reg(), shadow_pkey_reg); > + dprintf1("%s()::%d, ret: %d pkey_reg: 0x%x shadow: 0x%016lx\n", > + __func__, __LINE__, ret, __rdpkey_reg(), shadow_pkey_reg); > return ret; > } This belonged in the pkey_reg_t patch, I think. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 10/22] selftests/vm: introduce two arch independent abstraction
On 02/21/2018 05:55 PM, Ram Pai wrote: > open_hugepage_file() <- opens the huge page file > get_start_key() <-- provides the first non-reserved key. > Looks reasonable. Reviewed-by: Dave Hansen -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 11/22] selftests/vm: pkey register should match shadow pkey
On 02/21/2018 05:55 PM, Ram Pai wrote: > expected_pkey_fault() is comparing the contents of pkey > register with 0. This may not be true all the time. There > could be bits set by default by the architecture > which can never be changed. Hence compare the value against > shadow pkey register, which is supposed to track the bits > accurately all throughout > > cc: Dave Hansen > cc: Florian Weimer > Signed-off-by: Ram Pai > --- > tools/testing/selftests/vm/protection_keys.c |4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/tools/testing/selftests/vm/protection_keys.c > b/tools/testing/selftests/vm/protection_keys.c > index 254b66d..6054093 100644 > --- a/tools/testing/selftests/vm/protection_keys.c > +++ b/tools/testing/selftests/vm/protection_keys.c > @@ -926,10 +926,10 @@ void expected_pkey_fault(int pkey) > pkey_assert(last_pkey_faults + 1 == pkey_faults); > pkey_assert(last_si_pkey == pkey); > /* > - * The signal handler shold have cleared out PKEY register to let the > + * The signal handler shold have cleared out pkey-register to let the Heh, you randomly changed the formatting and didn't bother with my awful typo. :) >* test program continue. We now have to restore it. >*/ > - if (__rdpkey_reg() != 0) > + if (__rdpkey_reg() != shadow_pkey_reg) > pkey_assert(0); > > __wrpkey_reg(shadow_pkey_reg); > I don't think this should be "shadow_pkey_reg". This was just trying to double-check that the signal handler messed around with PKRU the way we expected. We could also just check that the disable bits for 'pkey' are clear at this point. That would be almost as good. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 12/22] selftests/vm: generic cleanup
On 02/21/2018 05:55 PM, Ram Pai wrote: > cleanup the code to satisfy coding styles. > > cc: Dave Hansen > cc: Florian Weimer > Signed-off-by: Ram Pai > --- > tools/testing/selftests/vm/protection_keys.c | 81 > ++ > 1 files changed, 43 insertions(+), 38 deletions(-) > > diff --git a/tools/testing/selftests/vm/protection_keys.c > b/tools/testing/selftests/vm/protection_keys.c > index 6054093..6fdd8f5 100644 > --- a/tools/testing/selftests/vm/protection_keys.c > +++ b/tools/testing/selftests/vm/protection_keys.c > @@ -4,7 +4,7 @@ > * > * There are examples in here of: > * * how to set protection keys on memory > - * * how to set/clear bits in pkey registers (the rights register) > + * * how to set/clear bits in Protection Key registers (the rights register) I don't think CodingStyle says to do this. :) > * * how to handle SEGV_PKUERR signals and extract pkey-relevant > *information from the siginfo > * > @@ -13,13 +13,18 @@ > * prefault pages in at malloc, or not > * protect MPX bounds tables with protection keys? > * make sure VMA splitting/merging is working correctly > - * OOMs can destroy mm->mmap (see exit_mmap()), so make sure it is immune > to pkeys > - * look for pkey "leaks" where it is still set on a VMA but "freed" back > to the kernel > - * do a plain mprotect() to a mprotect_pkey() area and make sure the pkey > sticks > + * OOMs can destroy mm->mmap (see exit_mmap()), > + * so make sure it is immune to pkeys > + * look for pkey "leaks" where it is still set on a VMA > + *but "freed" back to the kernel > + * do a plain mprotect() to a mprotect_pkey() area and make > + *sure the pkey sticks Ram, I'm not sure where this came from, but this looks horrid. Please don't do this to the file > * Compile like this: > - * gcc -o protection_keys-O2 -g -std=gnu99 -pthread -Wall > protection_keys.c -lrt -ldl -lm > - * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall > protection_keys.c -lrt -ldl -lm > + * gcc -o protection_keys-O2 -g -std=gnu99 > + *-pthread -Wall protection_keys.c -lrt -ldl -lm > + * gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 > + *-pthread -Wall protection_keys.c -lrt -ldl -lm > */ Please just leave this, or remove it from the file. It was a long line so it could be copied and pasted, this ruins that. > #define _GNU_SOURCE > #include > @@ -251,26 +256,11 @@ void signal_handler(int signum, siginfo_t *si, void > *vucontext) > dprintf1("signal pkey_reg from pkey_reg: %016lx\n", __rdpkey_reg()); > dprintf1("pkey from siginfo: %jx\n", siginfo_pkey); > *(u64 *)pkey_reg_ptr = 0x; > - dprintf1("WARNING: set PRKU=0 to allow faulting instruction to > continue\n"); > + dprintf1("WARNING: set PKEY_REG=0 to allow faulting instruction " > + "to continue\n"); > pkey_faults++; > dprintf1("==\n"); > return; > - if (trapno == 14) { > - fprintf(stderr, > - "ERROR: In signal handler, page fault, trapno = %d, ip > = %016lx\n", > - trapno, ip); > - fprintf(stderr, "si_addr %p\n", si->si_addr); > - fprintf(stderr, "REG_ERR: %lx\n", > - (unsigned > long)uctxt->uc_mcontext.gregs[REG_ERR]); > - exit(1); > - } else { > - fprintf(stderr, "unexpected trap %d! at 0x%lx\n", trapno, ip); > - fprintf(stderr, "si_addr %p\n", si->si_addr); > - fprintf(stderr, "REG_ERR: %lx\n", > - (unsigned > long)uctxt->uc_mcontext.gregs[REG_ERR]); > - exit(2); > - } > - dprint_in_signal = 0; > } I think this is just randomly removing code now. I think you should probably just drop this patch. It's not really brining anything useful. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 13/22] selftests/vm: powerpc implementation for generic abstraction
On 02/21/2018 05:55 PM, Ram Pai wrote: > static inline u32 pkey_to_shift(int pkey) > { > +#if defined(__i386__) || defined(__x86_64__) /* arch */ > return pkey * PKEY_BITS_PER_PKEY; > +#elif __powerpc64__ /* arch */ > + return (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY; > +#endif /* arch */ > } I really detest the #if #else style. Can't we just have a pkey_ppc.h and a pkey_x86.h or something? -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 14/22] selftests/vm: clear the bits in shadow reg when a pkey is freed.
On 02/21/2018 05:55 PM, Ram Pai wrote: > diff --git a/tools/testing/selftests/vm/protection_keys.c > b/tools/testing/selftests/vm/protection_keys.c > index c4c73e6..e82bd88 100644 > --- a/tools/testing/selftests/vm/protection_keys.c > +++ b/tools/testing/selftests/vm/protection_keys.c > @@ -586,7 +586,8 @@ int sys_pkey_free(unsigned long pkey) > int ret = syscall(SYS_pkey_free, pkey); > > if (!ret) > - shadow_pkey_reg &= reset_bits(pkey, PKEY_DISABLE_ACCESS); > + shadow_pkey_reg &= reset_bits(pkey, > + PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE); > dprintf1("%s(pkey=%ld) syscall ret: %d\n", __func__, pkey, ret); > return ret; > } What about your EXEC bit? -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 15/22] selftests/vm: powerpc implementation to check support for pkey
On 02/21/2018 05:55 PM, Ram Pai wrote: > #define PAGE_SIZE (0x1UL << 16) > -static inline int cpu_has_pku(void) > +static inline bool is_pkey_supported(void) > { > - return 1; > + /* > + * No simple way to determine this. > + * lets try allocating a key and see if it succeeds. > + */ > + int ret = sys_pkey_alloc(0, 0); > + > + if (ret > 0) { > + sys_pkey_free(ret); > + return true; > + } > + return false; > } The point of doing this was to have a test for the CPU that way separate from the syscalls. Can you leave cpu_has_pkeys() in place? -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 16/22] selftests/vm: fix an assertion in test_pkey_alloc_exhaust()
On 02/21/2018 05:55 PM, Ram Pai wrote: > +static inline int arch_reserved_keys(void) > +{ > +#if defined(__i386__) || defined(__x86_64__) /* arch */ > + return NR_RESERVED_PKEYS; > +#elif __powerpc64__ /* arch */ > + if (sysconf(_SC_PAGESIZE) == 4096) > + return NR_RESERVED_PKEYS_4K; > + else > + return NR_RESERVED_PKEYS_64K; > +#else /* arch */ > + NOT SUPPORTED > +#endif /* arch */ > +} Yeah, this is hideous. Please either do it in one header: #ifdef x86.. static inline int arch_reserved_keys(void) { } ... #elif ppc static inline int arch_reserved_keys(void) { } ... #else #error #endif Or in multiple: #ifdef x86.. #include #elif ppc #include #else #error #endif -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 17/22] selftests/vm: associate key on a mapped page and detect access violation
On 02/21/2018 05:55 PM, Ram Pai wrote: > detect access-violation on a page to which access-disabled > key is associated much after the page is mapped. Looks fine to me. Did this actually find a bug for you? Acked-by: Dave Hansen -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 18/22] selftests/vm: associate key on a mapped page and detect write violation
On 02/21/2018 05:55 PM, Ram Pai wrote: > detect write-violation on a page to which write-disabled > key is associated much after the page is mapped. The more tests the merrier. Acked-by: Dave Hansen -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 19/22] selftests/vm: detect write violation on a mapped access-denied-key page
On 02/21/2018 05:55 PM, Ram Pai wrote: > detect write-violation on a page to which access-disabled > key is associated much after the page is mapped. Acked-by: Dave Hansen -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 20/22] selftests/vm: testcases must restore pkey-permissions
On 02/21/2018 05:55 PM, Ram Pai wrote: > Generally the signal handler restores the state of the pkey register > before returning. However there are times when the read/write operation > can legitamely fail without invoking the signal handler. Eg: A > sys_read() operaton to a write-protected page should be disallowed. In > such a case the state of the pkey register is not restored to its > original state. The test case is responsible for restoring the key > register state to its original value. Oh, that's a good point. Could we just do this in a common place, though? Like reset the register after each test? Seems more foolproof. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 21/22] selftests/vm: sub-page allocator
On 02/21/2018 05:55 PM, Ram Pai wrote: ... > @@ -888,6 +917,7 @@ void setup_hugetlbfs(void) > void *(*pkey_malloc[])(long size, int prot, u16 pkey) = { > > malloc_pkey_with_mprotect, > + malloc_pkey_with_mprotect_subpage, > malloc_pkey_anon_huge, > malloc_pkey_hugetlb > /* can not do direct with the pkey_mprotect() API: I think I'd rather have an #ifdef on the array entries than have the malloc entry do nothing on x86. Maybe have a ppc-specific section at the end? -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 22/22] selftests/vm: Fix deadlock in protection_keys.c
On 02/21/2018 05:55 PM, Ram Pai wrote: > From: Thiago Jung Bauermann > > The sig_chld() handler calls dprintf2() taking care of setting > dprint_in_signal so that sigsafe_printf() won't call printf(). > Unfortunately, this precaution is is negated by dprintf_level(), which > has a call to fflush(). > > This function acquires a lock, which means that if the signal interrupts an > ongoing fflush() the process will deadlock. At least on powerpc this is > easy to trigger, resulting in the following backtrace when attaching to the > frozen process: Ugh, yeah, I've run into this too. Acked-by: Dave Hansen -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] Documentation: clk: enable lock is not held for clk_is_enabled API
Quoting Dong Aisheng (2018-01-19 05:37:15) > The core does not need to hold enable lock for clk_is_enabled API. > Update the doc to reflect it. > > Cc: Jonathan Corbet > Cc: Stephen Boyd > Suggested-by: Stephen Boyd > Signed-off-by: Dong Aisheng > --- Applied to clk-next -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 6/6] arm64: dts: sdm845: Add I2C controller support
Hi, On Wed, Mar 14, 2018 at 4:58 PM, Karthikeyan Ramasubramanian wrote: > Add I2C master controller support for a built-in test I2C slave. > > Signed-off-by: Karthikeyan Ramasubramanian > --- > arch/arm64/boot/dts/qcom/sdm845-mtp.dts | 19 +++ > arch/arm64/boot/dts/qcom/sdm845.dtsi| 29 + > 2 files changed, 48 insertions(+) > > diff --git a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts > b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts > index ea3efc5..69445f1 100644 > --- a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts > +++ b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts > @@ -27,6 +27,10 @@ > serial@a84000 { > status = "okay"; > }; > + > + i2c@a88000 { > + status = "okay"; Any idea what clock frequency is appropriate for MTP? You've got some pretty beefy 2.2K external pulls I think, so presumably 400 KHz is what you're aiming for? It would be nice to specify one so you don't end up the the spam in the log "Bus frequency not specified ..." > + }; > }; > > pinctrl@340 { > @@ -50,5 +54,20 @@ > bias-pull-down; > }; > }; > + > + qup-i2c10-default { nit: probably in the pinctrl section we want to sort alphabetically? We could try to sort by "lowest numbered pin", but IMHO alphabetical makes the most sense. > + pinconf { > + pins = "gpio55", "gpio56"; > + drive-strength = <2>; > + bias-disable; > + }; > + }; > + > + qup-i2c10-sleep { > + pinconf { > + pins = "gpio55", "gpio56"; > + bias-pull-up; Are you sure that you want pullups enabled for sleep here? There are external pulls on this line (as there are on many i2c busses) so doing this will double-enable pulls. It probably won't hurt, but I'm curious if there's some sort of reason here. > + }; > + }; > }; > }; > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi > b/arch/arm64/boot/dts/qcom/sdm845.dtsi > index 59334d9..9ef056f 100644 > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi > @@ -209,6 +209,21 @@ > pins = "gpio4", "gpio5"; > }; > }; > + > + qup_i2c10_default: qup-i2c10-default { > + pinmux { > + function = "qup10"; > + pins = "gpio55", "gpio56"; > + }; > + }; > + > + qup_i2c10_sleep: qup-i2c10-sleep { > + pinmux { > + function = "gpio"; > + pins = "gpio55", "gpio56"; > + }; > + }; > + > }; > > timer@17c9 { > @@ -309,6 +324,20 @@ > interrupts = IRQ_TYPE_LEVEL_HIGH>; > status = "disabled"; > }; > + > + i2c10: i2c@a88000 { Seems like it might be nice to add all the i2c busses into the main sdm845.dtsi file. Sure, most won't be enabled, but it seems like it would avoid churn later. ...if you're sure you want to add only one i2c controller, subject of this patch should indicate that. > + compatible = "qcom,geni-i2c"; > + reg = <0xa88000 0x4000>; > + clock-names = "se"; > + clocks = <&gcc GCC_QUPV3_WRAP1_S2_CLK>; > + pinctrl-names = "default", "sleep"; > + pinctrl-0 = <&qup_i2c10_default>; > + pinctrl-1 = <&qup_i2c10_sleep>; > + interrupts = IRQ_TYPE_LEVEL_HIGH>; > + #address-cells = <1>; > + #size-cells = <0>; > + status = "disabled"; > + }; > }; > }; > }; > -- > Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, > a Linux Foundation Collaborative Project > > -- > To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo
Re: [PATCH v5 2/9] proc/sysctl: Provide additional ctl_table.flags checks
On Fri, Mar 16, 2018 at 02:13:43PM -0400, Waiman Long wrote: > Checking code is added to provide the following additional > ctl_table.flags checks: > > 1) No unknown flag is allowed. > 2) Minimum of a range cannot be larger than the maximum value. > 3) The signed and unsigned flags are mutually exclusive. > 4) The proc_handler should be consistent with the signed or unsigned > flags. > > Two new flags are added to indicate if the min/max values are signed > or unsigned - CTL_FLAGS_SIGNED_RANGE & CTL_FLAGS_UNSIGNED_RANGE. > These 2 flags can be optionally enabled for range checking purpose. > But either one of them must be set with CTL_FLAGS_CLAMP_RANGE. > > Signed-off-by: Waiman Long > --- > diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h > index e446e1f..088f032 100644 > --- a/include/linux/sysctl.h > +++ b/include/linux/sysctl.h > @@ -134,14 +134,26 @@ struct ctl_table > * the input value. No lower bound or upper bound checking will be > * done if the corresponding minimum or maximum value isn't provided. > * > + * @CTL_FLAGS_SIGNED_RANGE: Set to indicate that the extra1 and extra2 > + * fields are pointers to minimum and maximum signed values of > + * an allowable range. > + * > + * @CTL_FLAGS_UNSIGNED_RANGE: Set to indicate that the extra1 and extra2 > + * fields are pointers to minimum and maximum unsigned values of > + * an allowable range. > + * > * At most 16 different flags are allowed. > */ > enum ctl_table_flags { > CTL_FLAGS_CLAMP_RANGE = BIT(0), > - __CTL_FLAGS_MAX = BIT(1), > + CTL_FLAGS_SIGNED_RANGE = BIT(1), > + CTL_FLAGS_UNSIGNED_RANGE= BIT(2), > + __CTL_FLAGS_MAX = BIT(3), > }; You are adding new flags which the user can set, and yet these are used internally. It would be best if internal flags are just that, not flags that a user can set. This patch should be folded with the first one. I'm starting to loose hope on these patch sets. Luis -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 1/9] sysctl: Add flags to support min/max range clamping
On Fri, Mar 16, 2018 at 02:13:42PM -0400, Waiman Long wrote: > When the CTL_FLAGS_CLAMP_RANGE flag is set in the ctl_table > entry, any update from the userspace will be clamped to the given > range without error if either the proc_dointvec_minmax() or the > proc_douintvec_minmax() handlers is used. I don't get it. Why define a generic range flag when we can be mores specific and you do that in your next patch. What's the point of this flag then? Luis -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html