Re: [PATCH 00/16] remove eight obsolete architectures
Do we have anything left that still implements NOMMU?

David
Re: [PATCH 00/16] remove eight obsolete architectures
Hi David,

On Thu, Mar 15, 2018 at 10:42 AM, David Howells wrote:
> Do we have anything left that still implements NOMMU?

Sure: arm, c6x, m68k, microblaze, and sh.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like
that. -- Linus Torvalds
Re: [PATCH 00/16] remove eight obsolete architectures
On Thu, Mar 15, 2018 at 10:42 AM, David Howells wrote:
> Do we have anything left that still implements NOMMU?

Yes, plenty. I was wondering the same thing, but it seems that the
architectures we remove are almost completely representative of what we
support overall, except that they are all not licensed to 3rd parties,
unlike many of the ones we keep. I've made an overview of the remaining
architectures for my own reference[1].

The remaining NOMMU architectures are:

- arch/arm has ARMv7-M (Cortex-M microcontrollers), which is actually
  gaining traction
- arch/sh has an open-source J2 core that was added not that long ago;
  it seems to be the only SH-compatible core that anyone is working on
- arch/microblaze supports both MMU and NOMMU modes (most use an MMU)
- arch/m68k supports several NOMMU targets, both the Coldfire SoCs and
  the classic processors
- c6x has no MMU

      Arnd

[1] https://docs.google.com/spreadsheets/d/1QxMvW5jpVG2jb4RM9CQQl27-wVpNYOa-_3K2RVKifb0
Re: [PATCH 00/16] remove eight obsolete architectures
On 03/15/2018 10:42 AM, David Howells wrote:
> Do we have anything left that still implements NOMMU?
>
RISC-V ?
(evil grin :-)

Cheers,

Hannes
--
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH] Documentation/CodingStyle: Add an example for braces
On Wed, 14 Mar 2018, Gary R Hook wrote:
> Add another example of required braces when using a compound statement in
> a loop.
>
> Signed-off-by: Gary R Hook
> ---
>  Documentation/process/coding-style.rst | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> index a20b44a40ec4..d98deb62c400 100644
> --- a/Documentation/process/coding-style.rst
> +++ b/Documentation/process/coding-style.rst
> @@ -200,6 +200,15 @@ statement; in the latter case use braces in both branches:
>  		otherwise();
>  	}
>
> +Also, use braces when a loop contains more than a single simple statement:

Personally, I'd not limit this to loops.

	if (condition) {
		if (another_condition)
			action();
	}

You could argue the existing rule already covers these cases by excluding
selection and iteration statements from the "single statement" in "Do not
unnecessarily use braces where a single statement will do."

BR,
Jani.

> +
> +.. code-block:: c
> +
> +	while (condition) {
> +		if (test)
> +			do_something();
> +	}
> +
>  3.1) Spaces
>  ***

--
Jani Nikula, Intel Open Source Technology Center
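[Editor's note: to make the distinction being discussed concrete, here is an
illustrative sketch (not proposed documentation text; the function names are
made up). A body that is a single simple statement needs no braces, but a
body that is itself an if statement or a loop does:

	/* single simple statement: no braces needed */
	if (condition)
		do_something();

	/* the body is itself an if statement: use braces */
	if (condition) {
		if (another_condition)
			action();
	}

	/* the body is itself a loop: use braces */
	if (condition) {
		while (!done)
			poll_hardware();
	}
]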
Re: [PATCH 00/16] remove eight obsolete architectures
On Thu, Mar 15, 2018 at 10:59 AM, Hannes Reinecke wrote:
> On 03/15/2018 10:42 AM, David Howells wrote:
>> Do we have anything left that still implements NOMMU?
>>
> RISC-V ?
> (evil grin :-)

Is anyone producing a chip that includes enough of the Privileged ISA spec
to have things like system calls, but not the MMU parts? I thought at least
initially the kernel only supports hardware that has a rather complete
feature set.

       Arnd
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/14/2018 10:29 PM, Oleg Nesterov wrote:
> On 03/13, Ravi Bangoria wrote:
>> +static bool sdt_valid_vma(struct trace_uprobe *tu, struct vm_area_struct *vma)
>> +{
>> +	unsigned long vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
>> +
>> +	return tu->ref_ctr_offset &&
>> +		vma->vm_file &&
>> +		file_inode(vma->vm_file) == tu->inode &&
>> +		vma->vm_flags & VM_WRITE &&
>> +		vma->vm_start <= vaddr &&
>> +		vma->vm_end > vaddr;
>> +}
>
> Perhaps in this case a simple
>
>	ref_ctr_offset < vma->vm_end - vma->vm_start
>
> check without vma_offset_to_vaddr() makes more sense, but I won't insist.

Hmm... I'm not quite sure. Will rethink and get back to you.

>> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
>> +{
>> +	struct uprobe_map_info *info;
>> +	struct vm_area_struct *vma;
>> +	unsigned long vaddr;
>> +
>> +	uprobe_start_dup_mmap();
>> +	info = uprobe_build_map_info(tu->inode->i_mapping,
>> +				     tu->ref_ctr_offset, false);
>
> Hmm. This doesn't look right.
>
> If you need to find all mappings (and avoid the races with fork/dup_mmap) you
> need to take this semaphore for writing, uprobe_start_dup_mmap() can't help.

Oops. Yes. Will change it.

Thanks for the review :)
Ravi
[PATCH] Improve mutex documentation
On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote: > My memory is weak and our documentation is awful. What does > mutex_lock_killable() actually do and how does it differ from > mutex_lock_interruptible()? From: Matthew Wilcox Add kernel-doc for mutex_lock_killable() and mutex_lock_io(). Reword the kernel-doc for mutex_lock_interruptible(). Signed-off-by: Matthew Wilcox diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 858a07590e39..2048359f33d2 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -1082,15 +1082,16 @@ static noinline int __sched __mutex_lock_interruptible_slowpath(struct mutex *lock); /** - * mutex_lock_interruptible - acquire the mutex, interruptible - * @lock: the mutex to be acquired + * mutex_lock_interruptible() - Acquire the mutex, interruptible by signals. + * @lock: The mutex to be acquired. * - * Lock the mutex like mutex_lock(), and return 0 if the mutex has - * been acquired or sleep until the mutex becomes available. If a - * signal arrives while waiting for the lock then this function - * returns -EINTR. + * Lock the mutex like mutex_lock(). If a signal is delivered while the + * process is sleeping, this function will return without acquiring the + * mutex. * - * This function is similar to (but not equivalent to) down_interruptible(). + * Context: Process context. + * Return: 0 if the lock was successfully acquired or %-EINTR if a + * signal arrived. */ int __sched mutex_lock_interruptible(struct mutex *lock) { @@ -1104,6 +1105,18 @@ int __sched mutex_lock_interruptible(struct mutex *lock) EXPORT_SYMBOL(mutex_lock_interruptible); +/** + * mutex_lock_killable() - Acquire the mutex, interruptible by fatal signals. + * @lock: The mutex to be acquired. + * + * Lock the mutex like mutex_lock(). If a signal which will be fatal to + * the current process is delivered while the process is sleeping, this + * function will return without acquiring the mutex. + * + * Context: Process context. + * Return: 0 if the lock was successfully acquired or %-EINTR if a + * fatal signal arrived. + */ int __sched mutex_lock_killable(struct mutex *lock) { might_sleep(); @@ -1115,6 +1128,16 @@ int __sched mutex_lock_killable(struct mutex *lock) } EXPORT_SYMBOL(mutex_lock_killable); +/** + * mutex_lock_io() - Acquire the mutex and mark the process as waiting for I/O + * @lock: The mutex to be acquired. + * + * Lock the mutex like mutex_lock(). While the task is waiting for this + * mutex, it will be accounted as being in the IO wait state by the + * scheduler. + * + * Context: Process context. + */ void __sched mutex_lock_io(struct mutex *lock) { int token; -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kernel-doc: Remove __sched markings
From: Matthew Wilcox

I find the __sched annotations unaesthetic in the kernel-doc. Remove them
like we remove __inline, __weak, __init and so on.

Signed-off-by: Matthew Wilcox

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index ae3cac118a11..eb986a7809d3 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -1578,6 +1578,7 @@ sub dump_function($$) {
     $prototype =~ s/__meminit +//;
     $prototype =~ s/__must_check +//;
     $prototype =~ s/__weak +//;
+    $prototype =~ s/__sched +//;
     my $define = $prototype =~ s/^#\s*define\s+//; #ak added
     $prototype =~ s/__attribute__\s*\(\( (?:
Re: [PATCH] Improve mutex documentation
Hi, Matthew, On 15.03.2018 14:58, Matthew Wilcox wrote: > On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote: >> My memory is weak and our documentation is awful. What does >> mutex_lock_killable() actually do and how does it differ from >> mutex_lock_interruptible()? > > From: Matthew Wilcox > > Add kernel-doc for mutex_lock_killable() and mutex_lock_io(). Reword the > kernel-doc for mutex_lock_interruptible(). > > Signed-off-by: Matthew Wilcox > > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c > index 858a07590e39..2048359f33d2 100644 > --- a/kernel/locking/mutex.c > +++ b/kernel/locking/mutex.c > @@ -1082,15 +1082,16 @@ static noinline int __sched > __mutex_lock_interruptible_slowpath(struct mutex *lock); > > /** > - * mutex_lock_interruptible - acquire the mutex, interruptible > - * @lock: the mutex to be acquired > + * mutex_lock_interruptible() - Acquire the mutex, interruptible by signals. > + * @lock: The mutex to be acquired. > * > - * Lock the mutex like mutex_lock(), and return 0 if the mutex has > - * been acquired or sleep until the mutex becomes available. If a > - * signal arrives while waiting for the lock then this function > - * returns -EINTR. > + * Lock the mutex like mutex_lock(). If a signal is delivered while the > + * process is sleeping, this function will return without acquiring the > + * mutex. > * > - * This function is similar to (but not equivalent to) down_interruptible(). > + * Context: Process context. > + * Return: 0 if the lock was successfully acquired or %-EINTR if a > + * signal arrived. > */ > int __sched mutex_lock_interruptible(struct mutex *lock) > { > @@ -1104,6 +1105,18 @@ int __sched mutex_lock_interruptible(struct mutex > *lock) > > EXPORT_SYMBOL(mutex_lock_interruptible); > > +/** > + * mutex_lock_killable() - Acquire the mutex, interruptible by fatal signals. Shouldn't we clarify that fatal signals are SIGKILL only? > + * @lock: The mutex to be acquired. > + * > + * Lock the mutex like mutex_lock(). If a signal which will be fatal to > + * the current process is delivered while the process is sleeping, this > + * function will return without acquiring the mutex. > + * > + * Context: Process context. > + * Return: 0 if the lock was successfully acquired or %-EINTR if a > + * fatal signal arrived. > + */ > int __sched mutex_lock_killable(struct mutex *lock) > { > might_sleep(); > @@ -1115,6 +1128,16 @@ int __sched mutex_lock_killable(struct mutex *lock) > } > EXPORT_SYMBOL(mutex_lock_killable); > > +/** > + * mutex_lock_io() - Acquire the mutex and mark the process as waiting for > I/O > + * @lock: The mutex to be acquired. > + * > + * Lock the mutex like mutex_lock(). While the task is waiting for this > + * mutex, it will be accounted as being in the IO wait state by the > + * scheduler. > + * > + * Context: Process context. > + */ > void __sched mutex_lock_io(struct mutex *lock) > { > int token; > -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 8/8] trace_uprobe/sdt: Document about reference counter
Hi Ravi,

On Wed, 14 Mar 2018 20:52:59 +0530 Ravi Bangoria wrote:
> On 03/14/2018 07:20 PM, Masami Hiramatsu wrote:
> > On Tue, 13 Mar 2018 18:26:03 +0530 Ravi Bangoria wrote:
> >
> >> No functionality changes.
> >
> > Please consider to describe what is this change and why, here.
>
> Will add in next version.

Thanks, and could you also move this before the perf-probe patch?

Also, could you make perf-probe check tracing/README to see whether the
kernel supports the reference counter syntax or not? perf-tool can be used
on older (or stable) kernels.

Thank you,

> >> Signed-off-by: Ravi Bangoria
> >> ---
> >>  Documentation/trace/uprobetracer.txt | 16 +---
> >>  kernel/trace/trace.c                 |  2 +-
> >>  2 files changed, 14 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt
> >> index bf526a7c..8fb13b0 100644
> >> --- a/Documentation/trace/uprobetracer.txt
> >> +++ b/Documentation/trace/uprobetracer.txt
> >> @@ -19,15 +19,25 @@ user to calculate the offset of the probepoint in the object.
> >>
> >>  Synopsis of uprobe_tracer
> >>  -
> >>  - p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a uprobe
> >>  - r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a return uprobe (uretprobe)
> >>  - -:[GRP/]EVENT                           : Clear uprobe or uretprobe event
> >>  + p[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS]
> >>  + r[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS]
> >
> > Ah, OK in this context, [] means optional syntax :)
>
> Correct.
>
> Thanks,
> Ravi

--
Masami Hiramatsu
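[Editor's note: a rough sketch of the kind of feature check being requested
here, written as standalone C for illustration. The tracefs path and the
exact token to look for in tracing/README are assumptions based on this
thread, not the final perf-probe implementation:

	#include <stdio.h>
	#include <stdbool.h>
	#include <string.h>

	/* Return true if tracing/README advertises the reference counter syntax. */
	static bool kernel_supports_ref_ctr_offset(void)
	{
		char line[4096];
		bool supported = false;
		FILE *fp = fopen("/sys/kernel/debug/tracing/README", "r");

		if (!fp)
			return false;

		while (fgets(line, sizeof(line), fp)) {
			if (strstr(line, "ref_ctr_offset")) {
				supported = true;
				break;
			}
		}

		fclose(fp);
		return supported;
	}

perf would then emit the PATH:OFFSET(REF_CTR_OFFSET) form only when this
returns true, and fall back to plain PATH:OFFSET on older kernels.]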
Re: [PATCH] Improve mutex documentation
On Thu, Mar 15, 2018 at 03:12:30PM +0300, Kirill Tkhai wrote:
> > +/**
> > + * mutex_lock_killable() - Acquire the mutex, interruptible by fatal signals.
>
> Shouldn't we clarify that fatal signals are SIGKILL only?

It's more complicated than it might seem (... welcome to signal handling!)
If you send SIGINT to a task that's waiting on a mutex_lock_killable(), it
will still die. I *think* that's due to the code in complete_signal():

	if (sig_fatal(p, sig) &&
	    !(signal->flags & SIGNAL_GROUP_EXIT) &&
	    !sigismember(&t->real_blocked, sig) &&
	    (sig == SIGKILL || !p->ptrace)) {
		...
		sigaddset(&t->pending.signal, SIGKILL);

You're correct that this code only checks for SIGKILL, but any fatal
signal will result in the signal group receiving SIGKILL.

Unless I've misunderstood, and it wouldn't be the first time I've
misunderstood signal handling.
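[Editor's note: for readers following along, a minimal sketch of how the
documented return value is normally consumed. This is hypothetical driver
code (struct example_dev and its lock member are invented), not part of
the patch:

	static int example_do_work(struct example_dev *dev)
	{
		int ret;

		ret = mutex_lock_killable(&dev->lock);
		if (ret)
			return ret;	/* -EINTR: fatal signal arrived, the lock is NOT held */

		/* ... critical section ... */

		mutex_unlock(&dev->lock);
		return 0;
	}

mutex_lock_interruptible() is used the same way, except that it returns
-EINTR when any signal is pending, not just a fatal one.]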
Re: [PATCH] Improve mutex documentation
On 15.03.2018 16:18, Matthew Wilcox wrote:
> On Thu, Mar 15, 2018 at 03:12:30PM +0300, Kirill Tkhai wrote:
>>> +/**
>>> + * mutex_lock_killable() - Acquire the mutex, interruptible by fatal signals.
>>
>> Shouldn't we clarify that fatal signals are SIGKILL only?
>
> It's more complicated than it might seem (... welcome to signal handling!)
> If you send SIGINT to a task that's waiting on a mutex_lock_killable(), it
> will still die. I *think* that's due to the code in complete_signal():
>
>	if (sig_fatal(p, sig) &&
>	    !(signal->flags & SIGNAL_GROUP_EXIT) &&
>	    !sigismember(&t->real_blocked, sig) &&
>	    (sig == SIGKILL || !p->ptrace)) {
>		...
>		sigaddset(&t->pending.signal, SIGKILL);
>
> You're correct that this code only checks for SIGKILL, but any fatal
> signal will result in the signal group receiving SIGKILL.
>
> Unless I've misunderstood, and it wouldn't be the first time I've
> misunderstood signal handling.

Sure, thanks for the explanation.

Kirill
Re: [PATCH 00/16] remove eight obsolete architectures
On Thu, Mar 15, 2018 at 11:42:25AM +0100, Arnd Bergmann wrote:
> Is anyone producing a chip that includes enough of the Privileged ISA spec
> to have things like system calls, but not the MMU parts?

Various SiFive SoCs seem to support M and U mode, but no S mode or iommu.
That should be enough for nommu Linux running in M mode if someone cares
enough to actually port it.
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/13, Ravi Bangoria wrote:
>
> @@ -1053,6 +1056,9 @@ int uprobe_mmap(struct vm_area_struct *vma)
> 	struct uprobe *uprobe, *u;
> 	struct inode *inode;
>
> +	if (uprobe_mmap_callback)
> +		uprobe_mmap_callback(vma);
> +
> 	if (no_uprobe_events() || !valid_vma(vma, true))
> 		return 0;

probe_event_enable() does

	uprobe_register();
	/* WINDOW */
	sdt_increment_ref_ctr();

what if uprobe_mmap() is called in between? The counter(s) in this vma will
be incremented twice, no?

> +static struct vm_area_struct *
> +sdt_find_vma(struct mm_struct *mm, struct trace_uprobe *tu)
> +{
> +	struct vm_area_struct *tmp;
> +
> +	for (tmp = mm->mmap; tmp != NULL; tmp = tmp->vm_next)
> +		if (sdt_valid_vma(tu, tmp))
> +			return tmp;
> +
> +	return NULL;

I can't understand the logic... Lets ignore sdt_valid_vma() for now.
The caller has uprobe_map_info, why it can't simply do
vma = find_vma(uprobe_map_info->vaddr)? and then check sdt_valid_vma().

Oleg.
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/15, Oleg Nesterov wrote:
>
> > +static struct vm_area_struct *
> > +sdt_find_vma(struct mm_struct *mm, struct trace_uprobe *tu)
> > +{
> > +	struct vm_area_struct *tmp;
> > +
> > +	for (tmp = mm->mmap; tmp != NULL; tmp = tmp->vm_next)
> > +		if (sdt_valid_vma(tu, tmp))
> > +			return tmp;
> > +
> > +	return NULL;
>
> I can't understand the logic... Lets ignore sdt_valid_vma() for now.
> The caller has uprobe_map_info, why it can't simply do
> vma = find_vma(uprobe_map_info->vaddr)? and then check sdt_valid_vma().

Not to mention that sdt_find_vma() can return NULL but the callers do
vma_offset_to_vaddr(vma) without any check.

Oleg.
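[Editor's note: for reference, a sketch of the simplification being
suggested — look up the vma covering the already-known vaddr and bail out
when there is none. Untested, and it assumes the helpers introduced in this
series (sdt_valid_vma), so the signature is only illustrative:

	static struct vm_area_struct *
	sdt_find_vma(struct trace_uprobe *tu, struct mm_struct *mm, unsigned long vaddr)
	{
		struct vm_area_struct *vma = find_vma(mm, vaddr);

		if (vma && sdt_valid_vma(tu, vma))
			return vma;

		return NULL;
	}

The caller would then skip the reference counter update when this returns
NULL instead of dereferencing the result unconditionally.]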
Re: [PATCH 6/8] trace_uprobe/sdt: Fix multiple update of same reference counter
On 03/13, Ravi Bangoria wrote: > > For tiny binaries/libraries, different mmap regions points to the > same file portion. In such cases, we may increment reference counter > multiple times. Yes, > But while de-registration, reference counter will get > decremented only by once could you explain why this happens? sdt_increment_ref_ctr() and sdt_decrement_ref_ctr() look symmetrical, _decrement_ should see the same mappings? Ether way, this patch doesn't look right at first glance... Just for example, > +static bool sdt_check_mm_list(struct trace_uprobe *tu, struct mm_struct *mm) > +{ > + struct sdt_mm_list *tmp = tu->sml; > + > + if (!tu->sml || !mm) > + return false; > + > + while (tmp) { > + if (tmp->mm == mm) > + return true; > + tmp = tmp->next; > + } > + > + return false; ... > +} > + > +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm) > +{ > + struct sdt_mm_list *tmp; > + > + tmp = kzalloc(sizeof(*tmp), GFP_KERNEL); > + if (!tmp) > + return; > + > + tmp->mm = mm; > + tmp->next = tu->sml; > + tu->sml = tmp; > +} > + ... > @@ -1020,8 +1104,16 @@ void trace_uprobe_mmap_callback(struct vm_area_struct > *vma) > !trace_probe_is_enabled(&tu->tp)) > continue; > > + down_write(&tu->sml_rw_sem); > + if (sdt_check_mm_list(tu, vma->vm_mm)) > + goto cont; > + > vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset); > - sdt_update_ref_ctr(vma->vm_mm, vaddr, 1); > + if (!sdt_update_ref_ctr(vma->vm_mm, vaddr, 1)) > + sdt_add_mm_list(tu, vma->vm_mm); > + > +cont: > + up_write(&tu->sml_rw_sem); To simplify, suppose that tu->sml is empty. Some process calls this function, increments the counter and adds its ->mm into the list. Then it exits, ->mm is freed. The next fork/exec allocates the same memory for the new ->mm, the new process calls trace_uprobe_mmap_callback() and sdt_check_mm_list() returns T? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On 03/13, Ravi Bangoria wrote:
>
> +sdt_update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
> +{
> +	void *kaddr;
> +	struct page *page;
> +	struct vm_area_struct *vma;
> +	int ret = 0;
> +	unsigned short orig = 0;
> +
> +	if (vaddr == 0)
> +		return -EINVAL;
> +
> +	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
> +			FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL);
> +	if (ret <= 0)
> +		return ret;
> +
> +	kaddr = kmap_atomic(page);
> +	memcpy(&orig, kaddr + (vaddr & ~PAGE_MASK), sizeof(orig));
> +	orig += d;
> +	memcpy(kaddr + (vaddr & ~PAGE_MASK), &orig, sizeof(orig));
> +	kunmap_atomic(kaddr);

Hmm. Why memcpy? You could simply do

	kaddr = kmap_atomic();
	unsigned short *ptr = kaddr + (vaddr & ~PAGE_MASK);
	*ptr += d;
	kunmap_atomic();

Oleg.
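[Editor's note: spelled out, the suggested simplification could look roughly
like this. An untested sketch: the get_user_pages_remote() call is copied
from the quoted patch, the direct store replaces the memcpy pair, and the
put_page() is added here on the assumption that the pinned page reference
must be dropped:

	static int
	sdt_update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
	{
		struct vm_area_struct *vma;
		struct page *page;
		unsigned short *ptr;
		void *kaddr;
		int ret;

		if (vaddr == 0)
			return -EINVAL;

		ret = get_user_pages_remote(NULL, mm, vaddr, 1,
				FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL);
		if (ret <= 0)
			return ret;

		kaddr = kmap_atomic(page);
		ptr = kaddr + (vaddr & ~PAGE_MASK);
		*ptr += d;			/* update the semaphore in place */
		kunmap_atomic(kaddr);

		put_page(page);
		return 0;
	}
]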
Re: [PATCH 1/8] Uprobe: Export vaddr <-> offset conversion functions
On Tue, 13 Mar 2018 18:25:56 +0530 Ravi Bangoria wrote:
> No functionality changes.

Please add a detailed explanation why this patch is needed. All commits
should be self sufficient and stand on their own. If I were to come up to
this patch via a git blame, I would be clueless to why it was done.

-- Steve

>
> Signed-off-by: Ravi Bangoria
> ---
Re: [PATCH 2/8] mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr()
On Tue, 13 Mar 2018 18:25:57 +0530 Ravi Bangoria wrote:
> No functionality changes.

Again, please add an explanation to why this patch is done.

-- Steve

>
> Signed-off-by: Ravi Bangoria
Re: [PATCH 4/8] Uprobe: Export uprobe_map_info along with uprobe_{build/free}_map_info()
On Tue, 13 Mar 2018 18:25:59 +0530 Ravi Bangoria wrote: > These exported data structure and functions will be used by other > files in later patches. I'm reluctantly OK with the above. > > No functionality changes. Please remove this line. There are functionality changes. Turning a static inline into an exported function *is* a functionality change. -- Steve > > Signed-off-by: Ravi Bangoria > --- > include/linux/uprobes.h | 9 + > kernel/events/uprobes.c | 14 +++--- > 2 files changed, 12 insertions(+), 11 deletions(-) > > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h > index 0a294e9..7bd2760 100644 > --- a/include/linux/uprobes.h > +++ b/include/linux/uprobes.h > @@ -109,12 +109,19 @@ enum rp_check { > RP_CHECK_RET, > }; > > +struct address_space; > struct xol_area; > > struct uprobes_state { > struct xol_area *xol_area; > }; > > +struct uprobe_map_info { > + struct uprobe_map_info *next; > + struct mm_struct *mm; > + unsigned long vaddr; > +}; > + > extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned > long vaddr); > extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, > unsigned long vaddr); > extern bool is_swbp_insn(uprobe_opcode_t *insn); > @@ -149,6 +156,8 @@ struct uprobes_state { > extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs > *regs); > extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr, >void *src, unsigned long len); > +extern struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info > *info); > +extern struct uprobe_map_info *uprobe_build_map_info(struct address_space > *mapping, loff_t offset, bool is_register); > #else /* !CONFIG_UPROBES */ > struct uprobes_state { > }; > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index 081b88c1..e7830b8 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -695,23 +695,15 @@ static void delete_uprobe(struct uprobe *uprobe) > put_uprobe(uprobe); > } > > -struct uprobe_map_info { > - struct uprobe_map_info *next; > - struct mm_struct *mm; > - unsigned long vaddr; > -}; > - > -static inline struct uprobe_map_info * > -uprobe_free_map_info(struct uprobe_map_info *info) > +struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info *info) > { > struct uprobe_map_info *next = info->next; > kfree(info); > return next; > } > > -static struct uprobe_map_info * > -uprobe_build_map_info(struct address_space *mapping, loff_t offset, > - bool is_register) > +struct uprobe_map_info *uprobe_build_map_info(struct address_space *mapping, > + loff_t offset, bool is_register) > { > unsigned long pgoff = offset >> PAGE_SHIFT; > struct vm_area_struct *vma; -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] Uprobe: Rename map_info to uprobe_map_info
On Tue, 13 Mar 2018 18:25:58 +0530 Ravi Bangoria wrote: > -static inline struct map_info *free_map_info(struct map_info *info) > +static inline struct uprobe_map_info * > +uprobe_free_map_info(struct uprobe_map_info *info) > { > - struct map_info *next = info->next; > + struct uprobe_map_info *next = info->next; > kfree(info); > return next; > } > > -static struct map_info * > -build_map_info(struct address_space *mapping, loff_t offset, bool > is_register) > +static struct uprobe_map_info * > +uprobe_build_map_info(struct address_space *mapping, loff_t offset, Also, as these functions have side effects (like you need to perform a mmput(info->mm), you need to add kerneldoc type comments to these functions, explaining how to use them. When you upgrade a function from static to use cases outside the file, it requires documenting that function for future users. -- Steve > + bool is_register) > { > unsigned long pgoff = offset >> PAGE_SHIFT; > struct vm_area_struct *vma; > - struct map_info *curr = NULL; > - struct map_info *prev = NULL; > - struct map_info *info; > + struct uprobe_map_info *curr = NULL; > + struct uprobe_map_info *prev = NULL; > + struct uprobe_map_info *info; > int more = 0; > > again: > @@ -729,7 +731,7 @@ static inline struct map_info *free_map_info(struct > map_info *info) >* Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion > through >* reclaim. This is optimistic, no harm done if it > fails. >*/ > - prev = kmalloc(sizeof(struct map_info), > + prev = kmalloc(sizeof(struct uprobe_map_info), > GFP_NOWAIT | __GFP_NOMEMALLOC | > __GFP_NOWARN); > if (prev) > prev->next = NULL; > @@ -762,7 +764,7 @@ static inline struct map_info *free_map_info(struct > map_info *info) > } > > do { > - info = kmalloc(sizeof(struct map_info), GFP_KERNEL); > + info = kmalloc(sizeof(struct uprobe_map_info), GFP_KERNEL); > if (!info) { > curr = ERR_PTR(-ENOMEM); > goto out; > @@ -774,7 +776,7 @@ static inline struct map_info *free_map_info(struct > map_info *info) > goto again; > out: > while (prev) > - prev = free_map_info(prev); > + prev = uprobe_free_map_info(prev); > return curr; > } > > @@ -782,11 +784,11 @@ static inline struct map_info *free_map_info(struct > map_info *info) > register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) > { > bool is_register = !!new; > - struct map_info *info; > + struct uprobe_map_info *info; > int err = 0; > > percpu_down_write(&dup_mmap_sem); > - info = build_map_info(uprobe->inode->i_mapping, > + info = uprobe_build_map_info(uprobe->inode->i_mapping, > uprobe->offset, is_register); > if (IS_ERR(info)) { > err = PTR_ERR(info); > @@ -825,7 +827,7 @@ static inline struct map_info *free_map_info(struct > map_info *info) > up_write(&mm->mmap_sem); > free: > mmput(mm); > - info = free_map_info(info); > + info = uprobe_free_map_info(info); > } > out: > percpu_up_write(&dup_mmap_sem); -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] trace_uprobe: Support SDT markers having reference count (semaphore)
On Tue, 13 Mar 2018 18:26:00 +0530 Ravi Bangoria wrote:
> +static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
> +{
> +	struct uprobe_map_info *info;
> +	struct vm_area_struct *vma;
> +	unsigned long vaddr;
> +
> +	uprobe_start_dup_mmap();

Please add a comment here that this function ups the mm ref count for each
info returned. Otherwise it's hard to know what that mmput() below matches.

-- Steve

> +	info = uprobe_build_map_info(tu->inode->i_mapping,
> +				     tu->ref_ctr_offset, false);
> +	if (IS_ERR(info))
> +		goto out;
> +
> +	while (info) {
> +		down_write(&info->mm->mmap_sem);
> +
> +		vma = sdt_find_vma(info->mm, tu);
> +		vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
> +		sdt_update_ref_ctr(info->mm, vaddr, 1);
> +
> +		up_write(&info->mm->mmap_sem);
> +		mmput(info->mm);
> +		info = uprobe_free_map_info(info);
> +	}
> +
> +out:
> +	uprobe_end_dup_mmap();
> +}
> +
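[Editor's note: putting the two documentation requests from this thread
together, the comments being asked for might look roughly like this. The
wording is a sketch, not the final text:

	/**
	 * uprobe_build_map_info() - find every mm that maps @offset of @mapping.
	 * @mapping:	address_space of the probed file.
	 * @offset:	file offset the caller is interested in.
	 * @is_register: true when called from the uprobe registration path.
	 *
	 * Returns a singly linked list of uprobe_map_info entries, one per
	 * mapping, or an ERR_PTR() on failure.  Each entry holds a reference
	 * on its ->mm: the caller must drop it with mmput() and free the
	 * entry with uprobe_free_map_info() when done.
	 */

and, inside sdt_increment_ref_ctr():

	/*
	 * uprobe_build_map_info() takes a reference on each info->mm;
	 * the mmput() at the bottom of the loop drops it.
	 */
	info = uprobe_build_map_info(tu->inode->i_mapping,
				     tu->ref_ctr_offset, false);
]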
rfc: remove print_vma_addr ? (was Re: [PATCH 00/16] remove eight obsolete architectures)
On Thu, 2018-03-15 at 10:48 +0100, Geert Uytterhoeven wrote: > Hi David, > > On Thu, Mar 15, 2018 at 10:42 AM, David Howells wrote: > > Do we have anything left that still implements NOMMU? > > Sure: arm, c6x, m68k, microblaze, and sh. I have a patchset that creates a vsprintf extension for print_vma_addr and removes all the uses similar to the print_symbol() removal. This now avoids any possible printk interleaving. Unfortunately, without some #ifdef in vsprintf, which I would like to avoid, it increases the nommu kernel size by ~500 bytes. Anyone think this is acceptable? Here's the overall patch, but I have it as a series --- Documentation/core-api/printk-formats.rst | 9 + arch/arm64/kernel/traps.c | 13 +++ arch/mips/mm/fault.c | 16 - arch/parisc/mm/fault.c| 15 arch/riscv/kernel/traps.c | 11 +++--- arch/s390/mm/fault.c | 7 ++-- arch/sparc/mm/fault_32.c | 8 ++--- arch/sparc/mm/fault_64.c | 8 ++--- arch/tile/kernel/signal.c | 9 ++--- arch/um/kernel/trap.c | 13 +++ arch/x86/kernel/signal.c | 10 ++ arch/x86/kernel/traps.c | 18 -- arch/x86/mm/fault.c | 12 +++ include/linux/mm.h| 1 - lib/vsprintf.c| 58 ++- mm/memory.c | 33 -- 16 files changed, 112 insertions(+), 129 deletions(-) diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst index 934559b3c130..10a91da1bc83 100644 --- a/Documentation/core-api/printk-formats.rst +++ b/Documentation/core-api/printk-formats.rst @@ -157,6 +157,15 @@ DMA address types dma_addr_t For printing a dma_addr_t type which can vary based on build options, regardless of the width of the CPU data path. +VMA name and address + + +:: + + %pav[hexstart+hexsize] or ?[0+0] if unavailable + +For any address, print the vma's name and its starting address and size + Passed by reference. Raw buffer as an escaped string diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index 2b478565d774..48edf812ce8b 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -242,13 +242,14 @@ void arm64_force_sig_info(struct siginfo *info, const char *str, if (!show_unhandled_signals_ratelimited()) goto send_sig; - pr_info("%s[%d]: unhandled exception: ", tsk->comm, task_pid_nr(tsk)); if (esr) - pr_cont("%s, ESR 0x%08x, ", esr_get_class_string(esr), esr); - - pr_cont("%s", str); - print_vma_addr(KERN_CONT " in ", regs->pc); - pr_cont("\n"); + pr_info("%s[%d]: unhandled exception: %s, ESR 0x%08x, %s in %pav\n", + tsk->comm, task_pid_nr(tsk), + esr_get_class_string(esr), esr, + str, ®s->pc); + else + pr_info("%s[%d]: unhandled exception: %s in %pav\n", + tsk->comm, task_pid_nr(tsk), str, ®s->pc); __show_regs(regs); send_sig: diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c index 4f8f5bf46977..ce7bf077a0f5 100644 --- a/arch/mips/mm/fault.c +++ b/arch/mips/mm/fault.c @@ -213,14 +213,14 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write, tsk->comm, write ? 
"write access to" : "read access from", field, address); - pr_info("epc = %0*lx in", field, - (unsigned long) regs->cp0_epc); - print_vma_addr(KERN_CONT " ", regs->cp0_epc); - pr_cont("\n"); - pr_info("ra = %0*lx in", field, - (unsigned long) regs->regs[31]); - print_vma_addr(KERN_CONT " ", regs->regs[31]); - pr_cont("\n"); + pr_info("epc = %0*lx in %pav\n", + field, + (unsigned long)regs->cp0_epc, + ®s->cp0_epc); + pr_info("ra = %0*lx in %pav\n", + field, + (unsigned long)regs->regs[31], + ®s->regs[31]); } current->thread.trap_nr = (regs->cp0_cause >> 2) & 0x1f; info.si_signo = SIGSEGV; diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c index e247edbca68e..877cea702714 100644 --- a/arch/parisc/mm/fault.c +++ b/arch/parisc/mm/fault.c @@ -240,17 +240,14 @@ show_signal_msg(struct pt_regs *regs, unsigned long code,
Re: rfc: remove print_vma_addr ? (was Re: [PATCH 00/16] remove eight obsolete architectures)
On Thu, Mar 15, 2018 at 09:56:46AM -0700, Joe Perches wrote:
> I have a patchset that creates a vsprintf extension for
> print_vma_addr and removes all the uses similar to the
> print_symbol() removal.
>
> This now avoids any possible printk interleaving.
>
> Unfortunately, without some #ifdef in vsprintf, which
> I would like to avoid, it increases the nommu kernel
> size by ~500 bytes.
>
> Anyone think this is acceptable?
>
> Here's the overall patch, but I have it as a series
> ---
>  Documentation/core-api/printk-formats.rst |  9 +
>  arch/arm64/kernel/traps.c                 | 13 +++
>  arch/mips/mm/fault.c                      | 16 -
>  arch/parisc/mm/fault.c                    | 15 
>  arch/riscv/kernel/traps.c                 | 11 +++---
>  arch/s390/mm/fault.c                      |  7 ++--
>  arch/sparc/mm/fault_32.c                  |  8 ++---
>  arch/sparc/mm/fault_64.c                  |  8 ++---
>  arch/tile/kernel/signal.c                 |  9 ++---
>  arch/um/kernel/trap.c                     | 13 +++
>  arch/x86/kernel/signal.c                  | 10 ++
>  arch/x86/kernel/traps.c                   | 18 --
>  arch/x86/mm/fault.c                       | 12 +++
>  include/linux/mm.h                        |  1 -
>  lib/vsprintf.c                            | 58 ++-
>  mm/memory.c                               | 33 --
>  16 files changed, 112 insertions(+), 129 deletions(-)

This doesn't feel like a huge win since it's only called ~once per
architecture. I'd be more excited if it made the printing of the whole
thing standardised; eg we have a print_fault() function in mm/memory.c
which takes a suitable set of arguments.
Re: rfc: remove print_vma_addr ? (was Re: [PATCH 00/16] remove eight obsolete architectures)
On Thu, 2018-03-15 at 10:08 -0700, Matthew Wilcox wrote:
> On Thu, Mar 15, 2018 at 09:56:46AM -0700, Joe Perches wrote:
> > I have a patchset that creates a vsprintf extension for
> > print_vma_addr and removes all the uses similar to the
> > print_symbol() removal.
> >
> > This now avoids any possible printk interleaving.
> >
> > Unfortunately, without some #ifdef in vsprintf, which
> > I would like to avoid, it increases the nommu kernel
> > size by ~500 bytes.
> >
> > Anyone think this is acceptable?
[]
> This doesn't feel like a huge win since it's only called ~once per
> architecture. I'd be more excited if it made the printing of the whole
> thing standardised; eg we have a print_fault() function in mm/memory.c
> which takes a suitable set of arguments.

Sure, but perhaps that's not feasible as the surrounding output is
per-arch specific. What could be a standardized fault message here?
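[Editor's note: for the sake of discussion, one possible shape for such a
shared helper, using the %pav extension proposed in this patchset. This is
illustrative only — which fields each architecture actually wants to print
is exactly the open question above, and the message format is invented:

	void print_fault(unsigned long addr, unsigned long ip, const char *reason)
	{
		/* mirror the usual "unhandled signal" gating and ratelimiting */
		if (!unhandled_signal(current, SIGSEGV) || !printk_ratelimit())
			return;

		pr_info("%s[%d]: %s at %#lx, ip %#lx in %pav\n",
			current->comm, task_pid_nr(current),
			reason, addr, ip, &ip);
	}

Each architecture's fault handler would then pass in its own reason string
("write access to", "unhandled exception", ...) instead of open-coding the
whole message.]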
Re: [PATCH] Documentation/CodingStyle: Add an example for braces
On 03/15/2018 05:26 AM, Jani Nikula wrote:
> On Wed, 14 Mar 2018, Gary R Hook wrote:
>> Add another example of required braces when using a compound statement in
>> a loop.
>>
>> Signed-off-by: Gary R Hook
>> ---
>>  Documentation/process/coding-style.rst | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
>> index a20b44a40ec4..d98deb62c400 100644
>> --- a/Documentation/process/coding-style.rst
>> +++ b/Documentation/process/coding-style.rst
>> @@ -200,6 +200,15 @@ statement; in the latter case use braces in both branches:
>>  		otherwise();
>>  	}
>>
>> +Also, use braces when a loop contains more than a single simple statement:
>
> Personally, I'd not limit this to loops.
>
>	if (condition) {
>		if (another_condition)
>			action();
>	}
>
> You could argue the existing rule already covers these cases by excluding
> selection and iteration statements from the "single statement" in "Do not
> unnecessarily use braces where a single statement will do."

Define "statement"? There's a school of thought that uses semicolons to
indicate a statement. I'm trying to eliminate any ambiguity by calling out
compound statements as "more than one statement". Sure, semantics, but in
the interest of clarity. An additional sentence and example doesn't really
cost much.

Thank you for your time. I've made some changes, and a v2 to follow shortly.
[PATCH v2] Documentation/CodingStyle: Add an example for braces
Add another example of required braces when using compound statements.

Signed-off-by: Gary R Hook
---

Changes since v1:
  - Move the new example up, and make it more generic

 Documentation/process/coding-style.rst | 9 +
 1 file changed, 9 insertions(+)

diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index a20b44a40ec4..fcef0b4b59d0 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -188,6 +188,15 @@ and
 	else
 		do_that();
 
+Do use braces when a body is more complex than a single simple statement:
+
+.. code-block:: c
+
+	if (condition) {
+		if (another_condition)
+			do_something();
+	}
+
 This does not apply if only one branch of a conditional statement is a single
 statement; in the latter case use braces in both branches:
Re: [PATCH v10 1/5] arm64: KVM: Prepare set virtual SEI syndrome value
Hi Dongjiu Geng, On 03/03/18 16:09, Dongjiu Geng wrote: > Export one API to specify virtual SEI syndrome value > for guest, and add a helper to get the VSESR_EL2 value. This patch adds two helpers that nothing calls... its not big, please merge it with the patch that uses these. > diff --git a/arch/arm64/include/asm/kvm_emulate.h > b/arch/arm64/include/asm/kvm_emulate.h > index 413dc82..3294885 100644 > --- a/arch/arm64/include/asm/kvm_emulate.h > +++ b/arch/arm64/include/asm/kvm_emulate.h > @@ -71,6 +71,11 @@ static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, > unsigned long hcr) > vcpu->arch.hcr_el2 = hcr; > } > > +static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu) > +{ > + return vcpu->arch.vsesr_el2; > +} > + > static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr) > { > vcpu->arch.vsesr_el2 = vsesr; > diff --git a/arch/arm64/include/asm/kvm_host.h > b/arch/arm64/include/asm/kvm_host.h > index a73f63a..3dc49b7 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -354,6 +354,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct > kvm_run *run, > int kvm_perf_init(void); > int kvm_perf_teardown(void); > > +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome); > + > struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr); > > static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c > index 60666a0..78ecb28 100644 > --- a/arch/arm64/kvm/inject_fault.c > +++ b/arch/arm64/kvm/inject_fault.c > @@ -186,3 +186,8 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu) > { > pend_guest_serror(vcpu, ESR_ELx_ISV); > } > + > +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome) > +{ > + pend_guest_serror(vcpu, syndrome & ESR_ELx_ISS_MASK); If you move the ISS_MASK into pend_guest_serror(), you wouldn't need this at all. It would be better if any validation were in the user-space helpers so we can check user-space hasn't put something funny in the top bits. > +} > Thanks, James -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 3/5] arm/arm64: KVM: Introduce set and get per-vcpu event
Hi Dongjiu Geng, On 03/03/18 16:09, Dongjiu Geng wrote: > RAS Extension provides VSESR_EL2 register to specify > virtual SError syndrome value, this patch adds a new > IOCTL to export user-invisible states related to > SError exceptions. User space can setup the > kvm_vcpu_events to inject specified SError, also it > can support live migration. > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index 8a3d708..26ae151 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -819,11 +819,13 @@ struct kvm_clock_data { > > Capability: KVM_CAP_VCPU_EVENTS > Extended by: KVM_CAP_INTR_SHADOW > -Architectures: x86 > +Architectures: x86, arm, arm64 > Type: vm ioctl > Parameters: struct kvm_vcpu_event (out) > Returns: 0 on success, -1 on error > > +X86: > + > Gets currently pending exceptions, interrupts, and NMIs as well as related > states of the vcpu. > > @@ -865,15 +867,29 @@ Only two fields are defined in the flags field: > - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that >smi contains a valid state. > > +ARM, ARM64: > + > +Gets currently pending SError exceptions as well as related states of the > vcpu. > + > +struct kvm_vcpu_events { > + struct { > + bool serror_pending; > + bool serror_has_esr; > + u64 serror_esr; > + } exception; > +}; Don't put bool in an ABI struct. The encoding is up to the compiler. The compiler will insert padding in this struct to make serror_esr naturally aligned. Different compilers may do it differently. You'll see that the existing struct kvm_vcpu_events has 'pad' fields to ensure each element in the struct is naturally aligned. serror_pending and serror_has_esr need to be in a flags field. I thought the logic for re-using the CAP was so user-space could re-use save/restore code to transfer whatever we put in here during migration. If the struct is a different size the code has to be different anyway. My understanding of Drew and Christoffer's comments was that we should re-use the existing struct. (but now that I look at it, its not so clear). (If we reuse the struct, we can put the esr in exception.error_code, if we can get away with it: It would be good to union exception up with a u64, then use that. This would let us transfer anything we need in those RES0 bits of the 64bit VSESR_EL2). > 4.32 KVM_SET_VCPU_EVENTS > > Capability: KVM_CAP_VCPU_EVENTS > Extended by: KVM_CAP_INTR_SHADOW > -Architectures: x86 > +Architectures: x86, arm, arm64 > Type: vm ioctl > Parameters: struct kvm_vcpu_event (in) > Returns: 0 on success, -1 on error > > +X86: > + > Set pending exceptions, interrupts, and NMIs as well as related states of the > vcpu. > > @@ -894,6 +910,12 @@ shall be written into the VCPU. > > KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available. > > +ARM, ARM64: > + > +Set pending SError exceptions as well as related states of the vcpu. > + > +See KVM_GET_VCPU_EVENTS for the data structure. 
> + > > 4.33 KVM_GET_DEBUGREGS > > diff --git a/arch/arm64/include/uapi/asm/kvm.h > b/arch/arm64/include/uapi/asm/kvm.h > index 9abbf30..32c0eae 100644 > --- a/arch/arm64/include/uapi/asm/kvm.h > +++ b/arch/arm64/include/uapi/asm/kvm.h > @@ -39,6 +39,7 @@ > #define __KVM_HAVE_GUEST_DEBUG > #define __KVM_HAVE_IRQ_LINE > #define __KVM_HAVE_READONLY_MEM > +#define __KVM_HAVE_VCPU_EVENTS > > #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 > > @@ -153,6 +154,15 @@ struct kvm_sync_regs { > struct kvm_arch_memory_slot { > }; > > +/* for KVM_GET/SET_VCPU_EVENTS */ > +struct kvm_vcpu_events { > + struct { > + bool serror_pending; > + bool serror_has_esr; > + u64 serror_esr; > + } exception; > +}; > + > /* If you need to interpret the index values, here is the key: */ > #define KVM_REG_ARM_COPROC_MASK 0x0FFF > #define KVM_REG_ARM_COPROC_SHIFT 16 > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c > index 5c7f657..62d49c2 100644 > --- a/arch/arm64/kvm/guest.c > +++ b/arch/arm64/kvm/guest.c > @@ -277,6 +277,32 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, > return -EINVAL; > } > > +int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu, > + struct kvm_vcpu_events *events) > +{ > + events->exception.serror_pending = (vcpu_get_hcr(vcpu) & HCR_VSE); > + events->exception.serror_has_esr = > + cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && > + (!!vcpu_get_vsesr(vcpu)); > + events->exception.serror_esr = vcpu_get_vsesr(vcpu); > + > + return 0; Nothing checks the return value. Why is it here? > +} > + > +int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu, > + struct kvm_vcpu_events *events) > +{ > + bool injected = events->exception.serror_pending; > + bool has_es
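[Editor's note: to make the point about flags and alignment concrete, one
possible layout — a sketch only. The flag names are invented here, and
whether to instead reuse the existing x86-style struct is the open question
raised above:

	struct kvm_vcpu_events {
		__u32 flags;		/* e.g. KVM_ARM_VCPU_EVENT_SERROR_PENDING,
					 * KVM_ARM_VCPU_EVENT_SERROR_HAS_ESR
					 * (hypothetical names) */
		__u32 pad;		/* keep serror_esr naturally aligned */
		__u64 serror_esr;	/* full 64 bits, room for future VSESR_EL2 bits */
		__u32 reserved[12];
	};

With this layout no compiler-inserted padding is needed, and both booleans
travel in the explicit flags word.]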
Re: [PATCH v10 2/5] arm64: KVM: export the capability to set guest SError syndrome
Hi Dongjiu Geng, On 03/03/18 16:09, Dongjiu Geng wrote: > Before user space injects a SError, it needs to know whether it can > specify the guest Exception Syndrome, so KVM should tell user space > whether it has such capability. > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index fc3ae95..8a3d708 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -4415,3 +4415,14 @@ Parameters: none > This capability indicates if the flic device will be able to get/set the > AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows > to discover this without having to create a flic device. > + > +8.14 KVM_CAP_ARM_SET_SERROR_ESR > + > +Architectures: arm, arm64 > + > +This capability indicates that userspace can specify syndrome value reported > to > +guest OS when guest takes a virtual SError interrupt exception. "when userspace triggers a virtual SError"... how? > +If KVM has this capability, userspace can only specify the ISS field for the > ESR > +syndrome, can not specify the EC field which is not under control by KVM. Where do I put the ESR? If you re-order this after the patch that adds the API, you can describe how this can be used. Thanks, James > +If this virtual SError is taken to EL1 using AArch64, this value will be > reported > +into ISS filed of ESR_EL1. > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index 3256b92..38c8a64 100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, > long ext) > case KVM_CAP_ARM_PMU_V3: > r = kvm_arm_support_pmu_v3(); > break; > + case KVM_CAP_ARM_INJECT_SERROR_ESR: > + r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN); > + break; > case KVM_CAP_SET_GUEST_DEBUG: > case KVM_CAP_VCPU_ATTRIBUTES: > r = 1; > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 8fb90a0..3587b33 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -934,6 +934,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_S390_AIS_MIGRATION 150 > #define KVM_CAP_PPC_GET_CPU_CHAR 151 > #define KVM_CAP_S390_BPB 152 > +#define KVM_CAP_ARM_INJECT_SERROR_ESR 153 > > #ifdef KVM_CAP_IRQ_ROUTING > > -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl
Hi gengdongjiu, On 08/03/18 06:18, gengdongjiu wrote: > Hi James, >sorry for my late response due to chines new year. Happy new year, > 2018-02-16 1:55 GMT+08:00 James Morse : >> On 12/02/18 10:19, gengdongjiu wrote: >>> On 2018/2/10 1:44, James Morse wrote: The point? We can't know what a CPU without the RAS extensions puts in there. Why Does this matter? When migrating a pending SError we have to know the difference between 'use this 64bit value', and 'the CPU will generate it'. If I make an SError pending with ESR=0 on a CPU with VSESR, I can't migrated to a system that generates an impdef SError-ESR, because I can't know it will be 0. >> >>> For the target system, before taking the SError, no one can know whether >>> its syndrome value >>> is IMPLEMENTATION DEFINED or architecturally defined. >> >> For a virtual-SError, the hypervisor knows what it generated. (do I have >> VSESR_EL2? What did I put in there?). >> >> >>> when the virtual SError is taken, the ESR_ELx.IDS will be updated, then we >>> can know >>> whether the ESR value is impdef or architecturally defined. >> >> True, the guest can't know anything about a pending virtual SError until it >> takes it. Why is this a problem? >> >> >>> It seems migration is only allowed only when target system and source >>> system all support >>> RAS extension, because we do not know whether its syndrome is >>> IMPLEMENTATION DEFINED or >>> architecturally defined. >> >> I don't think Qemu allows migration between hosts with differing guest-ID >> registers. But we shouldn't depend on this, and we may want to hide the v8.2 >> RAS >> features from the guest's ID register, but still use them from the host. >> >> The way I imagined it working was we would pack the following information >> into >> that events struct: >> { >> bool serror_pending; >> bool serror_has_esr; >> u64 serror_esr; >> } > > I have used your suggestion struct Ah! This is where it came from. Sorry, this was just to illustrate the information/sizes we wanted to transfer I didn't mean it literally. I should have said "64 bits of ESR, so that we can transfer anything that is added to VSESR_EL2 in the future, a flag somewhere to indicate an serror is pending, and another flag to indicate the ESR has a value we should use". Thanks/Sorry! James >> The problem I was trying to describe is because there is no value of >> serror_esr >> we can use to mean 'Ignore this, I'm a v8.0 CPU'. VSESR_EL2 is a 64bit >> register, >> any bits we abuse may get a meaning we want to use in the future. >> >> When it comes to migration, v8.{0,1} systems can only GET/SET events where >> serror_has_esr == false, they can't use the serror_esr. On v8.2 systems we >> should require serror_has_esr to be true. > yes, I agreed. > >> >> If we need to support migration from v8.{0,1} to v8.2, we can make up an >> impdef >> serror_esr. > > For the Qemu migration, I need to check more the QEMU code. > Hi Andrew, > I use KVM_GET/SET_VCPU_EVENTS IOCTL to migrate the Serror > exception status of VM, > The even struct is shown below: > > { > bool serror_pending; > bool serror_has_esr; > u64 serror_esr; > } > > Only when the target machine is armv8.2, it needs to set the > serror_esr(SError Exception Syndrome Register). > for the armv8.0, software can not set the serror_esr(SError Exception > Syndrome Register). > so when migration from v8.{0,1} to v8.2, QEMU should make up an impdef > serror_esr for the v8.2 target. > can you give me some suggestion how to set that register in the QEMU? 
> I do not familar with the QEMU migration.
> Thanks very much.
[PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2
In cgroup v1, there are a bunch of cpuset control knobs that are just boolean flags. To reduce the proliteration of cpuset control knobs in v2 where multiple controllers will be active in a single cgroup, all the boolean flags are now consolidated into a single cpuset.flags control knob. Currently, the cpuset.flags control knob just has one flag - sched_load_balance. This flag is needed to enable CPU isolation similar to what can be done with the "isolcpus" kernel boot parameter. More flags can be added in the future if desired. When the "cpuset.flags" file is read, it shows what flags have been currently enabled. Turning on or off specific flags are done in a way similar to what controllers can be enabled or disabled in the cgroup.subtree_control control knob. To enable the sched_load_balance flag, use # echo +sched_load_balance > cpuset.flags To disable it, use # echo -sched_load_balance > cpuset.flags Signed-off-by: Waiman Long --- Documentation/cgroup-v2.txt | 32 +++ kernel/cgroup/cpuset.c | 98 + 2 files changed, 130 insertions(+) diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt index b91fd5d..362026a 100644 --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -1318,6 +1318,38 @@ Cpuset Interface Files of "cpuset.mems". Its value will be affected by memory nodes hotplug events. + cpuset.flags + A read-write multiple values file which exists on non-root + cgroups. + + On read, it lists the flags that are currently enabled. On + write, the '+' or '-' prefix with the flag name is used to + enable and disable the flag respectively. + + The currently supported flag is: + + sched_load_balance + When it is not set, there will be no load balancing + among CPUs on this cpuset. Tasks will stay in the + CPUs they are running on and will not be moved to + other CPUs. + + When it is set, tasks within this cpuset will be + load-balanced by the kernel scheduler. Tasks will be + moved from CPUs with high load to other CPUs within + the same cpuset with less load periodically. + + This flag is on by default and will have to be + explicitly turned off. 
+ + To set a flag, use the '+' prefix: + + # echo +sched_load_balance > cpuset.flags + + To clear a flag, use the '-' prefix: + + # echo -sched_load_balance > cpuset.flags + IO -- diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 7837d1f..3145dc0 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1601,6 +1601,7 @@ static void cpuset_attach(struct cgroup_taskset *tset) FILE_MEMORY_PRESSURE, FILE_SPREAD_PAGE, FILE_SPREAD_SLAB, + FILE_FLAGS, } cpuset_filetype_t; static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft, @@ -1823,6 +1824,97 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft) return 0; } +static const struct { + char *name; + int cs_bit; +} cpuset_flags[] = { + { "sched_load_balance", CS_SCHED_LOAD_BALANCE }, +}; + +static int cpuset_read_flags(struct seq_file *sf, void *v) +{ + struct cpuset *cs = css_cs(seq_css(sf)); + unsigned long enabled = READ_ONCE(cs->flags); + int i, cnt; + + for (i = cnt = 0; i < ARRAY_SIZE(cpuset_flags); i++) { + if (!test_bit(cpuset_flags[i].cs_bit, &enabled)) + continue; + + if (cnt++) + seq_putc(sf, ' '); + + seq_printf(sf, "%s", cpuset_flags[i].name); + } + seq_putc(sf, '\n'); + return 0; +} + +static ssize_t cpuset_write_flags(struct kernfs_open_file *of, char *buf, + size_t nbytes, loff_t off) +{ + struct cpuset *cs = css_cs(of_css(of)); + unsigned long enable = 0, disable = 0; + char *tok; + int i; + + mutex_lock(&cpuset_mutex); + if (!is_cpuset_online(cs)) { + nbytes = -ENODEV; + goto out_unlock; + } + + /* +* Parse input - space seperated list of feature names prefixed +* with either + or -. +*/ + buf = strstrip(buf); + while ((tok = strsep(&buf, " "))) { + if (tok[0] == '\0') + continue; + for (i = 0; i < ARRAY_SIZE(cpuset_flags); i++) + if (!strcmp(tok + 1, cpuset_flags[i].name)) + break; + if (i == ARRAY_SIZE(cpuset_flags)) { + nbytes = -EINVAL; + goto out_unlock; + } + if (*tok == '+') { + set_bit(cpuset_flags[i].cs_bit, &cs->flags); +
[PATCH v5 0/2] cpuset: Enable cpuset controller in default hierarchy
v5:
 - Add patch 2 to provide the cpuset.flags control knob for the
   sched_load_balance flag, which should be the only feature that is
   essential as a replacement for the "isolcpus" kernel boot parameter.

v4:
 - Further minimize the feature set by removing the flags control knob.

v3:
 - Further trim the additional features down to just memory_migrate.
 - Update Documentation/cgroup-v2.txt.

The purpose of this patchset is to provide a minimal set of cpuset
features for cgroup v2. That minimal set includes cpus, mems and their
effective_* counterparts as well as a new flags control knob that
currently supports only the sched_load_balance flag. This patchset does
not exclude the possibility of adding more flags and features in the
future after careful consideration.

Patch 1 enables cpuset in cgroup v2 with cpus, mems and their
effective_* counterparts.

Patch 2 adds the flags control knob with support for sched_load_balance
only.

Waiman Long (2):
  cpuset: Enable cpuset controller in default hierarchy
  cpuset: Add cpuset.flags control knob to v2

 Documentation/cgroup-v2.txt | 128
 kernel/cgroup/cpuset.c      | 140 +++-
 2 files changed, 256 insertions(+), 12 deletions(-)

--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
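As a rough usage sketch of the minimal feature set this cover letter
describes, assuming a kernel with these patches applied, the v2
hierarchy mounted at /sys/fs/cgroup, and CPUs 0-3 plus memory node 0
online (all of which are illustrative assumptions, not part of the
patchset):

  # echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
  # mkdir /sys/fs/cgroup/test
  # echo 0-3 > /sys/fs/cgroup/test/cpuset.cpus
  # echo 0 > /sys/fs/cgroup/test/cpuset.mems
  # cat /sys/fs/cgroup/test/cpuset.effective_cpus
  0-3

With patch 2 applied as well, the cpuset.flags file would then also
appear in /sys/fs/cgroup/test.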
[PATCH v5 1/2] cpuset: Enable cpuset controller in default hierarchy
Now that thread mode has been merged into 4.14, it is time to enable
cpuset to be used in the default hierarchy (cgroup v2), as it is clearly
threaded.

The cpuset controller has experienced feature creep since its
introduction more than a decade ago. Besides the core cpus and mems
control files to limit cpus and memory nodes, there are a bunch of
additional features that can be controlled from userspace. Some of
the features are of doubtful usefulness and may not be actively used.

This patch enables the cpuset controller in the default hierarchy with
a minimal set of features, namely just the cpus and mems and their
effective_* counterparts. We can certainly add more features to the
default hierarchy in the future if there is a real user need for them
later on.

Alternatively, with the unified hierarchy, it may make more sense to
move some of those additional cpuset features, if desired, to the
memory controller or maybe to the cpu controller instead of staying
with cpuset.

Signed-off-by: Waiman Long
---
 Documentation/cgroup-v2.txt | 96 -
 kernel/cgroup/cpuset.c      | 44 +++--
 2 files changed, 127 insertions(+), 13 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 74cdeae..b91fd5d 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -48,16 +48,18 @@ v1 is available under Documentation/cgroup-v1/.
        5-2-1. Memory Interface Files
        5-2-2. Usage Guidelines
        5-2-3. Memory Ownership
-     5-3. IO
-       5-3-1. IO Interface Files
-       5-3-2. Writeback
-     5-4. PID
-       5-4-1. PID Interface Files
-     5-5. Device
-     5-6. RDMA
-       5-6-1. RDMA Interface Files
-     5-7. Misc
-       5-7-1. perf_event
+     5-3. Cpuset
+       5-3-1. Cpuset Interface Files
+     5-4. IO
+       5-4-1. IO Interface Files
+       5-4-2. Writeback
+     5-5. PID
+       5-5-1. PID Interface Files
+     5-6. Device
+     5-7. RDMA
+       5-7-1. RDMA Interface Files
+     5-8. Misc
+       5-8-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -1243,6 +1245,80 @@ POSIX_FADV_DONTNEED to relinquish the ownership of memory areas belonging
 to the affected files to ensure correct memory ownership.
 
 
+Cpuset
+------
+
+The "cpuset" controller provides a mechanism for constraining
+the CPU and memory node placement of tasks to only the resources
+specified in the cpuset interface files in a task's current cgroup.
+This is especially valuable on large NUMA systems where placing jobs
+on properly sized subsets of the system with careful processor and
+memory placement to reduce cross-node memory access and contention
+can improve overall system performance.
+
+The "cpuset" controller is hierarchical. That means a cgroup cannot
+use CPUs or memory nodes that are not allowed in its parent.
+
+
+Cpuset Interface Files
+~~~~~~~~~~~~~~~~~~~~~~
+
+  cpuset.cpus
+	A read-write multiple values file which exists on non-root
+	cgroups.
+
+	It lists the CPUs allowed to be used by tasks within this
+	cgroup. The CPU numbers are comma-separated numbers or
+	ranges. For example:
+
+	  # cat cpuset.cpus
+	  0-4,6,8-10
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.cpus" or all the available CPUs if none is found.
+
+	The value of "cpuset.cpus" stays constant until the next update
+	and won't be affected by any CPU hotplug events.
+
+  cpuset.effective_cpus
+	A read-only multiple values file which exists on non-root
+	cgroups.
+
+	It lists the onlined CPUs that are actually allowed to be
+	used by tasks within the current cgroup. It is a subset of
+	"cpuset.cpus". Its value will be affected by CPU hotplug
+	events.
+
+  cpuset.mems
+	A read-write multiple values file which exists on non-root
+	cgroups.
+
+	It lists the memory nodes allowed to be used by tasks within
+	this cgroup. The memory node numbers are comma-separated
+	numbers or ranges. For example:
+
+	  # cat cpuset.mems
+	  0-1,3
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.mems" or all the available memory nodes if none
+	is found.
+
+	The value of "cpuset.mems" stays constant until the next update
+	and won't be affected by any memory nodes hotplug events.
+
+  cpuset.effective_mems
+	A read-only multiple values file which exists on non-root
+	cgroups.
+
+	It lists the onlined memory nodes that are actually allowed
+	to be used by tasks within the current cgroup. It is a subset
+	of "cpuset.mems". Its value will be affected by memory nodes
+	hotplug events.
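The "empty value means inherit from the nearest ancestor" rule described
above can be seen by creating a nested cgroup and reading its files. A
hedged sketch, reusing the assumed /sys/fs/cgroup/test cgroup from the
earlier example (cpuset.cpus set to 0-3, all of them online) and
assuming cpuset has been enabled in its cgroup.subtree_control:

  # echo +cpuset > /sys/fs/cgroup/test/cgroup.subtree_control
  # mkdir /sys/fs/cgroup/test/child
  # cat /sys/fs/cgroup/test/child/cpuset.cpus

  # cat /sys/fs/cgroup/test/child/cpuset.effective_cpus
  0-3

The first read returns an empty value because the child has no explicit
setting of its own; the effective file reflects the nearest ancestor's
"0-3" and would shrink if any of those CPUs were taken offline.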
Re: [PATCH 00/16] remove eight obsolete architectures
Hi,

On Thu, Mar 15, 2018 at 10:56:48AM +0100, Arnd Bergmann wrote:
> On Thu, Mar 15, 2018 at 10:42 AM, David Howells wrote:
> > Do we have anything left that still implements NOMMU?

Please don't kill !MMU.

> Yes, plenty.

> I've made an overview of the remaining architectures for my own reference[1].
> The remaining NOMMU architectures are:
>
> - arch/arm has ARMv7-M (Cortex-M microcontroller), which is actually
>   gaining traction

ARMv7-R as well, and it seems ARM is coming up with more !MMU cores -
v8-M, v8-R. In addition, though only of academic interest, ARM
MMU-capable platforms can run !MMU Linux.

afzal

> - arch/sh has an open-source J2 core that was added not that long ago,
>   it seems to be the only SH compatible core that anyone is working on.
> - arch/microblaze supports both MMU/NOMMU modes (most use an MMU)
> - arch/m68k supports several NOMMU targets, both the coldfire SoCs and the
>   classic processors
> - c6x has no MMU
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html