Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()
On 14.04.2018 00:14, Andrew Morton wrote:
> On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov wrote:
>
>> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
>> In this case we may have 0 dentries to dispose, so we will never
>> schedule out while waiting for the parallel shrink_dentry_list to
>> complete.
>>
>> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
>
> Well I guess the patch is OK as a stopgap, but things seem fairly
> messed up in there. shrink_dcache_parent() shouldn't be doing a
> busywait, waiting for the concurrent shrink_dentry_list().
>
> Either we should be waiting (sleeping) for the concurrent operation to
> complete or we should just bail out of shrink_dcache_parent(), perhaps
> with
>
>	if (list_empty(&data.dispose))
>		break;
>
> or similar. Dunno.

I agree, however, not being a dcache expert I'd refrain from touching
it, since it seems to be rather fragile. Perhaps Al could take a look in
there?

> That block comment over `struct select_data' is not a good one. "It
> returns zero iff...". *What* returns zero? select_collect()? No it
> doesn't, it returns an `enum d_walk_ret'. Perhaps the comment is
> trying to refer to select_data.found. And the real interpretation of
> select_data.found is, umm, hard to describe. "Counts the number of
> dentries which are on a shrink list or which were moved to the dispose
> list". Why? What's that all about?
>
> This code needs a bit of thought, documentation and perhaps a redo,
> I suspect.
Re: INFO: rcu detected stall in shrink_dcache_parent
Infinite loop inside shrink_dcache_parent() due to lack of cond_resched().
I can reproduce this issue by running the reproducer on one CPU (using
"taskset -c 0"). Reverting commit 32785c0539b7e96f ("fs/dcache.c: add
cond_resched() in shrink_dentry_list()") solves this issue.

#syz dup: INFO: rcu detected stall in d_walk
Re: INFO: rcu detected stall in vfs_rmdir
Infinite loop inside shrink_dcache_parent() due to lack of cond_resched().
I can reproduce this issue by running the reproducer on one CPU (using
"taskset -c 0"). Reverting commit 32785c0539b7e96f ("fs/dcache.c: add
cond_resched() in shrink_dentry_list()") solves this issue.

#syz dup: INFO: rcu detected stall in d_walk
Re: INFO: rcu detected stall in do_raw_spin_unlock
Infinite loop inside shrink_dcache_parent() due to lack of cond_resched().
I can reproduce this issue by running the reproducer on one CPU (using
"taskset -c 0"). Reverting commit 32785c0539b7e96f ("fs/dcache.c: add
cond_resched() in shrink_dentry_list()") solves this issue.

#syz dup: INFO: rcu detected stall in d_walk
Re: INFO: rcu detected stall in _raw_spin_unlock
Infinite loop inside shrink_dcache_parent() due to lack of cond_resched().
I can reproduce this issue by running the reproducer on one CPU (using
"taskset -c 0"). Reverting commit 32785c0539b7e96f ("fs/dcache.c: add
cond_resched() in shrink_dentry_list()") solves this issue.

#syz dup: INFO: rcu detected stall in d_walk
Re: [PATCH] fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()
On Sat, Apr 14, 2018 at 01:16:58AM -0500, Zev Weiss wrote:
> It's a fairly inconsequential bug, since fdput() won't actually try to
> fput() the file due to fd.flags (and thus FDPUT_FPUT) being zero in
> the failure case, but most other vfs code takes steps to avoid this.

Applied.
Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()
On Sat, Apr 14, 2018 at 10:00:29AM +0300, Nikolay Borisov wrote:
> On 14.04.2018 00:14, Andrew Morton wrote:
> > On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov wrote:
> >
> >> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
> >> In this case we may have 0 dentries to dispose, so we will never
> >> schedule out while waiting for the parallel shrink_dentry_list to
> >> complete.
> >>
> >> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
> >
> > Well I guess the patch is OK as a stopgap, but things seem fairly
> > messed up in there. shrink_dcache_parent() shouldn't be doing a
> > busywait, waiting for the concurrent shrink_dentry_list().
> >
> > Either we should be waiting (sleeping) for the concurrent operation to
> > complete or we should just bail out of shrink_dcache_parent(), perhaps
> > with
> >
> >	if (list_empty(&data.dispose))
> >		break;
> >
> > or similar. Dunno.
>
> I agree, however, not being a dcache expert I'd refrain from touching
> it, since it seems to be rather fragile. Perhaps Al could take a look in
> there?

"Bail out" is definitely a bad idea, "sleep"... what on? Especially since
there might be several evictions we are overlapping with...
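For readers following the thread, here is a minimal user-space sketch of the stall being discussed. It is a toy model, not the code in fs/dcache.c: every toy_* name is made up for illustration, sched_yield() stands in for cond_resched(), and the assumption that the parallel shrinker retires one dentry each time we give up the CPU is purely illustrative. It only shows why a loop that finds dentries on somebody else's shrink list, but has nothing of its own to dispose, needs an explicit scheduling point.

```c
#include <sched.h>

/*
 * Toy model only: none of these names exist in fs/dcache.c.
 * owned_by_other counts dentries sitting on a parallel shrinker's list.
 */
struct toy_tree {
	int owned_by_other;
};

/*
 * Stand-in for d_walk() with select_collect(): reports how many dentries
 * are on some shrink list. In this model none of them are ours to dispose.
 */
static int toy_walk_found(const struct toy_tree *t)
{
	return t->owned_by_other;
}

/*
 * Stand-in for cond_resched(): give up the CPU. The model assumes the
 * parallel shrinker retires one dentry each time it gets to run.
 */
static void toy_cond_resched(struct toy_tree *t)
{
	sched_yield();
	if (t->owned_by_other > 0)
		t->owned_by_other--;
}

/* Shape of the loop under discussion. Returns how many times we yielded. */
int toy_shrink_parent(struct toy_tree *t)
{
	int yields = 0;

	for (;;) {
		if (!toy_walk_found(t))
			break;
		/*
		 * Our own dispose list is empty here, so no other call in
		 * the loop body would schedule. Without this yield, the
		 * loop would spin forever on a single CPU.
		 */
		toy_cond_resched(t);
		yields++;
	}
	return yields;
}
```

With three dentries owned by the parallel shrinker, the loop yields three times and then exits. Removing the toy_cond_resched() call models the single-CPU hang reported by syzbot. Whether yielding, sleeping or something else is the right long-term answer is exactly the open question in the thread above.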
Re: [PATCH v1 1/1] usb: core: Add quirk for HP v222w 16GB Mini
Hello!

On 4/13/2018 8:40 PM, sathyanarayanan.kuppusw...@linux.intel.com wrote:

From: Kamil Lulko

Add DELAY_INIT quirk to fix the following problem with HP
v222w 16GB Mini:

usb 1-3: unable to read config index 0 descriptor/start: -110
usb 1-3: can't read configurations, error -110
usb 1-3: can't set config #1, error -110

Signed-off-by: Kamil Lulko
Signed-off-by: Kuppuswamy Sathyanarayanan
---
 drivers/usb/core/quirks.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index 54b019e..f2ef913 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -40,6 +40,9 @@ static const struct usb_device_id usb_quirk_list[] = {
 	{ USB_DEVICE(0x03f0, 0x0701), .driver_info = USB_QUIRK_STRING_FETCH_255 },

+        /* HP v222w 16GB Mini USB Drive */
+        { USB_DEVICE(0x03f0, 0x3f40), .driver_info = USB_QUIRK_DELAY_INIT },
+

   Please indent with tabs (as above and below), not spaces.

 	/* Creative SB Audigy 2 NX */
 	{ USB_DEVICE(0x041e, 0x3020), .driver_info = USB_QUIRK_RESET_RESUME },

MBR, Sergei
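As a rough illustration of what an entry in such a quirk table does, the sketch below does a lookup by vendor and product ID in user space. It is not the kernel's implementation: the toy_* names and the flag bit values are assumptions made for this example, and only the three device IDs are taken from the quoted diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; the names and values are assumptions. */
#define TOY_QUIRK_STRING_FETCH_255	(1u << 0)
#define TOY_QUIRK_RESET_RESUME		(1u << 1)
#define TOY_QUIRK_DELAY_INIT		(1u << 6)

struct toy_quirk_entry {
	uint16_t vendor;
	uint16_t product;
	unsigned int quirks;
};

/* Device IDs as they appear in the quoted patch context. */
static const struct toy_quirk_entry toy_quirk_list[] = {
	{ 0x03f0, 0x0701, TOY_QUIRK_STRING_FETCH_255 },
	/* HP v222w 16GB Mini USB Drive */
	{ 0x03f0, 0x3f40, TOY_QUIRK_DELAY_INIT },
	/* Creative SB Audigy 2 NX */
	{ 0x041e, 0x3020, TOY_QUIRK_RESET_RESUME },
};

/* Return the quirk flags for a device, or 0 if it is not in the table. */
unsigned int toy_lookup_quirks(uint16_t vendor, uint16_t product)
{
	size_t i;

	for (i = 0; i < sizeof(toy_quirk_list) / sizeof(toy_quirk_list[0]); i++) {
		if (toy_quirk_list[i].vendor == vendor &&
		    toy_quirk_list[i].product == product)
			return toy_quirk_list[i].quirks;
	}
	return 0;
}
```

A caller would test the returned mask, for example `toy_lookup_quirks(0x03f0, 0x3f40) & TOY_QUIRK_DELAY_INIT`, before deciding whether to delay device initialization.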
[PATCH v3] net: davicom: dm9000: Avoid spinlock recursion during dm9000_timeout routine
On the DM9000B, dm9000_phy_write() is called after the main spinlock is held,
during the dm9000_timeout() routine. Spinlock recursion occurs because the
main spinlock is requested again in dm9000_phy_write(). So spinlock should
be avoided in phy operation during the dm9000_timeout() routine.

---
v3:
  When a task enters dm9000_timeout() and gets the main spinlock,
  another task that wants to do asynchronous phy operation must be running
  on another cpu. Because of different cpus, this asynchronous task will be
  blocked in dm9000_phy_write() until dm9000_timeout() routine is completed.
---

Signed-off-by: Liu Xiang
---
 drivers/net/ethernet/davicom/dm9000.c | 39 +--
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/davicom/dm9000.c b/drivers/net/ethernet/davicom/dm9000.c
index 50222b7..56df77d 100644
--- a/drivers/net/ethernet/davicom/dm9000.c
+++ b/drivers/net/ethernet/davicom/dm9000.c
@@ -112,7 +112,7 @@ struct board_info {
 	u8		imr_all;

 	unsigned int	flags;
-	unsigned int	in_timeout:1;
+	int		timeout_cpu;
 	unsigned int	in_suspend:1;
 	unsigned int	wake_supported:1;

@@ -158,6 +158,17 @@ static inline struct board_info *to_dm9000_board(struct net_device *dev)
 	return netdev_priv(dev);
 }

+static bool dm9000_current_in_timeout(struct board_info *db)
+{
+	bool ret = false;
+
+	preempt_disable();
+	ret = (db->timeout_cpu == smp_processor_id());
+	preempt_enable();
+
+	return ret;
+}
+
 /* DM9000 network board routine */

 /*
@@ -276,7 +287,7 @@ static void dm9000_dumpblk_32bit(void __iomem *reg, int count)
  */
 static void dm9000_msleep(struct board_info *db, unsigned int ms)
 {
-	if (db->in_suspend || db->in_timeout)
+	if (db->in_suspend || dm9000_current_in_timeout(db))
 		mdelay(ms);
 	else
 		msleep(ms);
@@ -335,12 +346,13 @@ static void dm9000_msleep(struct board_info *db, unsigned int ms)
 	struct board_info *db = netdev_priv(dev);
 	unsigned long flags;
 	unsigned long reg_save;
+	bool in_timeout = dm9000_current_in_timeout(db);

 	dm9000_dbg(db, 5, "phy_write[%02x] = %04x\n", reg, value);
-	if (!db->in_timeout)
+	if (!in_timeout) {
 		mutex_lock(&db->addr_lock);
-
-	spin_lock_irqsave(&db->lock, flags);
+		spin_lock_irqsave(&db->lock, flags);
+	}

 	/* Save previous register address */
 	reg_save = readb(db->io_addr);
@@ -356,11 +368,13 @@ static void dm9000_msleep(struct board_info *db, unsigned int ms)
 	iow(db, DM9000_EPCR, EPCR_EPOS | EPCR_ERPRW);
 	writeb(reg_save, db->io_addr);

-	spin_unlock_irqrestore(&db->lock, flags);
+	if (!in_timeout)
+		spin_unlock_irqrestore(&db->lock, flags);

 	dm9000_msleep(db, 1);		/* Wait write complete */

-	spin_lock_irqsave(&db->lock, flags);
+	if (!in_timeout)
+		spin_lock_irqsave(&db->lock, flags);
 	reg_save = readb(db->io_addr);

 	iow(db, DM9000_EPCR, 0x0);	/* Clear phyxcer write command */
@@ -368,9 +382,10 @@ static void dm9000_msleep(struct board_info *db, unsigned int ms)
 	/* restore the previous address */
 	writeb(reg_save, db->io_addr);

-	spin_unlock_irqrestore(&db->lock, flags);
-	if (!db->in_timeout)
+	if (!in_timeout) {
+		spin_unlock_irqrestore(&db->lock, flags);
 		mutex_unlock(&db->addr_lock);
+	}
 }

 /* dm9000_set_io
@@ -980,7 +995,7 @@ static void dm9000_timeout(struct net_device *dev)

 	/* Save previous register address */
 	spin_lock_irqsave(&db->lock, flags);
-	db->in_timeout = 1;
+	db->timeout_cpu = smp_processor_id();
 	reg_save = readb(db->io_addr);

 	netif_stop_queue(dev);
@@ -992,7 +1007,7 @@ static void dm9000_timeout(struct net_device *dev)

 	/* Restore previous register address */
 	writeb(reg_save, db->io_addr);
-	db->in_timeout = 0;
+	db->timeout_cpu = -1;
 	spin_unlock_irqrestore(&db->lock, flags);
 }

@@ -1670,6 +1685,8 @@ static struct dm9000_plat_data *dm9000_parse_dt(struct device *dev)
 	db->mii.mdio_read    = dm9000_phy_read;
 	db->mii.mdio_write   = dm9000_phy_write;

+	db->timeout_cpu = -1;
+
 	mac_src = "eeprom";

 	/* try reading the node address from the attached EEPROM */
-- 
1.9.1
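The idea in this patch can be modelled in user space as follows. This is only a sketch under stated assumptions: the toy_* names are hypothetical, CPUs are plain integers passed by the caller, and the "spinlock" is a flag whose failed acquisition stands for the point where a real spinlock would spin forever. It is meant to show why recording which CPU runs the timeout handler lets the nested phy write skip the lock, while a caller on another CPU still has to wait.

```c
#include <stdbool.h>

/* Toy model of the driver state; not the real struct board_info. */
struct toy_board {
	bool lock_held;		/* models db->lock */
	int  timeout_cpu;	/* -1 when no timeout handler is running */
	int  phy_writes;	/* number of completed phy writes */
};

/*
 * Returns false where a real, non-recursive spinlock would never be
 * acquired (the lock is already held).
 */
static bool toy_lock(struct toy_board *db)
{
	if (db->lock_held)
		return false;
	db->lock_held = true;
	return true;
}

static void toy_unlock(struct toy_board *db)
{
	db->lock_held = false;
}

/* Models dm9000_phy_write(): skip the lock only on the timeout CPU. */
bool toy_phy_write(struct toy_board *db, int cpu)
{
	bool in_timeout = (db->timeout_cpu == cpu);

	if (!in_timeout && !toy_lock(db))
		return false;	/* a real caller would block here */

	db->phy_writes++;

	if (!in_timeout)
		toy_unlock(db);
	return true;
}

/* Models dm9000_timeout(): takes the lock, then writes to the phy. */
bool toy_timeout(struct toy_board *db, int cpu)
{
	bool ok;

	if (!toy_lock(db))
		return false;
	db->timeout_cpu = cpu;
	ok = toy_phy_write(db, cpu);	/* nested call, lock already held */
	db->timeout_cpu = -1;
	toy_unlock(db);
	return ok;
}
```

In this model a timeout on CPU 0 completes its nested phy write without touching the lock again, while a phy write attempted from CPU 1 during the timeout fails to take the lock, which corresponds to the blocking behaviour described in the v3 note.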
kernel-4.9.94 compile error: 'KMOD_DECOMP_LEN' undeclared
Hi,

Compiling linux-4.9.94 fails with an error about KMOD_DECOMP_LEN being
undeclared.

Searching for KMOD_DECOMP_LEN in the linux-4.9.94 and linux-4.15.17 sources
gives the following:

sh-4.2# grep -r KMOD_DECOMP_LEN ./linux-4.15.17
./linux-4.15.17/tools/perf/tests/code-reading.c: char decomp_name[KMOD_DECOMP_LEN];
./linux-4.15.17/tools/perf/util/dso.h:#define KMOD_DECOMP_LEN sizeof(KMOD_DECOMP_NAME)
./linux-4.15.17/tools/perf/util/annotate.c: char tmp[KMOD_DECOMP_LEN];
./linux-4.15.17/tools/perf/util/dso.c: char newpath[KMOD_DECOMP_LEN];

sh-4.2# grep -r KMOD_DECOMP_LEN ./linux-4.9.94
./linux-4.9.94/tools/perf/tests/code-reading.c: char decomp_name[KMOD_DECOMP_LEN];
./linux-4.9.94/tools/perf/util/dso.c: char newpath[KMOD_DECOMP_LEN];

So I guess linux-4.9.94 is missing the definition of KMOD_DECOMP_LEN in
tools/perf/util/dso.h?

Thanks.

Regards,
Giam Teck Choon
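To make the grep result concrete, the fragment below sketches what the 4.15.17 definition expresses: a buffer length derived from a template string with sizeof. The KMOD_DECOMP_LEN line matches the grep output above. The value given for KMOD_DECOMP_NAME is an assumption for illustration (the grep output does not show it), and toy_decomp_buf_size() is a made-up helper. This is not a proposed fix for 4.9.94, which would more likely come from a stable backport.

```c
#include <stddef.h>

/* Assumed template string; the grep output above does not show its value. */
#define KMOD_DECOMP_NAME	"/tmp/perf-kmod-XXXXXX"
/* As shown in the 4.15.17 grep output for tools/perf/util/dso.h. */
#define KMOD_DECOMP_LEN		sizeof(KMOD_DECOMP_NAME)

/*
 * Hypothetical helper: a buffer declared the way dso.c declares newpath
 * has room for the template plus its terminating NUL byte.
 */
size_t toy_decomp_buf_size(void)
{
	char newpath[KMOD_DECOMP_LEN];

	return sizeof(newpath);
}
```

Without the two #define lines, the array declaration fails to compile with an "undeclared" error of the kind reported here.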
INFO: rcu detected stall in shrink_dentry_list
Hello,

syzbot hit the following crash on upstream commit
16e205cf42da1f497b10a4a24f563e6c0d574eec (Fri Apr 13 03:56:10 2018 +)
Merge tag 'drm-fixes-for-v4.17-rc1' of git://people.freedesktop.org/~airlied/linux
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=9275da3e0f734e102b61

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=4692036947017728
Kernel config: https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
compiler: gcc (GCC) 8.0.1 20180301 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9275da3e0f734e102...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

INFO: rcu_sched self-detected stall on CPU
 1-...!: (124995 ticks this GP) idle=b86/1/4611686018427387906 softirq=32196/32196 fqs=3
 (t=125000 jiffies g=16751 c=16750 q=347)
rcu_sched kthread starved for 124987 jiffies! g16751 c16750 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
RCU grace-period kthread stack dump:
rcu_sched R running task23544 9 2 0x8000
Call Trace:
 context_switch kernel/sched/core.c:2848 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3490
 schedule+0xef/0x430 kernel/sched/core.c:3549
 schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
 rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
 kthread+0x345/0x410 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
NMI backtrace for cpu 1
CPU: 1 PID: 4559 Comm: syz-executor6 Not tainted 4.16.0+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
 nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
 rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
 print_cpu_stall kernel/rcu/tree.c:1525 [inline]
 check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
 __rcu_pending kernel/rcu/tree.c:3356 [inline]
 rcu_pending kernel/rcu/tree.c:3401 [inline]
 rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
 update_process_times+0x2d/0x70 kernel/time/timer.c:1636
 tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
 tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
 __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
 __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
 hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
 smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
RIP: 0010:__sanitizer_cov_trace_pc+0x14/0x50 kernel/kcov.c:94
RSP: 0018:88018e2e7a80 EFLAGS: 0246 ORIG_RAX: ff13
RAX: 88018e2de680 RBX: 88018e2e7bf8 RCX: 81c2b1d9
RDX: RSI: 81c26bf3 RDI: 88018e2e7bf8
RBP: 88018e2e7a80 R08: 88018e2de680 R09: ed003b51c378
R10: ed003b51c378 R11: 8801da8e1bc3 R12: 88018e2e7c30
R13: dc00 R14: 110031c5cf7e R15: ed0031c5cf81
 shrink_dentry_list+0x5a8/0x7c0 fs/dcache.c:1087
 shrink_dcache_parent+0xba/0x230 fs/dcache.c:1490
 vfs_rmdir+0x202/0x470 fs/namei.c:3850
 do_rmdir+0x523/0x610 fs/namei.c:3911
 SYSC_rmdir fs/namei.c:3929 [inline]
 SyS_rmdir+0x1a/0x20 fs/namei.c:3927
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455087
RSP: 002b:7fff8b6b76b8 EFLAGS: 0206 ORIG_RAX: 0054
RAX: ffda RBX: 0065 RCX: 00455087
RDX: RSI: 7fff8b6b9460 RDI: 7fff8b6b9460
RBP: 7fff8b6b9460 R08: R09: 0001
R10: 000a R11: 0206 R12: 02768940
R13: R14: 01ec R15: 0001984e

---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug report.
Note: all commands must start from beginning of the line in the email body.
Re: INFO: rcu detected stall in shrink_dentry_list
On Sat, Apr 14, 2018 at 11:43 AM, syzbot wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 16e205cf42da1f497b10a4a24f563e6c0d574eec (Fri Apr 13 03:56:10 2018 +)
> Merge tag 'drm-fixes-for-v4.17-rc1' of
> git://people.freedesktop.org/~airlied/linux
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=9275da3e0f734e102b61
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=4692036947017728
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
> compiler: gcc (GCC) 8.0.1 20180301 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+9275da3e0f734e102...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.

#syz dup: INFO: rcu detected stall in d_walk

> INFO: rcu_sched self-detected stall on CPU
> 1-...!: (124995 ticks this GP) idle=b86/1/4611686018427387906
> softirq=32196/32196 fqs=3
> (t=125000 jiffies g=16751 c=16750 q=347)
> rcu_sched kthread starved for 124987 jiffies! g16751 c16750 f0x0
> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
> RCU grace-period kthread stack dump:
> rcu_sched R running task23544 9 2 0x8000
> Call Trace:
> context_switch kernel/sched/core.c:2848 [inline]
> __schedule+0x801/0x1e30 kernel/sched/core.c:3490
> schedule+0xef/0x430 kernel/sched/core.c:3549
> schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
> rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
> kthread+0x345/0x410 kernel/kthread.c:238
> ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
> NMI backtrace for cpu 1
> CPU: 1 PID: 4559 Comm: syz-executor6 Not tainted 4.16.0+ #2
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x1b9/0x294 lib/dump_stack.c:113
> nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
> nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
> arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
> trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
> rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
> print_cpu_stall kernel/rcu/tree.c:1525 [inline]
> check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
> __rcu_pending kernel/rcu/tree.c:3356 [inline]
> rcu_pending kernel/rcu/tree.c:3401 [inline]
> rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
> update_process_times+0x2d/0x70 kernel/time/timer.c:1636
> tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
> tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
> __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
> __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
> hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
> smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>
> RIP: 0010:__sanitizer_cov_trace_pc+0x14/0x50 kernel/kcov.c:94
> RSP: 0018:88018e2e7a80 EFLAGS: 0246 ORIG_RAX: ff13
> RAX: 88018e2de680 RBX: 88018e2e7bf8 RCX: 81c2b1d9
> RDX: RSI: 81c26bf3 RDI: 88018e2e7bf8
> RBP: 88018e2e7a80 R08: 88018e2de680 R09: ed003b51c378
> R10: ed003b51c378 R11: 8801da8e1bc3 R12: 88018e2e7c30
> R13: dc00 R14: 110031c5cf7e R15: ed0031c5cf81
> shrink_dentry_list+0x5a8/0x7c0 fs/dcache.c:1087
> shrink_dcache_parent+0xba/0x230 fs/dcache.c:1490
> vfs_rmdir+0x202/0x470 fs/namei.c:3850
> do_rmdir+0x523/0x610 fs/namei.c:3911
> SYSC_rmdir fs/namei.c:3929 [inline]
> SyS_rmdir+0x1a/0x20 fs/namei.c:3927
> do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
> entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x455087
> RSP: 002b:7fff8b6b76b8 EFLAGS: 0206 ORIG_RAX: 0054
> RAX: ffda RBX: 0065 RCX: 00455087
> RDX: RSI: 7fff8b6b9460 RDI: 7fff8b6b9460
> RBP: 7fff8b6b9460 R08: R09: 0001
> R10: 000a R11: 0206 R12: 02768940
> R13: R14: 01ec R15: 0001984e
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subj
Re: [PATCH] netfilter: CONFIG_NF_REJECT_IPV{4,6} becomes bool toggle
Hi Pablo,

I love your patch! Yet something to improve:

[auto build test ERROR on nf-next/master]
[also build test ERROR on v4.16 next-20180413]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pablo-Neira-Ayuso/netfilter-CONFIG_NF_REJECT_IPV-4-6-becomes-bool-toggle/20180414-101337
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master
config: powerpc64-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc64

All error/warnings (new ones prefixed by >>):

   powerpc64-linux-gnu-ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in section `.gnu.hash'.
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `.nf_reject_ip6_tcphdr_get':
>> (.text+0x1f0): undefined reference to `.nf_ip6_checksum'
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `.nf_send_reset6':
>> (.text+0x794): undefined reference to `.ip6_route_output_flags'
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `.nf_send_unreach6':
   (.text+0xab8): undefined reference to `.nf_ip6_checksum'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[PATCH net-next] net: introduce a new tracepoint for tcp_rcv_space_adjust
tcp_rcv_space_adjust() is called every time data is copied to user space.
Introduce a tcp tracepoint for it, which shows when a packet is copied to
user space. This helps us figure out whether there is latency in the user
process.

When a tcp packet arrives, tcp_rcv_established() is called, and with the
existing tracepoint tcp_probe we can get the time when this packet arrives.
Then this packet is copied to user space and tcp_rcv_space_adjust() is
called, and with this newly introduced tracepoint we can get the time when
this packet is copied to user space.

arrives time : user process time    => latency caused by user
tcp_probe      tcp_rcv_space_adjust

Hence in the printk message, sk is printed as a key to connect these two
tracepoints.

Maybe we could export sockfd in this new tracepoint as well; then we could
connect this new tracepoint with the epoll/read/recv* tracepoints, and
finally that could show us the whole lifespan of this packet. But we could
also implement that with pid, as these functions are executed in process
context.

Signed-off-by: Yafang Shao
---
 include/trace/events/tcp.h | 21 +++--
 net/ipv4/tcp_input.c       | 2 ++
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 878b2be..65a6d22 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -146,10 +146,11 @@
 			       sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
 	),

-	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
+	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c sock=0x%p",
 		  __entry->sport, __entry->dport,
 		  __entry->saddr, __entry->daddr,
-		  __entry->saddr_v6, __entry->daddr_v6)
+		  __entry->saddr_v6, __entry->daddr_v6,
+		  __entry->skaddr)
 );

 DEFINE_EVENT(tcp_event_sk, tcp_receive_reset,
@@ -166,6 +167,13 @@
 	TP_ARGS(sk)
 );

+DEFINE_EVENT(tcp_event_sk, tcp_rcv_space_adjust,
+
+	TP_PROTO(const struct sock *sk),
+
+	TP_ARGS(sk)
+);
+
 TRACE_EVENT(tcp_set_state,

 	TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
@@ -265,6 +273,7 @@
 	TP_ARGS(sk, skb),

 	TP_STRUCT__entry(
+		__field(const void *, skaddr)
 		/* sockaddr_in6 is always bigger than sockaddr_in */
 		__array(__u8, saddr, sizeof(struct sockaddr_in6))
 		__array(__u8, daddr, sizeof(struct sockaddr_in6))
@@ -285,6 +294,8 @@
 		const struct tcp_sock *tp = tcp_sk(sk);
 		const struct inet_sock *inet = inet_sk(sk);

+		__entry->skaddr = sk;
+
 		memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
 		memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));

@@ -305,13 +316,11 @@
 		__entry->srtt = tp->srtt_us >> 3;
 	),

-	TP_printk("src=%pISpc dest=%pISpc mark=%#x length=%d snd_nxt=%#x "
-		  "snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u "
-		  "rcv_wnd=%u",
+	TP_printk("src=%pISpc dest=%pISpc mark=%#x length=%d snd_nxt=%#x snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u rcv_wnd=%u sock=0x%p",
 		  __entry->saddr, __entry->daddr, __entry->mark,
 		  __entry->length, __entry->snd_nxt, __entry->snd_una,
 		  __entry->snd_cwnd, __entry->ssthresh, __entry->snd_wnd,
-		  __entry->srtt, __entry->rcv_wnd)
+		  __entry->srtt, __entry->rcv_wnd, __entry->skaddr)
 );

 #endif /* _TRACE_TCP_H */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 367def6..4b4d6b9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -582,6 +582,8 @@ void tcp_rcv_space_adjust(struct sock *sk)
 	u32 copied;
 	int time;

+	trace_tcp_rcv_space_adjust(sk);
+
 	tcp_mstamp_refresh(tp);
 	time = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcvq_space.time);
 	if (time < (tp->rcv_rtt_est.rtt_us >> 3) || tp->rcv_rtt_est.rtt_us == 0)
-- 
1.8.3.1
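The latency measurement described in the commit message could be computed from a trace roughly as sketched below. This is a user-space illustration only: the toy_event layout, the microsecond timestamps and the function name are assumptions, not the format that the tracing infrastructure actually emits. It pairs the most recent tcp_probe event for a socket with the next tcp_rcv_space_adjust event for the same socket, using the sock value as the key.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-parsed trace records. */
enum toy_event_type {
	TOY_TCP_PROBE,			/* packet arrived */
	TOY_TCP_RCV_SPACE_ADJUST,	/* data copied to user space */
};

struct toy_event {
	enum toy_event_type type;
	uintptr_t sock;		/* the sock=0x... key printed by both events */
	uint64_t ts_us;		/* timestamp in microseconds */
};

/*
 * Return the delay between the latest tcp_probe for `sock` and the first
 * tcp_rcv_space_adjust that follows it, or -1 if no such pair exists.
 * Events are assumed to be sorted by timestamp.
 */
int64_t toy_user_latency_us(const struct toy_event *ev, size_t n,
			    uintptr_t sock)
{
	uint64_t probe_ts = 0;
	int have_probe = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ev[i].sock != sock)
			continue;
		if (ev[i].type == TOY_TCP_PROBE) {
			probe_ts = ev[i].ts_us;
			have_probe = 1;
		} else if (have_probe) {
			return (int64_t)(ev[i].ts_us - probe_ts);
		}
	}
	return -1;
}
```

Events from other sockets are skipped, which is why the patch adds the sock value to both tracepoints.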
[PATCH v2 0/3] ti_am335x_tsc: Fix suspend/resume
This patch series fixes a couple of issues with suspend/resume in the
TI AM335x TSC driver: it disables and clears any pending IRQs before
suspend, and handles the case where TSC wakeup would fail if there were
touch events during suspend.

v2:
Rebase onto latest linux-next.

v1: https://lkml.org/lkml/2016/5/16/150

Grygorii Strashko (2):
  Input: ti_am335x_tsc - Ack pending IRQs at probe and before suspend
  Input: ti_am335x_tsc - Prevent system suspend when TSC is in use

Vignesh R (1):
  Input: ti_am335x_tsc - Mark IRQ as wakeup capable

 drivers/input/touchscreen/ti_am335x_tsc.c | 14 ++
 include/linux/mfd/ti_am335x_tscadc.h      | 1 +
 2 files changed, 15 insertions(+)

-- 
2.17.0
[PATCH v2 2/3] Input: ti_am335x_tsc - Ack pending IRQs at probe and before suspend
From: Grygorii Strashko

It is seen that just enabling the TSC module triggers a HW_PEN IRQ
without any interaction with the touchscreen by the user. This causes the
first suspend/resume sequence to fail: because of the pending IRQ, the
system wakes up from suspend as soon as the HW_PEN IRQ is enabled in the
suspend handler. Therefore clear all IRQs at probe and also in the suspend
callback for sanity.

Signed-off-by: Grygorii Strashko
Signed-off-by: Vignesh R
Acked-by: Lee Jones
---

v2: Add Acks from v1.

 drivers/input/touchscreen/ti_am335x_tsc.c | 2 ++
 include/linux/mfd/ti_am335x_tscadc.h      | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c b/drivers/input/touchscreen/ti_am335x_tsc.c
index 810e05c9c4f5..dcd9db768169 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -439,6 +439,7 @@ static int titsc_probe(struct platform_device *pdev)
 			dev_err(&pdev->dev, "irq wake enable failed.\n");
 	}

+	titsc_writel(ts_dev, REG_IRQSTATUS, IRQENB_MASK);
 	titsc_writel(ts_dev, REG_IRQENABLE, IRQENB_FIFO0THRES);
 	titsc_writel(ts_dev, REG_IRQENABLE, IRQENB_EOS);
 	err = titsc_config_wires(ts_dev);
@@ -504,6 +505,7 @@ static int __maybe_unused titsc_suspend(struct device *dev)

 	tscadc_dev = ti_tscadc_dev_get(to_platform_device(dev));
 	if (device_may_wakeup(tscadc_dev->dev)) {
+		titsc_writel(ts_dev, REG_IRQSTATUS, IRQENB_MASK);
 		idle = titsc_readl(ts_dev, REG_IRQENABLE);
 		titsc_writel(ts_dev, REG_IRQENABLE, (idle | IRQENB_HW_PEN));
diff --git a/include/linux/mfd/ti_am335x_tscadc.h b/include/linux/mfd/ti_am335x_tscadc.h
index b9a53e013bff..1a6a34f726cc 100644
--- a/include/linux/mfd/ti_am335x_tscadc.h
+++ b/include/linux/mfd/ti_am335x_tscadc.h
@@ -63,6 +63,7 @@
 #define IRQENB_FIFO1OVRRUN	BIT(6)
 #define IRQENB_FIFO1UNDRFLW	BIT(7)
 #define IRQENB_PENUP		BIT(9)
+#define IRQENB_MASK		(0x7FF)

 /* Step Configuration */
 #define STEPCONFIG_MODE_MASK	(3 << 0)
-- 
2.17.0
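The failure mode and the fix can be sketched with a small register model. Everything here is an assumption made for illustration: the toy_* names, the bit position chosen for HW_PEN, and the write-one-to-clear behaviour of the status register, which is what the patch appears to rely on when it writes IRQENB_MASK to REG_IRQSTATUS. Only the 0x7FF mask value is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_IRQENB_HW_PEN	(1u << 0)	/* assumed bit position */
#define TOY_IRQENB_MASK		0x7FFu		/* mask value from the patch */

/* Toy model of the two registers involved. */
struct toy_tsc {
	uint32_t irqstatus;	/* latched, pending events */
	uint32_t irqenable;	/* events allowed to raise the IRQ */
};

/* Assumed write-one-to-clear semantics for the status register. */
static void toy_write_irqstatus(struct toy_tsc *t, uint32_t val)
{
	t->irqstatus &= ~val;
}

/*
 * Models the suspend path: optionally ack everything that is pending,
 * then enable HW_PEN as the wakeup event. Returns true if the device
 * would wake the system immediately.
 */
bool toy_suspend_wakes_immediately(struct toy_tsc *t, bool ack_first)
{
	if (ack_first)
		toy_write_irqstatus(t, TOY_IRQENB_MASK);

	t->irqenable |= TOY_IRQENB_HW_PEN;
	return (t->irqstatus & t->irqenable) != 0;
}
```

With a stale HW_PEN bit latched in the status register, calling the function with ack_first set to false reports an immediate wakeup, while acking first does not.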
[PATCH v2 1/3] Input: ti_am335x_tsc - Mark IRQ as wakeup capable
On AM335x, ti_am335x_tsc can wake up the system from suspend. Mark the
IRQ as wakeup capable, so that the device IRQ is not disabled during
system suspend.

Signed-off-by: Vignesh R
---

v2: No changes

 drivers/input/touchscreen/ti_am335x_tsc.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c b/drivers/input/touchscreen/ti_am335x_tsc.c
index f1043ae71dcc..810e05c9c4f5 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include

 #include

@@ -432,6 +433,12 @@ static int titsc_probe(struct platform_device *pdev)
 		goto err_free_mem;
 	}

+	if (device_may_wakeup(tscadc_dev->dev)) {
+		err = dev_pm_set_wake_irq(tscadc_dev->dev, ts_dev->irq);
+		if (err)
+			dev_err(&pdev->dev, "irq wake enable failed.\n");
+	}
+
 	titsc_writel(ts_dev, REG_IRQENABLE, IRQENB_FIFO0THRES);
 	titsc_writel(ts_dev, REG_IRQENABLE, IRQENB_EOS);
 	err = titsc_config_wires(ts_dev);
@@ -462,6 +469,7 @@ static int titsc_probe(struct platform_device *pdev)
 	return 0;

 err_free_irq:
+	dev_pm_clear_wake_irq(tscadc_dev->dev);
 	free_irq(ts_dev->irq, ts_dev);
 err_free_mem:
 	input_free_device(input_dev);
@@ -474,6 +482,7 @@ static int titsc_remove(struct platform_device *pdev)
 	struct titsc *ts_dev = platform_get_drvdata(pdev);
 	u32 steps;

+	dev_pm_clear_wake_irq(ts_dev->mfd_tscadc->dev);
 	free_irq(ts_dev->irq, ts_dev);

 	/* total steps followed by the enable mask */
-- 
2.17.0
[PATCH v2 3/3] Input: ti_am335x_tsc - Prevent system suspend when TSC is in use
From: Grygorii Strashko

Prevent system suspend while the user has a finger on the touch screen,
because the TSC is a wakeup source and suspending the device while it is
in use will result in failure to disable the module.
This patch uses the pm_stay_awake() and pm_relax() APIs to block and
re-allow system suspend as required.

Signed-off-by: Grygorii Strashko
Signed-off-by: Vignesh R
---

v2: No changes.

 drivers/input/touchscreen/ti_am335x_tsc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c b/drivers/input/touchscreen/ti_am335x_tsc.c
index dcd9db768169..43b22e071842 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -275,6 +275,7 @@ static irqreturn_t titsc_irq(int irq, void *dev)
 	if (status & IRQENB_HW_PEN) {
 		ts_dev->pen_down = true;
 		irqclr |= IRQENB_HW_PEN;
+		pm_stay_awake(ts_dev->mfd_tscadc->dev);
 	}

 	if (status & IRQENB_PENUP) {
@@ -284,6 +285,7 @@ static irqreturn_t titsc_irq(int irq, void *dev)
 			input_report_key(input_dev, BTN_TOUCH, 0);
 			input_report_abs(input_dev, ABS_PRESSURE, 0);
 			input_sync(input_dev);
+			pm_relax(ts_dev->mfd_tscadc->dev);
 		} else {
 			ts_dev->pen_down = true;
 		}
@@ -524,6 +526,7 @@ static int __maybe_unused titsc_resume(struct device *dev)
 		titsc_writel(ts_dev, REG_IRQWAKEUP, 0x00);
 		titsc_writel(ts_dev, REG_IRQCLR, IRQENB_HW_PEN);
+		pm_relax(ts_dev->mfd_tscadc->dev);
 	}
 	titsc_step_config(ts_dev);
 	titsc_writel(ts_dev, REG_FIFO0THR,
-- 
2.17.0
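The pairing of pm_stay_awake() and pm_relax() in this patch can be pictured with the small state model below. It is only a sketch: the toy_* names are invented, and the model reduces the PM core's wakeup-source bookkeeping to a single boolean, which is an assumption made for readability rather than a description of the real implementation.

```c
#include <stdbool.h>

/* Toy stand-in for a device's wakeup source. */
struct toy_wakeup_source {
	bool active;	/* true while a wakeup event is being processed */
};

/* Models pm_stay_awake(): mark the source active. */
void toy_pm_stay_awake(struct toy_wakeup_source *ws)
{
	ws->active = true;
}

/* Models pm_relax(): mark the source inactive again. */
void toy_pm_relax(struct toy_wakeup_source *ws)
{
	ws->active = false;
}

/* In this model, system suspend is allowed only while the source is idle. */
bool toy_can_suspend(const struct toy_wakeup_source *ws)
{
	return !ws->active;
}

/* Models the IRQ handler: pen down blocks suspend, pen up allows it. */
void toy_touch_irq(struct toy_wakeup_source *ws, bool pen_down, bool pen_up)
{
	if (pen_down)
		toy_pm_stay_awake(ws);
	if (pen_up)
		toy_pm_relax(ws);
}
```

A pen-down event makes toy_can_suspend() return false until the matching pen-up event arrives, which mirrors the behaviour the commit message describes.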
Re: [PATCH 2/4 v4] sched/rt: add rt_rq utilization tracking
On Fri, Mar 16, 2018 at 12:25:39PM +0100, Vincent Guittot wrote:
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 783eacf..a8003a9 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -592,6 +592,8 @@ struct rt_rq {
>  	unsigned long rt_nr_total;
>  	int overloaded;
>  	struct plist_head pushable_tasks;
> +
> +	struct sched_avg avg;

We only want this for the root cgroup, right? So why is this per cgroup?

That is, I was expecting it to be rq::rt_avg or something.
Re: [PATCH 0/4 v4] sched/rt: track rt rq utilization
What I don't see in this patch-set is removal of the current rt_avg stuff. And I didn't look closely enough; but are the root cfs and rt pelt windows aligned? They really should be; otherwise you can't combine them sanely.
Re: [PATCH] x86/cpufeature: guard asm_volatile_goto usage with CC_HAVE_ASM_GOTO
On Fri, Apr 13, 2018 at 01:42:14PM -0700, Alexei Starovoitov wrote:
> On 4/13/18 11:19 AM, Peter Zijlstra wrote:
> > On Tue, Apr 10, 2018 at 02:28:04PM -0700, Alexei Starovoitov wrote:
> > > Instead of
> > > #ifdef CC_HAVE_ASM_GOTO
> > > we can replace it with
> > > #ifndef __BPF__
> > > or some other name,
> >
> > I would prefer the BPF specific hack; otherwise we might be encouraging
> > people to build the kernel proper without asm-goto.
> >
>
> I don't understand this concern.

The thing is; this will be a (temporary) BPF specific hack. Hiding it
behind something that looks 'normal' (CC_HAVE_ASM_GOTO) is just not
right.
Re: [PATCH 3/7] bus: add bus driver for accessing Allwinner A64 DE2
On Fri, Mar 16, 2018 at 11:23 PM, Icenowy Zheng wrote: > The "Display Engine 2.0" (usually called DE2) on the Allwinner A64 SoC > is different from the ones on other Allwinner SoCs. It requires a SRAM > region to be claimed, otherwise all DE2 subblocks won't be accessible. > > Add a bus driver for the Allwinner A64 DE2 part which claims the SRAM > region when probing. Along with this bus driver, don't we also need drivers/gpu/drm/sun4i/sun4i_drv.c, which usually drives the pipelines (mixer0 and mixer1 in the A64 case)? Jagan. -- Jagan Teki Senior Linux Kernel Engineer | Amarula Solutions U-Boot, Linux | Upstream Maintainer Hyderabad, India.
Re: [PATCH 3/7] bus: add bus driver for accessing Allwinner A64 DE2
On Sat, Apr 14, 2018 at 6:25 PM, Jagan Teki wrote: > On Fri, Mar 16, 2018 at 11:23 PM, Icenowy Zheng wrote: >> The "Display Engine 2.0" (usually called DE2) on the Allwinner A64 SoC >> is different from the ones on other Allwinner SoCs. It requires a SRAM >> region to be claimed, otherwise all DE2 subblocks won't be accessible. >> >> Add a bus driver for the Allwinner A64 DE2 part which claims the SRAM >> region when probing. > > Along with this bus driver, we also need > drivers/gpu/drm/sun4i/sun4i_drv.c which can usually drive the > pipelines like mixer0 and 1 are the cases for A64? I imagine that's the next part to be sent out, after the hardware representation in the device tree has been decided on. ChenYu
Re: [PATCH v3 3/3] ALSA: hda: Disabled unused audio controller for Dell platforms with Switchable Graphics
On Thu, Apr 12, 2018 at 10:15:41PM +0800, Kai-Heng Feng wrote: > > >>@@ -1711,6 +1745,11 @@ static int azx_create(struct snd_card *card, > > >>struct pci_dev *pci, > > >> if (err < 0) > > >> return err; > > >> > > >>+ if (check_dell_switchable_gfx(pci)) { > > >>+ pci_disable_device(pci); > > > > Now looking at it again... This code disables all ATI and NVIDIA sound > > cards available in any Dell System (laptop or AIO) if system says that > > SG is enabled, right? > > > > It means that also any external ATI or NVIDIA PCI card with audio device > > connected to Thunderbolt (e.g. via PCI <--> TB bridge) is always > > unconditionally disabled too? > > I never thought of this case, thanks for bringing this up. > Do you have any suggestion to check if it connects to the system via > Thunderbolt? Just use pci_is_thunderbolt_attached(), introduced by 8531e283bee6, like this: if (check_dell_switchable_gfx(pci) && !pci_is_thunderbolt_attached(pci)) > >>>+ /* Only need to check for Dell laptops and AIOs */ > >>>+ if (!dmi_find_device(DMI_DEV_TYPE_OEM_STRING, "Dell System", NULL) || > >>>+ !(dmi_match(DMI_CHASSIS_TYPE, "10") || > >>>+dmi_match(DMI_CHASSIS_TYPE, "13")) || > >>>+ !(pdev->vendor == PCI_VENDOR_ID_ATI || > >>>+pdev->vendor == PCI_VENDOR_ID_NVIDIA)) > >>>+ return false; It sure would be nice if someone could add macros for the chassis type to include/linux/dmi.h so that we don't have to use these magic numbers everywhere: $ git grep -l DMI_CHASSIS_TYPE drivers/firmware/dmi-id.c drivers/firmware/dmi_scan.c drivers/input/keyboard/atkbd.c drivers/input/serio/i8042-x86ia64io.h drivers/platform/x86/asus-wmi.c drivers/platform/x86/dell-laptop.c drivers/platform/x86/samsung-laptop.c include/linux/mod_devicetable.h scripts/mod/file2alias.c Thanks, Lukas
Re: [PATCH 3/7] bus: add bus driver for accessing Allwinner A64 DE2
On Sat, Apr 14, 2018 at 4:00 PM, Chen-Yu Tsai wrote: > On Sat, Apr 14, 2018 at 6:25 PM, Jagan Teki > wrote: >> On Fri, Mar 16, 2018 at 11:23 PM, Icenowy Zheng wrote: >>> The "Display Engine 2.0" (usually called DE2) on the Allwinner A64 SoC >>> is different from the ones on other Allwinner SoCs. It requires a SRAM >>> region to be claimed, otherwise all DE2 subblocks won't be accessible. >>> >>> Add a bus driver for the Allwinner A64 DE2 part which claims the SRAM >>> region when probing. >> >> Along with this bus driver, we also need >> drivers/gpu/drm/sun4i/sun4i_drv.c which can usually drive the >> pipelines like mixer0 and 1 are the cases for A64? > > I imagine that's the next part to be sent out, after the hardware > representation in the device tree has been decided on. Yeah, but this hardware representation together with a separate bus driver seems to be going in another direction, especially once we add pipeline support to it. Maybe we can add the SRAM stuff to the platdata of the existing sun4i_drv.c. Jagan. -- Jagan Teki Senior Linux Kernel Engineer | Amarula Solutions U-Boot, Linux | Upstream Maintainer Hyderabad, India.
Re: [PATCH v3 3/3] ALSA: hda: Disabled unused audio controller for Dell platforms with Switchable Graphics
On Saturday 14 April 2018 12:45:12 Lukas Wunner wrote: > On Thu, Apr 12, 2018 at 10:15:41PM +0800, Kai-Heng Feng wrote: > > > >>@@ -1711,6 +1745,11 @@ static int azx_create(struct snd_card *card, > > > >>struct pci_dev *pci, > > > >>if (err < 0) > > > >>return err; > > > >> > > > >>+ if (check_dell_switchable_gfx(pci)) { > > > >>+ pci_disable_device(pci); > > > > > > Now looking at it again... This code disables all ATI and NVIDIA sound > > > cards available in any Dell System (laptop or AIO) if system says that > > > SG is enabled, right? > > > > > > It means that also any external ATI or NVIDIA PCI card with audio device > > > connected to Thunderbolt (e.g. via PCI <--> TB bridge) is always > > > unconditionally disabled too? > > > > I never thought of this case, thanks for bringing this up. > > Do you have any suggestion to check if it connects to the system via > > Thunderbolt? > > Just use pci_is_thunderbolt_attached(), introduced by 8531e283bee6, > like this: > > if (check_dell_switchable_gfx(pci) && !pci_is_thunderbolt_attached(pci)) And what about PCI-e device attached to ExpressCard slot? 
> > >>>+/* Only need to check for Dell laptops and AIOs */ > > >>>+if (!dmi_find_device(DMI_DEV_TYPE_OEM_STRING, "Dell System", > > >>>NULL) || > > >>>+!(dmi_match(DMI_CHASSIS_TYPE, "10") || > > >>>+ dmi_match(DMI_CHASSIS_TYPE, "13")) || > > >>>+!(pdev->vendor == PCI_VENDOR_ID_ATI || > > >>>+ pdev->vendor == PCI_VENDOR_ID_NVIDIA)) > > >>>+return false; > > It sure would be nice if someone could add macros for the chassis type > to include/linux/dmi.h so that we don't have to use these magic numbers > everywhere: > > $ git grep -l DMI_CHASSIS_TYPE > drivers/firmware/dmi-id.c > drivers/firmware/dmi_scan.c > drivers/input/keyboard/atkbd.c > drivers/input/serio/i8042-x86ia64io.h > drivers/platform/x86/asus-wmi.c > drivers/platform/x86/dell-laptop.c > drivers/platform/x86/samsung-laptop.c > include/linux/mod_devicetable.h > scripts/mod/file2alias.c > > Thanks, > > Lukas -- Pali Rohár pali.ro...@gmail.com
[PATCH] selftests:vm: add include file
userfaultfd.c: In function ‘hugetlb_release_pages’: userfaultfd.c:145:25: error: ‘FALLOC_FL_PUNCH_HOLE’ undeclared (first use in this function) Signed-off-by: Peng Hao --- tools/testing/selftests/vm/userfaultfd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index de2f9ec..d8fe447 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -68,6 +68,7 @@ #include #include #include +#include #ifdef __NR_userfaultfd -- 1.8.3.1
Re: [PATCHv4] gpio: Remove VLA from gpiolib
On 14/04/2018 05:10, Laura Abbott wrote: On 04/12/2018 05:39 PM, Phil Reid wrote: On 12/04/2018 16:38, Linus Walleij wrote: On Wed, Apr 11, 2018 at 3:03 AM, Laura Abbott wrote: The new challenge is to remove VLAs from the kernel (see https://lkml.org/lkml/2018/3/7/621) to eventually turn on -Wvla. Using a kmalloc array is the easy way to fix this but kmalloc is still more expensive than stack allocation. Introduce a fast path with a fixed size stack array to cover most chip with gpios below some fixed amount. The slow path dynamically allocates an array to cover those chips with a large number of gpios. Reviewed-and-tested-by: Lukas Wunner Signed-off-by: Lukas Wunner Signed-off-by: Laura Abbott --- v4: Changed some local variables to avoid coccinelle warnings. Added a warning if the number of GPIOs exceeds the current fast path define. Lukas, I kept your Tested-by because the changes were pretty minimal. Let me know if you want to run the tests again. This patch is starting to look really good. +/* + * Number of GPIOs to use for the fast path in set array + */ +#define FASTPATH_NGPIO 256 There is still some comment about this. And now that I am also tryint to think I wonder about it, we have a global ARCH_NR_GPIOS that is typically 512. Some archs set it up. This define is something of an abomination, in the ARM case it comes from arch/arm/include/asm/gpio.h where #define ARCH_NR_GPIOS CONFIG_ARCH_NR_GPIO where the latter is a Kconfig option that is mostly 512 for most ARM systems. Well, ARM looks like this: config ARCH_NR_GPIO int default 2048 if ARCH_SOCFPGA default 1024 if ARCH_BRCMSTB || ARCH_SHMOBILE || ARCH_TEGRA || \ ARCH_ZYNQ default 512 if ARCH_EXYNOS || ARCH_KEYSTONE || SOC_OMAP5 || \ SOC_DRA7XX || ARCH_S3C24XX || ARCH_S3C64XX || ARCH_S5PV210 default 416 if ARCH_SUNXI default 392 if ARCH_U8500 default 352 if ARCH_VT8500 default 288 if ARCH_ROCKCHIP default 264 if MACH_H4700 default 0 help Maximum number of GPIOs in the system. 
If unsure, leave the default value. So if FASTPATH_NGPIO should be anything else than ARCH_NR_GPIO this has to be established somewhere as a floor or half or something, but I would just set it as the same as ARCH_NR_GPIOS... The main reason this define exist is for this function from : /* Convert between the old gpio_ and new gpiod_ interfaces */ struct gpio_desc *gpio_to_desc(unsigned gpio); Nowadays that fact is a bit obscured since the variable is only used when assigning the base (in the global GPIO number space, which is what we want to get rid of but sigh) in gpiochip_find_base() where it attempts to place a newly allocated gpiochip in the higher region of this numberspace since the embedded SoC GPIO base tends to be 0, on old platforms. So I don't know about this. Can't we just use ARCH_NR_GPIOS? Very few systems have more than 512 assigned global GPIO numbers and those are FPGA experimental machines. In the long run obviously I want to get rid of these defines altogether and only allocate GPIO descriptos dynamically so as you see I am reluctant to add new numberspace weirdness around here. Isn't that for total GPIO's in the system? And the arrays just need to cater for max per chip? From what I can understand of the code which is admittedly limited. Yeah the switch back to 256 was a mistake on my end (I think I grabbed an incorrect version for my base). ARCH_NR_GPIOs is the total number in the system which may be multiple chips so yes we would be possibly allocating more space than necessary. unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)] unsigned long fastpath[2 * BITS_TO_LONGS(512)] unsigned long fastpath[2 * DIV_ROUND_UP(512, 8 * sizeof(long))] so we end up with 128 bytes on the stack total assuming I can do math correctly. I think this a fairly reasonable amount though, even if we are over-estimating if there are multiple chips. Yeah that's not too bad. My system is a SOCFPGA so it'd be 2048 / 8 = 512. Still not unreasonable. 
But the system doesn't have a single gpio close to that. The largest chip is 32. -- Regards Phil Reid
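The stack-size arithmetic discussed above can be checked with a small user-space sketch. The two macros below are stand-ins written for this example that follow the usual kernel definitions of DIV_ROUND_UP() and BITS_TO_LONGS(); fastpath_bytes() is a hypothetical helper name, not something from the patch, and the "mask plus value bitmap" reading of the factor 2 is an assumption.

```c
#include <limits.h>
#include <stddef.h>

/* User-space stand-ins for the kernel's DIV_ROUND_UP() and BITS_TO_LONGS(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define BITS_TO_LONGS(nr)  DIV_ROUND_UP(nr, CHAR_BIT * sizeof(long))

/*
 * Hypothetical helper: bytes of stack taken by
 *     unsigned long fastpath[2 * BITS_TO_LONGS(ngpio)];
 * i.e. two bitmaps with one bit per GPIO each (assumed here to be a
 * mask bitmap and a value bitmap).
 */
static size_t fastpath_bytes(size_t ngpio)
{
	return 2 * BITS_TO_LONGS(ngpio) * sizeof(unsigned long);
}
```

Under these assumptions, fastpath_bytes(512) gives the 128 bytes mentioned in the thread and fastpath_bytes(2048) gives the 512 bytes of the SOCFPGA case, on both 32-bit and 64-bit longs. A chip with only 32 lines would need just two longs.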
Re: [PATCH v3 3/3] ALSA: hda: Disabled unused audio controller for Dell platforms with Switchable Graphics
On Sat, Apr 14, 2018 at 12:49:50PM +0200, Pali Rohár wrote: > On Saturday 14 April 2018 12:45:12 Lukas Wunner wrote: > > On Thu, Apr 12, 2018 at 10:15:41PM +0800, Kai-Heng Feng wrote: > > > Do you have any suggestion to check if it connects to the system via > > > Thunderbolt? > > > > Just use pci_is_thunderbolt_attached(), introduced by 8531e283bee6, > > like this: > > > > if (check_dell_switchable_gfx(pci) && !pci_is_thunderbolt_attached(pci)) > > And what about PCI-e device attached to ExpressCard slot? I don't know of a bullet-proof way to recognize those. In theory one could check if the PCIe port above the GPU is a non-hotplug root port, but I think there are machines with hotplug capable root ports with GPUs below them that aren't actually removable. However I think ExpressCard-attached GPUs were rare, much less ones with integrated HDA controller, so in reality that's probably a non-issue. Thanks, Lukas
Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")
Heiner, On 12 April 2018 at 21:43, Heiner Kallweit wrote: I'm going to prepare a debug patch to spy what's happening when entering idle >> >> I'd like to narrow the problem a bit more with the 2 patches above. Can >> you try >> them separately on top of c18bb396d3d261eb ("Merge >> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")) >> and check if one of them fixes the problem? >> >> (They should apply on linux-next as well) >> >> First patch always kick ilb instead of doing ilb on local cpu before >> entering idle >> >> --- >> kernel/sched/fair.c | 3 +-- >> 1 file changed, 1 insertion(+), 2 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 0951d1c..b21925b 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -9739,8 +9739,7 @@ static void nohz_newidle_balance(struct rq *this_rq) >>* candidate for ilb instead of waking up another idle CPU. >>* Kick an normal ilb if we failed to do the update. >>*/ >> - if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE)) >> - kick_ilb(NOHZ_STATS_KICK); >> + kick_ilb(NOHZ_STATS_KICK); >> raw_spin_lock(&this_rq->lock); >> } >> >> > I tested both patches, with both of them the issue still occurs. However, > on top of linux-next from yesterday I have the impression that it happens > less frequent with the second patch. > On top of the commit mentioned by you I don't see a change in system behavior > with either patch. Thanks for the tests. I was expecting more of a difference between the 2 patches, and especially no problem with the 1st patch, which only sends an IPI reschedule to the other CPU if it is idle. So the problem doesn't really seem to be related to what is done, but to the fact that it is done at that place in the code. Thanks > > Regards, Heiner
Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")
Hi Niklas, On 13 April 2018 at 00:39, Niklas Söderlund wrote: > Hi Vincent, > > Thanks for helping trying to figure this out. > > On 2018-04-12 15:30:31 +0200, Vincent Guittot wrote: > > [snip] > >> >> I'd like to narrow the problem a bit more with the 2 patches above. Can >> you try >> them separately on top of c18bb396d3d261eb ("Merge >> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")) >> and check if one of them fixes the problem? > > I tried your suggested changes based on top of c18bb396d3d261eb. > >> >> (They should apply on linux-next as well) >> >> First patch always kick ilb instead of doing ilb on local cpu before >> entering idle >> >> --- >> kernel/sched/fair.c | 3 +-- >> 1 file changed, 1 insertion(+), 2 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 0951d1c..b21925b 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -9739,8 +9739,7 @@ static void nohz_newidle_balance(struct rq *this_rq) >>* candidate for ilb instead of waking up another idle CPU. >>* Kick an normal ilb if we failed to do the update. >>*/ >> - if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE)) >> - kick_ilb(NOHZ_STATS_KICK); >> + kick_ilb(NOHZ_STATS_KICK); >> raw_spin_lock(&this_rq->lock); >> } > > This change don't seem to effect the issue. I can still get the single > ssh session and the system to lockup by hitting the return key. And > opening a second ssh session immediately unblocks both the first ssh > session and the serial console. And I can still trigger the console > warning by just letting the system be once it locks-up. I do have > just as before reset the system a few times to trigger the issue. Your results are similar to Heiner's.
The problem is still there even if we only kick ilb which mainly send an IPI reschedule to the other CPU if Idle > > [ 245.351693] INFO: rcu_sched detected stalls on CPUs/tasks: > [ 245.357199] 0-...!: (1 GPs behind) idle=93c/0/0 softirq=2224/2225 fqs=0 > [ 245.363988] (detected by 1, t=3025 jiffies, g=337, c=336, q=10) > [ 245.370003] Sending NMI from CPU 1 to CPUs 0: > [ 245.374368] NMI backtrace for cpu 0 > [ 245.374377] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 4.16.0-10930-ged741fb4567c816f #42 > [ 245.374379] Hardware name: Generic R8A7791 (Flattened Device Tree) > [ 245.374393] PC is at arch_cpu_idle+0x24/0x40 > [ 245.374397] LR is at arch_cpu_idle+0x34/0x40 > [ 245.374400] pc : []lr : []psr: 60050013 > [ 245.374403] sp : c0b01f40 ip : c0b01f50 fp : c0b01f4c > [ 245.374405] r10: c0a56a38 r9 : e7fffbc0 r8 : c0b04c00 > [ 245.374407] r7 : c0b04c78 r6 : c0b04c2c r5 : e000 r4 : 0001 > [ 245.374410] r3 : c0119100 r2 : e77813a8 r1 : 0002d93c r0 : > [ 245.374414] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment > none > [ 245.374417] Control: 10c5387d Table: 6662006a DAC: 0051 > [ 245.374421] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 4.16.0-10930-ged741fb4567c816f #42 > [ 245.374423] Hardware name: Generic R8A7791 (Flattened Device Tree) > [ 245.374425] Backtrace: > [ 245.374435] [] (dump_backtrace) from [] > (show_stack+0x18/0x1c) > [ 245.374440] r7:c0b47278 r6:60050193 r5: r4:c0b73d80 > [ 245.374450] [] (show_stack) from [] > (dump_stack+0x84/0xa4) > [ 245.374456] [] (dump_stack) from [] > (show_regs+0x14/0x18) > [ 245.374460] r7:c0b47278 r6:c0b01ef0 r5: r4:c0bc62c8 > [ 245.374468] [] (show_regs) from [] > (nmi_cpu_backtrace+0xfc/0x118) > [ 245.374475] [] (nmi_cpu_backtrace) from [] > (handle_IPI+0x22c/0x294) > [ 245.374479] r7:c0b47278 r6:c0b01ef0 r5:0007 r4:c0a775fc > [ 245.374488] [] (handle_IPI) from [] > (gic_handle_irq+0x8c/0x98) > [ 245.374492] r10:c0a56a38 r9:c0b0 r8:f0803000 r7:c0b47278 r6:c0b01ef0 > r5:c0b05244 > [ 245.374495] r4:f0802000 
r3:0407 > [ 245.374501] [] (gic_handle_irq) from [] > (__irq_svc+0x6c/0x90) > [ 245.374504] Exception stack(0xc0b01ef0 to 0xc0b01f38) > [ 245.374507] 1ee0: 0002d93c > e77813a8 c0119100 > [ 245.374512] 1f00: 0001 e000 c0b04c2c c0b04c78 c0b04c00 e7fffbc0 > c0a56a38 c0b01f4c > [ 245.374516] 1f20: c0b01f50 c0b01f40 c0108564 c0108554 60050013 > [ 245.374521] r9:c0b0 r8:c0b04c00 r7:c0b01f24 r6: r5:60050013 > r4:c0108554 > [ 245.374528] [] (arch_cpu_idle) from [] > (default_idle_call+0x30/0x34) > [ 245.374535] [] (default_idle_call) from [] > (do_idle+0xd8/0x128) > [ 245.374540] [] (do_idle) from [] > (cpu_startup_entry+0x20/0x24) > [ 245.374543] r7:c0b04c08 r6: r5:c0b80380 r4:00c2 > [ 245.374549] [] (cpu_startup_entry) from [] > (rest_init+0x9c/0xbc) > [ 245.374555] [] (rest_init) from [] > (start_kernel+0x368/0x3ec) > [ 245.374558] r5:c0b80380 r4:c0b803c0 > [ 245.374563] [] (start_kernel) from [<>] ( (null)) > [ 245.375369] rcu_sched kthread starved
Re: [RFC v2] virtio: support packed ring
On Fri, Apr 13, 2018 at 06:22:45PM +0300, Michael S. Tsirkin wrote: > On Sun, Apr 01, 2018 at 10:12:16PM +0800, Tiwei Bie wrote: > > +static inline bool more_used(const struct vring_virtqueue *vq) > > +{ > > + return vq->packed ? more_used_packed(vq) : more_used_split(vq); > > +} > > + > > +void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, unsigned int *len, > > + void **ctx) > > +{ > > + struct vring_virtqueue *vq = to_vvq(_vq); > > + void *ret; > > + unsigned int i; > > + u16 last_used; > > + > > + START_USE(vq); > > + > > + if (unlikely(vq->broken)) { > > + END_USE(vq); > > + return NULL; > > + } > > + > > + if (!more_used(vq)) { > > + pr_debug("No more buffers in queue\n"); > > + END_USE(vq); > > + return NULL; > > + } > > So virtqueue_get_buf_ctx_split should only call more_used_split. Yeah, you're right! Will fix this in the next version. > > to avoid such issues I think we should lay out the code like this: > > XXX_split > > XXX_packed > > XXX wrappers I'll do it. Thanks for the suggestion! > > > +/* The standard layout > > I'd drop standard here. Got it. I'll drop the word "standard". > > > for the packed ring is a continuous chunk of memory > > + * which looks like this. > > + * > > + * struct vring_packed > > + * { > > Can the opening bracket go on the prev line pls? Sure. > > > + * // The actual descriptors (16 bytes each) > > + * struct vring_packed_desc desc[num]; > > + * > > + * // Padding to the next align boundary. > > + * char pad[]; > > + * > > + * // Driver Event Suppression > > + * struct vring_packed_desc_event driver; > > + * > > + * // Device Event Suppression > > + * struct vring_packed_desc_event device; > > Maybe that's how our driver does it but it's not based on spec > so I don't think this belongs in the header. I will move it to the place where vring_packed_init() is defined. 
> > > + * }; > > + */ > > + > > +static inline unsigned vring_packed_size(unsigned int num, unsigned long > > align) > > +{ > > + return ((sizeof(struct vring_packed_desc) * num + align - 1) > > + & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2; > > +} > > + > > Cant say this API makes sense for me. Hmm, do you have any suggestion? Also move it out of this header? Thanks for the review! :) Best regards, Tiwei Bie > > > > #endif /* _UAPI_LINUX_VIRTIO_RING_H */ > > -- > > 2.11.0
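For anyone following the layout discussion, here is a minimal user-space sketch of the size computation quoted above. The only fact taken from the thread is that a descriptor is 16 bytes; the field layouts of both structs below are assumptions made for illustration, and `align` is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration; the thread only says 16 bytes each. */
struct vring_packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Assumed layout for illustration; its size is not stated in the thread. */
struct vring_packed_desc_event {
	uint16_t off_wrap;
	uint16_t flags;
};

/*
 * Descriptor array rounded up to the next 'align' boundary, followed by
 * the driver and device event suppression structures.
 * 'align' must be a power of two for the mask trick to work.
 */
static inline size_t vring_packed_size(unsigned int num, unsigned long align)
{
	size_t desc_bytes = sizeof(struct vring_packed_desc) * num;

	return ((desc_bytes + align - 1) & ~(align - 1))
		+ sizeof(struct vring_packed_desc_event) * 2;
}
```

With these assumed struct sizes, a 256-entry ring with 4096-byte alignment comes out as exactly one page of descriptors plus the two small event structures at the end.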
Re: [PATCH v3 3/3] ALSA: hda: Disabled unused audio controller for Dell platforms with Switchable Graphics
On Thu, Apr 12, 2018 at 10:12:49PM +0800, Kai-Heng Feng wrote: > at 6:50 PM, Takashi Iwai wrote: > > On Thu, 12 Apr 2018 12:42:39 +0200, Kai-Heng Feng wrote: > > > When SG is enabled, the unused AMD audio controller still exposes its > > > sysfs, so userspace still opens the control file and stream. If > > > userspace tries to output sound through the stream, it hangs when > > > runtime suspend kicks in: > > > [ 12.796265] snd_hda_intel :01:00.1: Disabling via vga_switcheroo > > > [ 12.796367] snd_hda_intel :01:00.1: Cannot lock devices! > > > > > > Since the discrete audio controller isn't useful when SG enabled, we > > > should just disable the device. > > > > > > Signed-off-by: Kai-Heng Feng > > > > I thought we manage this better now with runtime PM by Lukas's recent > > patchset? > > Yes, that's true. I'll update commit log for next iteration. > > Nevertheless, the unusable control file and stream still get exposed via > sysfs. > We should disable them when SG is enabled. Right, the hang on runtime suspend as mentioned in the commit message should be gone in 4.17. The purpose of this patch is thus to prevent the user from seeing or opening the HDA controller on the discrete GPU. If SG is enabled, external DP/HDMI displays are muxed to the Intel GPU, hence the HDA controller on the discrete GPU cannot communicate with the attached displays. Thanks, Lukas
Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")
On 13 April 2018 at 22:38, Niklas Söderlund wrote: > Hi Vincent, > > On 2018-04-12 13:15:19 +0200, Niklas Söderlund wrote: >> Hi Vincent, >> >> Thanks for your feedback. >> >> On 2018-04-12 12:33:27 +0200, Vincent Guittot wrote: >> > Hi Niklas, >> > >> > On 12 April 2018 at 11:18, Niklas Söderlund >> > wrote: >> > > Hi Vincent, >> > > >> > > I have observed issues running on linus/master from a few days back [1]. >> > > I'm running on a Renesas Koelsch board (arm32) and I can trigger a issue >> > > by X forwarding the v4l2 test application qv4l2 over ssh and moving the >> > > courser around in the GUI (best test case description award...). I'm >> > > sorry about the really bad way I trigger this but I can't do it in any >> > > other way, I'm happy to try other methods if you got some ideas. The >> > > symptom of the issue is a complete hang of the system for more then 30 >> > > seconds and then this information is printed in the console: >> > >> > Heiner (edded cc) also reported similar problem with his platform: a >> > dual core celeron >> > >> > Do you confirm that your platform is a dual cortex-A15 ? At least that >> > what I have seen on web >> > This would confirm that dual system is a key point. >> >> I can confirm that my platform is a dual core. > > I tested another dual core system today Renesas M3-W ARM64 system and I > can observe the same lockups-on that system if it helps you understand > the problem. It seems to be much harder to trigger the issue on this > system for some reason. Hitting return in a ssh session don't seem to > produce the lockup while starting a GUI using X forwarding over ssh it's > possible. Thanks for the test. 
That confirms it only happens on dual-core systems > > [ 392.306441] INFO: rcu_preempt detected stalls on CPUs/tasks: > [ 392.312201] (detected by 0, t=19366 jiffies, g=7177, c=7176, q=35) > [ 392.318555] All QSes seen, last rcu_preempt kthread activity 19368 > (4294990375-4294971007), jiffies_till_next_fqs=1, root ->qsmask 0x0 > [ 392.330758] swapper/0 R running task0 0 0 > 0x0022 > [ 392.337883] Call trace: > [ 392.340365] dump_backtrace+0x0/0x1c8 > [ 392.344065] show_stack+0x14/0x20 > [ 392.347416] sched_show_task+0x224/0x2e8 > [ 392.351377] rcu_check_callbacks+0x8ac/0x8b0 > [ 392.355686] update_process_times+0x2c/0x58 > [ 392.359908] tick_sched_handle.isra.5+0x30/0x50 > [ 392.364479] tick_sched_timer+0x40/0x90 > [ 392.368351] __hrtimer_run_queues+0xfc/0x208 > [ 392.372659] hrtimer_interrupt+0xd4/0x258 > [ 392.376710] arch_timer_handler_virt+0x28/0x48 > [ 392.381194] handle_percpu_devid_irq+0x80/0x138 > [ 392.385767] generic_handle_irq+0x28/0x40 > [ 392.389813] __handle_domain_irq+0x5c/0xb8 > [ 392.393946] gic_handle_irq+0x58/0xa8 > [ 392.397640] el1_irq+0xb4/0x130 > [ 392.400810] arch_cpu_idle+0x14/0x20 > [ 392.404422] default_idle_call+0x1c/0x38 > [ 392.408381] do_idle+0x17c/0x1f8 > [ 392.411640] cpu_startup_entry+0x20/0x28 > [ 392.415598] rest_init+0x24c/0x260 > [ 392.419037] start_kernel+0x3e8/0x414 > > I was running the same tests on another ARM64 platform earlier using the > same build which have more then two cores and there I could not observe > this issue. > > -- > Regards, > Niklas Söderlund
Re: [PATCH 2/4 v4] sched/rt: add rt_rq utilization tracking
On 14 April 2018 at 12:05, Peter Zijlstra wrote: > On Fri, Mar 16, 2018 at 12:25:39PM +0100, Vincent Guittot wrote: >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h >> index 783eacf..a8003a9 100644 >> --- a/kernel/sched/sched.h >> +++ b/kernel/sched/sched.h >> @@ -592,6 +592,8 @@ struct rt_rq { >> unsigned long rt_nr_total; >> int overloaded; >> struct plist_head pushable_tasks; >> + >> + struct sched_avg avg; > > We only want this for the root cgroup, right? So why is this per cgroup? Yes, it's only needed for the root cgroup. I put it there for consistency with CFS's PELT, but it only wastes bytes. > > That is, I was expecting it to be rq::rt_avg or something.
Re: [PATCH 0/4 v4] sched/rt: track rt rq utilization
On 14 April 2018 at 12:07, Peter Zijlstra wrote: > > > What I don't see in this patch-set is removal of the current rt_avg > stuff. This RT load tracking doesn't replace the current rt_avg because the two don't use the same period and don't provide the same function. The current rt_avg uses sysctl_sched_time_avg to define the averaging period, and its default period is 1 second, whereas PELT uses a fixed period. The current rt_avg also tracks irq accounting, which this patch doesn't do. That is probably doable but will need more complex changes. Replacing the current rt_avg with this new RT utilization tracking would therefore require more complex changes, so I didn't want to add them in this 1st step. > > And I didn't look closely enough; but are the root cfs and rt pelt > windows aligned? They really should be; otherwise you can't combine them > sanely. No, they are not aligned. I agree that this could cause some variation in the sum. I'm going to fix this point.
[PATCH] isofs: fix potential memory leak in mount option parsing
When specifying string type mount option (e.g., iocharset) several times in a mount, current option parsing may cause memory leak. Hence, call kfree for previous one in this case. Meanwhile, check memory allocation result for it. Signed-off-by: Chengguang Xu --- fs/isofs/inode.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c index bc258a4..ec3fba7 100644 --- a/fs/isofs/inode.c +++ b/fs/isofs/inode.c @@ -394,7 +394,10 @@ static int parse_options(char *options, struct iso9660_options *popt) break; #ifdef CONFIG_JOLIET case Opt_iocharset: + kfree(popt->iocharset); popt->iocharset = match_strdup(&args[0]); + if (!popt->iocharset) + return 0; break; #endif case Opt_map_a: -- 1.8.3.1
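The pattern in this patch (free the previous string before storing the new copy, then check the allocation) can be sketched in user space as follows. This is only an illustration: struct mount_opts, dup_string() and set_iocharset() are hypothetical names, malloc()/free() stand in for match_strdup()/kfree(), and the 0-on-failure return mirrors what the patch does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct iso9660_options. */
struct mount_opts {
	char *iocharset;	/* owned string, NULL when unset */
};

/* Hypothetical stand-in for match_strdup(): heap copy or NULL. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Returns 1 on success, 0 if the copy could not be allocated. */
static int set_iocharset(struct mount_opts *opts, const char *value)
{
	/* Drop the value from an earlier occurrence of the option;
	 * free(NULL) is a no-op, just like kfree(NULL). */
	free(opts->iocharset);
	opts->iocharset = dup_string(value);
	if (!opts->iocharset)
		return 0;
	return 1;
}
```

Calling set_iocharset() twice on the same struct models "iocharset=a,iocharset=b" on the mount command line: without the free() the first copy would be leaked.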
[PATCH] exofs: fix potential memory leak in mount option parsing
When specifying string type mount option several times in a mount, current option parsing may cause memory leak. Hence, call kfree for previous one in this case. Signed-off-by: Chengguang Xu --- fs/exofs/super.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/exofs/super.c b/fs/exofs/super.c index 179cd5c..106b818 100644 --- a/fs/exofs/super.c +++ b/fs/exofs/super.c @@ -101,6 +101,7 @@ static int parse_options(char *options, struct exofs_mountopt *opts) token = match_token(p, tokens, args); switch (token) { case Opt_name: + kfree(opts->dev_name); opts->dev_name = match_strdup(&args[0]); if (unlikely(!opts->dev_name)) { EXOFS_ERR("Error allocating dev_name"); -- 1.8.3.1
How to disable tracing at runtime from the Linux kernel command line?
Dear Linux folks,

I am trying to reduce the boot time of a standard Linux distribution kernel. Currently, distributions – at least Debian and Ubuntu – enable function tracing.

```
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_EVENT_TRACING=y
```

This is great, as it makes it easy to use tracing to hunt down things holding up the boot. But it also skews the boot time quite a lot.

```
$ sudo dmesg
[…]
[0.318412] initcall init_graph_trace+0x0/0x64 returned 0 after 199218 usecs
[…]
[1.770287] calling event_trace_init+0x0/0x2c2 @ 1
[2.052871] initcall event_trace_init+0x0/0x2c2 returned 0 after 275942 usecs
[…]
```

Is there a way to disable tracing at runtime from the Linux kernel command line?

Kind regards,

Paul
Re: [PATCH] Revert "xhci: plat: Register shutdown for xhci_plat"
On Fri, Apr 13, 2018 at 12:34:00PM +0530, Harsh Shandilya wrote: > On 13 April 2018 11:51:28 AM IST, Greg Kroah-Hartman > wrote: > >On Fri, Apr 13, 2018 at 08:12:31AM +0530, Harsh Shandilya wrote: > >> On 13 April 2018 5:59:51 AM IST, Greg Hackmann > >wrote: > >> >Pixel 2 field testers reported that when they tried to reboot their > >> >phones with some USB devices plugged in, the reboot would get wedged > >> >and > >> >eventually trigger watchdog reset. Once the Pixel kernel team found > >a > >> >reliable repro case, they narrowed it down to this commit's 4.4.y > >> >backport. Reverting the change made the issue go away. > >> > >> Are you allowed to make the repro steps public? I'm writing this from > >> a walleye and would be grateful if I could test for this in the > >> modifed tree I'm running atm. -- > > > >I was told the steps are pretty simple: > > - reboot the phone a lot > >eventually it will hang. There's a fix in the code aurora kernel tree > >for this that they never sent upstream for some odd reason (they sent > >the first patch, why not the second?) > > > >I'll go revert this for now, thanks for the patch! > > > >greg k-h > > That'd make sense, I only tried rebooting like five times before I had to run > for a class. > > As far as CAF is concerned, I feel the not submitting upstream, > working extra to write patches which have usually better variants > already upstream, seems to be common. All USB changes were dropped > when they merged kernel-common into msm-3.18 with no real explanation > which has been an annoyance more than once during merging -stable in > my fork of msm-3.18. While I understand their situation of maintaining > upwards of 5 million lines of code not upstream, it still feels sloppy > to not merge stable updates and do extra work instead. 
/* End rant */ CAF fixed this back on Feb 1 in their tree, yet did not send that upstream, or to anyone else: https://source.codeaurora.org/quic/la/kernel/msm-4.4/commit/?h=LV.HB.1.1.5-03810-8x96.0&id=a7a5307ee04ad349d365ad50f304605a9cd9bd0a Feel free to rant some more, I'm going to go revert the original upstream patch as that is half-completed, and obviously broken :( thanks, greg k-h
Re: [PATCH 2/5] dt-bindings: display: atmel: add optional output-mode property
On 2018-04-13 19:46, Rob Herring wrote: > On Mon, Apr 09, 2018 at 12:59:15PM +0200, Peter Rosin wrote: >> Useful for beating cases where an output mode selection heuristic >> fails. >> >> Signed-off-by: Peter Rosin >> --- >> Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt >> b/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt >> index 82f2acb3d374..dc478455b883 100644 >> --- a/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt >> +++ b/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt >> @@ -10,6 +10,10 @@ Required properties: >> - #address-cells: should be set to 1. >> - #size-cells: should be set to 0. >> >> +Optional properties: >> + - output-mode: override any output mode selection hueristic and force a >> + particular output mode. One of "rgb444", "rgb565", "rgb666" and "rgb888". >> + > > This needs to be generic, not just added to some random display > controller binding. > > It also belongs in the port or endpoint node as is done for camera > interfaces. Hmm, should I extend media/video-interfaces.txt with more bus types (or, since I'm targeting parallel interfaces, perhaps the new bus types should be autodetected from other props?), or should I write a new binding similar to it? One question regarding bus-width: should it include hsync/vsync/de/clk? If yes, how do I distinguish rgb565 with all four of those from rgb666 with only de/clk (some panels do not need hsync/vsync)? 20 lines in both cases... Or are rgb444/rgb565/rgb666/rgb888 already supported by the media video interface binding? That's not at all obvious to me. Cheers, Peter
Re: [PATCH v2] IB: make INFINIBAND_ADDR_TRANS configurable
On 4/13/2018 1:27 PM, Greg Thelen wrote:
> Allow INFINIBAND without INFINIBAND_ADDR_TRANS.
>
> Signed-off-by: Greg Thelen
> Cc: Tarick Bedeir
> Change-Id: I6fbbf8a432e467710fa65e4904b7d61880b914e5

Forgot to remove the Gerrit thing.

-Denny
Re: INFO: task hung in __blkdev_get
OK. The patch was sent to linux.git as commit 1e047eaab3bb5564. #syz fix: block/loop: fix deadlock after loop_set_status Dmitry Vyukov wrote: > On Tue, Apr 10, 2018 at 3:04 PM, Tetsuo Handa > wrote: > > Dmitry Vyukov wrote: > >> On Tue, Apr 10, 2018 at 12:55 PM, Tetsuo Handa > >> wrote: > >> > Hello. > >> > > >> > Since syzbot is reporting so many hung up bug which involves /dev/loopX , > >> > is it possible to "temporarily" apply below patch for testing under > >> > syzbot > >> > >> Unfortunately it's not possible, for full explanation please see: > >> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#no-custom-patches > >> > > > > I mean, sending custom patch to linux.git for -rc and revert the custom > > patch > > before -final is released. It won't take so much period until we get the > > result. > > Ah, I see, then I guess it wasn't a question to me. > I noticed that there already is the lockdep report at possible deadlock in blkdev_reread_part https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889 entry, and no patch is proposed yet: https://groups.google.com/forum/#!msg/syzkaller-bugs/2Rw8-OM6IbM/SI4DyK-1AQAJ
Re: WARNING: lock held when returning to user space!
The patch was sent to linux.git as commit bdac616db9bbadb9. #syz fix: loop: fix LOOP_GET_STATUS lock imbalance
Re: [PATCH v2] IB: make INFINIBAND_ADDR_TRANS configurable
On Sat, Apr 14, 2018 at 8:13 AM Dennis Dalessandro < dennis.dalessan...@intel.com> wrote: > On 4/13/2018 1:27 PM, Greg Thelen wrote: > > Allow INFINIBAND without INFINIBAND_ADDR_TRANS. > > > > Signed-off-by: Greg Thelen > > Cc: Tarick Bedeir > > Change-Id: I6fbbf8a432e467710fa65e4904b7d61880b914e5 > Forgot to remove the Gerrit thing. > -Denny Ack. My bad. Will repost. Unfortunately checkpatch didn't notice.
[PATCH v3] IB: make INFINIBAND_ADDR_TRANS configurable
Allow INFINIBAND without INFINIBAND_ADDR_TRANS.

Signed-off-by: Greg Thelen
Cc: Tarick Bedeir
---
 drivers/infiniband/Kconfig | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index ee270e065ba9..2a972ed6851b 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -61,9 +61,12 @@ config INFINIBAND_ON_DEMAND_PAGING
 	  pages on demand instead.
 
 config INFINIBAND_ADDR_TRANS
-	bool
+	bool "RDMA/CM"
 	depends on INFINIBAND
 	default y
+	---help---
+	  Support for RDMA communication manager (CM).
+	  This allows for a generic connection abstraction over RDMA.
 
 config INFINIBAND_ADDR_TRANS_CONFIGFS
 	bool
-- 
2.17.0.484.g0c8726318c-goog
Re: tg3 crashes under high load, when using 100Mbits
Hi Satish, > On 2018Mar21, at 00:57, Kai-Heng Feng wrote: > > Satish Baddipadige wrote: > >> On Thu, Feb 15, 2018 at 7:37 PM, Siva Reddy Kallam >> wrote: >>> On Mon, Feb 12, 2018 at 10:59 AM, Siva Reddy Kallam >>> wrote: On Fri, Feb 9, 2018 at 10:41 AM, Kai Heng Feng wrote: > Hi Broadcom folks, > > We are now enabling a new platform with tg3 nic, unfortunately we observed > the bug [1] that dated back to 2015. > I tried commit 4419bb1cedcd ("tg3: Add workaround to restrict 5762 MRRS to > 2048”) but it does’t work. > > Do you have any idea how to solve the issue? > > [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664 > > Kai-Heng Thank you for reporting. We will check and update you. >>> With link aware mode, the clock speed could be slow and boot code does not >>> complete within the expected time with lower link speeds. Need to override >>> and the clock in driver. We are checking the feasibility of adding >>> this in driver or firmware. >> >> Hi Kai-Heng, >> >> Can you please test the attached patch? > > I built a kernel and asked affected users to try. Users reported that the crash still happens with the patch. Kai-Heng > > Thanks for your work. > > Kai-Heng > >> >> Thanks, >> Satish >>
Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system
Hi Keith, Bjorn; On 4/12/2018 1:41 PM, Sinan Kaya wrote: > On 4/12/2018 1:09 PM, Keith Busch wrote: >> On Thu, Apr 12, 2018 at 12:27:20PM -0400, Sinan Kaya wrote: >>> On 4/12/2018 11:02 AM, Keith Busch wrote: Also, I thought the plan was to keep hotplug and non-hotplug the same, except for the very end: if not a hotplug bridge, initiate the rescan automatically after releasing from containment, otherwise let pciehp handle it when the link reactivates. >>> >>> Hmm... >>> >>> AER driver doesn't do stop and rescan approach for fatal errors. AER driver >>> makes an error callback followed by secondary bus reset and finally driver >>> the resume callback on the endpoint only if link recovery is successful. >>> Otherwise, AER driver bails out with recovery unsuccessful message. >> >> I'm not sure if that's necessarily true. People have reported AER >> handling triggers PCIe hotplug events, and creates some interesting race >> conditions: > > By reading the code, I don't see a stop and rescan in the AER error recovery > path. > > As both logs indicate, stop and rescan is initiated in response to link down > and link up interrupts triggered by the secondary bus reset. > The SW entity handling these is not AER driver. It is the hotplug driver > running asynchronous to the AER driver. > > AER driver should have tried a slot reset before attempting to do a secondary > bus reset. > > /** > * pci_reset_slot - reset a PCI slot > * @slot: PCI slot to reset > * > * A PCI bus may host multiple slots, each slot may support a reset mechanism > * independent of other slots. For instance, some slots may support slot > power > * control. In the case of a 1:1 bus to slot architecture, this function may > * wrap the bus reset to avoid spurious slot related events such as hotplug. > * Generally a slot reset should be attempted before a bus reset. All of the > * function of the slot and any subordinate buses behind the slot are reset > * through this function. 
PCI config space of all devices in the slot and > * behind the slot is saved before and restored after reset. > * > * Return 0 on success, non-zero on error. > */ > int pci_reset_slot(struct pci_slot *slot) > > Slot reset is there to mask hotplug interrupts before the reset and unmask > them > after reset. > >> >> https://marc.info/?l=linux-pci&m=152336615707640&w=2 >> >> https://www.spinics.net/lists/linux-pci/msg70614.html >> >>> Why do we need an additional rescan in the DPC driver if the link is up >>> and driver resumes operation? >> >> I thought the plan was to have DPC always go through the removal path >> to ensure all devices are properly configured when containment is >> released. In order to reconfigure those, you'll need to initiate the >> rescan from somewhere. >> > > This is where the contradiction is. > > Bjorn is asking for a unified error handling for both AER and DPC. > > Current AER error recovery framework is error callback + secondary > bus reset + resume callback. > > How does this stop + rescan model fit? > > Do we want to change the error recovery framework? I suppose this will > become a bigger conversation as there are more customers of this. > I also want to highlight that the PCI Error recovery sequence is well documented here. https://www.kernel.org/doc/Documentation/PCI/pci-error-recovery.txt We don't really have to guess what Linux does. IMO, the hotplug issues Keith is seeing are orthogonal and needs to be addressed independent of this series by following the pci slot reset procedure. Hotplug driver handles link up/down events due to insertion/removal. Hotplug driver is expected to do the re-enumeration. I don't understand why we need to do another re-enumeration if system observes a PCIe error handled by the AER/DPC driver. These two are independent events. PCIe error recovery framework does the reset callback + SBR + resume behavior today. Bjorn, You indicated that you want to unify the AER and DPC behavior. 
Let's settle on what we want to do one more time. We have been going back and forth on the direction. We are on V13. I hope we won't hit V20 :) Sinan -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE
On Thu, 2018-04-12 at 17:12 +0200, KarimAllah Ahmed wrote: > From: Jim Mattson > > For nested virtualization L0 KVM is managing a bit of state for L2 guests, > this state can not be captured through the currently available IOCTLs. In > fact the state captured through all of these IOCTLs is usually a mix of L1 > and L2 state. It is also dependent on whether the L2 guest was running at > the moment when the process was interrupted to save its state. > > With this capability, there are two new vcpu ioctls: KVM_GET_VMX_STATE and > KVM_SET_VMX_STATE. These can be used for saving and restoring a VM that is > in VMX operation. > > Cc: Paolo Bonzini > Cc: Radim Krčmář > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: H. Peter Anvin > Cc: x...@kernel.org > Cc: k...@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Jim Mattson > [karahmed@ - rename structs and functions and make them ready for AMD and > address previous comments. >- rebase & a bit of refactoring. >- Merge 7/8 and 8/8 into one patch. >- Force a VMExit from L2 after reading the kvm_state to avoid > mixed state between L1 and L2 on resurrecting the instance. ] > Signed-off-by: KarimAllah Ahmed > --- > v2 -> v3: > - Remove the forced VMExit from L2 after reading the kvm_state. The actual > problem is solved. > - Rebase again! > - Set nested_run_pending during restore (not sure if it makes sense yet or > not). > - Reduce KVM_REQUEST_ARCH_BASE to 7 instead of 8 (the other alternative is > to switch everything to u64) > > v1 -> v2: > - Rename structs and functions and make them ready for AMD and address > previous comments. > - Rebase & a bit of refactoring. > - Merge 7/8 and 8/8 into one patch. > - Force a VMExit from L2 after reading the kvm_state to avoid mixed state > between L1 and L2 on resurrecting the instance. 
> --- > Documentation/virtual/kvm/api.txt | 47 ++ > arch/x86/include/asm/kvm_host.h | 7 ++ > arch/x86/include/uapi/asm/kvm.h | 38 > arch/x86/kvm/vmx.c| 177 > +- > arch/x86/kvm/x86.c| 21 + > include/linux/kvm_host.h | 2 +- > include/uapi/linux/kvm.h | 5 ++ > 7 files changed, 292 insertions(+), 5 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index 1c7958b..c51d5d3 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -3548,6 +3548,53 @@ Returns: 0 on success, > -ENOENT on deassign if the conn_id isn't registered > -EEXIST on assign if the conn_id is already registered > > +4.114 KVM_GET_STATE > + > +Capability: KVM_CAP_STATE > +Architectures: x86 > +Type: vcpu ioctl > +Parameters: struct kvm_state (in/out) > +Returns: 0 on success, -1 on error > +Errors: > + E2BIG: the data size exceeds the value of 'size' specified by > + the user (the size required will be written into size). > + > +struct kvm_state { > + __u16 flags; > + __u16 format; > + __u32 size; > + union { > + struct kvm_vmx_state vmx; > + struct kvm_svm_state svm; > + __u8 pad[120]; > + }; > + __u8 data[0]; > +}; > + > +This ioctl copies the vcpu's kvm_state struct from the kernel to userspace. > + > +4.115 KVM_SET_STATE > + > +Capability: KVM_CAP_STATE > +Architectures: x86 > +Type: vcpu ioctl > +Parameters: struct kvm_state (in) > +Returns: 0 on success, -1 on error > + > +struct kvm_state { > + __u16 flags; > + __u16 format; > + __u32 size; > + union { > + struct kvm_vmx_state vmx; > + struct kvm_svm_state svm; > + __u8 pad[120]; > + }; > + __u8 data[0]; > +}; > + > +This copies the vcpu's kvm_state struct from userspace to the kernel. > +>>> 13a7c9e... kvm: nVMX: Introduce KVM_CAP_STATE > > 5. 
The kvm_run structure > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 9fa4f57..ad2116a 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -75,6 +75,7 @@ > #define KVM_REQ_HV_EXIT KVM_ARCH_REQ(21) > #define KVM_REQ_HV_STIMERKVM_ARCH_REQ(22) > #define KVM_REQ_LOAD_EOI_EXITMAP KVM_ARCH_REQ(23) > +#define KVM_REQ_GET_VMCS12_PAGES KVM_ARCH_REQ(24) > > #define CR0_RESERVED_BITS \ > (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ > @@ -1084,6 +1085,12 @@ struct kvm_x86_ops { > > void (*setup_mce)(struct kvm_vcpu *vcpu); > > + int (*get_state)(struct kvm_vcpu *vcpu, > + struct kvm_state __user *user_kvm_state); > + int (*set_state)(struct kvm_vcpu *vcpu, > + struct kvm_state __user *user_kvm_state); > + void (*get_vmcs12_pages)(str
Re: [PATCH v2] IB: make INFINIBAND_ADDR_TRANS configurable
On Sat, 2018-04-14 at 15:34 +, Greg Thelen wrote: > On Sat, Apr 14, 2018 at 8:13 AM Dennis Dalessandro < > dennis.dalessan...@intel.com> wrote: > > > On 4/13/2018 1:27 PM, Greg Thelen wrote: > > > Allow INFINIBAND without INFINIBAND_ADDR_TRANS. > > > > > > Signed-off-by: Greg Thelen > > > Cc: Tarick Bedeir > > > Change-Id: I6fbbf8a432e467710fa65e4904b7d61880b914e5 > > Forgot to remove the Gerrit thing. > > -Denny > > Ack. My bad. Will repost. Unfortunately checkpatch didn't notice. Probably because Change-Id: is after a Signed-off-by: line
Re: [PATCH] pinctrl/samsung: Correct EINTG banks order
On Wednesday, April 11, 2018 11:52:44 AM CEST Krzysztof Kozlowski wrote: > On Wed, Apr 11, 2018 at 10:36 AM, Tomasz Figa wrote: > > 2018-04-10 17:38 GMT+09:00 Tomasz Figa : > >> 2018-04-10 16:06 GMT+09:00 Krzysztof Kozlowski : > >>> On Sun, Apr 8, 2018 at 8:07 PM, Paweł Chmiel > >>> wrote: > All banks with GPIO interrupts should be at beginning > of bank array and without any other types of banks between them. > This order is expected by exynos_eint_gpio_irq, when doing > interrupt group to bank translation. > Otherwise, kernel NULL pointer dereference would happen > when trying to handle interrupt, due to wrong bank being looked up. > Observed on s5pv210, when trying to handle gpj0 interrupt, > where kernel was mapping it to gpi bank. > >>> > >>> Thanks for the patch. The issue looks real although one thing was > >>> missed - there is a gap in SVC group between GPK2 and GPL0 (pointed by > >>> Marek Szyprowski): > >>> > >>> 0x0 - EINT_23 - gpk0 > >>> 0x1 - EINT_24 - gpk1 > >>> 0x2 - EINT_25 - gpk2 > >>> 0x4 - EINT_27 - gpl0 > >>> 0x7 - EINT_8 - gpm0 > >>> > >>> Maybe this should be done differently - to remove such hidden > >>> requirement entirely in favor of another parameter of > >>> EXYNOS_PIN_BANK_EINTG argument? > >> > >> Perhaps let's limit this patch to s5pv210 and Exynos5410 alone, where > >> a simple swap of bank order in the arrays should be okay. > >> > >> We might also need to have some fixes on 4x12, because I noticed that > >> in exynos4x12_pin_banks0[] there is a hole in eint_offsets between > >> gpd1 and gpf0 and exynos4x12_pin_banks1[] starts with gpk0 that has > >> eint_offset equal to 0x08 (not 0). > > > > To close the loop, after talking offline and checking the > > documentation, Exynos4x12 is fine, because the group numbers in SVC > > register actually match what is defined in bank arrays. > > Great! Thanks for checking. > > Best regards, > Krzysztof > Thanks for all comments. 
I'll prepare a new version of the patches, with all fixes and documentation. Best regards Paweł
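The ordering requirement discussed in this thread can be illustrated with a small userspace sketch. It assumes the group-to-bank translation is a plain array index on a 1-based group number, which is how the patch description reads; the struct, the enum, the function name, the bank called "bank_a" and the group numbers are all invented for illustration and are not taken from the Samsung pinctrl driver.

```c
#include <stddef.h>

/* Simplified, hypothetical model of a pin bank; not the driver's struct. */
enum bank_type { BANK_EINT_GPIO, BANK_EINT_NONE };

struct bank {
	const char *name;
	enum bank_type type;
};

/*
 * Translate a 1-based interrupt group number into a bank by plain
 * indexing. Group 0 is treated as "nothing pending", and a group
 * beyond the array yields NULL.
 */
static const struct bank *group_to_bank(const struct bank *banks,
					size_t nr_banks, unsigned int group)
{
	if (group == 0 || group > nr_banks)
		return NULL;
	return &banks[group - 1];
}

/* Problem case: a non-interrupt bank sits between two EINTG banks. */
static const struct bank bad_order[] = {
	{ "bank_a", BANK_EINT_GPIO },
	{ "gpi",    BANK_EINT_NONE },
	{ "gpj0",   BANK_EINT_GPIO },
};

/* Reordered: all EINTG banks first, every other bank afterwards. */
static const struct bank good_order[] = {
	{ "bank_a", BANK_EINT_GPIO },
	{ "gpj0",   BANK_EINT_GPIO },
	{ "gpi",    BANK_EINT_NONE },
};
```

With `bad_order`, looking up the second group lands on the non-interrupt bank, which is the kind of wrong lookup the report describes; with `good_order` the same group resolves to an interrupt-capable bank. As noted in the thread, a gap in the hardware group numbering would still break plain indexing, so reordering alone may not cover every SoC.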
Re: How to disable tracing at runtime from the Linux kernel command line?
On Sat, 14 Apr 2018 15:09:33 +0200 Paul Menzel wrote: > Dear Linux folks, > > > I am trying to reduce the boot time of a standard Linux distribution > kernel. Currently, distributions – at least Debian und Ubuntu – enable > function tracing. > > ``` > CONFIG_FTRACE=y > CONFIG_FUNCTION_TRACER=y > CONFIG_FUNCTION_GRAPH_TRACER=y > > CONFIG_EVENT_TRACING=y > ``` > > This is great, as it makes it easy to use tracing to hunt down things > holding up the boot. But it also skews the boot time quite a lot. > > ``` > $ sudo dmesg > […] > [0.318412] initcall init_graph_trace+0x0/0x64 returned 0 after > 199218 usecs > […] > [1.770287] calling event_trace_init+0x0/0x2c2 @ 1 > [2.052871] initcall event_trace_init+0x0/0x2c2 returned 0 after > 275942 usecs > […] > ``` > > Is there a way to disable tracing on the Linux kernel command line to > disable tracing? > Try initcall_blacklist. But you acquire all risks when doing so. I never tried it, so I have no idea what side effects that may have. -- Steve
Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()
On Sat, Apr 14, 2018 at 1:02 AM, Al Viro wrote:
>
> "Bail out" is definitely a bad idea, "sleep"... what on? Especially
> since there might be several evictions we are overlapping with...

Well, one thing that should be looked at is the return condition from select_collect() that shrink_dcache_parent() uses.

Because I think that return condition is somewhat insane.

The logic there seems to be:

 - if we have found something, stop walking. Either NOW (if somebody is waiting) or after you've hit a rename (if nobody is)

Now, this actually makes perfect sense for the whole rename situation: if there's nobody waiting for us, but we hit a rename, we probably should stop anyway just to let whoever is doing that rename continue, and we might as well try to get rid of the dentries we have found so far.

But it does *not* make sense for the case where we've hit a dentry that is already on the shrink list. Sure, we'll continue to gather all the other dentries, but if there is concurrent shrinking, shouldn't we give up the CPU more eagerly - *particularly* if somebody else is waiting (it might be the other process that actually gets rid of the shrinking dentries!)?

So my gut feel is that we should at least try doing something like this in select_collect():

  -       if (!list_empty(&data->dispose))
  +       if (data->found)
                  ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;

because even if we haven't actually been able to shrink something, if we hit an already shrinking entry we should probably at least not do the "retry for rename". And if we actually are going to reschedule, we might as well start from the beginning.

I realize that *this* thread might not be making any actual progress (because it didn't find any dentries to shrink), but since it did find _a_ dentry that is being shrunk, we know the operation itself - on a bigger scale - is making progress.

Hmm?

Now, this is independent of the fact that we probably do need a cond_resched() in shrink_dcache_parent(), to actually do the reschedule if we're not preemptible. The "need_resched()" in select_collect() is obviously done while holding

HOWEVER. Even in that case, I don't think shrink_dcache_parent() is the right point. I'd rather just do it differently in shrink_dentry_list(): do it even for the empty list case by just doing it at the top of the loop:

   static void shrink_dentry_list(struct list_head *list)
   {
  -       while (!list_empty(list)) {
  +       while (cond_resched(), !list_empty(list)) {
                  struct dentry *dentry, *parent;

  -               cond_resched();

so my full patch that I would suggest might be TheRightThing(tm) is attached (but it should be committed as two patches, since the two issues are independent - I'm just attaching it as one for testing in case somebody wants to run some nasty workloads on it)

Comments?

Side note: I think we might want to make that

    while (cond_resched(), ) { }

thing a pattern for doing cond_resched() in loops, instead of having the cond_resched() inside the loop itself.

It not only handles the "zero iterations" case, it also ends up being neutral location-wise wrt 'continue' statements, and potentially generates *better* code.

For example, in this case, doing the cond_resched() at the very top of the loop means that the loop itself then does that

    dentry = list_entry(list->prev, struct dentry, d_lru);

right after the "list_empty()" test - which means that register allocation etc might be easier, because it doesn't have a function call (with associated register clobbers) in between the two accesses to "list".

And I think that might be a fairly common pattern - the loop conditional uses the same values as the loop itself then uses.

I don't know. Maybe I'm just making excuses for the somewhat unusual syntax.

Anybody want to test this out?

                 Linus

 fs/dcache.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 86d2de63461e..76507109cbcd 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1049,11 +1049,9 @@ static bool shrink_lock_dentry(struct dentry *dentry)
 
 static void shrink_dentry_list(struct list_head *list)
 {
-	while (!list_empty(list)) {
+	while (cond_resched(), !list_empty(list)) {
 		struct dentry *dentry, *parent;
 
-		cond_resched();
-
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 		spin_lock(&dentry->d_lock);
 		rcu_read_lock();
@@ -1462,7 +1460,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
 	 * ensures forward progress). We'll be coming back to find
 	 * the rest.
 	 */
-	if (!list_empty(&data->dispose))
+	if (data->found)
 		ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
 out:
 	return ret;
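
For readers unfamiliar with the comma-operator loop form proposed in this message, here is a minimal userspace sketch of how it behaves. It is not kernel code: `cond_resched_stub()`, `struct node`, `list_is_empty()` and `drain_list()` are hypothetical stand-ins, and the stub only counts calls instead of yielding the CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; none of this is the kernel's implementation. */
struct node {
	struct node *next;
};

static int resched_calls;	/* counts how often the stub ran */

/* Hypothetical stub: the real cond_resched() may yield the CPU. */
static int cond_resched_stub(void)
{
	resched_calls++;
	return 0;
}

static bool list_is_empty(const struct node *head)
{
	return head == NULL;
}

/*
 * Pop every node. The comma operator evaluates the stub first and
 * discards its value; only the emptiness test decides the loop.
 * Returns the number of nodes processed.
 */
static int drain_list(struct node **head)
{
	int processed = 0;

	while (cond_resched_stub(), !list_is_empty(*head)) {
		*head = (*head)->next;
		processed++;
	}
	return processed;
}
```

The stub runs once per evaluation of the loop condition, so it is called even when the list is empty on entry, which is the "zero iterations" case the message is concerned with. A `continue` inside the body would also pass through the stub again, since it jumps back to the condition.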
Re: [PATCH] selftests:vm: add include file
On Sun, Apr 15, 2018 at 03:08:56AM +0800, Peng Hao wrote: > userfaultfd.c: In function ‘hugetlb_release_pages’: > userfaultfd.c:145:25: error: ‘FALLOC_FL_PUNCH_HOLE’ undeclared > (first use in this function) > > Signed-off-by: Peng Hao > --- > tools/testing/selftests/vm/userfaultfd.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/tools/testing/selftests/vm/userfaultfd.c > b/tools/testing/selftests/vm/userfaultfd.c > index de2f9ec..d8fe447 100644 > --- a/tools/testing/selftests/vm/userfaultfd.c > +++ b/tools/testing/selftests/vm/userfaultfd.c > @@ -68,6 +68,7 @@ > #include > #include > #include > +#include The FALLOC_FL_PUNCH_HOLE definition should come from #include . What are the versions of your kernel and the libc-development package? > #ifdef __NR_userfaultfd > > -- > 1.8.3.1 > -- Sincerely yours, Mike.
Re: kernel-4.9.94 compile error: 'KMOD_DECOMP_LEN' undeclared
On Sat, 14 Apr 2018 17:41:13 +0800, Teck Choon Giam wrote: > Hi, > > Compile linux-4.9.94 will have error related to KMOD_DECOMP_LEN > undeclared. Searching string related to KMOD_DECOMP_LEN in > linux-4.9.94 and linux-4.15.17 sources as below: > > sh-4.2# grep -r KMOD_DECOMP_LEN ./linux-4.15.17 > ./linux-4.15.17/tools/perf/tests/code-reading.c: char > decomp_name[KMOD_DECOMP_LEN]; > ./linux-4.15.17/tools/perf/util/dso.h:#define KMOD_DECOMP_LEN > sizeof(KMOD_DECOMP_NAME) > ./linux-4.15.17/tools/perf/util/annotate.c: char tmp[KMOD_DECOMP_LEN]; > ./linux-4.15.17/tools/perf/util/dso.c: char newpath[KMOD_DECOMP_LEN]; > sh-4.2# grep -r KMOD_DECOMP_LEN ./linux-4.9.94 > ./linux-4.9.94/tools/perf/tests/code-reading.c: char > decomp_name[KMOD_DECOMP_LEN]; > ./linux-4.9.94/tools/perf/util/dso.c: char newpath[KMOD_DECOMP_LEN]; > > So I guess for linux-4.9.94 has not define KMOD_DECOMP_LEN in > tools/perf/util/dso.h? > > Thanks. > > Regards, > Giam Teck Choon Just a note to say that we (ELRepo) see the same error when building kernel-4.4.128 on RHEL 6 and RHEL 7. Kernel 4.4.127 built fine. Akemi The ELRepo Project
Re: [PATCH 4/4] ALSA: usb: add UAC3 BADD profiles support
On 2018-04-13 23:24, Ruslan Bilovol wrote: Recently released USB Audio Class 3.0 specification contains BADD (Basic Audio Device Definition) document which describes pre-defined UAC3 configurations. BADD support is mandatory for UAC3 devices, it should be implemented as a separate USB device configuration. As per BADD document, class-specific descriptors shall not be included in the Device’s Configuration descriptor ("inferred"), but host can guess them from BADD profile number, number of endpoints and their max packed sizes. Right. I would have thought that, since BADD is a subset of UAC3, it may be simpler to fill the Class Specific descriptors buffer and leave the UAC3 path intact, as that would result in the same behavior (for UAC3 and BADD configs) without the need to add that much code to the mixer, which is already quite big. In the patch I proposed [1], the Class Specific buffer is filled once with the BADD descriptors, which are already UAC3 compliant, so the driver would handle the rest in the same way it would with a UAC3 configuration. I will keep an eye on this, as I'd need to do some work based on this instead. 
[1] https://www.spinics.net/lists/alsa-devel/msg71617.html Thanks, Jorge This patch adds support of all BADD profiles from the spec Signed-off-by: Ruslan Bilovol --- sound/usb/card.c | 14 +++ sound/usb/clock.c | 9 +- sound/usb/mixer.c | 313 +++-- sound/usb/mixer_maps.c | 65 ++ sound/usb/stream.c | 83 +++-- sound/usb/usbaudio.h | 2 + 6 files changed, 466 insertions(+), 20 deletions(-) diff --git a/sound/usb/card.c b/sound/usb/card.c index 4d866bd..47ebc50 100644 --- a/sound/usb/card.c +++ b/sound/usb/card.c @@ -307,6 +307,20 @@ static int snd_usb_create_streams(struct snd_usb_audio *chip, int ctrlif) return -EINVAL; } + if (protocol == UAC_VERSION_3) { + int badd = assoc->bFunctionSubClass; + + if (badd != UAC3_FUNCTION_SUBCLASS_FULL_ADC_3_0 && + (badd < UAC3_FUNCTION_SUBCLASS_GENERIC_IO || +badd > UAC3_FUNCTION_SUBCLASS_SPEAKERPHONE)) { + dev_err(&dev->dev, + "Unsupported UAC3 BADD profile\n"); + return -EINVAL; + } + + chip->badd_profile = badd; + } + for (i = 0; i < assoc->bInterfaceCount; i++) { int intf = assoc->bFirstInterface + i; diff --git a/sound/usb/clock.c b/sound/usb/clock.c index 0b030d8..17673f3 100644 --- a/sound/usb/clock.c +++ b/sound/usb/clock.c @@ -587,8 +587,15 @@ int snd_usb_init_sample_rate(struct snd_usb_audio *chip, int iface, default: return set_sample_rate_v1(chip, iface, alts, fmt, rate); - case UAC_VERSION_2: case UAC_VERSION_3: + if (chip->badd_profile >= UAC3_FUNCTION_SUBCLASS_GENERIC_IO) { + if (rate != UAC3_BADD_SAMPLING_RATE) + return -ENXIO; + else + return 0; + } + /* fall through */ + case UAC_VERSION_2: return set_sample_rate_v2v3(chip, iface, alts, fmt, rate); } } diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index 301ad61..e5c3b0d 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -112,14 +112,12 @@ enum { #include "mixer_maps.c" static const struct usbmix_name_map * -find_map(struct mixer_build *state, int unitid, int control) +find_map(const struct usbmix_name_map *p, int unitid, int control) { - const struct 
usbmix_name_map *p = state->map; - if (!p) return NULL; - for (p = state->map; p->id; p++) { + for (; p->id; p++) { if (p->id == unitid && (!control || !p->control || control == p->control)) return p; @@ -1333,6 +1331,76 @@ static struct usb_feature_control_info *get_feature_control_info(int control) return NULL; } +static void build_feature_ctl_badd(struct usb_mixer_interface *mixer, + unsigned int ctl_mask, int control, int unitid, + const struct usbmix_name_map *badd_map) +{ + struct usb_feature_control_info *ctl_info; + unsigned int len = 0; + struct snd_kcontrol *kctl; + struct usb_mixer_elem_info *cval; + const struct usbmix_name_map *map; + + map = find_map(badd_map, unitid, control); + if (!map) + return; + + cval = kzalloc(sizeof(*cval), GFP_KERNEL); + if (!cval) + return; + snd_usb_mixer_elem_init_std(&cval->head, mixer, unitid); + cval->control = control; + cval->cmask = ctl_mask;
Regression with 5dcd8400884c ("macsec: missing dev_put() on error in macsec_newlink()")
Hi,

Fedora got a bug report of a regression when trying to remove the macsec module (https://bugzilla.redhat.com/show_bug.cgi?id=1566410). I did a bisect and found

commit 5dcd8400884cc4a043a6d4617e042489e5d566a9
Author: Dan Carpenter
Date: Wed Mar 21 11:09:01 2018 +0300

    macsec: missing dev_put() on error in macsec_newlink()

    We moved the dev_hold(real_dev); call earlier in the function but forgot to update the error paths.

    Fixes: 0759e552bce7 ("macsec: fix negative refcnt on parent link")
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

The script I used for testing, based on the reporter's, is attached. It looks like modprobe is stuck in the D state. Any idea?

Thanks,
Laura

[Attachment: mac-sec-setup.sh, application/shellscript]
Re: [PATCH v3] IB: make INFINIBAND_ADDR_TRANS configurable
Hi Greg, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.16 next-20180413] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Greg-Thelen/IB-make-INFINIBAND_ADDR_TRANS-configurable/20180414-234042 config: x86_64-randconfig-x011-201815 (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): drivers/nvme/host/rdma.o: In function `nvme_rdma_stop_queue': >> drivers/nvme/host/rdma.c:554: undefined reference to `rdma_disconnect' drivers/nvme/host/rdma.o: In function `nvme_rdma_create_qp': >> drivers/nvme/host/rdma.c:258: undefined reference to `rdma_create_qp' drivers/nvme/host/rdma.o: In function `nvme_rdma_free_queue': >> drivers/nvme/host/rdma.c:570: undefined reference to `rdma_destroy_id' drivers/nvme/host/rdma.o: In function `nvme_rdma_alloc_queue': >> drivers/nvme/host/rdma.c:511: undefined reference to `__rdma_create_id' >> drivers/nvme/host/rdma.c:523: undefined reference to `rdma_resolve_addr' drivers/nvme/host/rdma.c:544: undefined reference to `rdma_destroy_id' drivers/nvme/host/rdma.o: In function `nvme_rdma_addr_resolved': >> drivers/nvme/host/rdma.c:1461: undefined reference to `rdma_resolve_route' drivers/nvme/host/rdma.o: In function `nvme_rdma_create_queue_ib': >> drivers/nvme/host/rdma.c:485: undefined reference to `rdma_destroy_qp' drivers/nvme/host/rdma.o: In function `nvme_rdma_route_resolved': >> drivers/nvme/host/rdma.c:1512: undefined reference to `rdma_connect' drivers/nvme/host/rdma.o: In function `nvme_rdma_conn_rejected': >> drivers/nvme/host/rdma.c:1436: undefined reference to `rdma_reject_msg' >> drivers/nvme/host/rdma.c:1437: undefined reference to >> `rdma_consumer_reject_data' vim +554 drivers/nvme/host/rdma.c f41725bb Israel Rukshin2017-11-26 423 ca6e95bb 
Sagi Grimberg 2017-05-04 424 static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue) 71102307 Christoph Hellwig 2016-07-06 425 { ca6e95bb Sagi Grimberg 2017-05-04 426 struct ib_device *ibdev; 71102307 Christoph Hellwig 2016-07-06 427 const int send_wr_factor = 3; /* MR, SEND, INV */ 71102307 Christoph Hellwig 2016-07-06 428 const int cq_factor = send_wr_factor + 1; /* + RECV */ 71102307 Christoph Hellwig 2016-07-06 429 int comp_vector, idx = nvme_rdma_queue_idx(queue); 71102307 Christoph Hellwig 2016-07-06 430 int ret; 71102307 Christoph Hellwig 2016-07-06 431 ca6e95bb Sagi Grimberg 2017-05-04 432 queue->device = nvme_rdma_find_get_device(queue->cm_id); ca6e95bb Sagi Grimberg 2017-05-04 433 if (!queue->device) { ca6e95bb Sagi Grimberg 2017-05-04 434 dev_err(queue->cm_id->device->dev.parent, ca6e95bb Sagi Grimberg 2017-05-04 435 "no client data found!\n"); ca6e95bb Sagi Grimberg 2017-05-04 436 return -ECONNREFUSED; ca6e95bb Sagi Grimberg 2017-05-04 437 } ca6e95bb Sagi Grimberg 2017-05-04 438 ibdev = queue->device->dev; 71102307 Christoph Hellwig 2016-07-06 439 71102307 Christoph Hellwig 2016-07-06 440 /* 0b36658c Sagi Grimberg 2017-07-13 441 * Spread I/O queues completion vectors according their queue index. 0b36658c Sagi Grimberg 2017-07-13 442 * Admin queues can always go on completion vector 0. 71102307 Christoph Hellwig 2016-07-06 443 */ 0b36658c Sagi Grimberg 2017-07-13 444 comp_vector = idx == 0 ? 
idx : idx - 1; 71102307 Christoph Hellwig 2016-07-06 445 71102307 Christoph Hellwig 2016-07-06 446 /* +1 for ib_stop_cq */ ca6e95bb Sagi Grimberg 2017-05-04 447 queue->ib_cq = ib_alloc_cq(ibdev, queue, ca6e95bb Sagi Grimberg 2017-05-04 448 cq_factor * queue->queue_size + 1, ca6e95bb Sagi Grimberg 2017-05-04 449 comp_vector, IB_POLL_SOFTIRQ); 71102307 Christoph Hellwig 2016-07-06 450 if (IS_ERR(queue->ib_cq)) { 71102307 Christoph Hellwig 2016-07-06 451 ret = PTR_ERR(queue->ib_cq); ca6e95bb Sagi Grimberg 2017-05-04 452 goto out_put_dev; 71102307 Christoph Hellwig 2016-07-06 453 } 71102307 Christoph Hellwig 2016-07-06 454 71102307 Christoph Hellwig 2016-07-06 455 ret = nvme_rdma_create_qp(queue, send_wr_factor); 71102307 Christoph Hellwig 2016-07-06 456 if (ret) 71102307 Christoph Hellwig 2016-07-06 457 goto out_destroy_ib_cq; 71102307 Christoph Hellwig 2016-07-06 4
[v4 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
mmap_sem is on the hot path of the kernel and is heavily contended, but it is also abused. It is used to protect arg_start|end and env_start|end when reading /proc/$PID/cmdline and /proc/$PID/environ. That doesn't make sense: those proc files just expect to read 4 values atomically, the values are not related to VM, and they could be set to arbitrary values by C/R.

The mmap_sem contention may also cause unexpected issues like the one below:

INFO: task ps:14018 blocked for more than 120 seconds.
Tainted: GE 4.9.79-009.ali3000.alios7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ps D0 14018 1 0x0004
885582f84000 885e8682f000 880972943000 885ebf499bc0 8828ee12 c900349bfca8 817154d0 0040 00ff812f872a 885ebf499bc0 024000d000948300 880972943000
Call Trace:
[] ? __schedule+0x250/0x730
[] schedule+0x36/0x80
[] rwsem_down_read_failed+0xf0/0x150
[] call_rwsem_down_read_failed+0x18/0x30
[] down_read+0x20/0x40
[] proc_pid_cmdline_read+0xd9/0x4e0
[] ? do_filp_open+0xa5/0x100
[] __vfs_read+0x37/0x150
[] ? security_file_permission+0x9b/0xc0
[] vfs_read+0x96/0x130
[] SyS_read+0x55/0xc0
[] entry_SYSCALL_64_fastpath+0x1a/0xc5

Both Alexey Dobriyan and Michal Hocko suggested using a dedicated lock for these fields to mitigate the abuse of mmap_sem.

So, introduce a new spinlock in mm_struct to protect concurrent access to arg_start|end, env_start|end and others. Also take mmap_sem for read instead of write in prctl: down_read plus the new spinlock still protects against the race between prctl and sys_brk, which might break check_data_rlimit(), and it makes prctl friendlier to other VM operations.

This patch just eliminates the abuse of mmap_sem. It can't resolve the above hung task warning completely, since the later access_remote_vm() call still needs to acquire mmap_sem. The mmap_sem scalability issue will be solved in the future.

Signed-off-by: Yang Shi Cc: Alexey Dobriyan Cc: Michal Hocko Cc: Matthew Wilcox Cc: Mateusz Guzik Cc: Cyrill Gorcunov --- v3 --> v4: * Protected values update with down_read + spin_lock to prevent from race condition between prctl and sys_brk and made prctl more friendly to VM operations per Michal's suggestion v2 --> v3: * Restored down_write in prctl syscall * Elaborate the limitation of this patch suggested by Michal * Protect those fields by the new lock except brk and start_brk per Michal's suggestion * Based off Cyrill's non PR_SET_MM_MAP oprations deprecation patch (https://lkml.org/lkml/2018/4/5/541) v1 --> v2: * Use spinlock instead of rwlock per Mattew's suggestion * Replace down_write to down_read in prctl_set_mm (see commit log for details) fs/proc/base.c | 8 include/linux/mm_types.h | 2 ++ kernel/fork.c| 1 + kernel/sys.c | 6 -- mm/init-mm.c | 1 + 5 files changed, 12 insertions(+), 6 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index eafa39a..3551757 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -239,12 +239,12 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf, goto out_mmput; } - down_read(&mm->mmap_sem); + spin_lock(&mm->arg_lock); arg_start = mm->arg_start; arg_end = mm->arg_end; env_start = mm->env_start; env_end = mm->env_end; - up_read(&mm->mmap_sem); + spin_unlock(&mm->arg_lock); BUG_ON(arg_start > arg_end); BUG_ON(env_start > env_end); @@ -929,10 +929,10 @@ static ssize_t environ_read(struct file *file, char __user *buf, if (!mmget_not_zero(mm)) goto free; - down_read(&mm->mmap_sem); + spin_lock(&mm->arg_lock); env_start = mm->env_start; env_end = mm->env_end; - up_read(&mm->mmap_sem); + spin_unlock(&mm->arg_lock); while (count > 0) { size_t this_len, max_len; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 2161234..49dd59e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -413,6 +413,8 @@ struct mm_struct { unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE & 
~VM_STACK */ unsigned long stack_vm; /* VM_STACK */ unsigned long def_flags; + + spinlock_t arg_lock; /* protect the below fields */ unsigned long start_code, end_code, start_data, end_data; unsigned long start_brk, brk, start_stack; unsigned long arg_start, arg_end, env_start, env_end; diff --git a/kernel/fork.c b/kernel/fork.c index 242c8c9..295f903 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -900,6 +900,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, mm->pinned_vm = 0; memset(&mm->rss_stat, 0, sizeof(mm->rss_stat)); spin_lock_init(&mm->page_table_lock); + spin_lock_init(&mm->arg_lock); mm_init_cpumask(mm); mm_init_
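The locking pattern in the patch above (hold a small dedicated lock only long enough to copy four fields) can be sketched in userspace. This is an analogy under stated assumptions, not kernel code: `struct mm_like`, `struct arg_snapshot`, `mm_like_init()`, `set_args()` and `snapshot_args()` are hypothetical names, and a pthread mutex stands in for the kernel spinlock.

```c
#include <pthread.h>

/* Hypothetical userspace stand-in for the fields guarded by arg_lock. */
struct mm_like {
    pthread_mutex_t arg_lock;   /* protects the four fields below */
    unsigned long arg_start, arg_end, env_start, env_end;
};

/* Plain copy of the four values, usable after the lock is dropped. */
struct arg_snapshot {
    unsigned long arg_start, arg_end, env_start, env_end;
};

static void mm_like_init(struct mm_like *mm)
{
    pthread_mutex_init(&mm->arg_lock, NULL);
    mm->arg_start = mm->arg_end = 0;
    mm->env_start = mm->env_end = 0;
}

/* Writer side: update all four values as one unit. */
static void set_args(struct mm_like *mm, unsigned long arg_start,
                     unsigned long arg_end, unsigned long env_start,
                     unsigned long env_end)
{
    pthread_mutex_lock(&mm->arg_lock);
    mm->arg_start = arg_start;
    mm->arg_end = arg_end;
    mm->env_start = env_start;
    mm->env_end = env_end;
    pthread_mutex_unlock(&mm->arg_lock);
}

/* Reader side: the lock is held only for the copy, never for slow work. */
static struct arg_snapshot snapshot_args(struct mm_like *mm)
{
    struct arg_snapshot snap;

    pthread_mutex_lock(&mm->arg_lock);
    snap.arg_start = mm->arg_start;
    snap.arg_end = mm->arg_end;
    snap.env_start = mm->env_start;
    snap.env_end = mm->env_end;
    pthread_mutex_unlock(&mm->arg_lock);
    return snap;
}
```

A reader calls `snapshot_args()` and then works on the returned copy, so a slow consumer never blocks writers for longer than four assignments.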
Re: syzbot dashboard
[ Coming back to this now that the merge window is almost over ]

On Mon, Mar 26, 2018 at 1:46 AM, Dmitry Vyukov wrote:
>
> I've switched emails to links instead of attachments, here are few
> recent examples:
> https://lkml.org/lkml/2018/3/25/31
> https://lkml.org/lkml/2018/3/25/256
> https://lkml.org/lkml/2018/3/25/257

Looks good to me.

I notice that only the last one got any replies, though. I wonder if some people auto-ignore the new reports because of having been burned by the previous "huge illegible emails" issue.

I do see syzbot fixes in rdma, though, just not for that cma_listen_on_all issue. So maybe that bug is nastier.

Linus
[GIT PULL V3] Thermal SoC management updates for v4.17-rc1
Hello Linus,

Please find thermal-soc changes for v4.17-rc1. Rui asked me to send the pull request directly to you as we are close to the end of the merge window.

Essentially this pull removes the series that caused the warning regression. I will work with the developer to get that fixed later on, but I am still sending the other few patches that are unrelated to that. Let me know if this causes any issues and whether it can still be pulled.

Changelog:
- New i.MX7 thermal sensor
- Mediatek driver now supports MT7622 SoC
- Removal of min max cpu cooling DT property

Differences in V3:
- Rebased on top of current linus/master, to avoid any merge issues from previously pulled thermal code.

Differences in V2:
- Reordered the patches to drop exynos changes for now until we get agreement on the fix on that driver for the compilation warnings caused by the confusing conversion functions.

The following changes since commit 48023102b7078a6674516b1fe0d639669336049d:

  Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs (2018-04-13 16:55:41 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal linus

for you to fetch changes up to 15a32df1918259be6c23fc36014fc26ee66c836c:

  dt-bindings: thermal: Remove "cooling-{min|max}-level" properties (2018-04-14 09:37:55 -0700)

Anson Huang (1):
      thermal: imx: add i.MX7 thermal sensor support

Bartlomiej Zolnierkiewicz (1):
      dt-bindings: thermal: remove no longer needed samsung thermal properties

Sean Wang (2):
      dt-bindings: thermal: add binding for MT7622 SoC
      thermal: mediatek: add support for MT7622 SoC

Viresh Kumar (1):
      dt-bindings: thermal: Remove "cooling-{min|max}-level" properties

 .../devicetree/bindings/thermal/exynos-thermal.txt | 23 +-
 .../devicetree/bindings/thermal/imx-thermal.txt    |  9 +-
 .../bindings/thermal/mediatek-thermal.txt          |  1 +
 .../devicetree/bindings/thermal/thermal.txt        | 16 +-
 drivers/thermal/imx_thermal.c                      | 295 -
 drivers/thermal/mtk_thermal.c                      |  35 +++
 6 files changed, 281 insertions(+), 98 deletions(-)
[GIT PULL] Kbuild updates for 4.17 (2nd round)
Hi Linus, Please pull more Kbuild updates for v4.17-rc1. Thanks! The following changes since commit f605ba97fb80522656c7dce9825a908f1e765b57: Merge tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio (2018-04-06 19:44:27 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git tags/kbuild-v4.17-2 for you to fetch changes up to 17baab68d337a0bf4654091e2b4cd67c3fdb44d8: kconfig: extend output of 'listnewconfig' (2018-04-13 23:23:11 +0900) Kbuild updates for v4.17 (2nd) - pass HOSTLDFLAGS when compiling single .c host programs - build genksyms lexer and parser files instead of using shipped versions - rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency - let the top .gitignore globally ignore artifacts generated by flex, bison, and asn1_compiler - let the top Makefile globally clean artifacts generated by flex, bison, and asn1_compiler - use safer .SECONDARY marker instead of .PRECIOUS to prevent intermediate files from being removed - support -fmacro-prefix-map option to make __FILE__ a relative path - fix # escaping to prepare for the future GNU Make release - clean up deb-pkg by using debian tools instead of handrolled source/changes generation - improve rpm-pkg portability by supporting kernel-install as a fallback of new-kernel-pkg - extend Kconfig listnewconfig target to provide more information Don Zickus (1): kconfig: extend output of 'listnewconfig' Javier Martinez Canillas (1): kbuild: rpm-pkg: use kernel-install as a fallback for new-kernel-pkg Masahiro Yamada (10): .gitignore: move *.lex.c *.tab.[ch] patterns to the top-level .gitignore kbuild: clean up *.lex.c and *.tab.[ch] patterns from top-level Makefile genksyms: generate lexer and parser during build instead of shipping kbuild: add %.lex.c and %.tab.[ch] to 'targets' automatically kbuild: add %.dtb.S and %.dtb to 'targets' automatically .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore kbuild: clean up 
*-asn1.[ch] patterns from top-level Makefile kbuild: rename *-asn1.[ch] to *.asn1.[ch] kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers kbuild: use -fmacro-prefix-map to make __FILE__ a relative path Rasmus Villemoes (1): Kbuild: fix # escaping in .cmd files for future Make Riku Voipio (1): kbuild: deb-pkg: split generating packaging and build Robin Jarry (1): kbuild: use HOSTLDFLAGS for single .c executables .gitignore |7 +- Makefile|5 + arch/arc/boot/dts/Makefile |2 - arch/arm/crypto/Makefile|2 +- arch/arm64/crypto/Makefile |2 +- arch/sparc/vdso/Makefile|4 +- arch/x86/entry/vdso/Makefile|4 +- crypto/.gitignore |1 - crypto/Makefile | 14 +- crypto/asymmetric_keys/.gitignore |1 - crypto/asymmetric_keys/Makefile | 31 +- crypto/asymmetric_keys/mscode_parser.c |2 +- crypto/asymmetric_keys/pkcs7_parser.c |2 +- crypto/asymmetric_keys/x509_cert_parser.c |4 +- crypto/rsa_helper.c |4 +- drivers/crypto/qat/qat_common/.gitignore|1 - drivers/of/unittest-data/Makefile |6 - net/ipv4/netfilter/Makefile |5 +- net/ipv4/netfilter/nf_nat_snmp_basic_main.c |2 +- scripts/Kbuild.include |5 +- scripts/Makefile.build | 23 +- scripts/Makefile.host |2 +- scripts/Makefile.lib| 31 +- scripts/asn1_compiler.c |2 +- scripts/dtc/.gitignore |3 - scripts/dtc/Makefile|5 - scripts/genksyms/.gitignore |3 - scripts/genksyms/Makefile | 27 +- scripts/genksyms/lex.lex.c_shipped | 2291 scripts/genksyms/parse.tab.c_shipped| 2394 -- scripts/genksyms/parse.tab.h_shipped| 119 -- scripts/kconfig/.gitignore |3 - scripts/kconfig/Makefile|4 +- scripts/kconfig/conf.c | 14 +- scripts/package/Makefile| 34 +- scripts/package/builddeb| 221 +-- scripts/package/mkdebian| 189 +++ scripts/package/mkspec |2 + tools/build/Build.include |5 +- tools/objtool/Makefile |2 +- tools/scripts/Makefile.include |2 + 41 files c
Re: [PATCH v3] IB: make INFINIBAND_ADDR_TRANS configurable
Hi Greg, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.16 next-20180413] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Greg-Thelen/IB-make-INFINIBAND_ADDR_TRANS-configurable/20180414-234042 config: i386-randconfig-x005-201815 (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): drivers/nvme/host/rdma.o: In function `nvme_rdma_stop_queue': drivers/nvme/host/rdma.c:554: undefined reference to `rdma_disconnect' drivers/nvme/host/rdma.o: In function `nvme_rdma_free_queue': drivers/nvme/host/rdma.c:570: undefined reference to `rdma_destroy_id' drivers/nvme/host/rdma.o: In function `nvme_rdma_alloc_queue': drivers/nvme/host/rdma.c:511: undefined reference to `__rdma_create_id' drivers/nvme/host/rdma.c:523: undefined reference to `rdma_resolve_addr' drivers/nvme/host/rdma.c:544: undefined reference to `rdma_destroy_id' drivers/nvme/host/rdma.o: In function `nvme_rdma_create_qp': drivers/nvme/host/rdma.c:258: undefined reference to `rdma_create_qp' drivers/nvme/host/rdma.o: In function `nvme_rdma_create_queue_ib': drivers/nvme/host/rdma.c:485: undefined reference to `rdma_destroy_qp' drivers/nvme/host/rdma.o: In function `nvme_rdma_addr_resolved': drivers/nvme/host/rdma.c:1461: undefined reference to `rdma_resolve_route' drivers/nvme/host/rdma.o: In function `nvme_rdma_route_resolved': drivers/nvme/host/rdma.c:1512: undefined reference to `rdma_connect' drivers/nvme/host/rdma.o: In function `nvme_rdma_conn_rejected': drivers/nvme/host/rdma.c:1436: undefined reference to `rdma_reject_msg' drivers/nvme/host/rdma.c:1437: undefined reference to `rdma_consumer_reject_data' drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_create_ch_ib': >> drivers/infiniband/ulp/srp/ib_srp.c:585: 
undefined reference to >> `rdma_create_qp' >> drivers/infiniband/ulp/srp/ib_srp.c:647: undefined reference to >> `rdma_destroy_qp' drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_disconnect_target': >> drivers/infiniband/ulp/srp/ib_srp.c:977: undefined reference to >> `rdma_disconnect' drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_new_rdma_cm_id': >> drivers/infiniband/ulp/srp/ib_srp.c:336: undefined reference to >> `__rdma_create_id' >> drivers/infiniband/ulp/srp/ib_srp.c:345: undefined reference to >> `rdma_resolve_addr' >> drivers/infiniband/ulp/srp/ib_srp.c:369: undefined reference to >> `rdma_destroy_id' drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_rdma_lookup_path': >> drivers/infiniband/ulp/srp/ib_srp.c:790: undefined reference to >> `rdma_resolve_route' drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_send_req': >> drivers/infiniband/ulp/srp/ib_srp.c:938: undefined reference to >> `rdma_connect' drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_free_ch_ib': drivers/infiniband/ulp/srp/ib_srp.c:677: undefined reference to `rdma_destroy_id' drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_rdma_cm_handler': drivers/infiniband/ulp/srp/ib_srp.c:2808: undefined reference to `rdma_disconnect' vim +585 drivers/infiniband/ulp/srp/ib_srp.c 7dad6b2e Bart Van Assche 2014-10-21 542 509c07bc Bart Van Assche 2014-10-30 543 static int srp_create_ch_ib(struct srp_rdma_ch *ch) aef9ec39 Roland Dreier 2005-11-02 544 { 509c07bc Bart Van Assche 2014-10-30 545 struct srp_target_port *target = ch->target; 62154b2e Bart Van Assche 2014-05-20 546 struct srp_device *dev = target->srp_host->srp_dev; aef9ec39 Roland Dreier 2005-11-02 547 struct ib_qp_init_attr *init_attr; 73aa89ed Ishai Rabinovitz 2012-11-26 548 struct ib_cq *recv_cq, *send_cq; 73aa89ed Ishai Rabinovitz 2012-11-26 549 struct ib_qp *qp; d1b4289e Bart Van Assche 2014-05-20 550 struct ib_fmr_pool *fmr_pool = NULL; 5cfb1782 Bart Van Assche 2014-05-20 551 struct srp_fr_pool *fr_pool = 
NULL; 509c5f33 Bart Van Assche 2016-05-12 552 const int m = 1 + dev->use_fast_reg * target->mr_per_cmd * 2; aef9ec39 Roland Dreier 2005-11-02 553 int ret; aef9ec39 Roland Dreier 2005-11-02 554 aef9ec39 Roland Dreier 2005-11-02 555 init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL); aef9ec39 Roland Dreier 2005-11-02 556 if (!init_attr) aef9ec39 Roland Dreier 2005-11-02 557 return -ENOMEM; aef9ec39 Roland Dreier 2005-11-02 558 56139
Re: blktest for [PATCH v2] block: do not use interruptible wait anywhere
On 13/04/18 09:31, Johannes Thumshirn wrote:
> Hi Alan,
>
> On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:
>> # dd if=/dev/sda of=/dev/null iflag=direct & \
>>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>>   echo mem > /sys/power/state ; \
>>   sleep 5; killall dd # stop after 5 seconds
> Can you please also add a regression test to blktests[1] for this?
>
> [1] https://github.com/osandov/blktests
>
> Thanks,
> Johannes

Good question. It would be nice to promote this test.

Template looks like I need the commit (sha1) first.

I had some ideas about automating it, so I wrote a standalone (see end). I can automate the wakeup by using pm_test, but this is still a system suspend test. Unfortunately I don't think there's any alternative. To give the most dire example

# This test is non-destructive, but it exercises suspend in all drivers.
# If your system has a problem with suspend, it might not wake up again.

So I'm not sure if it would be acceptable for the default set?

How useful is this going to be? Is there an expanded/full set of tests that gets run somewhere?

If you can't guarantee it's going to be run somewhere, I'd worry the cost/benefit feels a little narrow :-(. There were one or two further "interesting" details, and it might theoretically bitrot if it's not run periodically.

If you look at the diff and title for the fix, I don't think it's at high risk of being reversed unintentionally.

And I think you can trust users will notice if the fix gets merged away accidentally, before it hits -stable releases :-). The issue kills the entire GUI session on resume from suspend, say once every three days, on gnome-shell (due to Xwayland). One unfortunate user switched to Xorg only to find that was also affected. I honestly assume the issue applies generally to laptop systems. The only mitigating factor is if you have RAM to spare, so you don't hit the major pagefaults during resume.

#!/bin/bash

# This test is non-destructive, but it exercises suspend in all drivers.
# If your system has a problem with suspend, it might not wake up again.

# TEST_DEV must be SCSI (inc. libata).
#
# Additionally, this test will abort if $TEST_DEV is too tiny
# and we finish reading it within 3 seconds. Sorry.
TEST_DEV=sda

# RATIONALE
#
# The original root cause issue was the behaviour around blk_queue_freeze().
# It put tasks into an interruptible wait, which is wrong for block devices.
#
# XXX Insert reference to fix commit XXX
#
# The freeze feature is not directly exposed to userspace, so I can not test
# it directly :(. (It's used to "guarantee no request is in use, so we can
# change any data structure of the queue afterward". I.e. freeze, modify the
# queue structure, unfreeze).
#
# However, this led to a regression with a decent reproducer. In v4.15 the
# same interruptible wait was also used for SCSI suspend/resume. SCSI resume
# can take a second or so... hence we like to do it asynchronously. This
# means we can observe the wait at resume time, and we can test if it is
# interruptible.
#
# Note `echo quiesce > /sys/class/scsi_device/*/device/state` can *not*
# trigger the specific wait in the block layer. That code path only
# sets the SCSI device state; it does not set any block device state.
# (It does not call into blk_queue_freeze() or blk_set_preempt_only();
# it literally just sets sdev->sdev_state to SDEV_QUIESCE).

set -o nounset

abort() {
    echo "$*"
    echo "=== Test ERROR ==="
    exit 2
}

SYSFS_PM_TEST_DELAY=/sys/module/suspend/parameters/pm_test_delay
SAVED_PM_TEST_DELAY=

# Child process IDs
DD=
SUBSHELL=

cleanup() {
    # In many cases the subshell will already have exited...
    # and semantics for `wait` are crappy in shell.
    # Failure will be harmless in most cases.
    # Just try to provide enough context for the user to guess.
    echo "Cleaning up"
    if [ -n "$SUBSHELL" ]; then
        echo "Killing sub-shell PID $SUBSHELL..."
        kill $SUBSHELL
        wait $SUBSHELL
    fi
    if [ -n "$DD" ]; then
        echo "Killing 'dd' PID $DD..."
        kill $DD
        wait $DD
    fi
    echo "Resetting pm_test"
    echo none > /sys/power/pm_test
    echo "Resetting pm_test_delay"
    if [ -n "$SAVED_PM_TEST_DELAY" ]; then
        echo "$SAVED_PM_TEST_DELAY" > "$SYSFS_PM_TEST_DELAY"
    fi
}
trap cleanup EXIT

# "If a user has disabled async probing a likely reason
# is due to a storage enclosure that does not inject
# staggered spin-ups. For safety, make resume
# synchronous as well in that case."
if ! SCAN="$(cat /sys/module/scsi_mod/parameters/scan)"; then
    abort "error reading '/sys/module/scsi_mod/parameters/scan' ?"
fi
if [ "$SCAN" != "async" ]; then
    abort "This test does not work if you have set 'scsi_mod.scan=sync'"
fi

# Ignore USR1, in the hope that this applies to child processes.
# This allows us to safely `kill -USR1 $DD`, when we don't know
# whether the child process has fully started yet.
Re: [PATCH net-next] net: introduce a new tracepoint for tcp_rcv_space_adjust
The net-next tree is closed, please resubmit this when the merge window ends and the net-next tree opens back up. Thank you.
Re: blktest for [PATCH v2] block: do not use interruptible wait anywhere
On 4/14/18 1:46 PM, Alan Jenkins wrote: > On 13/04/18 09:31, Johannes Thumshirn wrote: >> Hi Alan, >> >> On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote: >>> # dd if=/dev/sda of=/dev/null iflag=direct & \ >>>while killall -SIGUSR1 dd; do sleep 0.1; done & \ >>>echo mem > /sys/power/state ; \ >>>sleep 5; killall dd # stop after 5 seconds >> Can you please also add a regression test to blktests[1] for this? >> >> [1] https://github.com/osandov/blktests >> >> Thanks, >> Johannes > > Good question. It would be nice to promote this test. > > Template looks like I need the commit (sha1) first. > > I had some ideas about automating it, so I wrote a standalone (see > end). I can automate the wakeup by using pm_test, but this is still a > system suspend test. Unfortunately I don't think there's any > alternative. To give the most dire example > > # This test is non-destructive, but it exercises suspend in all drivers. > # If your system has a problem with suspend, it might not wake up again. > > > So I'm not sure if it would be acceptable for the default set? > > How useful is this going to be? Is there an expanded/full set of tests > that gets run somewhere? > > If you can't guarantee it's going to be run somewhere, I'd worry the > cost/benefit feels a little narrow :-(. There were one or two further > "interesting" details, and it might theoretically bitrot if it's not run > periodically. I run it, just last week we found two new bugs with it. I'm requiring anyone that submits block patches to run the test suite, and also working towards having it be part of the 0-day runs so it gets run on posted patches automatically. So yes, it's useful and it won't bitrot. Please do turn it into a blktests test. -- Jens Axboe
Re: [PATCH v2] block: do not use interruptible wait anywhere
On 4/12/18 12:11 PM, Alan Jenkins wrote: > When blk_queue_enter() waits for a queue to unfreeze, or unset the > PREEMPT_ONLY flag, do not allow it to be interrupted by a signal. > > The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec > ("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI > device is resumed asynchronously, i.e. after un-freezing userspace tasks. > > So that commit exposed the bug as a regression in v4.15. A mysterious > SIGBUS (or -EIO) sometimes happened during the time the device was being > resumed. Most frequently, there was no kernel log message, and we saw Xorg > or Xwayland killed by SIGBUS.[1] > > [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979 > > Without this fix, I get an IO error in this test: > > # dd if=/dev/sda of=/dev/null iflag=direct & \ > while killall -SIGUSR1 dd; do sleep 0.1; done & \ > echo mem > /sys/power/state ; \ > sleep 5; killall dd # stop after 5 seconds > > The interruptible wait was added to blk_queue_enter in > commit 3ef28e83ab15 ("block: generic request_queue reference counting"). > Before then, the interruptible wait was only in blk-mq, but I don't think > it could ever have been correct. Applied, thanks. Still want that test in blktests, though! -- Jens Axboe
repeatable boot randomness inside KVM guest
SLAB allocators got the CONFIG_SLAB_FREELIST_RANDOM option, which randomizes the allocation pattern inside a slab:

    #ifdef CONFIG_SLAB_FREELIST_RANDOM
    /* Pre-initialize the random sequence cache */
    static int init_cache_random_seq(struct kmem_cache *s)
    {
    ...

Then I printed the actual random sequences for each kmem cache. It turned out they were all the same for most of the caches, and they didn't vary across guest reboots.

    int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count, gfp_t gfp)
    {
    ...
        /* Get best entropy at this stage of boot */
        prandom_seed_state(&state, get_random_long());

Then I searched the internet and it turned out KVM can pass randomness via virtio-rng or something. So I linked /dev/urandom. And it didn't help!

The only way to get randomness for SLAB is to enable RDRAND inside the guest. Is it a KVM bug?

For the record I'm using qemu 2.11.1-r2 and whatever F27 ships now.
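The symptom reported above is what any seeded generator produces when the seed does not change: the same seed yields the same shuffled order on every run. A minimal userspace sketch of that effect follows. It assumes a xorshift32 generator and a Fisher-Yates shuffle as stand-ins; the kernel's prandom and freelist code differ, and `xorshift32()` and `shuffle_seq()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in PRNG (xorshift32); NOT the kernel's prandom implementation. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill seq with 0..count-1, then Fisher-Yates shuffle it from the seed. */
static void shuffle_seq(unsigned int *seq, size_t count, uint32_t seed)
{
    uint32_t state = seed ? seed : 1;   /* xorshift state must be non-zero */
    size_t i;

    for (i = 0; i < count; i++)
        seq[i] = (unsigned int)i;

    for (i = count; i > 1; i--) {
        size_t j = xorshift32(&state) % i;   /* j is in [0, i) */
        unsigned int tmp = seq[i - 1];

        seq[i - 1] = seq[j];
        seq[j] = tmp;
    }
}
```

Calling `shuffle_seq()` twice with the same seed fills both arrays identically. If the early-boot entropy source returns the same value on every guest boot, the same thing would happen to the freelist order.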
[PATCH] ipc: Adding new return type vm_fault_t
Use new return type vm_fault_t for fault handler. Signed-off-by: Souptick Joarder Reviewed-by: Matthew Wilcox --- ipc/shm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ipc/shm.c b/ipc/shm.c index 4643865..2ba0cfc 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -378,7 +378,7 @@ void exit_shm(struct task_struct *task) up_write(&shm_ids(ns).rwsem); } -static int shm_fault(struct vm_fault *vmf) +static vm_fault_t shm_fault(struct vm_fault *vmf) { struct file *file = vmf->vma->vm_file; struct shm_file_data *sfd = shm_file_data(file); -- 1.9.1
[PATCH] kernel: event: core: Change return type to vm_fault_t
Use new return type vm_fault_t for fault handler and page_mkwrite handler in struct vm_operations_struct. Signed-off-by: Souptick Joarder Reviewed-by: Matthew Wilcox --- kernel/events/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 96db9ae..d09f1c4 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4918,11 +4918,11 @@ void perf_event_update_userpage(struct perf_event *event) } EXPORT_SYMBOL_GPL(perf_event_update_userpage); -static int perf_mmap_fault(struct vm_fault *vmf) +static vm_fault_t perf_mmap_fault(struct vm_fault *vmf) { struct perf_event *event = vmf->vma->vm_file->private_data; struct ring_buffer *rb; - int ret = VM_FAULT_SIGBUS; + vm_fault_t ret = VM_FAULT_SIGBUS; if (vmf->flags & FAULT_FLAG_MKWRITE) { if (vmf->pgoff == 0) -- 1.9.1
[PATCH] kernel: relay: Change return type to vm_fault_t
Use new return type vm_fault_t for fault handler in struct vm_operations_struct. Signed-off-by: Souptick Joarder Reviewed-by: Matthew Wilcox --- kernel/relay.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/relay.c b/kernel/relay.c index c302940..a8cdbf7 100644 --- a/kernel/relay.c +++ b/kernel/relay.c @@ -39,7 +39,7 @@ static void relay_file_mmap_close(struct vm_area_struct *vma) /* * fault() vm_op implementation for relay file mapping. */ -static int relay_buf_fault(struct vm_fault *vmf) +static vm_fault_t relay_buf_fault(struct vm_fault *vmf) { struct page *page; struct rchan_buf *buf = vmf->vma->vm_private_data; -- 1.9.1
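The vm_fault_t conversions above share one pattern: the handler returns a dedicated typedef instead of plain `int`, so a fault code is visibly distinct from an errno. A small userspace sketch of that pattern follows. The typedef, the flag value, `struct demo_fault` and `demo_fault_handler()` are all illustrative assumptions here, not the kernel's definitions.

```c
/* Illustrative stand-in; the real typedef lives in the kernel's mm headers. */
typedef unsigned int vm_fault_t;

/* Hypothetical flag value, chosen only for this sketch. */
#define DEMO_VM_FAULT_SIGBUS 0x0002u

/* Hypothetical, much reduced analogue of struct vm_fault. */
struct demo_fault {
    unsigned long pgoff;      /* page offset that faulted */
    unsigned long nr_pages;   /* pages backing the mapping */
};

/* Returns 0 on success or a fault code, never a negative errno. */
static vm_fault_t demo_fault_handler(const struct demo_fault *vmf)
{
    vm_fault_t ret = DEMO_VM_FAULT_SIGBUS;

    if (vmf->pgoff < vmf->nr_pages)
        ret = 0;   /* the offset is backed, so the fault is handled */

    return ret;
}
```

With the return type spelled out, a line such as `return -ENOMEM;` in a handler stands out in review as a type mismatch in intent, which is the motivation the patches give for the change.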
Re: [PATCH] x86/cpufeature: guard asm_volatile_goto usage with CC_HAVE_ASM_GOTO
On 4/14/18 3:11 AM, Peter Zijlstra wrote: On Fri, Apr 13, 2018 at 01:42:14PM -0700, Alexei Starovoitov wrote: On 4/13/18 11:19 AM, Peter Zijlstra wrote: On Tue, Apr 10, 2018 at 02:28:04PM -0700, Alexei Starovoitov wrote: Instead of #ifdef CC_HAVE_ASM_GOTO we can replace it with #ifndef __BPF__ or some other name, I would prefer the BPF specific hack; otherwise we might be encouraging people to build the kernel proper without asm-goto. I don't understand this concern. The thing is; this will be a (temporary) BPF specific hack. Hiding it behind something that looks 'normal' (CC_HAVE_ASM_GOTO) is just not right. This is a fair concern. I will use a different macro and send v2 soon. Thanks.
[PATCH 0/3] Receive Side Coalescing for macb driver
This patch series adds support for receive side coalescing for Cadence GEM driver. Receive segmentation coalescing is a mechanism to reduce CPU overhead. This is done by coalescing received TCP message segments together into a single large message. This means that when the message is complete the CPU only has to process the single header and act upon the one data payload. Rafal Ozieblo (3): net: macb: Add support for rsc capable hardware net: macb: Add support for header data spliting net: macb: Receive Side Coalescing (RSC) feature added. drivers/net/ethernet/cadence/macb.h | 21 +++ drivers/net/ethernet/cadence/macb_main.c | 227 ++- 2 files changed, 212 insertions(+), 36 deletions(-) -- 2.4.5
[PATCH 1/3] net: macb: Add support for rsc capable hardware
When the pbuf_rsc has been enabled in hardware the receive buffer offset for incoming packets cannot be changed in the network configuration register (even when rsc is not used at all). Signed-off-by: Rafal Ozieblo --- drivers/net/ethernet/cadence/macb.h | 2 ++ drivers/net/ethernet/cadence/macb_main.c | 22 ++ 2 files changed, 20 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index 8665982..33c9a48 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -477,6 +477,8 @@ /* Bitfields in DCFG6. */ #define GEM_PBUF_LSO_OFFSET 27 #define GEM_PBUF_LSO_SIZE 1 +#define GEM_PBUF_RSC_OFFSET 26 +#define GEM_PBUF_RSC_SIZE 1 #define GEM_DAW64_OFFSET 23 #define GEM_DAW64_SIZE 1 diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index b4c9268..43201a8 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -930,8 +930,9 @@ static void gem_rx_refill(struct macb_queue *queue) macb_set_addr(bp, desc, paddr); desc->ctrl = 0; - /* properly align Ethernet header */ - skb_reserve(skb, NET_IP_ALIGN); + if (!(bp->dev->hw_features & NETIF_F_LRO)) + /* properly align Ethernet header */ + skb_reserve(skb, NET_IP_ALIGN); } else { desc->addr &= ~MACB_BIT(RX_USED); desc->ctrl = 0; @@ -2110,7 +2111,13 @@ static void macb_init_hw(struct macb *bp) config = macb_mdc_clk_div(bp); if (bp->phy_interface == PHY_INTERFACE_MODE_SGMII) config |= GEM_BIT(SGMIIEN) | GEM_BIT(PCSSEL); - config |= MACB_BF(RBOF, NET_IP_ALIGN); /* Make eth data aligned */ + /* When the pbuf_rsc has been enabled in hardware the receive buffer +* offset cannot be changed in the network configuration register. 
+*/ + if (!(bp->dev->hw_features & NETIF_F_LRO)) + /* Make eth data aligned */ + config |= MACB_BF(RBOF, NET_IP_ALIGN); + config |= MACB_BIT(PAE);/* PAuse Enable */ config |= MACB_BIT(DRFCS); /* Discard Rx FCS */ if (bp->caps & MACB_CAPS_JUMBO) @@ -2281,7 +2288,7 @@ static void macb_set_rx_mode(struct net_device *dev) static int macb_open(struct net_device *dev) { struct macb *bp = netdev_priv(dev); - size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN; + size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN; struct macb_queue *queue; unsigned int q; int err; @@ -2295,6 +2302,9 @@ static int macb_open(struct net_device *dev) if (!dev->phydev) return -EAGAIN; + if (!(bp->dev->hw_features & NETIF_F_LRO)) + bufsz += NET_IP_ALIGN; + /* RX buffers initialization */ macb_init_rx_buffer_size(bp, bufsz); @@ -3365,6 +3375,10 @@ static int macb_init(struct platform_device *pdev) if (GEM_BFEXT(PBUF_LSO, gem_readl(bp, DCFG6))) dev->hw_features |= MACB_NETIF_LSO; + /* Check RSC capability */ + if (GEM_BFEXT(PBUF_RSC, gem_readl(bp, DCFG6))) + dev->hw_features |= NETIF_F_LRO; + /* Checksum offload is only available on gem with packet buffer */ if (macb_is_gem(bp) && !(bp->caps & MACB_CAPS_FIFO_MODE)) dev->hw_features |= NETIF_F_HW_CSUM | NETIF_F_RXCSUM; -- 2.4.5
[PATCH 2/3] net: macb: Add support for header data spliting
This patch adds support for frames split across multiple rx buffers. Header data splitting can be used, and so can buffers shorter than the maximum frame length. The only limitation is that the frame header can't be split across buffers. Signed-off-by: Rafal Ozieblo --- drivers/net/ethernet/cadence/macb.h | 13 +++ drivers/net/ethernet/cadence/macb_main.c | 137 +++ 2 files changed, 118 insertions(+), 32 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index 33c9a48..a2cb805 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -295,6 +295,8 @@ /* Bitfields in DMACFG. */ #define GEM_FBLDO_OFFSET 0 /* fixed burst length for DMA */ #define GEM_FBLDO_SIZE 5 +#define GEM_HDRS_OFFSET 5 /* Header Data Splitting */ +#define GEM_HDRS_SIZE 1 #define GEM_ENDIA_DESC_OFFSET 6 /* endian swap mode for management descriptor access */ #define GEM_ENDIA_DESC_SIZE 1 #define GEM_ENDIA_PKT_OFFSET 7 /* endian swap mode for packet data access */ @@ -755,8 +757,12 @@ struct gem_tx_ts { #define MACB_RX_SOF_SIZE 1 #define MACB_RX_EOF_OFFSET 15 #define MACB_RX_EOF_SIZE 1 +#define MACB_RX_HDR_OFFSET 16 +#define MACB_RX_HDR_SIZE 1 #define MACB_RX_CFI_OFFSET 16 #define MACB_RX_CFI_SIZE 1 +#define MACB_RX_EOH_OFFSET 17 +#define MACB_RX_EOH_SIZE 1 #define MACB_RX_VLAN_PRI_OFFSET 17 #define MACB_RX_VLAN_PRI_SIZE 3 #define MACB_RX_PRI_TAG_OFFSET 20 @@ -1086,6 +1092,11 @@ struct tsu_incr { u32 ns; }; +struct rx_frag_list { + struct sk_buff *skb_head; + struct sk_buff *skb_tail; +}; + struct macb_queue { struct macb *bp; int irq; @@ -1121,6 +1132,8 @@ struct macb_queue { unsigned int tx_ts_head, tx_ts_tail; struct gem_tx_ts tx_timestamps[PTP_TS_BUFFER_SIZE]; #endif + struct rx_frag_list rx_frag; + u32 rx_frag_len; }; struct ethtool_rx_fs_item { diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 43201a8..27c406c 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ 
b/drivers/net/ethernet/cadence/macb_main.c @@ -967,6 +967,13 @@ static void discard_partial_frame(struct macb_queue *queue, unsigned int begin, */ } +void gem_reset_rx_state(struct macb_queue *queue) +{ + queue->rx_frag.skb_head = NULL; + queue->rx_frag.skb_tail = NULL; + queue->rx_frag_len = 0; +} + static int gem_rx(struct macb_queue *queue, int budget) { struct macb *bp = queue->bp; @@ -977,6 +984,9 @@ static int gem_rx(struct macb_queue *queue, int budget) int count = 0; while (count < budget) { + struct sk_buff *skb_head, *skb_tail; + bool eoh = false, header = false; + bool sof, eof; u32 ctrl; dma_addr_t addr; bool rxused; @@ -995,57 +1005,118 @@ static int gem_rx(struct macb_queue *queue, int budget) break; queue->rx_tail++; - count++; - - if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF))) { + skb = queue->rx_skbuff[entry]; + if (unlikely(!skb)) { netdev_err(bp->dev, - "not whole frame pointed by descriptor\n"); + "inconsistent Rx descriptor chain\n"); bp->dev->stats.rx_dropped++; queue->stats.rx_dropped++; break; } - skb = queue->rx_skbuff[entry]; - if (unlikely(!skb)) { + skb_head = queue->rx_frag.skb_head; + skb_tail = queue->rx_frag.skb_tail; + sof = !!(ctrl & MACB_BIT(RX_SOF)); + eof = !!(ctrl & MACB_BIT(RX_EOF)); + if (GEM_BFEXT(HDRS, gem_readl(bp, DMACFG))) { + eoh = !!(ctrl & MACB_BIT(RX_EOH)); + if (!eof) + header = !!(ctrl & MACB_BIT(RX_HDR)); + } + + queue->rx_skbuff[entry] = NULL; + /* Discard if out-of-sequence or header split across buffers */ + if ((!skb_head /* first frame buffer */ + && (!sof /* without start of frame */ + || (header && !eoh))) /* or without whole header */ + || (skb_head && sof)) { /* or new start before EOF */ + struct sk_buff *tmp_skb; + netdev_err(bp->dev, -
[PATCH 3/3] net: macb: Receive Side Coalescing (RSC) feature added.
This is basically the same as Large Receive Offload (LRO) in Linux framework. Signed-off-by: Rafal Ozieblo --- drivers/net/ethernet/cadence/macb.h | 6 +++ drivers/net/ethernet/cadence/macb_main.c | 70 +++- 2 files changed, 75 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index a2cb805..9ebdde7 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -83,6 +83,7 @@ #define GEM_USRIO 0x000c /* User IO */ #define GEM_DMACFG 0x0010 /* DMA Configuration */ #define GEM_JML0x0048 /* Jumbo Max Length */ +#define GEM_RSC0x0058 /* RSC Control */ #define GEM_HRB0x0080 /* Hash Bottom */ #define GEM_HRT0x0084 /* Hash Top */ #define GEM_SA1B 0x0088 /* Specific1 Bottom */ @@ -318,6 +319,11 @@ #define GEM_ADDR64_OFFSET 30 /* Address bus width - 64b or 32b */ #define GEM_ADDR64_SIZE1 +/* Bitfields in RSC control */ +#define GEM_RSCCTRL_OFFSET 1 /* RSC control */ +#define GEM_RSCCTRL_SIZE 15 +#define GEM_CLRMSK_OFFSET 16 /* RSC clear mask */ +#define GEM_CLRMSK_SIZE1 /* Bitfields in NSR */ #define MACB_NSR_LINK_OFFSET 0 /* pcs_link_state */ diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 27c406c..92bdcf1 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -2377,6 +2377,8 @@ static int macb_open(struct net_device *dev) if (!(bp->dev->hw_features & NETIF_F_LRO)) bufsz += NET_IP_ALIGN; + else + bufsz = 0xFF * 64; // For RSC Buffer Sizes must be set to 16K. 
/* RX buffers initialization */ macb_init_rx_buffer_size(bp, bufsz); @@ -2801,6 +2803,62 @@ static int macb_get_ts_info(struct net_device *netdev, return ethtool_op_get_ts_info(netdev, info); } +static void gem_enable_hdr_data_split(struct macb *bp, bool enable) +{ + u32 dmacfg; + + dmacfg = gem_readl(bp, DMACFG); + if (enable) + dmacfg |= GEM_BIT(HDRS); + else + dmacfg &= ~GEM_BIT(HDRS); + gem_writel(bp, DMACFG, dmacfg); +} + +static void gem_update_rsc_state(struct macb *bp, netdev_features_t feature) +{ + u32 rsc_control, rsc_control_new, queue, rsc; + bool enable, jumbo, any_enabled = false; + struct ethtool_rx_fs_item *item; + unsigned long flags; + u32 ncfgr; + + enable = (!!(feature & NETIF_F_NTUPLE) && !!(feature & NETIF_F_LRO)); + rsc = gem_readl(bp, RSC); + rsc_control = GEM_BFEXT(RSCCTRL, rsc); + rsc_control_new = 0; + if (enable) { + list_for_each_entry(item, &bp->rx_fs_list.list, list) { + queue = item->fs.ring_cookie; + rsc_control_new |= (1 << (queue - 1)); + any_enabled = true; + netdev_dbg(bp->dev, "RSC %sabled for queue %u\n", + enable ? 
"en" : "dis", queue); + } + } + if (rsc_control_new != rsc_control) { + rsc = GEM_BFINS(RSCCTRL, rsc_control_new, rsc); + gem_writel(bp, RSC, rsc); + } + if (bp->caps & MACB_CAPS_JUMBO) { + /* Don't enable jumbo mode for RSC: +* disable unless not RSC and large MTU +*/ + ncfgr = gem_readl(bp, NCFGR); + enable = !any_enabled; + jumbo = !!MACB_BFEXT(JFRAME, ncfgr); + /* and don't touch if already in the state we want */ + if ((jumbo && !enable) || (!jumbo && enable)) { + ncfgr = MACB_BFINS(JFRAME, enable, ncfgr); + spin_lock_irqsave(&bp->lock, flags); + gem_writel(bp, NCFGR, ncfgr); + spin_unlock_irqrestore(&bp->lock, flags); + } + } + /* Need to enable header-data splitting also */ + gem_enable_hdr_data_split(bp, any_enabled); +} + static void gem_enable_flow_filters(struct macb *bp, bool enable) { struct ethtool_rx_fs_item *item; @@ -2969,6 +3027,8 @@ static int gem_add_flow_filter(struct net_device *netdev, if (netdev->features & NETIF_F_NTUPLE) gem_enable_flow_filters(bp, 1); + /* enable RSC if LRO & NTUPLE on */ + gem_update_rsc_state(bp, netdev->features); spin_unlock_irqrestore(&bp->rx_fs_lock, flags); return 0; @@ -3009,6 +3069,7 @@ static int gem_del_flow_filter(struct net_device *netdev, return 0; } } + gem_update_rsc_state(bp, netdev->features); spin_unlock_irqrestore(&bp->rx_fs_lock, flags); return -EINVAL; @@
Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()
On Sat, Apr 14, 2018 at 09:36:23AM -0700, Linus Torvalds wrote: > But it does *not* make sense for the case where we've hit a dentry > that is already on the shrink list. Sure, we'll continue to gather all > the other dentries, but if there is concurrent shrinking, shouldn't we > give up the CPU more eagerly - *particularly* if somebody else is > waiting (it might be the other process that actually gets rid of the > shrinking dentries!)? > > So my gut feel is that we should at least try doing something like > this in select_collect(): > > - if (!list_empty(&data->dispose)) > + if (data->found) > ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY; > > because even if we haven't actually been able to shrink something, if > we hit an already shrinking entry we should probably at least not do > the "retry for rename". And if we actually are going to reschedule, we > might as well start from the beginning. > > I realize that *this* thread might not be making any actual progress > (because it didn't find any dentries to shrink), but since it did find > _a_ dentry that is being shrunk, we know the operation itself - on a > bigger scale - is making progress. > > Hmm? That breaks d_invalidate(), unfortunately. Look at the termination conditions in the loop there...
Re: [PATCH] checkpatch: Add a --strict test for structs with bool member definitions
On Wed, 11 Apr 2018, Joe Perches wrote: > On Thu, 2018-04-12 at 08:22 +0200, Julia Lawall wrote: > > On Wed, 11 Apr 2018, Joe Perches wrote: > > > On Wed, 2018-04-11 at 09:29 -0700, Andrew Morton wrote: > > > > We already have some 500 bools-in-structs > > > > > > I got at least triple that only in include/ > > > so I expect there are at probably an order > > > of magnitude more than 500 in the kernel. > > > > > > I suppose some cocci script could count the > > > actual number of instances. A regex can not. > > > > I got 12667. > > Could you please post the cocci script? > > > I'm not sure to understand the issue. Will using a bitfield help if there > > are no other bitfields in the structure? > > IMO, not really. > > The primary issue is described by Linus here: > https://lkml.org/lkml/2017/11/21/384 > > I personally do not find a significant issue with > uncontrolled sizes of bool in kernel structs as > all of the kernel structs are transitory and not > written out to storage. > > I suppose bool bitfields are also OK, but for the > RMW required. > > Using unsigned int :1 bitfield instead of bool :1 > has the negative of truncation so that the uint > has to be set with !! instead of a simple assign. At least with gcc 5.4.0, a number of structures become larger with unsigned int :1. bool:1 seems to mostly solve this problem. The structure ichx_desc, defined in drivers/gpio/gpio-ich.c seems to become larger with both approaches. julia
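The two points raised in this thread (truncation with `unsigned int :1`, and structure size depending on the member type) can be tried out in userspace. The following is a sketch only; the struct and function names are made up for illustration, and the sizes it prints depend on the compiler and ABI, so none are asserted here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Three ways to carry one flag next to a one-byte member. */
struct with_bool     { char tag; bool flag; };
struct with_uint_bit { char tag; unsigned int flag : 1; };
struct with_bool_bit { char tag; bool flag : 1; };

/* Store v in a 1-bit unsigned int field: only the low bit survives. */
static unsigned int store_uint_bit(unsigned int v)
{
	struct with_uint_bit s = { 0 };

	s.flag = v;
	return s.flag;
}

/* Store v in a 1-bit bool field: v is converted to bool (0 or 1) first. */
static unsigned int store_bool_bit(unsigned int v)
{
	struct with_bool_bit s = { 0 };

	s.flag = v;
	return s.flag;
}

/* Print the sizes so they can be compared on a given compiler and ABI. */
static void demo_print_sizes(void)
{
	printf("bool: %zu, unsigned int :1: %zu, bool :1: %zu\n",
	       sizeof(struct with_bool),
	       sizeof(struct with_uint_bit),
	       sizeof(struct with_bool_bit));
}
```

Storing 2 through `store_uint_bit()` yields 0 unless the caller writes `!!v`, whereas `store_bool_bit()` yields 1. That is the "has to be set with !!" point quoted above.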
Re: 4.15.14 crash with iscsi target and dvd
Ming Lei wrote: > On Thu, Apr 12, 2018 at 09:43:02PM -0400, Wakko Warner wrote: > > Ming Lei wrote: > > > On Tue, Apr 10, 2018 at 08:45:25PM -0400, Wakko Warner wrote: > > > > Sorry for the delay. I reverted my change, added this one. I didn't > > > > reboot, I just unloaded and loaded this one. > > > > Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) > > > > on the > > > > target. > > > > > > > > Doesn't crash, however on the initiator I see this: > > > > [9273849.70] ISO 9660 Extensions: RRIP_1991A > > > > [9273863.359718] scsi_io_completion: 13 callbacks suppressed > > > > [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: > > > > hostbyte=0x00 driverbyte=0x08 > > > > [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] > > > > [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 > > > > [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 > > > > f6 96 00 00 80 00 > > > > [9273863.360116] blk_update_request: 13 callbacks suppressed > > > > [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400 > > > > [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: > > > > hostbyte=0x00 driverbyte=0x08 > > > > [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] > > > > [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 > > > > [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 > > > > f7 16 00 00 80 00 > > > > [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912 > > > > > > > > To cause this, I mounted the dvd as seen in the first line and ran this > > > > command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null > > > > I did some various tests. Each test was done after umount and mount to > > > > clear the cache. > > > > cat > /dev/null causes the message. 
> > > > dd if= of=/dev/null bs=2048 doesn't > > > > using bs=4096 doesn't > > > > using bs=64k doesn't > > > > using bs=128k does > > > > cat uses a blocksize of 128k. > > > > > > > > The following was done without being mounted. > > > > ddrescue -f -f /dev/sr1 /dev/null > > > > doesn't cause the message > > > > dd if=/dev/sr1 of=/dev/null bs=128k > > > > doesn't cause the message > > > > using bs=256k causes the message once: > > > > [9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: > > > > hostbyte=0x00 driverbyte=0x08 > > > > [9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current] > > > > [9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0 > > > > [9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 > > > > 00 00 00 00 80 00 > > > > [9275916.857614] blk_update_request: I/O error, dev sr1, sector 0 > > > > > > > > If I access the disc from the target natively either by mounting and > > > > accessing files or working with the device directly (ie dd) no errors > > > > are > > > > logged on the target. > > > > > > OK, thanks for your test. > > > > > > Could you test the following patch and see if there is still the failure > > > message? 
> > > > > > diff --git a/drivers/target/target_core_pscsi.c > > > b/drivers/target/target_core_pscsi.c > > > index 0d99b242e82e..6137287b52fb 100644 > > > --- a/drivers/target/target_core_pscsi.c > > > +++ b/drivers/target/target_core_pscsi.c > > > @@ -913,9 +913,11 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist > > > *sgl, u32 sgl_nents, > > > > > > rc = bio_add_pc_page(pdv->pdv_sd->request_queue, > > > bio, page, bytes, off); > > > + if (rc != bytes) > > > + goto fail; > > > pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n", > > > bio_segments(bio), nr_vecs); > > > - if (rc != bytes) { > > > + if (/*rc != bytes*/0) { > > > pr_debug("PSCSI: Reached bio->bi_vcnt max:" > > > " %d i: %d bio: %p, allocating another" > > > " bio\n", bio->bi_vcnt, i, bio); > > > > Target doesn't crash but the errors on the initiator are still there. > > OK, then this error log isn't related with my commit, because the patch > I sent to you in last email is to revert my commit simply. > > But the following patch is one correct fix for your crash. > > https://marc.info/?l=linux-kernel&m=152331690727052&w=2 Ok, that'll be the one I used. Do you know when it'll go upstream? -- Microsoft has beaten Volkswagen's world record. Volkswagen only created 22 million bugs.
Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()
On Sat, Apr 14, 2018 at 1:58 PM, Al Viro wrote: > > That breaks d_invalidate(), unfortunately. Look at the termination > conditions in the loop there... Ugh. I was going to say "but that doesn't even use select_collect()", but yeah, detach_and_collect() calls it. It would be easy enough to just change the if (!list_empty(&data.select.dispose)) there to if (!list_empty(&data.select.found)) too. In fact, it probably *should* do that, exactly to get the whole "cond_resched()" call in that whole call chain too. Because as-is, it looks like it has the same issue as shrink_dcache_parent() does.. But yeah, the fact that I didn't notice that makes me a bit nervous. But now I triple-checked, there are no other indirect callers. Linus
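The difference between the existing condition and the one proposed for select_collect() can be written down as a small userspace model. This is a sketch under assumptions: the `DEMO_` enum, `struct demo_select_data`, and both helper functions are hypothetical, the dispose list is reduced to a counter, and the rest of the real select_collect() logic is left out. It also does not model the d_invalidate() termination problem pointed out above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the d_walk return codes named in the thread. */
enum demo_walk_ret { DEMO_WALK_CONTINUE, DEMO_WALK_QUIT, DEMO_WALK_NORETRY };

struct demo_select_data {
	int found;         /* dentries on a shrink list or moved to dispose */
	int dispose_count; /* dentries this walker queued for disposal itself */
};

/* Existing check: only a non-empty dispose list changes the walk. */
static enum demo_walk_ret decide_current(const struct demo_select_data *d,
					 bool need_resched)
{
	if (d->dispose_count > 0)
		return need_resched ? DEMO_WALK_QUIT : DEMO_WALK_NORETRY;
	return DEMO_WALK_CONTINUE;
}

/* Proposed check: any dentry seen on a shrink list also counts. */
static enum demo_walk_ret decide_proposed(const struct demo_select_data *d,
					  bool need_resched)
{
	if (d->found)
		return need_resched ? DEMO_WALK_QUIT : DEMO_WALK_NORETRY;
	return DEMO_WALK_CONTINUE;
}
```

The interesting case is `found = 1` with an empty dispose list, i.e. another thread is already shrinking the dentry. The existing check keeps walking there, while the proposed one gives up the CPU when a reschedule is pending.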
[PATCH] KVM: Switch 'requests' to be 64-bit (explicitly)
Switch 'requests' to be explicitly 64-bit and update BUILD_BUG_ON check to use the size of "requests" instead of the hard-coded '32'. That gives us a bit more room again for arch-specific requests as we already ran out of space for x86 due to the hard-coded check. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: k...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed --- include/linux/kvm_host.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6930c63..fe4f46b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -129,7 +129,7 @@ static inline bool is_error_page(struct page *page) #define KVM_REQUEST_ARCH_BASE 8 #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \ - BUILD_BUG_ON((unsigned)(nr) >= 32 - KVM_REQUEST_ARCH_BASE); \ + BUILD_BUG_ON((unsigned)(nr) >= (sizeof(((struct kvm_vcpu *)0)->requests) * 8) - KVM_REQUEST_ARCH_BASE); \ (unsigned)(((nr) + KVM_REQUEST_ARCH_BASE) | (flags)); \ }) #define KVM_ARCH_REQ(nr) KVM_ARCH_REQ_FLAGS(nr, 0) @@ -223,7 +223,7 @@ struct kvm_vcpu { int vcpu_id; int srcu_idx; int mode; - unsigned long requests; + u64 requests; unsigned long guest_debug; int pre_pcpu; @@ -1122,7 +1122,7 @@ static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu) * caller. Paired with the smp_mb__after_atomic in kvm_check_request. 
*/ smp_wmb(); - set_bit(req & KVM_REQUEST_MASK, &vcpu->requests); + set_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests); } static inline bool kvm_request_pending(struct kvm_vcpu *vcpu) @@ -1132,12 +1132,12 @@ static inline bool kvm_request_pending(struct kvm_vcpu *vcpu) static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu) { - return test_bit(req & KVM_REQUEST_MASK, &vcpu->requests); + return test_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests); } static inline void kvm_clear_request(int req, struct kvm_vcpu *vcpu) { - clear_bit(req & KVM_REQUEST_MASK, &vcpu->requests); + clear_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests); } static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu) -- 2.7.4
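The BUILD_BUG_ON change above derives the bit budget from the field's size instead of the literal 32. A minimal userspace sketch of that idea follows. `struct demo_vcpu` and the `DEMO_` macros are hypothetical, and a negative-array-size trick stands in for the kernel's BUILD_BUG_ON.

```c
#include <limits.h>
#include <stdint.h>

#define DEMO_REQUEST_ARCH_BASE 8

/* Hypothetical stand-in for the vcpu structure; only the field of interest. */
struct demo_vcpu {
	uint64_t requests;
};

/* Number of bits in 'requests', taken from the field itself. */
#define DEMO_REQUEST_BITS \
	(sizeof(((struct demo_vcpu *)0)->requests) * CHAR_BIT)

/*
 * Compile-time check: the array size becomes -1, which fails to compile,
 * when nr does not fit above the architecture-neutral base.
 */
#define DEMO_ARCH_REQ(nr) ( \
	(void)sizeof(char[1 - 2 * !!((unsigned)(nr) >= \
		DEMO_REQUEST_BITS - DEMO_REQUEST_ARCH_BASE)]), \
	(unsigned)((nr) + DEMO_REQUEST_ARCH_BASE))

/* How many arch-specific request numbers fit with the current field type. */
static unsigned int demo_arch_req_budget(void)
{
	return (unsigned int)(DEMO_REQUEST_BITS - DEMO_REQUEST_ARCH_BASE);
}
```

Changing the type of `requests` in this sketch changes the budget without touching the macro, which is what the sizeof-based check in the patch is after.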
Re: Regression with 5dcd8400884c ("macsec: missing dev_put() on error in macsec_newlink()")
Hello Laura, 2018-04-14, 10:56:55 -0700, Laura Abbott wrote: > Hi, > > Fedora got a bug report of a regression when trying to remove the > the macsec module (https://bugzilla.redhat.com/show_bug.cgi?id=1566410). > I did a bisect and found > > commit 5dcd8400884cc4a043a6d4617e042489e5d566a9 > Author: Dan Carpenter > Date: Wed Mar 21 11:09:01 2018 +0300 > > macsec: missing dev_put() on error in macsec_newlink() > We moved the dev_hold(real_dev); call earlier in the function but forgot > to update the error paths. > Fixes: 0759e552bce7 ("macsec: fix negative refcnt on parent link") > Signed-off-by: Dan Carpenter > Signed-off-by: David S. Miller > > The script I used for testing based on the reporter is attached. It > looks like modprobe is stuck in the D state. Any idea? I don't think that reference was actually leaked. It gets released in macsec_free_netdev() when the device is deleted. modprobe getting stuck is just a side-effect of the refcount going negative on the parent device, since removing the module needs to take the lock that is held by device deletion. I'll send a revert tomorrow. Thanks for the report, -- Sabrina
Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE
On Sat, 2018-04-14 at 15:56 +, Raslan, KarimAllah wrote: > On Thu, 2018-04-12 at 17:12 +0200, KarimAllah Ahmed wrote: > > > > From: Jim Mattson > > > > For nested virtualization L0 KVM is managing a bit of state for L2 guests, > > this state can not be captured through the currently available IOCTLs. In > > fact the state captured through all of these IOCTLs is usually a mix of L1 > > and L2 state. It is also dependent on whether the L2 guest was running at > > the moment when the process was interrupted to save its state. > > > > With this capability, there are two new vcpu ioctls: KVM_GET_VMX_STATE and > > KVM_SET_VMX_STATE. These can be used for saving and restoring a VM that is > > in VMX operation. > > > > Cc: Paolo Bonzini > > Cc: Radim Krčmář > > Cc: Thomas Gleixner > > Cc: Ingo Molnar > > Cc: H. Peter Anvin > > Cc: x...@kernel.org > > Cc: k...@vger.kernel.org > > Cc: linux-kernel@vger.kernel.org > > Signed-off-by: Jim Mattson > > [karahmed@ - rename structs and functions and make them ready for AMD and > > address previous comments. > >- rebase & a bit of refactoring. > >- Merge 7/8 and 8/8 into one patch. > >- Force a VMExit from L2 after reading the kvm_state to avoid > > mixed state between L1 and L2 on resurrecting the instance. ] > > Signed-off-by: KarimAllah Ahmed > > --- > > v2 -> v3: > > - Remove the forced VMExit from L2 after reading the kvm_state. The actual > > problem is solved. > > - Rebase again! > > - Set nested_run_pending during restore (not sure if it makes sense yet or > > not). > > - Reduce KVM_REQUEST_ARCH_BASE to 7 instead of 8 (the other alternative is > > to switch everything to u64) > > > > v1 -> v2: > > - Rename structs and functions and make them ready for AMD and address > > previous comments. > > - Rebase & a bit of refactoring. > > - Merge 7/8 and 8/8 into one patch. > > - Force a VMExit from L2 after reading the kvm_state to avoid mixed state > > between L1 and L2 on resurrecting the instance. 
> > --- > > Documentation/virtual/kvm/api.txt | 47 ++ > > arch/x86/include/asm/kvm_host.h | 7 ++ > > arch/x86/include/uapi/asm/kvm.h | 38 > > arch/x86/kvm/vmx.c| 177 > > +- > > arch/x86/kvm/x86.c| 21 + > > include/linux/kvm_host.h | 2 +- > > include/uapi/linux/kvm.h | 5 ++ > > 7 files changed, 292 insertions(+), 5 deletions(-) > > > > diff --git a/Documentation/virtual/kvm/api.txt > > b/Documentation/virtual/kvm/api.txt > > index 1c7958b..c51d5d3 100644 > > --- a/Documentation/virtual/kvm/api.txt > > +++ b/Documentation/virtual/kvm/api.txt > > @@ -3548,6 +3548,53 @@ Returns: 0 on success, > > -ENOENT on deassign if the conn_id isn't registered > > -EEXIST on assign if the conn_id is already registered > > > > +4.114 KVM_GET_STATE > > + > > +Capability: KVM_CAP_STATE > > +Architectures: x86 > > +Type: vcpu ioctl > > +Parameters: struct kvm_state (in/out) > > +Returns: 0 on success, -1 on error > > +Errors: > > + E2BIG: the data size exceeds the value of 'size' specified by > > + the user (the size required will be written into size). > > + > > +struct kvm_state { > > + __u16 flags; > > + __u16 format; > > + __u32 size; > > + union { > > + struct kvm_vmx_state vmx; > > + struct kvm_svm_state svm; > > + __u8 pad[120]; > > + }; > > + __u8 data[0]; > > +}; > > + > > +This ioctl copies the vcpu's kvm_state struct from the kernel to userspace. > > + > > +4.115 KVM_SET_STATE > > + > > +Capability: KVM_CAP_STATE > > +Architectures: x86 > > +Type: vcpu ioctl > > +Parameters: struct kvm_state (in) > > +Returns: 0 on success, -1 on error > > + > > +struct kvm_state { > > + __u16 flags; > > + __u16 format; > > + __u32 size; > > + union { > > + struct kvm_vmx_state vmx; > > + struct kvm_svm_state svm; > > + __u8 pad[120]; > > + }; > > + __u8 data[0]; > > +}; > > + > > +This copies the vcpu's kvm_state struct from userspace to the kernel. > > +>>> 13a7c9e... kvm: nVMX: Introduce KVM_CAP_STATE > > > > 5. 
The kvm_run structure > > > > diff --git a/arch/x86/include/asm/kvm_host.h > > b/arch/x86/include/asm/kvm_host.h > > index 9fa4f57..ad2116a 100644 > > --- a/arch/x86/include/asm/kvm_host.h > > +++ b/arch/x86/include/asm/kvm_host.h > > @@ -75,6 +75,7 @@ > > #define KVM_REQ_HV_EXITKVM_ARCH_REQ(21) > > #define KVM_REQ_HV_STIMER KVM_ARCH_REQ(22) > > #define KVM_REQ_LOAD_EOI_EXITMAP KVM_ARCH_REQ(23) > > +#define KVM_REQ_GET_VMCS12_PAGES KVM_ARCH_REQ(24) > > > > #define CR0_RESERVED_BITS \ > > (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ > > @@ -1084,6 +1085,12 @@ struct kvm_x86_ops { > > > > void (*setup_mce)(struct kvm_vc
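The E2BIG behaviour documented for KVM_GET_STATE in the quoted patch (the required size is written back into 'size') suggests a probe-then-retry pattern on the userspace side. The sketch below simulates it without any ioctl. `demo_get_state()`, `struct demo_state`, and `DEMO_REQUIRED_SIZE` are hypothetical stand-ins, and treating 'size' as the total buffer size including the header is an assumption, since the quoted text does not spell that out.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header keeping only the 'size' field of the quoted struct. */
struct demo_state {
	uint32_t size;  /* in: bytes offered by the caller; out: bytes required */
	uint8_t data[]; /* variable-length payload */
};

#define DEMO_REQUIRED_SIZE 4096u

/* Stand-in for the ioctl: returns 0 on success, -1 with errno set on error. */
static int demo_get_state(struct demo_state *st)
{
	if (st->size < DEMO_REQUIRED_SIZE) {
		st->size = DEMO_REQUIRED_SIZE; /* report what is needed */
		errno = E2BIG;
		return -1;
	}
	memset(st->data, 0xAB, DEMO_REQUIRED_SIZE - sizeof(*st));
	st->size = DEMO_REQUIRED_SIZE;
	return 0;
}

/* Probe with a header-sized buffer, then retry with the reported size. */
static struct demo_state *demo_fetch_state(void)
{
	struct demo_state *st = calloc(1, sizeof(*st));
	struct demo_state *bigger;
	uint32_t need;

	if (!st)
		return NULL;
	st->size = sizeof(*st);
	if (demo_get_state(st) == 0)
		return st;
	if (errno != E2BIG) {
		free(st);
		return NULL;
	}
	need = st->size;
	bigger = realloc(st, need);
	if (!bigger) {
		free(st);
		return NULL;
	}
	bigger->size = need;
	if (demo_get_state(bigger) != 0) {
		free(bigger);
		return NULL;
	}
	return bigger; /* caller frees */
}
```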
Re: repeatable boot randomness inside KVM guest
On Sat, Apr 14, 2018 at 12:59 PM, Alexey Dobriyan wrote: > SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes > allocation pattern inside a slab: > > > #ifdef CONFIG_SLAB_FREELIST_RANDOM > /* Pre-initialize the random sequence cache */ > static int init_cache_random_seq(struct kmem_cache *s) > { > ... > > Then I printed actual random sequences for each kmem cache. > Turned out they were all the same for most of the caches and > they didn't vary across guest reboots. > > int cache_random_seq_create(struct kmem_cache *cachep, unsigned int > count, gfp_t gfp) > { > ... > /* Get best entropy at this stage of boot */ > prandom_seed_state(&state, get_random_long()); > > Then I searched internet and turned out KVM can pass randomness via > virtio-rng or something. So I linked /dev/urandom. > > And it didn't help! > > The only way to get randomness for SLAB is to enable RDRAND inside guest. > > Is it KVM bug? > > For the record I'm using qemu 2.11.1-r2 and whatever F27 ships now. virtio-rng doesn't really do that. I have an ancient patch set to do exactly what you want, and I should dust it off.
Re: [RFC PATCH for 4.18 12/23] cpu_opv: Provide cpu_opv system call (v7)
On Thu, Apr 12, 2018 at 12:43 PM, Linus Torvalds wrote: > On Thu, Apr 12, 2018 at 12:27 PM, Mathieu Desnoyers > wrote: >> The cpu_opv system call executes a vector of operations on behalf of >> user-space on a specific CPU with preemption disabled. It is inspired >> by readv() and writev() system calls which take a "struct iovec" >> array as argument. > > Do we really want the page pinning? > > This whole cpu_opv thing is the most questionable part of the series, > and the page pinning is the most questionable part of cpu_opv for me. > > Can we plan on merging just the plain rseq parts *without* this all > first, and then see the cpu_opv thing as a "maybe future expansion" > part. > > I think that would make Andy happier too. > It only makes me happier if the userspace code involved is actually going to work when single-stepped, which might actually be the case (fingers crossed). That being said, I'm not really convinced that cpu_opv() makes much difference here, since I'm not entirely convinced that user code will actually use it or that user code will actually be that well tested. C'est la vie.
Re: repeatable boot randomness inside KVM guest
+linux...@kvack.org
k...@vger.kernel.org, secur...@kernel.org moved to bcc

On Sat, Apr 14, 2018 at 10:59:21PM +0300, Alexey Dobriyan wrote:
> SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
> allocation pattern inside a slab:
>
> int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count, gfp_t gfp)
> {
> ...
> /* Get best entropy at this stage of boot */
> prandom_seed_state(&state, get_random_long());
>
> Then I printed actual random sequences for each kmem cache.
> Turned out they were all the same for most of the caches and
> they didn't vary across guest reboots.

The problem is that at the super-early stage of the boot path, kernel
code can't allocate memory.  This is something most device drivers
kinda assume they can do.  :-)

So it means we haven't yet initialized the virtio-rng driver, and it's
before interrupts have been enabled, so we can't harvest any entropy
from interrupt timing.  So that's why trying to use virtio-rng didn't
help.

> The only way to get randomness for SLAB is to enable RDRAND inside guest.
>
> Is it KVM bug?

No, it's not a KVM bug.  The fundamental issue is in how
CONFIG_SLAB_FREELIST_RANDOM is currently implemented.

What needs to happen is that the freelist should get randomized much
later in the boot sequence.  Doing it later will require locking; I
don't know enough about the slab/slub code to know whether the
slab_mutex would be sufficient, or some other lock might need to be
added.

The other thing I would note is that using prandom_u32_state() doesn't
really provide much security.  In fact, if the goal is to protect
against a malicious attacker trying to guess what addresses will be
returned by the slab allocator, I suspect it's much like the security
patdowns done at airports.  It might protect against a really stupid
attacker, but it's mostly security theater.

The freelist randomization is only being done once, so it's not like
performance is really an issue.  It would be much better to just use
get_random_u32() and be done with it.  I'd drop using prandom_*
functions in slab.c, slub.c, and slab_common.c, and just use a really
random number generator, if the goal is real security as opposed to
security for show.

(Not that there's necessarily anything wrong with security theater;
the US spends over 3 billion dollars a year on security theater.  As
politicians know, symbolism can be important.  :-)

Cheers,

- Ted
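[Editorial sketch, not part of the original thread.]  The symptom
Alexey reports (identical per-cache sequences on every boot) is what
one would expect whenever a deterministic shuffle is fed the same
seed.  The user-space C below is only an illustration under stated
assumptions: the xorshift64 generator and every toy_* name are
hypothetical stand-ins, not the kernel's prandom_* or slab code.  It
shows that a Fisher-Yates shuffle driven by a seeded PRNG yields the
same order whenever the seed repeats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_OBJS 64

/* Hypothetical stand-in for a seeded PRNG state (xorshift64). */
struct toy_rnd_state {
	uint64_t s;
};

void toy_seed(struct toy_rnd_state *st, uint64_t seed)
{
	/* xorshift must not start from zero; use a fixed fallback. */
	st->s = seed ? seed : 0x9E3779B97F4A7C15ULL;
}

uint32_t toy_next(struct toy_rnd_state *st)
{
	uint64_t x = st->s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	st->s = x;
	return (uint32_t)(x >> 32);
}

/* Fill list[] with 0..count-1, then shuffle it (Fisher-Yates). */
void toy_freelist_shuffle(unsigned int *list, unsigned int count,
			  uint64_t seed)
{
	struct toy_rnd_state st;
	unsigned int i;

	toy_seed(&st, seed);
	for (i = 0; i < count; i++)
		list[i] = i;

	for (i = count; i > 1; i--) {
		/* j in [0, i); modulo bias is ignored in this sketch. */
		unsigned int j = toy_next(&st) % i;
		unsigned int tmp = list[i - 1];

		list[i - 1] = list[j];
		list[j] = tmp;
	}
}

/* Return 1 if list[] holds each of 0..count-1 exactly once. */
int toy_is_permutation(const unsigned int *list, unsigned int count)
{
	unsigned char seen[TOY_MAX_OBJS] = { 0 };
	unsigned int i;

	assert(count <= TOY_MAX_OBJS);
	for (i = 0; i < count; i++) {
		if (list[i] >= count || seen[list[i]])
			return 0;
		seen[list[i]] = 1;
	}
	return 1;
}

/* Return 1 if shuffles seeded with seed_a and seed_b give the same order. */
int toy_same_order(uint64_t seed_a, uint64_t seed_b, unsigned int count)
{
	unsigned int a[TOY_MAX_OBJS], b[TOY_MAX_OBJS];

	assert(count <= TOY_MAX_OBJS);
	toy_freelist_shuffle(a, count, seed_a);
	toy_freelist_shuffle(b, count, seed_b);
	return memcmp(a, b, count * sizeof(a[0])) == 0;
}
```

Calling toy_same_order() with the same seed twice returns 1, which is
the user-space analogue of a guest whose early seed does not change
between reboots.  Whether a later, better-seeded shuffle is the right
fix is exactly what the thread is debating.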
[GIT PULL] Please pull powerpc/linux.git powerpc-4.17-2 tag
Hi Linus,

Please pull some powerpc fixes for 4.17-rc1 if you can:

The following changes since commit 49a695ba723224875df50e327bd7b0b65dd9a56b:

  Merge tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (2018-04-07 12:08:19 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.17-2

for you to fetch changes up to 81b654c273914704a4bdf580f28d67aaba1094e4:

  powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features (2018-04-13 23:51:44 +1000)

powerpc fixes for 4.17 #2

 - Fix crashes when loading modules built with a different
   CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic.

 - Fix busy loops in the OPAL NVRAM driver if we get certain error
   conditions from firmware.

 - Remove tlbie trace points from KVM code that's called in real mode,
   because it causes crashes.

 - Fix checkstops caused by invalid tlbiel on Power9 Radix.

 - Ensure the set of CPU features we "know" are always enabled is
   actually the minimal set when we build with support for firmware
   supplied CPU features.

Thanks to:
  Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.

Aneesh Kumar K.V (1):
      powerpc/8xx: Fix build with hugetlbfs enabled

Anshuman Khandual (1):
      powerpc/fscr: Enable interrupts earlier before calling get_user()

Michael Ellerman (4):
      powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
      powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
      powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
      powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features

Nicholas Piggin (3):
      powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
      powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
      KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode

 arch/powerpc/include/asm/cputable.h         | 23
 arch/powerpc/include/asm/module.h           | 12
 arch/powerpc/include/asm/opal.h             |  3
 arch/powerpc/kernel/dt_cpu_ftrs.c           | 14
 arch/powerpc/kernel/setup_64.c              |  2
 arch/powerpc/kernel/traps.c                 | 32
 arch/powerpc/kvm/book3s_hv_rm_mmu.c         |  4
 arch/powerpc/mm/slice.c                     |  1
 arch/powerpc/mm/tlb-radix.c                 |  5
 arch/powerpc/platforms/powernv/opal-nvram.c |  7
 10 files changed, 63 insertions(+), 40 deletions(-)
Re: repeatable boot randomness inside KVM guest
On Sat, Apr 14, 2018 at 03:41:42PM -0700, Andy Lutomirski wrote:
> On Sat, Apr 14, 2018 at 12:59 PM, Alexey Dobriyan wrote:
> > SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
> > allocation pattern inside a slab:
> >
> >
> > #ifdef CONFIG_SLAB_FREELIST_RANDOM
> > /* Pre-initialize the random sequence cache */
> > static int init_cache_random_seq(struct kmem_cache *s)
> > {
> > ...
> >
> > Then I printed actual random sequences for each kmem cache.
> > Turned out they were all the same for most of the caches and
> > they didn't vary across guest reboots.
> >
> > int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count, gfp_t gfp)
> > {
> > ...
> > /* Get best entropy at this stage of boot */
> > prandom_seed_state(&state, get_random_long());
> >
> > Then I searched internet and turned out KVM can pass randomness via
> > virtio-rng or something. So I linked /dev/urandom.
> >
> > And it didn't help!
> >
> > The only way to get randomness for SLAB is to enable RDRAND inside guest.
> >
> > Is it KVM bug?
> >
> > For the record I'm using qemu 2.11.1-r2 and whatever F27 ships now.
>
> virtio-rng doesn't really do that.  I have an ancient patch set to do
> exactly what you want, and I should dust it off.

Please, do.

Here is a list of caches which aren't exactly randomly randomized
with my setup. Many important ones are there :-(

XXX name 'dma-kmalloc-96', r b1e6718e2e7147d4
XXX name 'dma-kmalloc-192', r a7664a0d69968019
XXX name 'dma-kmalloc-8', r 662c2e986443235c
XXX name 'dma-kmalloc-16', r 770a9b620ae4cd62
XXX name 'dma-kmalloc-32', r 2e200073d5fa9f46
XXX name 'dma-kmalloc-64', r d8538fda83c74168
XXX name 'dma-kmalloc-128', r 9e4b956d09dd7d44
XXX name 'dma-kmalloc-256', r 8b14bcb58f9e18f5
XXX name 'dma-kmalloc-512', r 2bbace4b7120624a
XXX name 'dma-kmalloc-1024', r 7cdf44406db52f5b
XXX name 'dma-kmalloc-2048', r 18fe0ebf6bcfdf43
XXX name 'dma-kmalloc-4096', r 9f1a5eee118facf7
XXX name 'dma-kmalloc-8192', r f514d72a1cc441a2
XXX name 'kmalloc-8192', r 14843df817b556cc
XXX name 'kmalloc-4096', r 52ed85fa9c691bbe
XXX name 'kmalloc-2048', r fa81aa9222ff65a7
XXX name 'kmalloc-1024', r ae355c02d31f21d3
XXX name 'kmalloc-512', r 5fe0d22aaf2ef8d9
XXX name 'kmalloc-256', r 336d07a06917b95
XXX name 'kmalloc-192', r 6b6cd5399dd06d95
XXX name 'kmalloc-128', r 893b9e85369964ab
XXX name 'kmalloc-96', r 179e185395d2612
XXX name 'kmalloc-64', r 29cf688b37eccea7
XXX name 'kmalloc-32', r fb7b4e7dca6de00a
XXX name 'kmalloc-16', r a2a441fdc499d0c7
XXX name 'kmalloc-8', r e5454c7095ddd2be
XXX name 'kmem_cache_node', r 500dc6126a47b229
XXX name 'kmem_cache', r 816c8c7bcde08372
XXX name 'task_group', r c09c4d1c1436ce97
XXX name 'radix_tree_node', r 4dd9540b830a4ea8
XXX name 'pool_workqueue', r 88b1e9d9a1f0b570
XXX name 'Acpi-Namespace', r 3e34d55f8f1cb140
XXX name 'Acpi-State', r b94e04635e77b48a
XXX name 'Acpi-Parse', r d5374863b90f2a4c
XXX name 'Acpi-ParseExt', r eefb2fff892f64a9
XXX name 'Acpi-Operand', r ce51949bcc80af13
XXX name 'pid', r cd6d8ee9e5209156
XXX name 'anon_vma', r c3a9273a68127ac7
XXX name 'anon_vma_chain', r a7cec15033c31a9b
XXX name 'cred_jar', r fe4cc38c6d99cf63
XXX name 'task_struct', r eecb8895c6b7dbdb
XXX name 'sighand_cache', r e5243c5eb2ce3a63
XXX name 'signal_cache', r 88b2e108d8ef81c7
XXX name 'files_cache', r ee29814e58dc909c
XXX name 'fs_cache', r bc700a5f8fc28ff8
XXX name 'mm_struct', r f5230f99c7447359
XXX name 'vm_area_struct', r e30f3f8e648a9f88
XXX name 'nsproxy', r ae7c08b524a0f4d4
XXX name 'uts_namespace', r 6b1266178968ed99
XXX name 'buffer_head', r b24c10679dc55a11
XXX name 'names_cache', r 2e023b54e3ca5b8f
XXX name 'dentry', r 83cc18634fbd74e8
XXX name 'inode_cache', r ff9a0ff3b4665cf5
XXX name 'filp', r 4fdad214b7ca7fc1
XXX name 'mnt_cache', r 8e726d32470b23e0
XXX name 'kernfs_node_cache', r 929c5f56778d365d
XXX name 'bdev_cache', r 8a5520036bd0a464
XXX name 'sigqueue', r 2cf75c4d16191efb
XXX name 'seq_file', r ec3ba1fe514524d5
XXX name 'proc_inode_cache', r b0c76cbbda5bb41f
XXX name 'pde_opener', r 5f82f8e7100a517c
XXX name 'proc_dir_entry', r ebabc4e93b52d7b8
XXX name 'shmem_inode_cache', r 2b25a3eb9aa32973
XXX name 'net_namespace', r 95793a7eae08a33f
Re: [PATCH 17/30] Documentation: kconfig: document a new Kconfig macro language
On 04/12/18 22:06, Masahiro Yamada wrote:
> Add a document for the macro language introduced to Kconfig.
>
> Signed-off-by: Masahiro Yamada
> ---
>
> Changes in v3: None
> Changes in v2: None
>
>  Documentation/kbuild/kconfig-macro-language.txt | 179
>  MAINTAINERS                                     |   2 +-
>  2 files changed, 180 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/kbuild/kconfig-macro-language.txt
>
> diff --git a/Documentation/kbuild/kconfig-macro-language.txt b/Documentation/kbuild/kconfig-macro-language.txt
> new file mode 100644
> index 000..1f6281b
> --- /dev/null
> +++ b/Documentation/kbuild/kconfig-macro-language.txt
> @@ -0,0 +1,179 @@
> +Concept
> +-------
> +
> +The basic idea was inspired by Make. When we look at Make, we notice sort of
> +two languages in one. One language describes dependency graphs consisting of
> +targets and prerequisites. The other is a macro language for performing textual
> +substitution.
> +
> +There is clear distinction between the two language stages. For example, you
> +can write a makefile like follows:
> +
> +    APP := foo
> +    SRC := foo.c
> +    CC := gcc
> +
> +    $(APP): $(SRC)
> +            $(CC) -o $(APP) $(SRC)
> +
> +The macro language replaces the variable references with their expanded form,
> +and handles as if the source file were input like follows:
> +
> +    foo: foo.c
> +            gcc -o foo foo.c
> +
> +Then, Make analyzes the dependency graph and determines the targets to be
> +updated.
> +
> +The idea is quite similar in Kconfig - it is possible to describe a Kconfig
> +file like this:
> +
> +    CC := gcc
> +
> +    config CC_HAS_FOO
> +            def_bool $(shell $(srctree)/scripts/gcc-check-foo.sh $(CC))
> +
> +The macro language in Kconfig processes the source file into the following
> +intermediate:
> +
> +    config CC_HAS_FOO
> +            def_bool y
> +
> +Then, Kconfig moves onto the evaluation stage to resolve inter-symbol
> +dependency, which is explained in kconfig-language.txt.
> +
> +
> +Variables
> +---------
> +
> +Like in Make, a variable in Kconfig works as a macro variable. A macro
> +variable is expanded "in place" to yield a text string that may then expanded

                                                                 may then be expanded

> +further. To get the value of a variable, enclose the variable name in $( ).
> +As a special case, single-letter variable names can omit the parentheses and is

                                                                               and are

> +simply referenced like $X. Unlike Make, Kconfig does not support curly braces
> +as in ${CC}.
> +
> +There are two types of variables: simply expanded variables and recursively
> +expanded variables.
> +
> +A simply expanded variable is defined using the := assignment operator. Its
> +righthand side is expanded immediately upon reading the line from the Kconfig
> +file.
> +
> +A recursively expanded variable is defined using the = assignment operator.
> +Its righthand side is simply stored as the value of the variable without
> +expanding it in any way. Instead, the expansion is performed when the variable
> +is used.
> +
> +There is another type of assignment operator; += is used to append text to a
> +variable. The righthand side of += is expanded immediately if the lefthand
> +side was originally defined as a simple variable. Otherwise, its evaluation is
> +deferred.
> +
> +
> +Functions
> +---------
> +
> +Like Make, Kconfig supports both built-in and user-defined functions. A
> +function invocation looks much like a variable reference, but includes one or
> +more parameters separated by commas:
> +
> +  $(function-name arg1, arg2, arg3)
> +
> +Some functions are implemented as a built-in function. Currently, Kconfig
> +supports the following:
> +
> + - $(shell command)
> +
> +   The 'shell' function accepts a single argument that is expanded and passed
> +   to a subshell for execution. The standard output of the command is then read
> +   and returned as the value of the function. Every newline in the output is
> +   replaced with a space. Any trailing newlines are deleted. The standard error
> +   is not returned, nor is any program exit status.
> +
> + - $(warning text)
> +
> +   The 'warning' function prints its arguments to stderr. The output is prefixed
> +   with the name of the current Kconfig file, the current line number. It

                                          file and the current line number. It

> +   evaluates to an empty string.
> +
> + - $(info text)
> +
> +   The 'info' function is similar to 'warning' except that it sends its argument
> +   to stdout without any Kconfig name or line number.

Are current Kconfig file name and line number available so that someone can
construct their own $(info message) messages?

> +
> +A user-defined function is defined by using the = operator. The parameters are
> +referenced w
[GIT PULL] OpenRISC updates for 4.17
Hi Linus,

Please consider for pull,

The following changes since commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda:

  Linux 4.16 (2018-04-01 14:20:27 -0700)

are available in the git repository at:

  git://github.com/openrisc/linux.git tags/for-linus

for you to fetch changes up to d56f3af9e801970d21c57621de3b42bc17eac152:

  openrisc: remove unused __ARCH_HAVE_MMU define (2018-04-08 02:15:47 +0900)

OpenRISC updates for v4.17

Just one small thing here. It came in a while back, but I didn't have
anything in my 4.16 queue. It's still the only thing for 4.17, so I'm
sending it alone.

Small cleanup:
 - remove unused __ARCH_HAVE_MMU define

Tobias Klauser (1):
      openrisc: remove unused __ARCH_HAVE_MMU define

 arch/openrisc/include/uapi/asm/unistd.h | 2 --
 1 file changed, 2 deletions(-)
Re: [RFC tip/locking/lockdep v6 01/20] lockdep/Documention: Recursive read lock detection reasoning
Hi,

Just a few typos etc. below...

On 04/11/2018 06:50 AM, Boqun Feng wrote:
> Signed-off-by: Boqun Feng
> ---
>  Documentation/locking/lockdep-design.txt | 178 +++
>  1 file changed, 178 insertions(+)
>
> diff --git a/Documentation/locking/lockdep-design.txt b/Documentation/locking/lockdep-design.txt
> index 9de1c158d44c..6bb9e90e2c4f 100644
> --- a/Documentation/locking/lockdep-design.txt
> +++ b/Documentation/locking/lockdep-design.txt
> @@ -284,3 +284,181 @@ Run the command and save the output, then compare against the output from
>  a later run of this command to identify the leakers.  This same output
>  can also help you find situations where runtime lock initialization has
>  been omitted.
> +
> +Recursive read locks:
> +---------------------
> +
> +Lockdep now is equipped with deadlock detection for recursive read locks.
> +
> +Recursive read locks, as their name indicates, are the locks able to be
> +acquired recursively. Unlike non-recursive read locks, recursive read locks
> +only get blocked by current write lock *holders* other than write lock
> +*waiters*, for example:
> +
> +	TASK A:			TASK B:
> +
> +	read_lock(X);
> +
> +				write_lock(X);
> +
> +	read_lock(X);
> +
> +is not a deadlock for recursive read locks, as while the task B is waiting for
> +the lock X, the second read_lock() doesn't need to wait because it's a recursive
> +read lock. However if the read_lock() is non-recursive read lock, then the above
> +case is a deadlock, because even if the write_lock() in TASK B can not get the
> +lock, but it can block the second read_lock() in TASK A.
> +
> +Note that a lock can be a write lock (exclusive lock), a non-recursive read
> +lock (non-recursive shared lock) or a recursive read lock (recursive shared
> +lock), depending on the lock operations used to acquire it (more specifically,
> +the value of the 'read' parameter for lock_acquire()). In other words, a single
> +lock instance has three types of acquisition depending on the acquisition
> +functions: exclusive, non-recursive read, and recursive read.
> +
> +To be concise, we call that write locks and non-recursive read locks as
> +"non-recursive" locks and recursive read locks as "recursive" locks.
> +
> +Recursive locks don't block each other, while non-recursive locks do (this is
> +even true for two non-recursive read locks). A non-recursive lock can block the
> +corresponding recursive lock, and vice versa.
> +
> +A deadlock case with recursive locks involved is as follow:
> +
> +	TASK A:			TASK B:
> +
> +	read_lock(X);
> +				read_lock(Y);
> +	write_lock(Y);
> +				write_lock(X);
> +
> +Task A is waiting for task B to read_unlock() Y and task B is waiting for task
> +A to read_unlock() X.
> +
> +Dependency types and strong dependency paths:
> +---------------------------------------------
> +In order to detect deadlocks as above, lockdep needs to track different dependencies.
> +There are 4 categories for dependency edges in the lockdep graph:
> +
> +1) -(NN)->: non-recursive to non-recursive dependency. "X -(NN)-> Y" means
> +            X -> Y and both X and Y are non-recursive locks.
> +
> +2) -(RN)->: recursive to non-recursive dependency. "X -(RN)-> Y" means
> +            X -> Y and X is recursive read lock and Y is non-recursive lock.
> +
> +3) -(NR)->: non-recursive to recursive dependency, "X -(NR)-> Y" means
> +            X -> Y and X is non-recursive lock and Y is recursive lock.
> +
> +4) -(RR)->: recursive to recursive dependency, "X -(RR)-> Y" means
> +            X -> Y and both X and Y are recursive locks.
> +
> +Note that given two locks, they may have multiple dependencies between them, for example:
> +
> +	TASK A:
> +
> +	read_lock(X);
> +	write_lock(Y);
> +	...
> +
> +	TASK B:
> +
> +	write_lock(X);
> +	write_lock(Y);
> +
> +, we have both X -(RN)-> Y and X -(NN)-> Y in the dependency graph.
> +
> +We use -(*N)-> for edges that is either -(RN)-> or -(NN)->, the similar for -(N*)->,
> +-(*R)-> and -(R*)->
> +
> +A "path" is a series of conjunct dependency edges in the graph. And we define a
> +"strong" path, which indicates the strong dependency throughout each dependency
> +in the path, as the path that doesn't have two conjunct edges (dependencies) as
> +-(*R)-> and -(R*)->. In other words, a "strong" path is a path from a lock
> +walking to another through the lock dependencies, and if X -> Y -> Z in the
> +path (where X, Y, Z are locks), if the walk from X to Y is through a -(NR)-> or
> +-(RR)-> dependency, the walk from Y to Z must not be through a -(RN)-> or
> +-(RR)-> dependency, otherwise it's not a strong path.
> +
> +We will see why the path is called "strong" in next section.
> +
> +Recursive Read Deadlock Detection:
> +----------------------------------
> +
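[Editorial sketch, not part of the original thread.]  The "strong
path" rule in the quoted patch text (no -(*R)-> edge immediately
followed by a -(R*)-> edge) can be written down as a small check.
The C below is a user-space illustration only; the enum and function
names are hypothetical and are not taken from lockdep's
implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* The four edge categories from the quoted text: -(NN)->, -(RN)->, ... */
enum dep_type {
	DEP_NN,	/* non-recursive -> non-recursive */
	DEP_RN,	/* recursive     -> non-recursive */
	DEP_NR,	/* non-recursive -> recursive     */
	DEP_RR,	/* recursive     -> recursive     */
};

/* True for -(*R)-> edges: the lock being walked to is held recursively. */
bool dep_ends_recursive(enum dep_type d)
{
	return d == DEP_NR || d == DEP_RR;
}

/* True for -(R*)-> edges: the lock being walked from is held recursively. */
bool dep_starts_recursive(enum dep_type d)
{
	return d == DEP_RN || d == DEP_RR;
}

/*
 * A path is "strong" if no edge of kind -(*R)-> is immediately followed
 * by an edge of kind -(R*)->.  An empty or single-edge path is strong.
 */
bool path_is_strong(const enum dep_type *edges, size_t nr_edges)
{
	for (size_t i = 0; i + 1 < nr_edges; i++) {
		if (dep_ends_recursive(edges[i]) &&
		    dep_starts_recursive(edges[i + 1]))
			return false;
	}
	return true;
}
```

For example, the path X -(NR)-> Y -(RN)-> Z is rejected, because the
walk reaches Y through a recursive acquisition and leaves it through
one, while X -(RN)-> Y -(NR)-> Z is accepted.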
Re: repeatable boot randomness inside KVM guest
On Sat, Apr 14, 2018 at 06:44:19PM -0400, Theodore Y. Ts'o wrote:
> What needs to happen is freelist should get randomized much later in
> the boot sequence.  Doing it later will require locking; I don't know
> enough about the slab/slub code to know whether the slab_mutex would
> be sufficient, or some other lock might need to be added.

Could we have the bootloader pass in some initial randomness?
Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()
On Sat, Apr 14, 2018 at 02:47:21PM -0700, Linus Torvalds wrote:
> On Sat, Apr 14, 2018 at 1:58 PM, Al Viro wrote:
> >
> > That breaks d_invalidate(), unfortunately.  Look at the termination
> > conditions in the loop there...
>
> Ugh. I was going to say "but that doesn't even use select_collect()",
> but yeah, detach_and_collect() calls it.
>
> It would be easy enough to just change the
>
>         if (!list_empty(&data.select.dispose))
>
> there to
>
>         if (!list_empty(&data.select.found))
>
> too.

You would have to do the same in check_and_drop() as well, and that
brings back d_invalidate()/d_invalidate() livelock we used to have.
See 81be24d263db...

I'm trying to put something together, but the damn thing is full of
potential livelocks, unfortunately ;-/  Will send a followup once I
have something resembling a sane solution...
drivers/infiniband/hw/mlx5/main.c:4555: undefined reference to `uverbs_default_get_objects'
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   18b7fd1c93e5204355ddbf2608a097d64df81b88
commit: 8c84660bb437fe8692e6a2b4e85023ccb874a520 IB/mlx5: Initialize the parsing tree root without the help of uverbs
date:   10 days ago
config: x86_64-randconfig-s5-04150714 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        git checkout 8c84660bb437fe8692e6a2b4e85023ccb874a520
        # save the attached .config to linux build tree
        make ARCH=x86_64

All errors (new ones prefixed by >>):

   drivers/infiniband/hw/mlx5/main.o: In function `populate_specs_root':
>> drivers/infiniband/hw/mlx5/main.c:4555: undefined reference to `uverbs_default_get_objects'
>> drivers/infiniband/hw/mlx5/main.c:4559: undefined reference to `uverbs_alloc_spec_tree'
   drivers/infiniband/hw/mlx5/main.o: In function `depopulate_specs_root':
>> drivers/infiniband/hw/mlx5/main.c:4566: undefined reference to `uverbs_free_spec_tree'

vim +4555 drivers/infiniband/hw/mlx5/main.c

  4550
  4551  #define NUM_TREES 0
  4552  static int populate_specs_root(struct mlx5_ib_dev *dev)
  4553  {
  4554          const struct uverbs_object_tree_def *default_root[NUM_TREES + 1] = {
> 4555                  uverbs_default_get_objects()};
  4556          size_t num_trees = 1;
  4557
  4558          dev->ib_dev.specs_root =
> 4559                  uverbs_alloc_spec_tree(num_trees, default_root);
  4560
  4561          return PTR_ERR_OR_ZERO(dev->ib_dev.specs_root);
  4562  }
  4563
  4564  static void depopulate_specs_root(struct mlx5_ib_dev *dev)
  4565  {
> 4566          uverbs_free_spec_tree(dev->ib_dev.specs_root);
  4567  }
  4568

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Attachment: .config.gz (application/gzip)
Re: [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
Hi Dave,

Sorry for the late reply.

On 04/11/18 at 08:48am, Dave Hansen wrote:
> On 04/08/2018 01:20 AM, Baoquan He wrote:
> > On 04/06/18 at 07:50am, Dave Hansen wrote:
> >> The code looks fine to me.  It's a bit of a shame that there's no
> >> verification to ensure that idx_present never goes beyond the shiny new
> >> nr_present_sections.
> >
> > This is a good point. Do you think it's OK to replace (section_nr <
> > NR_MEM_SECTIONS) with (section_nr < nr_present_sections) in below
> > for_each macro? This for_each_present_section_nr() is only used
> > during sparse_init() execution.
> >
> > #define for_each_present_section_nr(start, section_nr)          \
> >         for (section_nr = next_present_section_nr(start-1);     \
> >              ((section_nr >= 0) &&                              \
> >               (section_nr < NR_MEM_SECTIONS) &&                 \
> >               (section_nr <= __highest_present_section_nr));    \
> >              section_nr = next_present_section_nr(section_nr))
>
> I was more concerned about the loops that "consume" the section maps.
> It seems like they might run over the end of the array.
>
> >>> @@ -583,6 +592,7 @@ void __init sparse_init(void)
> >>>   unsigned long *usemap;
> >>>   unsigned long **usemap_map;
> >>>   int size;
> >>> + int idx_present = 0;
> >>
> >> I wonder whether idx_present is a good name.  Isn't it the number of
> >> consumed mem_map[]s or usemaps?
> >
> > Yeah, in sparse_init(), it's the index of present memory sections, and
> > also the number of consumed mem_map[]s or usemaps. And I remember you
> > suggested nr_consumed_maps instead. seems nr_consumed_maps is a little
> > long to index array to make code line longer than 80 chars. How about
> > name it idx_present in sparse_init(), nr_consumed_maps in
> > alloc_usemap_and_memmap(), the maps allocation function? I am also fine
> > to use nr_consumed_maps for all of them.
>
> Does the large array index make a bunch of lines wrap or something?  If
> not, I'd just use the long name.

I am fine with the long name; I will use the 'nr_consumed_maps' name you
suggested earlier instead.

> >>>           if (!map) {
> >>>                   ms->section_mem_map = 0;
> >>> +                 idx_present++;
> >>>                   continue;
> >>>           }
> >>>
> >>
> >>
> >> This hunk seems logically odd to me.  I would expect a non-used section
> >> to *not* consume an entry from the temporary array.  Why does it?  The
> >> error and success paths seem to do the same thing.
> >
> > Yes, this place is the hardest to understand. The temporary arrays are
> > allocated beforehand with the size of 'nr_present_sections'. The error
> > paths you mentioned is caused by allocation failure of mem_map or
> > map_map, but whatever it's error or success paths, the sections must be
> > marked as present in memory_present(). Error or success paths happened
> > in alloc_usemap_and_memmap(), while checking if it's error or success
> > paths happened in the last for_each_present_section_nr() of
> > sparse_init(), and clear the ms->section_mem_map if it goes along error
> > paths. This is the key point of this new allocation way.
>
> I think you owe some commenting because this is so hard to understand.

I can write a code comment above sparse_init() based on this patch's
git log. Do you think that's OK?

Honestly, it took me several days to write the code, while I spent more
than a week writing the patch log. Writing a patch log is really a
headache for me.

Thanks
Baoquan
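[Editorial sketch, not part of the original thread.]  The point under
discussion is that the temporary arrays hold one slot per *present*
section, so the consuming loop has to advance its index on the
failure path too, or every later section would pick up the wrong
slot.  The user-space C below models that under stated assumptions:
all toy_* names, the fixed section count, and the integer map_id
standing in for a real mem_map pointer are hypothetical.  The
assert() calls are one possible form of the bounds verification Dave
asked about.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_SECTIONS 8

struct toy_section {
	int present;	/* set by a memory_present()-like step */
	int map_id;	/* which map the section ended up with, -1 if none */
};

/*
 * Allocation pass: reserve one slot per present section.  A failed
 * allocation still occupies its slot, recorded as -1.  alloc_ok[] is a
 * test hook that says whether the allocation for a section succeeds.
 */
size_t toy_fill_maps(const struct toy_section *secs, int *map_map,
		     size_t nr_present_sections, const int *alloc_ok)
{
	size_t nr_consumed_maps = 0;

	for (size_t nr = 0; nr < TOY_NR_SECTIONS; nr++) {
		if (!secs[nr].present)
			continue;
		assert(nr_consumed_maps < nr_present_sections);
		map_map[nr_consumed_maps++] = alloc_ok[nr] ? (int)nr : -1;
	}
	return nr_consumed_maps;
}

/*
 * Consuming pass: walk the present sections in the same order and take
 * one slot each.  The index advances on the failure path as well, so
 * the slots stay paired with the sections that produced them.
 * Returns the number of sections that got a usable map.
 */
size_t toy_consume_maps(struct toy_section *secs, const int *map_map,
			size_t nr_present_sections)
{
	size_t nr_consumed_maps = 0;
	size_t usable = 0;

	for (size_t nr = 0; nr < TOY_NR_SECTIONS; nr++) {
		int map;

		if (!secs[nr].present)
			continue;
		assert(nr_consumed_maps < nr_present_sections);
		map = map_map[nr_consumed_maps];
		nr_consumed_maps++;	/* consumed even if allocation failed */

		if (map < 0) {
			secs[nr].map_id = -1;
			continue;
		}
		secs[nr].map_id = map;
		usable++;
	}
	return usable;
}
```

With sections 1, 3, 4 and 6 present and the allocation for section 3
failing, section 4 still receives the map that was allocated for it,
which would not hold if the failure path skipped the increment.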