date:20180414

Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()

2018-04-14 Thread Nikolay Borisov



On 14.04.2018 00:14, Andrew Morton wrote:
> On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov wrote:
> 
>> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
>> In this case we may have 0 dentries to dispose, so we will never
>> schedule out while waiting for the parallel shrink_dentry_list to
>> complete.
>>
>> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
> 
> Well I guess the patch is OK as a stopgap, but things seem fairly
> messed up in there.  shrink_dcache_parent() shouldn't be doing a
> busywait, waiting for the concurrent shrink_dentry_list().
> 
> Either we should be waiting (sleeping) for the concurrent operation to
> complete or we should just bail out of shrink_dcache_parent(), perhaps
> with 
> 
>   if (list_empty(&data.dispose))
>           break;
> 
> or similar.  Dunno.

I agree, however, not being a dcache expert I'd refrain from touching
it, since it seems to be rather fragile. Perhaps Al could take a look in
there?

> 
> 
> That block comment over `struct select_data' is not a good one.  "It
> returns zero iff...".  *What* returns zero?  select_collect()?  No it
> doesn't, it returns an `enum d_walk_ret'.  Perhaps the comment is
> trying to refer to select_data.found.  And the real interpretation of
> select_data.found is, umm, hard to describe.  "Counts the number of
> dentries which are on a shrink list or which were moved to the dispose
> list".  Why?  What's that all about?
> 
> This code needs a bit of thought, documentation and perhaps a redo,
> I suspect.
>

Re: INFO: rcu detected stall in shrink_dcache_parent

2018-04-14 Thread Tetsuo Handa

Infinite loop inside shrink_dcache_parent() due to lack of cond_resched().
I can reproduce this issue by running the reproducer on one CPU (using
"taskset -c 0").
Reverting commit 32785c0539b7e96f ("fs/dcache.c: add cond_resched() in
shrink_dentry_list()") solves this issue.

#syz dup: INFO: rcu detected stall in d_walk

Re: INFO: rcu detected stall in vfs_rmdir

2018-04-14 Thread Tetsuo Handa

Infinite loop inside shrink_dcache_parent() due to lack of cond_resched().
I can reproduce this issue by running the reproducer on one CPU (using
"taskset -c 0").
Reverting commit 32785c0539b7e96f ("fs/dcache.c: add cond_resched() in
shrink_dentry_list()") solves this issue.

#syz dup: INFO: rcu detected stall in d_walk

Re: INFO: rcu detected stall in do_raw_spin_unlock

2018-04-14 Thread Tetsuo Handa

Infinite loop inside shrink_dcache_parent() due to lack of cond_resched().
I can reproduce this issue by running the reproducer on one CPU (using
"taskset -c 0").
Reverting commit 32785c0539b7e96f ("fs/dcache.c: add cond_resched() in
shrink_dentry_list()") solves this issue.

#syz dup: INFO: rcu detected stall in d_walk

Re: INFO: rcu detected stall in _raw_spin_unlock

2018-04-14 Thread Tetsuo Handa

Infinite loop inside shrink_dcache_parent() due to lack of cond_resched().
I can reproduce this issue by running the reproducer on one CPU (using
"taskset -c 0").
Reverting commit 32785c0539b7e96f ("fs/dcache.c: add cond_resched() in
shrink_dentry_list()") solves this issue.

#syz dup: INFO: rcu detected stall in d_walk

Re: [PATCH] fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()

2018-04-14 Thread Al Viro

On Sat, Apr 14, 2018 at 01:16:58AM -0500, Zev Weiss wrote:
> It's a fairly inconsequential bug, since fdput() won't actually try to
> fput() the file due to fd.flags (and thus FDPUT_FPUT) being zero in
> the failure case, but most other vfs code takes steps to avoid this.

Applied.
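
For readers outside the VFS code, the pattern Zev describes can be sketched in userspace. This is only a model, not kernel code: model_fdget(), model_fdput() and use_fd() are hypothetical stand-ins, the value of FDPUT_FPUT is an assumption, and -1 stands in for whatever error the real caller returns.

```c
#include <stddef.h>

#define FDPUT_FPUT 1	/* flag name from the thread; the value is an assumption */

struct file { int refcount; };
struct fd { struct file *file; unsigned int flags; };

/* Hypothetical model of fdget(): on failure both fields stay zero. */
static struct fd model_fdget(struct file *table[], size_t n, size_t idx)
{
	struct fd f = { NULL, 0 };

	if (idx < n && table[idx]) {
		f.file = table[idx];
		f.file->refcount++;
		f.flags = FDPUT_FPUT;
	}
	return f;
}

/* Hypothetical model of fdput(): drops the reference only if one was taken. */
static void model_fdput(struct fd f)
{
	if (f.flags & FDPUT_FPUT)
		f.file->refcount--;
}

static int use_fd(struct file *table[], size_t n, size_t idx)
{
	struct fd f = model_fdget(table, n, idx);

	if (!f.file)
		return -1;	/* bail out before fdput(): nothing was acquired */

	/* ... work on f.file would go here ... */
	model_fdput(f);
	return 0;
}
```

In this model, calling model_fdput() on the failure path would be harmless because flags is zero, which matches the "fairly inconsequential" description; returning early simply keeps the get/put pairing obvious.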

Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()

2018-04-14 Thread Al Viro

On Sat, Apr 14, 2018 at 10:00:29AM +0300, Nikolay Borisov wrote:
> 
> 
> On 14.04.2018 00:14, Andrew Morton wrote:
> > On Fri, 13 Apr 2018 13:28:23 -0700 Khazhismel Kumykov wrote:
> > 
> >> shrink_dcache_parent may spin waiting for a parallel shrink_dentry_list.
> >> In this case we may have 0 dentries to dispose, so we will never
> >> schedule out while waiting for the parallel shrink_dentry_list to
> >> complete.
> >>
> >> Tested that this fixes syzbot reports of stalls in shrink_dcache_parent()
> > 
> > Well I guess the patch is OK as a stopgap, but things seem fairly
> > messed up in there.  shrink_dcache_parent() shouldn't be doing a
> > busywait, waiting for the concurrent shrink_dentry_list().
> > 
> > Either we should be waiting (sleeping) for the concurrent operation to
> > complete or we should just bail out of shrink_dcache_parent(), perhaps
> > with 
> > 
> >   if (list_empty(&data.dispose))
> >           break;
> > 
> > or similar.  Dunno.
> 
> I agree, however, not being a dcache expert I'd refrain from touching
> it, since it seems to be rather fragile. Perhaps Al could take a look in
> there?

"Bail out" is definitely a bad idea, "sleep"... what on?  Especially
since there might be several evictions we are overlapping with...
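
The stall under discussion can be sketched as a small userspace model. It is not the fs/dcache.c code: struct walk_result and struct parallel_shrinker are hypothetical, the concurrent shrinker is simulated by a countdown, and sched_yield() stands in for cond_resched().

```c
#include <sched.h>

/* Hypothetical stand-in for the result of one tree walk. */
struct walk_result {
	int found;	/* dentries seen that sit on some shrink list */
	int to_dispose;	/* dentries this caller collected for itself */
};

/* Hypothetical concurrent shrinker that finishes after a fixed number of polls. */
struct parallel_shrinker {
	int polls_left;
};

static struct walk_result walk_tree(struct parallel_shrinker *other)
{
	struct walk_result r = { 0, 0 };

	if (other->polls_left > 0) {
		other->polls_left--;
		r.found = 1;	/* someone else still owns the dentries */
	}
	return r;
}

/* Returns how many times the loop yielded the CPU. */
static int shrink_parent_model(struct parallel_shrinker *other)
{
	int yields = 0;

	for (;;) {
		struct walk_result r = walk_tree(other);

		if (!r.found)
			break;
		if (r.to_dispose == 0) {
			/* Nothing of our own to dispose: yield instead of spinning. */
			sched_yield();
			yields++;
		}
	}
	return yields;
}
```

Without the yield, a loop of this shape never leaves the CPU while the other shrinker is still busy, which is the single-CPU reproduction reported in the duplicate syzbot threads. The sketch says nothing about whether sleeping or bailing out would be better.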

Re: [PATCH v1 1/1] usb: core: Add quirk for HP v222w 16GB Mini

2018-04-14 Thread Sergei Shtylyov


Hello!

On 4/13/2018 8:40 PM, sathyanarayanan.kuppusw...@linux.intel.com wrote:

> From: Kamil Lulko
>
> Add DELAY_INIT quirk to fix the following problem with HP
> v222w 16GB Mini:
>
> usb 1-3: unable to read config index 0 descriptor/start: -110
> usb 1-3: can't read configurations, error -110
> usb 1-3: can't set config #1, error -110
>
> Signed-off-by: Kamil Lulko
> Signed-off-by: Kuppuswamy Sathyanarayanan
>
> ---
>   drivers/usb/core/quirks.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
> index 54b019e..f2ef913 100644
> --- a/drivers/usb/core/quirks.c
> +++ b/drivers/usb/core/quirks.c
> @@ -40,6 +40,9 @@ static const struct usb_device_id usb_quirk_list[] = {
> { USB_DEVICE(0x03f0, 0x0701), .driver_info =
> USB_QUIRK_STRING_FETCH_255 },
>
> +/* HP v222w 16GB Mini USB Drive */
> +{ USB_DEVICE(0x03f0, 0x3f40), .driver_info = USB_QUIRK_DELAY_INIT },
> +

   Please indent with tabs (as above and below), not spaces.

> /* Creative SB Audigy 2 NX */
> { USB_DEVICE(0x041e, 0x3020), .driver_info = USB_QUIRK_RESET_RESUME },
  


MBR, Sergei
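
As an illustration of how a quirk table like the one being patched is consulted, here is a minimal userspace sketch. The vendor/product IDs and flag names come from the quoted diff; the bit values, struct quirk_entry and lookup_quirks() are assumptions made for the example and are not the USB core's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag names follow the thread; the bit values here are assumptions. */
#define QUIRK_STRING_FETCH_255	(1u << 0)
#define QUIRK_RESET_RESUME	(1u << 1)
#define QUIRK_DELAY_INIT	(1u << 2)

/* Hypothetical table entry: one device ID plus its quirk flags. */
struct quirk_entry {
	uint16_t vendor;
	uint16_t product;
	unsigned int flags;
};

static const struct quirk_entry quirk_list[] = {
	{ 0x03f0, 0x0701, QUIRK_STRING_FETCH_255 },
	/* HP v222w 16GB Mini USB Drive */
	{ 0x03f0, 0x3f40, QUIRK_DELAY_INIT },
	/* Creative SB Audigy 2 NX */
	{ 0x041e, 0x3020, QUIRK_RESET_RESUME },
};

/* Linear scan; returns 0 when the device has no entry. */
static unsigned int lookup_quirks(uint16_t vendor, uint16_t product)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_list) / sizeof(quirk_list[0]); i++)
		if (quirk_list[i].vendor == vendor &&
		    quirk_list[i].product == product)
			return quirk_list[i].flags;
	return 0;
}
```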

[PATCH v3] net: davicom: dm9000: Avoid spinlock recursion during dm9000_timeout routine

2018-04-14 Thread Liu Xiang

On the DM9000B, dm9000_phy_write() is called during the
dm9000_timeout() routine while the main spinlock is already held.
dm9000_phy_write() then requests the same spinlock again, which is
spinlock recursion. The phy operation must therefore skip the spinlock
when it runs inside the dm9000_timeout() routine.

---
v3:
   When a task enters dm9000_timeout() and gets the main spinlock,
   another task that wants to do an asynchronous phy operation must be
   running on another CPU. Because the CPUs differ, this asynchronous
   task will block in dm9000_phy_write() until the dm9000_timeout()
   routine has completed.
---

Signed-off-by: Liu Xiang 
---
 drivers/net/ethernet/davicom/dm9000.c | 39 +--
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/davicom/dm9000.c 
b/drivers/net/ethernet/davicom/dm9000.c
index 50222b7..56df77d 100644
--- a/drivers/net/ethernet/davicom/dm9000.c
+++ b/drivers/net/ethernet/davicom/dm9000.c
@@ -112,7 +112,7 @@ struct board_info {
u8  imr_all;
 
unsigned int flags;
-   unsigned int in_timeout:1;
+   int timeout_cpu;
unsigned int in_suspend:1;
unsigned int wake_supported:1;
 
@@ -158,6 +158,17 @@ static inline struct board_info *to_dm9000_board(struct 
net_device *dev)
return netdev_priv(dev);
 }
 
+static bool dm9000_current_in_timeout(struct board_info *db)
+{
+   bool ret = false;
+
+   preempt_disable();
+   ret = (db->timeout_cpu == smp_processor_id());
+   preempt_enable();
+
+   return ret;
+}
+
 /* DM9000 network board routine  */
 
 /*
@@ -276,7 +287,7 @@ static void dm9000_dumpblk_32bit(void __iomem *reg, int 
count)
  */
 static void dm9000_msleep(struct board_info *db, unsigned int ms)
 {
-   if (db->in_suspend || db->in_timeout)
+   if (db->in_suspend || dm9000_current_in_timeout(db))
mdelay(ms);
else
msleep(ms);
@@ -335,12 +346,13 @@ static void dm9000_msleep(struct board_info *db, unsigned 
int ms)
struct board_info *db = netdev_priv(dev);
unsigned long flags;
unsigned long reg_save;
+   bool in_timeout = dm9000_current_in_timeout(db);
 
dm9000_dbg(db, 5, "phy_write[%02x] = %04x\n", reg, value);
-   if (!db->in_timeout)
+   if (!in_timeout) {
mutex_lock(&db->addr_lock);
-
-   spin_lock_irqsave(&db->lock, flags);
+   spin_lock_irqsave(&db->lock, flags);
+   }
 
/* Save previous register address */
reg_save = readb(db->io_addr);
@@ -356,11 +368,13 @@ static void dm9000_msleep(struct board_info *db, unsigned 
int ms)
iow(db, DM9000_EPCR, EPCR_EPOS | EPCR_ERPRW);
 
writeb(reg_save, db->io_addr);
-   spin_unlock_irqrestore(&db->lock, flags);
+   if (!in_timeout)
+   spin_unlock_irqrestore(&db->lock, flags);
 
dm9000_msleep(db, 1);   /* Wait write complete */
 
-   spin_lock_irqsave(&db->lock, flags);
+   if (!in_timeout)
+   spin_lock_irqsave(&db->lock, flags);
reg_save = readb(db->io_addr);
 
iow(db, DM9000_EPCR, 0x0);  /* Clear phyxcer write command */
@@ -368,9 +382,10 @@ static void dm9000_msleep(struct board_info *db, unsigned 
int ms)
/* restore the previous address */
writeb(reg_save, db->io_addr);
 
-   spin_unlock_irqrestore(&db->lock, flags);
-   if (!db->in_timeout)
+   if (!in_timeout) {
+   spin_unlock_irqrestore(&db->lock, flags);
mutex_unlock(&db->addr_lock);
+   }
 }
 
 /* dm9000_set_io
@@ -980,7 +995,7 @@ static void dm9000_timeout(struct net_device *dev)
 
/* Save previous register address */
spin_lock_irqsave(&db->lock, flags);
-   db->in_timeout = 1;
+   db->timeout_cpu = smp_processor_id();
reg_save = readb(db->io_addr);
 
netif_stop_queue(dev);
@@ -992,7 +1007,7 @@ static void dm9000_timeout(struct net_device *dev)
 
/* Restore previous register address */
writeb(reg_save, db->io_addr);
-   db->in_timeout = 0;
+   db->timeout_cpu = -1;
spin_unlock_irqrestore(&db->lock, flags);
 }
 
@@ -1670,6 +1685,8 @@ static struct dm9000_plat_data *dm9000_parse_dt(struct 
device *dev)
db->mii.mdio_read    = dm9000_phy_read;
db->mii.mdio_write   = dm9000_phy_write;
 
+   db->timeout_cpu = -1;
+
mac_src = "eeprom";
 
/* try reading the node address from the attached EEPROM */
-- 
1.9.1
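
The idea behind the timeout_cpu field can be sketched with a single-threaded userspace model. Everything here is hypothetical (struct board_model and the model_* helpers are not driver code), and the "lock" only records its holder so that a recursive acquisition can be counted instead of deadlocking. It does not model a second CPU.

```c
#include <stdbool.h>

#define NO_CPU (-1)

/* Hypothetical model of the driver state relevant to the patch. */
struct board_model {
	int lock_holder;	/* CPU holding the "spinlock", or NO_CPU */
	int timeout_cpu;	/* CPU running the timeout path, or NO_CPU */
	int recursions;		/* lock attempts by the CPU that already holds it */
};

static void model_lock(struct board_model *db, int cpu)
{
	if (db->lock_holder == cpu)
		db->recursions++;	/* a real spinlock would deadlock here */
	db->lock_holder = cpu;
}

static void model_unlock(struct board_model *db)
{
	db->lock_holder = NO_CPU;
}

/* owner_check selects the patched behaviour (skip the lock on the timeout CPU). */
static void model_phy_write(struct board_model *db, int cpu, bool owner_check)
{
	bool in_timeout = owner_check && db->timeout_cpu == cpu;

	if (!in_timeout)
		model_lock(db, cpu);
	/* ... register accesses would go here ... */
	if (!in_timeout)
		model_unlock(db);
}

/* Returns the number of recursive lock attempts seen so far. */
static int model_timeout(struct board_model *db, int cpu, bool owner_check)
{
	model_lock(db, cpu);
	db->timeout_cpu = cpu;
	model_phy_write(db, cpu, owner_check);
	db->timeout_cpu = NO_CPU;
	model_unlock(db);
	return db->recursions;
}
```

With owner_check false the model records one recursive acquisition; with it true the phy write runs under the lock already held by the timeout path.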

kernel-4.9.94 compile error: 'KMOD_DECOMP_LEN' undeclared

2018-04-14 Thread Teck Choon Giam

Hi,

Compiling linux-4.9.94 fails with an error about KMOD_DECOMP_LEN being
undeclared.  Searching for KMOD_DECOMP_LEN in the linux-4.9.94 and
linux-4.15.17 sources gives the following:

sh-4.2# grep -r KMOD_DECOMP_LEN ./linux-4.15.17
./linux-4.15.17/tools/perf/tests/code-reading.c: char decomp_name[KMOD_DECOMP_LEN];
./linux-4.15.17/tools/perf/util/dso.h:#define KMOD_DECOMP_LEN sizeof(KMOD_DECOMP_NAME)
./linux-4.15.17/tools/perf/util/annotate.c: char tmp[KMOD_DECOMP_LEN];
./linux-4.15.17/tools/perf/util/dso.c: char newpath[KMOD_DECOMP_LEN];
sh-4.2# grep -r KMOD_DECOMP_LEN ./linux-4.9.94
./linux-4.9.94/tools/perf/tests/code-reading.c: char decomp_name[KMOD_DECOMP_LEN];
./linux-4.9.94/tools/perf/util/dso.c: char newpath[KMOD_DECOMP_LEN];

So I guess linux-4.9.94 does not define KMOD_DECOMP_LEN in
tools/perf/util/dso.h?

Thanks.

Regards,
Giam Teck Choon
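
To make the missing piece concrete, here is a minimal sketch of the definition pattern that the 4.15.17 grep output shows and how a call site sizes its buffer from it. The KMOD_DECOMP_LEN line is taken from that output; the value of KMOD_DECOMP_NAME and the helper init_decomp_name() are assumptions for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Assumed template value; only the KMOD_DECOMP_LEN line appears in the grep output. */
#define KMOD_DECOMP_NAME "/tmp/perf-kmod-XXXXXX"
#define KMOD_DECOMP_LEN  sizeof(KMOD_DECOMP_NAME)

/*
 * Hypothetical helper: copy the template into a caller-provided buffer.
 * Returns the string length (without the NUL), or 0 if the buffer is too small.
 */
static size_t init_decomp_name(char *buf, size_t len)
{
	if (len < KMOD_DECOMP_LEN)
		return 0;
	memcpy(buf, KMOD_DECOMP_NAME, KMOD_DECOMP_LEN);
	return KMOD_DECOMP_LEN - 1;
}
```

A declaration such as `char newpath[KMOD_DECOMP_LEN];` only compiles when both macros are visible, which is consistent with the error when the header lacks the define.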

INFO: rcu detected stall in shrink_dentry_list

2018-04-14 Thread syzbot


Hello,

syzbot hit the following crash on upstream commit
16e205cf42da1f497b10a4a24f563e6c0d574eec (Fri Apr 13 03:56:10 2018 +)
Merge tag 'drm-fixes-for-v4.17-rc1' of git://people.freedesktop.org/~airlied/linux
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=9275da3e0f734e102b61

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=4692036947017728
Kernel config: https://syzkaller.appspot.com/x/.config?id=-5947642240294114534

compiler: gcc (GCC) 8.0.1 20180301 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9275da3e0f734e102...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.

If you forward the report, please keep this part and the footer.

INFO: rcu_sched self-detected stall on CPU
	1-...!: (124995 ticks this GP) idle=b86/1/4611686018427387906 softirq=32196/32196 fqs=3
	 (t=125000 jiffies g=16751 c=16750 q=347)
rcu_sched kthread starved for 124987 jiffies! g16751 c16750 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1

RCU grace-period kthread stack dump:
rcu_sched   R  running task23544 9  2 0x8000
Call Trace:
 context_switch kernel/sched/core.c:2848 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3490
 schedule+0xef/0x430 kernel/sched/core.c:3549
 schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
 rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
 kthread+0x345/0x410 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
NMI backtrace for cpu 1
CPU: 1 PID: 4559 Comm: syz-executor6 Not tainted 4.16.0+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
 nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
 rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
 print_cpu_stall kernel/rcu/tree.c:1525 [inline]
 check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
 __rcu_pending kernel/rcu/tree.c:3356 [inline]
 rcu_pending kernel/rcu/tree.c:3401 [inline]
 rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
 update_process_times+0x2d/0x70 kernel/time/timer.c:1636
 tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
 tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
 __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
 __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
 hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
 smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
 
RIP: 0010:__sanitizer_cov_trace_pc+0x14/0x50 kernel/kcov.c:94
RSP: 0018:88018e2e7a80 EFLAGS: 0246 ORIG_RAX: ff13
RAX: 88018e2de680 RBX: 88018e2e7bf8 RCX: 81c2b1d9
RDX:  RSI: 81c26bf3 RDI: 88018e2e7bf8
RBP: 88018e2e7a80 R08: 88018e2de680 R09: ed003b51c378
R10: ed003b51c378 R11: 8801da8e1bc3 R12: 88018e2e7c30
R13: dc00 R14: 110031c5cf7e R15: ed0031c5cf81
 shrink_dentry_list+0x5a8/0x7c0 fs/dcache.c:1087
 shrink_dcache_parent+0xba/0x230 fs/dcache.c:1490
 vfs_rmdir+0x202/0x470 fs/namei.c:3850
 do_rmdir+0x523/0x610 fs/namei.c:3911
 SYSC_rmdir fs/namei.c:3929 [inline]
 SyS_rmdir+0x1a/0x20 fs/namei.c:3927
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455087
RSP: 002b:7fff8b6b76b8 EFLAGS: 0206 ORIG_RAX: 0054
RAX: ffda RBX: 0065 RCX: 00455087
RDX:  RSI: 7fff8b6b9460 RDI: 7fff8b6b9460
RBP: 7fff8b6b9460 R08:  R09: 0001
R10: 000a R11: 0206 R12: 02768940
R13:  R14: 01ec R15: 0001984e


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug report.

Note: all commands must start from beginning of the line in the email body.

Re: INFO: rcu detected stall in shrink_dentry_list

2018-04-14 Thread Dmitry Vyukov

On Sat, Apr 14, 2018 at 11:43 AM, syzbot wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 16e205cf42da1f497b10a4a24f563e6c0d574eec (Fri Apr 13 03:56:10 2018 +)
> Merge tag 'drm-fixes-for-v4.17-rc1' of
> git://people.freedesktop.org/~airlied/linux
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=9275da3e0f734e102b61
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=4692036947017728
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
> compiler: gcc (GCC) 8.0.1 20180301 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+9275da3e0f734e102...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.


#syz dup: INFO: rcu detected stall in d_walk


> INFO: rcu_sched self-detected stall on CPU
> 1-...!: (124995 ticks this GP) idle=b86/1/4611686018427387906
> softirq=32196/32196 fqs=3
>  (t=125000 jiffies g=16751 c=16750 q=347)
> rcu_sched kthread starved for 124987 jiffies! g16751 c16750 f0x0
> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
> RCU grace-period kthread stack dump:
> rcu_sched   R  running task23544 9  2 0x8000
> Call Trace:
>  context_switch kernel/sched/core.c:2848 [inline]
>  __schedule+0x801/0x1e30 kernel/sched/core.c:3490
>  schedule+0xef/0x430 kernel/sched/core.c:3549
>  schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
>  rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
> NMI backtrace for cpu 1
> CPU: 1 PID: 4559 Comm: syz-executor6 Not tainted 4.16.0+ #2
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
>  __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
>  hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>  
> RIP: 0010:__sanitizer_cov_trace_pc+0x14/0x50 kernel/kcov.c:94
> RSP: 0018:88018e2e7a80 EFLAGS: 0246 ORIG_RAX: ff13
> RAX: 88018e2de680 RBX: 88018e2e7bf8 RCX: 81c2b1d9
> RDX:  RSI: 81c26bf3 RDI: 88018e2e7bf8
> RBP: 88018e2e7a80 R08: 88018e2de680 R09: ed003b51c378
> R10: ed003b51c378 R11: 8801da8e1bc3 R12: 88018e2e7c30
> R13: dc00 R14: 110031c5cf7e R15: ed0031c5cf81
>  shrink_dentry_list+0x5a8/0x7c0 fs/dcache.c:1087
>  shrink_dcache_parent+0xba/0x230 fs/dcache.c:1490
>  vfs_rmdir+0x202/0x470 fs/namei.c:3850
>  do_rmdir+0x523/0x610 fs/namei.c:3911
>  SYSC_rmdir fs/namei.c:3929 [inline]
>  SyS_rmdir+0x1a/0x20 fs/namei.c:3927
>  do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x455087
> RSP: 002b:7fff8b6b76b8 EFLAGS: 0206 ORIG_RAX: 0054
> RAX: ffda RBX: 0065 RCX: 00455087
> RDX:  RSI: 7fff8b6b9460 RDI: 7fff8b6b9460
> RBP: 7fff8b6b9460 R08:  R09: 0001
> R10: 000a R11: 0206 R12: 02768940
> R13:  R14: 01ec R15: 0001984e
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subj

Re: [PATCH] netfilter: CONFIG_NF_REJECT_IPV{4,6} becomes bool toggle

2018-04-14 Thread kbuild test robot

Hi Pablo,

I love your patch! Yet something to improve:

[auto build test ERROR on nf-next/master]
[also build test ERROR on v4.16 next-20180413]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pablo-Neira-Ayuso/netfilter-CONFIG_NF_REJECT_IPV-4-6-becomes-bool-toggle/20180414-101337
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master
config: powerpc64-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc64

All error/warnings (new ones prefixed by >>):

   powerpc64-linux-gnu-ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in section `.gnu.hash'.
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `.nf_reject_ip6_tcphdr_get':
>> (.text+0x1f0): undefined reference to `.nf_ip6_checksum'
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `.nf_send_reset6':
>> (.text+0x794): undefined reference to `.ip6_route_output_flags'
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `.nf_send_unreach6':
   (.text+0xab8): undefined reference to `.nf_ip6_checksum'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



[PATCH net-next] net: introduce a new tracepoint for tcp_rcv_space_adjust

2018-04-14 Thread Yafang Shao

tcp_rcv_space_adjust() is called every time data is copied to user space.
Adding a TCP tracepoint there shows when a packet is copied to the user
process, which could help us figure out whether the user process adds
latency.

When a TCP packet arrives, tcp_rcv_established() is called, and the
existing tracepoint tcp_probe gives the time at which the packet arrives.
The packet is then copied to the user, tcp_rcv_space_adjust() is called,
and the newly introduced tracepoint gives the time at which the packet is
copied to the user.

   arrival time       (tcp_probe)
   user process time  (tcp_rcv_space_adjust)
   => the difference is the latency caused by the user process

Hence sk is printed in the printk message as a key to connect these two
tracepoints.

Maybe we could export sockfd in this new tracepoint as well, so that we
could connect it with the epoll/read/recv* tracepoints and finally show
the whole lifespan of the packet. But we could also implement that with
the pid, since these functions are executed in process context.

Signed-off-by: Yafang Shao 
---
 include/trace/events/tcp.h | 21 +++--
 net/ipv4/tcp_input.c   |  2 ++
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 878b2be..65a6d22 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -146,10 +146,11 @@
   sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
),
 
-   TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
+   TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c sock=0x%p",
  __entry->sport, __entry->dport,
  __entry->saddr, __entry->daddr,
- __entry->saddr_v6, __entry->daddr_v6)
+ __entry->saddr_v6, __entry->daddr_v6,
+ __entry->skaddr)
 );
 
 DEFINE_EVENT(tcp_event_sk, tcp_receive_reset,
@@ -166,6 +167,13 @@
TP_ARGS(sk)
 );
 
+DEFINE_EVENT(tcp_event_sk, tcp_rcv_space_adjust,
+
+   TP_PROTO(const struct sock *sk),
+
+   TP_ARGS(sk)
+);
+
 TRACE_EVENT(tcp_set_state,
 
TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
@@ -265,6 +273,7 @@
TP_ARGS(sk, skb),
 
TP_STRUCT__entry(
+   __field(const void *, skaddr)
/* sockaddr_in6 is always bigger than sockaddr_in */
__array(__u8, saddr, sizeof(struct sockaddr_in6))
__array(__u8, daddr, sizeof(struct sockaddr_in6))
@@ -285,6 +294,8 @@
const struct tcp_sock *tp = tcp_sk(sk);
const struct inet_sock *inet = inet_sk(sk);
 
+   __entry->skaddr = sk;
+
memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
 
@@ -305,13 +316,11 @@
__entry->srtt = tp->srtt_us >> 3;
),
 
-   TP_printk("src=%pISpc dest=%pISpc mark=%#x length=%d snd_nxt=%#x "
- "snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u "
- "rcv_wnd=%u",
+   TP_printk("src=%pISpc dest=%pISpc mark=%#x length=%d snd_nxt=%#x snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u rcv_wnd=%u sock=0x%p",
  __entry->saddr, __entry->daddr, __entry->mark,
  __entry->length, __entry->snd_nxt, __entry->snd_una,
  __entry->snd_cwnd, __entry->ssthresh, __entry->snd_wnd,
- __entry->srtt, __entry->rcv_wnd)
+ __entry->srtt, __entry->rcv_wnd, __entry->skaddr)
 );
 
 #endif /* _TRACE_TCP_H */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 367def6..4b4d6b9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -582,6 +582,8 @@ void tcp_rcv_space_adjust(struct sock *sk)
u32 copied;
int time;
 
+   trace_tcp_rcv_space_adjust(sk);
+
tcp_mstamp_refresh(tp);
time = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcvq_space.time);
if (time < (tp->rcv_rtt_est.rtt_us >> 3) || tp->rcv_rtt_est.rtt_us == 0)
-- 
1.8.3.1
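
The correlation described in the commit message, using the socket pointer as the key between the two tracepoints, could be done in a post-processing tool roughly as follows. This is a sketch under stated assumptions: struct trace_ev, the event enum and user_copy_latency_us() are hypothetical, and the events are assumed to be already parsed and ordered by time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed trace record; sk is the value printed as sock=0x... */
enum ev_type { EV_TCP_PROBE, EV_RCV_SPACE_ADJUST };

struct trace_ev {
	uint64_t ts_us;		/* event timestamp in microseconds */
	uintptr_t sk;		/* socket address used as the join key */
	enum ev_type type;
};

/*
 * For one socket, return the delay between its first tcp_probe event and
 * the first later tcp_rcv_space_adjust event, or -1 if no such pair exists.
 */
static int64_t user_copy_latency_us(const struct trace_ev *ev, size_t n,
				    uintptr_t sk)
{
	int have_probe = 0;
	uint64_t probe_ts = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ev[i].sk != sk)
			continue;
		if (ev[i].type == EV_TCP_PROBE && !have_probe) {
			probe_ts = ev[i].ts_us;
			have_probe = 1;
		} else if (ev[i].type == EV_RCV_SPACE_ADJUST && have_probe) {
			return (int64_t)(ev[i].ts_us - probe_ts);
		}
	}
	return -1;
}
```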

[PATCH v2 0/3] ti_am335x_tsc: Fix suspend/resume

2018-04-14 Thread Vignesh R

This patch series fixes a couple of issues with suspend/resume in the
TI AM335x TSC driver: disable and clear any pending IRQs before suspend,
and handle the case where TSC wakeup would fail if there were touch
events during suspend.

v2:
Rebase onto latest linux-next.
v1: https://lkml.org/lkml/2016/5/16/150

Grygorii Strashko (2):
  Input: ti_am335x_tsc - Ack pending IRQs at probe and before suspend
  Input: ti_am335x_tsc - Prevent system suspend when TSC is in use

Vignesh R (1):
  Input: ti_am335x_tsc - Mark IRQ as wakeup capable

 drivers/input/touchscreen/ti_am335x_tsc.c | 14 ++
 include/linux/mfd/ti_am335x_tscadc.h  |  1 +
 2 files changed, 15 insertions(+)

-- 
2.17.0

[PATCH v2 2/3] Input: ti_am335x_tsc - Ack pending IRQs at probe and before suspend

2018-04-14 Thread Vignesh R

From: Grygorii Strashko 

It is seen that just enabling the TSC module triggers a HW_PEN IRQ
without any interaction with the touchscreen by the user. This causes the
first suspend/resume sequence to fail: because of the pending IRQ, the
system wakes up from suspend as soon as the HW_PEN IRQ is enabled in the
suspend handler. Therefore clear all IRQs at probe, and also in the
suspend callback for sanity.

Signed-off-by: Grygorii Strashko 
Signed-off-by: Vignesh R 
Acked-by: Lee Jones 
---

v2: Add Acks from v1.

 drivers/input/touchscreen/ti_am335x_tsc.c | 2 ++
 include/linux/mfd/ti_am335x_tscadc.h  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c 
b/drivers/input/touchscreen/ti_am335x_tsc.c
index 810e05c9c4f5..dcd9db768169 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -439,6 +439,7 @@ static int titsc_probe(struct platform_device *pdev)
dev_err(&pdev->dev, "irq wake enable failed.\n");
}
 
+   titsc_writel(ts_dev, REG_IRQSTATUS, IRQENB_MASK);
titsc_writel(ts_dev, REG_IRQENABLE, IRQENB_FIFO0THRES);
titsc_writel(ts_dev, REG_IRQENABLE, IRQENB_EOS);
err = titsc_config_wires(ts_dev);
@@ -504,6 +505,7 @@ static int __maybe_unused titsc_suspend(struct device *dev)
 
tscadc_dev = ti_tscadc_dev_get(to_platform_device(dev));
if (device_may_wakeup(tscadc_dev->dev)) {
+   titsc_writel(ts_dev, REG_IRQSTATUS, IRQENB_MASK);
idle = titsc_readl(ts_dev, REG_IRQENABLE);
titsc_writel(ts_dev, REG_IRQENABLE,
(idle | IRQENB_HW_PEN));
diff --git a/include/linux/mfd/ti_am335x_tscadc.h 
b/include/linux/mfd/ti_am335x_tscadc.h
index b9a53e013bff..1a6a34f726cc 100644
--- a/include/linux/mfd/ti_am335x_tscadc.h
+++ b/include/linux/mfd/ti_am335x_tscadc.h
@@ -63,6 +63,7 @@
 #define IRQENB_FIFO1OVRRUN	BIT(6)
 #define IRQENB_FIFO1UNDRFLW	BIT(7)
 #define IRQENB_PENUP		BIT(9)
+#define IRQENB_MASK		(0x7FF)
 
 /* Step Configuration */
 #define STEPCONFIG_MODE_MASK   (3 << 0)
-- 
2.17.0
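
The failure mode described in the commit message can be sketched with a tiny register model. This is an illustration only: struct tsc_regs and the helpers are hypothetical, the bit position of IRQENB_HW_PEN and the write-1-to-clear behaviour of the status register are assumptions, and only the IRQENB_MASK value comes from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQENB_HW_PEN	(1u << 0)	/* bit position is an assumption */
#define IRQENB_MASK	0x7FFu		/* value from the patch */

/* Hypothetical model of the two registers involved. */
struct tsc_regs {
	uint32_t irqstatus;	/* latched events; assumed write-1-to-clear */
	uint32_t irqenable;
};

/* Writing a 1 to a status bit clears that latched event. */
static void write_irqstatus(struct tsc_regs *r, uint32_t val)
{
	r->irqstatus &= ~val;
}

static bool irq_asserted(const struct tsc_regs *r)
{
	return (r->irqstatus & r->irqenable) != 0;
}

/* Returns true if enabling HW_PEN for wakeup would fire immediately. */
static bool model_suspend(struct tsc_regs *r, bool ack_first)
{
	if (ack_first)
		write_irqstatus(r, IRQENB_MASK);
	r->irqenable |= IRQENB_HW_PEN;
	return irq_asserted(r);
}
```

With a stale HW_PEN event latched, the model wakes immediately unless the status register is acknowledged first, which is the ordering the patch adds.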

[PATCH v2 1/3] Input: ti_am335x_tsc - Mark IRQ as wakeup capable

2018-04-14 Thread Vignesh R

On AM335x, ti_am335x_tsc can wake up the system from suspend. Mark the
IRQ as wakeup capable so that the device IRQ is not disabled during
system suspend.

Signed-off-by: Vignesh R 
---

v2: No changes

 drivers/input/touchscreen/ti_am335x_tsc.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c 
b/drivers/input/touchscreen/ti_am335x_tsc.c
index f1043ae71dcc..810e05c9c4f5 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -432,6 +433,12 @@ static int titsc_probe(struct platform_device *pdev)
goto err_free_mem;
}
 
+   if (device_may_wakeup(tscadc_dev->dev)) {
+   err = dev_pm_set_wake_irq(tscadc_dev->dev, ts_dev->irq);
+   if (err)
+   dev_err(&pdev->dev, "irq wake enable failed.\n");
+   }
+
titsc_writel(ts_dev, REG_IRQENABLE, IRQENB_FIFO0THRES);
titsc_writel(ts_dev, REG_IRQENABLE, IRQENB_EOS);
err = titsc_config_wires(ts_dev);
@@ -462,6 +469,7 @@ static int titsc_probe(struct platform_device *pdev)
return 0;
 
 err_free_irq:
+   dev_pm_clear_wake_irq(tscadc_dev->dev);
free_irq(ts_dev->irq, ts_dev);
 err_free_mem:
input_free_device(input_dev);
@@ -474,6 +482,7 @@ static int titsc_remove(struct platform_device *pdev)
struct titsc *ts_dev = platform_get_drvdata(pdev);
u32 steps;
 
+   dev_pm_clear_wake_irq(ts_dev->mfd_tscadc->dev);
free_irq(ts_dev->irq, ts_dev);
 
/* total steps followed by the enable mask */
-- 
2.17.0

[PATCH v2 3/3] Input: ti_am335x_tsc - Prevent system suspend when TSC is in use

2018-04-14 Thread Vignesh R

From: Grygorii Strashko 

Prevent system suspend while the user has a finger on the touch screen:
the TSC is a wakeup source, and suspending the device while it is in use
results in failure to disable the module.
This patch uses the pm_stay_awake() and pm_relax() APIs to block and
re-allow system suspend as required.

Signed-off-by: Grygorii Strashko 
Signed-off-by: Vignesh R 
---

v2: No changes.

 drivers/input/touchscreen/ti_am335x_tsc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c 
b/drivers/input/touchscreen/ti_am335x_tsc.c
index dcd9db768169..43b22e071842 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -275,6 +275,7 @@ static irqreturn_t titsc_irq(int irq, void *dev)
if (status & IRQENB_HW_PEN) {
ts_dev->pen_down = true;
irqclr |= IRQENB_HW_PEN;
+   pm_stay_awake(ts_dev->mfd_tscadc->dev);
}
 
if (status & IRQENB_PENUP) {
@@ -284,6 +285,7 @@ static irqreturn_t titsc_irq(int irq, void *dev)
input_report_key(input_dev, BTN_TOUCH, 0);
input_report_abs(input_dev, ABS_PRESSURE, 0);
input_sync(input_dev);
+   pm_relax(ts_dev->mfd_tscadc->dev);
} else {
ts_dev->pen_down = true;
}
@@ -524,6 +526,7 @@ static int __maybe_unused titsc_resume(struct device *dev)
titsc_writel(ts_dev, REG_IRQWAKEUP,
0x00);
titsc_writel(ts_dev, REG_IRQCLR, IRQENB_HW_PEN);
+   pm_relax(ts_dev->mfd_tscadc->dev);
}
titsc_step_config(ts_dev);
titsc_writel(ts_dev, REG_FIFO0THR,
-- 
2.17.0

Re: [PATCH 2/4 v4] sched/rt: add rt_rq utilization tracking

2018-04-14 Thread Peter Zijlstra

On Fri, Mar 16, 2018 at 12:25:39PM +0100, Vincent Guittot wrote:
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 783eacf..a8003a9 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -592,6 +592,8 @@ struct rt_rq {
>   unsigned long   rt_nr_total;
>   int overloaded;
>   struct plist_head   pushable_tasks;
> +
> + struct sched_avg avg;

We only want this for the root cgroup, right? So why is this per cgroup?

That is, I was expecting it to be rq::rt_avg or something.

Re: [PATCH 0/4 v4] sched/rt: track rt rq utilization

2018-04-14 Thread Peter Zijlstra



What I don't see in this patch-set is removal of the current rt_avg
stuff.

And I didn't look closely enough; but are the root cfs and rt pelt
windows aligned? They really should be; otherwise you can't combine them
sanely.

Re: [PATCH] x86/cpufeature: guard asm_volatile_goto usage with CC_HAVE_ASM_GOTO

2018-04-14 Thread Peter Zijlstra

On Fri, Apr 13, 2018 at 01:42:14PM -0700, Alexei Starovoitov wrote:
> On 4/13/18 11:19 AM, Peter Zijlstra wrote:
> > On Tue, Apr 10, 2018 at 02:28:04PM -0700, Alexei Starovoitov wrote:
> > > Instead of
> > > #ifdef CC_HAVE_ASM_GOTO
> > > we can replace it with
> > > #ifndef __BPF__
> > > or some other name,
> > 
> > I would prefer the BPF specific hack; otherwise we might be encouraging
> > people to build the kernel proper without asm-goto.
> > 
> 
> I don't understand this concern.

The thing is; this will be a (temporary) BPF specific hack. Hiding it
behind something that looks 'normal' (CC_HAVE_ASM_GOTO) is just not
right.

Re: [PATCH 3/7] bus: add bus driver for accessing Allwinner A64 DE2

2018-04-14 Thread Jagan Teki

On Fri, Mar 16, 2018 at 11:23 PM, Icenowy Zheng  wrote:
> The "Display Engine 2.0" (usually called DE2) on the Allwinner A64 SoC
> is different from the ones on other Allwinner SoCs. It requires a SRAM
> region to be claimed, otherwise all DE2 subblocks won't be accessible.
>
> Add a bus driver for the Allwinner A64 DE2 part which claims the SRAM
> region when probing.

Along with this bus driver, do we also need
drivers/gpu/drm/sun4i/sun4i_drv.c to drive the pipelines,
such as mixer0 and mixer1 in the A64 case?

Jagan.

-- 
Jagan Teki
Senior Linux Kernel Engineer | Amarula Solutions
U-Boot, Linux | Upstream Maintainer
Hyderabad, India.

Re: [PATCH 3/7] bus: add bus driver for accessing Allwinner A64 DE2

2018-04-14 Thread Chen-Yu Tsai

On Sat, Apr 14, 2018 at 6:25 PM, Jagan Teki  wrote:
> On Fri, Mar 16, 2018 at 11:23 PM, Icenowy Zheng  wrote:
>> The "Display Engine 2.0" (usually called DE2) on the Allwinner A64 SoC
>> is different from the ones on other Allwinner SoCs. It requires a SRAM
>> region to be claimed, otherwise all DE2 subblocks won't be accessible.
>>
>> Add a bus driver for the Allwinner A64 DE2 part which claims the SRAM
>> region when probing.
>
> Along with this bus driver, we also need
> drivers/gpu/drm/sun4i/sun4i_drv.c which can usually drive the
> pipelines like mixer0 and 1 are the cases for A64?

I imagine that's the next part to be sent out, after the hardware
representation in the device tree has been decided on.

ChenYu

Re: [PATCH v3 3/3] ALSA: hda: Disabled unused audio controller for Dell platforms with Switchable Graphics

2018-04-14 Thread Lukas Wunner

On Thu, Apr 12, 2018 at 10:15:41PM +0800, Kai-Heng Feng wrote:
> > >>@@ -1711,6 +1745,11 @@ static int azx_create(struct snd_card *card,
> > >>struct pci_dev *pci,
> > >>  if (err < 0)
> > >>  return err;
> > >>
> > >>+ if (check_dell_switchable_gfx(pci)) {
> > >>+ pci_disable_device(pci);
> > 
> > Now looking at it again... This code disables all ATI and NVIDIA sound
> > cards available in any Dell System (laptop or AIO) if system says that
> > SG is enabled, right?
> > 
> > It means that also any external ATI or NVIDIA PCI card with audio device
> > connected to Thunderbolt (e.g. via PCI <--> TB bridge) is always
> > unconditionally disabled too?
> 
> I never thought of this case, thanks for bringing this up.
> Do you have any suggestion to check if it connects to the system via
> Thunderbolt?

Just use pci_is_thunderbolt_attached(), introduced by 8531e283bee6,
like this:

if (check_dell_switchable_gfx(pci) && !pci_is_thunderbolt_attached(pci))


> >>>+  /* Only need to check for Dell laptops and AIOs */
> >>>+  if (!dmi_find_device(DMI_DEV_TYPE_OEM_STRING, "Dell System", NULL) ||
> >>>+  !(dmi_match(DMI_CHASSIS_TYPE, "10") ||
> >>>+dmi_match(DMI_CHASSIS_TYPE, "13")) ||
> >>>+  !(pdev->vendor == PCI_VENDOR_ID_ATI ||
> >>>+pdev->vendor == PCI_VENDOR_ID_NVIDIA))
> >>>+  return false;

It sure would be nice if someone could add macros for the chassis type
to include/linux/dmi.h so that we don't have to use these magic numbers
everywhere:

$ git grep -l DMI_CHASSIS_TYPE
drivers/firmware/dmi-id.c
drivers/firmware/dmi_scan.c
drivers/input/keyboard/atkbd.c
drivers/input/serio/i8042-x86ia64io.h
drivers/platform/x86/asus-wmi.c
drivers/platform/x86/dell-laptop.c
drivers/platform/x86/samsung-laptop.c
include/linux/mod_devicetable.h
scripts/mod/file2alias.c
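
As a rough userspace sketch of what such named constants could look like
(the enum and helper names below are hypothetical and not part of
include/linux/dmi.h; the values 10 and 13 are the SMBIOS chassis type
codes for "Notebook" and "All in One" that the quoted patch matches as
strings):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical names; the values are the SMBIOS chassis type codes
 * that the quoted patch matches as the strings "10" and "13".
 */
enum chassis_type {
	CHASSIS_TYPE_NOTEBOOK   = 10,
	CHASSIS_TYPE_ALL_IN_ONE = 13,
};

/* Parse a decimal chassis type string and compare it by name. */
bool chassis_is_notebook_or_aio(const char *chassis_type)
{
	char *end;
	long type;

	assert(chassis_type != NULL);
	type = strtol(chassis_type, &end, 10);
	if (end == chassis_type || *end != '\0')
		return false;	/* empty string or trailing garbage */

	return type == CHASSIS_TYPE_NOTEBOOK ||
	       type == CHASSIS_TYPE_ALL_IN_ONE;
}
```

In the kernel the comparison would presumably stay a dmi_match() on a
string, so real macros might well be string constants instead; the point
here is only that the numbers get a name.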

Thanks,

Lukas

Re: [PATCH 3/7] bus: add bus driver for accessing Allwinner A64 DE2

2018-04-14 Thread Jagan Teki

On Sat, Apr 14, 2018 at 4:00 PM, Chen-Yu Tsai  wrote:
> On Sat, Apr 14, 2018 at 6:25 PM, Jagan Teki  
> wrote:
>> On Fri, Mar 16, 2018 at 11:23 PM, Icenowy Zheng  wrote:
>>> The "Display Engine 2.0" (usually called DE2) on the Allwinner A64 SoC
>>> is different from the ones on other Allwinner SoCs. It requires a SRAM
>>> region to be claimed, otherwise all DE2 subblocks won't be accessible.
>>>
>>> Add a bus driver for the Allwinner A64 DE2 part which claims the SRAM
>>> region when probing.
>>
>> Along with this bus driver, we also need
>> drivers/gpu/drm/sun4i/sun4i_drv.c which can usually drive the
>> pipelines like mixer0 and 1 are the cases for A64?
>
> I imagine that's the next part to be sent out, after the hardware
> representation in the device tree has been decided on.

Yeah, this hardware representation along with a separate bus driver
seems to be going in another direction, especially if we add pipeline
support to it. Maybe we can add the SRAM stuff to the platdata of the
existing sun4i_drv.c

Jagan.

-- 
Jagan Teki
Senior Linux Kernel Engineer | Amarula Solutions
U-Boot, Linux | Upstream Maintainer
Hyderabad, India.

Re: [PATCH v3 3/3] ALSA: hda: Disabled unused audio controller for Dell platforms with Switchable Graphics

2018-04-14 Thread Pali Rohár

On Saturday 14 April 2018 12:45:12 Lukas Wunner wrote:
> On Thu, Apr 12, 2018 at 10:15:41PM +0800, Kai-Heng Feng wrote:
> > > >>@@ -1711,6 +1745,11 @@ static int azx_create(struct snd_card *card,
> > > >>struct pci_dev *pci,
> > > >>if (err < 0)
> > > >>return err;
> > > >>
> > > >>+   if (check_dell_switchable_gfx(pci)) {
> > > >>+   pci_disable_device(pci);
> > > 
> > > Now looking at it again... This code disables all ATI and NVIDIA sound
> > > cards available in any Dell System (laptop or AIO) if system says that
> > > SG is enabled, right?
> > > 
> > > It means that also any external ATI or NVIDIA PCI card with audio device
> > > connected to Thunderbolt (e.g. via PCI <--> TB bridge) is always
> > > unconditionally disabled too?
> > 
> > I never thought of this case, thanks for bringing this up.
> > Do you have any suggestion to check if it connects to the system via
> > Thunderbolt?
> 
> Just use pci_is_thunderbolt_attached(), introduced by 8531e283bee6,
> like this:
> 
> if (check_dell_switchable_gfx(pci) && !pci_is_thunderbolt_attached(pci))

And what about PCI-e device attached to ExpressCard slot?

> > >>>+/* Only need to check for Dell laptops and AIOs */
> > >>>+if (!dmi_find_device(DMI_DEV_TYPE_OEM_STRING, "Dell System", 
> > >>>NULL) ||
> > >>>+!(dmi_match(DMI_CHASSIS_TYPE, "10") ||
> > >>>+  dmi_match(DMI_CHASSIS_TYPE, "13")) ||
> > >>>+!(pdev->vendor == PCI_VENDOR_ID_ATI ||
> > >>>+  pdev->vendor == PCI_VENDOR_ID_NVIDIA))
> > >>>+return false;
> 
> It sure would be nice if someone could add macros for the chassis type
> to include/linux/dmi.h so that we don't have to use these magic numbers
> everywhere:
> 
> $ git grep -l DMI_CHASSIS_TYPE
> drivers/firmware/dmi-id.c
> drivers/firmware/dmi_scan.c
> drivers/input/keyboard/atkbd.c
> drivers/input/serio/i8042-x86ia64io.h
> drivers/platform/x86/asus-wmi.c
> drivers/platform/x86/dell-laptop.c
> drivers/platform/x86/samsung-laptop.c
> include/linux/mod_devicetable.h
> scripts/mod/file2alias.c
> 
> Thanks,
> 
> Lukas

-- 
Pali Rohár
pali.ro...@gmail.com



[PATCH] selftests:vm: add include file

2018-04-14 Thread Peng Hao

userfaultfd.c: In function ‘hugetlb_release_pages’:
userfaultfd.c:145:25: error: ‘FALLOC_FL_PUNCH_HOLE’ undeclared 
(first use in this function)

Signed-off-by: Peng Hao 
---
 tools/testing/selftests/vm/userfaultfd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index de2f9ec..d8fe447 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -68,6 +68,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef __NR_userfaultfd
 
-- 
1.8.3.1

Re: [PATCHv4] gpio: Remove VLA from gpiolib

2018-04-14 Thread Phil Reid


On 14/04/2018 05:10, Laura Abbott wrote:

On 04/12/2018 05:39 PM, Phil Reid wrote:

On 12/04/2018 16:38, Linus Walleij wrote:

On Wed, Apr 11, 2018 at 3:03 AM, Laura Abbott  wrote:


The new challenge is to remove VLAs from the kernel
(see https://lkml.org/lkml/2018/3/7/621) to eventually
turn on -Wvla.

Using a kmalloc array is the easy way to fix this but kmalloc is still
more expensive than stack allocation. Introduce a fast path with a
fixed size stack array to cover most chips with gpios below some fixed
amount. The slow path dynamically allocates an array to cover those
chips with a large number of gpios.

Reviewed-and-tested-by: Lukas Wunner 
Signed-off-by: Lukas Wunner 
Signed-off-by: Laura Abbott 
---
v4: Changed some local variables to avoid coccinelle warnings. Added a
warning if the number of GPIOs exceeds the current fast path define.

Lukas, I kept your Tested-by because the changes were pretty minimal.
Let me know if you want to run the tests again.


This patch is starting to look really good.


+/*
+ * Number of GPIOs to use for the fast path in set array
+ */
+#define FASTPATH_NGPIO 256


There is still some comment about this.

And now that I am also trying to think, I wonder about it: we
have a global ARCH_NR_GPIOS that is typically 512.
Some archs set it up.

This define is something of an abomination, in the ARM
case it comes from arch/arm/include/asm/gpio.h
where #define ARCH_NR_GPIOS CONFIG_ARCH_NR_GPIO
where the latter is a Kconfig option that is mostly 512 for
most ARM systems.

Well, ARM looks like this:

config ARCH_NR_GPIO
 int
 default 2048 if ARCH_SOCFPGA
 default 1024 if ARCH_BRCMSTB || ARCH_SHMOBILE || ARCH_TEGRA || \
 ARCH_ZYNQ
 default 512 if ARCH_EXYNOS || ARCH_KEYSTONE || SOC_OMAP5 || \
 SOC_DRA7XX || ARCH_S3C24XX || ARCH_S3C64XX || ARCH_S5PV210
 default 416 if ARCH_SUNXI
 default 392 if ARCH_U8500
 default 352 if ARCH_VT8500
 default 288 if ARCH_ROCKCHIP
 default 264 if MACH_H4700
 default 0
 help
   Maximum number of GPIOs in the system.

   If unsure, leave the default value.

So if FASTPATH_NGPIO should be anything else than
ARCH_NR_GPIO this has to be established somewhere
as a floor or half or something, but I would just set it as
the same as ARCH_NR_GPIOS...

The main reason this define exists is for this function
from :

/* Convert between the old gpio_ and new gpiod_ interfaces */
struct gpio_desc *gpio_to_desc(unsigned gpio);

Nowadays that fact is a bit obscured since the variable is
only used when assigning the base (in the global GPIO
number space, which is what we want to get rid of but
sigh) in gpiochip_find_base() where it attempts to place
a newly allocated gpiochip in the higher region of this
numberspace since the embedded SoC GPIO base tends
to be 0, on old platforms.

So I don't know about this.

Can't we just use ARCH_NR_GPIOS?

Very few systems have more than 512 assigned global
GPIO numbers and those are FPGA experimental machines.

In the long run obviously I want to get rid of these defines
altogether and only allocate GPIO descriptors dynamically
so as you see I am reluctant to add new numberspace weirdness
around here.

Isn't that for the total GPIOs in the system?
And the arrays just need to cater for the max per chip?
From what I can understand of the code, which is admittedly limited.




Yeah the switch back to 256 was a mistake on my end (I think I
grabbed an incorrect version for my base). ARCH_NR_GPIOs
is the total number in the system which may be multiple
chips so yes we would be possibly allocating more space
than necessary.

unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)]

unsigned long fastpath[2 * BITS_TO_LONGS(512)]
unsigned long fastpath[2 * DIV_ROUND_UP(512, 8 * sizeof(long))]

so we end up with 128 bytes on the stack total assuming I
can do math correctly. I think this is a fairly reasonable
amount though, even if we are over-estimating if there are
multiple chips.



Yeah that's not too bad.
My system is a SOCFPGA so it'd be 2 * 2048 / 8 = 512 bytes.
Still not unreasonable.

But the system doesn't have a single gpio close to that.
The largest chip is 32.
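
For reference, the stack-size arithmetic discussed above can be checked
with a small userspace sketch. The macros below are stand-ins for the
kernel macros of the same name, and fastpath_bytes() is a hypothetical
helper used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel macros of the same name. */
#define BITS_PER_LONG       (8 * sizeof(unsigned long))
#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))
#define BITS_TO_LONGS(nr)   DIV_ROUND_UP(nr, BITS_PER_LONG)

/*
 * Bytes taken by "unsigned long fastpath[2 * BITS_TO_LONGS(ngpio)]",
 * i.e. room for two bitmaps with one bit per GPIO each.
 */
size_t fastpath_bytes(size_t ngpio)
{
	assert(ngpio > 0);
	return 2 * BITS_TO_LONGS(ngpio) * sizeof(unsigned long);
}
```

With ngpio = 512 this gives 128 bytes and with ngpio = 2048 it gives
512 bytes, on both 32-bit and 64-bit longs, since both counts are
multiples of the word size.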


--
Regards
Phil Reid

Re: [PATCH v3 3/3] ALSA: hda: Disabled unused audio controller for Dell platforms with Switchable Graphics

2018-04-14 Thread Lukas Wunner

On Sat, Apr 14, 2018 at 12:49:50PM +0200, Pali Rohár wrote:
> On Saturday 14 April 2018 12:45:12 Lukas Wunner wrote:
> > On Thu, Apr 12, 2018 at 10:15:41PM +0800, Kai-Heng Feng wrote:
> > > Do you have any suggestion to check if it connects to the system via
> > > Thunderbolt?
> > 
> > Just use pci_is_thunderbolt_attached(), introduced by 8531e283bee6,
> > like this:
> > 
> > if (check_dell_switchable_gfx(pci) && !pci_is_thunderbolt_attached(pci))
> 
> And what about PCI-e device attached to ExpressCard slot?

I don't know of a bullet-proof way to recognize those.  In theory
one could check if the PCIe port above the GPU is a non-hotplug
root port, but I think there are machines with hotplug capable
root ports with GPUs below them that aren't actually removable.

However I think ExpressCard-attached GPUs were rare, much less ones
with integrated HDA controller, so in reality that's probably a
non-issue.

Thanks,

Lukas

Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")

2018-04-14 Thread Vincent Guittot

Heiner,

On 12 April 2018 at 21:43, Heiner Kallweit  wrote:


 I'm going to prepare a debug patch to spy what's happening when entering 
 idle
>>
>> I'd like to narrow the problem a bit more with the 2 patches above. Can
>> you try them separately on top of c18bb396d3d261eb ("Merge
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
>> and check if one of them fixes the problem?
>>
>> (They should apply on linux-next as well)
>>
>> First patch always kicks ilb instead of doing ilb on the local cpu before
>> entering idle
>>
>> ---
>>  kernel/sched/fair.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 0951d1c..b21925b 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9739,8 +9739,7 @@ static void nohz_newidle_balance(struct rq *this_rq)
>>* candidate for ilb instead of waking up another idle CPU.
>>* Kick an normal ilb if we failed to do the update.
>>*/
>> - if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
>> - kick_ilb(NOHZ_STATS_KICK);
>> + kick_ilb(NOHZ_STATS_KICK);
>>   raw_spin_lock(&this_rq->lock);
>>  }
>>
>>
> I tested both patches, with both of them the issue still occurs. However,
> on top of linux-next from yesterday I have the impression that it happens
> less frequent with the second patch.
> On top of the commit mentioned by you I don't see a change in system behavior
> with either patch.

Thanks for the tests.
I was expecting to see more differences between the 2 patches, and
especially no problem with the 1st patch, which only sends an IPI
reschedule to the other CPU if it is idle.
It seems to be related not so much to what is done as to the fact that
it is done at that place in the code.

Thanks
>
> Regards, Heiner

Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")

2018-04-14 Thread Vincent Guittot

Hi Niklas,

On 13 April 2018 at 00:39, Niklas Söderlund
 wrote:
> Hi Vincent,
>
> Thanks for helping trying to figure this out.
>
> On 2018-04-12 15:30:31 +0200, Vincent Guittot wrote:
>
> [snip]
>
>>
>> I'd like to narrow the problem a bit more with the 2 patches above. Can
>> you try them separately on top of c18bb396d3d261eb ("Merge
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
>> and check if one of them fixes the problem?
>
> I tried your suggested changes based on top of c18bb396d3d261eb.
>
>>
>> (They should apply on linux-next as well)
>>
>> First patch always kicks ilb instead of doing ilb on the local cpu before
>> entering idle
>>
>> ---
>>  kernel/sched/fair.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 0951d1c..b21925b 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9739,8 +9739,7 @@ static void nohz_newidle_balance(struct rq *this_rq)
>>* candidate for ilb instead of waking up another idle CPU.
>>* Kick an normal ilb if we failed to do the update.
>>*/
>> - if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
>> - kick_ilb(NOHZ_STATS_KICK);
>> + kick_ilb(NOHZ_STATS_KICK);
>>   raw_spin_lock(&this_rq->lock);
>>  }
>
> This change doesn't seem to affect the issue. I can still get the single
> ssh session and the system to lock up by hitting the return key. And
> opening a second ssh session immediately unblocks both the first ssh
> session and the serial console. And I can still trigger the console
> warning by just letting the system be once it locks up. I do have to,
> just as before, reset the system a few times to trigger the issue.

Your results are similar to Heiner's. The problem is still there
even if we only kick ilb, which mainly sends an IPI reschedule to the
other CPU if it is idle.

>
> [  245.351693] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  245.357199]  0-...!: (1 GPs behind) idle=93c/0/0 softirq=2224/2225 fqs=0
> [  245.363988]  (detected by 1, t=3025 jiffies, g=337, c=336, q=10)
> [  245.370003] Sending NMI from CPU 1 to CPUs 0:
> [  245.374368] NMI backtrace for cpu 0
> [  245.374377] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> 4.16.0-10930-ged741fb4567c816f #42
> [  245.374379] Hardware name: Generic R8A7791 (Flattened Device Tree)
> [  245.374393] PC is at arch_cpu_idle+0x24/0x40
> [  245.374397] LR is at arch_cpu_idle+0x34/0x40
> [  245.374400] pc : []lr : []psr: 60050013
> [  245.374403] sp : c0b01f40  ip : c0b01f50  fp : c0b01f4c
> [  245.374405] r10: c0a56a38  r9 : e7fffbc0  r8 : c0b04c00
> [  245.374407] r7 : c0b04c78  r6 : c0b04c2c  r5 : e000  r4 : 0001
> [  245.374410] r3 : c0119100  r2 : e77813a8  r1 : 0002d93c  r0 : 
> [  245.374414] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
> none
> [  245.374417] Control: 10c5387d  Table: 6662006a  DAC: 0051
> [  245.374421] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> 4.16.0-10930-ged741fb4567c816f #42
> [  245.374423] Hardware name: Generic R8A7791 (Flattened Device Tree)
> [  245.374425] Backtrace:
> [  245.374435] [] (dump_backtrace) from [] 
> (show_stack+0x18/0x1c)
> [  245.374440]  r7:c0b47278 r6:60050193 r5: r4:c0b73d80
> [  245.374450] [] (show_stack) from [] 
> (dump_stack+0x84/0xa4)
> [  245.374456] [] (dump_stack) from [] 
> (show_regs+0x14/0x18)
> [  245.374460]  r7:c0b47278 r6:c0b01ef0 r5: r4:c0bc62c8
> [  245.374468] [] (show_regs) from [] 
> (nmi_cpu_backtrace+0xfc/0x118)
> [  245.374475] [] (nmi_cpu_backtrace) from [] 
> (handle_IPI+0x22c/0x294)
> [  245.374479]  r7:c0b47278 r6:c0b01ef0 r5:0007 r4:c0a775fc
> [  245.374488] [] (handle_IPI) from [] 
> (gic_handle_irq+0x8c/0x98)
> [  245.374492]  r10:c0a56a38 r9:c0b0 r8:f0803000 r7:c0b47278 r6:c0b01ef0 
> r5:c0b05244
> [  245.374495]  r4:f0802000 r3:0407
> [  245.374501] [] (gic_handle_irq) from [] 
> (__irq_svc+0x6c/0x90)
> [  245.374504] Exception stack(0xc0b01ef0 to 0xc0b01f38)
> [  245.374507] 1ee0:  0002d93c 
> e77813a8 c0119100
> [  245.374512] 1f00: 0001 e000 c0b04c2c c0b04c78 c0b04c00 e7fffbc0 
> c0a56a38 c0b01f4c
> [  245.374516] 1f20: c0b01f50 c0b01f40 c0108564 c0108554 60050013 
> [  245.374521]  r9:c0b0 r8:c0b04c00 r7:c0b01f24 r6: r5:60050013 
> r4:c0108554
> [  245.374528] [] (arch_cpu_idle) from [] 
> (default_idle_call+0x30/0x34)
> [  245.374535] [] (default_idle_call) from [] 
> (do_idle+0xd8/0x128)
> [  245.374540] [] (do_idle) from [] 
> (cpu_startup_entry+0x20/0x24)
> [  245.374543]  r7:c0b04c08 r6: r5:c0b80380 r4:00c2
> [  245.374549] [] (cpu_startup_entry) from [] 
> (rest_init+0x9c/0xbc)
> [  245.374555] [] (rest_init) from [] 
> (start_kernel+0x368/0x3ec)
> [  245.374558]  r5:c0b80380 r4:c0b803c0
> [  245.374563] [] (start_kernel) from [<>] (  (null))
> [  245.375369] rcu_sched kthread starved

Re: [RFC v2] virtio: support packed ring

2018-04-14 Thread Tiwei Bie

On Fri, Apr 13, 2018 at 06:22:45PM +0300, Michael S. Tsirkin wrote:
> On Sun, Apr 01, 2018 at 10:12:16PM +0800, Tiwei Bie wrote:
> > +static inline bool more_used(const struct vring_virtqueue *vq)
> > +{
> > +   return vq->packed ? more_used_packed(vq) : more_used_split(vq);
> > +}
> > +
> > +void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, unsigned int *len,
> > + void **ctx)
> > +{
> > +   struct vring_virtqueue *vq = to_vvq(_vq);
> > +   void *ret;
> > +   unsigned int i;
> > +   u16 last_used;
> > +
> > +   START_USE(vq);
> > +
> > +   if (unlikely(vq->broken)) {
> > +   END_USE(vq);
> > +   return NULL;
> > +   }
> > +
> > +   if (!more_used(vq)) {
> > +   pr_debug("No more buffers in queue\n");
> > +   END_USE(vq);
> > +   return NULL;
> > +   }
> 
> So virtqueue_get_buf_ctx_split should only call more_used_split.

Yeah, you're right! Will fix this in the next version.

> 
> to avoid such issues I think we should lay out the code like this:
> 
> XXX_split
> 
> XXX_packed
> 
> XXX wrappers

I'll do it. Thanks for the suggestion!

> 
> > +/* The standard layout
> 
> I'd drop standard here.

Got it. I'll drop the word "standard".

> 
> > for the packed ring is a continuous chunk of memory
> > + * which looks like this.
> > + *
> > + * struct vring_packed
> > + * {
> 
> Can the opening bracket go on the prev line pls?

Sure.

> 
> > + * // The actual descriptors (16 bytes each)
> > + * struct vring_packed_desc desc[num];
> > + *
> > + * // Padding to the next align boundary.
> > + * char pad[];
> > + *
> > + * // Driver Event Suppression
> > + * struct vring_packed_desc_event driver;
> > + *
> > + * // Device Event Suppression
> > + * struct vring_packed_desc_event device;
> 
> Maybe that's how our driver does it but it's not based on spec
> so I don't think this belongs in the header.

I will move it to the place where vring_packed_init()
is defined.

> 
> > + * };
> > + */
> > +
> > +static inline unsigned vring_packed_size(unsigned int num, unsigned long 
> > align)
> > +{
> > +   return ((sizeof(struct vring_packed_desc) * num + align - 1)
> > +   & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> > +}
> > +
> 
> Cant say this API makes sense for me.

Hmm, do you have any suggestion? Also move it out of this header?
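
For reference, the arithmetic in the quoted vring_packed_size() can be
sketched in userspace as below. The helper names are hypothetical, and
the structure sizes are passed in explicitly because the sketch does not
include the virtio headers: 16 bytes per descriptor comes from the
comment quoted above, while the event structure size is left to the
caller as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of align; align must be a power of two. */
size_t align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/*
 * Same arithmetic as the quoted vring_packed_size(): the descriptor
 * array padded to the alignment boundary, followed by the driver and
 * device event suppression structures.
 */
size_t packed_ring_bytes(size_t num, size_t align,
			 size_t desc_size, size_t event_size)
{
	return align_up(desc_size * num, align) + 2 * event_size;
}
```

For example, with 256 descriptors of 16 bytes, a 4096-byte alignment and
an assumed 4-byte event structure, the descriptor area is exactly 4096
bytes and the total comes to 4096 + 2 * 4 = 4104 bytes.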

Thanks for the review! :)

Best regards,
Tiwei Bie

> 
> 
> >  #endif /* _UAPI_LINUX_VIRTIO_RING_H */
> > -- 
> > 2.11.0

Re: [PATCH v3 3/3] ALSA: hda: Disabled unused audio controller for Dell platforms with Switchable Graphics

2018-04-14 Thread Lukas Wunner

On Thu, Apr 12, 2018 at 10:12:49PM +0800, Kai-Heng Feng wrote:
> at 6:50 PM, Takashi Iwai  wrote:
> > On Thu, 12 Apr 2018 12:42:39 +0200, Kai-Heng Feng wrote:
> > > When SG is enabled, the unused AMD audio controller still exposes its
> > > sysfs, so userspace still opens the control file and stream. If
> > > userspace tries to output sound through the stream, it hangs when
> > > runtime suspend kicks in:
> > > [ 12.796265] snd_hda_intel :01:00.1: Disabling via vga_switcheroo
> > > [ 12.796367] snd_hda_intel :01:00.1: Cannot lock devices!
> > > 
> > > Since the discrete audio controller isn't useful when SG enabled, we
> > > should just disable the device.
> > > 
> > > Signed-off-by: Kai-Heng Feng 
> >
> > I thought we manage this better now with runtime PM by Lukas's recent
> > patchset?
> 
> Yes, that's true. I'll update commit log for next iteration.
> 
> Nevertheless, the unusable control file and stream still get exposed via
> sysfs.
> We should disable them when SG is enabled.

Right, the hang on runtime suspend as mentioned in the commit message
should be gone in 4.17.

The purpose of this patch is thus to prevent the user from seeing or
opening the HDA controller on the discrete GPU.  If SG is enabled,
external DP/HDMI displays are muxed to the Intel GPU, hence the HDA
controller on the discrete GPU cannot communicate with the attached
displays.

Thanks,

Lukas

Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")

2018-04-14 Thread Vincent Guittot

On 13 April 2018 at 22:38, Niklas Söderlund
 wrote:
> Hi Vincent,
>
> On 2018-04-12 13:15:19 +0200, Niklas Söderlund wrote:
>> Hi Vincent,
>>
>> Thanks for your feedback.
>>
>> On 2018-04-12 12:33:27 +0200, Vincent Guittot wrote:
>> > Hi Niklas,
>> >
>> > On 12 April 2018 at 11:18, Niklas Söderlund
>> >  wrote:
>> > > Hi Vincent,
>> > >
>> > > I have observed issues running on linus/master from a few days back [1].
>> > > I'm running on a Renesas Koelsch board (arm32) and I can trigger an issue
>> > > by X forwarding the v4l2 test application qv4l2 over ssh and moving the
>> > > cursor around in the GUI (best test case description award...). I'm
>> > > sorry about the really bad way I trigger this but I can't do it in any
>> > > other way, I'm happy to try other methods if you have some ideas. The
>> > > symptom of the issue is a complete hang of the system for more than 30
>> > > seconds and then this information is printed in the console:
>> >
>> > Heiner (added in cc) also reported a similar problem with his platform: a
>> > dual core celeron
>> >
>> > Can you confirm that your platform is a dual cortex-A15? At least that's
>> > what I have seen on the web.
>> > This would confirm that a dual-core system is a key point.
>>
>> I can confirm that my platform is a dual core.
>
> I tested another dual core system today, a Renesas M3-W ARM64 system, and I
> can observe the same lockups on that system, if it helps you understand
> the problem. It seems to be much harder to trigger the issue on this
> system for some reason. Hitting return in an ssh session doesn't seem to
> produce the lockup, while starting a GUI using X forwarding over ssh
> makes it possible.

Thanks for the test. That confirms it only happens on dual-core systems.

>
> [  392.306441] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  392.312201]  (detected by 0, t=19366 jiffies, g=7177, c=7176, q=35)
> [  392.318555] All QSes seen, last rcu_preempt kthread activity 19368
> (4294990375-4294971007), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [  392.330758] swapper/0   R  running task0 0  0
> 0x0022
> [  392.337883] Call trace:
> [  392.340365]  dump_backtrace+0x0/0x1c8
> [  392.344065]  show_stack+0x14/0x20
> [  392.347416]  sched_show_task+0x224/0x2e8
> [  392.351377]  rcu_check_callbacks+0x8ac/0x8b0
> [  392.355686]  update_process_times+0x2c/0x58
> [  392.359908]  tick_sched_handle.isra.5+0x30/0x50
> [  392.364479]  tick_sched_timer+0x40/0x90
> [  392.368351]  __hrtimer_run_queues+0xfc/0x208
> [  392.372659]  hrtimer_interrupt+0xd4/0x258
> [  392.376710]  arch_timer_handler_virt+0x28/0x48
> [  392.381194]  handle_percpu_devid_irq+0x80/0x138
> [  392.385767]  generic_handle_irq+0x28/0x40
> [  392.389813]  __handle_domain_irq+0x5c/0xb8
> [  392.393946]  gic_handle_irq+0x58/0xa8
> [  392.397640]  el1_irq+0xb4/0x130
> [  392.400810]  arch_cpu_idle+0x14/0x20
> [  392.404422]  default_idle_call+0x1c/0x38
> [  392.408381]  do_idle+0x17c/0x1f8
> [  392.411640]  cpu_startup_entry+0x20/0x28
> [  392.415598]  rest_init+0x24c/0x260
> [  392.419037]  start_kernel+0x3e8/0x414
>
> I was running the same tests on another ARM64 platform earlier using the
> same build which has more than two cores and there I could not observe
> this issue.
>
> --
> Regards,
> Niklas Söderlund

Re: [PATCH 2/4 v4] sched/rt: add rt_rq utilization tracking

2018-04-14 Thread Vincent Guittot

On 14 April 2018 at 12:05, Peter Zijlstra  wrote:
> On Fri, Mar 16, 2018 at 12:25:39PM +0100, Vincent Guittot wrote:
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 783eacf..a8003a9 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -592,6 +592,8 @@ struct rt_rq {
>>   unsigned long   rt_nr_total;
>>   int overloaded;
>>   struct plist_head   pushable_tasks;
>> +
>> + struct sched_avg avg;
>
> We only want this for the root cgroup, right? So why is this per cgroup?

Yes, it's only for the root cgroup. I put it there for consistency
with CFS's PELT, but it only wastes bytes.

>
> That is, I was expecting it to be rq::rt_avg or something.

Re: [PATCH 0/4 v4] sched/rt: track rt rq utilization

2018-04-14 Thread Vincent Guittot

On 14 April 2018 at 12:07, Peter Zijlstra  wrote:
>
>
> What I don't see in this patch-set is removal of the current rt_avg
> stuff.

This RT load tracking doesn't replace the current rt_avg because they
don't use the same period and don't provide the same function.
The current rt_avg uses sysctl_sched_time_avg to define the averaging
period, and its default period is 1 second, whereas PELT uses a fixed
period.
The current rt_avg also tracks irq accounting, which this patch doesn't
do. This is probably doable but will need more complex changes.

Replacing the current rt_avg with this new RT utilization tracking would
require more complex changes, so I didn't want to add them in this 1st
step.

>
> And I didn't look closely enough; but are the root cfs and rt pelt
> windows aligned? They really should be; otherwise you can't combine them
> sanely.

No, they are not aligned.
I agree that this could generate some variation in the sum. I'm going
to fix this point.

[PATCH] isofs: fix potential memory leak in mount option parsing

2018-04-14 Thread Chengguang Xu

When a string type mount option (e.g., iocharset) is specified
several times in a mount, the current option parsing may
cause a memory leak. Hence, call kfree for the previous one
in this case. Meanwhile, check the memory allocation result
for it.

Signed-off-by: Chengguang Xu 
---
 fs/isofs/inode.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index bc258a4..ec3fba7 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -394,7 +394,10 @@ static int parse_options(char *options, struct 
iso9660_options *popt)
break;
 #ifdef CONFIG_JOLIET
case Opt_iocharset:
+   kfree(popt->iocharset);
popt->iocharset = match_strdup(&args[0]);
+   if (!popt->iocharset)
+   return 0;
break;
 #endif
case Opt_map_a:
-- 
1.8.3.1
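
The pattern behind this fix can be sketched in userspace as follows.
This is only an illustration: set_string_option() is a hypothetical
helper, with malloc()/free() standing in for match_strdup()/kfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Store a private copy of value in *slot, releasing any copy left there
 * by an earlier occurrence of the same option. Returns 0 on success and
 * -1 if the allocation fails, in which case *slot is left NULL.
 */
int set_string_option(char **slot, const char *value)
{
	size_t len;
	char *copy;

	assert(slot != NULL && value != NULL);

	len = strlen(value) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, value, len);

	free(*slot);	/* free(NULL) is a no-op, so the first call is safe */
	*slot = copy;
	return copy ? 0 : -1;
}
```

Without the free(), passing the same option twice would overwrite the
only pointer to the first copy, which is the leak described above.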

[PATCH] exofs: fix potential memory leak in mount option parsing

2018-04-14 Thread Chengguang Xu

When a string type mount option is specified several times
in a mount, the current option parsing may cause a memory leak.
Hence, call kfree for the previous one in this case.

Signed-off-by: Chengguang Xu 
---
 fs/exofs/super.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 179cd5c..106b818 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -101,6 +101,7 @@ static int parse_options(char *options, struct exofs_mountopt *opts)
 		token = match_token(p, tokens, args);
 		switch (token) {
 		case Opt_name:
+			kfree(opts->dev_name);
 			opts->dev_name = match_strdup(&args[0]);
 			if (unlikely(!opts->dev_name)) {
 				EXOFS_ERR("Error allocating dev_name");
-- 
1.8.3.1

How to disable tracing at runtime from the Linux kernel command line?

2018-04-14 Thread Paul Menzel


Dear Linux folks,


I am trying to reduce the boot time of a standard Linux distribution 
kernel. Currently, distributions – at least Debian and Ubuntu – enable 
function tracing.


```
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y

CONFIG_EVENT_TRACING=y
```

This is great, as it makes it easy to use tracing to hunt down things 
holding up the boot. But it also skews the boot time quite a lot.


```
$ sudo dmesg
[…]
[0.318412] initcall init_graph_trace+0x0/0x64 returned 0 after 199218 usecs
[…]
[1.770287] calling  event_trace_init+0x0/0x2c2 @ 1
[2.052871] initcall event_trace_init+0x0/0x2c2 returned 0 after 275942 usecs
[…]
```

Is there a way to disable tracing from the Linux kernel command line?



Kind regards,

Paul

Re: [PATCH] Revert "xhci: plat: Register shutdown for xhci_plat"

2018-04-14 Thread Greg Kroah-Hartman

On Fri, Apr 13, 2018 at 12:34:00PM +0530, Harsh Shandilya wrote:
> On 13 April 2018 11:51:28 AM IST, Greg Kroah-Hartman 
>  wrote:
> >On Fri, Apr 13, 2018 at 08:12:31AM +0530, Harsh Shandilya wrote:
> >> On 13 April 2018 5:59:51 AM IST, Greg Hackmann wrote:
> >> >Pixel 2 field testers reported that when they tried to reboot their
> >> >phones with some USB devices plugged in, the reboot would get wedged
> >> >and eventually trigger watchdog reset.  Once the Pixel kernel team
> >> >found a reliable repro case, they narrowed it down to this commit's
> >> >4.4.y backport.  Reverting the change made the issue go away.
> >> 
> >> Are you allowed to make the repro steps public? I'm writing this from
> >> a walleye and would be grateful if I could test for this in the
> >> modified tree I'm running atm.
> >
> >I was told the steps are pretty simple:
> > - reboot the phone a lot
> >eventually it will hang.  There's a fix in the code aurora kernel tree
> >for this that they never sent upstream for some odd reason (they sent
> >the first patch, why not the second?)
> >
> >I'll go revert this for now, thanks for the patch!
> >
> >greg k-h
> 
> That'd make sense, I only tried rebooting like five times before I had to run 
> for a class.
> 
> As far as CAF is concerned, I feel the not submitting upstream,
> working extra to write patches which have usually better variants
> already upstream, seems to be common. All USB changes were dropped
> when they merged kernel-common into msm-3.18 with no real explanation
> which has been an annoyance more than once during merging -stable in
> my fork of msm-3.18. While I understand their situation of maintaining
> upwards of 5 million lines of code not upstream, it still feels sloppy
> to not merge stable updates and do extra work instead. /* End rant */

CAF fixed this back on Feb 1 in their tree, yet did not send that
upstream, or to anyone else:

https://source.codeaurora.org/quic/la/kernel/msm-4.4/commit/?h=LV.HB.1.1.5-03810-8x96.0&id=a7a5307ee04ad349d365ad50f304605a9cd9bd0a

Feel free to rant some more, I'm going to go revert the original
upstream patch as that is half-completed, and obviously broken :(

thanks,

greg k-h

Re: [PATCH 2/5] dt-bindings: display: atmel: add optional output-mode property

2018-04-14 Thread Peter Rosin

On 2018-04-13 19:46, Rob Herring wrote:
> On Mon, Apr 09, 2018 at 12:59:15PM +0200, Peter Rosin wrote:
>> Useful for beating cases where an output mode selection heuristic
>> fails.
>>
>> Signed-off-by: Peter Rosin 
>> ---
>>  Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt 
>> b/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt
>> index 82f2acb3d374..dc478455b883 100644
>> --- a/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt
>> +++ b/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt
>> @@ -10,6 +10,10 @@ Required properties:
>>   - #address-cells: should be set to 1.
>>   - #size-cells: should be set to 0.
>>  
>> +Optional properties:
>> + - output-mode: override any output mode selection heuristic and force a
>> +   particular output mode. One of "rgb444", "rgb565", "rgb666" and "rgb888".
>> +
> 
> This needs to be generic, not just added to some random display 
> controller binding.
> 
> It also belongs in the port or endpoint node as is done for camera 
> interfaces.

Hmm, should I extend media/video-interfaces.txt with more bus types (or since
I'm targeting parallel interfaces, perhaps the new bus types should be
autodetected from other props?) or should I write a new binding similar to
it?

One question regarding bus-width, should it include hsync/vsync/de/clk?
If yes, how to distinguish rgb565 with all those four from rgb666 with
only de/clk (some panels do not need hsync/vsync)? 20 lines in both
cases...

Or are rgb444/rgb565/rgb666/rgb888 already supported by the media video
interface binding? That's not at all obvious to me.

Cheers,
Peter

Re: [PATCH v2] IB: make INFINIBAND_ADDR_TRANS configurable

2018-04-14 Thread Dennis Dalessandro


On 4/13/2018 1:27 PM, Greg Thelen wrote:

> Allow INFINIBAND without INFINIBAND_ADDR_TRANS.
>
> Signed-off-by: Greg Thelen 
> Cc: Tarick Bedeir 
> Change-Id: I6fbbf8a432e467710fa65e4904b7d61880b914e5


Forgot to remove the Gerrit thing.

-Denny

Re: INFO: task hung in __blkdev_get

2018-04-14 Thread Tetsuo Handa

OK. The patch was sent to linux.git as commit 1e047eaab3bb5564.

#syz fix: block/loop: fix deadlock after loop_set_status

Dmitry Vyukov wrote:
> On Tue, Apr 10, 2018 at 3:04 PM, Tetsuo Handa
>  wrote:
> > Dmitry Vyukov wrote:
> >> On Tue, Apr 10, 2018 at 12:55 PM, Tetsuo Handa
> >>  wrote:
> >> > Hello.
> >> >
> >> > Since syzbot is reporting so many hung up bug which involves /dev/loopX ,
> >> > is it possible to "temporarily" apply below patch for testing under 
> >> > syzbot
> >>
> >> Unfortunately it's not possible, for full explanation please see:
> >> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#no-custom-patches
> >>
> >
> > I mean, sending custom patch to linux.git for -rc and revert the custom 
> > patch
> > before -final is released. It won't take so much period until we get the 
> > result.
> 
> Ah, I see, then I guess it wasn't a question to me.
> 
I noticed that there already is the lockdep report at

  possible deadlock in blkdev_reread_part
  https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889

entry, and no patch is proposed yet:

  https://groups.google.com/forum/#!msg/syzkaller-bugs/2Rw8-OM6IbM/SI4DyK-1AQAJ

Re: WARNING: lock held when returning to user space!

2018-04-14 Thread Tetsuo Handa

The patch was sent to linux.git as commit bdac616db9bbadb9.

#syz fix: loop: fix LOOP_GET_STATUS lock imbalance

Re: [PATCH v2] IB: make INFINIBAND_ADDR_TRANS configurable

2018-04-14 Thread Greg Thelen

On Sat, Apr 14, 2018 at 8:13 AM Dennis Dalessandro <
dennis.dalessan...@intel.com> wrote:

> On 4/13/2018 1:27 PM, Greg Thelen wrote:
> > Allow INFINIBAND without INFINIBAND_ADDR_TRANS.
> >
> > Signed-off-by: Greg Thelen 
> > Cc: Tarick Bedeir 
> > Change-Id: I6fbbf8a432e467710fa65e4904b7d61880b914e5

> Forgot to remove the Gerrit thing.

> -Denny

Ack.  My bad.  Will repost.  Unfortunately checkpatch didn't notice.

[PATCH v3] IB: make INFINIBAND_ADDR_TRANS configurable

2018-04-14 Thread Greg Thelen

Allow INFINIBAND without INFINIBAND_ADDR_TRANS.

Signed-off-by: Greg Thelen 
Cc: Tarick Bedeir 
---
 drivers/infiniband/Kconfig | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index ee270e065ba9..2a972ed6851b 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -61,9 +61,12 @@ config INFINIBAND_ON_DEMAND_PAGING
 	  pages on demand instead.
 
 config INFINIBAND_ADDR_TRANS
-	bool
+	bool "RDMA/CM"
 	depends on INFINIBAND
 	default y
+	---help---
+	  Support for RDMA communication manager (CM).
+	  This allows for a generic connection abstraction over RDMA.
 
 config INFINIBAND_ADDR_TRANS_CONFIGFS
 	bool
-- 
2.17.0.484.g0c8726318c-goog

Re: tg3 crashes under high load, when using 100Mbits

2018-04-14 Thread Kai-Heng Feng

Hi Satish,

> On 2018Mar21, at 00:57, Kai-Heng Feng  wrote:
> 
> Satish Baddipadige  wrote:
> 
>> On Thu, Feb 15, 2018 at 7:37 PM, Siva Reddy Kallam
>>  wrote:
>>> On Mon, Feb 12, 2018 at 10:59 AM, Siva Reddy Kallam
>>>  wrote:
>>>> On Fri, Feb 9, 2018 at 10:41 AM, Kai Heng Feng
>>>>  wrote:
>>>>> Hi Broadcom folks,
>>>>>
>>>>> We are now enabling a new platform with tg3 nic, unfortunately we observed
>>>>> the bug [1] that dated back to 2015.
>>>>> I tried commit 4419bb1cedcd ("tg3: Add workaround to restrict 5762 MRRS to
>>>>> 2048") but it doesn't work.
>>>>>
>>>>> Do you have any idea how to solve the issue?
>>>>>
>>>>> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664
>>>>>
>>>>> Kai-Heng
>>>> Thank you for reporting. We will check and update you.
>>> With link aware mode, the clock speed could be slow and boot code does not
>>> complete within the expected time with lower link speeds. Need to override
>>> and the clock in driver. We are checking the feasibility of adding
>>> this in driver or firmware.
>> 
>> Hi Kai-Heng,
>> 
>> Can you please test the attached patch?
> 
> I built a kernel and asked affected users to try.

Users reported that the crash still happens with the patch.

Kai-Heng

> 
> Thanks for your work.
> 
> Kai-Heng
> 
>> 
>> Thanks,
>> Satish
>>

Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system

2018-04-14 Thread Sinan Kaya

Hi Keith, Bjorn;

On 4/12/2018 1:41 PM, Sinan Kaya wrote:
> On 4/12/2018 1:09 PM, Keith Busch wrote:
>> On Thu, Apr 12, 2018 at 12:27:20PM -0400, Sinan Kaya wrote:
>>> On 4/12/2018 11:02 AM, Keith Busch wrote:

>>>> Also, I thought the plan was to keep hotplug and non-hotplug the same,
>>>> except for the very end: if not a hotplug bridge, initiate the rescan
>>>> automatically after releasing from containment, otherwise let pciehp
>>>> handle it when the link reactivates.

>>>
>>> Hmm...
>>>
>>> AER driver doesn't do stop and rescan approach for fatal errors. AER driver
>>> makes an error callback followed by secondary bus reset and finally driver
>>> the resume callback on the endpoint only if link recovery is successful.
>>> Otherwise, AER driver bails out with recovery unsuccessful message.
>>
>> I'm not sure if that's necessarily true. People have reported AER
>> handling triggers PCIe hotplug events, and creates some interesting race
>> conditions:
> 
> By reading the code, I don't see a stop and rescan in the AER error recovery
> path.
> 
> As both logs indicate, stop and rescan is initiated in response to link down
> and link up interrupts triggered by the secondary bus reset. 
> The SW entity handling these is not AER driver. It is the hotplug driver
> running asynchronous to the AER driver.
> 
> AER driver should have tried a slot reset before attempting to do a secondary
> bus reset.
> 
> /**
>  * pci_reset_slot - reset a PCI slot
>  * @slot: PCI slot to reset
>  *
>  * A PCI bus may host multiple slots, each slot may support a reset mechanism
>  * independent of other slots.  For instance, some slots may support slot power
>  * control.  In the case of a 1:1 bus to slot architecture, this function may
>  * wrap the bus reset to avoid spurious slot related events such as hotplug.
>  * Generally a slot reset should be attempted before a bus reset.  All of the
>  * function of the slot and any subordinate buses behind the slot are reset
>  * through this function.  PCI config space of all devices in the slot and
>  * behind the slot is saved before and restored after reset.
>  *
>  * Return 0 on success, non-zero on error.
>  */
> int pci_reset_slot(struct pci_slot *slot)
> 
> Slot reset is there to mask hotplug interrupts before the reset and unmask
> them after reset.
> 
>>
>> https://marc.info/?l=linux-pci&m=152336615707640&w=2
>>
>> https://www.spinics.net/lists/linux-pci/msg70614.html
>>
>>> Why do we need an additional rescan in the DPC driver if the link is up
>>> and driver resumes operation?
>>
>> I thought the plan was to have DPC always go through the removal path
>> to ensure all devices are properly configured when containment is
>> released. In order to reconfigure those, you'll need to initiate the
>> rescan from somewhere.
>>
> 
> This is where the contradiction is. 
> 
> Bjorn is asking for a unified error handling for both AER and DPC.
> 
> Current AER error recovery framework is error callback + secondary
> bus reset + resume callback.
> 
> How does this stop + rescan model fit?
> 
> Do we want to change the error recovery framework? I suppose this will 
> become a bigger conversation as there are more customers of this.
> 

I also want to highlight that the PCI Error recovery sequence is well
documented here.

https://www.kernel.org/doc/Documentation/PCI/pci-error-recovery.txt

We don't really have to guess what Linux does. 

IMO, the hotplug issues Keith is seeing are orthogonal and need to be
addressed independently of this series by following the PCI slot reset
procedure.

Hotplug driver handles link up/down events due to insertion/removal.
Hotplug driver is expected to do the re-enumeration.

I don't understand why we need to do another re-enumeration if system
observes a PCIe error handled by the AER/DPC driver. 

These two are independent events.

PCIe error recovery framework does the reset callback + SBR + resume
behavior today.

Bjorn,

You indicated that you want to unify the AER and DPC behavior. Let's
settle on what we want to do one more time. We have been going back
and forth on the direction.

We are on V13. I hope we won't hit V20 :)

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-14 Thread Raslan, KarimAllah

On Thu, 2018-04-12 at 17:12 +0200, KarimAllah Ahmed wrote:
> From: Jim Mattson 
> 
> For nested virtualization L0 KVM is managing a bit of state for L2 guests,
> this state can not be captured through the currently available IOCTLs. In
> fact the state captured through all of these IOCTLs is usually a mix of L1
> and L2 state. It is also dependent on whether the L2 guest was running at
> the moment when the process was interrupted to save its state.
> 
> With this capability, there are two new vcpu ioctls: KVM_GET_VMX_STATE and
> KVM_SET_VMX_STATE. These can be used for saving and restoring a VM that is
> in VMX operation.
> 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: H. Peter Anvin 
> Cc: x...@kernel.org
> Cc: k...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Jim Mattson 
> [karahmed@ - rename structs and functions and make them ready for AMD and
>  address previous comments.
>- rebase & a bit of refactoring.
>- Merge 7/8 and 8/8 into one patch.
>- Force a VMExit from L2 after reading the kvm_state to avoid
>  mixed state between L1 and L2 on resurrecting the instance. ]
> Signed-off-by: KarimAllah Ahmed 
> ---
> v2 -> v3:
> - Remove the forced VMExit from L2 after reading the kvm_state. The actual
>   problem is solved.
> - Rebase again!
> - Set nested_run_pending during restore (not sure if it makes sense yet or
>   not).
> - Reduce KVM_REQUEST_ARCH_BASE to 7 instead of 8 (the other alternative is
>   to switch everything to u64)
> 
> v1 -> v2:
> - Rename structs and functions and make them ready for AMD and address
>   previous comments.
> - Rebase & a bit of refactoring.
> - Merge 7/8 and 8/8 into one patch.
> - Force a VMExit from L2 after reading the kvm_state to avoid mixed state
>   between L1 and L2 on resurrecting the instance.
> ---
>  Documentation/virtual/kvm/api.txt |  47 ++
>  arch/x86/include/asm/kvm_host.h   |   7 ++
>  arch/x86/include/uapi/asm/kvm.h   |  38
>  arch/x86/kvm/vmx.c                | 177 +-
>  arch/x86/kvm/x86.c                |  21 +
>  include/linux/kvm_host.h          |   2 +-
>  include/uapi/linux/kvm.h          |   5 ++
>  7 files changed, 292 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 1c7958b..c51d5d3 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3548,6 +3548,53 @@ Returns: 0 on success,
>   -ENOENT on deassign if the conn_id isn't registered
>   -EEXIST on assign if the conn_id is already registered
>  
> +4.114 KVM_GET_STATE
> +
> +Capability: KVM_CAP_STATE
> +Architectures: x86
> +Type: vcpu ioctl
> +Parameters: struct kvm_state (in/out)
> +Returns: 0 on success, -1 on error
> +Errors:
> +  E2BIG: the data size exceeds the value of 'size' specified by
> + the user (the size required will be written into size).
> +
> +struct kvm_state {
> + __u16 flags;
> + __u16 format;
> + __u32 size;
> + union {
> + struct kvm_vmx_state vmx;
> + struct kvm_svm_state svm;
> + __u8 pad[120];
> + };
> + __u8 data[0];
> +};
> +
> +This ioctl copies the vcpu's kvm_state struct from the kernel to userspace.
> +
> +4.115 KVM_SET_STATE
> +
> +Capability: KVM_CAP_STATE
> +Architectures: x86
> +Type: vcpu ioctl
> +Parameters: struct kvm_state (in)
> +Returns: 0 on success, -1 on error
> +
> +struct kvm_state {
> + __u16 flags;
> + __u16 format;
> + __u32 size;
> + union {
> + struct kvm_vmx_state vmx;
> + struct kvm_svm_state svm;
> + __u8 pad[120];
> + };
> + __u8 data[0];
> +};
> +
> +This copies the vcpu's kvm_state struct from userspace to the kernel.
> +>>> 13a7c9e... kvm: nVMX: Introduce KVM_CAP_STATE
>  
>  5. The kvm_run structure
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9fa4f57..ad2116a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -75,6 +75,7 @@
>  #define KVM_REQ_HV_EXIT  KVM_ARCH_REQ(21)
>  #define KVM_REQ_HV_STIMERKVM_ARCH_REQ(22)
>  #define KVM_REQ_LOAD_EOI_EXITMAP KVM_ARCH_REQ(23)
> +#define KVM_REQ_GET_VMCS12_PAGES KVM_ARCH_REQ(24)
>  
>  #define CR0_RESERVED_BITS   \
>   (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> @@ -1084,6 +1085,12 @@ struct kvm_x86_ops {
>  
>   void (*setup_mce)(struct kvm_vcpu *vcpu);
>  
> + int (*get_state)(struct kvm_vcpu *vcpu,
> +  struct kvm_state __user *user_kvm_state);
> + int (*set_state)(struct kvm_vcpu *vcpu,
> +  struct kvm_state __user *user_kvm_state);
> + void (*get_vmcs12_pages)(str

Re: [PATCH v2] IB: make INFINIBAND_ADDR_TRANS configurable

2018-04-14 Thread Joe Perches

On Sat, 2018-04-14 at 15:34 +0000, Greg Thelen wrote:
> On Sat, Apr 14, 2018 at 8:13 AM Dennis Dalessandro <
> dennis.dalessan...@intel.com> wrote:
> 
> > On 4/13/2018 1:27 PM, Greg Thelen wrote:
> > > Allow INFINIBAND without INFINIBAND_ADDR_TRANS.
> > > 
> > > Signed-off-by: Greg Thelen 
> > > Cc: Tarick Bedeir 
> > > Change-Id: I6fbbf8a432e467710fa65e4904b7d61880b914e5
> > Forgot to remove the Gerrit thing.
> > -Denny
> 
> Ack.  My bad.  Will repost.  Unfortunately checkpatch didn't notice.

Probably because Change-Id: is after a Signed-off-by: line

Re: [PATCH] pinctrl/samsung: Correct EINTG banks order

2018-04-14 Thread Paweł Chmiel

On Wednesday, April 11, 2018 11:52:44 AM CEST Krzysztof Kozlowski wrote:
> On Wed, Apr 11, 2018 at 10:36 AM, Tomasz Figa  wrote:
> > 2018-04-10 17:38 GMT+09:00 Tomasz Figa :
> >> 2018-04-10 16:06 GMT+09:00 Krzysztof Kozlowski :
> >>> On Sun, Apr 8, 2018 at 8:07 PM, Paweł Chmiel
> >>>  wrote:
>  All banks with GPIO interrupts should be at beginning
>  of bank array and without any other types of banks between them.
>  This order is expected by exynos_eint_gpio_irq, when doing
>  interrupt group to bank translation.
>  Otherwise, kernel NULL pointer dereference would happen
>  when trying to handle interrupt, due to wrong bank being looked up.
>  Observed on s5pv210, when trying to handle gpj0 interrupt,
>  where kernel was mapping it to gpi bank.
> >>>
> >>> Thanks for the patch. The issue looks real although one thing was
> >>> missed - there is a gap in SVC group between GPK2 and GPL0 (pointed by
> >>> Marek Szyprowski):
> >>>
> >>> 0x0 - EINT_23 - gpk0
> >>> 0x1 - EINT_24 - gpk1
> >>> 0x2 - EINT_25 - gpk2
> >>> 0x4 - EINT_27 - gpl0
> >>> 0x7 - EINT_8 - gpm0
> >>>
> >>> Maybe this should be done differently - to remove such hidden
> >>> requirement entirely in favor of another parameter of
> >>> EXYNOS_PIN_BANK_EINTG argument?
> >>
> >> Perhaps let's limit this patch to s5pv210 and Exynos5410 alone, where
> >> a simple swap of bank order in the arrays should be okay.
> >>
> >> We might also need to have some fixes on 4x12, because I noticed that
> >> in exynos4x12_pin_banks0[] there is a hole in eint_offsets between
> >> gpd1 and gpf0 and exynos4x12_pin_banks1[] starts with gpk0 that has
> >> eint_offset equal to 0x08 (not 0).
> >
> > To close the loop, after talking offline and checking the
> > documentation, Exynos4x12 is fine, because the group numbers in SVC
> > register actually match what is defined in bank arrays.
> 
> Great! Thanks for checking.
> 
> Best regards,
> Krzysztof
> 

Thanks for all the comments. I'll prepare a new version of the patches, with
all fixes and documentation.

Best regards
Paweł

Re: How to disable tracing at runtime from the Linux kernel command line?

2018-04-14 Thread Steven Rostedt

On Sat, 14 Apr 2018 15:09:33 +0200
Paul Menzel  wrote:

> Dear Linux folks,
> 
> 
> I am trying to reduce the boot time of a standard Linux distribution 
> kernel. Currently, distributions – at least Debian and Ubuntu – enable 
> function tracing.
> 
> ```
> CONFIG_FTRACE=y
> CONFIG_FUNCTION_TRACER=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> 
> CONFIG_EVENT_TRACING=y
> ```
> 
> This is great, as it makes it easy to use tracing to hunt down things 
> holding up the boot. But it also skews the boot time quite a lot.
> 
> ```
> $ sudo dmesg
> […]
> [0.318412] initcall init_graph_trace+0x0/0x64 returned 0 after 199218 usecs
> […]
> [1.770287] calling  event_trace_init+0x0/0x2c2 @ 1
> [2.052871] initcall event_trace_init+0x0/0x2c2 returned 0 after 275942 usecs
> […]
> ```
> 
> Is there a way to disable tracing from the Linux kernel command line?
> 

Try initcall_blacklist. But you assume all risks when doing so. I
never tried it, so I have no idea what side effects that may have.

-- Steve

Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()

2018-04-14 Thread Linus Torvalds

On Sat, Apr 14, 2018 at 1:02 AM, Al Viro  wrote:
>
> "Bail out" is definitely a bad idea, "sleep"... what on?  Especially
> since there might be several evictions we are overlapping with...

Well, one thing that should be looked at is the return condition from
select_collect() that shrink_dcache_parent() uses.

Because I think that return condition is somewhat insane.

The logic there seems to be:

 - if we have found something, stop walking. Either NOW (if somebody
is waiting) or after you've hit a rename (if nobody is)

Now, this actually makes perfect sense for the whole rename situation:
if there's nobody waiting for us, but we hit a rename, we probably
should stop anyway just to let whoever is doing that rename continue,
and we might as well try to get rid of the dentries we have found so
far.

But it does *not* make sense for the case where we've hit a dentry
that is already on the shrink list. Sure, we'll continue to gather all
the other dentries, but if there is concurrent shrinking, shouldn't we
give up the CPU more eagerly - *particularly* if somebody else is
waiting (it might be the other process that actually gets rid of the
shrinking dentries!)?

So my gut feel is that we should at least try doing something like
this in select_collect():

-   if (!list_empty(&data->dispose))
+   if (data->found)
ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;

because even if we haven't actually been able to shrink something, if
we hit an already shrinking entry we should probably at least not do
the "retry for rename". And if we actually are going to reschedule, we
might as well start from the beginning.

I realize that *this* thread might not be making any actual progress
(because it didn't find any dentries to shrink), but since it did find
_a_ dentry that is being shrunk, we know the operation itself - on a
bigger scale - is making progress.

Hmm?
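
To make the difference between the two conditions concrete, here is a small userspace model of just that final test. It is a sketch, not the kernel code: the enum only mirrors the names of d_walk_ret, the functions old_rule and new_rule are hypothetical, and everything select_collect() does before this point is left out.

```c
#include <stdbool.h>

/* Mirrors the names used by the kernel's enum d_walk_ret; values are illustrative. */
enum walk_ret { WALK_CONTINUE, WALK_QUIT, WALK_NORETRY };

/*
 * Current condition: stop walking only if this thread collected
 * something on its own dispose list.
 */
static enum walk_ret old_rule(bool dispose_nonempty, bool need_resched)
{
	if (dispose_nonempty)
		return need_resched ? WALK_QUIT : WALK_NORETRY;
	return WALK_CONTINUE;
}

/*
 * Suggested condition: stop walking as soon as anything was found,
 * including dentries that are already on somebody else's shrink list.
 */
static enum walk_ret new_rule(int found, bool need_resched)
{
	if (found)
		return need_resched ? WALK_QUIT : WALK_NORETRY;
	return WALK_CONTINUE;
}
```

The interesting case is found != 0 with an empty dispose list, i.e. every dentry we hit was already being shrunk by someone else: old_rule() keeps walking, while new_rule() returns WALK_NORETRY, or WALK_QUIT when a reschedule is pending.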

Now, this is independent of the fact that we probably do need a
cond_resched() in shrink_dcache_parent(), to actually do the
reschedule if we're not preemptible. The "need_resched()" in
select_collect() is obviously done while holding

HOWEVER. Even in that case, I don't think shrink_dcache_parent() is
the right point. I'd rather just do it differently in
shrink_dentry_list(): do it even for the empty list case by just doing
it at the top of the loop:

 static void shrink_dentry_list(struct list_head *list)
 {
-   while (!list_empty(list)) {
+   while (cond_resched(), !list_empty(list)) {
struct dentry *dentry, *parent;

-   cond_resched();

so my full patch that I would suggest might be TheRightThing(tm) is
attached (but it should be committed as two patches, since the two
issues are independent - I'm just attaching it as one for testing in
case somebody wants to run some nasty workloads on it)

Comments?

Side note: I think we might want to make that

while (cond_resched(), ) {

}

thing a pattern for doing cond_resched() in loops, instead of having
the cond_resched() inside the loop itself.

It not only handles the "zero iterations" case, it also ends up being
neutral location-wise wrt 'continue' statements, and potentially
generates *better* code.

For example, in this case, doing the cond_resched() at the very top of
the loop means that the loop itself then does that

dentry = list_entry(list->prev, struct dentry, d_lru);

right after the "list_empty()" test - which means that register
allocation etc might be easier, because it doesn't have a function
call (with associated register clobbers) in between the two accesses
to "list".

And I think that might be a fairly common pattern - the loop
conditional uses the same values as the loop itself then uses.

I don't know. Maybe I'm just making excuses for the somewhat unusual syntax.
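
For anyone unfamiliar with the comma operator in a loop condition, here is a minimal userspace sketch of the pattern. The names yield_point, yield_calls, node and drain are hypothetical; yield_point() merely counts calls and stands in for cond_resched().

```c
#include <stddef.h>

/* Hypothetical counter standing in for the side effect of cond_resched(). */
static int yield_calls;

static void yield_point(void)
{
	yield_calls++;
}

/* Minimal singly linked list used only for this illustration. */
struct node {
	struct node *next;
};

/*
 * Pop every node from *head. The comma operator evaluates yield_point()
 * first and discards the result; the value of the whole condition is the
 * right-hand operand. Returns the number of nodes removed.
 */
static int drain(struct node **head)
{
	int removed = 0;

	while (yield_point(), *head != NULL) {
		*head = (*head)->next;
		removed++;
	}
	return removed;
}
```

With this shape the yield point runs once per condition check, so a list of N nodes gives N + 1 calls and an empty list still gives one call, which is the "zero iterations" case mentioned above.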

Anybody want to test this out?

   Linus
 fs/dcache.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 86d2de63461e..76507109cbcd 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1049,11 +1049,9 @@ static bool shrink_lock_dentry(struct dentry *dentry)

 static void shrink_dentry_list(struct list_head *list)
 {
-	while (!list_empty(list)) {
+	while (cond_resched(), !list_empty(list)) {
 		struct dentry *dentry, *parent;

-		cond_resched();
-
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 		spin_lock(&dentry->d_lock);
 		rcu_read_lock();
@@ -1462,7 +1460,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
 	 * ensures forward progress). We'll be coming back to find
 	 * the rest.
 	 */
-	if (!list_empty(&data->dispose))
+	if (data->found)
 		ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
 out:
 	return ret;

Re: [PATCH] selftests:vm: add include file

2018-04-14 Thread Mike Rapoport

On Sun, Apr 15, 2018 at 03:08:56AM +0800, Peng Hao wrote:
> userfaultfd.c: In function ‘hugetlb_release_pages’:
> userfaultfd.c:145:25: error: ‘FALLOC_FL_PUNCH_HOLE’ undeclared 
> (first use in this function)
> 
> Signed-off-by: Peng Hao 
> ---
>  tools/testing/selftests/vm/userfaultfd.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/selftests/vm/userfaultfd.c 
> b/tools/testing/selftests/vm/userfaultfd.c
> index de2f9ec..d8fe447 100644
> --- a/tools/testing/selftests/vm/userfaultfd.c
> +++ b/tools/testing/selftests/vm/userfaultfd.c
> @@ -68,6 +68,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
  
The FALLOC_FL_PUNCH_HOLE definition should come from #include .
What are the versions of your kernel and the libc-development package?

>  #ifdef __NR_userfaultfd
>  
> -- 
> 1.8.3.1
> 

-- 
Sincerely yours,
Mike.

Re: kernel-4.9.94 compile error: 'KMOD_DECOMP_LEN' undeclared

2018-04-14 Thread Akemi Yagi

On Sat, 14 Apr 2018 17:41:13 +0800, Teck Choon Giam wrote:

> Hi,
> 
> Compile linux-4.9.94 will have error related to KMOD_DECOMP_LEN
> undeclared.  Searching string related to KMOD_DECOMP_LEN in
> linux-4.9.94 and linux-4.15.17 sources as below:
> 
> sh-4.2# grep -r KMOD_DECOMP_LEN ./linux-4.15.17
> ./linux-4.15.17/tools/perf/tests/code-reading.c: char
> decomp_name[KMOD_DECOMP_LEN];
> ./linux-4.15.17/tools/perf/util/dso.h:#define KMOD_DECOMP_LEN
> sizeof(KMOD_DECOMP_NAME)
> ./linux-4.15.17/tools/perf/util/annotate.c: char tmp[KMOD_DECOMP_LEN];
> ./linux-4.15.17/tools/perf/util/dso.c: char newpath[KMOD_DECOMP_LEN];
> sh-4.2# grep -r KMOD_DECOMP_LEN ./linux-4.9.94
> ./linux-4.9.94/tools/perf/tests/code-reading.c: char
> decomp_name[KMOD_DECOMP_LEN];
> ./linux-4.9.94/tools/perf/util/dso.c: char newpath[KMOD_DECOMP_LEN];
> 
> So I guess for linux-4.9.94 has not define KMOD_DECOMP_LEN in
> tools/perf/util/dso.h?
> 
> Thanks.
> 
> Regards,
> Giam Teck Choon

Just a note to say that we (ELRepo) see the same error when building 
kernel-4.4.128 on RHEL 6 and RHEL 7. Kernel 4.4.127 built fine.

Akemi
The ELRepo Project

Re: [PATCH 4/4] ALSA: usb: add UAC3 BADD profiles support

2018-04-14 Thread Jorge Sanjuan




On 2018-04-13 23:24, Ruslan Bilovol wrote:

> Recently released USB Audio Class 3.0 specification
> contains BADD (Basic Audio Device Definition) document
> which describes pre-defined UAC3 configurations.
>
> BADD support is mandatory for UAC3 devices, it should be
> implemented as a separate USB device configuration.
> As per BADD document, class-specific descriptors
> shall not be included in the Device’s Configuration
> descriptor ("inferred"), but host can guess them
> from BADD profile number, number of endpoints and
> their max packet sizes.


Right. I would have thought that, since BADD is a subset of UAC3, it may 
be simpler to fill the Class Specific descriptors buffer and leave the 
UAC3 path intact, as it would result in the same behavior (for UAC3 and 
BADD configs) without the need to add that much code to the mixer, which 
is already quite big.


In the patch I proposed [1], the Class Specific buffer is filled once 
with the BADD descriptors, which are already UAC3 compliant, so the 
driver would handle the rest in the same way it would do with an UAC3 
configuration.


I will keep an eye on this as I'd need to do some work based on this 
instead.


[1] https://www.spinics.net/lists/alsa-devel/msg71617.html

Thanks,

Jorge



This patch adds support of all BADD profiles from the spec

Signed-off-by: Ruslan Bilovol 
---
 sound/usb/card.c       |  14 +++
 sound/usb/clock.c      |   9 +-
 sound/usb/mixer.c      | 313 +++--
 sound/usb/mixer_maps.c |  65 ++
 sound/usb/stream.c     |  83 +++--
 sound/usb/usbaudio.h   |   2 +
 6 files changed, 466 insertions(+), 20 deletions(-)

diff --git a/sound/usb/card.c b/sound/usb/card.c
index 4d866bd..47ebc50 100644
--- a/sound/usb/card.c
+++ b/sound/usb/card.c
@@ -307,6 +307,20 @@ static int snd_usb_create_streams(struct snd_usb_audio *chip, int ctrlif)
return -EINVAL;
}

+   if (protocol == UAC_VERSION_3) {
+   int badd = assoc->bFunctionSubClass;
+
+   if (badd != UAC3_FUNCTION_SUBCLASS_FULL_ADC_3_0 &&
+   (badd < UAC3_FUNCTION_SUBCLASS_GENERIC_IO ||
+badd > UAC3_FUNCTION_SUBCLASS_SPEAKERPHONE)) {
+   dev_err(&dev->dev,
+   "Unsupported UAC3 BADD profile\n");
+   return -EINVAL;
+   }
+
+   chip->badd_profile = badd;
+   }
+
for (i = 0; i < assoc->bInterfaceCount; i++) {
int intf = assoc->bFirstInterface + i;

diff --git a/sound/usb/clock.c b/sound/usb/clock.c
index 0b030d8..17673f3 100644
--- a/sound/usb/clock.c
+++ b/sound/usb/clock.c
@@ -587,8 +587,15 @@ int snd_usb_init_sample_rate(struct snd_usb_audio *chip, int iface,
default:
return set_sample_rate_v1(chip, iface, alts, fmt, rate);

-   case UAC_VERSION_2:
case UAC_VERSION_3:
+   if (chip->badd_profile >= UAC3_FUNCTION_SUBCLASS_GENERIC_IO) {
+   if (rate != UAC3_BADD_SAMPLING_RATE)
+   return -ENXIO;
+   else
+   return 0;
+   }
+   /* fall through */
+   case UAC_VERSION_2:
return set_sample_rate_v2v3(chip, iface, alts, fmt, rate);
}
 }
diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c
index 301ad61..e5c3b0d 100644
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -112,14 +112,12 @@ enum {
 #include "mixer_maps.c"

 static const struct usbmix_name_map *
-find_map(struct mixer_build *state, int unitid, int control)
+find_map(const struct usbmix_name_map *p, int unitid, int control)
 {
-   const struct usbmix_name_map *p = state->map;
-
if (!p)
return NULL;

-   for (p = state->map; p->id; p++) {
+   for (; p->id; p++) {
if (p->id == unitid &&
(!control || !p->control || control == p->control))
return p;
@@ -1333,6 +1331,76 @@ static struct usb_feature_control_info *get_feature_control_info(int control)
return NULL;
 }

+static void build_feature_ctl_badd(struct usb_mixer_interface *mixer,
+ unsigned int ctl_mask, int control, int unitid,
+ const struct usbmix_name_map *badd_map)
+{
+   struct usb_feature_control_info *ctl_info;
+   unsigned int len = 0;
+   struct snd_kcontrol *kctl;
+   struct usb_mixer_elem_info *cval;
+   const struct usbmix_name_map *map;
+
+   map = find_map(badd_map, unitid, control);
+   if (!map)
+   return;
+
+   cval = kzalloc(sizeof(*cval), GFP_KERNEL);
+   if (!cval)
+   return;
+   snd_usb_mixer_elem_init_std(&cval->head, mixer, unitid);
+   cval->control = control;
+   cval->cmask = ctl_mask;

Regression with 5dcd8400884c ("macsec: missing dev_put() on error in macsec_newlink()")

2018-04-14 Thread Laura Abbott


Hi,

Fedora got a bug report of a regression when trying to remove the
macsec module (https://bugzilla.redhat.com/show_bug.cgi?id=1566410).
I did a bisect and found

commit 5dcd8400884cc4a043a6d4617e042489e5d566a9
Author: Dan Carpenter 
Date:   Wed Mar 21 11:09:01 2018 +0300

macsec: missing dev_put() on error in macsec_newlink()

We moved the dev_hold(real_dev); call earlier in the function but forgot
to update the error paths.

Fixes: 0759e552bce7 ("macsec: fix negative refcnt on parent link")

Signed-off-by: Dan Carpenter 
Signed-off-by: David S. Miller 

The script I used for testing, based on the reporter's, is attached. It
looks like modprobe is stuck in the D state. Any ideas?

Thanks,
Laura


mac-sec-setup.sh
Description: application/shellscript

Re: [PATCH v3] IB: make INFINIBAND_ADDR_TRANS configurable

2018-04-14 Thread kbuild test robot

Hi Greg,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16 next-20180413]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Greg-Thelen/IB-make-INFINIBAND_ADDR_TRANS-configurable/20180414-234042
config: x86_64-randconfig-x011-201815 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/nvme/host/rdma.o: In function `nvme_rdma_stop_queue':
>> drivers/nvme/host/rdma.c:554: undefined reference to `rdma_disconnect'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_create_qp':
>> drivers/nvme/host/rdma.c:258: undefined reference to `rdma_create_qp'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_free_queue':
>> drivers/nvme/host/rdma.c:570: undefined reference to `rdma_destroy_id'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_alloc_queue':
>> drivers/nvme/host/rdma.c:511: undefined reference to `__rdma_create_id'
>> drivers/nvme/host/rdma.c:523: undefined reference to `rdma_resolve_addr'
   drivers/nvme/host/rdma.c:544: undefined reference to `rdma_destroy_id'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_addr_resolved':
>> drivers/nvme/host/rdma.c:1461: undefined reference to `rdma_resolve_route'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_create_queue_ib':
>> drivers/nvme/host/rdma.c:485: undefined reference to `rdma_destroy_qp'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_route_resolved':
>> drivers/nvme/host/rdma.c:1512: undefined reference to `rdma_connect'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_conn_rejected':
>> drivers/nvme/host/rdma.c:1436: undefined reference to `rdma_reject_msg'
>> drivers/nvme/host/rdma.c:1437: undefined reference to `rdma_consumer_reject_data'

vim +554 drivers/nvme/host/rdma.c

f41725bb Israel Rukshin    2017-11-26  423  
ca6e95bb Sagi Grimberg     2017-05-04  424  static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
71102307 Christoph Hellwig 2016-07-06  425  {
ca6e95bb Sagi Grimberg     2017-05-04  426  struct ib_device *ibdev;
71102307 Christoph Hellwig 2016-07-06  427  const int send_wr_factor = 3;   /* MR, SEND, INV */
71102307 Christoph Hellwig 2016-07-06  428  const int cq_factor = send_wr_factor + 1;   /* + RECV */
71102307 Christoph Hellwig 2016-07-06  429  int comp_vector, idx = nvme_rdma_queue_idx(queue);
71102307 Christoph Hellwig 2016-07-06  430  int ret;
71102307 Christoph Hellwig 2016-07-06  431  
ca6e95bb Sagi Grimberg     2017-05-04  432  queue->device = nvme_rdma_find_get_device(queue->cm_id);
ca6e95bb Sagi Grimberg     2017-05-04  433  if (!queue->device) {
ca6e95bb Sagi Grimberg     2017-05-04  434  dev_err(queue->cm_id->device->dev.parent,
ca6e95bb Sagi Grimberg     2017-05-04  435  "no client data found!\n");
ca6e95bb Sagi Grimberg     2017-05-04  436  return -ECONNREFUSED;
ca6e95bb Sagi Grimberg     2017-05-04  437  }
ca6e95bb Sagi Grimberg     2017-05-04  438  ibdev = queue->device->dev;
71102307 Christoph Hellwig 2016-07-06  439  
71102307 Christoph Hellwig 2016-07-06  440  /*
0b36658c Sagi Grimberg     2017-07-13  441   * Spread I/O queues completion vectors according their queue index.
0b36658c Sagi Grimberg     2017-07-13  442   * Admin queues can always go on completion vector 0.
71102307 Christoph Hellwig 2016-07-06  443   */
0b36658c Sagi Grimberg     2017-07-13  444  comp_vector = idx == 0 ? idx : idx - 1;
71102307 Christoph Hellwig 2016-07-06  445  
71102307 Christoph Hellwig 2016-07-06  446  /* +1 for ib_stop_cq */
ca6e95bb Sagi Grimberg     2017-05-04  447  queue->ib_cq = ib_alloc_cq(ibdev, queue,
ca6e95bb Sagi Grimberg     2017-05-04  448  cq_factor * queue->queue_size + 1,
ca6e95bb Sagi Grimberg     2017-05-04  449  comp_vector, IB_POLL_SOFTIRQ);
71102307 Christoph Hellwig 2016-07-06  450  if (IS_ERR(queue->ib_cq)) {
71102307 Christoph Hellwig 2016-07-06  451  ret = PTR_ERR(queue->ib_cq);
ca6e95bb Sagi Grimberg     2017-05-04  452  goto out_put_dev;
71102307 Christoph Hellwig 2016-07-06  453  }
71102307 Christoph Hellwig 2016-07-06  454  
71102307 Christoph Hellwig 2016-07-06  455  ret = nvme_rdma_create_qp(queue, send_wr_factor);
71102307 Christoph Hellwig 2016-07-06  456  if (ret)
71102307 Christoph Hellwig 2016-07-06  457  goto out_destroy_ib_cq;
71102307 Christoph Hellwig 2016-07-06  4

[v4 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct

2018-04-14 Thread Yang Shi

mmap_sem is on the hot path of the kernel and is heavily contended, but it
is also abused. It is used to protect arg_start|end and env_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but that doesn't make
sense: those proc files just expect to read 4 values atomically, the values
are not related to VM, and they could be set to arbitrary values by C/R.

And the mmap_sem contention may cause unexpected issues like the one below:

INFO: task ps:14018 blocked for more than 120 seconds.
   Tainted: GE 4.9.79-009.ali3000.alios7.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
 ps  D0 14018  1 0x0004
  885582f84000 885e8682f000 880972943000 885ebf499bc0
  8828ee12 c900349bfca8 817154d0 0040
  00ff812f872a 885ebf499bc0 024000d000948300 880972943000
 Call Trace:
  [] ? __schedule+0x250/0x730
  [] schedule+0x36/0x80
  [] rwsem_down_read_failed+0xf0/0x150
  [] call_rwsem_down_read_failed+0x18/0x30
  [] down_read+0x20/0x40
  [] proc_pid_cmdline_read+0xd9/0x4e0
  [] ? do_filp_open+0xa5/0x100
  [] __vfs_read+0x37/0x150
  [] ? security_file_permission+0x9b/0xc0
  [] vfs_read+0x96/0x130
  [] SyS_read+0x55/0xc0
  [] entry_SYSCALL_64_fastpath+0x1a/0xc5

Both Alexey Dobriyan and Michal Hocko suggested using a dedicated lock
for them to mitigate the abuse of mmap_sem.

So, introduce a new spinlock in mm_struct to protect concurrent access
to arg_start|end, env_start|end and others. Also take mmap_sem for read
instead of write in prctl: the read lock still protects against the race
between prctl and sys_brk, which might break check_data_rlimit(), and it
makes prctl more friendly to other VM operations.

This patch just eliminates the abuse of mmap_sem, but it can't resolve the
above hung task warning completely, since the later access_remote_vm() call
needs to acquire mmap_sem. The mmap_sem scalability issue will be solved in
the future.

Signed-off-by: Yang Shi 
Cc: Alexey Dobriyan 
Cc: Michal Hocko 
Cc: Matthew Wilcox 
Cc: Mateusz Guzik 
Cc: Cyrill Gorcunov 
---
v3 --> v4:
* Protect the value updates with down_read + spin_lock to prevent a race
  condition between prctl and sys_brk, and make prctl more friendly to VM
  operations, per Michal's suggestion

v2 --> v3:
* Restored down_write in prctl syscall
* Elaborated on the limitation of this patch, as suggested by Michal
* Protect those fields by the new lock except brk and start_brk per Michal's
  suggestion
* Based off Cyrill's non PR_SET_MM_MAP operations deprecation patch
  (https://lkml.org/lkml/2018/4/5/541)

v1 --> v2:
* Use spinlock instead of rwlock per Matthew's suggestion
* Replace down_write with down_read in prctl_set_mm (see commit log for details)
 fs/proc/base.c   | 8 
 include/linux/mm_types.h | 2 ++
 kernel/fork.c| 1 +
 kernel/sys.c | 6 --
 mm/init-mm.c | 1 +
 5 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index eafa39a..3551757 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -239,12 +239,12 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
goto out_mmput;
}
 
-   down_read(&mm->mmap_sem);
+   spin_lock(&mm->arg_lock);
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
env_end = mm->env_end;
-   up_read(&mm->mmap_sem);
+   spin_unlock(&mm->arg_lock);
 
BUG_ON(arg_start > arg_end);
BUG_ON(env_start > env_end);
@@ -929,10 +929,10 @@ static ssize_t environ_read(struct file *file, char __user *buf,
if (!mmget_not_zero(mm))
goto free;
 
-   down_read(&mm->mmap_sem);
+   spin_lock(&mm->arg_lock);
env_start = mm->env_start;
env_end = mm->env_end;
-   up_read(&mm->mmap_sem);
+   spin_unlock(&mm->arg_lock);
 
while (count > 0) {
size_t this_len, max_len;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2161234..49dd59e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -413,6 +413,8 @@ struct mm_struct {
unsigned long exec_vm;  /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
unsigned long stack_vm; /* VM_STACK */
unsigned long def_flags;
+
+   spinlock_t arg_lock; /* protect the below fields */
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
diff --git a/kernel/fork.c b/kernel/fork.c
index 242c8c9..295f903 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -900,6 +900,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->pinned_vm = 0;
memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
spin_lock_init(&mm->page_table_lock);
+   spin_lock_init(&mm->arg_lock);
mm_init_cpumask(mm);
mm_init_
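
The locking pattern in the fs/proc/base.c hunks above (take a small dedicated
lock, copy the four values out, drop the lock, then work on the copy) can be
sketched in userspace as follows. This is only an analogy under stated
assumptions: a pthread mutex is swapped in for the kernel spinlock, and
struct mm_like, struct arg_bounds and the helper names are hypothetical, not
kernel API.

```c
#include <pthread.h>

/* Hypothetical userspace stand-in for the four mm_struct fields. */
struct arg_bounds {
	unsigned long arg_start, arg_end, env_start, env_end;
};

struct mm_like {
	pthread_mutex_t arg_lock;	/* plays the role of mm->arg_lock */
	struct arg_bounds bounds;
};

static void mm_like_init(struct mm_like *mm)
{
	static const struct arg_bounds zero;

	pthread_mutex_init(&mm->arg_lock, NULL);
	mm->bounds = zero;
}

/* Writer side: update all four values under the lock. */
static void mm_like_set_bounds(struct mm_like *mm, const struct arg_bounds *b)
{
	pthread_mutex_lock(&mm->arg_lock);
	mm->bounds = *b;
	pthread_mutex_unlock(&mm->arg_lock);
}

/* Reader side: copy out a consistent snapshot, then work on the copy. */
static struct arg_bounds mm_like_get_bounds(struct mm_like *mm)
{
	struct arg_bounds snap;

	pthread_mutex_lock(&mm->arg_lock);
	snap = mm->bounds;
	pthread_mutex_unlock(&mm->arg_lock);
	return snap;
}
```

The critical section is a handful of loads or stores and never sleeps, which
is why a spinlock is a reasonable fit in the kernel version, and why readers
of these fields no longer have to queue behind unrelated mmap_sem holders.
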

Re: syzbot dashboard

2018-04-14 Thread Linus Torvalds

[ Coming back to this now that the merge window is almost over ]

On Mon, Mar 26, 2018 at 1:46 AM, Dmitry Vyukov  wrote:
>
> I've switched emails to links instead of attachments, here are few
> recent examples:
> https://lkml.org/lkml/2018/3/25/31
> https://lkml.org/lkml/2018/3/25/256
> https://lkml.org/lkml/2018/3/25/257

Looks good to me. I notice that only the last one got any replies, though.

I wonder if some people auto-ignore the new reports because of having
been burned by the previous "huge illegible emails" issue.

I do see syzbot fixes in rdma, though, just not for that
cma_listen_on_all issue. So maybe that bug is nastier.

  Linus

[GIT PULL V3] Thermal SoC management updates for v4.17-rc1

2018-04-14 Thread Eduardo Valentin

Hello Linus,

Please find thermal-soc changes for v4.17-rc1.
Rui asked me to send the pull request directly to you
as we are close to the end of the merge window.
Essentially this pull removes the series that caused
the warning regression. I will work with the developer
to get that fixed later on, but I am still sending
the other few patches that are unrelated to it.
Let me know if this causes any issues and whether it
can still be pulled.
Changelog:
- New i.MX7 thermal sensor
- Mediatek driver now supports MT7622 SoC
- Removal of min max cpu cooling DT property

Differences in V3:
- Rebased on top of current linus/master, to avoid any merge issues
from previously pulled thermal code.

Differences in V2:
- Reordered the patches to drop exynos changes for now until we get
agreement on the fix on that driver for the compilation warnings
caused by the confusing conversion functions.


The following changes since commit 48023102b7078a6674516b1fe0d639669336049d:

  Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs (2018-04-13 16:55:41 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal linus

for you to fetch changes up to 15a32df1918259be6c23fc36014fc26ee66c836c:

  dt-bindings: thermal: Remove "cooling-{min|max}-level" properties (2018-04-14 09:37:55 -0700)


Anson Huang (1):
  thermal: imx: add i.MX7 thermal sensor support

Bartlomiej Zolnierkiewicz (1):
  dt-bindings: thermal: remove no longer needed samsung thermal properties

Sean Wang (2):
  dt-bindings: thermal: add binding for MT7622 SoC
  thermal: mediatek: add support for MT7622 SoC

Viresh Kumar (1):
  dt-bindings: thermal: Remove "cooling-{min|max}-level" properties

 .../devicetree/bindings/thermal/exynos-thermal.txt |  23 +-
 .../devicetree/bindings/thermal/imx-thermal.txt|   9 +-
 .../bindings/thermal/mediatek-thermal.txt  |   1 +
 .../devicetree/bindings/thermal/thermal.txt|  16 +-
 drivers/thermal/imx_thermal.c  | 295 -
 drivers/thermal/mtk_thermal.c  |  35 +++
 6 files changed, 281 insertions(+), 98 deletions(-)

[GIT PULL] Kbuild updates for 4.17 (2nd round)

2018-04-14 Thread Masahiro Yamada

Hi Linus,

Please pull more Kbuild updates for v4.17-rc1.
Thanks!


The following changes since commit f605ba97fb80522656c7dce9825a908f1e765b57:

  Merge tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio
(2018-04-06 19:44:27 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git
tags/kbuild-v4.17-2

for you to fetch changes up to 17baab68d337a0bf4654091e2b4cd67c3fdb44d8:

  kconfig: extend output of 'listnewconfig' (2018-04-13 23:23:11 +0900)


Kbuild updates for v4.17 (2nd)

- pass HOSTLDFLAGS when compiling single .c host programs

- build genksyms lexer and parser files instead of using shipped
  versions

- rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency

- let the top .gitignore globally ignore artifacts generated by
  flex, bison, and asn1_compiler

- let the top Makefile globally clean artifacts generated by
  flex, bison, and asn1_compiler

- use safer .SECONDARY marker instead of .PRECIOUS to prevent
  intermediate files from being removed

- support -fmacro-prefix-map option to make __FILE__ a relative path

- fix # escaping to prepare for the future GNU Make release

- clean up deb-pkg by using debian tools instead of handrolled
  source/changes generation

- improve rpm-pkg portability by supporting kernel-install as a
  fallback of new-kernel-pkg

- extend Kconfig listnewconfig target to provide more information


Don Zickus (1):
  kconfig: extend output of 'listnewconfig'

Javier Martinez Canillas (1):
  kbuild: rpm-pkg: use kernel-install as a fallback for new-kernel-pkg

Masahiro Yamada (10):
  .gitignore: move *.lex.c *.tab.[ch] patterns to the top-level .gitignore
  kbuild: clean up *.lex.c and *.tab.[ch] patterns from top-level Makefile
  genksyms: generate lexer and parser during build instead of shipping
  kbuild: add %.lex.c and %.tab.[ch] to 'targets' automatically
  kbuild: add %.dtb.S and %.dtb to 'targets' automatically
  .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore
  kbuild: clean up *-asn1.[ch] patterns from top-level Makefile
  kbuild: rename *-asn1.[ch] to *.asn1.[ch]
  kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers
  kbuild: use -fmacro-prefix-map to make __FILE__ a relative path

Rasmus Villemoes (1):
  Kbuild: fix # escaping in .cmd files for future Make

Riku Voipio (1):
  kbuild: deb-pkg: split generating packaging and build

Robin Jarry (1):
  kbuild: use HOSTLDFLAGS for single .c executables

 .gitignore  |7 +-
 Makefile|5 +
 arch/arc/boot/dts/Makefile  |2 -
 arch/arm/crypto/Makefile|2 +-
 arch/arm64/crypto/Makefile  |2 +-
 arch/sparc/vdso/Makefile|4 +-
 arch/x86/entry/vdso/Makefile|4 +-
 crypto/.gitignore   |1 -
 crypto/Makefile |   14 +-
 crypto/asymmetric_keys/.gitignore   |1 -
 crypto/asymmetric_keys/Makefile |   31 +-
 crypto/asymmetric_keys/mscode_parser.c  |2 +-
 crypto/asymmetric_keys/pkcs7_parser.c   |2 +-
 crypto/asymmetric_keys/x509_cert_parser.c   |4 +-
 crypto/rsa_helper.c |4 +-
 drivers/crypto/qat/qat_common/.gitignore|1 -
 drivers/of/unittest-data/Makefile   |6 -
 net/ipv4/netfilter/Makefile |5 +-
 net/ipv4/netfilter/nf_nat_snmp_basic_main.c |2 +-
 scripts/Kbuild.include  |5 +-
 scripts/Makefile.build  |   23 +-
 scripts/Makefile.host   |2 +-
 scripts/Makefile.lib|   31 +-
 scripts/asn1_compiler.c |2 +-
 scripts/dtc/.gitignore  |3 -
 scripts/dtc/Makefile|5 -
 scripts/genksyms/.gitignore |3 -
 scripts/genksyms/Makefile   |   27 +-
 scripts/genksyms/lex.lex.c_shipped  | 2291 
 scripts/genksyms/parse.tab.c_shipped| 2394 --
 scripts/genksyms/parse.tab.h_shipped|  119 --
 scripts/kconfig/.gitignore  |3 -
 scripts/kconfig/Makefile|4 +-
 scripts/kconfig/conf.c  |   14 +-
 scripts/package/Makefile|   34 +-
 scripts/package/builddeb|  221 +--
 scripts/package/mkdebian|  189 +++
 scripts/package/mkspec  |2 +
 tools/build/Build.include   |5 +-
 tools/objtool/Makefile  |2 +-
 tools/scripts/Makefile.include  |2 +
 41 files c

Re: [PATCH v3] IB: make INFINIBAND_ADDR_TRANS configurable

2018-04-14 Thread kbuild test robot

Hi Greg,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16 next-20180413]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Greg-Thelen/IB-make-INFINIBAND_ADDR_TRANS-configurable/20180414-234042
config: i386-randconfig-x005-201815 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/nvme/host/rdma.o: In function `nvme_rdma_stop_queue':
   drivers/nvme/host/rdma.c:554: undefined reference to `rdma_disconnect'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_free_queue':
   drivers/nvme/host/rdma.c:570: undefined reference to `rdma_destroy_id'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_alloc_queue':
   drivers/nvme/host/rdma.c:511: undefined reference to `__rdma_create_id'
   drivers/nvme/host/rdma.c:523: undefined reference to `rdma_resolve_addr'
   drivers/nvme/host/rdma.c:544: undefined reference to `rdma_destroy_id'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_create_qp':
   drivers/nvme/host/rdma.c:258: undefined reference to `rdma_create_qp'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_create_queue_ib':
   drivers/nvme/host/rdma.c:485: undefined reference to `rdma_destroy_qp'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_addr_resolved':
   drivers/nvme/host/rdma.c:1461: undefined reference to `rdma_resolve_route'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_route_resolved':
   drivers/nvme/host/rdma.c:1512: undefined reference to `rdma_connect'
   drivers/nvme/host/rdma.o: In function `nvme_rdma_conn_rejected':
   drivers/nvme/host/rdma.c:1436: undefined reference to `rdma_reject_msg'
   drivers/nvme/host/rdma.c:1437: undefined reference to `rdma_consumer_reject_data'
   drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_create_ch_ib':
>> drivers/infiniband/ulp/srp/ib_srp.c:585: undefined reference to `rdma_create_qp'
>> drivers/infiniband/ulp/srp/ib_srp.c:647: undefined reference to `rdma_destroy_qp'
   drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_disconnect_target':
>> drivers/infiniband/ulp/srp/ib_srp.c:977: undefined reference to `rdma_disconnect'
   drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_new_rdma_cm_id':
>> drivers/infiniband/ulp/srp/ib_srp.c:336: undefined reference to `__rdma_create_id'
>> drivers/infiniband/ulp/srp/ib_srp.c:345: undefined reference to `rdma_resolve_addr'
>> drivers/infiniband/ulp/srp/ib_srp.c:369: undefined reference to `rdma_destroy_id'
   drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_rdma_lookup_path':
>> drivers/infiniband/ulp/srp/ib_srp.c:790: undefined reference to `rdma_resolve_route'
   drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_send_req':
>> drivers/infiniband/ulp/srp/ib_srp.c:938: undefined reference to `rdma_connect'
   drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_free_ch_ib':
   drivers/infiniband/ulp/srp/ib_srp.c:677: undefined reference to `rdma_destroy_id'
   drivers/infiniband/ulp/srp/ib_srp.o: In function `srp_rdma_cm_handler':
   drivers/infiniband/ulp/srp/ib_srp.c:2808: undefined reference to `rdma_disconnect'

vim +585 drivers/infiniband/ulp/srp/ib_srp.c

7dad6b2e Bart Van Assche   2014-10-21  542  
509c07bc Bart Van Assche   2014-10-30  543  static int srp_create_ch_ib(struct srp_rdma_ch *ch)
aef9ec39 Roland Dreier     2005-11-02  544  {
509c07bc Bart Van Assche   2014-10-30  545  struct srp_target_port *target = ch->target;
62154b2e Bart Van Assche   2014-05-20  546  struct srp_device *dev = target->srp_host->srp_dev;
aef9ec39 Roland Dreier     2005-11-02  547  struct ib_qp_init_attr *init_attr;
73aa89ed Ishai Rabinovitz  2012-11-26  548  struct ib_cq *recv_cq, *send_cq;
73aa89ed Ishai Rabinovitz  2012-11-26  549  struct ib_qp *qp;
d1b4289e Bart Van Assche   2014-05-20  550  struct ib_fmr_pool *fmr_pool = NULL;
5cfb1782 Bart Van Assche   2014-05-20  551  struct srp_fr_pool *fr_pool = NULL;
509c5f33 Bart Van Assche   2016-05-12  552  const int m = 1 + dev->use_fast_reg * target->mr_per_cmd * 2;
aef9ec39 Roland Dreier     2005-11-02  553  int ret;
aef9ec39 Roland Dreier     2005-11-02  554  
aef9ec39 Roland Dreier     2005-11-02  555  init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL);
aef9ec39 Roland Dreier     2005-11-02  556  if (!init_attr)
aef9ec39 Roland Dreier     2005-11-02  557  return -ENOMEM;
aef9ec39 Roland Dreier     2005-11-02  558  
56139

Re: blktest for [PATCH v2] block: do not use interruptible wait anywhere

2018-04-14 Thread Alan Jenkins


On 13/04/18 09:31, Johannes Thumshirn wrote:

Hi Alan,

On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:

# dd if=/dev/sda of=/dev/null iflag=direct & \
   while killall -SIGUSR1 dd; do sleep 0.1; done & \
   echo mem > /sys/power/state ; \
   sleep 5; killall dd  # stop after 5 seconds

Can you please also add a regression test to blktests[1] for this?

[1] https://github.com/osandov/blktests

Thanks,
Johannes


Good question. It would be nice to promote this test.

Template looks like I need the commit (sha1) first.

I had some ideas about automating it, so I wrote a standalone script (see 
end).  I can automate the wakeup by using pm_test, but this is still a 
system suspend test.  Unfortunately I don't think there's any 
alternative. To give the most dire example:


# This test is non-destructive, but it exercises suspend in all drivers.
# If your system has a problem with suspend, it might not wake up again.


So I'm not sure if it would be acceptable for the default set?

How useful is this going to be? Is there an expanded/full set of tests 
that gets run somewhere?


If you can't guarantee it's going to be run somewhere, I'd worry the 
cost/benefit  feels a little narrow :-(. There were one or two further 
"interesting" details, and it might theoretically bitrot if it's not run 
periodically.


If you look at the diff and title for the fix, I don't think it's at 
high risk of being reversed unintentionally.


And I think you can trust users will notice if the fix gets merged away 
accidentally, before it hits -stable releases :-). The issue kills the 
entire GUI session on resume from suspend, say once every three days, on 
gnome-shell (due to Xwayland). One unfortunate user switched to Xorg 
only to find that was also affected.  I honestly assume the issue 
applies generally to laptop systems.  The only mitigating factor is if 
you have RAM to spare, so you don't hit the major pagefaults during resume.


#!/bin/bash

# This test is non-destructive, but it exercises suspend in all drivers.
# If your system has a problem with suspend, it might not wake up again.

# TEST_DEV must be SCSI (inc. libata).
#
# Additionally, this test will abort if $TEST_DEV is too tiny
# and we finish reading it within 3 seconds.  Sorry.
TEST_DEV=sda

# RATIONALE
#
# The original root cause issue was the behaviour around blk_queue_freeze().
# It put tasks into an interruptible wait, which is wrong for block devices.
#
# XXX Insert reference to fix commit XXX
#
# The freeze feature is not directly exposed to userspace, so I can not test
# it directly :(.  (It's used to "guarantee no request is in use, so we can
# change any data structure of the queue afterward".  I.e. freeze, modify the
# queue structure, unfreeze).
#
# However, this led to a regression with a decent reproducer.  In v4.15 the
# same interruptible wait was also used for SCSI suspend/resume.  SCSI resume
# can take a second or so... hence we like to do it asynchronously.  This
# means we can observe the wait at resume time, and we can test if it is
# interruptible.
#
# Note `echo quiesce > /sys/class/scsi_device/*/device/state` can *not*
# trigger the specific wait in the block layer.  That code path only
# sets the SCSI device state; it does not set any block device state.
# (It does not call into blk_queue_freeze() or blk_set_preempt_only();
#  it literally just sets sdev->sdev_state to SDEV_QUIESCE).

set -o nounset

abort() {
echo "$*"
echo "=== Test ERROR ==="
exit 2
}

SYSFS_PM_TEST_DELAY=/sys/module/suspend/parameters/pm_test_delay
SAVED_PM_TEST_DELAY=

# Child process IDs
DD=
SUBSHELL=

cleanup() {
# In many cases the subshell will already have exited...
# and semantics for `wait` are crappy in shell.
# Failure will be harmless in most cases.
# Just try to provide enough context for the user to guess.

echo "Cleaning up"

if [ -n "$SUBSHELL" ]; then

echo "Killing sub-shell PID $SUBSHELL..."
kill $SUBSHELL
wait $SUBSHELL
fi
if [ -n "$DD" ]; then
echo "Killing 'dd' PID $DD..."
kill $DD
wait $DD
fi

echo "Resetting pm_test"
echo none > /sys/power/pm_test

echo "Resetting pm_test_delay"

if [ -n "$SAVED_PM_TEST_DELAY" ]; then
echo "$SAVED_PM_TEST_DELAY" > "$SYSFS_PM_TEST_DELAY"
fi
}
trap cleanup EXIT

# "If a user has disabled async probing a likely reason
#  is due to a storage enclosure that does not inject
#  staggered spin-ups. For safety, make resume
#  synchronous as well in that case."
if ! SCAN="$(cat /sys/module/scsi_mod/parameters/scan)"; then
abort "error reading '/sys/module/scsi_mod/parameters/scan' ?"
fi
if [ "$SCAN" != "async" ]; then
abort "This test does not work if you have set 'scsi_mod.scan=sync'"
fi

# Ignore USR1, in the hope that this applies to child processes.
# This allows us to safely `kill -USR1 $DD`, when we don't know
# whether the child process has fully started yet.
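
The last comment in the script above relies on standard POSIX behaviour: a
signal whose disposition is "ignore" stays ignored in children created by
fork(), and the ignore is preserved across execve(). A minimal C sketch of
the fork() half is below; the function name and return convention are made
up for the example, and it says nothing about what dd itself does with
SIGUSR1 once it is running.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Ignore SIGUSR1, fork a child, and signal it.  Returns 0 if the child
 * exited normally (the ignore was inherited), 1 if it did not, and -1 on
 * a setup error.  The caller's SIGUSR1 disposition is restored.
 */
static int child_survives_usr1(void)
{
	struct sigaction ign, old;
	pid_t pid;
	int status = 0;

	memset(&ign, 0, sizeof(ign));
	ign.sa_handler = SIG_IGN;
	sigemptyset(&ign.sa_mask);
	if (sigaction(SIGUSR1, &ign, &old) < 0)
		return -1;

	pid = fork();
	if (pid == 0) {
		/* Child: the inherited SIG_IGN makes this a no-op. */
		raise(SIGUSR1);
		_exit(0);
	}
	if (pid > 0) {
		/* Parent: safe to signal at any point in the child's life. */
		kill(pid, SIGUSR1);
		if (waitpid(pid, &status, 0) < 0)
			pid = -1;
	}

	sigaction(SIGUSR1, &old, NULL);
	if (pid < 0)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Without the SIG_IGN step, the default action for SIGUSR1 is to terminate the
process, so a signal that arrives before the child has set up its own
handling would kill it.
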

Re: [PATCH net-next] net: introduce a new tracepoint for tcp_rcv_space_adjust

2018-04-14 Thread David Miller


The net-next tree is closed, please resubmit this when the merge window
ends and the net-next tree opens back up.

Thank you.

Re: blktest for [PATCH v2] block: do not use interruptible wait anywhere

2018-04-14 Thread Jens Axboe

On 4/14/18 1:46 PM, Alan Jenkins wrote:
> On 13/04/18 09:31, Johannes Thumshirn wrote:
>> Hi Alan,
>>
>> On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:
>>> # dd if=/dev/sda of=/dev/null iflag=direct & \
>>>while killall -SIGUSR1 dd; do sleep 0.1; done & \
>>>echo mem > /sys/power/state ; \
>>>sleep 5; killall dd  # stop after 5 seconds
>> Can you please also add a regression test to blktests[1] for this?
>>
>> [1] https://github.com/osandov/blktests
>>
>> Thanks,
>>  Johannes
> 
> Good question. It would be nice to promote this test.
> 
> Template looks like I need the commit (sha1) first.
> 
> I had some ideas about automating it, so I wrote a standalone (see 
> end).  I can automate the wakeup by using pm_test, but this is still a 
> system suspend test.  Unfortunately I don't think there's any 
> alternative. To give the most dire example
> 
>  # This test is non-destructive, but it exercises suspend in all drivers.
>  # If your system has a problem with suspend, it might not wake up again.
> 
> 
> So I'm not sure if it would be acceptable for the default set?
> 
> How useful is this going to be? Is there an expanded/full set of tests 
> that gets run somewhere?
> 
> If you can't guarantee it's going to be run somewhere, I'd worry the 
> cost/benefit  feels a little narrow :-(. There were one or two further 
> "interesting" details, and it might theoretically bitrot if it's not run 
> periodically.

I run it, just last week we found two new bugs with it. I'm requiring
anyone that submits block patches to run the test suite, and also
working towards having it be part of the 0-day runs so it gets run
on posted patches automatically.

So yes, it's useful and it won't bitrot. Please do turn it into a blktests
test.

-- 
Jens Axboe

Re: [PATCH v2] block: do not use interruptible wait anywhere

2018-04-14 Thread Jens Axboe

On 4/12/18 12:11 PM, Alan Jenkins wrote:
> When blk_queue_enter() waits for a queue to unfreeze, or unset the
> PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
> 
> The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
> ("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
> device is resumed asynchronously, i.e. after un-freezing userspace tasks.
> 
> So that commit exposed the bug as a regression in v4.15.  A mysterious
> SIGBUS (or -EIO) sometimes happened during the time the device was being
> resumed.  Most frequently, there was no kernel log message, and we saw Xorg
> or Xwayland killed by SIGBUS.[1]
> 
> [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
> 
> Without this fix, I get an IO error in this test:
> 
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd  # stop after 5 seconds
> 
> The interruptible wait was added to blk_queue_enter in
> commit 3ef28e83ab15 ("block: generic request_queue reference counting").
> Before then, the interruptible wait was only in blk-mq, but I don't think
> it could ever have been correct.

Applied, thanks.

Still want that test in blktests, though!

-- 
Jens Axboe
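
The failure mode described in the patch can be modeled in a few lines of userspace C. This is only a toy sketch, not the block layer: `sketch_queue`, `sketch_queue_enter()` and the tick counter are hypothetical names made up for illustration. It shows why a signal that arrives while the queue is still frozen makes an interruptible wait give up, while an uninterruptible wait simply finishes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct sketch_queue {
	int frozen_ticks;	/* ticks left until the queue is usable again */
};

/*
 * Wait for the queue to become usable.
 * signal_tick < 0 means no signal arrives while waiting.
 * Returns 0 on success, or -EINTR if the wait is interruptible and a
 * signal arrives before the queue is ready.
 */
static int sketch_queue_enter(struct sketch_queue *q, int signal_tick,
			      bool interruptible)
{
	int tick;

	assert(q != NULL);
	for (tick = 0; q->frozen_ticks > 0; tick++) {
		if (interruptible && signal_tick >= 0 && tick >= signal_tick)
			return -EINTR;	/* the wait gives up early */
		q->frozen_ticks--;	/* one tick of resume progress */
	}
	return 0;
}
```

In this model the dd test from the patch description corresponds to a small `signal_tick` (SIGUSR1 every 0.1 s) and a large `frozen_ticks` (device still resuming).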

repeatable boot randomness inside KVM guest

2018-04-14 Thread Alexey Dobriyan

SLAB allocators got the CONFIG_SLAB_FREELIST_RANDOM option, which randomizes
the allocation pattern inside a slab:


#ifdef CONFIG_SLAB_FREELIST_RANDOM
/* Pre-initialize the random sequence cache */
static int init_cache_random_seq(struct kmem_cache *s)
{
...

Then I printed actual random sequences for each kmem cache.
Turned out they were all the same for most of the caches and
they didn't vary across guest reboots.

int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count,
			    gfp_t gfp)
{
...
/* Get best entropy at this stage of boot */
prandom_seed_state(&state, get_random_long());

Then I searched the internet and it turned out KVM can pass randomness to
the guest via virtio-rng or something similar. So I linked /dev/urandom.

And it didn't help!

The only way to get randomness for SLAB is to enable RDRAND inside the guest.

Is it a KVM bug?

For the record I'm using qemu 2.11.1-r2 and whatever F27 ships now.
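
The symptom above (identical sequences across reboots) is what a seeded shuffle produces when the seed itself does not vary. A minimal userspace sketch, assuming a Fisher-Yates shuffle and using xorshift32 as a stand-in for the kernel's prandom state; all `sketch_*` names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in PRNG (xorshift32); the kernel uses its own prandom state. */
struct sketch_rnd {
	uint32_t s;
};

static void sketch_seed(struct sketch_rnd *r, uint32_t seed)
{
	r->s = seed ? seed : 1u;	/* xorshift must not start at zero */
}

static uint32_t sketch_next(struct sketch_rnd *r)
{
	uint32_t x = r->s;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	r->s = x;
	return x;
}

/* Fill list[0..count) with 0..count-1, then Fisher-Yates shuffle it. */
static void sketch_freelist_shuffle(unsigned int *list, size_t count,
				    uint32_t seed)
{
	struct sketch_rnd r;
	size_t i;

	assert(list != NULL);
	sketch_seed(&r, seed);
	for (i = 0; i < count; i++)
		list[i] = (unsigned int)i;
	for (i = count; i > 1; i--) {
		size_t j = sketch_next(&r) % i;	/* pick from the unshuffled prefix */
		unsigned int tmp = list[i - 1];

		list[i - 1] = list[j];
		list[j] = tmp;
	}
}
```

Calling `sketch_freelist_shuffle()` twice with the same seed yields the same order, so the randomization is only as good as the entropy available when the seed is drawn.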

[PATCH] ipc: Adding new return type vm_fault_t

2018-04-14 Thread Souptick Joarder

Use new return type vm_fault_t for fault handler.

Signed-off-by: Souptick Joarder 
Reviewed-by: Matthew Wilcox 
---
 ipc/shm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index 4643865..2ba0cfc 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -378,7 +378,7 @@ void exit_shm(struct task_struct *task)
up_write(&shm_ids(ns).rwsem);
 }
 
-static int shm_fault(struct vm_fault *vmf)
+static vm_fault_t shm_fault(struct vm_fault *vmf)
 {
struct file *file = vmf->vma->vm_file;
struct shm_file_data *sfd = shm_file_data(file);
-- 
1.9.1
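
For readers outside mm, a rough userspace sketch of what a dedicated return type buys: the handler signature itself says "fault code", so returning an errno by mistake becomes easier to spot. Everything here is a hypothetical stand-in (`sketch_vm_fault_t`, the `SKETCH_VM_FAULT_*` values, the 4-page mapping); the real definitions live in the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; values are arbitrary for this sketch. */
typedef unsigned int sketch_vm_fault_t;

#define SKETCH_VM_FAULT_SIGBUS	0x0002u
#define SKETCH_VM_FAULT_NOPAGE	0x0100u

struct sketch_vm_fault {
	unsigned long pgoff;	/* page offset of the faulting address */
};

struct sketch_vm_ops {
	/* The handler type documents that a fault code is returned. */
	sketch_vm_fault_t (*fault)(struct sketch_vm_fault *vmf);
};

static sketch_vm_fault_t sketch_fault(struct sketch_vm_fault *vmf)
{
	assert(vmf != NULL);
	if (vmf->pgoff >= 4)	/* outside the made-up 4-page mapping */
		return SKETCH_VM_FAULT_SIGBUS;
	return SKETCH_VM_FAULT_NOPAGE;
}

static const struct sketch_vm_ops sketch_ops = {
	.fault = sketch_fault,
};
```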

[PATCH] kernel: event: core: Change return type to vm_fault_t

2018-04-14 Thread Souptick Joarder

Use new return type vm_fault_t for fault handler and
page_mkwrite handler in struct vm_operations_struct.

Signed-off-by: Souptick Joarder 
Reviewed-by: Matthew Wilcox 
---
 kernel/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 96db9ae..d09f1c4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4918,11 +4918,11 @@ void perf_event_update_userpage(struct perf_event *event)
 }
 EXPORT_SYMBOL_GPL(perf_event_update_userpage);

-static int perf_mmap_fault(struct vm_fault *vmf)
+static vm_fault_t perf_mmap_fault(struct vm_fault *vmf)
 {
struct perf_event *event = vmf->vma->vm_file->private_data;
struct ring_buffer *rb;
-   int ret = VM_FAULT_SIGBUS;
+   vm_fault_t ret = VM_FAULT_SIGBUS;

if (vmf->flags & FAULT_FLAG_MKWRITE) {
if (vmf->pgoff == 0)
--
1.9.1

[PATCH] kernel: relay: Change return type to vm_fault_t

2018-04-14 Thread Souptick Joarder

Use new return type vm_fault_t for fault handler
in struct vm_operations_struct.

Signed-off-by: Souptick Joarder 
Reviewed-by: Matthew Wilcox 
---
 kernel/relay.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/relay.c b/kernel/relay.c
index c302940..a8cdbf7 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -39,7 +39,7 @@ static void relay_file_mmap_close(struct vm_area_struct *vma)
 /*
  * fault() vm_op implementation for relay file mapping.
  */
-static int relay_buf_fault(struct vm_fault *vmf)
+static vm_fault_t relay_buf_fault(struct vm_fault *vmf)
 {
struct page *page;
struct rchan_buf *buf = vmf->vma->vm_private_data;
--
1.9.1

Re: [PATCH] x86/cpufeature: guard asm_volatile_goto usage with CC_HAVE_ASM_GOTO

2018-04-14 Thread Yonghong Song




On 4/14/18 3:11 AM, Peter Zijlstra wrote:

On Fri, Apr 13, 2018 at 01:42:14PM -0700, Alexei Starovoitov wrote:

On 4/13/18 11:19 AM, Peter Zijlstra wrote:

On Tue, Apr 10, 2018 at 02:28:04PM -0700, Alexei Starovoitov wrote:

Instead of
#ifdef CC_HAVE_ASM_GOTO
we can replace it with
#ifndef __BPF__
or some other name,


I would prefer the BPF specific hack; otherwise we might be encouraging
people to build the kernel proper without asm-goto.



I don't understand this concern.


The thing is; this will be a (temporary) BPF specific hack. Hiding it
behind something that looks 'normal' (CC_HAVE_ASM_GOTO) is just not
right.


This is a fair concern. I will use a different macro and send v2 soon.
Thanks.
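
A minimal sketch of the kind of guard being discussed, assuming a made-up macro name: `SKETCH_BPF_BUILD` stands in for whatever BPF-specific macro v2 ends up using, and `sketch_feature_enabled()` is a hypothetical helper, not the kernel's static_cpu_has(). The asm goto statement uses an empty template, so it only marks where a patchable jump would go.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * SKETCH_BPF_BUILD is a hypothetical stand-in for the BPF-specific macro
 * discussed above; define it to compile the portable path only.
 */
#if !defined(SKETCH_BPF_BUILD) && defined(__GNUC__)
#define SKETCH_USE_ASM_GOTO 1
#else
#define SKETCH_USE_ASM_GOTO 0
#endif

static bool sketch_feature_enabled(unsigned long caps, unsigned int bit)
{
	assert(bit < sizeof(caps) * 8);
#if SKETCH_USE_ASM_GOTO
	/*
	 * Empty template: marks where a patchable jump would be emitted.
	 * As written it never jumps and simply falls through.
	 */
	__asm__ goto("" : : : : dynamic_check);
dynamic_check:
#endif
	return (caps >> bit) & 1UL;
}
```

Keying the guard on a BPF-only macro keeps the asm-goto path mandatory for every other build, which is the point Peter raises.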

[PATCH 0/3] Receive Side Coalescing for macb driver

2018-04-14 Thread Rafal Ozieblo

This patch series adds support for receive side coalescing (RSC)
for the Cadence GEM driver. Receive side coalescing is a mechanism
that reduces CPU overhead by coalescing received TCP segments into
a single large message. When the message is complete, the CPU only
has to process a single header and act on one data payload.

Rafal Ozieblo (3):
  net: macb: Add support for rsc capable hardware
  net: macb: Add support for header data spliting
  net: macb: Receive Side Coalescing (RSC) feature added.

 drivers/net/ethernet/cadence/macb.h  |  21 +++
 drivers/net/ethernet/cadence/macb_main.c | 227 ++-
 2 files changed, 212 insertions(+), 36 deletions(-)

-- 
2.4.5

[PATCH 1/3] net: macb: Add support for rsc capable hardware

2018-04-14 Thread Rafal Ozieblo

When pbuf_rsc has been enabled in hardware,
the receive buffer offset for incoming packets
cannot be changed in the network configuration register
(even when RSC is not used at all).

Signed-off-by: Rafal Ozieblo 
---
 drivers/net/ethernet/cadence/macb.h  |  2 ++
 drivers/net/ethernet/cadence/macb_main.c | 22 ++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 8665982..33c9a48 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -477,6 +477,8 @@
 /* Bitfields in DCFG6. */
 #define GEM_PBUF_LSO_OFFSET 27
 #define GEM_PBUF_LSO_SIZE   1
+#define GEM_PBUF_RSC_OFFSET 26
+#define GEM_PBUF_RSC_SIZE   1
 #define GEM_DAW64_OFFSET    23
 #define GEM_DAW64_SIZE      1
 
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index b4c9268..43201a8 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -930,8 +930,9 @@ static void gem_rx_refill(struct macb_queue *queue)
macb_set_addr(bp, desc, paddr);
desc->ctrl = 0;
 
-   /* properly align Ethernet header */
-   skb_reserve(skb, NET_IP_ALIGN);
+   if (!(bp->dev->hw_features & NETIF_F_LRO))
+   /* properly align Ethernet header */
+   skb_reserve(skb, NET_IP_ALIGN);
} else {
desc->addr &= ~MACB_BIT(RX_USED);
desc->ctrl = 0;
@@ -2110,7 +2111,13 @@ static void macb_init_hw(struct macb *bp)
config = macb_mdc_clk_div(bp);
if (bp->phy_interface == PHY_INTERFACE_MODE_SGMII)
config |= GEM_BIT(SGMIIEN) | GEM_BIT(PCSSEL);
-   config |= MACB_BF(RBOF, NET_IP_ALIGN);  /* Make eth data aligned */
+   /* When the pbuf_rsc has been enabled in hardware the receive buffer
+* offset cannot be changed in the network configuration register.
+*/
+   if (!(bp->dev->hw_features &  NETIF_F_LRO))
+   /* Make eth data aligned */
+   config |= MACB_BF(RBOF, NET_IP_ALIGN);
+
config |= MACB_BIT(PAE);/* PAuse Enable */
config |= MACB_BIT(DRFCS);  /* Discard Rx FCS */
if (bp->caps & MACB_CAPS_JUMBO)
@@ -2281,7 +2288,7 @@ static void macb_set_rx_mode(struct net_device *dev)
 static int macb_open(struct net_device *dev)
 {
struct macb *bp = netdev_priv(dev);
-   size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN;
+   size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN;
struct macb_queue *queue;
unsigned int q;
int err;
@@ -2295,6 +2302,9 @@ static int macb_open(struct net_device *dev)
if (!dev->phydev)
return -EAGAIN;
 
+   if (!(bp->dev->hw_features & NETIF_F_LRO))
+   bufsz += NET_IP_ALIGN;
+
/* RX buffers initialization */
macb_init_rx_buffer_size(bp, bufsz);
 
@@ -3365,6 +3375,10 @@ static int macb_init(struct platform_device *pdev)
if (GEM_BFEXT(PBUF_LSO, gem_readl(bp, DCFG6)))
dev->hw_features |= MACB_NETIF_LSO;
 
+   /* Check RSC capability */
+   if (GEM_BFEXT(PBUF_RSC, gem_readl(bp, DCFG6)))
+   dev->hw_features |= NETIF_F_LRO;
+
/* Checksum offload is only available on gem with packet buffer */
if (macb_is_gem(bp) && !(bp->caps & MACB_CAPS_FIFO_MODE))
dev->hw_features |= NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
-- 
2.4.5
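
The two decisions in this patch (detect RSC from DCFG6 bit 26, then leave out the NET_IP_ALIGN padding) can be sketched in userspace as follows. The field offset and size are copied from the patch; the Ethernet constants and every `sketch_*` name are assumptions for illustration (in particular, NET_IP_ALIGN is taken as 2 here, while the kernel value is architecture dependent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch above. */
#define SKETCH_PBUF_RSC_OFFSET	26
#define SKETCH_PBUF_RSC_SIZE	1

/* Assumed values for this sketch; the kernel defines its own. */
#define SKETCH_ETH_HLEN		14
#define SKETCH_ETH_FCS_LEN	4
#define SKETCH_NET_IP_ALIGN	2

/* Extract a size-bit field starting at bit offset from a register value. */
static uint32_t sketch_bfext(uint32_t reg, unsigned int offset,
			     unsigned int size)
{
	assert(size >= 1 && size < 32 && offset + size <= 32);
	return (reg >> offset) & ((UINT32_C(1) << size) - 1);
}

static bool sketch_has_rsc(uint32_t dcfg6)
{
	return sketch_bfext(dcfg6, SKETCH_PBUF_RSC_OFFSET,
			    SKETCH_PBUF_RSC_SIZE) != 0;
}

/* RX buffer size: reserve alignment padding only when RSC is absent. */
static size_t sketch_rx_bufsz(size_t mtu, bool has_rsc)
{
	size_t bufsz = mtu + SKETCH_ETH_HLEN + SKETCH_ETH_FCS_LEN;

	if (!has_rsc)
		bufsz += SKETCH_NET_IP_ALIGN;
	return bufsz;
}
```

With an MTU of 1500 this gives 1520 bytes without RSC and 1518 bytes with it, under the assumed constants.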

[PATCH 2/3] net: macb: Add support for header data spliting

2018-04-14 Thread Rafal Ozieblo

This patch adds support for frames split across
multiple rx buffers. This allows header data splitting,
and also rx buffers shorter than the max frame length.
The only limitation is that the frame header itself
can't be split across buffers.

Signed-off-by: Rafal Ozieblo 
---
 drivers/net/ethernet/cadence/macb.h  |  13 +++
 drivers/net/ethernet/cadence/macb_main.c | 137 +++
 2 files changed, 118 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 33c9a48..a2cb805 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -295,6 +295,8 @@
 /* Bitfields in DMACFG. */
 #define GEM_FBLDO_OFFSET   0 /* fixed burst length for DMA */
 #define GEM_FBLDO_SIZE 5
+#define GEM_HDRS_OFFSET    5 /* Header Data Splitting */
+#define GEM_HDRS_SIZE      1
 #define GEM_ENDIA_DESC_OFFSET  6 /* endian swap mode for management descriptor access */
 #define GEM_ENDIA_DESC_SIZE    1
 #define GEM_ENDIA_PKT_OFFSET   7 /* endian swap mode for packet data access */
@@ -755,8 +757,12 @@ struct gem_tx_ts {
 #define MACB_RX_SOF_SIZE   1
 #define MACB_RX_EOF_OFFSET 15
 #define MACB_RX_EOF_SIZE   1
+#define MACB_RX_HDR_OFFSET 16
+#define MACB_RX_HDR_SIZE   1
 #define MACB_RX_CFI_OFFSET 16
 #define MACB_RX_CFI_SIZE   1
+#define MACB_RX_EOH_OFFSET 17
+#define MACB_RX_EOH_SIZE   1
 #define MACB_RX_VLAN_PRI_OFFSET 17
 #define MACB_RX_VLAN_PRI_SIZE  3
 #define MACB_RX_PRI_TAG_OFFSET 20
@@ -1086,6 +1092,11 @@ struct tsu_incr {
u32 ns;
 };
 
+struct rx_frag_list {
+   struct sk_buff  *skb_head;
+   struct sk_buff  *skb_tail;
+};
+
 struct macb_queue {
struct macb *bp;
int irq;
@@ -1121,6 +1132,8 @@ struct macb_queue {
unsigned int tx_ts_head, tx_ts_tail;
struct gem_tx_ts tx_timestamps[PTP_TS_BUFFER_SIZE];
 #endif
+   struct rx_frag_list rx_frag;
+   u32 rx_frag_len;
 };
 
 struct ethtool_rx_fs_item {
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 43201a8..27c406c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -967,6 +967,13 @@ static void discard_partial_frame(struct macb_queue *queue, unsigned int begin,
 */
 }
 
+void gem_reset_rx_state(struct macb_queue *queue)
+{
+   queue->rx_frag.skb_head = NULL;
+   queue->rx_frag.skb_tail = NULL;
+   queue->rx_frag_len = 0;
+}
+
 static int gem_rx(struct macb_queue *queue, int budget)
 {
struct macb *bp = queue->bp;
@@ -977,6 +984,9 @@ static int gem_rx(struct macb_queue *queue, int budget)
int count = 0;
 
while (count < budget) {
+   struct sk_buff *skb_head, *skb_tail;
+   bool eoh = false, header = false;
+   bool sof, eof;
u32 ctrl;
dma_addr_t addr;
bool rxused;
@@ -995,57 +1005,118 @@ static int gem_rx(struct macb_queue *queue, int budget)
break;
 
queue->rx_tail++;
-   count++;
-
-   if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF))) {
+   skb = queue->rx_skbuff[entry];
+   if (unlikely(!skb)) {
netdev_err(bp->dev,
-  "not whole frame pointed by descriptor\n");
+  "inconsistent Rx descriptor chain\n");
bp->dev->stats.rx_dropped++;
queue->stats.rx_dropped++;
break;
}
-   skb = queue->rx_skbuff[entry];
-   if (unlikely(!skb)) {
+   skb_head = queue->rx_frag.skb_head;
+   skb_tail = queue->rx_frag.skb_tail;
+   sof = !!(ctrl & MACB_BIT(RX_SOF));
+   eof = !!(ctrl & MACB_BIT(RX_EOF));
+   if (GEM_BFEXT(HDRS, gem_readl(bp, DMACFG))) {
+   eoh = !!(ctrl & MACB_BIT(RX_EOH));
+   if (!eof)
+   header = !!(ctrl & MACB_BIT(RX_HDR));
+   }
+
+   queue->rx_skbuff[entry] = NULL;
+   /* Discard if out-of-sequence or header split across buffers */
+   if ((!skb_head /* first frame buffer */
+   && (!sof /* without start of frame */
+   || (header && !eoh))) /* or without whole header */
+   || (skb_head && sof)) { /* or new start before EOF */
+   struct sk_buff *tmp_skb;
+
netdev_err(bp->dev,
-
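
The discard condition in the hunk above is dense, so here it is restated as a standalone predicate. This is a sketch under assumptions: `sketch_rx_desc` and `sketch_should_discard()` are hypothetical names, and `frame_in_progress` stands for "skb_head is non-NULL" in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_rx_desc {
	bool sof;	/* start of frame */
	bool header;	/* buffer holds header data (split mode) */
	bool eoh;	/* end of header reached in this buffer */
};

/* Return true when the buffer cannot be attached to a valid frame. */
static bool sketch_should_discard(bool frame_in_progress,
				  const struct sketch_rx_desc *d)
{
	assert(d != NULL);
	if (!frame_in_progress)
		/* first buffer must start a frame and hold the whole header */
		return !d->sof || (d->header && !d->eoh);
	/* a new start of frame before the previous frame ended */
	return d->sof;
}
```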

[PATCH 3/3] net: macb: Receive Side Coalescing (RSC) feature added.

2018-04-14 Thread Rafal Ozieblo

This is basically the same as Large Receive Offload (LRO)
in Linux framework.

Signed-off-by: Rafal Ozieblo 
---
 drivers/net/ethernet/cadence/macb.h  |  6 +++
 drivers/net/ethernet/cadence/macb_main.c | 70 +++-
 2 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index a2cb805..9ebdde7 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -83,6 +83,7 @@
 #define GEM_USRIO  0x000c /* User IO */
 #define GEM_DMACFG 0x0010 /* DMA Configuration */
 #define GEM_JML    0x0048 /* Jumbo Max Length */
+#define GEM_RSC    0x0058 /* RSC Control */
 #define GEM_HRB    0x0080 /* Hash Bottom */
 #define GEM_HRT    0x0084 /* Hash Top */
 #define GEM_SA1B   0x0088 /* Specific1 Bottom */
@@ -318,6 +319,11 @@
 #define GEM_ADDR64_OFFSET  30 /* Address bus width - 64b or 32b */
 #define GEM_ADDR64_SIZE    1
 
+/* Bitfields in RSC control */
+#define GEM_RSCCTRL_OFFSET 1 /* RSC control */
+#define GEM_RSCCTRL_SIZE   15
+#define GEM_CLRMSK_OFFSET  16 /* RSC clear mask */
+#define GEM_CLRMSK_SIZE    1
 
 /* Bitfields in NSR */
 #define MACB_NSR_LINK_OFFSET   0 /* pcs_link_state */
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 27c406c..92bdcf1 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2377,6 +2377,8 @@ static int macb_open(struct net_device *dev)
 
if (!(bp->dev->hw_features & NETIF_F_LRO))
bufsz += NET_IP_ALIGN;
+   else
+   bufsz = 0xFF * 64; // For RSC Buffer Sizes must be set to 16K.
 
/* RX buffers initialization */
macb_init_rx_buffer_size(bp, bufsz);
@@ -2801,6 +2803,62 @@ static int macb_get_ts_info(struct net_device *netdev,
return ethtool_op_get_ts_info(netdev, info);
 }
 
+static void gem_enable_hdr_data_split(struct macb *bp, bool enable)
+{
+   u32 dmacfg;
+
+   dmacfg = gem_readl(bp, DMACFG);
+   if (enable)
+   dmacfg |= GEM_BIT(HDRS);
+   else
+   dmacfg &= ~GEM_BIT(HDRS);
+   gem_writel(bp, DMACFG, dmacfg);
+}
+
+static void gem_update_rsc_state(struct macb *bp, netdev_features_t feature)
+{
+   u32 rsc_control, rsc_control_new, queue, rsc;
+   bool enable, jumbo, any_enabled = false;
+   struct ethtool_rx_fs_item *item;
+   unsigned long flags;
+   u32 ncfgr;
+
+   enable = (!!(feature & NETIF_F_NTUPLE) && !!(feature & NETIF_F_LRO));
+   rsc = gem_readl(bp, RSC);
+   rsc_control = GEM_BFEXT(RSCCTRL, rsc);
+   rsc_control_new = 0;
+   if (enable) {
+   list_for_each_entry(item, &bp->rx_fs_list.list, list) {
+   queue = item->fs.ring_cookie;
+   rsc_control_new |= (1 << (queue - 1));
+   any_enabled = true;
+   netdev_dbg(bp->dev, "RSC %sabled for queue %u\n",
+  enable ? "en" : "dis", queue);
+   }
+   }
+   if (rsc_control_new != rsc_control) {
+   rsc = GEM_BFINS(RSCCTRL, rsc_control_new, rsc);
+   gem_writel(bp, RSC, rsc);
+   }
+   if (bp->caps & MACB_CAPS_JUMBO) {
+   /* Don't enable jumbo mode for RSC:
+* disable unless not RSC and large MTU
+*/
+   ncfgr = gem_readl(bp, NCFGR);
+   enable = !any_enabled;
+   jumbo = !!MACB_BFEXT(JFRAME, ncfgr);
+   /* and don't touch if already in the state we want */
+   if ((jumbo && !enable) || (!jumbo && enable)) {
+   ncfgr = MACB_BFINS(JFRAME, enable, ncfgr);
+   spin_lock_irqsave(&bp->lock, flags);
+   gem_writel(bp, NCFGR, ncfgr);
+   spin_unlock_irqrestore(&bp->lock, flags);
+   }
+   }
+   /* Need to enable header-data splitting also */
+   gem_enable_hdr_data_split(bp, any_enabled);
+}
+
 static void gem_enable_flow_filters(struct macb *bp, bool enable)
 {
struct ethtool_rx_fs_item *item;
@@ -2969,6 +3027,8 @@ static int gem_add_flow_filter(struct net_device *netdev,
if (netdev->features & NETIF_F_NTUPLE)
gem_enable_flow_filters(bp, 1);
 
+   /* enable RSC if LRO & NTUPLE on */
+   gem_update_rsc_state(bp, netdev->features);
spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
return 0;
 
@@ -3009,6 +3069,7 @@ static int gem_del_flow_filter(struct net_device *netdev,
return 0;
}
}
+   gem_update_rsc_state(bp, netdev->features);
 
spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
return -EINVAL;
@@
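
The register update in gem_update_rsc_state() boils down to "build a per-queue mask, then insert it into the RSCCTRL field". A userspace sketch of that arithmetic, with the field layout copied from the patch. The `sketch_*` names are hypothetical, and the range check on the queue number is an addition of this sketch; the patch shifts by `queue - 1` directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout copied from the patch above: 15 bits starting at bit 1. */
#define SKETCH_RSCCTRL_OFFSET	1
#define SKETCH_RSCCTRL_SIZE	15
#define SKETCH_RSCCTRL_MASK	((UINT32_C(1) << SKETCH_RSCCTRL_SIZE) - 1)

/* Build the per-queue enable mask; queue N maps to bit N - 1. */
static uint32_t sketch_rsc_mask(const unsigned int *queues, size_t n)
{
	uint32_t mask = 0;
	size_t i;

	assert(queues != NULL);
	for (i = 0; i < n; i++) {
		/* Queue 0 has no bit in this layout, so skip out-of-range IDs. */
		if (queues[i] < 1 || queues[i] > SKETCH_RSCCTRL_SIZE)
			continue;
		mask |= UINT32_C(1) << (queues[i] - 1);
	}
	return mask;
}

/* Replace the RSCCTRL field in reg with value, leaving other bits alone. */
static uint32_t sketch_rsc_insert(uint32_t reg, uint32_t value)
{
	assert((value & ~SKETCH_RSCCTRL_MASK) == 0);
	reg &= ~(SKETCH_RSCCTRL_MASK << SKETCH_RSCCTRL_OFFSET);
	return reg | (value << SKETCH_RSCCTRL_OFFSET);
}
```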

Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()

2018-04-14 Thread Al Viro

On Sat, Apr 14, 2018 at 09:36:23AM -0700, Linus Torvalds wrote:
> But it does *not* make sense for the case where we've hit a dentry
> that is already on the shrink list. Sure, we'll continue to gather all
> the other dentries, but if there is concurrent shrinking, shouldn't we
> give up the CPU more eagerly - *particularly* if somebody else is
> waiting (it might be the other process that actually gets rid of the
> shrinking dentries!)?
> 
> So my gut feel is that we should at least try doing something like
> this in select_collect():
> 
> -   if (!list_empty(&data->dispose))
> +   if (data->found)
> ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
> 
> because even if we haven't actually been able to shrink something, if
> we hit an already shrinking entry we should probably at least not do
> the "retry for rename". And if we actually are going to reschedule, we
> might as well start from the beginning.
> 
> I realize that *this* thread might not be making any actual progress
> (because it didn't find any dentries to shrink), but since it did find
> _a_ dentry that is being shrunk, we know the operation itself - on a
> bigger scale - is making progress.
> 
> Hmm?

That breaks d_invalidate(), unfortunately.  Look at the termination
conditions in the loop there...

Re: [PATCH] checkpatch: Add a --strict test for structs with bool member definitions

2018-04-14 Thread Julia Lawall



On Wed, 11 Apr 2018, Joe Perches wrote:

> On Thu, 2018-04-12 at 08:22 +0200, Julia Lawall wrote:
> > On Wed, 11 Apr 2018, Joe Perches wrote:
> > > On Wed, 2018-04-11 at 09:29 -0700, Andrew Morton wrote:
> > > > We already have some 500 bools-in-structs
> > >
> > > I got at least triple that only in include/
> > > so I expect there are at probably an order
> > > of magnitude more than 500 in the kernel.
> > >
> > > I suppose some cocci script could count the
> > > actual number of instances.  A regex can not.
> >
> > I got 12667.
>
> Could you please post the cocci script?
>
> > I'm not sure to understand the issue.  Will using a bitfield help if there
> > are no other bitfields in the structure?
>
> IMO, not really.
>
> The primary issue is described by Linus here:
> https://lkml.org/lkml/2017/11/21/384
>
> I personally do not find a significant issue with
> uncontrolled sizes of bool in kernel structs as
> all of the kernel structs are transitory and not
> written out to storage.
>
> I suppose bool bitfields are also OK, but for the
> RMW required.
>
> Using unsigned int :1 bitfield instead of bool :1
> has the negative of truncation so that the uint
> has to be set with !! instead of a simple assign.

At least with gcc 5.4.0, a number of structures become larger with
unsigned int :1. bool:1 seems to mostly solve this problem.  The structure
ichx_desc, defined in drivers/gpio/gpio-ich.c seems to become larger with
both approaches.

julia
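
A small userspace sketch of the two effects mentioned in this thread: the truncation with `unsigned int :1` that forces `!!`, and the size difference between the two bit-field flavours. The struct and function names are hypothetical, and the size relation checked in `sketch_check_sizes()` is what gcc and clang do on common Linux ABIs; bit-field layout is implementation-defined, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_uint_bits {
	unsigned int a:1;
	unsigned int b:1;
};

struct sketch_bool_bits {
	bool a:1;
	bool b:1;
};

/* Plain assignment to an unsigned 1-bit field keeps only the low bit. */
static unsigned int sketch_store_uint_bit(unsigned int value)
{
	struct sketch_uint_bits s = { 0 };

	s.a = value;		/* 2 becomes 0: the value is truncated */
	return s.a;
}

/* Assignment to a bool bit-field converts to 0 or 1 first. */
static unsigned int sketch_store_bool_bit(unsigned int value)
{
	struct sketch_bool_bits s = { 0 };

	s.a = value;		/* any nonzero value becomes 1 */
	return s.a;
}

/* Layout is implementation-defined; this holds for gcc/clang on Linux. */
static void sketch_check_sizes(void)
{
	assert(sizeof(struct sketch_bool_bits) <=
	       sizeof(struct sketch_uint_bits));
}
```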

Re: 4.15.14 crash with iscsi target and dvd

2018-04-14 Thread Wakko Warner

Ming Lei wrote:
> On Thu, Apr 12, 2018 at 09:43:02PM -0400, Wakko Warner wrote:
> > Ming Lei wrote:
> > > On Tue, Apr 10, 2018 at 08:45:25PM -0400, Wakko Warner wrote:
> > > > Sorry for the delay.  I reverted my change, added this one.  I didn't
> > > > reboot, I just unloaded and loaded this one.
> > > > Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) 
> > > > on the
> > > > target.
> > > > 
> > > > Doesn't crash, however on the initiator I see this:
> > > > [9273849.70] ISO 9660 Extensions: RRIP_1991A
> > > > [9273863.359718] scsi_io_completion: 13 callbacks suppressed
> > > > [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: 
> > > > hostbyte=0x00 driverbyte=0x08
> > > > [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
> > > > [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
> > > > [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 
> > > > f6 96 00 00 80 00
> > > > [9273863.360116] blk_update_request: 13 callbacks suppressed
> > > > [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
> > > > [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: 
> > > > hostbyte=0x00 driverbyte=0x08
> > > > [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
> > > > [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
> > > > [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 
> > > > f7 16 00 00 80 00
> > > > [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912
> > > > 
> > > > To cause this, I mounted the dvd as seen in the first line and ran this
> > > > command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null
> > > > I did some various tests.  Each test was done after umount and mount to
> > > > clear the cache.
> > > > cat  > /dev/null causes the message.
> > > > dd if= of=/dev/null bs=2048 doesn't
> > > > using bs=4096 doesn't
> > > > using bs=64k doesn't
> > > > using bs=128k does
> > > > cat uses a blocksize of 128k.
> > > > 
> > > > The following was done without being mounted.
> > > > ddrescue -f -f /dev/sr1 /dev/null 
> > > > doesn't cause the message
> > > > dd if=/dev/sr1 of=/dev/null bs=128k
> > > > doesn't cause the message
> > > > using bs=256k causes the message once:
> > > > [9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: 
> > > > hostbyte=0x00 driverbyte=0x08
> > > > [9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current] 
> > > > [9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0 
> > > > [9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 
> > > > 00 00 00 00 80 00
> > > > [9275916.857614] blk_update_request: I/O error, dev sr1, sector 0
> > > > 
> > > > If I access the disc from the target natively either by mounting and
> > > > accessing files or working with the device directly (ie dd) no errors 
> > > > are
> > > > logged on the target.
> > > 
> > > OK, thanks for your test.
> > > 
> > > Could you test the following patch and see if there is still the failure
> > > message?
> > > 
> > > diff --git a/drivers/target/target_core_pscsi.c 
> > > b/drivers/target/target_core_pscsi.c
> > > index 0d99b242e82e..6137287b52fb 100644
> > > --- a/drivers/target/target_core_pscsi.c
> > > +++ b/drivers/target/target_core_pscsi.c
> > > @@ -913,9 +913,11 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist 
> > > *sgl, u32 sgl_nents,
> > >  
> > >   rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
> > >   bio, page, bytes, off);
> > > + if (rc != bytes)
> > > + goto fail;
> > >   pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
> > >   bio_segments(bio), nr_vecs);
> > > - if (rc != bytes) {
> > > + if (/*rc != bytes*/0) {
> > >   pr_debug("PSCSI: Reached bio->bi_vcnt max:"
> > >   " %d i: %d bio: %p, allocating another"
> > >   " bio\n", bio->bi_vcnt, i, bio);
> > 
> > Target doesn't crash but the errors on the initiator are still there.
> 
> OK, then this error log isn't related with my commit, because the patch
> I sent to you in last email is to revert my commit simply.
> 
> But the following patch is one correct fix for your crash.
> 
> https://marc.info/?l=linux-kernel&m=152331690727052&w=2

Ok, that'll be the one I used.  Do you know when it'll go upstream?

-- 
 Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
 million bugs.
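
The CDB bytes in the quoted initiator log can be decoded by hand, which ties the failing requests to the block sizes tried above. A minimal sketch, assuming the standard READ(10) layout (opcode 0x28, big-endian LBA in bytes 2..5, transfer length in bytes 7..8) and a 2048-byte device block, which is the usual DVD block size but is not stated in the log; the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_read10 {
	uint32_t lba;		/* logical block address, in device blocks */
	uint16_t nblocks;	/* transfer length, in device blocks */
};

/* Decode a 10-byte READ(10) CDB; multi-byte fields are big-endian. */
static struct sketch_read10 sketch_decode_read10(const uint8_t cdb[10])
{
	struct sketch_read10 r;

	assert(cdb[0] == 0x28);	/* READ(10) opcode */
	r.lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
		((uint32_t)cdb[4] << 8) | cdb[5];
	r.nblocks = (uint16_t)((cdb[7] << 8) | cdb[8]);
	return r;
}

/* Convert a device LBA to 512-byte sectors, as printed by the block layer. */
static uint64_t sketch_lba_to_sector(uint32_t lba, uint32_t block_size)
{
	assert(block_size >= 512 && block_size % 512 == 0);
	return (uint64_t)lba * (block_size / 512);
}
```

For the first logged CDB (28 00 00 22 f6 96 00 00 80 00) this gives LBA 2291350 and 128 blocks. With 2048-byte blocks that is sector 9165400, matching the blk_update_request line, and a transfer of 128 * 2048 = 262144 bytes (256 KiB).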

Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()

2018-04-14 Thread Linus Torvalds

On Sat, Apr 14, 2018 at 1:58 PM, Al Viro  wrote:
>
> That breaks d_invalidate(), unfortunately.  Look at the termination
> conditions in the loop there...

Ugh. I was going to say "but that doesn't even use select_collect()",
but yeah, detach_and_collect() calls it.

It would be easy enough to just change the

if (!list_empty(&data.select.dispose))

there to

if (!list_empty(&data.select.found))

too.

In fact, it probably *should* do that, exactly to get the whole
"cond_resched()" call in that whole call chain too. Because as-is, it
looks like it has the same issue as shrink_dcache_parent() does..

But yeah, the fact that I didn't notice that makes me a bit nervous.
But now I triple-checked, there are no other indirect callers.

Linus
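
To make the difference between the two conditions concrete, here is a standalone model of the decision, not the dcache code itself. The enum and struct only mirror the names used in the thread; `dispose_len` stands in for "the dispose list is non-empty", and all `sketch_*` identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Names mirror the thread, but this is a standalone model. */
enum sketch_walk_ret {
	SKETCH_WALK_CONTINUE,
	SKETCH_WALK_QUIT,
	SKETCH_WALK_NORETRY,
};

struct sketch_select_data {
	int found;		/* dentries on a shrink list or moved to dispose */
	int dispose_len;	/* dentries this walker collected itself */
};

/* Current rule as quoted: only our own dispose list changes the result. */
static enum sketch_walk_ret sketch_collect_current(
	const struct sketch_select_data *d, bool need_resched)
{
	assert(d != NULL);
	if (d->dispose_len > 0)
		return need_resched ? SKETCH_WALK_QUIT : SKETCH_WALK_NORETRY;
	return SKETCH_WALK_CONTINUE;
}

/* Proposed rule: anything counted in found is enough. */
static enum sketch_walk_ret sketch_collect_proposed(
	const struct sketch_select_data *d, bool need_resched)
{
	assert(d != NULL);
	if (d->found > 0)
		return need_resched ? SKETCH_WALK_QUIT : SKETCH_WALK_NORETRY;
	return SKETCH_WALK_CONTINUE;
}
```

The interesting case is `found > 0` with an empty dispose list (another thread owns the dentries): the current rule keeps walking, while the proposed one stops and lets the caller reschedule.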

[PATCH] KVM: Switch 'requests' to be 64-bit (explicitly)

2018-04-14 Thread KarimAllah Ahmed

Switch 'requests' to be explicitly 64-bit and update BUILD_BUG_ON check to
use the size of "requests" instead of the hard-coded '32'.

That gives us a bit more room again for arch-specific requests as we
already ran out of space for x86 due to the hard-coded check.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
---
 include/linux/kvm_host.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6930c63..fe4f46b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -129,7 +129,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQUEST_ARCH_BASE 8
 
 #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
-   BUILD_BUG_ON((unsigned)(nr) >= 32 - KVM_REQUEST_ARCH_BASE); \
+   BUILD_BUG_ON((unsigned)(nr) >= (sizeof(((struct kvm_vcpu *)0)->requests) * 8) - KVM_REQUEST_ARCH_BASE); \
(unsigned)(((nr) + KVM_REQUEST_ARCH_BASE) | (flags)); \
 })
 #define KVM_ARCH_REQ(nr)   KVM_ARCH_REQ_FLAGS(nr, 0)
@@ -223,7 +223,7 @@ struct kvm_vcpu {
int vcpu_id;
int srcu_idx;
int mode;
-   unsigned long requests;
+   u64 requests;
unsigned long guest_debug;
 
int pre_pcpu;
@@ -1122,7 +1122,7 @@ static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
 * caller.  Paired with the smp_mb__after_atomic in kvm_check_request.
 */
smp_wmb();
-   set_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
+   set_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests);
 }
 
 static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
@@ -1132,12 +1132,12 @@ static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu)
 {
-   return test_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
+   return test_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests);
 }
 
 static inline void kvm_clear_request(int req, struct kvm_vcpu *vcpu)
 {
-   clear_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
+   clear_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests);
 }
 
 static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
-- 
2.7.4
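
The idea of deriving the limit from the member's size, rather than hard-coding 32, can be tried in userspace. This is a sketch, not the KVM macros: `sketch_vcpu`, `SKETCH_ARCH_REQ()` and the helpers are hypothetical, the compile-time check uses a negative-array-size trick in place of BUILD_BUG_ON, and the bit operations here are plain (non-atomic), unlike set_bit()/test_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_vcpu {
	uint64_t requests;	/* one bit per pending request */
};

#define SKETCH_REQUEST_ARCH_BASE 8

/* Number of bits in the requests member, derived from its type. */
#define SKETCH_REQUEST_BITS \
	(sizeof(((struct sketch_vcpu *)0)->requests) * CHAR_BIT)

/*
 * Map an arch request number to its bit index. The char array gets a
 * negative size, and the build fails, if nr does not fit.
 */
#define SKETCH_ARCH_REQ(nr) \
	((unsigned int)(sizeof(char[((size_t)(nr) < SKETCH_REQUEST_BITS - \
		SKETCH_REQUEST_ARCH_BASE) ? 1 : -1]) * 0 + \
		(nr) + SKETCH_REQUEST_ARCH_BASE))

static void sketch_make_request(struct sketch_vcpu *v, unsigned int bit)
{
	assert(v != NULL && bit < SKETCH_REQUEST_BITS);
	v->requests |= UINT64_C(1) << bit;	/* non-atomic in this sketch */
}

static int sketch_test_request(const struct sketch_vcpu *v, unsigned int bit)
{
	assert(v != NULL && bit < SKETCH_REQUEST_BITS);
	return (int)((v->requests >> bit) & 1);
}
```

`SKETCH_ARCH_REQ(30)` compiles here because 30 is below 64 - 8; against the old hard-coded limit of 32 - 8 the same request number would have been rejected.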

Re: Regression with 5dcd8400884c ("macsec: missing dev_put() on error in macsec_newlink()")

2018-04-14 Thread Sabrina Dubroca

Hello Laura,

2018-04-14, 10:56:55 -0700, Laura Abbott wrote:
> Hi,
> 
> Fedora got a bug report of a regression when trying to remove
> the macsec module (https://bugzilla.redhat.com/show_bug.cgi?id=1566410).
> I did a bisect and found
> 
> commit 5dcd8400884cc4a043a6d4617e042489e5d566a9
> Author: Dan Carpenter 
> Date:   Wed Mar 21 11:09:01 2018 +0300
> 
> macsec: missing dev_put() on error in macsec_newlink()
> We moved the dev_hold(real_dev); call earlier in the function but forgot
> to update the error paths.
> Fixes: 0759e552bce7 ("macsec: fix negative refcnt on parent link")
> Signed-off-by: Dan Carpenter 
> Signed-off-by: David S. Miller 
> 
> The script I used for testing based on the reporter is attached. It
> looks like modprobe is stuck in the D state. Any idea?

I don't think that reference was actually leaked. It gets released in
macsec_free_netdev() when the device is deleted.

modprobe getting stuck is just a side-effect of the refcount going
negative on the parent device, since removing the module needs to take
the lock that is held by device deletion.
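
To make the failure mode concrete, here is a toy userspace model of the
double put described above. struct fake_dev, fake_hold(), fake_put() and
newlink_error_path() are hypothetical stand-ins for the real netdev
refcounting; the sequence only assumes what is stated above, namely that the
reference taken in newlink is already dropped when the device is freed.

```c
/* Hypothetical stand-in for a net_device reference count. */
struct fake_dev {
	int refcnt;
};

static void fake_hold(struct fake_dev *dev) { dev->refcnt++; }
static void fake_put(struct fake_dev *dev)  { dev->refcnt--; }

/*
 * Model of a failing newlink: the reference is taken early, then the
 * error path runs.  If put_on_error is set, the error path drops the
 * reference itself; the free path drops it unconditionally afterwards.
 * Returns the final count relative to the starting count.
 */
static int newlink_error_path(int put_on_error)
{
	struct fake_dev real_dev = { .refcnt = 0 };

	fake_hold(&real_dev);		/* taken early in newlink */
	if (put_on_error)
		fake_put(&real_dev);	/* extra put on the error path */
	fake_put(&real_dev);		/* put done when the device is freed */

	return real_dev.refcnt;
}
```

Without the extra put the count returns to where it started; with it, the
count ends one below, which is the negative refcount on the parent device.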

I'll send a revert tomorrow.

Thanks for the report,

-- 
Sabrina

Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-14 Thread Raslan, KarimAllah

On Sat, 2018-04-14 at 15:56 +, Raslan, KarimAllah wrote:
> On Thu, 2018-04-12 at 17:12 +0200, KarimAllah Ahmed wrote:
> > 
> > From: Jim Mattson 
> > 
> > For nested virtualization L0 KVM is managing a bit of state for L2 guests,
> > this state can not be captured through the currently available IOCTLs. In
> > fact the state captured through all of these IOCTLs is usually a mix of L1
> > and L2 state. It is also dependent on whether the L2 guest was running at
> > the moment when the process was interrupted to save its state.
> > 
> > With this capability, there are two new vcpu ioctls: KVM_GET_VMX_STATE and
> > KVM_SET_VMX_STATE. These can be used for saving and restoring a VM that is
> > in VMX operation.
> > 
> > Cc: Paolo Bonzini 
> > Cc: Radim Krčmář 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: H. Peter Anvin 
> > Cc: x...@kernel.org
> > Cc: k...@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Jim Mattson 
> > [karahmed@ - rename structs and functions and make them ready for AMD and
> >  address previous comments.
> >- rebase & a bit of refactoring.
> >- Merge 7/8 and 8/8 into one patch.
> >- Force a VMExit from L2 after reading the kvm_state to avoid
> >  mixed state between L1 and L2 on resurrecting the instance. ]
> > Signed-off-by: KarimAllah Ahmed 
> > ---
> > v2 -> v3:
> > - Remove the forced VMExit from L2 after reading the kvm_state. The actual
> >   problem is solved.
> > - Rebase again!
> > - Set nested_run_pending during restore (not sure if it makes sense yet or
> >   not).
> > - Reduce KVM_REQUEST_ARCH_BASE to 7 instead of 8 (the other alternative is
> >   to switch everything to u64)
> > 
> > v1 -> v2:
> > - Rename structs and functions and make them ready for AMD and address
> >   previous comments.
> > - Rebase & a bit of refactoring.
> > - Merge 7/8 and 8/8 into one patch.
> > - Force a VMExit from L2 after reading the kvm_state to avoid mixed state
> >   between L1 and L2 on resurrecting the instance.
> > ---
> >  Documentation/virtual/kvm/api.txt |  47 ++
> >  arch/x86/include/asm/kvm_host.h   |   7 ++
> >  arch/x86/include/uapi/asm/kvm.h   |  38 
> >  arch/x86/kvm/vmx.c| 177 
> > +-
> >  arch/x86/kvm/x86.c|  21 +
> >  include/linux/kvm_host.h  |   2 +-
> >  include/uapi/linux/kvm.h  |   5 ++
> >  7 files changed, 292 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt 
> > b/Documentation/virtual/kvm/api.txt
> > index 1c7958b..c51d5d3 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -3548,6 +3548,53 @@ Returns: 0 on success,
> > -ENOENT on deassign if the conn_id isn't registered
> > -EEXIST on assign if the conn_id is already registered
> >  
> > +4.114 KVM_GET_STATE
> > +
> > +Capability: KVM_CAP_STATE
> > +Architectures: x86
> > +Type: vcpu ioctl
> > +Parameters: struct kvm_state (in/out)
> > +Returns: 0 on success, -1 on error
> > +Errors:
> > +  E2BIG: the data size exceeds the value of 'size' specified by
> > + the user (the size required will be written into size).
> > +
> > +struct kvm_state {
> > +   __u16 flags;
> > +   __u16 format;
> > +   __u32 size;
> > +   union {
> > +   struct kvm_vmx_state vmx;
> > +   struct kvm_svm_state svm;
> > +   __u8 pad[120];
> > +   };
> > +   __u8 data[0];
> > +};
> > +
> > +This ioctl copies the vcpu's kvm_state struct from the kernel to userspace.
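
The E2BIG wording above suggests a probe-then-retry calling pattern. Below is
a rough userspace sketch of that pattern only. fake_get_state() is a
hypothetical stand-in for the ioctl, struct state_hdr keeps just the three
header fields of the quoted struct, and both the 256-byte payload and the
assumption that 'size' counts header plus payload are made up for
illustration. The stand-in returns -E2BIG directly, whereas the real ioctl
would return -1 and set errno.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header fields mirroring the quoted struct kvm_state; payload follows. */
struct state_hdr {
	uint16_t flags;
	uint16_t format;
	uint32_t size;
};

/* Hypothetical stand-in for the ioctl; 256 is an arbitrary payload size. */
static int fake_get_state(struct state_hdr *hdr)
{
	const uint32_t needed = sizeof(*hdr) + 256;

	if (hdr->size < needed) {
		hdr->size = needed;	/* report the required size */
		return -E2BIG;
	}
	memset(hdr + 1, 0xab, needed - sizeof(*hdr));	/* fake payload */
	hdr->size = needed;
	return 0;
}

/* Probe with a header-only buffer, then retry with the reported size. */
static struct state_hdr *fetch_state(void)
{
	struct state_hdr probe = { .size = sizeof(probe) };
	struct state_hdr *full;

	if (fake_get_state(&probe) != -E2BIG)
		return NULL;

	full = calloc(1, probe.size);
	if (!full)
		return NULL;

	full->size = probe.size;
	if (fake_get_state(full) != 0) {
		free(full);
		return NULL;
	}
	return full;	/* caller frees */
}
```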
> > +
> > +4.115 KVM_SET_STATE
> > +
> > +Capability: KVM_CAP_STATE
> > +Architectures: x86
> > +Type: vcpu ioctl
> > +Parameters: struct kvm_state (in)
> > +Returns: 0 on success, -1 on error
> > +
> > +struct kvm_state {
> > +   __u16 flags;
> > +   __u16 format;
> > +   __u32 size;
> > +   union {
> > +   struct kvm_vmx_state vmx;
> > +   struct kvm_svm_state svm;
> > +   __u8 pad[120];
> > +   };
> > +   __u8 data[0];
> > +};
> > +
> > +This copies the vcpu's kvm_state struct from userspace to the kernel.
> > +>>> 13a7c9e... kvm: nVMX: Introduce KVM_CAP_STATE
> >  
> >  5. The kvm_run structure
> >  
> > diff --git a/arch/x86/include/asm/kvm_host.h 
> > b/arch/x86/include/asm/kvm_host.h
> > index 9fa4f57..ad2116a 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -75,6 +75,7 @@
> >  #define KVM_REQ_HV_EXITKVM_ARCH_REQ(21)
> >  #define KVM_REQ_HV_STIMER  KVM_ARCH_REQ(22)
> >  #define KVM_REQ_LOAD_EOI_EXITMAP   KVM_ARCH_REQ(23)
> > +#define KVM_REQ_GET_VMCS12_PAGES   KVM_ARCH_REQ(24)
> >  
> >  #define CR0_RESERVED_BITS   \
> > (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> > @@ -1084,6 +1085,12 @@ struct kvm_x86_ops {
> >  
> > void (*setup_mce)(struct kvm_vc

Re: repeatable boot randomness inside KVM guest

2018-04-14 Thread Andy Lutomirski

On Sat, Apr 14, 2018 at 12:59 PM, Alexey Dobriyan  wrote:
> SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
> allocation pattern inside a slab:
>
>
> #ifdef CONFIG_SLAB_FREELIST_RANDOM
> /* Pre-initialize the random sequence cache */
> static int init_cache_random_seq(struct kmem_cache *s)
> {
> ...
>
> Then I printed actual random sequences for each kmem cache.
> Turned out they were all the same for most of the caches and
> they didn't vary across guest reboots.
>
> int cache_random_seq_create(struct kmem_cache *cachep, unsigned int 
> count, gfp_t gfp)
> {
> ...
> /* Get best entropy at this stage of boot */
> prandom_seed_state(&state, get_random_long());
>
> Then I searched internet and turned out KVM can pass randomness via
> virtio-rng or something. So I linked /dev/urandom.
>
> And it didn't help!
>
> The only way to get randomness for SLAB is to enable RDRAND inside guest.
>
> Is it KVM bug?
>
> For the record I'm using qemu 2.11.1-r2 and whatever F27 ships now.

virtio-rng doesn't really do that.  I have an ancient patch set to do
exactly what you want, and I should dust it off.

Re: [RFC PATCH for 4.18 12/23] cpu_opv: Provide cpu_opv system call (v7)

2018-04-14 Thread Andy Lutomirski

On Thu, Apr 12, 2018 at 12:43 PM, Linus Torvalds
 wrote:
> On Thu, Apr 12, 2018 at 12:27 PM, Mathieu Desnoyers
>  wrote:
>> The cpu_opv system call executes a vector of operations on behalf of
>> user-space on a specific CPU with preemption disabled. It is inspired
>> by readv() and writev() system calls which take a "struct iovec"
>> array as argument.
>
> Do we really want the page pinning?
>
> This whole cpu_opv thing is the most questionable part of the series,
> and the page pinning is the most questionable part of cpu_opv for me.
>
> Can we plan on merging just the plain rseq parts *without* this all
> first, and then see the cpu_opv thing as a "maybe future expansion"
> part.
>
> I think that would make Andy happier too.
>

It only makes me happier if the userspace code involved is actually
going to work when single-stepped, which might actually be the case
(fingers crossed).  That being said, I'm not really convinced that
cpu_opv() makes much difference here, since I'm not entirely convinced
that user code will actually use it or that user code will actually be
that well tested.  C'est la vie.

Re: repeatable boot randomness inside KVM guest

2018-04-14 Thread Theodore Y. Ts'o

+linux...@kvack.org
k...@vger.kernel.org, secur...@kernel.org moved to bcc

On Sat, Apr 14, 2018 at 10:59:21PM +0300, Alexey Dobriyan wrote:
> SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
> allocation pattern inside a slab:
> 
>   int cache_random_seq_create(struct kmem_cache *cachep, unsigned int 
> count, gfp_t gfp)
>   {
>   ...
>   /* Get best entropy at this stage of boot */
>   prandom_seed_state(&state, get_random_long());
>
> Then I printed actual random sequences for each kmem cache.
> Turned out they were all the same for most of the caches and
> they didn't vary across guest reboots.

The problem is that at this super-early stage of the boot path, kernel code
can't allocate memory.  This is something most device drivers kinda
assume they can do.  :-)

So it means we haven't yet initialized the virtio-rng driver, and it's
before interrupts have been enabled, so we can't harvest any entropy
from interrupt timing.  So that's why trying to use virtio-rng didn't
help.
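
To illustrate the consequence: if the seed obtained this early is the same on
every boot, the shuffled order is too.  A minimal userspace sketch, assuming a
toy xorshift generator (toy_rand(), which is NOT the kernel's prandom code)
and a hypothetical shuffle_freelist() helper:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative xorshift32 PRNG; NOT the kernel's prandom_u32_state(). */
static uint32_t toy_rand(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Fill list[] with 0..count-1, then Fisher-Yates shuffle it from a seed. */
static void shuffle_freelist(unsigned int *list, size_t count, uint32_t seed)
{
	uint32_t state = seed ? seed : 1;	/* xorshift must not start at 0 */
	size_t i;

	for (i = 0; i < count; i++)
		list[i] = (unsigned int)i;

	for (i = count; i > 1; i--) {
		size_t j = toy_rand(&state) % i;	/* 0 <= j < i */
		unsigned int tmp = list[i - 1];

		list[i - 1] = list[j];
		list[j] = tmp;
	}
}
```

Two calls with the same seed always produce the same permutation, so the
quality of the shuffle is bounded by the quality of the seed.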

> The only way to get randomness for SLAB is to enable RDRAND inside guest.
> 
> Is it KVM bug?

No, it's not a KVM bug.  The fundamental issue is in how the
CONFIG_SLAB_FREELIST_RANDOM is currently implemented.

What needs to happen is freelist should get randomized much later in
the boot sequence.  Doing it later will require locking; I don't know
enough about the slab/slub code to know whether the slab_mutex would
be sufficient, or some other lock might need to be added.

The other thing I would note is that using prandom_u32_state() doesn't
really provide much security.  In fact, if the goal is to protect
against a malicious attacker trying to guess what addresses will be
returned by the slab allocator, I suspect it's much like the security
patdowns done at airports.  It might protect against a really stupid
attacker, but it's mostly security theater.

The freelist randomization is only being done once; so it's not like
performance is really an issue.  It would be much better to just use
get_random_u32() and be done with it.  I'd drop using prandom_*
functions in slab.c, slub.c and slab_common.c, and just use a
really random number generator, if the goal is real security as
opposed to security for show.

(Not that there's necessarily anything wrong with security theater;
the US spends over 3 billion dollars a year on security theater.  As
politicians know, symbolism can be important.  :-)

Cheers,

- Ted

[GIT PULL] Please pull powerpc/linux.git powerpc-4.17-2 tag

2018-04-14 Thread Michael Ellerman

Hi Linus,

Please pull some powerpc fixes for 4.17-rc1 if you can:

The following changes since commit 49a695ba723224875df50e327bd7b0b65dd9a56b:

  Merge tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (2018-04-07 12:08:19 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.17-2

for you to fetch changes up to 81b654c273914704a4bdf580f28d67aaba1094e4:

  powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features (2018-04-13 23:51:44 +1000)

powerpc fixes for 4.17 #2

 - Fix crashes when loading modules built with a different CONFIG_RELOCATABLE
   value by adding CONFIG_RELOCATABLE to vermagic.

 - Fix busy loops in the OPAL NVRAM driver if we get certain error conditions
   from firmware.

 - Remove tlbie trace points from KVM code that's called in real mode, because
   it causes crashes.

 - Fix checkstops caused by invalid tlbiel on Power9 Radix.

 - Ensure the set of CPU features we "know" are always enabled is actually the
   minimal set when we build with support for firmware supplied CPU features.

Thanks to:
  Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.

Aneesh Kumar K.V (1):
  powerpc/8xx: Fix build with hugetlbfs enabled

Anshuman Khandual (1):
  powerpc/fscr: Enable interrupts earlier before calling get_user()

Michael Ellerman (4):
  powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
  powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
  powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
  powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features

Nicholas Piggin (3):
  powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
  powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
  KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode

 arch/powerpc/include/asm/cputable.h | 23 +++--
 arch/powerpc/include/asm/module.h   | 12 ++-
 arch/powerpc/include/asm/opal.h |  3 +++
 arch/powerpc/kernel/dt_cpu_ftrs.c   | 14 +
 arch/powerpc/kernel/setup_64.c  |  2 +-
 arch/powerpc/kernel/traps.c | 32 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  4 
 arch/powerpc/mm/slice.c |  1 +
 arch/powerpc/mm/tlb-radix.c |  5 ++---
 arch/powerpc/platforms/powernv/opal-nvram.c |  7 ++-
 10 files changed, 63 insertions(+), 40 deletions(-)

Re: repeatable boot randomness inside KVM guest

2018-04-14 Thread Alexey Dobriyan

On Sat, Apr 14, 2018 at 03:41:42PM -0700, Andy Lutomirski wrote:
> On Sat, Apr 14, 2018 at 12:59 PM, Alexey Dobriyan  wrote:
> > SLAB allocators got CONFIG_SLAB_FREELIST_RANDOM option which randomizes
> > allocation pattern inside a slab:
> >
> >
> > #ifdef CONFIG_SLAB_FREELIST_RANDOM
> > /* Pre-initialize the random sequence cache */
> > static int init_cache_random_seq(struct kmem_cache *s)
> > {
> > ...
> >
> > Then I printed actual random sequences for each kmem cache.
> > Turned out they were all the same for most of the caches and
> > they didn't vary across guest reboots.
> >
> > int cache_random_seq_create(struct kmem_cache *cachep, unsigned int 
> > count, gfp_t gfp)
> > {
> > ...
> > /* Get best entropy at this stage of boot */
> > prandom_seed_state(&state, get_random_long());
> >
> > Then I searched internet and turned out KVM can pass randomness via
> > virtio-rng or something. So I linked /dev/urandom.
> >
> > And it didn't help!
> >
> > The only way to get randomness for SLAB is to enable RDRAND inside guest.
> >
> > Is it KVM bug?
> >
> > For the record I'm using qemu 2.11.1-r2 and whatever F27 ships now.
> 
> virtio-rng doesn't really do that.  I have an ancient patch set to do
> exactly what you want, and I should dust it off.

Please, do. Here is a list of caches which aren't exactly randomly
randomized with my setup. Many important ones are there :-(

XXX name 'dma-kmalloc-96', r b1e6718e2e7147d4
XXX name 'dma-kmalloc-192', r a7664a0d69968019
XXX name 'dma-kmalloc-8', r 662c2e986443235c
XXX name 'dma-kmalloc-16', r 770a9b620ae4cd62
XXX name 'dma-kmalloc-32', r 2e200073d5fa9f46
XXX name 'dma-kmalloc-64', r d8538fda83c74168
XXX name 'dma-kmalloc-128', r 9e4b956d09dd7d44
XXX name 'dma-kmalloc-256', r 8b14bcb58f9e18f5
XXX name 'dma-kmalloc-512', r 2bbace4b7120624a
XXX name 'dma-kmalloc-1024', r 7cdf44406db52f5b
XXX name 'dma-kmalloc-2048', r 18fe0ebf6bcfdf43
XXX name 'dma-kmalloc-4096', r 9f1a5eee118facf7
XXX name 'dma-kmalloc-8192', r f514d72a1cc441a2
XXX name 'kmalloc-8192', r 14843df817b556cc
XXX name 'kmalloc-4096', r 52ed85fa9c691bbe
XXX name 'kmalloc-2048', r fa81aa9222ff65a7
XXX name 'kmalloc-1024', r ae355c02d31f21d3
XXX name 'kmalloc-512', r 5fe0d22aaf2ef8d9
XXX name 'kmalloc-256', r 336d07a06917b95
XXX name 'kmalloc-192', r 6b6cd5399dd06d95
XXX name 'kmalloc-128', r 893b9e85369964ab
XXX name 'kmalloc-96', r 179e185395d2612
XXX name 'kmalloc-64', r 29cf688b37eccea7
XXX name 'kmalloc-32', r fb7b4e7dca6de00a
XXX name 'kmalloc-16', r a2a441fdc499d0c7
XXX name 'kmalloc-8', r e5454c7095ddd2be
XXX name 'kmem_cache_node', r 500dc6126a47b229
XXX name 'kmem_cache', r 816c8c7bcde08372
XXX name 'task_group', r c09c4d1c1436ce97
XXX name 'radix_tree_node', r 4dd9540b830a4ea8
XXX name 'pool_workqueue', r 88b1e9d9a1f0b570
XXX name 'Acpi-Namespace', r 3e34d55f8f1cb140
XXX name 'Acpi-State', r b94e04635e77b48a
XXX name 'Acpi-Parse', r d5374863b90f2a4c
XXX name 'Acpi-ParseExt', r eefb2fff892f64a9
XXX name 'Acpi-Operand', r ce51949bcc80af13
XXX name 'pid', r cd6d8ee9e5209156
XXX name 'anon_vma', r c3a9273a68127ac7
XXX name 'anon_vma_chain', r a7cec15033c31a9b
XXX name 'cred_jar', r fe4cc38c6d99cf63
XXX name 'task_struct', r eecb8895c6b7dbdb
XXX name 'sighand_cache', r e5243c5eb2ce3a63
XXX name 'signal_cache', r 88b2e108d8ef81c7
XXX name 'files_cache', r ee29814e58dc909c
XXX name 'fs_cache', r bc700a5f8fc28ff8
XXX name 'mm_struct', r f5230f99c7447359
XXX name 'vm_area_struct', r e30f3f8e648a9f88
XXX name 'nsproxy', r ae7c08b524a0f4d4
XXX name 'uts_namespace', r 6b1266178968ed99
XXX name 'buffer_head', r b24c10679dc55a11
XXX name 'names_cache', r 2e023b54e3ca5b8f
XXX name 'dentry', r 83cc18634fbd74e8
XXX name 'inode_cache', r ff9a0ff3b4665cf5
XXX name 'filp', r 4fdad214b7ca7fc1
XXX name 'mnt_cache', r 8e726d32470b23e0
XXX name 'kernfs_node_cache', r 929c5f56778d365d
XXX name 'bdev_cache', r 8a5520036bd0a464
XXX name 'sigqueue', r 2cf75c4d16191efb
XXX name 'seq_file', r ec3ba1fe514524d5
XXX name 'proc_inode_cache', r b0c76cbbda5bb41f
XXX name 'pde_opener', r 5f82f8e7100a517c
XXX name 'proc_dir_entry', r ebabc4e93b52d7b8
XXX name 'shmem_inode_cache', r 2b25a3eb9aa32973
XXX name 'net_namespace', r 95793a7eae08a33f

Re: [PATCH 17/30] Documentation: kconfig: document a new Kconfig macro language

2018-04-14 Thread Randy Dunlap

On 04/12/18 22:06, Masahiro Yamada wrote:
> Add a document for the macro language introduced to Kconfig.
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
> Changes in v3: None
> Changes in v2: None
> 
>  Documentation/kbuild/kconfig-macro-language.txt | 179 
> 
>  MAINTAINERS |   2 +-
>  2 files changed, 180 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/kbuild/kconfig-macro-language.txt
> 
> diff --git a/Documentation/kbuild/kconfig-macro-language.txt 
> b/Documentation/kbuild/kconfig-macro-language.txt
> new file mode 100644
> index 000..1f6281b
> --- /dev/null
> +++ b/Documentation/kbuild/kconfig-macro-language.txt
> @@ -0,0 +1,179 @@
> +Concept
> +---
> +
> +The basic idea was inspired by Make. When we look at Make, we notice sort of
> +two languages in one. One language describes dependency graphs consisting of
> +targets and prerequisites. The other is a macro language for performing 
> textual
> +substitution.
> +
> +There is clear distinction between the two language stages. For example, you
> +can write a makefile like follows:
> +
> +APP := foo
> +SRC := foo.c
> +CC := gcc
> +
> +$(APP): $(SRC)
> +$(CC) -o $(APP) $(SRC)
> +
> +The macro language replaces the variable references with their expanded form,
> +and handles as if the source file were input like follows:
> +
> +foo: foo.c
> +gcc -o foo foo.c
> +
> +Then, Make analyzes the dependency graph and determines the targets to be
> +updated.
> +
> +The idea is quite similar in Kconfig - it is possible to describe a Kconfig
> +file like this:
> +
> +CC := gcc
> +
> +config CC_HAS_FOO
> +def_bool $(shell $(srctree)/scripts/gcc-check-foo.sh $(CC))
> +
> +The macro language in Kconfig processes the source file into the following
> +intermediate:
> +
> +config CC_HAS_FOO
> +def_bool y
> +
> +Then, Kconfig moves onto the evaluation stage to resolve inter-symbol
> +dependency, which is explained in kconfig-language.txt.
> +
> +
> +Variables
> +-
> +
> +Like in Make, a variable in Kconfig works as a macro variable.  A macro
> +variable is expanded "in place" to yield a text string that may then expanded

   may then be expanded

> +further. To get the value of a variable, enclose the variable name in $( ).
> +As a special case, single-letter variable names can omit the parentheses and 
> is

   and are

> +simply referenced like $X. Unlike Make, Kconfig does not support curly braces
> +as in ${CC}.
> +
> +There are two types of variables: simply expanded variables and recursively
> +expanded variables.
> +
> +A simply expanded variable is defined using the := assignment operator. Its
> +righthand side is expanded immediately upon reading the line from the Kconfig
> +file.
> +
> +A recursively expanded variable is defined using the = assignment operator.
> +Its righthand side is simply stored as the value of the variable without
> +expanding it in any way. Instead, the expansion is performed when the 
> variable
> +is used.
> +
> +There is another type of assignment operator; += is used to append text to a
> +variable. The righthand side of += is expanded immediately if the lefthand
> +side was originally defined as a simple variable. Otherwise, its evaluation 
> is
> +deferred.
> +
> +
> +Functions
> +-
> +
> +Like Make, Kconfig supports both built-in and user-defined functions. A
> +function invocation looks much like a variable reference, but includes one or
> +more parameters separated by commas:
> +
> +  $(function-name arg1, arg2, arg3)
> +
> +Some functions are implemented as a built-in function. Currently, Kconfig
> +supports the following:
> +
> + - $(shell command)
> +
> +  The 'shell' function accepts a single argument that is expanded and passed
> +  to a subshell for execution. The standard output of the command is then 
> read
> +  and returned as the value of the function. Every newline in the output is
> +  replaced with a space. Any trailing newlines are deleted. The standard 
> error
> +  is not returned, nor is any program exit status.
> +
> + - $(warning text)
> +
> +  The 'warning' function prints its arguments to stderr. The output is 
> prefixed
> +  with the name of the current Kconfig file, the current line number. It

  file and the current line number. It

> +  evaluates to an empty string.
> +
> + - $(info text)
> +
> +  The 'info' function is similar to 'warning' except that it sends its 
> argument
> +  to stdout without any Kconfig name or line number.

Are the current Kconfig file name and line number available so that someone
can construct their own $(info message) messages?
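
Going back to the 'shell' function description above, the post-processing of
the command output can be sketched in a few lines of userspace C.  This is
only an illustration of the stated rules, not the Kconfig implementation;
normalize_shell_output() is a hypothetical name.

```c
#include <string.h>

/*
 * Normalize captured command output in place, following the rules
 * quoted above: trailing newlines are dropped, then every remaining
 * newline becomes a space.
 */
static char *normalize_shell_output(char *buf)
{
	size_t len = strlen(buf);
	size_t i;

	while (len > 0 && buf[len - 1] == '\n')
		buf[--len] = '\0';

	for (i = 0; i < len; i++)
		if (buf[i] == '\n')
			buf[i] = ' ';

	return buf;
}
```

So output such as "a\nb\n\n" would become "a b", and output consisting only of
newlines would become the empty string.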

> +
> +A user-defined function is defined by using the = operator. The parameters 
> are
> +referenced w

[GIT PULL] OpenRISC updates for 4.17

2018-04-14 Thread Stafford Horne

Hi Linus,

Please consider for pull,


The following changes since commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda:

  Linux 4.16 (2018-04-01 14:20:27 -0700)

are available in the git repository at:

  git://github.com/openrisc/linux.git tags/for-linus

for you to fetch changes up to d56f3af9e801970d21c57621de3b42bc17eac152:

  openrisc: remove unused __ARCH_HAVE_MMU define (2018-04-08 02:15:47 +0900)


OpenRISC updates for v4.17

Just one small thing here; it came in a while back but I didn't have
anything in my 4.16 queue. Still, it's the only thing for 4.17, so sending
it alone.

Small cleanup:
 - remove unused __ARCH_HAVE_MMU define


Tobias Klauser (1):
  openrisc: remove unused __ARCH_HAVE_MMU define

 arch/openrisc/include/uapi/asm/unistd.h | 2 --
 1 file changed, 2 deletions(-)

Re: [RFC tip/locking/lockdep v6 01/20] lockdep/Documention: Recursive read lock detection reasoning

2018-04-14 Thread Randy Dunlap

Hi,

Just a few typos etc. below...

On 04/11/2018 06:50 AM, Boqun Feng wrote:
> Signed-off-by: Boqun Feng 
> ---
>  Documentation/locking/lockdep-design.txt | 178 
> +++
>  1 file changed, 178 insertions(+)
> 
> diff --git a/Documentation/locking/lockdep-design.txt 
> b/Documentation/locking/lockdep-design.txt
> index 9de1c158d44c..6bb9e90e2c4f 100644
> --- a/Documentation/locking/lockdep-design.txt
> +++ b/Documentation/locking/lockdep-design.txt
> @@ -284,3 +284,181 @@ Run the command and save the output, then compare 
> against the output from
>  a later run of this command to identify the leakers.  This same output
>  can also help you find situations where runtime lock initialization has
>  been omitted.
> +
> +Recursive read locks:
> +-
> +
> +Lockdep now is equipped with deadlock detection for recursive read locks.
> +
> +Recursive read locks, as their name indicates, are the locks able to be
> +acquired recursively. Unlike non-recursive read locks, recursive read locks
> +only get blocked by current write lock *holders* other than write lock
> +*waiters*, for example:
> +
> + TASK A: TASK B:
> +
> + read_lock(X);
> +
> + write_lock(X);
> +
> + read_lock(X);
> +
> +is not a deadlock for recursive read locks, as while the task B is waiting 
> for
> +the lock X, the second read_lock() doesn't need to wait because it's a 
> recursive
> +read lock. However if the read_lock() is non-recursive read lock, then the 
> above
> +case is a deadlock, because even if the write_lock() in TASK B can not get 
> the
> +lock, but it can block the second read_lock() in TASK A.
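
The blocking rule in the paragraph above can be written down as a small
predicate.  This is a toy model only; struct rwlock_model and
read_lock_blocks() are hypothetical names, not lockdep or rwlock code.

```c
#include <stdbool.h>

/* Snapshot of a lock's state as seen by a task about to read_lock() it. */
struct rwlock_model {
	bool writer_holding;	/* some task currently holds the write lock */
	bool writer_waiting;	/* some task is queued in write_lock() */
};

/*
 * Would a read acquisition have to wait?  Recursive readers are only
 * blocked by a write lock holder; non-recursive readers are also
 * blocked by a queued writer.
 */
static bool read_lock_blocks(const struct rwlock_model *lock, bool recursive)
{
	if (lock->writer_holding)
		return true;
	if (!recursive && lock->writer_waiting)
		return true;
	return false;
}
```

In the TASK A / TASK B example, the second read_lock(X) sees a waiting writer
and no holder, so it proceeds if recursive and blocks if not.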
> +
> +Note that a lock can be a write lock (exclusive lock), a non-recursive read
> +lock (non-recursive shared lock) or a recursive read lock (recursive shared
> +lock), depending on the lock operations used to acquire it (more 
> specifically,
> +the value of the 'read' parameter for lock_acquire()). In other words, a 
> single
> +lock instance has three types of acquisition depending on the acquisition
> +functions: exclusive, non-recursive read, and recursive read.
> +
> +To be concise, we call that write locks and non-recursive read locks as
> +"non-recursive" locks and recursive read locks as "recursive" locks.
> +
> +Recursive locks don't block each other, while non-recursive locks do (this is
> +even true for two non-recursive read locks). A non-recursive lock can block 
> the
> +corresponding recursive lock, and vice versa.
> +
> +A deadlock case with recursive locks involved is as follow:
> +
> + TASK A: TASK B:
> +
> + read_lock(X);
> + read_lock(Y);
> + write_lock(Y);
> + write_lock(X);
> +
> +Task A is waiting for task B to read_unlock() Y and task B is waiting for 
> task
> +A to read_unlock() X.
> +
> +Dependency types and strong dependency paths:
> +-
> +In order to detect deadlocks as above, lockdep needs to track different 
> dependencies.
> +There are 4 categories for dependency edges in the lockdep graph:
> +
> +1) -(NN)->: non-recursive to non-recursive dependency. "X -(NN)-> Y" means
> +X -> Y and both X and Y are non-recursive locks.
> +
> +2) -(RN)->: recursive to non-recursive dependency. "X -(RN)-> Y" means
> +X -> Y and X is recursive read lock and Y is non-recursive lock.
> +
> +3) -(NR)->: non-recursive to recursive dependency, "X -(NR)-> Y" means
> +X -> Y and X is non-recursive lock and Y is recursive lock.
> +
> +4) -(RR)->: recursive to recursive dependency, "X -(RR)-> Y" means
> +X -> Y and both X and Y are recursive locks.
> +
> +Note that given two locks, they may have multiple dependencies between them, 
> for example:
> +
> + TASK A:
> +
> + read_lock(X);
> + write_lock(Y);
> + ...
> +
> + TASK B:
> +
> + write_lock(X);
> + write_lock(Y);
> +
> +, we have both X -(RN)-> Y and X -(NN)-> Y in the dependency graph.
> +
> +We use -(*N)-> for edges that is either -(RN)-> or -(NN)->, the similar for 
> -(N*)->,
> +-(*R)-> and -(R*)->
> +
> +A "path" is a series of conjunct dependency edges in the graph. And we 
> define a
> +"strong" path, which indicates the strong dependency throughout each 
> dependency
> +in the path, as the path that doesn't have two conjunct edges (dependencies) 
> as
> +-(*R)-> and -(R*)->. In other words, a "strong" path is a path from a lock
> +walking to another through the lock dependencies, and if X -> Y -> Z in the
> +path (where X, Y, Z are locks), if the walk from X to Y is through a -(NR)-> 
> or
> +-(RR)-> dependency, the walk from Y to Z must not be through a -(RN)-> or
> +-(RR)-> dependency, otherwise it's not a strong path.
> +
> +We will see why the path is called "strong" in next section.
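
The "strong" path definition above can also be stated as a short check over a
sequence of edges.  A minimal sketch, assuming a hypothetical struct dep_edge
that records only whether each end of an edge is a recursive lock; this is
not how lockdep stores dependencies.

```c
#include <stdbool.h>
#include <stddef.h>

/* One dependency edge: how the source and destination locks were acquired. */
struct dep_edge {
	bool src_recursive;	/* first letter:  R if true, N if false */
	bool dst_recursive;	/* second letter: R if true, N if false */
};

/*
 * A path is "strong" unless some edge of the form -(*R)-> is
 * immediately followed by an edge of the form -(R*)->.
 */
static bool path_is_strong(const struct dep_edge *path, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i++)
		if (path[i].dst_recursive && path[i + 1].src_recursive)
			return false;

	return true;
}
```

For example, X -(NR)-> Y -(RN)-> Z is rejected, while X -(NR)-> Y -(NN)-> Z
and X -(RN)-> Y -(RN)-> Z are both accepted.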
> +
> +Recursive Read Deadlock Detection:
> +--
> +

Re: repeatable boot randomness inside KVM guest

2018-04-14 Thread Matthew Wilcox

On Sat, Apr 14, 2018 at 06:44:19PM -0400, Theodore Y. Ts'o wrote:
> What needs to happen is freelist should get randomized much later in
> the boot sequence.  Doing it later will require locking; I don't know
> enough about the slab/slub code to know whether the slab_mutex would
> be sufficient, or some other lock might need to be added.

Could we have the bootloader pass in some initial randomness?

Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()

2018-04-14 Thread Al Viro

On Sat, Apr 14, 2018 at 02:47:21PM -0700, Linus Torvalds wrote:
> On Sat, Apr 14, 2018 at 1:58 PM, Al Viro  wrote:
> >
> > That breaks d_invalidate(), unfortunately.  Look at the termination
> > conditions in the loop there...
> 
> Ugh. I was going to say "but that doesn't even use select_collect()",
> but yeah, detach_and_collect() calls it.
> 
> It would be easy enough to just change the
> 
> if (!list_empty(&data.select.dispose))
> 
> there to
> 
> if (!list_empty(&data.select.found))
> 
> too.

You would have to do the same in check_and_drop() as well,
and that brings back d_invalidate()/d_invalidate() livelock
we used to have.  See 81be24d263db...

I'm trying to put something together, but the damn thing is
full of potential livelocks, unfortunately ;-/  Will send
a followup once I have something resembling a sane solution...

drivers/infiniband/hw/mlx5/main.c:4555: undefined reference to `uverbs_default_get_objects'

2018-04-14 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   18b7fd1c93e5204355ddbf2608a097d64df81b88
commit: 8c84660bb437fe8692e6a2b4e85023ccb874a520 IB/mlx5: Initialize the parsing tree root without the help of uverbs
date:   10 days ago
config: x86_64-randconfig-s5-04150714 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
git checkout 8c84660bb437fe8692e6a2b4e85023ccb874a520
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/infiniband/hw/mlx5/main.o: In function `populate_specs_root':
>> drivers/infiniband/hw/mlx5/main.c:4555: undefined reference to `uverbs_default_get_objects'
>> drivers/infiniband/hw/mlx5/main.c:4559: undefined reference to `uverbs_alloc_spec_tree'
   drivers/infiniband/hw/mlx5/main.o: In function `depopulate_specs_root':
>> drivers/infiniband/hw/mlx5/main.c:4566: undefined reference to `uverbs_free_spec_tree'

vim +4555 drivers/infiniband/hw/mlx5/main.c

  4550  
  4551  #define NUM_TREES   0
  4552  static int populate_specs_root(struct mlx5_ib_dev *dev)
  4553  {
  4554  const struct uverbs_object_tree_def *default_root[NUM_TREES + 1] = {
> 4555  uverbs_default_get_objects()};
  4556  size_t num_trees = 1;
  4557  
  4558  dev->ib_dev.specs_root =
> 4559  uverbs_alloc_spec_tree(num_trees, default_root);
  4560  
  4561  return PTR_ERR_OR_ZERO(dev->ib_dev.specs_root);
  4562  }
  4563  
  4564  static void depopulate_specs_root(struct mlx5_ib_dev *dev)
  4565  {
> 4566  uverbs_free_spec_tree(dev->ib_dev.specs_root);
  4567  }
  4568  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()

2018-04-14 Thread Baoquan He

Hi Dave,

Sorry for the late reply.

On 04/11/18 at 08:48am, Dave Hansen wrote:
> On 04/08/2018 01:20 AM, Baoquan He wrote:
> > On 04/06/18 at 07:50am, Dave Hansen wrote:
> >> The code looks fine to me.  It's a bit of a shame that there's no
> >> verification to ensure that idx_present never goes beyond the shiny new
> >> nr_present_sections. 
> > 
> > This is a good point. Do you think it's OK to replace (section_nr <
> > NR_MEM_SECTIONS) with (section_nr < nr_present_sections) in below
> > for_each macro? This for_each_present_section_nr() is only used
> > during sparse_init() execution.
> > 
> > #define for_each_present_section_nr(start, section_nr)          \
> >         for (section_nr = next_present_section_nr(start-1);     \
> >              ((section_nr >= 0) &&                              \
> >               (section_nr < NR_MEM_SECTIONS) &&                 \
> >               (section_nr <= __highest_present_section_nr));    \
> >              section_nr = next_present_section_nr(section_nr))
> 
> I was more concerned about the loops that "consume" the section maps.
> It seems like they might run over the end of the array.
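
The concern about running off the end of the temporary array could be
guarded with a checked accessor. The following is a minimal user-space
sketch, not kernel code: `struct map_cursor` and `consume_map()` are
hypothetical names introduced here for illustration; only
`nr_present_sections` and `nr_consumed_maps` come from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over a temporary array with one slot per present section. */
struct map_cursor {
	void **slots;             /* the temporary array */
	int nr_present_sections;  /* its capacity */
	int nr_consumed_maps;     /* index of the next slot to hand out */
};

/*
 * Store the next slot in *out and return 0, or return -1 without
 * touching the array once every slot has already been consumed.
 */
static int consume_map(struct map_cursor *c, void **out)
{
	assert(c && out);

	if (c->nr_consumed_maps >= c->nr_present_sections)
		return -1;

	*out = c->slots[c->nr_consumed_maps++];
	return 0;
}
```

With a check of this shape, a consumer that iterates over more sections
than were counted gets an error back instead of reading past the array.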



> 
> >>> @@ -583,6 +592,7 @@ void __init sparse_init(void)
> >>>   unsigned long *usemap;
> >>>   unsigned long **usemap_map;
> >>>   int size;
> >>> + int idx_present = 0;
> >>
> >> I wonder whether idx_present is a good name.  Isn't it the number of
> >> consumed mem_map[]s or usemaps?
> > 
> > > Yeah, in sparse_init(), it's the index of present memory sections, and
> > > also the number of consumed mem_map[]s or usemaps. And I remember you
> > > suggested nr_consumed_maps instead. It seems nr_consumed_maps is a
> > > little long for an array index and would push code lines past 80
> > > chars. How about naming it idx_present in sparse_init() and
> > > nr_consumed_maps in alloc_usemap_and_memmap(), the maps allocation
> > > function? I am also fine with using nr_consumed_maps for all of them.
> 
> Does the large array index make a bunch of lines wrap or something?  If
> not, I'd just use the long name.

I am fine with the long name and will use the 'nr_consumed_maps' you
suggested earlier instead.

> 
> >>>   if (!map) {
> >>>   ms->section_mem_map = 0;
> >>> + idx_present++;
> >>>   continue;
> >>>   }
> >>>  
> >>
> >>
> >> This hunk seems logically odd to me.  I would expect a non-used section
> >> to *not* consume an entry from the temporary array.  Why does it?  The
> >> error and success paths seem to do the same thing.
> > 
> > > Yes, this place is the hardest to understand. The temporary arrays are
> > > allocated beforehand with the size of 'nr_present_sections'. The error
> > > paths you mentioned are caused by allocation failure of mem_map or
> > > map_map, but whether it's the error or the success path, the sections
> > > must have been marked as present in memory_present(). The error or
> > > success happens in alloc_usemap_and_memmap(), while the check for which
> > > path was taken happens in the last for_each_present_section_nr() loop
> > > of sparse_init(), which clears ms->section_mem_map if the error path
> > > was taken. This is the key point of this new allocation way.
> 
> I think you owe some commenting because this is so hard to understand.

I can write a code comment above sparse_init() based on this patch's
git log. Do you think that's OK?

Honestly, it took me several days to write the code, while I spent more
than a week writing the patch log. Writing patch logs is really a
headache for me.

Thanks
Baoquan
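
The allocation scheme discussed above can be modeled in a few lines of
user-space C. This is only a sketch under stated assumptions, not the
kernel code: `NR_SECTIONS`, `present[]`, `backing[]`, `count_present()`,
`alloc_maps()` and `init_sections()` are hypothetical names, and an
allocation failure is simulated with a parameter. It illustrates why the
counter advances on both the error and the success path: the temporary
array has one slot per present section, so a section whose allocation
failed still owns a slot (holding NULL) that has to be stepped over.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SECTIONS 8   /* hypothetical total number of sections */

/* Which sections are present; stands in for the memory_present() marking. */
static const int present[NR_SECTIONS] = { 1, 0, 1, 1, 0, 0, 1, 0 };

static int backing[NR_SECTIONS];  /* dummy storage the "maps" point into */

/* Count the present sections; this is the size of the temporary array. */
static int count_present(void)
{
	int n = 0;

	for (int i = 0; i < NR_SECTIONS; i++)
		n += present[i];
	return n;
}

/*
 * Phase 1: fill one slot per present section, in section order.
 * A simulated allocation failure for section fail_nr leaves NULL in
 * that section's slot, but the slot is still consumed.
 */
static void alloc_maps(int **map_map, int fail_nr)
{
	int nr_consumed_maps = 0;

	for (int nr = 0; nr < NR_SECTIONS; nr++) {
		if (!present[nr])
			continue;
		map_map[nr_consumed_maps++] =
			(nr == fail_nr) ? NULL : &backing[nr];
	}
}

/*
 * Phase 2: walk the present sections again, consuming slots in the
 * same order. Returns the number of sections that got a usable map.
 */
static int init_sections(int **map_map, int **section_map)
{
	int nr_consumed_maps = 0;
	int usable = 0;

	for (int nr = 0; nr < NR_SECTIONS; nr++) {
		section_map[nr] = NULL;
		if (!present[nr])
			continue;            /* absent sections own no slot */

		int *map = map_map[nr_consumed_maps];
		if (!map) {
			nr_consumed_maps++;  /* error path still steps past its slot */
			continue;
		}
		section_map[nr] = map;
		nr_consumed_maps++;          /* success path */
		usable++;
	}

	/* Every slot of the temporary array was consumed exactly once. */
	assert(nr_consumed_maps == count_present());
	return usable;
}
```

If the error path did not advance the counter, every later present
section would read the slot of its predecessor, so one failed allocation
would shift all the maps that follow it.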
