On 04/16/2018 10:33 AM, Eric Dumazet wrote:
> Some networks can make sure TCP payload can exactly fit 4KB pages,
> with well chosen MSS/MTU and architectures.
>
> Implement mmap() system call so that applications can avoid
> copying data without complex splice() games.
>
> Note that a successful mmap( X bytes) on TCP socket is consuming
> bytes, as if recvmsg() has been done. (tp->copied += X)
>

Oh well, I should have run this code with LOCKDEP enabled :/

[ 974.320412] ======================================================
[ 974.326631] WARNING: possible circular locking dependency detected
[ 974.332816] 4.16.0-dbx-DEV #40 Not tainted
[ 974.336927] ------------------------------------------------------
[ 974.343107] b78299096/15790 is trying to acquire lock:
[ 974.348246] 000000006074c9cf (sk_lock-AF_INET6){+.+.}, at: tcp_mmap+0x7c/0x550
[ 974.355505] but task is already holding lock:
[ 974.361366] 000000008dbe063b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x99/0x100
[ 974.368801] which lock already depends on the new lock.
[ 974.377010] the existing dependency chain (in reverse order) is:
[ 974.384501] -> #1 (&mm->mmap_sem){++++}:
[ 974.389911]        __might_fault+0x68/0x90
[ 974.394025]        _copy_from_user+0x23/0xa0
[ 974.398311]        sock_setsockopt+0x4a2/0xac0
[ 974.402761]        __sys_setsockopt+0xd9/0xf0
[ 974.407118]        SyS_setsockopt+0xe/0x20
[ 974.411242]        do_syscall_64+0x6e/0x1a0
[ 974.415431]        entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 974.421011] -> #0 (sk_lock-AF_INET6){+.+.}:
[ 974.426690]        lock_acquire+0x95/0x1e0
[ 974.430813]        lock_sock_nested+0x71/0xa0
[ 974.435196]        tcp_mmap+0x7c/0x550
[ 974.438940]        sock_mmap+0x23/0x30
[ 974.442695]        mmap_region+0x3a4/0x5d0
[ 974.446808]        do_mmap+0x313/0x530
[ 974.450571]        vm_mmap_pgoff+0xc7/0x100
[ 974.454769]        ksys_mmap_pgoff+0x1d5/0x260
[ 974.459247]        SyS_mmap+0x1b/0x30
[ 974.462936]        do_syscall_64+0x6e/0x1a0
[ 974.467114]        entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 974.472678] other info that might help us debug this:
[ 974.480677]  Possible unsafe locking scenario:
[ 974.486600]        CPU0                    CPU1
[ 974.491152]        ----                    ----
[ 974.495684]   lock(&mm->mmap_sem);
[ 974.499089]                                lock(sk_lock-AF_INET6);
[ 974.505285]                                lock(&mm->mmap_sem);
[ 974.511211]   lock(sk_lock-AF_INET6);
[ 974.514885]  *** DEADLOCK ***
[ 974.520825] 1 lock held by b78299096/15790:
[ 974.525018]  #0: 000000008dbe063b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x99/0x100
[ 974.532852] stack backtrace:
[ 974.537224] CPU: 25 PID: 15790 Comm: b78299096 Not tainted 4.16.0-dbx-DEV #40
[ 974.544371] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[ 974.551333] Call Trace:
[ 974.553792]  dump_stack+0x70/0xa5
[ 974.557111]  print_circular_bug.isra.39+0x1d8/0x1e6
[ 974.561982]  __lock_acquire+0x1284/0x1340
[ 974.565992]  ? tcp_mmap+0x7c/0x550
[ 974.569419]  lock_acquire+0x95/0x1e0
[ 974.573011]  ? lock_acquire+0x95/0x1e0
[ 974.576767]  ? tcp_mmap+0x7c/0x550
[ 974.580167]  lock_sock_nested+0x71/0xa0
[ 974.584023]  ? tcp_mmap+0x7c/0x550
[ 974.587437]  tcp_mmap+0x7c/0x550
[ 974.590677]  sock_mmap+0x23/0x30
[ 974.593909]  mmap_region+0x3a4/0x5d0
[ 974.597506]  do_mmap+0x313/0x530
[ 974.600749]  vm_mmap_pgoff+0xc7/0x100
[ 974.604414]  ksys_mmap_pgoff+0x1d5/0x260
[ 974.608341]  ? fd_install+0x25/0x30
[ 974.611849]  ? trace_hardirqs_on_caller+0xef/0x180
[ 974.616641]  SyS_mmap+0x1b/0x30
[ 974.619804]  do_syscall_64+0x6e/0x1a0
[ 974.623462]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 974.628549] RIP: 0033:0x433749
[ 974.631600] RSP: 002b:00007ffd29fdb438 EFLAGS: 00000216 ORIG_RAX: 0000000000000009
[ 974.639197] RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000433749
[ 974.646323] RDX: 0000000000000008 RSI: 0000000000004000 RDI: 0000000020ab7000
[ 974.653463] RBP: 00007ffd29fdb460 R08: 0000000000000003 R09: 0000000000000000
[ 974.660603] R10: 0000000000000012 R11: 0000000000000216 R12: 0000000000401670
[ 974.667737] R13: 0000000000401700 R14: 0000000000000000 R15: 0000000000000000

I am not sure we can keep the mmap() API, since we probably need to
lock the socket first, then grab the VM semaphore (mmap_sem).
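
To illustrate the ordering concern, here is a minimal userspace sketch, not kernel code. It assumes two pthread mutexes as stand-ins for the locks in the splat; the names sk_lock_demo, mmap_sem_demo, setsockopt_like(), zerocopy_like() and run_demo() are all hypothetical. Both paths take the "socket" lock first and the "mm" lock second, which is the order the setsockopt() chain (#1 above) already establishes, so the ABBA cycle reported by lockdep cannot form.

```c
/* Userspace sketch only: pthread mutexes stand in for the kernel's
 * socket lock and mmap_sem. All names here are hypothetical. */
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t sk_lock_demo  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mmap_sem_demo = PTHREAD_MUTEX_INITIALIZER;
static int completed; /* protected by mmap_sem_demo */

/* Models chain #1: the socket lock is held while a user copy
 * faults and needs the mm lock. Order: socket, then mm. */
static void *setsockopt_like(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&sk_lock_demo);
    pthread_mutex_lock(&mmap_sem_demo);
    completed++;
    pthread_mutex_unlock(&mmap_sem_demo);
    pthread_mutex_unlock(&sk_lock_demo);
    return NULL;
}

/* Models a receive path that follows the same order (socket, then
 * mm) instead of the mmap() order (mm, then socket) from chain #0. */
static void *zerocopy_like(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&sk_lock_demo);
    pthread_mutex_lock(&mmap_sem_demo);
    completed++;
    pthread_mutex_unlock(&mmap_sem_demo);
    pthread_mutex_unlock(&sk_lock_demo);
    return NULL;
}

/* Runs both paths concurrently; returns how many finished, or -1
 * if a thread could not be created. */
int run_demo(void)
{
    pthread_t a, b;

    completed = 0;
    if (pthread_create(&a, NULL, setsockopt_like, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, zerocopy_like, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    assert(completed == 2);
    return completed;
}
```

Calling run_demo() from a small main() and building with -pthread should return 2. Swapping the two lock calls in zerocopy_like() would recreate the inverted order that lockdep flags above, where a deadlock becomes possible depending on timing.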