I've done some tests using ceph-fuse and it looks far more stable. I've not 
experienced any issues so far with ceph-fuse mount point over nfs. Will do more 
stress testing over and update. 

Anyone experiencing issues with hang tasks using ceph kernel module mount 
method? 

Thanks 
----- Original Message -----

> From: "Andrei Mikhailovsky" <and...@arhont.com>
> To: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Friday, 28 November, 2014 12:02:57 PM
> Subject: Re: [ceph-users] Giant + nfs over cephfs hang tasks

> I've just tried the latest ubuntu-vivid kernel and also seeing hang
> tasks with dd tests:

> [ 3721.026421] INFO: task nfsd:16596 blocked for more than 120
> seconds.
> [ 3721.065141] Not tainted 3.17.4-031704-generic #201411211317
> [ 3721.103721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 3721.180409] nfsd D 0000000000000009 0 16596 2 0x00000000
> [ 3721.180412] ffff88006f0cbc18 0000000000000046 ffff88006f0cbbc8
> ffffffff8109677f
> [ 3721.180414] ffff88006f0cbfd8 00000000000145c0 ffff88067089f700
> 00000000000145c0
> [ 3721.180417] ffff880673dac600 ffff88045adfbc00 ffff8801bdf8be40
> ffff88000b841500
> [ 3721.180420] Call Trace:
> [ 3721.180423] [<ffffffff8109677f>] ? set_groups+0x2f/0x60
> [ 3721.180427] [<ffffffff817a20c9>] schedule+0x29/0x70
> [ 3721.180440] [<ffffffff817a23ee>]
> schedule_preempt_disabled+0xe/0x10
> [ 3721.180443] [<ffffffff817a429d>] __mutex_lock_slowpath+0xcd/0x1d0
> [ 3721.180447] [<ffffffff817a43c3>] mutex_lock+0x23/0x37
> [ 3721.180454] [<ffffffffc071cadd>] nfsd_setattr+0x15d/0x2a0 [nfsd]
> [ 3721.180460] [<ffffffffc0727d2e>] nfsd4_setattr+0x14e/0x180 [nfsd]
> [ 3721.180467] [<ffffffffc0729eac>] nfsd4_proc_compound+0x4cc/0x730
> [nfsd]
> [ 3721.180478] [<ffffffffc0715e55>] nfsd_dispatch+0xe5/0x230 [nfsd]
> [ 3721.180491] [<ffffffffc05b9882>] ? svc_tcp_adjust_wspace+0x12/0x30
> [sunrpc]
> [ 3721.180500] [<ffffffffc05b8694>] svc_process_common+0x324/0x680
> [sunrpc]
> [ 3721.180510] [<ffffffffc05b8d43>] svc_process+0x103/0x160 [sunrpc]
> [ 3721.180516] [<ffffffffc07159c7>] nfsd+0x117/0x190 [nfsd]
> [ 3721.180526] [<ffffffffc07158b0>] ? nfsd_destroy+0x80/0x80 [nfsd]
> [ 3721.180528] [<ffffffff81093359>] kthread+0xc9/0xe0
> [ 3721.180533] [<ffffffff81093290>] ? flush_kthread_worker+0x90/0x90
> [ 3721.180536] [<ffffffff817a64bc>] ret_from_fork+0x7c/0xb0
> [ 3721.180540] [<ffffffff81093290>] ? flush_kthread_worker+0x90/0x90
> [ 3721.180577] INFO: task kworker/2:3:28061 blocked for more than 120
> seconds.
> [ 3721.221450] Not tainted 3.17.4-031704-generic #201411211317
> [ 3721.261440] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 3721.341394] kworker/2:3 D 0000000000000002 0 28061 2 0x00000000
> [ 3721.341408] Workqueue: ceph-trunc ceph_vmtruncate_work [ceph]
> [ 3721.341409] ffff8805a6507b08 0000000000000046 ffff8805a6507b88
> ffffea00040e8c80
> [ 3721.341412] ffff8805a6507fd8 00000000000145c0 ffff88067089b480
> 00000000000145c0
> [ 3721.341414] ffff8801fffec600 ffff880102535a00 ffff88000b8415a8
> ffff88046fc94ec0
> [ 3721.341417] Call Trace:
> [ 3721.341421] [<ffffffff817a2970>] ? bit_wait+0x50/0x50
> [ 3721.341424] [<ffffffff817a20c9>] schedule+0x29/0x70
> [ 3721.341427] [<ffffffff817a219f>] io_schedule+0x8f/0xd0
> [ 3721.341430] [<ffffffff817a299b>] bit_wait_io+0x2b/0x50
> [ 3721.341433] [<ffffffff817a2656>] __wait_on_bit_lock+0x76/0xb0
> [ 3721.341438] [<ffffffff811756b5>] ? find_get_entries+0xe5/0x160
> [ 3721.341440] [<ffffffff8117245e>] __lock_page+0xae/0xb0
> [ 3721.341446] [<ffffffff810b3fd0>] ?
> wake_atomic_t_function+0x40/0x40
> [ 3721.341451] [<ffffffff81183226>]
> truncate_inode_pages_range+0x446/0x700
> [ 3721.341455] [<ffffffff81183565>] truncate_inode_pages+0x15/0x20
> [ 3721.341457] [<ffffffff811835bc>] truncate_pagecache+0x4c/0x70
> [ 3721.341464] [<ffffffffc09f815e>]
> __ceph_do_pending_vmtruncate+0xde/0x230 [ceph]
> [ 3721.341470] [<ffffffffc09f8c73>] ceph_vmtruncate_work+0x23/0x50
> [ceph]
> [ 3721.341476] [<ffffffff8108cece>] process_one_work+0x14e/0x460
> [ 3721.341479] [<ffffffff8108d84b>] worker_thread+0x11b/0x3f0
> [ 3721.341482] [<ffffffff8108d730>] ? create_worker+0x1e0/0x1e0
> [ 3721.341485] [<ffffffff81093359>] kthread+0xc9/0xe0
> [ 3721.341487] [<ffffffff81093290>] ? flush_kthread_worker+0x90/0x90
> [ 3721.341490] [<ffffffff817a64bc>] ret_from_fork+0x7c/0xb0
> [ 3721.341492] [<ffffffff81093290>] ? flush_kthread_worker+0x90/0x90

> They do not happen with every dd test, but happen pretty often.
> Especially when I am running a few dd tests concurrently.

> An example test that generated hang tasks above after just 2 runs:

> # dd if=/dev/zero of=/tmp/cephfs/4G bs=1M count=4K oflag=direct & dd
> if=/dev/zero of=/tmp/cephfs/4G1 bs=1M count=4K oflag=direct & dd
> if=/dev/zero of=/tmp/cephfs/4G2 bs=1M count=4K oflag=direct & dd
> if=/dev/zero of=/tmp/cephfs/4G3 bs=1M count=4K oflag=direct &

> Cheers

> ----- Original Message -----

> > From: "Andrei Mikhailovsky" <and...@arhont.com>
> 
> > To: "ceph-users" <ceph-users@lists.ceph.com>
> 
> > Sent: Friday, 28 November, 2014 11:22:07 AM
> 
> > Subject: Re: [ceph-users] Giant + nfs over cephfs hang tasks
> 

> > I am also noticing some delays working with nfs over cephfs.
> > Especially when making an initial connection. For instance, I run
> > the following:
> 

> > # time for i in {0..10} ; do time touch /tmp/cephfs/test-$i ; done
> 

> > where /tmp/cephfs is the nfs mount point running over cephfs
> 

> > I am noticing that the first touch file is created after about
> > 20-30
> > seconds. All the following files files are created with no delay.
> 

> > If I run the command once again, all files are created pretty
> > quickly
> > without any delay. However, if I wait 20-30 minutes and run the
> > command again, the delay with the first file is back again.
> 

> > Has anyone experienced similar issues?
> 

> > Andrei
> 
> > ----- Original Message -----
> 

> > > From: "Andrei Mikhailovsky" <and...@arhont.com>
> > 
> 
> > > To: "ceph-users" <ceph-users@lists.ceph.com>
> > 
> 
> > > Sent: Friday, 28 November, 2014 9:08:17 AM
> > 
> 
> > > Subject: [ceph-users] Giant + nfs over cephfs hang tasks
> > 
> 

> > > Hello guys,
> > 
> 

> > > I've got a bunch of hang tasks of the nfsd service running over
> > > the
> > > cephfs (kernel) mounted file system. Here is an example of one of
> > > them.
> > 
> 

> > > [433079.991218] INFO: task nfsd:32625 blocked for more than 120
> > > seconds.
> > 
> 
> > > [433080.029685] Not tainted 3.15.10-031510-generic #201408132333
> > 
> 
> > > [433080.068036] "echo 0 >
> > > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables this message.
> > 
> 
> > > [433080.144235] nfsd D 000000000000000a 0 32625 2 0x00000000
> > 
> 
> > > [433080.144241] ffff8801a94dba78 0000000000000002
> > > ffff8801a94dba38
> > > ffff8801a94dbfd8
> > 
> 
> > > [433080.144244] 0000000000014540 0000000000014540
> > > ffff880673d63260
> > > ffff880491d264c0
> > 
> 
> > > [433080.144247] ffff8801a94dba78 ffff88067fd14e40
> > > ffff880491d264c0
> > > ffffffff8115dff0
> > 
> 
> > > [433080.144250] Call Trace:
> > 
> 
> > > [433080.144260] [<ffffffff8115dff0>] ? __lock_page+0x70/0x70
> > 
> 
> > > [433080.144274] [<ffffffff81778449>] schedule+0x29/0x70
> > 
> 
> > > [433080.144279] [<ffffffff8177851f>] io_schedule+0x8f/0xd0
> > 
> 
> > > [433080.144282] [<ffffffff8115dffe>] sleep_on_page+0xe/0x20
> > 
> 
> > > [433080.144286] [<ffffffff81778be2>] __wait_on_bit+0x62/0x90
> > 
> 
> > > [433080.144288] [<ffffffff8115eacb>] ?
> > > find_get_pages_tag+0xcb/0x170
> > 
> 
> > > [433080.144291] [<ffffffff8115e160>] wait_on_page_bit+0x80/0x90
> > 
> 
> > > [433080.144296] [<ffffffff810b54a0>] ?
> > > wake_atomic_t_function+0x40/0x40
> > 
> 
> > > [433080.144299] [<ffffffff8115e334>]
> > > filemap_fdatawait_range+0xf4/0x180
> > 
> 
> > > [433080.144302] [<ffffffff8116027d>]
> > > filemap_write_and_wait_range+0x4d/0x80
> > 
> 
> > > [433080.144315] [<ffffffffa06bf1b8>] ceph_fsync+0x58/0x200 [ceph]
> > 
> 
> > > [433080.144330] [<ffffffff813308f5>] ? ima_file_check+0x35/0x40
> > 
> 
> > > [433080.144337] [<ffffffff812028c8>] vfs_fsync_range+0x18/0x30
> > 
> 
> > > [433080.144352] [<ffffffffa03ee491>] nfsd_commit+0xb1/0xd0 [nfsd]
> > 
> 
> > > [433080.144363] [<ffffffffa03fb787>] nfsd4_commit+0x57/0x60
> > > [nfsd]
> > 
> 
> > > [433080.144370] [<ffffffffa03fcf9e>]
> > > nfsd4_proc_compound+0x54e/0x740
> > > [nfsd]
> > 
> 
> > > [433080.144377] [<ffffffffa03e8e05>] nfsd_dispatch+0xe5/0x230
> > > [nfsd]
> > 
> 
> > > [433080.144401] [<ffffffffa03205a5>]
> > > svc_process_common+0x345/0x680
> > > [sunrpc]
> > 
> 
> > > [433080.144413] [<ffffffffa0320c33>] svc_process+0x103/0x160
> > > [sunrpc]
> > 
> 
> > > [433080.144418] [<ffffffffa03e895f>] nfsd+0xbf/0x130 [nfsd]
> > 
> 
> > > [433080.144424] [<ffffffffa03e88a0>] ? nfsd_destroy+0x80/0x80
> > > [nfsd]
> > 
> 
> > > [433080.144428] [<ffffffff81091439>] kthread+0xc9/0xe0
> > 
> 
> > > [433080.144431] [<ffffffff81091370>] ?
> > > flush_kthread_worker+0xb0/0xb0
> > 
> 
> > > [433080.144434] [<ffffffff8178567c>] ret_from_fork+0x7c/0xb0
> > 
> 
> > > [433080.144437] [<ffffffff81091370>] ?
> > > flush_kthread_worker+0xb0/0xb0
> > 
> 

> > > I am using Ubuntu 12.04 servers with 3.15.10 kernel and ceph
> > > Giant.
> > 
> 

> > > Thanks
> > 
> 

> > > Andrei
> > 
> 

> > > _______________________________________________
> > 
> 
> > > ceph-users mailing list
> > 
> 
> > > ceph-users@lists.ceph.com
> > 
> 
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> 

> > _______________________________________________
> 
> > ceph-users mailing list
> 
> > ceph-users@lists.ceph.com
> 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to