Hello guys, 

I've been trying to map an rbd disk to run some testing and I've noticed that 
while I can successfully read from the rbd image mapped to /dev/rbdX, I am 
failing to reliably write to it. Sometimes write tests work perfectly well, 
especially if I am using large block sizes. But often writes hang for a 
considerable amount of time and I have kernel hang task messages (shown below) 
in my dmesg. the hang tasks show particularly frequently when using 4K block 
size. However, with large block sizes writes also sometimes hang, but for sure 
less frequent 

I am using simple dd tests like: dd if=/dev/zero of=<mountpoint/file> bs=4K 
count=250K. 

I am running firefly on Ubuntu 12.04 on all osd/mon servers. The rbd image is 
mapped on one of the osd servers. All osd servers are running kernel version 
3.15.10-031510-generic . 


Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.439974] INFO: task 
jbd2/rbd0-8:3505 blocked for more than 120 seconds. 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.441586] Not tainted 
3.15.10-031510-generic #201408132333 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.443022] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444862] jbd2/rbd0-8 D 
0000000000000003 0 3505 2 0x00000000 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444870] ffff8803a10a7c48 
0000000000000002 ffff88007963b288 ffff8803a10a7fd8 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444874] 0000000000014540 
0000000000014540 ffff880469f63260 ffff880866969930 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444876] ffff8803a10a7c58 
ffff8803a10a7d88 ffff88034d142100 ffff880848519824 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444879] Call Trace: 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444893] [<ffffffff81778449>] 
schedule+0x29/0x70 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444901] [<ffffffff812a0360>] 
jbd2_journal_commit_transaction+0x240/0x1510 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444908] [<ffffffff810a5045>] ? 
sched_clock_cpu+0x85/0xc0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444920] [<ffffffff810a5cba>] ? 
arch_vtime_task_switch+0x8a/0x90 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444923] [<ffffffff810a5cfd>] ? 
vtime_common_task_switch+0x3d/0x50 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444928] [<ffffffff810b5420>] ? 
__wake_up_sync+0x20/0x20 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444933] [<ffffffff81079f9f>] ? 
try_to_del_timer_sync+0x4f/0x70 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444938] [<ffffffff812a4e88>] 
kjournald2+0xb8/0x240 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444941] [<ffffffff810b5420>] ? 
__wake_up_sync+0x20/0x20 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444943] [<ffffffff812a4dd0>] ? 
commit_timeout+0x10/0x10 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444949] [<ffffffff81091439>] 
kthread+0xc9/0xe0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444952] [<ffffffff81091370>] ? 
flush_kthread_worker+0xb0/0xb0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444965] [<ffffffff8178567c>] 
ret_from_fork+0x7c/0xb0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.444969] [<ffffffff81091370>] ? 
flush_kthread_worker+0xb0/0xb0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.445141] INFO: task dd:21180 
blocked for more than 120 seconds. 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.446595] Not tainted 
3.15.10-031510-generic #201408132333 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.448070] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449910] dd D 0000000000000002 
0 21180 19562 0x00000002 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449913] ffff880485a1b5d8 
0000000000000002 ffff880485a1b5e8 ffff880485a1bfd8 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449916] 0000000000014540 
0000000000014540 ffff880341833260 ffff88011086cb90 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449919] ffff880485a1b5a8 
ffff88046fc94e40 ffff88011086cb90 ffffffff81204da0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449921] Call Trace: 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449927] [<ffffffff81204da0>] ? 
__wait_on_buffer+0x30/0x30 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449930] [<ffffffff81778449>] 
schedule+0x29/0x70 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449934] [<ffffffff8177851f>] 
io_schedule+0x8f/0xd0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449936] [<ffffffff81204dae>] 
sleep_on_buffer+0xe/0x20 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449940] [<ffffffff81778be2>] 
__wait_on_bit+0x62/0x90 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449945] [<ffffffff8120a110>] ? 
bio_alloc_bioset+0xa0/0x1d0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449947] [<ffffffff81204da0>] ? 
__wait_on_buffer+0x30/0x30 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449951] [<ffffffff81778c8c>] 
out_of_line_wait_on_bit+0x7c/0x90 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449954] [<ffffffff810b54a0>] ? 
wake_atomic_t_function+0x40/0x40 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449957] [<ffffffff81204d9e>] 
__wait_on_buffer+0x2e/0x30 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449962] [<ffffffff8124eb28>] 
ext4_wait_block_bitmap+0xd8/0xe0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449969] [<ffffffff812888de>] 
ext4_mb_init_cache+0x1de/0x750 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449974] [<ffffffff8115fb65>] ? 
find_or_create_page+0x65/0xb0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449977] [<ffffffff81288f0e>] 
ext4_mb_init_group+0xbe/0x130 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449984] [<ffffffff8134bb59>] ? 
__get_request+0x339/0x560 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449987] [<ffffffff8128948c>] 
ext4_mb_load_buddy+0x33c/0x360 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449990] [<ffffffff81289bf3>] 
ext4_mb_find_by_goal+0xa3/0x310 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449993] [<ffffffff8128a60e>] 
ext4_mb_regular_allocator+0x5e/0x450 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449996] [<ffffffff8127fcf0>] ? 
ext4_ext_find_extent+0x220/0x2a0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.449999] [<ffffffff8128db48>] 
ext4_mb_new_blocks+0x3f8/0x570 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450002] [<ffffffff8127fbf0>] ? 
ext4_ext_find_extent+0x120/0x2a0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450004] [<ffffffff8127cabc>] ? 
ext4_ext_check_overlap.isra.28+0xbc/0xd0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450007] [<ffffffff812839a3>] 
ext4_ext_map_blocks+0xa13/0xb80 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450010] [<ffffffff81290800>] ? 
mext_replace_branches+0x220/0x490 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450014] [<ffffffff8125781f>] 
ext4_map_blocks+0x15f/0x550 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450017] [<ffffffff81256110>] ? 
mpage_prepare_extent_to_map+0x2d0/0x320 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450020] [<ffffffff81257c7e>] 
mpage_map_one_extent+0x6e/0x1a0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450023] [<ffffffff8125c34b>] 
mpage_map_and_submit_extent+0x4b/0x1e0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450026] [<ffffffff8125ca07>] 
ext4_writepages+0x527/0x7b0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450032] [<ffffffff8109f1f5>] ? 
check_preempt_curr+0x75/0xa0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450035] [<ffffffff810a2742>] ? 
default_wake_function+0x12/0x20 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450038] [<ffffffff810b5436>] ? 
autoremove_wake_function+0x16/0x40 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450044] [<ffffffff8116b3a0>] 
do_writepages+0x20/0x40 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450046] [<ffffffff81160229>] 
__filemap_fdatawrite_range+0x59/0x60 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450049] [<ffffffff81160abc>] 
filemap_flush+0x1c/0x20 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450051] [<ffffffff81258129>] 
ext4_alloc_da_blocks+0x49/0xa0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450055] [<ffffffff81250d39>] 
ext4_release_file+0x79/0xc0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450059] [<ffffffff811d477d>] 
__fput+0xbd/0x250 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450061] [<ffffffff811d495e>] 
____fput+0xe/0x10 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450066] [<ffffffff8108df37>] 
task_work_run+0xa7/0xe0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450072] [<ffffffff81013e67>] 
do_notify_resume+0xc7/0xd0 
Sep 13 21:24:30 arh-ibstorage2-ib kernel: [11880.450075] [<ffffffff817859ea>] 
int_signal+0x12/0x17 
Sep 13 21:24:45 arh-ibstorage2-ib kernel: [11894.974467] perf interrupt took 
too long (5003 > 5000), lowering kernel.perf_event_max_sample_rate to 25000 

Has anyone notices similar behavior? Are there any cases where people use rbd 
mapped images in production? 

Thanks 

Andrei 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to