Hi list,
         We have hit and reproduced this issue several times: ceph will 
suicide because "FileStore: sync_entry timed out" after very heavy random IO on 
top of RBD.
         My test environment is:
                            4-node ceph cluster with 20 HDDs for OSDs and 4 
Intel DC S3700 SSDs for journals per node, i.e. 80 spindles in total
                            48 VMs spread across 12 physical nodes, with 48 
RBDs attached to the VMs 1:1 via Qemu
                            Ceph 0.58
                            XFS as the OSD filesystem
         I am using Aiostress (something like FIO) to generate random write 
requests on top of each RBD.

         From ceph -w, ceph reports a very high op rate (10000+ ops/s), but 
theoretically 80 spindles can only provide up to 150*80/2 = 6000 IOPS for 4K 
random writes.
         When digging into the code, I found that the OSD writes data to the 
page cache and then returns. Although it calls ::sync_file_range, that syscall 
does not guarantee the data is on disk when it returns; it is an asynchronous 
call. So the situation is: random writes are extremely fast since they only go 
to the journal and the page cache, but once syncing starts, it takes a very 
long time. Because of this speed gap between the journal and the OSD data 
disks, the amount of data that needs to be synced keeps growing, and the sync 
will eventually exceed the 600s timeout.
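
         To illustrate what I mean, here is a minimal standalone C sketch (not 
Ceph's actual FileStore code; the file name "testfile" and the 64 MB write are 
made up for the example): sync_file_range() with SYNC_FILE_RANGE_WRITE only 
kicks off writeback of the dirty pages and returns immediately, while 
fdatasync() is the call that actually waits until the data is durable on disk.

/* Sketch: sync_file_range() initiates writeback but does not wait;
 * fdatasync() blocks until the data reaches stable storage. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 64 * 1024 * 1024;           /* 64 MB of dirty data */
    char *buf = malloc(len);
    if (!buf)
        return 1;
    memset(buf, 0xab, len);

    int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, buf, len) != (ssize_t)len) {     /* lands in the page cache */
        perror("write");
        return 1;
    }

    /* Returns as soon as writeback is queued; the data may still be only
     * in the page cache at this point. */
    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) < 0)
        perror("sync_file_range");
    printf("sync_file_range returned (data not necessarily on disk)\n");

    /* This is what actually waits for the data to hit the disk. */
    if (fdatasync(fd) < 0)
        perror("fdatasync");
    printf("fdatasync returned (data durable)\n");

    close(fd);
    free(buf);
    return 0;
}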

         For more information: I have tried to reproduce this with rados 
bench, but failed.

         Could you please let me know if you need any more information, or 
whether you have any ideas for a solution? Thanks.
            Xiaoxi
