On May 15, 2014, at 6:06 PM, Cao, Buddy <buddy....@intel.com> wrote:

> Hi,
>  
> One of the OSDs in my cluster goes down for no apparent reason. I saw the
> error message below in the log. I restarted the OSD, but after several hours
> the problem came back. Could you help?
>  
> “Too many open files not handled on operation 24 (541468.0.1, or op 1, 
> counting from 0)
From the error message above, it looks like you are running out of file
descriptors (FDs). You can check the per-process limit with ‘ulimit -a’ (the
"open files" line) and the system-wide usage with ‘cat /proc/sys/fs/file-nr’
(allocated handles, unused handles, system maximum). If usage is close to the
limit, you are at risk of running out of FDs under load (or during other
cluster-wide activity).
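One caveat: ‘ulimit -a’ in an interactive shell reports that shell's limits, which are not necessarily the limits the ceph-osd daemon was started with. The daemon's effective limit is listed in /proc/<pid>/limits. Below is a minimal sketch that checks each running ceph-osd against its own limit. It is Linux only and assumes pgrep is installed and the script runs as root. The function name check_osd_fds and the 80% warning threshold are hypothetical choices, not part of Ceph.

```shell
#!/bin/sh
# Sketch: compare each ceph-osd process's open FDs with its own soft limit,
# and print the system-wide file handle counters.
# Assumptions: Linux /proc, pgrep available, run as root (reading another
# user's /proc/<pid>/fd needs privileges). check_osd_fds is a hypothetical
# name and the 80% threshold is an arbitrary, illustrative choice.

check_osd_fds() {
    # /proc/sys/fs/file-nr: allocated handles, allocated-but-unused, maximum.
    read -r allocated _unused max < /proc/sys/fs/file-nr
    echo "system: $allocated file handles allocated (max $max)"

    for pid in $(pgrep -x ceph-osd); do
        # Each entry in /proc/<pid>/fd is one open descriptor.
        used=$(ls /proc/"$pid"/fd | wc -l)
        # "Max open files  <soft>  <hard>  files": field 4 is the soft limit.
        limit=$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits)
        echo "ceph-osd pid $pid: $used FDs open, soft limit $limit"
        if [ "$limit" != "unlimited" ] && [ $((used * 100)) -ge $((limit * 80)) ]; then
            echo "  WARNING: pid $pid is at or above 80% of its FD limit"
        fi
    done
}

check_osd_fds
```

Run it periodically while the cluster is under load. If a ceph-osd process's count keeps climbing toward its soft limit, that would be consistent with the "Too many open files" error above.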
>    -96> 2014-05-14 22:12:24.281185 7f617b33e700  5 -- op tracker -- , seq:
> 788808, time: 2014-05-14 22:12:24.281164, event: reached_pg, request:
> osd_op(client.21276.0:3884815 rb.0.31c7.238e1f29.000000003c15 [write
> 2273280~65536] 4.110fcf4 e12271) v4
>    -95> 2014-05-14 22:12:24.281192 7f618556d700  0
> filestore(/var/lib/ceph/osd/ceph-3) unexpected error code
>    -94> 2014-05-14 22:12:24.281197 7f6181b4b700  5 -- op tracker -- , seq:
> 788843, time: 2014-05-14 22:12:24.281011, event: header_read, request:
> osd_op(client.21276.0:3884929 rb.0.31c7.238e1f29.000000005614 [write
> 3137536~65536] 4.63e147e e12271) v4
> 2014-05-14 22:12:24.289987 7f6185d6e700 -1 os/FileStore.cc: In function
> 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&,
> uint64_t, int, ThreadPool::TPHandle*)' thread 7f6185d6e700 time 2014-05-14
> 22:12:24.282488
> os/FileStore.cc: 2448: FAILED assert(0 == "unexpected error")
>  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
> 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
> ThreadPool::TPHandle*)+0x11c3) [0x723a43]
> 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, 
> std::allocator<ObjectStore::Transaction*> >&, unsigned long, 
> ThreadPool::TPHandle*)+0x74) [0x72a4d4]
> 3: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x29a) 
> [0x72a78a]
> 4: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0x988f21]
> 5: (ThreadPool::WorkThread::entry()+0x10) [0x98bf50]
> 6: /lib64/libpthread.so.0() [0x3a7ce079d1]
> 7: (clone()+0x6d) [0x3a7cae8b6d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> interpret this.”……..
>  
>  
> #iostat
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.44    0.00    0.14    0.41    0.00   99.01
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sdb               1.23         0.10        35.72      12738    4762008
> sdc               5.25       214.25      1288.81   28564314  171824232
> sdd               4.16       139.98      1021.69   18662490  136211888
> sde               4.61       207.50      1039.20   27663258  138545960
> sdf               7.94       203.24      2530.63   27095930  337383704
> sdg               4.77         0.57      1459.29      75330  194553064
> sdh               4.38         0.37      1287.42      48954  171638304
> sdi              85.80       132.13      8157.53   17616004 1087562272
> sdj               8.77        10.99      1701.90    1465844  226897024
> sda               4.55         0.60      1331.50      80010  177516216
>  
>  
> osd log attached.
>  
> Wei Cao (Buddy)
>  
> <ceph-osd.21_short.log>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
