On May 15, 2014, at 6:06 PM, Cao, Buddy <buddy....@intel.com> wrote:

> Hi,
>
> One of the osd in my cluster downs w no reason, I saw the error message in
> the log below, I restarted osd, but after several hours, the problem come
> back again. Could you help?
>
> “Too many open files not handled on operation 24 (541468.0.1, or op 1,
> counting from 0)

Judging from the error message above, it looks like you are running out of file descriptors (FDs). You can check the limit with ‘bash-$: ulimit -a’ and see how many are in use with ‘bash-$: cat /proc/sys/fs/file-nr’. If the two numbers are close, you are likely at risk of running out of FDs under load (or during other cluster-wide activity).

> -96> 2014-05-14 22:12:24.281185 7f617b33e700 5 -- op tracker -- , seq: 788808, time: 2014-05-14 22:12:24.281164, event: reached_pg, request: osd_op(client.21276.0:3884815 rb.0.31c7.238e1f29.000000003c15 [write 2273280~65536] 4.110fcf4 e12271) v4
> -95> 2014-05-14 22:12:24.281192 7f618556d700 0 filestore(/var/lib/ceph/osd/ceph-3) unexpected error code
> -94> 2014-05-14 22:12:24.281197 7f6181b4b700 5 -- op tracker -- , seq: 788843, time: 2014-05-14 22:12:24.281011, event: header_read, request: osd_op(client.21276.0:3884929 rb.0.31c7.238e1f29.000000005614 [write 3137536~65536] 4.63e147e e12271) v4
>
> 2014-05-14 22:12:24.289987 7f6185d6e700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f6185d6e700 time 2014-05-14 22:12:24.282488
> os/FileStore.cc: 2448: FAILED assert(0 == "unexpected error")
> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
> 1: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x11c3) [0x723a43]
> 2: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x74) [0x72a4d4]
> 3: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x29a) [0x72a78a]
> 4: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0x988f21]
> 5: (ThreadPool::WorkThread::entry()+0x10) [0x98bf50]
> 6: /lib64/libpthread.so.0() [0x3a7ce079d1]
> 7: (clone()+0x6d) [0x3a7cae8b6d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.”……..
>
>
> #iostat
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.44    0.00    0.14    0.41    0.00   99.01
>
> Device:     tps  Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
> sdb        1.23        0.10       35.72      12738     4762008
> sdc        5.25      214.25     1288.81   28564314   171824232
> sdd        4.16      139.98     1021.69   18662490   136211888
> sde        4.61      207.50     1039.20   27663258   138545960
> sdf        7.94      203.24     2530.63   27095930   337383704
> sdg        4.77        0.57     1459.29      75330   194553064
> sdh        4.38        0.37     1287.42      48954   171638304
> sdi       85.80      132.13     8157.53   17616004  1087562272
> sdj        8.77       10.99     1701.90    1465844   226897024
> sda        4.55        0.60     1331.50      80010   177516216
>
>
> osd log attached.
>
> Wei Cao (Buddy)
>
> <ceph-osd.21_short.log>_______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
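
The check suggested in the reply (compare the descriptor limit with current usage) could be scripted roughly as below. This is only a sketch, assuming a Linux host with /proc mounted. The helper names and the 0.8 warning threshold are made up for illustration and are not part of Ceph. Note that the limit reported by ulimit is per process, while /proc/sys/fs/file-nr is system-wide, so the sketch also counts the entries under /proc/<pid>/fd, which is the figure that compares directly against the per-process limit.

```python
import os
import resource

WARN_RATIO = 0.8  # hypothetical threshold, chosen only for illustration


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: (allocated, unused, system-wide maximum)."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def usage_ratio(used, limit):
    """Fraction of the limit in use; 0.0 if the limit is unlimited or unknown."""
    if limit <= 0:
        return 0.0
    return used / limit


def open_fd_count(pid="self"):
    """Count descriptors listed in /proc/<pid>/fd (Linux only).

    For "self" the count includes the descriptor used to list the directory.
    Inspecting another process needs the same user or root.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    # Per-process limit inherited by this interpreter (what `ulimit -n` shows).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"per-process limit: soft={soft} hard={hard}")

    if os.path.isdir("/proc/self/fd"):
        used = open_fd_count()
        print(f"this process: {used} open, ratio {usage_ratio(used, soft):.2f}")

    if os.path.exists("/proc/sys/fs/file-nr"):
        with open("/proc/sys/fs/file-nr") as f:
            allocated, unused, maximum = parse_file_nr(f.read())
        ratio = usage_ratio(allocated - unused, maximum)
        print(f"system-wide: {allocated - unused} of {maximum}, ratio {ratio:.2f}")
        if ratio >= WARN_RATIO:
            print("warning: system-wide descriptor usage is close to the maximum")
```

To look at a running OSD rather than the script itself, pass the daemon's PID (for example, one found with pgrep) to open_fd_count() and compare the result with the "Max open files" row in /proc/<pid>/limits.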