Dear ceph users,
I'd like to get some clarification regarding the fast-EC bug in Tentacle [1].
The Clyso article mentions "on CephFS", which gave me the impression that
only CephFS as an application is affected, but others like RBD are not. I
suspected, and may now have confirmed, that RBD is also affected: in a
cephadm test environment I started a qemu VM on an EC data pool with
allow_ec_overwrites and allow_ec_optimizations set and issued fstrim
(qemu discard=unmap via librbd), which immediately brought down some OSDs
(see log below).
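For reference, this is roughly how I set up the test; pool, profile and
image names here are illustrative, not the ones I actually used:

```shell
# EC profile and data pool (k/m values are just an example)
ceph osd erasure-code-profile set ec-profile k=2 m=2
ceph osd pool create ec-data erasure ec-profile

# Enable overwrites and the new fast-EC path on the data pool
ceph osd pool set ec-data allow_ec_overwrites true
ceph osd pool set ec-data allow_ec_optimizations true

# RBD image with its data on the EC pool (metadata stays in a replicated pool)
ceph osd pool create rbd-meta
rbd pool init rbd-meta
rbd create --size 20G --data-pool ec-data rbd-meta/vm-disk

# Attach via librbd with discard enabled, e.g. with qemu:
#   -drive format=rbd,file=rbd:rbd-meta/vm-disk,discard=unmap,...
# then inside the guest:
fstrim -av   # this is what crashed the OSDs for me
```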
The Clyso article also points to issue 71642, "crash if
allow_ec_optimizations set but with allow_ec_overwrites not set" [2], whose
title seems misleading, because others and I hit the crash with
allow_ec_overwrites set.
Thanks to Clyso's known-bugs collection! Otherwise it's hard to get an
overview of which problems one might currently hit ;-). And also thanks for
providing fixed images on
harbor.clyso.com/custom-ceph/ceph/ceph:v20.2.0-fast-ec-path-hf.2.
It seems the fix [3] is from Jul 11, 2025, but it is still awaiting a
backport to the stable Tentacle release, which has been published in the
meantime? Sounds odd.
Thanks, Sascha.
[1]
https://docs.clyso.com/docs/kb/known-bugs/tentacle/#osd-crash-when-enabling-ec-optimizations-on-cephfs
[2] https://tracker.ceph.com/issues/71642
[3] https://github.com/ceph/ceph/commit/51679b948bd9bb45cce9d778b0555f45ff808cd8
OSD crash log
======8<--------------------------
osd-1[197069]: terminate called after throwing an instance of
'std::out_of_range'
osd-1[197069]: what(): Key not found
osd-1[197069]: *** Caught signal (Aborted) **
osd-1[197069]: in thread 7f2dd5c65640 thread_name:tp_osd_tp
osd-1[197069]: ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489)
tentacle (stable - RelWithDebInfo)
osd-1[197069]: 1: /lib64/libc.so.6(+0x3fc30) [0x7f2df4da8c30]
osd-1[197069]: 2: /lib64/libc.so.6(+0x8d03c) [0x7f2df4df603c]
osd-1[197069]: 3: raise()
osd-1[197069]: 4: abort()
osd-1[197069]: 5: /lib64/libstdc++.so.6(+0xa1b21) [0x7f2df5b45b21]
osd-1[197069]: 6: /lib64/libstdc++.so.6(+0xad53c) [0x7f2df5b5153c]
osd-1[197069]: 7: /lib64/libstdc++.so.6(+0xad5a7) [0x7f2df5b515a7]
osd-1[197069]: 8: /lib64/libstdc++.so.6(+0xad809) [0x7f2df5b51809]
osd-1[197069]: 9: /usr/bin/ceph-osd(+0x41b012) [0x55ef3f2b1012]
osd-1[197069]: 10: (ECTransaction::WritePlanObj::WritePlanObj(hobject_t const&,
PGTransaction::ObjectOperation const&, ECU>
osd-1[197069]: 11: /usr/bin/ceph-osd(+0x99cdaa) [0x55ef3f832daa]
osd-1[197069]: 12: (ECCommon::get_write_plan(ECUtil::stripe_info_t const&, PGTransaction&, ECCommon::ReadPipeline&, ECComm>
osd-1[197069]: 13: (ECBackend::submit_transaction(hobject_t const&, object_stat_sum_t
const&, eversion_t const&, std::uniq>
osd-1[197069]: 14: /usr/bin/ceph-osd(+0x7d980f) [0x55ef3f66f80f]
osd-1[197069]: 15: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*,
PrimaryLogPG::OpContext*)+0x3ae) [0x55ef3f5f002e]
osd-1[197069]: 16: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xf6a)
[0x55ef3f5cd63a]
osd-1[197069]: 17:
(PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2d5f) [0x55ef3f5bdf5f]
osd-1[197069]: 18: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x19>
osd-1[197069]: 19: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)>
osd-1[197069]: 20: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x8bc) [0x55ef3f51611c]
osd-1[197069]: 21: (ShardedThreadPool::shardedthreadpool_worker(unsigned
int)+0x23a) [0x55ef3fa98e1a]
osd-1[197069]: 22: /usr/bin/ceph-osd(+0xc033d4) [0x55ef3fa993d4]
osd-1[197069]: 23: /lib64/libc.so.6(+0x8b2fa) [0x7f2df4df42fa]
osd-1[197069]: 24: /lib64/libc.so.6(+0x110400) [0x7f2df4e79400]
osd-1[197069]: 2026-02-15T00:10:48.350+0000 7f2dd5c65640 -1 *** Caught signal
(Aborted) **
osd-1[197069]: in thread 7f2dd5c65640 thread_name:tp_osd_tp
--------------------------------------
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]