Hi Gyula,

This looks related to https://issues.apache.org/jira/browse/FLINK-23346. We also saw core dumps when using list state after state migration had been triggered with the TTL compaction filter enabled. Did your job go through schema evolution? It looks like a bug in the RocksDB list state when combined with the TTL compaction filter.
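
To illustrate the kind of scan the `NextUnexpiredOffset` frame in the trace refers to, here is a minimal, self-contained sketch. It is not Flink's code: the byte layout (8-byte timestamp, 4-byte payload length, payload), the class name and the helper methods are all hypothetical, chosen only to show how a list-state TTL filter could walk a serialized list and how bytes that no longer match the expected layout make the scan throw. The trace shows `jni_Throw` directly under `JavaListElementFilter::NextUnexpiredOffset`, which suggests an exception was being propagated at that point. The link to the crash remains an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not Flink's implementation or on-disk format.
final class TtlListScanSketch {

    // Assumed layout per element: [8-byte timestamp][4-byte payload length][payload].
    static byte[] encode(long[] timestamps, byte[][] payloads) {
        int size = 0;
        for (byte[] p : payloads) {
            size += Long.BYTES + Integer.BYTES + p.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < timestamps.length; i++) {
            buf.putLong(timestamps[i]).putInt(payloads[i].length).put(payloads[i]);
        }
        return buf.array();
    }

    // Returns the byte offset of the first element that is still alive
    // (timestamp + ttl > now), or list.length if every element has expired.
    static int nextUnexpiredOffset(byte[] list, long ttl, long now) {
        ByteBuffer buf = ByteBuffer.wrap(list);
        while (buf.hasRemaining()) {
            int start = buf.position();
            if (buf.remaining() < Long.BYTES + Integer.BYTES) {
                throw new IllegalStateException("truncated element header at offset " + start);
            }
            long timestamp = buf.getLong();
            int length = buf.getInt();
            if (length < 0 || length > buf.remaining()) {
                // Bytes written under a different layout end up here.
                throw new IllegalStateException("bad payload length at offset " + start);
            }
            if (timestamp + ttl > now) {
                return start;
            }
            buf.position(buf.position() + length); // skip the expired payload
        }
        return list.length;
    }
}
```

With three elements stamped 100, 200 and 900, a TTL of 500 and a current time of 1000, the scan skips the first two elements and returns the offset of the third. If the bytes were written under a different layout, the scan throws instead, and the caller has to handle that exception.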

On Wed, May 17, 2023 at 7:05 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi All!
>
> We are encountering an error on a larger stateful job (around 1 TB +
> state) on restore from a rocksdb checkpoint. The taskmanagers keep crashing
> with a segfault coming from the rocksdb native logic and seem to be related
> to the FlinkCompactionFilter mechanism.
>
> The gist with the full error report:
> https://gist.github.com/gyfora/f307aa570d324d063e0ade9810f8bb25
>
> The core part is here:
>
> V  [libjvm.so+0x79478f]  Exceptions::(Thread*, char const*, int, oopDesc*)+0x15f
> V  [libjvm.so+0x960a68]  jni_Throw+0x88
> C  [librocksdbjni-linux64.so+0x222aa1]  JavaListElementFilter::NextUnexpiredOffset(rocksdb::Slice const&, long, long) const+0x121
> C  [librocksdbjni-linux64.so+0x6486c1]  rocksdb::flink::FlinkCompactionFilter::ListDecide(rocksdb::Slice const&, std::string*) const+0x81
> C  [librocksdbjni-linux64.so+0x648bea]  rocksdb::flink::FlinkCompactionFilter::FilterV2(int, rocksdb::Slice const&, rocksdb::CompactionFilter::ValueType, rocksdb::Slice const&, std::string*, std::string*) const+0x14a
>
> Has anyone encountered a similar issue before?
>
> Thanks
> Gyula

--
Best,
Hangxiang.