Re: RocksDB segfault inside timer when accessing/clearing state

2017-10-09 Thread Kien Truong
Hi Stephan, I guess this is the case. Our cluster is a bit overloaded network-wise, so sometimes a Task Manager gets disconnected, which causes the entire job to restart, leading to multiple segfaults in other task managers and prolonging recovery. We're upgrading the network, hopefully the p

Re: RocksDB segfault inside timer when accessing/clearing state

2017-10-08 Thread Stefan Richter
Hi, I would assume that those segfaults are only observed *after* a job is already in the process of canceling? This is a known problem, but currently "accepted" behaviour after discussions with Stephan and Aljoscha (in CC). From that discussion, the background is that the native RocksDB resour

RocksDB segfault inside timer when accessing/clearing state

2017-10-06 Thread Kien Truong
Hi, we are using processing timers to implement some state clean-up logic. After switching from FsStateBackend to RocksDB, we encounter a lot of segfaults from the Time Trigger threads when accessing/clearing state values. We currently use the latest 1.3-SNAPSHOT, with the patch upgrading RocksDB