Re: RocksDB local snapshot silently disappears and causes checkpoint to fail

2019-03-28 Thread Yu Li
Ok, much clearer now. Thanks.

Best Regards,
Yu

On Thu, 28 Mar 2019 at 15:59, Paul Lam wrote:
> Hi Yu,
>
> I’ve set `fs.default-scheme` to hdfs, and it's mainly used for simplifying
> checkpoint / savepoint / HA paths.
>
> And I leave the rocksdb local dir empty, so the local snapshot still goe

Re: RocksDB local snapshot silently disappears and causes checkpoint to fail

2019-03-28 Thread Paul Lam
Hi Yu,

I’ve set `fs.default-scheme` to hdfs, mainly to simplify the checkpoint / savepoint / HA paths.

I've left the RocksDB local dir empty, so the local snapshot still goes to the YARN local cache dirs.

Hope that answers your question.

Best,
Paul Lam

> On 28 Mar 2019, at 15:34, Yu Li

Re: RocksDB local snapshot silently disappears and causes checkpoint to fail

2019-03-28 Thread Yu Li
Hi Paul, Regarding "mistakenly uses the default filesystem scheme, which is specified to hdfs in the new cluster in my case", could you further clarify the configuration property and value you're using? Do you mean you're using an HDFS directory to store the local snapshot data? Thanks. Best Rega

Re: RocksDB local snapshot silently disappears and causes checkpoint to fail

2019-03-27 Thread Paul Lam
Hi, It turns out that under certain circumstances the RocksDB state backend mistakenly uses the default filesystem scheme, which is set to hdfs on the new cluster in my case. I’ve filed a Jira issue to track this [1]. [1] https://issues.apache.org/jira/browse/FLINK-12042
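
For illustration only, the effect described above can be sketched as follows. This is not Flink's code; the function name and the example paths are hypothetical, and it only assumes that a path without a scheme gets the configured default scheme prepended. Under that assumption, a local directory that was meant to live on the machine's disk would be treated as an HDFS path once the default scheme is hdfs.

```python
from urllib.parse import urlsplit


def resolve_with_default_scheme(path: str, default_scheme: str) -> str:
    """Hypothetical helper: keep a path that already has a scheme,
    otherwise prefix it with the configured default scheme."""
    if urlsplit(path).scheme:
        # An explicit scheme such as file:// is left untouched.
        return path
    # A scheme-less path inherits the default scheme.
    return f"{default_scheme}://{path}"


# Hypothetical local directory with no scheme, resolved with hdfs as default.
print(resolve_with_default_scheme("/tmp/yarn/local/db", "hdfs"))
# The same directory with an explicit file scheme stays local.
print(resolve_with_default_scheme("file:///tmp/yarn/local/db", "hdfs"))
```

In this sketch, writing the path with an explicit `file://` scheme keeps it local regardless of the default scheme.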

RocksDB local snapshot silently disappears and causes checkpoint to fail

2019-03-27 Thread Paul Lam
Hi, I’m using Flink 1.6.4 and recently ran into a weird issue with the RocksDB state backend. A job that runs fine on one YARN cluster keeps failing at checkpoint after being migrated to a new one (almost identical, but with better machines), and even a clean restart doesn’t help. The root cause is Ille