OK, much clearer now. Thanks.

Best Regards,
Yu

On Thu, 28 Mar 2019 at 15:59, Paul Lam <paullin3...@gmail.com> wrote:

> Hi Yu,
>
> I’ve set `fs.default-scheme` to hdfs, and it's mainly used for simplifying
> checkpoint / savepoint / HA paths.
>
> And I leave the rocksdb local dir empty, so the local snapshot still goes
> to YARN local cache dirs.
>
> Hope that answers your question.
>
> Best,
> Paul Lam
>
> On 28 Mar 2019, at 15:34, Yu Li <l...@apache.org> wrote:
>
> Hi Paul,
>
> Regarding "mistakenly uses the default filesystem scheme, which is
> specified to hdfs in the new cluster in my case", could you further clarify
> the configuration property and value you're using? Do you mean you're using
> an HDFS directory to store the local snapshot data? Thanks.
>
> Best Regards,
> Yu
>
> On Thu, 28 Mar 2019 at 14:34, Paul Lam <paullin3...@gmail.com> wrote:
>
>> Hi,
>>
>> It turns out that under certain circumstances rocksdb statebackend
>> mistakenly uses the default filesystem scheme, which is specified to hdfs
>> in the new cluster in my case.
>>
>> I’ve filed a Jira to track this[1].
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-12042
>>
>> Best,
>> Paul Lam
>>
>> On 27 Mar 2019, at 19:06, Paul Lam <paullin3...@gmail.com> wrote:
>>
>> Hi,
>>
>> I’m using Flink 1.6.4 and recently I ran into a weird issue of rocksdb
>> statebackend. A job that runs fine on a YARN cluster keeps failing on
>> checkpoint after being migrated to a new one (with almost everything the
>> same but better machines), and even a clean restart doesn’t help.
>>
>> The root cause is IllegalStateException but with no error message. The
>> stack trace shows that when the rocksdb statebackend is doing the async
>> part of snapshots (runSnapshot), it finds that the local snapshot
>> directory that is created by rocksdb earlier (takeSnapshot) does not
>> exist.
>>
>> I tried to log more information in RocksDBKeyedStateBackend (see
>> attachment), and found that the local snapshot performed as expected and
>> the .sst files were written, but when the async task accessed the
>> directory, the whole snapshot directory was gone.
>>
>> What could possibly be the cause? Thanks a lot.
>>
>> Best,
>> Paul Lam
>>
>> <rocksdb_illegal_state.log.md>