[ https://issues.apache.org/jira/browse/FLINK-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624754#comment-15624754 ]
Gyula Fora commented on FLINK-4193:
-----------------------------------

These issues usually happened inside the RocksDB.open(...) method during initialization of the state backend. If you think that the refactoring can affect this then we might get lucky :) We are running this in production applications and haven't ported them to 1.2 but in a week or two I will start working on that.

> Task manager JVM crashes while deploying cancelling jobs
> --------------------------------------------------------
>
>                 Key: FLINK-4193
>                 URL: https://issues.apache.org/jira/browse/FLINK-4193
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming, TaskManager
>            Reporter: Gyula Fora
>            Priority: Critical
>
> We have observed several TM crashes while deploying larger stateful streaming
> jobs that use the RocksDB state backend.
> As the JVMs crash the logs don't show anything but I have uploaded all the
> info I have got from the standard output.
> This indicates some GC and possibly some RocksDB issues underneath but we
> could not really figure out much more.
> GC segfault
> https://gist.github.com/gyfora/9e56d4a0d4fc285a8d838e1b281ae125
> Other crashes (maybe rocks related)
> https://gist.github.com/gyfora/525c67c747873f0ff2ff2ed1682efefa
> https://gist.github.com/gyfora/b93611fde87b1f2516eeaf6bfbe8d818
> The third link shows 2 issues that happened in parallel...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)