Re: Task manager processes crashing one after the other

2016-12-15 Thread Robert Metzger
I experienced quite a similar issue with RocksDB on my cluster, also after some retries (with Flink 1.1.4 RC3): # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7f1611829f4e, pid=3545, tid=139732543575808 # # JRE version: Java(TM) SE Runtime En

Re: Task manager processes crashing one after the other

2016-08-26 Thread Gyula Fóra
Some additional info: It doesn't seem to happen the first time I start the jobs / restore them from a savepoint. It happens as jobs are failing over after a task manager failure. This could be an issue caused by a non-empty rocks directory (that was somehow in an inconsistent state) but that sho

Re: Task manager processes crashing one after the other

2016-08-25 Thread Gyula Fóra
Stephan, I ported the fix for the concurrency issue from the Flink commit, so that should be fine now. I ran some fail/restore tests and that specific issue hasn't appeared again. However, I now get many segfaults in the initializeForJob method, where the RocksDB instance is opened. Just for the rec

Re: Task manager processes crashing one after the other

2016-08-25 Thread Gyula Fóra
Yes seems like that, I remember the fix in Flink. I apparently made a mistake somewhere in our code :) Thanks, Gyula On Thu, Aug 25, 2016, 18:59 Stephan Ewen wrote: > We saw some crashes in earlier versions when native handles in RocksDB > (even for config option objects) were manually and too

Re: Task manager processes crashing one after the other

2016-08-25 Thread Stephan Ewen
We saw some crashes in earlier versions when native handles in RocksDB (even for config option objects) were manually and too eagerly released. Maybe you have a similar issue here? On Thu, Aug 25, 2016 at 6:27 PM, Gyula Fóra wrote: > Hi, > This seems to be a sneaky concurrency issue in our cust

Re: Task manager processes crashing one after the other

2016-08-25 Thread Gyula Fóra
Hi, This seems to be a sneaky concurrency issue in our custom state backend implementation. I made some changes, will keep you posted. Cheers, Gyula On Thu, Aug 25, 2016, 10:54 Gyula Fóra wrote: > Hi, > > Sure I am sending the TM logs in priv. > > Currently what I did was to bump the Rocks vers

Re: Task manager processes crashing one after the other

2016-08-25 Thread Gyula Fóra
Hi, Sure, I am sending the TM logs privately. What I did for now was bump the RocksDB version to 4.9.0; let's see if that helps. Cheers, Gyula Till Rohrmann wrote (on Thu, Aug 25, 2016, 10:35): > Hi Gyula, > > I haven't seen this problem before. Do you have the logs of the failed TMs

Re: Task manager processes crashing one after the other

2016-08-25 Thread Till Rohrmann
Hi Gyula, I haven't seen this problem before. Do you have the logs of the failed TMs so that we have some more context on what was going on? Cheers, Till On Thu, Aug 25, 2016 at 9:40 AM, Gyula Fóra wrote: > Hi guys, > > For quite some time now we fairly frequently experience a task manager > cras

Task manager processes crashing one after the other

2016-08-25 Thread Gyula Fóra
Hi guys, For quite some time now we have fairly frequently experienced task manager crashes around the time new streaming jobs are deployed. We use the RocksDB backend, so this might be related. We tried changing the GC from G1 to CMS, but that didn't help. Yesterday, for instance, 6 task managers crashed one of