Hi,

This seems to be a sneaky concurrency issue in our custom state backend implementation.
I made some changes, will keep you posted.

Cheers,
Gyula

On Thu, Aug 25, 2016, 10:54 Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi,
>
> Sure, I am sending the TM logs privately.
>
> Currently what I did was to bump the Rocks version to 4.9.0; let's see if
> that helps.
>
> Cheers,
> Gyula
>
> Till Rohrmann <trohrm...@apache.org> wrote (on Thu, Aug 25, 2016, 10:35):
>
>> Hi Gyula,
>>
>> I haven't seen this problem before. Do you have the logs of the failed TMs
>> so that we have some more context on what was going on?
>>
>> Cheers,
>> Till
>>
>> On Thu, Aug 25, 2016 at 9:40 AM, Gyula Fóra <gyf...@apache.org> wrote:
>>
>> > Hi guys,
>> >
>> > For quite some time now we have fairly frequently experienced task
>> > manager crashes around the time new streaming jobs are deployed. We use
>> > the RocksDB backend, so this might be related.
>> >
>> > We tried changing the GC from G1 to CMS; that didn't help.
>> >
>> > Yesterday, for instance, 6 task managers crashed one after the other with
>> > similar errors:
>> >
>> > *** Error in `java': double free or corruption (!prev): 0x00007fac0414d760 ***
>> > *** Error in `java': free(): invalid pointer: 0x00007f8dcc0026c0 ***
>> > *** Error in `java': double free or corruption (!prev): 0x00007f15247f9a90 ***
>> > ...
>> >
>> > Does anyone have any clue what might cause this or how to debug it?
>> > This is a very critical issue :(
>> >
>> > Cheers,
>> > Gyula