My job is a batch job, not a streaming one. Could the cause still be the one you mentioned?

On Mon, 14 May 2018, 14:23 Stefan Richter, <s.rich...@data-artisans.com> wrote:

> Hi,
>
> that looks like a known issue where Flink did not wait for the shutdown of
> the timer service before disposing state backends. This problem is fixed in
> the >= 1.4 branches.
>
> Best,
> Stefan
>
> On 14.05.2018 at 14:12, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
> Hi to all,
> I have a Flink 1.3.1 job that runs multiple times.
> Everything goes well for some time (e.g. 10 jobs). Then, one or more TMs
> suddenly die.
>
> In the .out file I find something like this:
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f6f3897712f, pid=18794, tid=140110535448320
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_72-b15) (build
> 1.8.0_72-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.72-b15 mixed mode
> linux-amd64 compressed oops)
> # Problematic frame:
> # C [libc.so.6+0x7f12f]
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /home/user/hs_err_pid18794.log
> #
> # If you would like to submit a bug report, please visit:
> # http://bugreport.java.com/bugreport/crash.jsp
> #
>
> I have attached the produced error report. Do you find anything useful in it?
> I can also send you the job's jar with the data, but it is about 200 MB.
>
> Best,
> Flavio
> <hs_err_pid18794.log>