Hi Stephan, Other jobs run fine but this one is not working on the machine that I was using previously (16GB RAM) [1]
Is there a way to debug the Akka messages to understand what's happening between the JobManager and the Client? I can add logging and send it. Thanks! Fred [1] The failure started to happen when I added the flatMap transformation. Previously I was calling the collect function after the reduceGroup and then using Scala's flatten function. Since this was very slow and failed with large datafile I used Flink to flatten the list of lists and now it faster. On Jan 15, 2016 11:51, "Stephan Ewen" <se...@apache.org> wrote: > Hi! > > Do you get this problem with other Jobs as well? > > The logs suggest that the JobManager receives the job and starts tasks, > but the Client thinks it lost connection. > > Greetings, > Stephan > > > On Fri, Jan 15, 2016 at 10:31 AM, Frederick Ayala < > frederickay...@gmail.com> wrote: > >> Hi Robert, >> >> Thanks for your reply. >> >> I set the akka.ask.timeout to 10k seconds just to see what happened. I >> tried different values but non did the trick. >> >> My problem was solved by using a machine with more RAM. However, it was >> not clear that the memory was the problem :) >> >> Attached are the log and the Scala code of the transformation that I was >> running. >> >> The data file I am processing is around 57M lines (~1.7GB). >> >> Let me know if you have any comment or suggestion. >> >> Thanks again, >> >> Frederick >> >> >> >> On Fri, Jan 15, 2016 at 10:01 AM, Robert Metzger <rmetz...@apache.org> >> wrote: >> >>> Hi Frederick, >>> >>> sorry for the delayed response. >>> >>> I have no idea what the problem could be. >>> Has the exception been thrown from the env.execute() call? >>> Why did you set the akka.ask.timeout to 10k seconds? >>> >>> >>> >>> >>> On Wed, Jan 13, 2016 at 2:13 AM, Frederick Ayala < >>> frederickay...@gmail.com> wrote: >>> >>>> Hi, >>>> >>>> I am having an error while running some Flink transformations in a >>>> DataStream Scala API. >>>> >>>> The error I get is: >>>> >>>> Timeout while waiting for JobManager answer. Job time exceeded 21474835 >>>> seconds >>>> ... >>>> >>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on >>>> [Actor[akka://flink/user/$a#183984057]] after [21474835000 ms] >>>> >>>> >>>> This happens after a couple of minutes. Not after 21474835 seconds... >>>> >>>> I tried different configurations but no result so far: >>>> val customConfiguration = new Configuration() >>>> customConfiguration.setInteger("parallelism", 8) >>>> customConfiguration.setInteger("jobmanager.heap.mb",2560) >>>> customConfiguration.setInteger("taskmanager.heap.mb",10240) >>>> customConfiguration.setInteger("taskmanager.numberOfTaskSlots",8) >>>> >>>> customConfiguration.setInteger("taskmanager.network.numberOfBuffers",16384) >>>> customConfiguration.setString("akka.ask.timeout","10000 s") >>>> customConfiguration.setString("akka.lookup.timeout","100 s") >>>> env = >>>> ExecutionEnvironment.createLocalEnvironment(customConfiguration) >>>> >>>> Any idea what could it be the problem? >>>> >>>> Thanks! >>>> >>>> Frederick >>>> >>> >>> >> >> >> -- >> Frederick Ayala >> > >