Re: Flink 1.6.0 not allocating specified TMs in Yarn

2018-09-18, Subramanya Suresh
nd the command with which
> you started the Flink cluster. From the log snippet it looks as if Flink
> only got 8GB of memory assigned.
>
> Cheers,
> Till
>
> On Mon, Sep 17, 2018 at 11:34 PM Subramanya Suresh
> wrote:
>
>> I got these logs from one of the Yarn logs. Not

Re: Migration to Flink 1.6.0, issues with StreamExecutionEnvironment.registerCachedFile

2018-09-18, Subramanya Suresh
nd just passes on the function call.
> I tried it locally on my machine and it works for me.
>
> What is your setup? Are you running on Yarn?
>
> Maybe Chesnay or Dawid (added to CC) can help to track the problem down.
>
> Best, Fabian
>
> 2018-09-18 6:10 GMT+02:00 Subrama

Migration to Flink 1.6.0, issues with StreamExecutionEnvironment.registerCachedFile

2018-09-17, Subramanya Suresh
Hi, we are running into some trouble with StreamExecutionEnvironment.registerCachedFile (it works perfectly fine in 1.4.2).
- We register some CSV files in HDFS with executionEnvironment.registerCachedFile("hdfs:///myPath/myCsv", "myCSV.csv")
- In a UDF (ScalarFunction), in the open function
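The pattern described here (load a registered cached file once in a UDF's `open` method, then reuse it for every record) can be sketched without Flink on the classpath. This is a minimal illustration under assumptions: the class `CsvLookupSketch`, the two-column `key,value` CSV layout, and handing a plain `java.io.File` to `open` are all hypothetical. In an actual Table API `ScalarFunction`, the local copy of the file registered under the name "myCSV.csv" would, as far as I know, be obtained via `FunctionContext#getCachedFile`, using the second argument of `registerCachedFile` as the lookup name.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

/**
 * Flink-free sketch of the open()/eval() lifecycle of a lookup UDF.
 * Hypothetical: in a real job the File would be the locally materialized
 * cached file; the "key,value" CSV layout is an assumption.
 */
public class CsvLookupSketch {
    private Map<String, String> lookup;

    /** Load the CSV once per parallel instance, not once per record. */
    public void open(File cachedCsv) throws IOException {
        lookup = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(cachedCsv.toPath(), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumed layout: key,value (no quoting or escaping handled)
                String[] parts = line.split(",", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
    }

    /** Per-record lookup; returns null for unknown keys. */
    public String eval(String key) {
        if (lookup == null) {
            throw new IllegalStateException("open() must be called before eval()");
        }
        return lookup.get(key);
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("myCSV", ".csv");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "us,United States\nde,Germany\n".getBytes(StandardCharsets.UTF_8));
        CsvLookupSketch udf = new CsvLookupSketch();
        udf.open(tmp);
        System.out.println(udf.eval("de")); // Germany
    }
}
```

Doing the file read in `open` keeps the HDFS access out of the per-record path, so a missing or unreadable cached file surfaces once per task rather than on every call.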

Re: Flink 1.6.0 not allocating specified TMs in Yarn

2018-09-17, Subramanya Suresh
fraction, taskmanager.network.memory.min, taskmanager.network.memory.max) : (0.1, 80, 120) - Network buffer memory size too large: 80 >= 7769948160(maximum JVM heap size)
Please also see my questions above.
Cheers,
On Mon, Sep 17, 2018 at 12:19 PM, Subramanya Su
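The log line above reports (fraction, min, max) = (0.1, 80, 120). A simplified sketch of how these three settings interact, assuming network memory is the fraction of total memory clamped to [min, max] and that min and max are given in bytes, shows why the unit matters: read as bytes, 80 and 120 cap the network buffers at 120 bytes rather than 120 MB. `NetworkMemorySketch` and the 8 GiB total are illustrative only; Flink's real computation (heap cutoff, heap vs. total memory) differs in detail, so it is worth checking which unit the installed version expects.

```java
/**
 * Simplified sketch (assumption, not Flink's code): network buffer memory is
 * fraction * totalMemory, clamped to [minBytes, maxBytes].
 */
public class NetworkMemorySketch {
    static long networkBufferBytes(double fraction, long minBytes, long maxBytes, long totalBytes) {
        if (minBytes > maxBytes) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        long byFraction = (long) (fraction * totalBytes);
        // min/max override the fraction when it falls outside the range
        return Math.min(maxBytes, Math.max(minBytes, byFraction));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long total = 8L * 1024L * mb; // illustrative 8 GiB

        // 80/120 meant as megabytes: 10% of 8 GiB (~819 MB) is capped at 120 MB
        System.out.println(networkBufferBytes(0.1, 80 * mb, 120 * mb, total) / mb); // 120

        // 80/120 read as raw bytes: the same fraction is capped at 120 bytes
        System.out.println(networkBufferBytes(0.1, 80, 120, total)); // 120
    }
}
```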

Re: Flink 1.6.0 not allocating specified TMs in Yarn

2018-09-17, Subramanya Suresh
say. If there is nothing
> suspicious, then it would be helpful if you could share the complete logs
> with us.
>
> Cheers,
> Till
>
> On Mon, Sep 17, 2018 at 9:16 AM Subramanya Suresh
> wrote:
>
>> Hi,
>> Was suggested here to migrate to 1.6.0 in lieu

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-09-05, Subramanya Suresh
2018-09-03 06:34:57,005 INFO org.apache.flink.yarn.YarnFlinkResourceManager - Shutting down cluster with status FAILED : The monitored job with ID 96d3b4f60a80a898f44f87c5b06f6981 has failed to complete.
2018-09-03 06:34:57,007 INFO org.apache.flink.yarn.YarnFlinkResourceManager

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-31, Subramanya Suresh
askmanager.exit-on-fatal-akka-error: true` in the `flink-conf.yaml`.
>
> Cheers,
> Till
>
> On Fri, Aug 31, 2018 at 10:43 AM Subramanya Suresh
> wrote:
>
>> Hi Till,
>> Greatly appreciate your reply.
>> We use version 1.4.2. I do not see anything unusual in the

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-31, Subramanya Suresh
m occur when using the latest release
> Flink 1.6.0?
>
> Cheers,
> Till
>
> On Thu, Aug 30, 2018 at 8:48 AM Subramanya Suresh
> wrote:
>
>> Hi, we are seeing a weird issue where one TaskManager is lost and then
>> never re-allocated and subsequently operators fail with

Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-29, Subramanya Suresh
Hi, we are seeing a weird issue: one TaskManager is lost and never re-allocated, operators subsequently fail with NoResourceAvailableException, and after 5 restarts (we use a FixedDelay restart strategy with 5 attempts) the application goes down.
- We have explicitly set `yarn.reallocate-failed: true`
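The failure sequence described here can be modelled with a small Flink-free simulation of a fixed-delay restart strategy. The lost slot is never replaced, so every restart attempt fails until the restart budget is used up. This is a sketch under assumptions: `FixedDelayRestartSketch`, `run` and the `startSucceeds` predicate are hypothetical names, and in this model attempt 0 is the initial deployment, so 5 restarts mean 6 deployment attempts in total. In a real job the strategy is configured through Flink itself, not through code like this.

```java
import java.util.function.IntPredicate;

/**
 * Flink-free simulation of a fixed-delay restart strategy (hypothetical names).
 * Attempt 0 is the initial deployment; attempts 1..maxRestarts are restarts.
 */
public class FixedDelayRestartSketch {
    enum Outcome { RUNNING, FAILED }

    /**
     * @param maxRestarts   number of restarts allowed (5 in the report above)
     * @param delayMillis   fixed pause before each restart
     * @param startSucceeds true if the given attempt obtains all the slots it needs
     */
    static Outcome run(int maxRestarts, long delayMillis, IntPredicate startSucceeds)
            throws InterruptedException {
        for (int attempt = 0; attempt <= maxRestarts; attempt++) {
            if (startSucceeds.test(attempt)) {
                return Outcome.RUNNING;
            }
            if (attempt < maxRestarts) {
                Thread.sleep(delayMillis); // fixed delay before the next restart
            }
        }
        // Restart budget exhausted: the job reaches a terminal failed state.
        return Outcome.FAILED;
    }

    public static void main(String[] args) throws InterruptedException {
        // The lost TaskManager is never replaced, so every attempt lacks a slot.
        System.out.println(run(5, 10, attempt -> false)); // FAILED
        // If a replacement container arrived before the 2nd restart, the job would recover.
        System.out.println(run(5, 10, attempt -> attempt >= 2)); // RUNNING
    }
}
```

The model makes the point of the report explicit: a fixed restart budget only helps if the missing resources come back within it; if the lost TaskManager is never re-allocated, each attempt hits the same NoResourceAvailableException.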