Re: Flink job didn't restart when a task failed

2020-04-14 Thread Hanson, Bruce
cha Krettek <aljos...@apache.org> wrote: Hi, this indeed seems very strange! @Gary Could you maybe have a look at this since you work/worked quite a bit on the scheduler? Best, Aljoscha On 09.04.20 05:46, Hanson, Bruce wrote: > Hello Flink folks: > > We had a problem with

Flink job didn't restart when a task failed

2020-04-08 Thread Hanson, Bruce
Hello Flink folks: We had a problem with a Flink job the other day that I haven’t seen before. One task encountered a failure and switched to FAILED (see the full exception below). After the failure, the task said it was notifying the Job Manager: 2020-04-06 08:21:04.329 [flink-akka.actor.defau

Fencing token exceptions from Job Manager High Availability mode

2019-09-30 Thread Hanson, Bruce
Hi all, We are running some of our Flink jobs with Job Manager High Availability. Occasionally we get a cluster that comes up improperly and doesn’t respond. Attempts to submit the job seem to hang and when we hit the /overview REST endpoint in the Job Manager we get a 500 error and a fencing t

Error submitting stand-alone Flink job to EMR YARN cluster

2016-06-30 Thread Hanson, Bruce
I’m trying to submit a stand-alone Flink job to a YARN cluster running on EMR (Elastic MapReduce) nodes in AWS. When it tries to start a container for the Job Manager, it fails. The error message from the container is below. The command I’m using is: $ flink run -m yarn-cluster -yn 1 -ynm test1

Streaming job software update

2016-05-04 Thread Hanson, Bruce
Hi all, I’m working on using Flink to do a variety of streaming jobs that will be processing very high-volume streams. I want to be able to update a job’s software with an absolute minimum impact on the processing of the data. What I don’t understand is the best way to update the software running