Hi all,

I have taken this ticket and will fix it as soon as possible.

Thanks, vino.

On Thu, Sep 20, 2018 at 4:35 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Tzanko,
>
> in order to make the container entrypoint work properly with HA, we need
> to fix the JobID (see https://issues.apache.org/jira/browse/FLINK-10291).
> At the moment, we generate a new JobID for every restart of the cluster
> entrypoint container. Because of that, the system cannot find the existing
> checkpoints.
>
> Fixing the JobID is not a big deal and it should be fixed with the next
> bug fix release.
>
> Cheers,
> Till
>
> On Thu, Sep 20, 2018 at 10:12 AM vino yang <yanghua1...@gmail.com> wrote:
>
>> Hi Tzanko,
>>
>> Till is probably the best person to answer this question.
>>
>> Thanks, vino.
>>
>> On Wed, Sep 19, 2018 at 5:47 PM Tzanko Matev <tsa...@gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> I am currently experimenting with a Flink 1.6.0 job cluster. The goal is
>>> to run a streaming job on K8s. Right now I am using docker-compose to
>>> experiment with the job cluster.
>>>
>>> I am trying to set up HA with ZooKeeper, but I seem to fail. I have a
>>> docker-compose file which contains the following services:
>>> - Zookeeper
>>> - Flink job manager
>>> - Flink task manager
>>>
>>> The containers are set up as per the documentation for docker-compose,
>>> but I have also set up the necessary HA settings in the conf file. However,
>>> when I kill the job manager container and start it again, the job being
>>> processed does not recover but always starts from scratch. I also get
>>> the following error:
>>>
>>> > ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  - Could not retrieve the redirect address.
>>> >
>>> > java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message LocalFencedMessage(8c4887f5c13f6d907d82a55d97ac428f, LocalRpcInvocation(requestRestAddress(Time))) sent to akka.tcp://flink@blockprocessor-job-cluster:50000/user/dispatcher because the fencing token is null.
>>>
>>> Am I missing something? Is HA implemented for job clusters at all?
>>>
>>> Best wishes,
>>> Tzanko Matev
>>>
>>>
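
The point made in the thread, that a new JobID on every restart of the
entrypoint container prevents the system from finding existing checkpoints,
can be illustrated with a toy sketch. This is not Flink code: the dictionary
standing in for the HA checkpoint store, the function names, the job name
"blockprocessor" and the name-derived ID scheme are all assumptions made for
illustration only.

```python
import uuid

def random_job_id():
    """A fresh ID on every start, as described in the thread."""
    return uuid.uuid4().hex

def fixed_job_id(job_name="blockprocessor"):
    """A deterministic ID derived from a stable name (hypothetical scheme)."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, job_name).hex

def simulate_restart(make_job_id):
    """Checkpoint under one ID, 'restart', then look up under the new ID."""
    checkpoint_store = {}  # toy stand-in for the HA checkpoint store

    # First start: the job runs and stores a checkpoint under its ID.
    first_id = make_job_id()
    checkpoint_store[first_id] = "offset=42"

    # Restart: the entrypoint asks for an ID again and tries to recover.
    second_id = make_job_id()
    return checkpoint_store.get(second_id)

if __name__ == "__main__":
    print("random ID:", simulate_restart(random_job_id))
    print("fixed ID: ", simulate_restart(fixed_job_id))
```

With a random ID the lookup after the restart finds nothing, so the job
starts from scratch; with an ID that stays the same across restarts the
stored checkpoint is found again.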
