Re: How to debug a job stuck in a deployment/run loop?

Till Rohrmann Wed, 29 Jan 2020 04:41:10 -0800

Hi Jason,

getting access to the log files would help most to figure out what's going
wrong.


Cheers,
Till

On Tue, Jan 28, 2020 at 9:08 AM Arvid Heise <ar...@ververica.com> wrote:

> Hi Jason,
>
> could you describe your topology? Are you writing to Kafka? Are you using
> exactly-once semantics? Are you seeing any warnings?
> If so, one thing that immediately comes to mind is the broker setting
> transaction.max.timeout.ms. If the producer's transaction.timeout.ms set by
> Flink (1 hour by default) is higher than what the Kafka brokers allow, the
> job may run into indefinite restart loops in rare cases.
>
> "Kafka brokers by default have transaction.max.timeout.ms set to 15
> minutes. This property will not allow to set transaction timeouts for the
> producers larger than its value. FlinkKafkaProducer011 by default sets
> the transaction.timeout.ms property in producer config to 1 hour, thus
> transaction.max.timeout.ms should be increased before using the
> Semantic.EXACTLY_ONCE mode."
>
> Best,
>
> Arvid
>
> On Fri, Jan 24, 2020 at 2:47 AM Jason Kania <jason.ka...@ymail.com> wrote:
>
>> I am attempting to migrate from 1.7.1 to 1.9.1 and I have hit a problem
>> where previously working jobs can no longer launch after being submitted.
>> In the UI, the submitted jobs show up as deploying for a period, then go
>> into a run state before returning to the deploy state and this repeats
>> regularly with the job bouncing between states. No exceptions or errors are
>> visible in the logs. There is no data coming in for the job to process and
>> the Kafka queues are empty.
>>
>> If I look at the thread activity of the task manager running the job in
>> top, I see that the busiest threads are flink-akka threads, sometimes
>> jumping to very high CPU numbers. That is all I have for info.
>>
>> Any suggestions on how to debug this? I can set breakpoints and connect
>> a debugger if that helps; I am just not sure at this point where to start.
>>
>> Thanks,
>>
>> Jason
>>
>
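
To make the timeout mismatch described in Arvid's reply concrete, here is a
minimal Java sketch. It is not taken from the thread: the class and method
names are hypothetical, and the two constants are simply the defaults named in
the quoted documentation (15 minutes on the broker, 1 hour in
FlinkKafkaProducer011). It uses only java.util.Properties, so it checks the
values without connecting to Kafka or Flink. In a real job the resulting
Properties object would presumably be the producer config handed to the Flink
Kafka producer.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical helper, for illustration only.
class TransactionTimeoutCheck {
    // Broker default for transaction.max.timeout.ms, per the quoted docs.
    static final long BROKER_DEFAULT_MAX_TIMEOUT_MS = Duration.ofMinutes(15).toMillis();
    // Producer default set by FlinkKafkaProducer011, per the quoted docs.
    static final long FLINK_DEFAULT_TIMEOUT_MS = Duration.ofHours(1).toMillis();

    // Builds producer properties with an explicit transaction timeout.
    static Properties producerProperties(long transactionTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("transaction.timeout.ms", Long.toString(transactionTimeoutMs));
        return props;
    }

    // Returns true if the producer asks for a longer transaction timeout
    // than the broker is configured to allow. An unset value is treated
    // as the Flink default of 1 hour.
    static boolean exceedsBrokerLimit(Properties producerProps, long brokerMaxTimeoutMs) {
        String configured = producerProps.getProperty(
                "transaction.timeout.ms", Long.toString(FLINK_DEFAULT_TIMEOUT_MS));
        return Long.parseLong(configured) > brokerMaxTimeoutMs;
    }

    public static void main(String[] args) {
        // Leaving Flink's default in place against a default broker.
        Properties defaults = producerProperties(FLINK_DEFAULT_TIMEOUT_MS);
        System.out.println("default producer exceeds broker limit: "
                + exceedsBrokerLimit(defaults, BROKER_DEFAULT_MAX_TIMEOUT_MS));

        // Lowering the producer timeout to the broker's limit.
        Properties lowered = producerProperties(BROKER_DEFAULT_MAX_TIMEOUT_MS);
        System.out.println("lowered producer exceeds broker limit: "
                + exceedsBrokerLimit(lowered, BROKER_DEFAULT_MAX_TIMEOUT_MS));
    }
}
```

The sketch lowers the producer timeout; the alternative named in the quoted
documentation is to raise transaction.max.timeout.ms on the brokers instead.
Which side to change depends on how long checkpoints and recovery can take in
the job at hand.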
