Would it be possible to create so many operator states? Did you configure some parameters wrongly?

[1] https://github.com/apache/beam/blob/4fc924a8193bb9495c6b7ba755ced576bb8a35d5/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/stableinput/BufferingDoFnRunner.java#L95

Best
Yun Tang
----------
*From:* Stephen Patel
*Sent:* Thursday, April 16, 2020 22:30
*To:* Yun Tang
*Cc:* user@flink.apache.org
*Subject:* Re: Streaming Job eventually begins failing during checkpointing

Correction. I've actually found a place where it potentially might be creating a new operator state per checkpoint:
https://github.com/apache/beam/blob/4fc924a8193bb9495c6b7ba755ced576bb8a35d5/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/stableinput/BufferingDoFnRunner.java#L95
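
To make the suspicion concrete, here is a minimal sketch (using Flink's CheckpointedFunction API, not the actual Beam code) of the kind of logic that would register a new operator state on every checkpoint; the class name and state name prefix are made up for illustration:

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.OperatorStateStore;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

// Hypothetical example, not taken from Beam: each checkpoint registers a
// ListState under a new name, so the operator accumulates one more named
// state per checkpoint instead of reusing a single descriptor.
public class PerCheckpointStateExample implements CheckpointedFunction {

    private transient OperatorStateStore stateStore;

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // Keep the store around so that state can (problematically) be created later.
        stateStore = context.getOperatorStateStore();
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Anti-pattern: the descriptor name changes every checkpoint, so this
        // registers a brand-new operator state each time snapshotState runs.
        ListStateDescriptor<String> descriptor = new ListStateDescriptor<>(
                "buffer-" + context.getCheckpointId(), Types.STRING);
        ListState<String> perCheckpointBuffer = stateStore.getListState(descriptor);
        perCheckpointBuffer.add("element buffered for checkpoint " + context.getCheckpointId());
    }
}

The usual pattern is to create a single descriptor with a fixed name and call getListState once in initializeState, so the number of registered operator states stays constant. With a per-checkpoint name like the one above, the state backend gains one more named state every checkpoint, which would fit a job that checkpoints once a minute and only starts failing after a few weeks.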
> Did you call context.getOperatorStateStore().getListState or
> context.getOperatorStateStore().getBroadcastState? Did you pass a
> different operator state descriptor each time?
>
> Best
> Yun Tang
> ----------
> *From:* Stephen Patel
> *Sent:* Thursday, April 16, 2020 2:09
> *To:* user@flink.apache.org
> *Subject:* Streaming Job eventually begins failing during checkpointing
>
> I've got a flink (1.8.0, emr-5.26) streaming job running on yarn. It's
> configured to use rocksdb, and checkpoint once a minute to hdfs. This job
> operates just fine for around 20 days, and then begins failing with this
> exception (it fails, restarts, and fails again, repeatedly):
>
> 2020-04-15 13:15:
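
For reference, a minimal sketch of the setup described in the quoted message above, assuming Flink 1.8's RocksDB state backend with checkpoints on HDFS once a minute; the checkpoint URI and job name are placeholders:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Rough sketch of the reported configuration (Flink 1.8 APIs); the HDFS
// path below is a placeholder, not the actual cluster path.
public class CheckpointSetupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB state backend with checkpoint data written to HDFS.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));

        // Checkpoint once a minute, as in the original report.
        env.enableCheckpointing(60_000);

        // ... build the actual pipeline here before executing ...
        env.execute("checkpoint-setup-sketch");
    }
}

At this interval the job takes roughly 1,440 checkpoints a day, or close to 29,000 over 20 days, so anything that grows per checkpoint has accumulated tens of thousands of entries by the time the failures begin.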