Re: Streaming Job eventually begins failing during checkpointing

2020-04-27 Thread Yu Li
would it be possible to create so many operator states? Did you configure some parameters wrongly? [1] https://github.com/apache/beam/blob/4fc924a8193bb9495c6b7ba755ced576bb8a35d5/runners/flink/src/main/java/org/ap

Re: Streaming Job eventually begins failing during checkpointing

2020-04-25 Thread Eleanore Jin
b9495c6b7ba755ced576bb8a35d5/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/stableinput/BufferingDoFnRunner.java#L95 Best Yun Tang -- *From:* Stephen Patel

Re: Streaming Job eventually begins failing during checkpointing

2020-04-23 Thread Stephan Ewen
5/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/stableinput/BufferingDoFnRunner.java#L95 Best Yun Tang ---------- *From:* Stephen Patel *Sent:* Thursday, April 16, 2020 22:30

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Yun Tang
: Re: Streaming Job eventually begins failing during checkpointing Correction. I've actually found a place where it potentially might be creating a new operator state per checkpoint: https://github.com/apache/beam/blob/4fc924a8193bb9495c6b7ba755ced576bb8a35d5/runners/flink/src/main/jav
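
The hypothesis raised in this thread (a new operator state being registered per checkpoint, as opposed to one descriptor reused each time) can be illustrated with a toy model. The sketch below is plain Java and does not use the Flink or Beam APIs; `ToyOperatorStateStore`, `simulate`, and the `buffered-elements` state name are hypothetical names chosen for illustration, and the 20-day figure simply reuses the "around 20 days" and "once a minute" numbers from the original report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OperatorStateGrowthSketch {

    /** Toy stand-in for an operator state store: one list per registered state name. */
    static final class ToyOperatorStateStore {
        private final Map<String, List<String>> states = new HashMap<>();

        /** Returns the list registered under this name, registering it on first use. */
        List<String> getListState(String descriptorName) {
            return states.computeIfAbsent(descriptorName, k -> new ArrayList<>());
        }

        int registeredStateCount() {
            return states.size();
        }
    }

    /** Runs the given number of checkpoints and reports how many states ended up registered. */
    static int simulate(int checkpoints, boolean freshNamePerCheckpoint) {
        ToyOperatorStateStore store = new ToyOperatorStateStore();
        for (long id = 1; id <= checkpoints; id++) {
            // Hypothetical naming scheme: either derive the name from the checkpoint id
            // (a different descriptor each time) or reuse one fixed name.
            String name = freshNamePerCheckpoint
                    ? "buffered-elements-" + id
                    : "buffered-elements";
            store.getListState(name).add("element@" + id);
        }
        return store.registeredStateCount();
    }

    public static void main(String[] args) {
        // One checkpoint per minute for 20 days.
        int checkpoints = 20 * 24 * 60;
        System.out.println("fresh name per checkpoint: " + simulate(checkpoints, true));
        System.out.println("one fixed name: " + simulate(checkpoints, false));
    }
}
```

In this model the per-checkpoint naming leaves 28800 registered states after 20 days, while the fixed name leaves 1. Whether the real job behaves this way depends on the linked Beam code and on how Flink stores operator state metadata, which the toy model does not attempt to reproduce.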

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Stephen Patel
*To:* Yun Tang *Cc:* user@flink.apache.org *Subject:* Re: Streaming Job eventually begins failing during checkpointing Correction. I've actually found a place where it potentially might be creating a new operator state per checkpoint: https://git

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Stephen Patel
call context.getOperatorStateStore().getListState or context.getOperatorStateStore().getBroadcastState? Did you pass a different operator state descriptor each time? Best Yun Tang -- *From:* Stephen Patel

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Stephen Patel
ent operator state descriptor each time? Best Yun Tang -- *From:* Stephen Patel *Sent:* Thursday, April 16, 2020 2:09 *To:* user@flink.apache.org *Subject:* Streaming Job eventually begins failing during checkpointing I

Re: Streaming Job eventually begins failing during checkpointing

2020-04-15 Thread Yun Tang
().getBroadcastState ? Did you pass a different operator state descriptor each time? Best Yun Tang From: Stephen Patel Sent: Thursday, April 16, 2020 2:09 To: user@flink.apache.org Subject: Streaming Job eventually begins failing during checkpointing I've got a flink (

Streaming Job eventually begins failing during checkpointing

2020-04-15 Thread Stephen Patel
I've got a flink (1.8.0, emr-5.26) streaming job running on yarn. It's configured to use rocksdb, and checkpoint once a minute to hdfs. This job operates just fine for around 20 days, and then begins failing with this exception (it fails, restarts, and fails again, repeatedly): 2020-04-15 13:15: