回复: Flink failure rate restart not work as expect

2022-03-01 Thread 刘 家锹
Hi, all I think we may find the reason, that's relate to the 'jobmanager.execution.failover-strategy' configuration and the job region numbers. In our case, we set failover-strategy to 'region' and this job has 6 regions running on only one TaskManager. So when the container goes down, every r

org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to send data to Kafka: Pending record count must be zero at this point: 5592

2022-03-01 Thread 刘方亮
Hi everyone, The *exactly-once* of kafka connector and checkpoint is turned on, and the reason for the following exception occurs ? Main exception information: org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to send data to Kafka: Pending record count must be zero at this

Re: Flink job recovery after task manager failure

2022-03-01 Thread yidan zhao
State backend can be set as hashMap or rocksDB. Checkpoint storage must be a shared file system(nfs or hdfs or something else). Afek, Ifat (Nokia - IL/Kfar Sava) 于2022年3月2日周三 05:55写道: > Hi, > > > > I’m trying to understand the guidelines for task manager recovery. > > From what I see in the docu

Re: Flink job recovery after task manager failure

2022-03-01 Thread Afek, Ifat (Nokia - IL/Kfar Sava)
Hi, I’m trying to understand the guidelines for task manager recovery. From what I see in the documentation, state backend can be set as in memory / file system / rocksdb, and the checkpoint storage requires a shared file system for both file system and rocksdb. Is that correct? Must the file sy

Looking for advice on moving Datastream job to Table SQL

2022-03-01 Thread Jonathan Weaver
I'm doing a POC on moving an existing Datastream API job to use Table SQL to make it more accessible for some of my teammates. However I'm at a loss on how to handle watermarking in a similar way to how it was handled in the Datastream API. In the existing job a CDC stream is read, and 3 SQL tabl

Netty Client Thread - Classloader leak

2022-03-01 Thread Sudharsan R
Hello, I'm running a flink 1.11.1 cluster. When I submit a job, it spawns a thread named "Flink Netty Client (0) Thread 0 Thread". It seems to be executing "org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap" This thread associates as its context classloader a ChildFirstClassLo

Re: Flink failure rate restart not work as expect

2022-03-01 Thread 刘 家锹
I realized I missed mentioning something above, the container exit code is 163, which is not the normal code, at least I can’t find any meaning from google. So, my test didn’t cover this situation, I don’t know whether it impacts the results. 获取 Outlook for iOS __

Re: Flink failure rate restart not work as expect

2022-03-01 Thread 刘 家锹
We didn't find any obvious configuration issues in our cluster. As far as I know, It works fine in most cases; I also simulate failover under current configuration, by starting a new job with only one TaskManager, then kill the TaskManager container, and this job recovery from failures successfu

Re: Flink failure rate restart not work as expect

2022-03-01 Thread Matthias Pohl
The YARN node manager logs support my observation: The container exits with a failure which, if I understand it correctly, should cause a container restart on the YARN side. In HA mode, Flink expects the underlying resource management to restart the Flink cluster in case of failure. This does not s

Re: Adaptive Batch Configuration Does Not Work

2022-03-01 Thread Lijie Wang
Thanks for feedback! Can you post the jobmanager log? This may be because the source parallelism has been set in some way (Kind of source can infer parallelism according to the catalog). Best, Lijie Edwin 于2022年3月1日周二 19:30写道: > Hi all, > > I was trying to run flink-tpcds tests on flink v1.15-s

Adaptive Batch Configuration Does Not Work

2022-03-01 Thread Edwin
Hi all, I was trying to run flink-tpcds tests on flink v1.15-snapshot, and I added "jobmanager.adaptive-batch-scheduler.default-source-parallelism: 8" along with some other configurations to the conf yaml file to enable adaptive job scheduler. But it turned out the source parallelism kept unch

Re: processwindowfunction output Iterator

2022-03-01 Thread HG
Ok I had the impression, obviously incorrect that it could be called only once. Thanks On Tue, Mar 1, 2022, 09:26 Schwalbe Matthias wrote: > Goedemorgen Hans, > > > > You can call the out.collect(…) multiple times, i.e. for each forwarded > event … how about this 😊 > > > > Thias > > > > > > *Fr

Re: Flink failure rate restart not work as expect

2022-03-01 Thread Matthias Pohl
Hi, I second Alex' observation - based on the logs it looks like the task restart functionality worked as expected: It tried to restart the tasks until it reached the limit of 4 attempts due to the missing TaskManager. The job-cluster shut down with an error code. At this point, YARN should pick it

RE: processwindowfunction output Iterator

2022-03-01 Thread Schwalbe Matthias
Goedemorgen Hans, You can call the out.collect(…) multiple times, i.e. for each forwarded event … how about this 😊 Thias From: HG Sent: Montag, 28. Februar 2022 16:25 To: user Subject: processwindowfunction output Iterator Hi, Can processwindowfunction output an Iterator? I need to sort a