Re: Metric counter gets reset when leader jobmanager changes in Flink native K8s HA solution

2021-06-14 Thread Prasanna kumar
amit, This is expected behaviour from counter . If the total count irrespective of the restarts needed to be found, aggregate functions need to be applied on the counter . Example sum(Rate(counter)) https://prometheus.io/docs/prometheus/latest/querying/functions/ Prasanna. On Tue, Jun 15, 2021

Confusions and suggestions about Configuration

2021-06-14 Thread Jason Lee
Hi everyone, When I was researching and using Flink recently, I found that the official documentation on how to configure parameters is confusing, mainly as follows: 1. In the Configuration module of the official document, the description and default value of each parameter are introduced. T

Metric counter gets reset when leader jobmanager changes in Flink native K8s HA solution

2021-06-14 Thread Amit Bhatia
Hi, We have configured jobmanager HA with flink 1.12.1 on the k8s environment. We have implemented a HA solution using Native K8s HA solution ( https://cwiki.apache.org/confluence/display/FLINK/FLIP-144%3A+Native+Kubernetes+HA+for+Flink). We have used deployment controller for both jobmanager & ta

Got multiple issues when running the tutorial project "table-walkthrough" on IDEA

2021-06-14 Thread Lingfeng Pu
Hi, I'm currently trying to debug the tutorial project "table-walkthrough" on IDEA on a standalone Flink environment. I installed the required software (Java 8 or 11, Maven, Docker) according to the tutorial. I'll provide some key environment info about the running environment before describing th

Discard checkpoint files through a single recursive call

2021-06-14 Thread Jiahui Jiang
Hello Flink! We are building an infrastructure where we implement our own CompletedCheckpointStore. The read and write to the external storage location of these checkpoints are through HTTP calls to an external service. Recently we noticed some checkpoint file cleanup performance issue when the

Flink job restart when one ZK node is down

2021-06-14 Thread Lu Niu
HI, Flink Users We use a Zk cluster of 5 node for JM HA. When we terminate one node for maintenance, we notice lots of flink job fully restarts. The error looks like: ``` org.apache.flink.util.FlinkException: ResourceManager leader changed to new address null at org.apache.flink.runtime.taskex

TypeInfo issue with Avro SpecificRecord

2021-06-14 Thread Patrick Lucas
Hi, I have read [1] when it comes to using Avro for serialization, but I'm stuck with a mysterious exception when Flink is doing type resolution. (Flink 1.13.1) Basically, I'm able to use a SpecificRecord type in my source, but I am unable to use different SpecificRecord types later in the pipeli

Re: PyFlink: Upload resource files to Flink cluster

2021-06-14 Thread Dian Fu
Hi Sumeet, The archive files will be uploaded to the blob server. This is the same no matter specifying the archives via command line option `—pyArchives` or via `add_python_archive`. > And when I try to programmatically do this by calling add_python_archive(), > the job gets submitted but f

Re: How to know (in code) how many times the job restarted?

2021-06-14 Thread Roman Khachatryan
You can also use accumulators [1] to collect the number of restarts (and then access it via client); but side outputs should work as well. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/api_concepts.html#accumulators--counters Regards, Roman On Sun, Jun 13, 2021 at 10:01 PM

Re: Checkpoint is timing out - inspecting state

2021-06-14 Thread Yun Gao
Hi Dan, Flink should already have integrate a tool in the web UI to monitor the detailed statistics of the checkpoint [1]. It would show the time consumed in each part and each task, thus it could be used to debug the checkpoint timeout. Best, Yun [1] https://ci.apache.org/projects/flink/fli

Re: should i expect POJO serialization warnings when dealing w/ kryo protobuf serialization?

2021-06-14 Thread Yun Gao
Hi Jin, The warning would be given as long as trying to parse the type as PoJo failed, and turn to the Kryo serializer. The registered ProtobufSerializer would acts as a plugin inside the kryo serializer. Thus the warning should be able to be ignored. When serializing it would first turn to the k