Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-05 Thread jincheng sun
Hi David, Thank you for sharing the problems with the current document, and I agree with you as I also got the same feedback from Chinese users. I am often contacted by users to ask questions such as whether PyFlink supports "Java UDF" and whether PyFlink supports "xxxConnector". The root cause of

Re: Issue with single job yarn flink cluster HA

2020-08-05 Thread Ken Krugler
Hi Dinesh, Did updating to Flink 1.10 resolve the issue? Thanks, — Ken > Hi Andrey, > Sure We will try to use Flink 1.10 to see if HA issues we are facing is fixed > and update in this thread. > > Thanks, > Dinesh > > On Thu, Apr 2, 2020 at 3:22 PM Andrey Zagrebin

Flink Promethues Metricsreporter Question

2020-08-05 Thread Avijit Saha
Hi, Have a general question about Flink support for Prometheus metrics. We already have a Prometheus setup in our cluster with ServiceMonitor-s monitoring ports like 8080 etc. for scraping metrics. In a setup like this, if we deploy Flink Job managers/Task managers in the cluster, is there any ne

Re: Flink CPU load metrics in K8s

2020-08-05 Thread Bajaj, Abhinav
Thanks Roman for providing the details. I also made more observations that has increased my confusion about this topic 😝 To ease the calculations, I deployed a test cluster this time providing 1 CPU in K8s(with docker) for all the taskmanager container. When I check the taskmanager CPU load, the

Re: Two Queries and a Kafka Topic

2020-08-05 Thread Marco Villalobos
Hi Theo, Thank you. I just read the State Processor API in an effort to understand Option 1, it seems though I can just use a KeyedProcessFunction that loads the data just once (maybe on the "open" method), and serialize the values into MapState and use it from that point on. Another option i

Re: Handle idle kafka source in Flink 1.9

2020-08-05 Thread bat man
Hello Arvid, Thanks for the suggestion/reference and my apologies for the late reply. With this I am able to process the data with some topics not having regular data. Obviously, late data is being handheld as in side-output and has a process for it. One challenge is to handle the back-fill as wh

Re: Status of a job when a kafka source dies

2020-08-05 Thread Piotr Nowojski
Hi Nick, Could you elaborate more, what event and how would you like Flink to handle? Is there some kind of Kafka's API that can be used to listen to such kind of events? Becket, do you maybe know something about this? As a side note Nick, can not you configure some timeouts [1] in the KafkaConsu

Re: Status of a job when a kafka source dies

2020-08-05 Thread Nick Bendtner
+user group. On Wed, Aug 5, 2020 at 12:57 PM Nick Bendtner wrote: > Thanks Piotr but shouldn't this event be handled by the FlinkKafkaConsumer > since the poll happens inside the FlinkKafkaConsumer. How can I catch this > event in my code since I don't have control over the poll. > > Best, > Nic

Re: Status of a job when a kafka source dies

2020-08-05 Thread Piotr Nowojski
Hi Nick, What Aljoscha was trying to say is that Flink is not trying to do any magic. If `KafkaConsumer` - which is being used under the hood of `FlinkKafkaConsumer` connector - throws an exception, this exception bubbles up causing the job to failover. If the failure is handled by the `KafkaConsu

Re: Kafka transaction error lead to data loss under end to end exact-once

2020-08-05 Thread Piotr Nowojski
Hi Lu, In this case, as it looks from the quite fragmented log/error message that you posted, the job has failed so Flink indeed detected some issue and that probably means a data loss in Kafka (in such case you could probably recover some lost records by reading with `read_uncommitted` mode from

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-05 Thread Eleanore Jin
Hi Yang and Till, Thanks a lot for the help! I have the similar question as Till mentioned, if we do not fail Flink pods when the restart strategy is exhausted, it might be hard to monitor such failures. Today I get alerts if the k8s pods are restarted or in crash loop, but if this will no longer

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-05 Thread Till Rohrmann
You are right Yang Wang. Thanks for creating this issue. Cheers, Till On Wed, Aug 5, 2020 at 1:33 PM Yang Wang wrote: > Actually, the application status shows in YARN web UI is not determined by > the jobmanager process exit code. > Instead, we use "resourceManagerClient.unregisterApplicationM

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-05 Thread Yang Wang
Actually, the application status shows in YARN web UI is not determined by the jobmanager process exit code. Instead, we use "resourceManagerClient.unregisterApplicationMaster" to control the final status of YARN application. So although jobmanager exit with zero code, it still could show failed st

Re: Two Queries and a Kafka Topic

2020-08-05 Thread Theo Diefenthal
Hi Marco, In general, I see three solutions here you could approach: 1. Use the StateProcessorAPI: You can run a program with the stateProcessorAPI that loads the data from JDBC and stores it into a Flink SavePoint. Afterwards, you start your streaming job from that savepoint which will load

Re: The bytecode of the class does not match the source code

2020-08-05 Thread Chesnay Schepler
Well of course these differ; on the left you have the decompiled bytecode, on the right the original source. If these were the same you wouldn't need source jars. On 05/08/2020 12:20, 魏子涵 wrote: I'm sure the two versions match up. Following is the pic comparing codes in IDEA https://img-blog.

Re: The bytecode of the class does not match the source code

2020-08-05 Thread Jake
hi 魏子涵 Idea decompiled code is not match java source code, you can download java source code in idea. /Volumes/work/maven_repository/org/apache/flink/flink-runtime_2.11/1.10.1/flink-runtime_2.11-1.10.1-sources.jar!/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java Jake > On Aug 5

Re:Re: The bytecode of the class does not match the source code

2020-08-05 Thread 魏子涵
I'm sure the two versions match up. Following is the pic comparing codes in IDEA https://img-blog.csdnimg.cn/20200805180232929.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L0NTRE5yaG1t,size_16,color_FF,t_70 At 2020-08-05 1

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-05 Thread Wei Zhong
Hi Xingbo, Thanks for your information. I think the PySpark's documentation redesigning deserves our attention. It seems that the Spark community has also begun to treat the user experience of Python documentation more seriously. We can continue to pay attention to the discussion and progress

Re: The bytecode of the class does not match the source code

2020-08-05 Thread Chesnay Schepler
Please make sure you have loaded the correct source jar, and aren't by chance still using the 1.11.0 source jar. On 05/08/2020 09:57, 魏子涵 wrote: Hi, everyone:       I found  the 【org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl】 class in【flink-runtime_2.11-1.11.1.jar】does not match th

Re: Kafka transaction error lead to data loss under end to end exact-once

2020-08-05 Thread Khachatryan Roman
Hi Lu, AFAIK, it's not going to be fixed. As you mentioned in the first email, Kafka should be configured so that it's transaction timeout is less than your max checkpoint duration. However, you should not only change transaction.timeout.ms in producer but also transaction.max.timeout.ms on your

The bytecode of the class does not match the source code

2020-08-05 Thread 魏子涵
Hi, everyone: I found the 【org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl】 class in【flink-runtime_2.11-1.11.1.jar】does not match the source code. Is it a problem we need to fix(if it is, what should we do)? or just let it go?

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-05 Thread Till Rohrmann
Yes for the other deployments it is not a problem. A reason why people preferred non-zero exit codes in case of FAILED jobs is that this is easier to monitor than having to take a look at the actual job result. Moreover, in the YARN web UI the application shows as failed if I am not mistaken. Howev

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-05 Thread Yang Wang
Hi Eleanore, Yes, I suggest to use Job to replace Deployment. It could be used to run jobmanager one time and finish after a successful/failed completion. However, using Job still could not solve your problem completely. Just as Till said, When a job exhausts the restart strategy, the jobmanager

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-05 Thread Xingbo Huang
Hi, I found that the spark community is also working on redesigning pyspark documentation[1] recently. Maybe we can compare the difference between our document structure and its document structure. [1] https://issues.apache.org/jira/browse/SPARK-31851 http://apache-spark-developers-list.1001551.n