Re: Kafka source, committing and retries

2020-08-04 Thread jackphelanbroadcom
Hi Till, > Till: Could you please give me a bit more context? Are you asking how Flink realizes exactly-once processing guarantees with its connectors? Thank you very much for your response! Flink has a lot of really cool ideas :) I did read more about connectors and I think I can elaborate. The prob

Re: Two Queries and a Kafka Topic

2020-08-04 Thread Marco Villalobos
Hi Leonard, First, thank you. I am currently trying to restrict my solution to Apache Flink 1.10 because it's the current version supported by Amazon EMR; I am not ready to change our operational environment to solve this. Second, I am using the DataStream API. The Kafka Topic is not in a table

Re: Two Queries and a Kafka Topic

2020-08-04 Thread Leonard Xu
Hi, Marco > If I need SQL Query One and SQL Query Two to happen just one time, Looks like you want to reuse this Kafka table in one job. Executing multiple queries in one SQL job is supported in Flink 1.11: you can use `StatementSet`[1] to add SQL Query One and SQL Query Two in a single SQL j

Re: Kafka transaction error leads to data loss under end-to-end exactly-once

2020-08-04 Thread Lu Niu
Hi, Khachatryan Thank you for the reply. Is that a problem that can be fixed? If so, is the fix on the roadmap? Thanks! Best, Lu On Tue, Aug 4, 2020 at 1:24 PM Khachatryan Roman < khachatryan.ro...@gmail.com> wrote: > Hi Lu, > > Yes, this error indicates data loss (unless there were no records in th

Two Queries and a Kafka Topic

2020-08-04 Thread Marco Villalobos
Let's say that I have: SQL Query One from data in PostgreSQL (200K records), SQL Query Two from data in PostgreSQL (1000 records), and Kafka Topic One. Let's also say that the main data for this Flink job arrives in Kafka Topic One. If I need SQL Query One and SQL Query Two to happen just one time,

Re: Kafka transaction error leads to data loss under end-to-end exactly-once

2020-08-04 Thread Khachatryan Roman
Hi Lu, Yes, this error indicates data loss (unless there were no records in the transactions). Regards, Roman On Mon, Aug 3, 2020 at 9:14 PM Lu Niu wrote: > Hi, > > We are using end-to-end exactly-once Flink + Kafka and encountered the below > exception, which usually came after checkpoint failur

Re: Status of a job when a kafka source dies

2020-08-04 Thread Nick Bendtner
Hi, I don't observe this behaviour though; we use Flink 1.7.2. I stopped Kafka and ZooKeeper on all broker nodes. On the Flink side, I see the messages in the log (data is obfuscated). There are no error logs. The Kafka consumer properties are 1. "bootstrap.servers" 2. "zookeeper.connect" 3. "

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-04 Thread David Anderson
I'm delighted to see energy going into improving the documentation. With the current documentation, I get a lot of questions that I believe reflect two fundamental problems with what we currently provide: (1) We have a lot of contextual information in our heads about how Flink works, and we are a

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-04 Thread Eleanore Jin
Hi Yang & Till, Thanks for your prompt reply! Yang, regarding your question, I am actually not using a k8s job, as I put my app.jar and its dependencies under flink's lib directory. I have one k8s deployment for the job manager, one k8s deployment for the task manager, and one k8s service for the job manager. A
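For contrast, the "k8s job" alternative that Yang asked about would look roughly like the sketch below. This is a hedged illustration only: the resource name, image tag, and entrypoint arguments are assumptions, not from the thread. The relevant difference from a Deployment is that a Job with `restartPolicy: OnFailure` only restarts the pod on a non-zero exit code, so a terminally failed job can actually stay terminated.

```yaml
# Hypothetical sketch of running the JobManager as a Kubernetes Job
# instead of a Deployment (names and args are assumptions).
apiVersion: batch/v1
kind: Job
metadata:
  name: flink-jobmanager
spec:
  template:
    spec:
      restartPolicy: OnFailure   # restart only on non-zero exit, unlike a Deployment's Always
      containers:
        - name: jobmanager
          image: flink:1.10      # version taken from the thread's context
          args: ["standalone-job", "--job-classname", "com.example.MyJob"]
```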

Re: Metrics for number of events in a timeframe

2020-08-04 Thread Chesnay Schepler
No, because Flink counters are mapped to Prometheus gauges. On 04/08/2020 15:52, Manish G wrote: That documentation states: delta should only be used with gauges. Would that cause an issue, as we are using a counter? With regards On Tue, Aug 4, 2020 at 7:12 PM Chesnay Schepler
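Since Flink counters surface as gauges on the Prometheus side, a windowed delta() is the usual way to get a "count in the last 5 minutes". A sketch (the metric name below is hypothetical; actual names depend on your scope formats and operator names):

```
delta(flink_taskmanager_job_task_operator_my_counter[5m])
```

For a metric that Prometheus itself knows to be a counter type, increase() would be the analogous function.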

Re: Metrics for number of events in a timeframe

2020-08-04 Thread Manish G
That documentation states: delta should only be used with gauges. Would that cause an issue, as we are using a counter? With regards On Tue, Aug 4, 2020 at 7:12 PM Chesnay Schepler wrote: > meter * timeframe (in seconds) is the simplest option, although it will > not be that accurate due to the

Re: Metrics for number of events in a timeframe

2020-08-04 Thread Chesnay Schepler
meter * timeframe (in seconds) is the simplest option, although it will not be that accurate due to the flattening of spikes. You'd get the best results by using a time-series database and calculating the difference between the current count and the one from 5 minutes ago. An example for Prometheus: h
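The spike-flattening caveat can be illustrated with plain arithmetic, independent of Flink. In this self-contained sketch (the traffic trace and 60-second averaging window are made up), a burst at the start of a 5-minute timeframe is invisible to a rate averaged over only the most recent minute, so rate * timeframe badly misestimates the true count:

```java
// Self-contained illustration, not Flink code: why rate * timeframe
// can diverge from the true counter delta when traffic is spiky.
public class MeterEstimate {

    // Estimate the timeframe total as (recent rate) * (timeframe length),
    // where the rate is averaged over the last meterWindow seconds only.
    public static long meterTimesTimeframe(long[] eventsPerSecond, int meterWindow) {
        long recent = 0;
        for (int i = eventsPerSecond.length - meterWindow; i < eventsPerSecond.length; i++) {
            recent += eventsPerSecond[i];
        }
        double rate = (double) recent / meterWindow;       // events/sec, like a meter reading
        return Math.round(rate * eventsPerSecond.length);  // rate * timeframe
    }

    // The true count: the counter delta over the whole timeframe.
    public static long trueCount(long[] eventsPerSecond) {
        long total = 0;
        for (long c : eventsPerSecond) {
            total += c;
        }
        return total;
    }
}
```

With a 1000-event burst in the first 10 seconds of a quiet 5-minute window, the meter-based estimate reads zero while the counter delta reports the full burst.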

Re: Unexpected unnamed sink in SQL job

2020-08-04 Thread godfrey he
I think we can assign a meaningful name to the sink Transformation, like other Transformations, in StreamExecLegacySink/BatchExecLegacySink. Paul Lam wrote on Tue, Aug 4, 2020 at 5:25 PM: > Hi Jingsong, > > Thanks for your input. Now I understand the design. > > I think in my case the StreamingFileCommitter is not chai

Re: Support for Event time clock specific to each stream in parallel streams

2020-08-04 Thread Sush Bankapura
Hi David, Thanks for the quick reply. You are indeed right: "key-based watermarking support" in Flink would be most useful. I gather this cannot be implemented in Flink, as per this mail chain: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Maintaining-watermarks-per-key-ins

Re: getting error after upgrade Flink 1.11.1

2020-08-04 Thread Kostas Kloudas
Hi Dasraj, You are right. In your previous email I did not pay attention that you migrated from 1.9. Since 1.9 the ClusterClient has changed significantly, as it is not annotated as a @Public API. I am not sure how easy it is to use the old logic in your setup. You could try copying the old code

Metrics for number of events in a timeframe

2020-08-04 Thread Manish G
Hi, A meter gives throughput, while a counter gives the number of events since the system started. What if I have to find the number of events processed in, say, the last 5 minutes? Is there an inbuilt metric for it, or do I need to do meter * timeframe? With regards

Re: getting error after upgrade Flink 1.11.1

2020-08-04 Thread dasraj
Hi Kostas, The ClusterClient class is not our custom class; it is in the org.apache.flink.client.program package. I can see the definition of ClusterClient has changed from 1.9.2 to 1.11.1: https://ci.apache.org/projects/flink/flink-docs-release-1.9/api/java/org/apache/flink/client/program/ClusterClient.html

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-04 Thread jincheng sun
Hi Seth and David, I'm very happy to have your reply and suggestions. I would like to share my thoughts here: The main motivation we want to refactor the PyFlink doc is that we want to make sure that the Python users could find all they want starting from the PyFlink documentation mainpage. That’

Re: In JobGraph, does an IntermediateDataSet contain multiple JobEdges?

2020-08-04 Thread Zhu Zhu
What you observed is right. At the moment, one IntermediateDataSet can have only one consumer JobEdge in the production code path. Thanks, Zhu Zhu yuehan1 wrote on Tue, Aug 4, 2020 at 5:14 PM: > IntermediateDataSet.java has a JobEdge list named consumers. > In which case does an IntermediateDataSet contain multiple

Re: getting error after upgrade Flink 1.11.1

2020-08-04 Thread Kostas Kloudas
Hi Dasraj, Could you please specify where the clusterClient.run() method is and how it submits a job to a cluster? Is the clusterClient your custom code? Any details will help us pin down the problem. One thing that is worth looking at is the release notes of 1.11 [1]. There you will find all

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-04 Thread Yang Wang
@Till Rohrmann In native mode, when a Flink application terminates with FAILED state, all the resources will be cleaned up. However, in standalone mode, I agree with you that we need to rethink the exit code of Flink. When a job exhausts the restart strategy, we should terminate the pod and do no

Re: Unexpected unnamed sink in SQL job

2020-08-04 Thread Paul Lam
Hi Jingsong, Thanks for your input. Now I understand the design. I think in my case the StreamingFileCommitter is not chained because its upstream operator is not parallelism 1. BTW, it’d be better if it had a more meaningful operator name. Best, Paul Lam > On Aug 4, 2020, at 17:11, Jingsong Li wrote: >

In JobGraph, does an IntermediateDataSet contain multiple JobEdges?

2020-08-04 Thread yuehan1
IntermediateDataSet.java has a JobEdge list named consumers. In which case does an IntermediateDataSet contain multiple JobEdges? I read the code, and it seems that an IntermediateDataSet can only have one JobEdge: // IntermediateDataSet.java private final List<JobEdge> consumers = new ArrayList<>(); thanks for y

Re: Unexpected unnamed sink in SQL job

2020-08-04 Thread Jingsong Li
Hi Paul, It is a meaningless sink. This is because for the sake of flexibility, the `StreamingFileCommitter` is implemented as a `StreamOperator` rather than a `SinkFunction`. But `StreamTableSink` requires a `SinkFunction`, so we give a meaningless `DiscardingSink` to it. And this sink should b

Unexpected unnamed sink in SQL job

2020-08-04 Thread Paul Lam
Hi, I noticed that a simple SQL statement like 'insert into hive_parquet_table select … from some_kafka_table' generates an additional operator called ’Sink: Unnamed’ with parallelism 1. I wonder if it’s by design? And what’s the functionality of this operator? Thanks in advance! Env: - Flink 1.

Re: Flink streaming job logging reserves space

2020-08-04 Thread Maxim Parkachov
Hi Yang, Thanks for your advice, now I have a good reason to upgrade to 1.11. Regards, Maxim. On Tue, Aug 4, 2020 at 9:39 AM Yang Wang wrote: > AFAIK, there is no way to roll the *.out/err files except we hijack the > stdout/stderr in Flink code. However, it is a temporary hack. > > A good way

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-04 Thread Till Rohrmann
@Yang Wang I believe that we should rethink the exit codes of Flink. In general you want K8s to restart a failed Flink process. Hence, an application which terminates in state FAILED should not return a non-zero exit code because it is a valid termination state. Cheers, Till On Tue, Aug 4, 2020

Re: Flink CPU load metrics in K8s

2020-08-04 Thread Roman Grebennikov
Hi, JVM.CPU.Load is just a wrapper (MetricUtils.instantiateCPUMetrics) on top of OperatingSystemMXBean.getProcessCpuLoad (see https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuLoad()) Usually it looks weird if you have
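Roman's point can be reproduced with plain JDK code: the com.sun.management extension of OperatingSystemMXBean is what sits underneath the JVM.CPU.Load metric. A minimal sketch, assuming a HotSpot-like JVM where the extension is present (class and method names are hypothetical; only the MXBean calls come from the JDK):

```java
import java.lang.management.ManagementFactory;

// Minimal probe of the same value Flink's JVM.CPU.Load metric reads:
// OperatingSystemMXBean.getProcessCpuLoad() from the com.sun.management extension.
public class CpuLoadProbe {

    public static double processCpuLoad() {
        java.lang.management.OperatingSystemMXBean bean =
                ManagementFactory.getOperatingSystemMXBean();
        if (bean instanceof com.sun.management.OperatingSystemMXBean) {
            // Documented to return a value in [0.0, 1.0], or a negative
            // value if the recent CPU usage is not yet available.
            return ((com.sun.management.OperatingSystemMXBean) bean).getProcessCpuLoad();
        }
        return -1.0; // extension not supported by this JVM vendor
    }
}
```

Note the scale: the value is a fraction of total machine capacity across all cores, which is why a busy single thread on a many-core host can look like a very low "load".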

Re: Flink streaming job logging reserves space

2020-08-04 Thread Yang Wang
AFAIK, there is no way to roll the *.out/err files unless we hijack the stdout/stderr in Flink code; however, that is a temporary hack. A good way is to write your logs to separate files that can roll via log4j. If you want to access them in the Flink web UI, upgrade to the 1.11 version. Then
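The "separate rolling file" approach could look like the log4j 1.x properties sketch below (a hedged example; the appender name, logger package, and size limits are all assumptions, and Flink 1.11 ships log4j2 by default, where the equivalent keys differ):

```
# Route application logs to their own size-capped rolling file (hypothetical names).
log4j.logger.com.example.myjob=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.File=${log.file}.app
log4j.appender.rolling.MaxFileSize=100MB
log4j.appender.rolling.MaxBackupIndex=5
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```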

getting error after upgrade Flink 1.11.1

2020-08-04 Thread dasraj
Hi, Currently we are using Flink 1.9.2 in our project and we are submitting the job from a jar. Below is the code we are currently using for job submission: clusterClient.setDetached(true); clusterClient.run(packagedProgram, Integer.valueOf(params.get(Constants.PARAMS_PARALLELISM))); After upgrading i