Re: Kafka source, committing and retries

2020-08-04 Thread jackphelanbroadcom
Hi Till, > Till: Could you please give me a bit more context? Are you asking how Flink realizes exactly-once processing guarantees with its connectors? Thank you very much for your response! Flink has a lot of really cool ideas :) I did read more about connectors and I think I can elaborate. The prob

Re: Two Queries and a Kafka Topic

2020-08-04 Thread Marco Villalobos
Hi Leonard, First, thank you. I am currently trying to restrict my solution to Apache Flink 1.10 because it's the current version supported by Amazon EMR; I am not ready to change our operational environment to solve this. Second, I am using the DataStream API. The Kafka Topic is not in a table

Re: Two Queries and a Kafka Topic

2020-08-04 Thread Leonard Xu
Hi, Marco > If I need SQL Query One and SQL Query Two to happen just one time, Looks like you want to reuse this Kafka table in one job. Executing multiple queries in one SQL job is supported in Flink 1.11: you can use `StatementSet`[1] to add SQL Query One and SQL Query Two in a single SQL j

Re: Kafka transaction error leads to data loss under end-to-end exactly-once

2020-08-04 Thread Lu Niu
Hi, Khachatryan Thank you for the reply. Is that a problem that can be fixed? If so, is the fix on the roadmap? Thanks! Best, Lu On Tue, Aug 4, 2020 at 1:24 PM Khachatryan Roman < khachatryan.ro...@gmail.com> wrote: > Hi Lu, > > Yes, this error indicates data loss (unless there were no records in th

Two Queries and a Kafka Topic

2020-08-04 Thread Marco Villalobos
Let's say that I have: SQL Query One from data in PostgreSQL (200K records), SQL Query Two from data in PostgreSQL (1000 records), and Kafka Topic One. Let's also say that the main data for this Flink job arrives in Kafka Topic One. If I need SQL Query One and SQL Query Two to happen just one time,

Re: Kafka transaction error leads to data loss under end-to-end exactly-once

2020-08-04 Thread Khachatryan Roman
Hi Lu, Yes, this error indicates data loss (unless there were no records in the transactions). Regards, Roman On Mon, Aug 3, 2020 at 9:14 PM Lu Niu wrote: > Hi, > > We are using end-to-end exactly-once Flink + Kafka and encountered the below > exception, which usually came after checkpoint failur

Re: Status of a job when a kafka source dies

2020-08-04 Thread Nick Bendtner
Hi, I don't observe this behaviour though; we use Flink 1.7.2. I stopped Kafka and ZooKeeper on all broker nodes. On the Flink side, I see the messages in the log (data is obfuscated). There are no error logs. The Kafka consumer properties are 1. "bootstrap.servers" 2. "zookeeper.connect" 3. "

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-04 Thread David Anderson
I'm delighted to see energy going into improving the documentation. With the current documentation, I get a lot of questions that I believe reflect two fundamental problems with what we currently provide: (1) We have a lot of contextual information in our heads about how Flink works, and we are a

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-04 Thread Eleanore Jin
Hi Yang & Till, Thanks for your prompt reply! Yang, regarding your question, I am actually not using a k8s job, as I put my app.jar and its dependencies under flink's lib directory. I have one k8s deployment for the job manager, one k8s deployment for the task manager, and one k8s service for the job manager. A
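For contrast, the "k8s job" alternative that Yang asked about would look roughly like the sketch below. This is a hedged illustration only: the resource name, image tag, and entrypoint arguments are assumptions, not from the thread. The relevant difference from a Deployment is that a Job with `restartPolicy: OnFailure` only restarts the pod on a non-zero exit code, so a terminally failed job can actually stay terminated.

```yaml
# Hypothetical sketch of running the JobManager as a Kubernetes Job
# instead of a Deployment (names and args are assumptions).
apiVersion: batch/v1
kind: Job
metadata:
  name: flink-jobmanager
spec:
  template:
    spec:
      restartPolicy: OnFailure   # restart only on non-zero exit, unlike a Deployment's Always
      containers:
        - name: jobmanager
          image: flink:1.10      # version taken from the thread's context
          args: ["standalone-job", "--job-classname", "com.example.MyJob"]
```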

Re: Metrics for number of events in a timeframe

2020-08-04 Thread Chesnay Schepler
No, because Flink counters are mapped to Prometheus gauges. On 04/08/2020 15:52, Manish G wrote: That documentation states: delta should only be used with gauges. Would that cause an issue, as we are using a counter? With regards On Tue, Aug 4, 2020 at 7:12 PM Chesnay Schepler
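Since Flink counters surface as gauges on the Prometheus side, a windowed delta() is the usual way to get a "count in the last 5 minutes". A sketch (the metric name below is hypothetical; actual names depend on your scope formats and operator names):

```
delta(flink_taskmanager_job_task_operator_my_counter[5m])
```

For a metric that Prometheus itself knows to be a counter type, increase() would be the analogous function.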

Re: Metrics for number of events in a timeframe

2020-08-04 Thread Manish G
That documentation states: delta should only be used with gauges. Would that cause an issue, as we are using a counter? With regards On Tue, Aug 4, 2020 at 7:12 PM Chesnay Schepler wrote: > meter * timeframe (in seconds) is the simplest option, although it will > not be that accurate due to the

Re: Metrics for number of events in a timeframe

2020-08-04 Thread Chesnay Schepler
meter * timeframe (in seconds) is the simplest option, although it will not be that accurate due to the flattening of spikes. You'd get the best results by using a time-series database and calculating the difference between the current count and the one from 5 minutes ago. An example for Prometheus: h
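The spike-flattening caveat can be illustrated with plain arithmetic, independent of Flink. In this self-contained sketch (the traffic trace and 60-second averaging window are made up), a burst at the start of a 5-minute timeframe is invisible to a rate averaged over only the most recent minute, so rate * timeframe badly misestimates the true count:

```java
// Self-contained illustration, not Flink code: why rate * timeframe
// can diverge from the true counter delta when traffic is spiky.
public class MeterEstimate {

    // Estimate the timeframe total as (recent rate) * (timeframe length),
    // where the rate is averaged over the last meterWindow seconds only.
    public static long meterTimesTimeframe(long[] eventsPerSecond, int meterWindow) {
        long recent = 0;
        for (int i = eventsPerSecond.length - meterWindow; i < eventsPerSecond.length; i++) {
            recent += eventsPerSecond[i];
        }
        double rate = (double) recent / meterWindow;       // events/sec, like a meter reading
        return Math.round(rate * eventsPerSecond.length);  // rate * timeframe
    }

    // The true count: the counter delta over the whole timeframe.
    public static long trueCount(long[] eventsPerSecond) {
        long total = 0;
        for (long c : eventsPerSecond) {
            total += c;
        }
        return total;
    }
}
```

With a 1000-event burst in the first 10 seconds of a quiet 5-minute window, the meter-based estimate reads zero while the counter delta reports the full burst.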

Re: Unexpected unnamed sink in SQL job

2020-08-04 Thread godfrey he
I think we can assign a meaningful name to the sink Transformation, like other Transformations, in StreamExecLegacySink/BatchExecLegacySink. Paul Lam wrote on Tue, Aug 4, 2020 at 5:25 PM: > Hi Jingsong, > > Thanks for your input. Now I understand the design. > > I think in my case the StreamingFileCommitter is not chai

Re: Support for Event time clock specific to each stream in parallel streams

2020-08-04 Thread Sush Bankapura
Hi David, Thanks for the quick reply. You are indeed right: "key-based watermarking support" in Flink would be most useful. I gather this cannot be implemented in Flink, as per this mail chain: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Maintaining-watermarks-per-key-ins

Re: getting error after upgrade Flink 1.11.1

2020-08-04 Thread Kostas Kloudas
Hi Dasraj, You are right. In your previous email I did not pay attention that you migrated from 1.9. Since 1.9 the ClusterClient has changed significantly, as it is not annotated as a @Public API. I am not sure how easy it is to use the old logic in your setup. You could try copying the old code

Metrics for number of events in a timeframe

2020-08-04 Thread Manish G
Hi, A meter gives throughput, while a counter gives the number of events since the system started. What if I have to find the number of events processed in, say, the last 5 minutes? Is there an inbuilt metric for it, or do I need to do meter * timeframe? With regards

Re: getting error after upgrade Flink 1.11.1

2020-08-04 Thread dasraj
Hi Kostas, The ClusterClient class is not our custom class; it is in the org.apache.flink.client.program package. I can see the definition of ClusterClient has changed from 1.9.2 to 1.11.1: https://ci.apache.org/projects/flink/flink-docs-release-1.9/api/java/org/apache/flink/client/program/ClusterClient.html

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-04 Thread jincheng sun
Hi Seth and David, I'm very happy to have your reply and suggestions. I would like to share my thoughts here: The main motivation we want to refactor the PyFlink doc is that we want to make sure that the Python users could find all they want starting from the PyFlink documentation mainpage. That’

Re: In JobGraph, does an IntermediateDataSet contain multiple JobEdges?

2020-08-04 Thread Zhu Zhu
What you observed is right. At the moment, one IntermediateDataSet can have only one consumer JobEdge in the production code path. Thanks, Zhu Zhu yuehan1 wrote on Tue, Aug 4, 2020 at 5:14 PM: > IntermediateDataSet.java has a JobEdge list named consumers. > In which case does an IntermediateDataSet contain multiple

Re: getting error after upgrade Flink 1.11.1

2020-08-04 Thread Kostas Kloudas
Hi Dasraj, Could you please specify where the clusterClient.run() method is and how it submits a job to a cluster? Is the clusterClient your custom code? Any details will help us pin down the problem. One thing that is worth looking at is the release notes of 1.11 [1]. There you will find all

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-04 Thread Yang Wang
@Till Rohrmann In native mode, when a Flink application terminates with FAILED state, all the resources will be cleaned up. However, in standalone mode, I agree with you that we need to rethink the exit code of Flink. When a job exhausts the restart strategy, we should terminate the pod and do no

Re: Unexpected unnamed sink in SQL job

2020-08-04 Thread Paul Lam
Hi Jingsong, Thanks for your input. Now I understand the design. I think in my case the StreamingFileCommitter is not chained because its upstream operator is not parallelism 1. BTW, it’d be better if it had a more meaningful operator name. Best, Paul Lam > On Aug 4, 2020, at 17:11, Jingsong Li wrote: >

In JobGraph, does an IntermediateDataSet contain multiple JobEdges?

2020-08-04 Thread yuehan1
IntermediateDataSet.java has a JobEdge list named consumers. In which case does an IntermediateDataSet contain multiple JobEdges? I read the code, and it seems that an IntermediateDataSet can only have one JobEdge: // IntermediateDataSet.java private final List<JobEdge> consumers = new ArrayList<>(); thanks for y

Re: Unexpected unnamed sink in SQL job

2020-08-04 Thread Jingsong Li
Hi Paul, It is a meaningless sink. This is because for the sake of flexibility, the `StreamingFileCommitter` is implemented as a `StreamOperator` rather than a `SinkFunction`. But `StreamTableSink` requires a `SinkFunction`, so we give a meaningless `DiscardingSink` to it. And this sink should b

Unexpected unnamed sink in SQL job

2020-08-04 Thread Paul Lam
Hi, I noticed that a simple SQL statement like 'insert into hive_parquet_table select … from some_kafka_table' generates an additional operator called ’Sink: Unnamed’ with parallelism 1. I wonder if it’s by design? And what’s the functionality of this operator? Thanks in advance! Env: - Flink 1.

Re: Flink streaming job logging reserves space

2020-08-04 Thread Maxim Parkachov
Hi Yang, Thanks for your advice, now I have a good reason to upgrade to 1.11. Regards, Maxim. On Tue, Aug 4, 2020 at 9:39 AM Yang Wang wrote: > AFAIK, there is no way to roll the *.out/err files except we hijack the > stdout/stderr in Flink code. However, it is a temporary hack. > > A good way

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-04 Thread Till Rohrmann
@Yang Wang I believe that we should rethink the exit codes of Flink. In general you want K8s to restart a failed Flink process. Hence, an application which terminates in state FAILED should not return a non-zero exit code because it is a valid termination state. Cheers, Till On Tue, Aug 4, 2020

Re: Flink CPU load metrics in K8s

2020-08-04 Thread Roman Grebennikov
Hi, JVM.CPU.Load is just a wrapper (MetricUtils.instantiateCPUMetrics) on top of OperatingSystemMXBean.getProcessCpuLoad (see https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuLoad()) Usually it looks weird if you have
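Roman's point can be reproduced with plain JDK code: the com.sun.management extension of OperatingSystemMXBean is what sits underneath the JVM.CPU.Load metric. A minimal sketch, assuming a HotSpot-like JVM where the extension is present (class and method names are hypothetical; only the MXBean calls come from the JDK):

```java
import java.lang.management.ManagementFactory;

// Minimal probe of the same value Flink's JVM.CPU.Load metric reads:
// OperatingSystemMXBean.getProcessCpuLoad() from the com.sun.management extension.
public class CpuLoadProbe {

    public static double processCpuLoad() {
        java.lang.management.OperatingSystemMXBean bean =
                ManagementFactory.getOperatingSystemMXBean();
        if (bean instanceof com.sun.management.OperatingSystemMXBean) {
            // Documented to return a value in [0.0, 1.0], or a negative
            // value if the recent CPU usage is not yet available.
            return ((com.sun.management.OperatingSystemMXBean) bean).getProcessCpuLoad();
        }
        return -1.0; // extension not supported by this JVM vendor
    }
}
```

Note the scale: the value is a fraction of total machine capacity across all cores, which is why a busy single thread on a many-core host can look like a very low "load".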

Re: Flink streaming job logging reserves space

2020-08-04 Thread Yang Wang
AFAIK, there is no way to roll the *.out/err files unless we hijack the stdout/stderr in Flink code; however, that is a temporary hack. A good way is to write your logs to separate files that can roll via log4j. If you want to access them in the Flink web UI, upgrade to the 1.11 version. Then
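The "separate rolling file" approach could look like the log4j 1.x properties sketch below (a hedged example; the appender name, logger package, and size limits are all assumptions, and Flink 1.11 ships log4j2 by default, where the equivalent keys differ):

```
# Route application logs to their own size-capped rolling file (hypothetical names).
log4j.logger.com.example.myjob=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.File=${log.file}.app
log4j.appender.rolling.MaxFileSize=100MB
log4j.appender.rolling.MaxBackupIndex=5
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```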

getting error after upgrade Flink 1.11.1

2020-08-04 Thread dasraj
Hi, Currently we are using Flink 1.9.2 in our project and we are submitting the job from a jar. Below is the code we are currently using for job submission: clusterClient.setDetached(true); clusterClient.run(packagedProgram, Integer.valueOf(params.get(Constants.PARAMS_PARALLELISM))); After upgrading i