accuracy validation of streaming pipeline

2022-05-20 Thread vtygoss
Hi community! I'm working on migrating from a full-data pipeline (with Spark) to an incremental data pipeline (with Flink CDC), and I ran into a problem with accuracy validation between the Flink-based and Spark-based pipelines. For bounded data, it's simple to validate whether the two result sets are consistent.
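For the bounded case mentioned above, one possible check is a multiset comparison of the two result sets. This is only a minimal sketch in Python, not code from the thread: the function name `diff_result_sets` and the sample rows are hypothetical, and it assumes both results fit in memory and that rows are hashable tuples.

```python
from collections import Counter


def diff_result_sets(expected_rows, actual_rows):
    """Compare two bounded result sets as multisets of rows.

    Row order is ignored, but duplicate rows are counted, so a row
    that appears twice on one side and once on the other is reported.
    """
    expected = Counter(map(tuple, expected_rows))
    actual = Counter(map(tuple, actual_rows))
    missing = expected - actual      # rows present only in the expected set
    unexpected = actual - expected   # rows present only in the actual set
    return missing, unexpected


# Hypothetical sample data standing in for the Spark and Flink outputs.
spark_rows = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
flink_rows = [(2, "b"), (1, "a"), (3, "c"), (3, "c")]

missing, unexpected = diff_result_sets(spark_rows, flink_rows)
print("missing from Flink result:", dict(missing))
print("unexpected in Flink result:", dict(unexpected))
```

Both counters are empty when the two result sets are consistent. For results too large for memory, the same idea would need to run as a distributed job instead.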

Re: Checkpointing - Job Manager S3 Head Requests are 404s

2022-05-20 Thread David Anderson
One more thing to be aware of: the Presto S3 implementation has issues too. See FLINK-24392 [1]. This means that there's no ideal solution, and in some cases it is preferable to use Hadoop, perhaps in combination with increasing the value of state.storage.fs.memory-threshold [2] in order to decreas

[DISCUSS] Update Scala 2.12.7 to 2.12.15

2022-05-20 Thread Martijn Visser
Hi everyone, I would like to get some opinions from our Scala users, therefore I'm also looping in the user mailing list. Flink currently is tied to Scala 2.12.7. As outlined in FLINK-12461 [1] there is a binary incompatibility introduced by Scala 2.12.8, which currently limits Flink from upgradi

Re: [DISCUSS] Update Scala 2.12.7 to 2.12.15

2022-05-20 Thread Yuval Itzchakov
As a Scala API user I'd prefer a breaking change to get all the benefits of the latest Scala minor versions. On Fri, May 20, 2022, 11:37 Martijn Visser wrote: > Hi everyone, > > I would like to get some opinions from our Scala users, therefore I'm also > looping in the user mailing list. > > Fli

Fwd: [DISCUSS] Update Scala 2.12.7 to 2.12.15

2022-05-20 Thread Ran Tao
-- Forwarded message - From: Ran Tao Date: Fri, May 20, 2022, 18:23 Subject: Re: [DISCUSS] Update Scala 2.12.7 to 2.12.15 To: Hi, Martijn. Even if we upgrade Scala to 2.12.15 to support Java 17, it just fixes the compilation of FLINK-25000

Re: Window aggregation fails after upgrading to Flink 1.15

2022-05-20 Thread Pouria Pirzadeh
Thanks for the help; I dug into it and the issue turned out to be the version of Janino: flink-table has pinned Janino's version to 3.0.11, as that is the version Calcite is using; however, due to other dependencies in my project, at runtime the application code had ended up using a newer version of Janino

Python Job Type Support in Flink Kubernetes Operator

2022-05-20 Thread Jeesmon Jacob
Hi there, Is there a plan to support the Python job type in the Flink Kubernetes Operator? If yes, any ETA? According to this previous operator overview, only Java jobs are supported in the operator. This page was recently modified to remove the features table. https://github.com/apache/flink-kubernetes-oper

Flink SQL Dates Between and Parallelism

2022-05-20 Thread Jason Politis
Good evening all, We are working on a project where a few queries that are joining based on dates from table A are between dates from table B. Something like: SELECT A.ID, B.NAME FROM A, B WHERE A.DATE BETWEEN B.START_DATE AND B.END_DATE; Both A and B are topics in Kafka with 5 partitions. Doi

Re: Kinesis Sink - Data being received with intermittent breaks

2022-05-20 Thread Zain Haider Nemati
Hi Danny, I looked into it in more detail; the bottleneck seems to be the transform function, which is at 100% and causing backpressure. I'm looking into that. Thanks for your help, much appreciated! On Fri, May 20, 2022 at 1:24 AM Ber, Jeremy wrote: > Hi Zain, > > > > Are you s

Re: Job Logs - Yarn Application Mode

2022-05-20 Thread Zain Haider Nemati
Hi, thanks for your responses, folks. Is it possible to access logs via the Flink UI in YARN application mode, similar to how we can access them in standalone mode? On Fri, May 20, 2022 at 11:06 AM Shengkai Fang wrote: > Thanks for Biao's explanation. > > Best, > Shengkai > > Biao Geng wrote on Fri, May 20, 2022, 1

Example of Beam Go job that uses Flink Kubernetes operator

2022-05-20 Thread Red Daly
Hi, This is a request for documentation to walk through using the Flink Kubernetes operator with the Go Beam SDK. I will plan to update the thread as I investigate. I hope this feedback is useful. A few observations: 1) The official "Quick Start guide

Re: Flink SQL Dates Between and Parallelism

2022-05-20 Thread Yuval Itzchakov
Hi Jason, When using interval joins, Flink can't parallelize the execution as the join key (semantically) is event time, thus all events must fall into the same partition for Flink to be able to look up events from the two streams. See the IntervalJoinOperator ( https://github.com/apache/flink/blob/

Re: Example of Beam Go job that uses Flink Kubernetes operator

2022-05-20 Thread Red Daly
I created an example project with my attempts to get something working: https://github.com/gonzojive/beam-go-k8s The good news is I got something working with Beam, Go, Bazel, Flink, and Kubernetes (minikube). The working version doesn't use the operator and only runs in a single pod. I would st

Application mode deployment through API call

2022-05-20 Thread Leon Xu
Hi Flink community, I am looking to deploy my Flink job through *Application Mode* in my Java program. Ideally I'd like my Java code to just call an API to achieve this. Does Flink provide a REST API to support this? I can't seem to find any documentation or code on that. If I need to build this o

Json Deserialize in DataStream API with array length not fixed

2022-05-20 Thread Zain Haider Nemati
Hi Folks, I have data coming in this format: { "data": { "oid__id": "61de4f26f01131783f162453", "array_coordinates":"[ { \"speed\" : \"xxx\", \"accuracy\" : \"xxx\", \"bearing\" : \"xxx\", \"altitude\" : \"xxx\", \"longitude\" : \"xxx\", \"latitude\" : \"xxx\", \"dateTimeS
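The preview above is cut off, but it shows `array_coordinates` as a string that itself holds a JSON array. One way to handle an array of unknown length in that shape is to decode twice: once for the outer record and once for the embedded string. The sketch below is in Python for illustration only, not DataStream API code. The helper `parse_coordinates` and the sample message are hypothetical, and the sample uses only field names visible in the preview.

```python
import json


def parse_coordinates(raw_message):
    """Decode the outer record, then decode the JSON array stored as a string."""
    record = json.loads(raw_message)
    data = record["data"]
    # array_coordinates is a JSON-encoded string, so it needs a second decode.
    # The result is a list with however many elements the message contained.
    coordinates = json.loads(data["array_coordinates"])
    return data["oid__id"], coordinates


# Hypothetical message built from the fields visible in the preview.
sample = json.dumps({
    "data": {
        "oid__id": "61de4f26f01131783f162453",
        "array_coordinates": json.dumps([
            {"speed": "xxx", "accuracy": "xxx"},
            {"speed": "xxx", "accuracy": "xxx"},
        ]),
    }
})

oid, points = parse_coordinates(sample)
print(oid, len(points))
```

Because the inner array decodes to an ordinary list, its length does not need to be known in advance.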