Re: Flink POJO documentation for primitive boolean state variables

2022-01-27 Thread Makhanchan Pandey
Hi all, Just wanted to follow up on this again :) Thanks in advance. Regards, Mac Pandey On Tue, Jan 25, 2022 at 12:59 PM Makhanchan Pandey < makhanchanpan...@gmail.com> wrote: > Hi all, > > For Flink to treat a model class as a special POJO type, these are the > documented conditions: > http

Duplicate job submission error

2022-01-27 Thread Parag Somani
Hello All, While deploying on our one of environment, we encountered crashloopback of job manager pod. Env: K8s Flink: 1.14.2 Could you suggest, how can we troubleshoot this and possible handling of this? exception snipper as follows: 2022-01-27 06:58:07.326 ERROR 44 --- [lt-dispatcher-4] c.b.

Does FileSource download all remote files for generating splits

2022-01-27 Thread Meghajit Mazumdar
Hello, I had a question about the FileSource in Flink 1.14 . Considering FileSource is set to read from a remote GCS URL, I could read and understand that the FileEnumerator is actu

Re: Inaccurate checkpoint trigger time

2022-01-27 Thread Paul Lam
Hi Yun, Sorry for the late reply. I finally found some time to investigate this problem further. I upgraded the job to 1.14.0, but it’s still the same. I’ve checked the debug logs, and I found that Zookeeper notifies watched event of checkpoint id changes very late [1]. Each time a checkpoint f

Determinism of interval joins

2022-01-27 Thread Alexis Sarda-Espinosa
Hi everyone, I'm seeing a lack of determinism in unit tests when using an interval join. I am testing with both Flink 1.11.3 and 1.14.3 and the relevant bits of my pipeline look a bit like this: keySelector1 = ... keySelector2 = ... rightStream = stream1 .flatMap(...) .keyBy(keySelector1)

Re: Determinism of interval joins

2022-01-27 Thread Alexis Sarda-Espinosa
I'm not sure if the issue in [1] is relevant since it mentions the Table API, but it could be. Since stream1 and stream2 in my example have a long chain of operators behind, I presume they might "run" at very different paces. Oh and, in the context of my unit tests, watermarks should be determin

Re: Flink POJO documentation for primitive boolean state variables

2022-01-27 Thread Alexander Fedulov
Hi Mac, I just verified that objects with isXXX methods indeed will be interpreted as POJOs. Would you be willing to contribute a documentation update? Here are some guidelines: [1]. [1] https://flink.apache.org/contributing/contribute-documentation.html Thanks, Alexander Fedulov On Thu,

Aggregation support in the Table API

2022-01-27 Thread Pouria Pirzadeh
I am using the Table api in Java to write queries with grouping/aggregation. The aggregations may use built-in functions or user defined aggregate functions. Therefore I am using the aggregate() method on a WindowGroupedTable. table.window(...) .groupBy(...) .aggregate(Expressions.call("

Re: Flink-ML: Sink model data in online training

2022-01-27 Thread Zhipeng Zhang
Hi thekingofcity, Thanks for your interest! Unfortunately we don't have an example for online learning for now. We are working on an online machine learning example. Hopefully it will be added here [1] in the next three weeks. [1] https://github.com/apache/flink-ml thekingofcity 于2022年1月26日周三

Re: Aggregation support in the Table API

2022-01-27 Thread Caizhi Weng
Hi! You can directly use .select() to call an aggregate function (either built-in or user-defined). See [1] for an example on how to use call() expression to call an user-defined aggregate function. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#aggregate-

Re: Does FileSource download all remote files for generating splits

2022-01-27 Thread Caizhi Weng
Hi! FileEnumerator never reads the actual content of a file. FileEnumerator lives in job managers and it only reads the necessary meta-data of the file (for example how large is the file) so that it can split the work across all task managers. Corresponding file readers, in the other hand, lives i

Re: Questions about checkpoint retention

2022-01-27 Thread Caizhi Weng
Hi! So you'd like to remove all checkpoints after a savepoint is completed? Could you elaborate more on why you'd like to retain 10 checkpoints? For most of the cases retaining one checkpoint is enough. Also you mentioned that you're keeping 10 checkpoints for each version of your app. For each v

Re: Unbounded streaming with table API and large json as one of the columns

2022-01-27 Thread Caizhi Weng
Hi! This job will work as long as your SQL statement is valid. Did you meet some difficulties? Or what is your concern? A record of 100K is sort of large, but I've seen quite a lot of jobs with such record size so it is OK. HG 于2022年1月27日周四 02:57写道: > Hi, > > I need to calculate elapsed times b

Re: Does FileSource download all remote files for generating splits

2022-01-27 Thread Meghajit Mazumdar
Thanks Caizhi. This clarifies. On Fri, Jan 28, 2022 at 12:06 PM Caizhi Weng wrote: > Hi! > > FileEnumerator never reads the actual content of a file. FileEnumerator > lives in job managers and it only reads the necessary meta-data of the file > (for example how large is the file) so that it can