Re: Caching

2020-11-26 Thread Navneeth Krishnan
Thanks Dongwon, that was extremely helpful. I didn't quite understand how async I/O can be used here; it would be great if you could share some info on it. Also, how are you propagating any changes to values? Regards, Navneeth On Thu, Nov 26, 2020 at 6:26 AM Dongwon Kim wrote: > Oops, I forgot to me…

Re: FlinkSQL kafka->dedup->kafka

2020-11-26 Thread Leonard Xu
Hi Laurent, > Basically, I need to deduplicate, but only keeping in the deduplication state > the latest value of the changed column to compare with. While here it seems > to keep all previous values… You can use `ORDER BY proctime() DESC` in the deduplication query; it will keep the last row, …
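The deduplication Leonard describes is Flink's ROW_NUMBER-based pattern: ordering by processing time descending and keeping row number 1 retains only the latest row per key. As a sketch of that "keep last row" semantics in plain Python (the SQL in the comment follows the standard Flink pattern; table and column names here are hypothetical, not from the thread):

```python
# Flink SQL dedup pattern (keep the most recent row per key):
#   SELECT client_number, address FROM (
#     SELECT *, ROW_NUMBER() OVER (
#       PARTITION BY client_number ORDER BY proctime() DESC) AS rn
#     FROM source_table)
#   WHERE rn = 1
#
# The same "keep the last row per key" semantics in plain Python:
def keep_last_per_key(rows, key):
    """Return one row per key: the one that arrived last."""
    latest = {}
    for row in rows:
        latest[row[key]] = row  # later rows overwrite earlier ones
    return list(latest.values())

rows = [
    {"client_number": 1, "address": "old street"},
    {"client_number": 2, "address": "main road"},
    {"client_number": 1, "address": "new street"},
]
print(keep_last_per_key(rows, "client_number"))
```

Only the latest address per client survives, which is exactly what `rn = 1` selects in the SQL version.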

Re: PyFlink Table API and UDF Limitations

2020-11-26 Thread Xingbo Huang
Hi Niklas, Thanks a lot for supporting PyFlink. In fact, your requirement for multiple inputs and multiple outputs is essentially a Table Aggregation Function[1]. Although PyFlink does not support it yet, we have listed it in the release 1.13 plan. In addition, row-based operations[2] that are very u…
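A table aggregate function consumes many rows per group and may emit many rows back, unlike a scalar aggregate that emits exactly one. The classic illustration is Top-2. This is a plain-Python sketch of the semantics only, not the PyFlink API (which, per the thread, did not yet support it):

```python
def top2(values):
    """Table-aggregate sketch: consume a group of values, emit up to
    two output rows, each a (rank, value) pair."""
    best = sorted(values, reverse=True)[:2]
    return [(rank + 1, v) for rank, v in enumerate(best)]

# One input group of four rows produces two output rows.
print(top2([3, 9, 4, 7]))
```

A scalar aggregate like MAX would collapse the group to a single value; the table aggregate keeps a many-to-many shape per group, which is why it needs dedicated API support.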

Duplication error on Kafka Connector Libraries

2020-11-26 Thread Kevin Kwon
Hi community, I'm testing out 1.12-SNAPSHOT from the master branch. I built my application with the 'flink-connector-kafka' library, but when I start the app, I get: Caused by: org.apache.kafka.common.KafkaException: class org.apache.kafka.common.serialization.ByteArrayDeserializer is not an instance of org.a…

Re: FlinkSQL kafka->dedup->kafka

2020-11-26 Thread Laurent Exsteens
Hello, it seems like LAG would probably be the right function to use. However, I get unexpected results from it: --- -- Deduplicate addresses --- --CREATE TEMPORARY VIEW dedup_address as SELECT * FROM ( SELECT client_number , address , …

queryLogicalType != sinkLogicalType when UDAF returns List

2020-11-26 Thread Dongwon Kim
Hello, I'm using Flink 1.11.2. Let's assume that I want to store on a table the result of the following UDAF: > public class Agg extends AggregateFunction, List> { > @Override > public List createAccumulator() { > return new LinkedList<>(); > } > @Override > public …

Re: prometheus variable value is "normalized"

2020-11-26 Thread Alexey Trenikhun
Thank you! From: Chesnay Schepler Sent: Wednesday, November 25, 2020 2:59:54 PM To: Alexey Trenikhun; Flink User Mail List Subject: Re: prometheus variable value is "normalized" Essentially, we started with this behavior, and kept it to not break existing Fli…

Re: statefun creates unexpected new physical function

2020-11-26 Thread Igal Shilman
Glad to hear that you were able to resolve the issue! One comment though: I would really encourage you to upgrade to Statefun 2.2.1, which was released recently and fixes a checkpointing-related issue. Kind regards, Igal. On Tue, Nov 24, 2020 at 10:10 PM Lian Jiang wrote: > Problem solved. I…

Re: Concise example of how to deploy flink on Kubernetes

2020-11-26 Thread Igal Shilman
Hi George, Specifically for StateFun, we have the following Helm charts [1] to help you deploy Stateful Functions on k8s. The greeter example's docker-compose file also includes Kafka (and hence ZooKeeper). Indeed, the Flink cluster is "included" in the master/worker stateful functions docker image…

Re: Statefun delayed message

2020-11-26 Thread Robert Metzger
Hey Tim, delayed messages are stored in Flink's state while they are waiting to be sent again. Thus they do not block any checkpoints (and thus the persisting of Kafka offsets). If you are restoring from a checkpoint (or savepoint), the pending delayed messages will be reloaded into Flink's s…

Re: Caching

2020-11-26 Thread Dongwon Kim
Oops, I forgot to mention that when doing a bulk insert into Redis, you'd better open a pipeline with the 'transaction' property set to False [1]. Otherwise, API calls from your Flink job will time out. [1] https://github.com/andymccurdy/redis-py#pipelines On Thu, Nov 26, 2020 at 11:09 PM Dongwon …
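In redis-py, the pipeline Dongwon describes is opened with `r.pipeline(transaction=False)`, which skips the MULTI/EXEC wrapper so large batches don't hold a server-side transaction open. A hypothetical sketch of a batched bulk load (host, batch size, and key names are assumptions; the chunking helper is generic Python):

```python
def chunks(items, size):
    """Split items into batches so each pipeline round-trip stays small."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical bulk load; requires a reachable Redis and `pip install redis`.
def bulk_insert(entries, batch_size=1000):
    import redis
    r = redis.Redis(host="localhost", port=6379)
    for batch in chunks(list(entries.items()), batch_size):
        # transaction=False avoids MULTI/EXEC, so a big batch doesn't
        # stall the server and time out the Flink job's calls.
        pipe = r.pipeline(transaction=False)
        for key, value in batch:
            pipe.set(key, value)
        pipe.execute()  # one round trip per batch

print(len(list(chunks(list(range(10)), 4))))  # 3 batches: 4 + 4 + 2
```

Batching plus a non-transactional pipeline keeps each round trip bounded, which is the timeout-avoidance point of the original message.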

Re: Caching

2020-11-26 Thread Dongwon Kim
Hi Navneeth, I reported an issue similar to yours before [1], but I took the broadcasting approach at first. As you already anticipated, broadcasting is going to use more memory than your current approach based on a static object on each TM. And the broadcasted data will be treated as operator st…
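The "static object on each TM" approach contrasted with broadcasting here amounts to one process-wide cache shared by every operator subtask running in the same task manager JVM. A minimal hypothetical sketch of that sharing pattern (in Python rather than Java, purely to show the idea):

```python
import threading

class ProcessWideCache:
    """One instance per process (per TaskManager, in the Flink analogy),
    shared by every operator subtask running in that process."""
    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        self.data = {}

    @classmethod
    def get(cls):
        with cls._lock:  # guard first-time construction across threads
            if cls._instance is None:
                cls._instance = cls()
        return cls._instance

a = ProcessWideCache.get()
b = ProcessWideCache.get()
a.data["user:1"] = "profile"
print(b.data["user:1"])  # both handles see the same cache: "profile"
```

The memory trade-off in the thread follows directly: this cache exists once per process, whereas broadcast state is replicated into each operator instance's state.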

Re: Caching

2020-11-26 Thread Prasanna kumar
Navneeth, thanks for posting this question. This looks like a scenario we might end up with in the future. We are working on a similar problem statement, with two differences: 1) the cache items would not change frequently, say at most once per month or a few times per year, and the number of entiti…

Re: left join flink stream

2020-11-26 Thread Guowei Ma
Hi Youzha, sorry for the late reply. It seems that the types are mismatched. Could you 1. call tableA.printSchema() to print the schema? 2. call KafkaSource.getType() to print the type information? Best, Guowei On Mon, Nov 23, 2020 at 5:28 PM Youzha wrote: > Hi, this is sample code: > > > Table tableA = …

Re: PyFlink Table API and UDF Limitations

2020-11-26 Thread Niklas Wilcke
Hi Xingbo, thanks for taking care and letting me know. I was about to share an example of how to reproduce this. Now I will wait for the next release candidate and give it a try. Regards, Niklas -- niklas.wil...@uniberg.com Mobile: +49 160 9793 2593 Office: +49 40 2380 6523 Simon-von-Utrecht-St…

Re: Questions on Table to Datastream conversion and onTimer method in KeyedProcessFunction

2020-11-26 Thread Timo Walther
Hi Fuyao, yes, I agree. The code evolved quicker than the docs; I will create an issue for this. Regards, Timo On 25.11.20 19:27, Fuyao Li wrote: Hi Timo, thanks for your information. I saw that Flink SQL can actually do the full outer join in the test code with interval join semantics. Howe…

Re: Is there a way we can specify operator ID for DDLs?

2020-11-26 Thread Danny Chan
Here is the issue: https://issues.apache.org/jira/browse/FLINK-20368 Kevin Kwon wrote on Thu, Nov 26, 2020 at 8:50 AM: > thanks a lot :) > > On Wed, Nov 25, 2020 at 3:26 PM Danny Chan wrote: >> SQL does not support that now, but I think your request is reasonable. >> AFAIK, SQL hints may be a way to configu…

Caching

2020-11-26 Thread Navneeth Krishnan
Hi All, we have a Flink streaming job processing around 200k events per second. The job requires a lot of less frequently changing data (sort of static, but there will be some changes over time, say a 5% change once per day or so). There are about 12 caches, some containing approximately 20k entr…
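For reference data that changes only a few percent per day, one common pattern the later replies build on is an in-memory cache with a TTL: entries are served from memory and lazily refetched once they expire. A hypothetical sketch (the loader function and TTL value are assumptions, not from the thread):

```python
import time

class TTLCache:
    """Cache that re-loads an entry once it is older than ttl_seconds."""
    def __init__(self, loader, ttl_seconds=3600.0):
        self.loader = loader       # called on a miss or an expired entry
        self.ttl = ttl_seconds
        self._store = {}           # key -> (value, stored_at)

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None and time.monotonic() - hit[1] < self.ttl:
            return hit[0]          # fresh enough: serve from memory
        value = self.loader(key)   # expired or missing: reload
        self._store[key] = (value, time.monotonic())
        return value

cache = TTLCache(loader=lambda k: k.upper(), ttl_seconds=0.05)
print(cache.get("user"))   # loaded via the loader: "USER"
print(cache.get("user"))   # served from the cache within the TTL
```

With a ~5%-per-day change rate, the TTL bounds staleness; the trade-off against broadcasting (discussed in the replies above) is memory duplication versus freshness guarantees.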