Hi Felipe,

There are multiple ways to do a distinct count in the Table/SQL API; in fact, there is a recent thread on this topic [1]. Based on the description you provided, the Table/SQL API seems like it might be a better API level to use.

To answer your questions:

- You should be able to use the other TimeCharacteristic settings (EventTime or ProcessingTime). You might want to try ProcessWindowFunction and see if it fits your use case.
- I'm not sure I fully understand the question. Your keyBy should be done on your distinct key (or a combined key); if you key the stream correctly, then yes, all messages with the same key are processed by the same TaskManager thread.

--
Rong

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/count-DISTINCT-in-flink-SQL-td28061.html

On Tue, Jun 11, 2019 at 1:27 AM Felipe Gutierrez <felipe.o.gutier...@gmail.com> wrote:

> Hi all,
>
> I have implemented a Flink data stream application to compute distinct
> count of words. Flink does not have a built-in operator which does this
> computation. I used KeyedProcessFunction and I am saving the state on a
> ValueState descriptor.
> Could someone check if my implementation is the best way of doing it? Here
> is my solution:
> https://stackoverflow.com/questions/56524962/how-can-i-improve-my-count-distinct-for-data-stream-implementation-in-flink/56539296#56539296
>
> I have some points that I could not understand better:
> - I only could use TimeCharacteristic.IngestionTime.
> - I split the words using "Tuple2<Integer, String>(0, word)", so I will
> have always the same key (0). As I understand, all the events will be
> processed on the same TaskManager which will not achieve parallelism if I
> am in a cluster.
>
> Kind Regards,
> Felipe
> --
> -- Felipe Gutierrez
> -- skype: felipe.o.gutierrez
> -- https://felipeogutierrez.blogspot.com
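
P.S. To illustrate the keyBy point from the reply, here is a minimal plain-Java sketch that uses no Flink classes at all. It assumes a simplified routing rule (key hash modulo parallelism); Flink's real assignment goes through key groups, so only the property "equal keys land on the same subtask" carries over. The class name KeyByDistinctSketch and its helpers (subtaskFor, route, busySubtasks, distinctCount) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Illustration only: simulates key-based routing, not Flink itself.
class KeyByDistinctSketch {

    // Assumed routing rule: hash of the key modulo the parallelism.
    // Equal keys always map to the same subtask index.
    static int subtaskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Sends every word to the subtask chosen by its key; each subtask
    // keeps its own set of words seen so far (its "keyed state").
    static List<Set<String>> route(List<String> words, int parallelism,
                                   Function<String, Object> keySelector) {
        List<Set<String>> state = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            state.add(new HashSet<>());
        }
        for (String word : words) {
            int subtask = subtaskFor(keySelector.apply(word), parallelism);
            state.get(subtask).add(word);
        }
        return state;
    }

    // Number of subtasks that received at least one word.
    static int busySubtasks(List<Set<String>> state) {
        int busy = 0;
        for (Set<String> seen : state) {
            if (!seen.isEmpty()) {
                busy++;
            }
        }
        return busy;
    }

    // Sum of the per-subtask distinct counts. The sets are disjoint because
    // a given word is always routed to the same subtask.
    static int distinctCount(List<Set<String>> state) {
        int total = 0;
        for (Set<String> seen : state) {
            total += seen.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words =
            Arrays.asList("flink", "spark", "flink", "storm", "beam", "spark");

        // Constant key 0, as in Tuple2<Integer, String>(0, word).
        List<Set<String>> constantKey = route(words, 4, w -> 0);
        // The word itself as the key.
        List<Set<String>> wordKey = route(words, 4, w -> w);

        System.out.println("constant key: busy subtasks = "
            + busySubtasks(constantKey) + ", distinct = " + distinctCount(constantKey));
        System.out.println("word key:     busy subtasks = "
            + busySubtasks(wordKey) + ", distinct = " + distinctCount(wordKey));
    }
}
```

With the constant key, every word ends up on a single subtask, which matches the lack of parallelism described in the original question. With the word as the key, the work can spread across subtasks while each word still lands in one place, so the per-subtask counts can be summed in a later step to get the overall distinct count.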