Hi all, I have implemented a Flink DataStream application that computes a distinct count of words. The DataStream API does not have a built-in operator for this computation, so I used a KeyedProcessFunction and store the state in a ValueState (declared through a ValueStateDescriptor). Could someone check whether my implementation is the best way to do it? Here is my solution: https://stackoverflow.com/questions/56524962/how-can-i-improve-my-count-distinct-for-data-stream-implementation-in-flink/56539296#56539296
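To make one common alternative concrete, here is a minimal plain-Java simulation of a two-stage distinct count. It has no Flink dependency, so it is not Flink API code. Stage 1 routes each word by the word itself rather than by a constant key, and keeps a per-word "seen" flag, which is the role a ValueState<Boolean> would play inside a KeyedProcessFunction. It emits 1 only the first time a word appears. Stage 2 sums those 1s. The class and method names (DistinctCountSketch, countDistinct, partitionFor) are hypothetical. The hash-modulo routing only approximates how Flink assigns keys to parallel subtasks.

```java
import java.util.*;

/**
 * Plain-Java simulation (hypothetical, no Flink dependency) of a two-stage
 * distinct count:
 *  - stage 1: route each word to a partition by its hash (like keyBy(word))
 *    and keep a per-word "seen" flag, standing in for a ValueState<Boolean>;
 *    a 1 is produced only on the first sighting of a word;
 *  - stage 2: sum those 1s into a global distinct count.
 */
public class DistinctCountSketch {

    // Stage 1 routing: the same word always lands in the same partition,
    // which is what makes per-partition de-duplication correct.
    public static int partitionFor(String word, int parallelism) {
        return Math.floorMod(word.hashCode(), parallelism);
    }

    public static long countDistinct(List<String> words, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        // One "seen" set per partition; each entry plays the role of the
        // per-key ValueState<Boolean> a keyed operator would hold.
        List<Set<String>> seenPerPartition = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            seenPerPartition.add(new HashSet<>());
        }
        long distinct = 0; // stage 2: running sum of first-sighting signals
        for (String word : words) {
            Set<String> seen = seenPerPartition.get(partitionFor(word, parallelism));
            // add() returns true only the first time this word reaches its partition
            if (seen.add(word)) {
                distinct += 1;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(countDistinct(words, 4)); // prints 4
    }
}
```

In a real Flink job, the first stage would roughly correspond to keyBy on the word followed by a KeyedProcessFunction. The second stage would be a downstream aggregation of the emitted 1s. That aggregation only receives one record per new word, so it carries far less load than sending every word through a single constant key. Keep in mind that the per-word state grows with the number of distinct words.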
There are a few points I still do not fully understand:
- I was only able to use TimeCharacteristic.IngestionTime.
- I emit each word as new Tuple2<Integer, String>(0, word), so every record has the same key (0). As I understand it, all events will then be processed by a single parallel instance of the operator (and therefore on one TaskManager), so the job will not achieve any parallelism when running on a cluster.

Kind regards,
Felipe

--
Felipe Gutierrez
skype: felipe.o.gutierrez
https://felipeogutierrez.blogspot.com