How can I improve this Flink application for "Distinct Count of elements" in the data stream?

Felipe Gutierrez Tue, 11 Jun 2019 01:27:15 -0700

Hi all,

I have implemented a Flink data stream application to compute the distinct
count of words. As far as I can tell, Flink does not have a built-in
operator for this computation, so I used a KeyedProcessFunction and keep
the state in a ValueState.
Could someone check whether my implementation is the best way of doing it?
Here is my solution:
https://stackoverflow.com/questions/56524962/how-can-i-improve-my-count-distinct-for-data-stream-implementation-in-flink/56539296#56539296


There are two points I could not fully understand:
- I could only get it to work with TimeCharacteristic.IngestionTime.
- I split the words using "Tuple2<Integer, String>(0, word)", so I will
always have the same key (0). As I understand it, all the events will be
processed by the same parallel subtask on a single TaskManager, so the job
will not achieve any parallelism even if it runs on a cluster.
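
To make the second point concrete, here is a minimal plain-Java sketch (no
Flink dependency) of the alternative I have in mind: route each word by its
own hash instead of the constant key 0, keep a set of seen words per
partition, and sum the partial counts at the end. The class name
DistinctCountSketch and the methods partitionFor and distinctCount are
hypothetical names used only for illustration. They are not Flink API, and
they only mimic what keying by the word would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: simulates hash partitioning in plain Java.
class DistinctCountSketch {

    // Route a word to one of `parallelism` partitions, similar in spirit to
    // keying the stream by the word. floorMod keeps the index non-negative.
    public static int partitionFor(String word, int parallelism) {
        return Math.floorMod(word.hashCode(), parallelism);
    }

    // Phase 1: every partition records the distinct words routed to it.
    // Phase 2: the partial counts are summed. The sum is exact because a
    // given word always lands in the same partition, so no word is counted
    // in two partitions.
    public static long distinctCount(List<String> words, int parallelism) {
        List<Set<String>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new HashSet<>());
        }
        for (String word : words) {
            partitions.get(partitionFor(word, parallelism)).add(word);
        }
        long total = 0;
        for (Set<String> seen : partitions) {
            total += seen.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("flink", "stream", "flink", "state", "stream");
        System.out.println(distinctCount(words, 4)); // prints 3
    }
}
```

With the constant key 0, every word ends up in one partition. With the word
itself as the key, the work is spread over all partitions and only the small
partial counts need to be combined in a final step.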

Kind Regards,
Felipe
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez
-- https://felipeogutierrez.blogspot.com
