I am porting a calculation from Spark batches that uses broadcast variables to compute percentiles from metrics and curious for tips on doing this with Flink streaming.
I have a windowed computation where I am compute metrics for IP-addresses (a windowed stream of metrics objects grouped by IP-addresses). Now I would like to compute percentiles for each IP from the metrics. My idea is to send all the metrics to a node that computes a global TDigest and then rejoins the computed global TDigest with the IP-grouped metrics stream to compute the percentiles for each IP. Is there a neat way to implement this in Flink? I am curious about the best way to join a global valuem like our TDigest, with every result of a grouped window stream. Also how to know when the TDigest is complete and has seen every element in the window (say if I implement it in a stateful flatMap that emits the value after seeing all stream values). Thanks! William