I am porting a calculation from Spark batches that uses broadcast
variables to compute percentiles from metrics and curious for tips on
doing this with Flink streaming.

I have a windowed computation where I am compute metrics for
IP-addresses (a windowed stream of metrics objects grouped by
IP-addresses). Now I would like to compute percentiles for each IP
from the metrics.

My idea is to send all the metrics to a node that computes a global
TDigest and then rejoins the computed global TDigest with the
IP-grouped metrics stream to compute the percentiles for each IP. Is
there a neat way to implement this in Flink?

I am curious about the best way to join a global valuem like our
TDigest, with every result of a grouped window stream.  Also how to
know when the TDigest is complete and has seen every element in the
window (say if I implement it in a stateful flatMap that emits the
value after seeing all stream values).

Thanks!
William

Reply via email to