Just want to bump this in case anyone has any feedback. On Wed, Oct 1, 2025 at 9:56 AM Joey Tran <[email protected]> wrote:
> Hey all, > > Our distribution metric is fairly coarse in how it describes a > distribution with just min, max, sum, mean, and count. There is currently > very little information on the actual shape of the distribution. I'd like > to propose a couple of improvements. > > As a small improvement, I think it would be nice to include stdev. As a > large improvement, we could use tdigests to create a more granular > distribution. This shouldn't be too hard to support since we already have a > TDigest[1] java transform which we can probably adapt for the java sdk > harness; and we can use the fastdigest[2] python library for extending the > python SDK harness > > I propose that we extend the current encoding [3] of the distribution > metrics. The current encoding is: > // Encoding: <count><sum><min><max> > // - count: beam:coder:varint:v1 > // - sum: beam:coder:double:v1 > // - min: beam:coder:double:v1 > // - max: beam:coder:double:v1 > > I suggest just appending to it an encoding of a new `TDigestData` proto > that'd include information on the tdigest centroids and options (e.g. max > centroids). This can be an optional field so we wouldn't need to update all > SDKs at once. Similarly, runners will have the option to ignore the tdigest > option (which by default they will currently). The benefit of this is that > we don't need to expand the user metrics API - users will just get better > distribution information as their sdks/runners implement tdigest support. > > Looking for any thoughts or feedback on this idea. > > Cheers, > Joey > > > [1] > https://beam.apache.org/releases/javadoc/2.5.0/index.html?org/apache/beam/sdk/extensions/sketching/TDigestQuantiles.html > [2] > https://beam.apache.org/releases/javadoc/2.5.0/index.html?org/apache/beam/sdk/extensions/sketching/TDigestQuantiles.html > [3] > https://github.com/apache/beam/blob/75866588752de0c47136fde173944bd57c323401/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/metrics.proto#L534 >
