Just want to bump this in case anyone has any feedback.

On Wed, Oct 1, 2025 at 9:56 AM Joey Tran <[email protected]> wrote:

> Hey all,
>
> Our distribution metric is fairly coarse in how it describes a
> distribution with just min, max, sum, mean, and count. There is currently
> very little information on the actual shape of the distribution. I'd like
> to propose a couple of improvements.
>
> As a small improvement, I think it would be nice to include stdev. As a
> larger improvement, we could use tdigests to create a more granular
> distribution. This shouldn't be too hard to support since we already have a
> TDigest[1] Java transform, which we can probably adapt for the Java SDK
> harness, and we can use the fastdigest[2] Python library to extend the
> Python SDK harness.
>
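
For the stdev part, here is a minimal sketch of how a standard deviation could be kept mergeable across bundles. It assumes we track a running mean and a sum of squared deviations alongside the count, since stdev can't be recovered from count/sum/min/max alone. The `Moments` class and its field names are hypothetical and not part of any existing Beam API.

```python
import math
from dataclasses import dataclass


@dataclass
class Moments:
    """Hypothetical mergeable accumulator for count, mean, and stdev."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def add(self, x: float) -> None:
        # Welford's online update for a single new value.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        # Pairwise combination of two partial accumulators.
        if other.count == 0:
            return Moments(self.count, self.mean, self.m2)
        if self.count == 0:
            return Moments(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Moments(n, mean, m2)

    def stdev(self) -> float:
        # Population standard deviation; NaN when no values were seen.
        return math.sqrt(self.m2 / self.count) if self.count else float("nan")
```

Two workers could each call `add` on their own values and the results could be combined with `merge`, which is the property a distribution metric needs.
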
> I propose that we extend the current encoding [3] of the distribution
> metrics. The current encoding is:
>     // Encoding: <count><sum><min><max>
>     //   - count: beam:coder:varint:v1
>     //   - sum:   beam:coder:double:v1
>     //   - min:   beam:coder:double:v1
>     //   - max:   beam:coder:double:v1
>
> I suggest just appending to it an encoding of a new `TDigestData` proto
> that'd include information on the tdigest centroids and options (e.g. max
> centroids). This can be an optional field so we wouldn't need to update all
> SDKs at once. Similarly, runners can simply ignore the tdigest data, which
> is effectively what they'd all do today. The benefit of this is that
> we don't need to expand the user metrics API: users will just get better
> distribution information as their SDKs/runners implement tdigest support.
>
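
To make the encoding idea a bit more concrete, here is a rough sketch of the current `<count><sum><min><max>` payload with an optional trailing blob. It assumes the varint is an unsigned base-128 encoding and the doubles are big-endian 8-byte values, which is my understanding of those coders. The `TDigestData` bytes are treated as an opaque, already-serialized payload; how it would actually be framed is still open. All function names here are hypothetical.

```python
import struct
from typing import Optional, Tuple


def encode_varint(n: int) -> bytes:
    # Unsigned base-128 varint: 7 bits per byte, high bit marks continuation.
    if n < 0:
        raise ValueError("sketch only handles non-negative counts")
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    # Returns (value, position of the first byte after the varint).
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_distribution(count: int, total: float, lo: float, hi: float,
                        tdigest_payload: Optional[bytes] = None) -> bytes:
    # Existing fields first, then the optional (hypothetical) tdigest bytes.
    out = encode_varint(count) + struct.pack(">ddd", total, lo, hi)
    if tdigest_payload is not None:
        out += tdigest_payload
    return out


def decode_distribution(buf: bytes):
    # Reads the four existing fields; any trailing bytes are returned as-is,
    # so a reader without tdigest support can just drop them.
    count, pos = decode_varint(buf)
    total, lo, hi = struct.unpack_from(">ddd", buf, pos)
    extra = buf[pos + 24:]
    return count, total, lo, hi, (extra or None)
```

A reader that only understands the four existing fields would look at the first four elements of the result and never touch the trailing bytes, which is the backward-compatibility property the proposal relies on.
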
> Looking for any thoughts or feedback on this idea.
>
> Cheers,
> Joey
>
>
> [1]
> https://beam.apache.org/releases/javadoc/2.5.0/index.html?org/apache/beam/sdk/extensions/sketching/TDigestQuantiles.html
> [2]
> https://beam.apache.org/releases/javadoc/2.5.0/index.html?org/apache/beam/sdk/extensions/sketching/TDigestQuantiles.html
> [3]
> https://github.com/apache/beam/blob/75866588752de0c47136fde173944bd57c323401/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/metrics.proto#L534
>
