tdunning commented on PR #471: URL: https://github.com/apache/datasketches-cpp/pull/471#issuecomment-3736732713
Arithmetic with infinity in IEEE floating point *can* produce NaN, but only when infinities of opposite sign are combined. Sorting works correctly because infinities compare correctly with finite numbers. Since the centroids are always sorted, merging will only ever combine infinities of like sign.

This means that infinity will creep slowly, like a contagion, into the centroids in the digest. If there are lots of infinite values in a sample, we may get a slight over-estimate of quantiles, because an entire centroid goes to infinity as soon as a single infinite value is merged into it. No single centroid is that large in a reasonably configured digest, though.

```
julia> 3 + +Inf
Inf

julia> +Inf + +Inf
Inf

julia> +Inf + -Inf
NaN

julia> -Inf + -Inf
-Inf

julia> 3 + -Inf
-Inf
```

On Sun, Jan 11, 2026 at 6:56 PM tison ***@***.***> wrote:

> *tisonkun* left a comment (apache/datasketches-cpp#471)
> <https://github.com/apache/datasketches-cpp/pull/471#issuecomment-3736718766>
>
> Allowing +Inf and -Inf in TDigest structure would cause internal NaN
> values during compression or merging and potentially break internal
> assumption.
>
> See apache/datasketches-java#702
> <https://github.com/apache/datasketches-java/issues/702> and
> apache/datasketches-rust#23 (comment)
> <https://github.com/apache/datasketches-rust/pull/23#discussion_r2622754228>.
>
> This is the behavior that any function like sum should follow. This
> includes mean and, in a more nuanced way, t-digest. What is the median of
> [0, 0, 1, +∞, +∞]? I think it should be 1, not 0. Julia agrees:
>
> julia> median([0, 0, 1, +Inf, +Inf])
> 1.0
>
> If the infinities are ignored, we get the wrong answer.
>
> I agree with this at some point. But since we're free to merge/compress
> tdigests sketches, this may cause issues described above.
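
To make the merging argument concrete, here is a minimal Python sketch. It is not the datasketches or t-digest implementation. It assumes centroids are `(mean, weight)` pairs and that a merge computes the weighted sum of the two means divided by the total weight; `merge_centroids` and `compress_adjacent` are hypothetical names used only for illustration. Under those assumptions, merging neighbours in sorted order only ever pairs an infinity with finite values or with an infinity of the same sign, so no NaN appears.

```python
import math
import statistics


def merge_centroids(a, b):
    """Merge two (mean, weight) centroids via a weighted sum (an assumed formula)."""
    (m1, w1), (m2, w2) = a, b
    total = w1 + w2
    return ((m1 * w1 + m2 * w2) / total, total)


def compress_adjacent(values, group_size=2):
    """Sort the values, then merge each run of `group_size` neighbours into one centroid."""
    centroids = [(v, 1.0) for v in sorted(values)]
    merged = []
    for i in range(0, len(centroids), group_size):
        group = centroids[i:i + group_size]
        acc = group[0]
        for c in group[1:]:
            acc = merge_centroids(acc, c)
        merged.append(acc)
    return merged


data = [0.0, 0.0, 1.0, math.inf, math.inf, -math.inf]

# Sorted order is [-inf, 0, 0, 1, inf, inf], so neighbours never mix +inf with -inf.
result = compress_adjacent(data)
no_nan = not any(math.isnan(mean) for mean, _ in result)

# Forcing a merge of opposite-signed infinities is the case that does yield NaN.
forced = merge_centroids((math.inf, 1.0), (-math.inf, 1.0))

# The median from the quoted example, computed with the infinities kept in place.
med = statistics.median([0.0, 0.0, 1.0, math.inf, math.inf])

print(result, no_nan, forced, med)
```

The first centroid in `result` also shows the "contagion" effect: a single `-inf` merged with a finite neighbour pulls the whole centroid to `-inf`.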
--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
