kosiew commented on code in PR #1416:
URL:
https://github.com/apache/datafusion-python/pull/1416#discussion_r2929384345
##########
python/datafusion/functions.py:
##########
@@ -1894,6 +1894,15 @@ def approx_distinct(
Args:
expression: Values to check for distinct entries
filter: If provided, only compute against rows for which the filter is
True
+
+ Examples:
+ ---------
+ >>> ctx = dfn.SessionContext()
+ >>> df = ctx.from_pydict({"a": [1, 1, 2, 3]})
+ >>> result = df.aggregate(
+ ... [], [dfn.functions.approx_distinct(dfn.col("a")).alias("v")])
+ >>> result.collect_column("v")[0].as_py() >= 2
Review Comment:
`>= 2` is a weak regression signal: the input has 4 rows and 3 distinct
values, so the check would still pass if the function returned 2, 4, or any
larger number. Could we pick an input where the approximation is deterministic
enough to show a concrete answer, or at least tighten the expectation so the
example documents the intended behavior more clearly?
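
To illustrate the concern, here is a minimal plain-Python sketch. It does not call DataFusion. The names `weak_check`, `tight_check`, and `candidates` are hypothetical and exist only for this illustration. It compares the current `>= 2` expectation with an exact comparison against the true distinct count of the example input. Whether `approx_distinct` actually returns the exact count at this cardinality is an assumption that would need confirming before the doctest is tightened to an equality.

```python
# Plain-Python sketch; no DataFusion calls are made here.
values = [1, 1, 2, 3]              # same input as the doctest in the diff
exact_distinct = len(set(values))  # exact distinct count of that input


def weak_check(result: int) -> bool:
    """Mirrors the current doctest expectation: result >= 2."""
    return result >= 2


def tight_check(result: int) -> bool:
    """Hypothetical tightened expectation: result equals the exact count."""
    return result == exact_distinct


# Hypothetical outputs that a correct or regressed implementation might return.
candidates = [2, 3, 4, 1000]

weak_passes = [c for c in candidates if weak_check(c)]
tight_passes = [c for c in candidates if tight_check(c)]

print("accepted by >= 2:", weak_passes)
print("accepted by == exact:", tight_passes)
```

The weak check accepts every candidate, including clearly wrong ones, while the exact comparison accepts only the true count. If equality turns out to be too strict for an approximate aggregate, a narrow bound around the expected value would still document the behavior better than `>= 2`.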
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]