berkaysynnada commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2007079688
##
datafusion/expr-common/src/statistics.rs:
##
@@ -203,6 +203,138 @@ impl Distribution {
};
Ok(dt)
}
+
+/// Merges two distributi
xudong963 commented on PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2743831182
> Attribute `total_count` is derivable from `counts`, so we may not want to
store it for normalization/consistency reasons. Same goes for `range`, it can
be constructed from `bins` in
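A minimal Rust sketch of the normalization suggested in the quoted comment: store only the bin edges and per-bin counts, and compute `total_count` and `range` on demand so they can never drift out of sync with the stored data. The `Histogram` type and its methods are hypothetical illustrations, not DataFusion's actual `HistogramDistribution` API, and the sketch assumes bin `i` covers `[bins[i], bins[i + 1])`.

```rust
/// Hypothetical histogram representation (not DataFusion's API): only the
/// bin edges and per-bin counts are stored; everything else is derived.
#[derive(Debug, Clone, PartialEq)]
pub struct Histogram {
    /// Bin edges in strictly increasing order; bin i covers [bins[i], bins[i + 1]).
    bins: Vec<f64>,
    /// Number of observations per bin; always bins.len() == counts.len() + 1.
    counts: Vec<u64>,
}

impl Histogram {
    /// Validates the invariants once, at construction time.
    pub fn try_new(bins: Vec<f64>, counts: Vec<u64>) -> Result<Self, String> {
        if bins.len() != counts.len() + 1 {
            return Err("expected exactly one more edge than counts".to_string());
        }
        // `!(a < b)` also rejects NaN edges.
        if bins.windows(2).any(|w| !(w[0] < w[1])) {
            return Err("bin edges must be strictly increasing".to_string());
        }
        Ok(Self { bins, counts })
    }

    /// Derived from `counts`, so it cannot disagree with them.
    pub fn total_count(&self) -> u64 {
        self.counts.iter().sum()
    }

    /// Derived from the outermost bin edges (never empty: at least one edge exists).
    pub fn range(&self) -> (f64, f64) {
        (self.bins[0], self.bins[self.bins.len() - 1])
    }
}
```

Deriving these values trades a small amount of recomputation for never having to keep three fields consistent after every update.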
ozankabak commented on PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2743175661
This API, as it currently stands, does not seem to make sense. It seems to
make the assumption that outcomes (i.e. individual items in the range) of the
`Distribution`s are equally
xudong963 commented on PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2742833876
FYI @berkaysynnada @ozankabak
xudong963 commented on PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2743593315
> Do you know any use cases where this method would be especially useful? If
so, maybe we can study one of those cases in more detail. That could help us
understand the real need a
xudong963 commented on PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2745345742
Thanks for your suggestions!! @alamb @ozankabak @berkaysynnada and @kosiew
I'll continue to do such work after the `Migrate to Distribution from
Precision` work is done. I t
xudong963 closed pull request #15296: feat: support merge for `Distribution`
URL: https://github.com/apache/datafusion/pull/15296
ozankabak commented on PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2744132824
The most likely way we will end up with `HistogramDistribution`s will be via
sampling. We can also leverage statistics in file metadata if a file format
stores this information. AF
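One case where combining histograms is exact: two histograms built over disjoint sets of rows (for example, samples taken from different files) that share identical bin edges. A minimal sketch, assuming a hypothetical `Histogram` type with public `bins`/`counts` fields rather than DataFusion's real API. Histograms with different edges are rejected, because re-binning them would require assuming how values are spread within each bin.

```rust
/// Hypothetical histogram type for illustration; not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
pub struct Histogram {
    pub bins: Vec<f64>,   // ascending bin edges
    pub counts: Vec<u64>, // counts.len() == bins.len() - 1
}

/// Merges histograms computed over *disjoint* rows that use the same bin
/// edges. Summing the per-bin counts is exact here; returns `None` when the
/// binnings differ, since combining them would be lossy.
pub fn merge_same_bins(a: &Histogram, b: &Histogram) -> Option<Histogram> {
    if a.bins != b.bins || a.counts.len() != b.counts.len() {
        return None;
    }
    let counts = a
        .counts
        .iter()
        .zip(&b.counts)
        .map(|(x, y)| x + y)
        .collect();
    Some(Histogram { bins: a.bins.clone(), counts })
}
```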
ozankabak commented on PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2743724228
> I confused merge and mix. After reviewing the information, "Merge"
suggests combining datasets that maintain their original properties, but what's
implemented is actually clo
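For contrast with merging, a mixture draws each value from one of its inputs with fixed probabilities. A minimal sketch, assuming each input is summarized only by its mean and variance (the `Moments` type and `mixture_moments` function are hypothetical, not part of DataFusion): the mixture mean is the weighted mean of the components, and the variance follows from the law of total variance, E[X^2] - E[X]^2.

```rust
/// Summary of a distribution by its first two moments (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Moments {
    pub mean: f64,
    pub variance: f64,
}

/// Moments of the mixture that draws from `a` with probability `w`
/// and from `b` with probability `1 - w`. Returns `None` for an invalid weight.
pub fn mixture_moments(a: Moments, b: Moments, w: f64) -> Option<Moments> {
    if !(0.0..=1.0).contains(&w) {
        return None; // also rejects NaN
    }
    let mean = w * a.mean + (1.0 - w) * b.mean;
    // E[X^2] of each component is variance + mean^2; mix those, then subtract mean^2.
    let second_moment =
        w * (a.variance + a.mean * a.mean) + (1.0 - w) * (b.variance + b.mean * b.mean);
    Some(Moments { mean, variance: second_moment - mean * mean })
}
```

Note that the mixture's variance (2.0 for two unit-variance components centered at 0 and 2 with equal weight) exceeds either component's, which is one way a mixture differs from pooling compatible statistics.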
xudong963 commented on PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2743665328
> We can only merge two statistical objects in certain special
circumstances. For example, if we have a statistical object that tracks sample
averages along with counts, we can mer
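The sample-average case quoted above can be sketched as follows, assuming each summary stores a mean together with the number of observations behind it (`MeanCount` and `merge_mean_count` are hypothetical names, not DataFusion APIs). Because the combined mean is the count-weighted average of the inputs, merging summaries of disjoint samples loses no information.

```rust
/// A summary holding the average of `count` observations (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MeanCount {
    pub mean: f64,
    pub count: u64,
}

/// Merges summaries of two disjoint samples into the summary of their union.
pub fn merge_mean_count(a: MeanCount, b: MeanCount) -> MeanCount {
    let count = a.count + b.count;
    if count == 0 {
        // Both inputs are empty; there is nothing to combine.
        return a;
    }
    // Count-weighted average: sum of all observations divided by total count.
    let mean = (a.mean * a.count as f64 + b.mean * b.count as f64) / count as f64;
    MeanCount { mean, count }
}
```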
xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2007748334
##
datafusion/expr-common/src/statistics.rs:
##
@@ -203,6 +203,138 @@ impl Distribution {
};
Ok(dt)
}
+
+/// Merges two distributions
xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2004911809
##
datafusion/expr-common/src/statistics.rs:
##
@@ -857,6 +857,143 @@ pub fn compute_variance(
ScalarValue::try_from(target_type)
}
+/// Merges two distr
xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002959262
##
datafusion/expr-common/src/statistics.rs:
##
@@ -203,6 +203,121 @@ impl Distribution {
};
Ok(dt)
}
+
+/// Merges two distributions
kosiew commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002828639
##
datafusion/expr-common/src/statistics.rs:
##
@@ -203,6 +203,121 @@ impl Distribution {
};
Ok(dt)
}
+
+/// Merges two distributions int
xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002526236
##
datafusion/expr-common/src/statistics.rs:
##
@@ -203,6 +203,121 @@ impl Distribution {
};
Ok(dt)
}
+
+/// Merges two distributions
xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002503571
##
datafusion/expr-common/src/statistics.rs:
##
@@ -203,6 +203,121 @@ impl Distribution {
};
Ok(dt)
}
+
+/// Merges two distributions
kosiew commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002421418
##
datafusion/expr-common/src/statistics.rs:
##
@@ -203,6 +203,121 @@ impl Distribution {
};
Ok(dt)
}
+
+/// Merges two distributions int
xudong963 commented on PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2735210417
> I think eventually it would be nice to add some tests for this code
Yes, as the ticket description said: I'll add them once we reach a consensus.
xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002299377
##
datafusion/expr-common/src/statistics.rs:
##
@@ -857,6 +857,143 @@ pub fn compute_variance(
ScalarValue::try_from(target_type)
}
+/// Merges two distr
xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002335941
##
datafusion/expr-common/src/statistics.rs:
##
@@ -857,6 +857,143 @@ pub fn compute_variance(
ScalarValue::try_from(target_type)
}
+/// Merges two distr
xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002309255
##
datafusion/expr-common/src/statistics.rs:
##
@@ -857,6 +857,143 @@ pub fn compute_variance(
ScalarValue::try_from(target_type)
}
+/// Merges two distr
xudong963 commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002296888
##
datafusion/expr-common/src/statistics.rs:
##
@@ -857,6 +857,143 @@ pub fn compute_variance(
ScalarValue::try_from(target_type)
}
+/// Merges two distr
alamb commented on code in PR #15296:
URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002037023
##
datafusion/expr-common/src/statistics.rs:
##
@@ -857,6 +857,143 @@ pub fn compute_variance(
ScalarValue::try_from(target_type)
}
+/// Merges two distribut