andygrove commented on PR #1790:
URL:
https://github.com/apache/datafusion-comet/pull/1790#issuecomment-2908051424
This approach seems to work, and the Spark 4 tests ran a little faster than
usual.
```
Run completed in 1 hour, 10 minutes, 53 seconds.
Total number of tests run:
logan-keede commented on issue #11201:
URL: https://github.com/apache/datafusion/issues/11201#issuecomment-2908018288
Do we just need to port
[cast](https://github.com/apache/datafusion-comet/blob/main/native/spark-expr/src/conversion_funcs/cast.rs)
here from Comet?
--
This is an automa
codecov-commenter commented on PR #1792:
URL:
https://github.com/apache/datafusion-comet/pull/1792#issuecomment-2908024355
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1792?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
kosiew commented on PR #16169:
URL: https://github.com/apache/datafusion/pull/16169#issuecomment-2908327917
hi @adriangb ,
Thanks for the ping.
I recorded some observations and thoughts in #16188
--
This is an automated message from the Apache Git Service.
To respond to the m
kosiew opened a new issue, #16188:
URL: https://github.com/apache/datafusion/issues/16188
### Is your feature request related to a problem or challenge?
The current filter pushdown APIs in DataFusion (FilterPushdownPropagation,
PredicateSupports, etc.) have grown organically but now a
kosiew commented on code in PR #16148:
URL: https://github.com/apache/datafusion/pull/16148#discussion_r2106428855
##
datafusion/core/tests/integration_tests/schema_adapter_integration_tests.rs:
##
@@ -0,0 +1,197 @@
+// Licensed to the Apache Software Foundation (ASF) under one
duongcongtoai commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106233488
##
datafusion/optimizer/src/push_down_filter.rs:
##
@@ -1089,7 +1089,11 @@ impl OptimizerRule for PushDownFilter {
let (volatile_filters, no
logan-keede commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106240915
##
datafusion/sql/src/planner.rs:
##
@@ -235,18 +235,27 @@ impl PlannerContext {
}
// Return a reference to the outer query's schema
-pub fn ou
logan-keede commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106240774
##
datafusion/sql/src/planner.rs:
##
@@ -235,18 +235,27 @@ impl PlannerContext {
}
// Return a reference to the outer query's schema
-pub fn ou
kazantsev-maksim opened a new pull request, #1792:
URL: https://github.com/apache/datafusion-comet/pull/1792
## Which issue does this PR close?
Part of https://github.com/apache/datafusion-comet/issues/1330
Closes #.
## Rationale for this change
See https://github.
irenjj commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106333691
##
datafusion/sqllogictest/test_files/subquery.slt:
##
@@ -1482,3 +1482,85 @@ logical_plan
statement count 0
drop table person;
+
+#
correlated_recursive_scala
kosiew commented on code in PR #16148:
URL: https://github.com/apache/datafusion/pull/16148#discussion_r2106445116
##
datafusion/core/tests/test_source_adapter_tests.rs:
##
@@ -0,0 +1,233 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor l
kosiew commented on code in PR #16148:
URL: https://github.com/apache/datafusion/pull/16148#discussion_r2106439691
##
datafusion/core/tests/test_adapter_updated.rs:
##
@@ -0,0 +1,201 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor licens
duongcongtoai commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106485055
##
datafusion/sqllogictest/test_files/subquery.slt:
##
@@ -1482,3 +1482,85 @@ logical_plan
statement count 0
drop table person;
+
+#
correlated_recursiv
hendrikmakait opened a new pull request, #1860:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860
Part of #1758
This PR adds support for the `|> TABLESAMPLE ...` pipe operator.
Open question:
[BigQuery's `TABLESAMPLE`
operator](https://cloud.google.com/big
liamzwbao commented on issue #15969:
URL: https://github.com/apache/datafusion/issues/15969#issuecomment-2907974116
take
codecov-commenter commented on PR #1791:
URL:
https://github.com/apache/datafusion-comet/pull/1791#issuecomment-2907982759
##
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1791?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
duongcongtoai commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106246132
##
datafusion/sql/src/planner.rs:
##
@@ -235,18 +235,27 @@ impl PlannerContext {
}
// Return a reference to the outer query's schema
-pub fn
lifan-ake commented on code in PR #16184:
URL: https://github.com/apache/datafusion/pull/16184#discussion_r2106399333
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -4059,120 +4060,118 @@ mod tests {
.project(vec![col("id"), exists(plan1).alias("exists")])?
lifan-ake commented on code in PR #16184:
URL: https://github.com/apache/datafusion/pull/16184#discussion_r2106399160
##
datafusion/expr/src/logical_plan/builder.rs:
##
@@ -2269,11 +2270,11 @@ mod tests {
.project(vec![col("id")])?
.build()?;
lifan-ake commented on code in PR #16184:
URL: https://github.com/apache/datafusion/pull/16184#discussion_r2106399499
##
datafusion/expr/src/logical_plan/builder.rs:
##
@@ -2498,19 +2495,8 @@ mod tests {
.project(vec![col("id"), col("first_name").alias("id")]);
Curricane commented on PR #14057:
URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2908303408
We look forward to supporting this feature as soon as possible, and perhaps
enrich the optimization strategy by hiding columns
github-actions[bot] closed pull request #14180: refactor: do
ambiguous_distinct_check in select
URL: https://github.com/apache/datafusion/pull/14180
irenjj commented on code in PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106217418
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -287,6 +287,63 @@ pub enum LogicalPlan {
Unnest(Unnest),
/// A variadic query (e.g. "Recursive CTEs")
onlyjackfrost commented on PR #16181:
URL: https://github.com/apache/datafusion/pull/16181#issuecomment-2907881472
@alamb thanks for the review and comments!
I adjusted the diagram, used `cargo doc --open`, and looked.
https://github.com/user-attachments/assets/55d2a9f5-b90c-4e6c-9148-
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2907882330
> I wonder what the plan is for this PR?
>
> From what I understand, it currently improves performance for aggregates
with large numbers of groups, but (slightly) slows down
duongcongtoai commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907891074
https://github.com/apache/datafusion/pull/16186/files
I added this dummy fix to support multi-level outer reference columns; it is
enough for us to continue with this story.
andygrove commented on issue #1786:
URL:
https://github.com/apache/datafusion-comet/issues/1786#issuecomment-2907932180
> Here is an example CI Failure:
>
> *
https://github.com/apache/datafusion-comet/actions/runs/15201820724/job/42757252219
>
>
> [@andygrove](https:/
andygrove opened a new pull request, #1790:
URL: https://github.com/apache/datafusion-comet/pull/1790
## Which issue does this PR close?
Part of https://github.com/apache/datafusion-comet/issues/1786
## Rationale for this change
Exploring the idea of runni
andygrove commented on issue #1786:
URL:
https://github.com/apache/datafusion-comet/issues/1786#issuecomment-2908074722
The failure happened at a different point in this run:
https://github.com/apache/datafusion-comet/actions/runs/15240457481/job/42860102625?pr=1792
The failur
andygrove commented on PR #1792:
URL:
https://github.com/apache/datafusion-comet/pull/1792#issuecomment-2908075868
CI failure is unrelated -
https://github.com/apache/datafusion-comet/issues/1786
duongcongtoai commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106305831
##
datafusion/optimizer/src/push_down_filter.rs:
##
@@ -1089,7 +1089,13 @@ impl OptimizerRule for PushDownFilter {
let (volatile_filters, no
davisp commented on issue #16158:
URL: https://github.com/apache/datafusion/issues/16158#issuecomment-2908080182
Registering my official +1 to default to collecting statistics.
For reference, I was working on the TPC-H benchmarks with a scale factor of
20 which generates roughly 20GiB
atahanyorganci commented on PR #16164:
URL: https://github.com/apache/datafusion/pull/16164#issuecomment-2907862590
thanks for the review @alamb
CI passed, I think we can merge it at your discretion.
andygrove closed pull request #1790: [experiment] Run Comet tests in Docker
URL: https://github.com/apache/datafusion-comet/pull/1790
andygrove commented on issue #1786:
URL:
https://github.com/apache/datafusion-comet/issues/1786#issuecomment-2907943915
@alamb more specifically, the failing build that you linked to:
```
CometShuffle4_0Suite:
- Fallback to Spark when shuffling on struct with duplicate field nam
irenjj commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2908203622
We should also fix the compile error in
`physical_planner.rs(map_logical_node_to_physical)`; it needs to handle
`LogicalPlan::DependentJoin`.
duongcongtoai opened a new pull request, #16186:
URL: https://github.com/apache/datafusion/pull/16186
## Which issue does this PR close?
- Closes #.
## Rationale for this change
## What changes are included in this PR?
## Are these changes t
andygrove opened a new pull request, #1791:
URL: https://github.com/apache/datafusion-comet/pull/1791
## Which issue does this PR close?
Closes #.
## Rationale for this change
## What changes are included in this PR?
## How are these changes
comphead opened a new issue, #16187:
URL: https://github.com/apache/datafusion/issues/16187
### Describe the bug
The query below crashes
```
> select map_values(map([named_struct('a', 1, 'b', null)],
[named_struct('a', 1, 'b', null)]))[0] as a;
thread 'main' panicked at
kosiew commented on code in PR #16148:
URL: https://github.com/apache/datafusion/pull/16148#discussion_r2106461041
##
datafusion/datasource-parquet/tests/apply_schema_adapter_tests.rs:
##
@@ -0,0 +1,224 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or mor
logan-keede commented on issue #16137:
URL: https://github.com/apache/datafusion/issues/16137#issuecomment-2907673352
https://github.com/apache/datafusion/blob/34f250a2b4800845b5c4e61bd928ddbbc4af7ba0/datafusion/expr/src/logical_plan/invariants.rs#L174-L201
DataFusion tries to pre
berkaysynnada merged PR #16164:
URL: https://github.com/apache/datafusion/pull/16164
kosiew commented on code in PR #16148:
URL: https://github.com/apache/datafusion/pull/16148#discussion_r2106508912
##
datafusion/datasource/src/test_util.rs:
##
@@ -81,6 +83,8 @@ impl FileSource for MockSource {
fn file_type(&self) -> &str {
"mock"
}
+
+im
duongcongtoai commented on code in PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106527079
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -287,6 +287,63 @@ pub enum LogicalPlan {
Unnest(Unnest),
/// A variadic query (e.g. "Recursive C
duongcongtoai commented on code in PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106138463
##
datafusion/optimizer/src/decorrelate_general.rs:
##
@@ -0,0 +1,856 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contribut
irenjj commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907709927
Thanks @duongcongtoai , I'll review the other code later, but regarding the
depth issue, I don't think it's likely to be handled in the optimizer. I'll
organize some questions and We
irenjj commented on code in PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106147135
##
datafusion/optimizer/src/decorrelate_general.rs:
##
@@ -0,0 +1,856 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lice
irenjj commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907729186
I think a possible implementation approach is to construct the
PlannerContext at the outer layer, and then carry the PlannerContext
information into the optimizer/physical_planner.
blaginin commented on code in PR #16184:
URL: https://github.com/apache/datafusion/pull/16184#discussion_r2106160686
##
datafusion/expr/src/logical_plan/builder.rs:
##
@@ -2759,10 +2749,24 @@ mod tests {
let join = LogicalPlanBuilder::from(left).cross_join(right)?.bui
alamb commented on code in PR #15022:
URL: https://github.com/apache/datafusion/pull/15022#discussion_r2106160675
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs:
##
@@ -299,6 +424,17 @@ impl GroupsAccumulatorAdapter {
}
impl GroupsAccumulator fo
blaginin commented on PR #16184:
URL: https://github.com/apache/datafusion/pull/16184#issuecomment-2907753148
Also can you please check the CI? Some tests are failing
logan-keede commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907757622
> regarding the depth issue, I don't think it's likely to be handled in the
optimizer.
I agree with @irenjj , it seems like correlated subqueries with depth>1 does
not rea
irenjj commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907724823
The difference between DataFusion and DuckDB in constructing logical plans
is: DataFusion directly assigns schema to `LogicalPlan`, while DuckDB saves
metadata information in the `Bin
duongcongtoai commented on issue #5492:
URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2907708177
This [PR](https://github.com/apache/datafusion/pull/16016) is ready for
review,
let me know your opinions
I think after this it will unblock us to start implementing so
irenjj commented on code in PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106132282
##
datafusion/optimizer/src/decorrelate_general.rs:
##
@@ -0,0 +1,856 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lice
duongcongtoai commented on code in PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106139236
##
datafusion/optimizer/src/decorrelate_general.rs:
##
@@ -0,0 +1,856 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contribut
alamb commented on issue #15771:
URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2907743279
I am not sure if it is related, but we have also seen some intermittent
failures in DataFusion CI
- https://github.com/apache/datafusion/issues/16180
alamb commented on issue #1786:
URL:
https://github.com/apache/datafusion-comet/issues/1786#issuecomment-2907743068
Here is an example CI Failure:
-
https://github.com/apache/datafusion-comet/actions/runs/15201820724/job/42757252219
@andygrove can you help me understand what the
chenkovsky opened a new pull request, #16185:
URL: https://github.com/apache/datafusion/pull/16185
## Which issue does this PR close?
- Closes #16171.
## Rationale for this change
equivalence is not set.
## What changes are included in this PR?
compute i
alamb commented on code in PR #16167:
URL: https://github.com/apache/datafusion/pull/16167#discussion_r2106168938
##
datafusion/functions-nested/src/length.rs:
##
@@ -128,26 +148,20 @@ pub fn array_length_inner(args: &[ArrayRef]) ->
Result {
match &args[0].data_type() {
alamb commented on PR #16167:
URL: https://github.com/apache/datafusion/pull/16167#issuecomment-2907765509
Thanks again @chenkovsky
alamb closed issue #16163: Arraylength UDF not fully implemented or inconcistent
URL: https://github.com/apache/datafusion/issues/16163
alamb merged PR #16079:
URL: https://github.com/apache/datafusion/pull/16079
alamb closed issue #16057: Reduce repetition in the parameter type inference
tests
URL: https://github.com/apache/datafusion/issues/16057
alamb commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2907769450
I wonder what the plan is for this PR?
From what I understand, it currently improves performance for aggregates
with large numbers of groups, but (slightly) slows down aggregates
alamb commented on code in PR #16185:
URL: https://github.com/apache/datafusion/pull/16185#discussion_r2106171762
##
datafusion/physical-expr/src/equivalence/class.rs:
##
@@ -422,6 +423,32 @@ impl EquivalenceGroup {
self.bridge_classes()
}
+#[allow(clippy::ty
alamb commented on PR #16165:
URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2907770923
I am surprised this shows any performance difference. I will rerun and see
if I can reproduce
alamb commented on PR #16165:
URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2907771160
🤖 `./gh_compare_branch.sh` [Benchmark
Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun
duongcongtoai commented on code in PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#discussion_r2106175049
##
datafusion/optimizer/src/decorrelate_general.rs:
##
@@ -0,0 +1,856 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contribut
duongcongtoai commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907779111
> It's thrown in the planning phase; the error is thrown because, in the planning
phase, the planner can only get the schema info from the upper query block.
Ahah, confirmed, given this q
alamb commented on PR #16165:
URL: https://github.com/apache/datafusion/pull/16165#issuecomment-290856
🤖 `./gh_compare_branch.sh` [Benchmark
Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh)
Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun
alamb commented on PR #16165:
URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2907785532
🤖: Benchmark completed
Details
```
Comparing HEAD and fix_aggregation-seed
Benchmark clickbench_extended.json
logan-keede commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907793456
> The optimizer is actually invoked, but with the plan of `EmptyRelation`
for some reason; we'd better do something in the planning!
How did you confirm that?
I tried by p
alamb commented on PR #16181:
URL: https://github.com/apache/datafusion/pull/16181#issuecomment-2907762892
Thank you @onlyjackfrost 🙏
I have a few comments on this diagram here:
1. I think the names in the diagram should match as much as possible the
names in the code "file so
rluvaton commented on code in PR #15022:
URL: https://github.com/apache/datafusion/pull/15022#discussion_r2106174038
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs:
##
@@ -299,6 +424,17 @@ impl GroupsAccumulatorAdapter {
}
impl GroupsAccumulator
irenjj commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-290338
> Actually, this error is thrown after all optimizers are executed; the error
is thrown because no existing optimizer is capable of handling nested
subqueries.
It's thrown in pl
alamb commented on PR #16165:
URL: https://github.com/apache/datafusion/pull/16165#issuecomment-290827
🤖: Benchmark completed
Details
```
Comparing HEAD and fix_aggregation-seed
Benchmark clickbench_extended.json
alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2907767246
Thanks @clflushopt -- I'll try and check this out tomorrow or Tuesday
alamb merged PR #16167:
URL: https://github.com/apache/datafusion/pull/16167
duongcongtoai commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907772662
> it seems like correlated subqueries with depth>1 does not reach optimizer
as they report Schema Error: No field named xyz.col
Actually this error is thrown after all o
duongcongtoai commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907798826
> How did you confirm that?
I tried by putting a debug statement here-
my bad, the EmptyRelation is actually invoked for the queries to create
table :disappointed:
duongcongtoai commented on PR #16016:
URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2907805553
Regarding providing the schema from the outer query to the deeply nested subquery, can
we do something like this:
```
pub(super) fn parse_scalar_subquery(
&self,