Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2915817369 I polish the code only affect the no grouping aggregate, maybe we can compare the clickbench, so we can be confident to merge if it not affect aggregate performance. -- This i

Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub
jdrouet commented on code in PR #16191: URL: https://github.com/apache/datafusion/pull/16191#discussion_r2111306335 ## datafusion/execution/src/disk_manager.rs: ## @@ -32,6 +32,92 @@ use crate::memory_pool::human_readable_size; const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100

[PR] Fix ScalarStructBuilder::build() for an empty struct [datafusion]

2025-05-28 Thread via GitHub
Blizzara opened a new pull request, #16205: URL: https://github.com/apache/datafusion/pull/16205 ## Which issue does this PR close? - Closes #. ## Rationale for this change The bump to Arrow 55.1 brings with it https://github.com/apache/arrow-rs/pull/7247. That cause

Re: [PR] feat: remove `ClusterStorageConfig` as it is redundant [datafusion-ballista]

2025-05-28 Thread via GitHub
milenkovicm merged PR #1265: URL: https://github.com/apache/datafusion-ballista/pull/1265 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubsc

Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-28 Thread via GitHub
hendrikmakait commented on code in PR #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860#discussion_r2111252622 ## src/ast/query.rs: ## @@ -2680,28 +2680,32 @@ pub enum PipeOperator { full_table_exprs: Vec, group_by_expr: Vec, }, +

Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on code in PR #16191: URL: https://github.com/apache/datafusion/pull/16191#discussion_r2111412349 ## datafusion/execution/src/disk_manager.rs: ## @@ -32,7 +32,95 @@ use crate::memory_pool::human_readable_size; const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 =

Re: [I] `CollectLeft` / "right deep tree" optimization not triggered for join between 3 or more delta tables [datafusion]

2025-05-28 Thread via GitHub
aditanase closed issue #16106: `CollectLeft` / "right deep tree" optimization not triggered for join between 3 or more delta tables URL: https://github.com/apache/datafusion/issues/16106 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

[PR] chore(deps): bump clap from 4.5.38 to 4.5.39 [datafusion]

2025-05-28 Thread via GitHub
dependabot[bot] opened a new pull request, #16204: URL: https://github.com/apache/datafusion/pull/16204 Bumps [clap](https://github.com/clap-rs/clap) from 4.5.38 to 4.5.39. Release notes Sourced from [clap's releases](https://github.com/clap-rs/clap/releases). v4.5.39 [4.5.

Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-28 Thread via GitHub
hendrikmakait commented on code in PR #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860#discussion_r2111257608 ## src/ast/query.rs: ## @@ -1559,7 +1559,7 @@ impl fmt::Display for TableSampleBucket { } impl fmt::Display for TableSample { fn fmt(&self

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2915766242 > Yes this is more or less the same issue. PR [#14028](https://github.com/apache/datafusion/pull/14028) proposed adding a yield point at the leaf of the plan when moving fro

Re: [I] `CollectLeft` / "right deep tree" optimization not triggered for join between 3 or more delta tables [datafusion]

2025-05-28 Thread via GitHub
aditanase commented on issue #16106: URL: https://github.com/apache/datafusion/issues/16106#issuecomment-2915167215 Thanks for the feedback @alamb - both suggestions make a lot of sense. It does not appear that we can control join ordering through a config. I will send a PR with this.

Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub
jdrouet commented on code in PR #16191: URL: https://github.com/apache/datafusion/pull/16191#discussion_r2111307101 ## datafusion/execution/src/disk_manager.rs: ## @@ -91,6 +177,11 @@ pub struct DiskManager { } impl DiskManager { +/// Creates a builder for [DiskManager]

Re: [PR] feat: disable task stage plan binary cache [datafusion-ballista]

2025-05-28 Thread via GitHub
milenkovicm merged PR #1266: URL: https://github.com/apache/datafusion-ballista/pull/1266 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubsc

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2111658311 ## datafusion/physical-plan/src/aggregates/no_grouping.rs: ## @@ -77,6 +77,11 @@ impl AggregateStream { let baseline_metrics = BaselineMetrics::new(&a

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916014231 Updated no performance regression for the PR with huge aggregate testing: https://github.com/apache/datafusion/pull/16196#issuecomment-2916000852 -- This is an autom

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2111675314 ## datafusion/physical-plan/src/aggregates/no_grouping.rs: ## @@ -77,6 +77,11 @@ impl AggregateStream { let baseline_metrics = BaselineMetrics::new(&a

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
pepijnve commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916044328 Just for context, I ran into this while working on a Java based application that drives the DataFusion queries. I want to be able to interrupt query execution from the Java sid

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2916000852 Updated the performance for current PR: ```rust SET datafusion.execution.target_partitions = 1; SELECT SUM(value) FROM range(1,500) AS t; +--

Re: [PR] fix: equivalence for union [datafusion]

2025-05-28 Thread via GitHub
chenkovsky commented on code in PR #16185: URL: https://github.com/apache/datafusion/pull/16185#discussion_r2111650159 ## datafusion/physical-expr/src/equivalence/class.rs: ## @@ -422,6 +423,60 @@ impl EquivalenceGroup { self.bridge_classes() } +/// Returns a

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
pepijnve commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2111655000 ## datafusion/physical-plan/src/aggregates/no_grouping.rs: ## @@ -77,6 +77,11 @@ impl AggregateStream { let baseline_metrics = BaselineMetrics::new(&agg.

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2111685115 ## datafusion/physical-plan/src/aggregates/no_grouping.rs: ## @@ -77,6 +77,11 @@ impl AggregateStream { let baseline_metrics = BaselineMetrics::new(&a

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916070640 > Just for context, I ran into this while working on a Java based application that drives the DataFusion queries. I want to be able to interrupt query execution from the Jav

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
pepijnve commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916076401 > do you mean it's not the RecordBatchReceiverStream which help the cancellation? Trying to figure this out :D I'm a Java developer mainly; still getting my head around

Re: [PR] fix: fall back on nested types for default values [datafusion-comet]

2025-05-28 Thread via GitHub
mbutrovich commented on code in PR #1799: URL: https://github.com/apache/datafusion-comet/pull/1799#discussion_r2111696963 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -118,7 +119,17 @@ case class CometScanRule(session: SparkSession) extends Rule[Sp

Re: [PR] minor: release docker on when release has been tagged [datafusion-ballista]

2025-05-28 Thread via GitHub
milenkovicm merged PR #1264: URL: https://github.com/apache/datafusion-ballista/pull/1264 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubsc

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916148300 I am trying to do a solution with smallest change, may be can also wrapper with CoalescePartitionExec when the partition is 1, and if it has no regression, i believe it's th

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
pepijnve commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916133761 FYI I'm basing myself on the documentation and https://users.rust-lang.org/t/tokio-does-not-terminate-all-tasks-immediately-on-program-exit/100790/10 -- This is an automated

Re: [PR] Shift from Field to FieldRef for all user defined functions [datafusion]

2025-05-28 Thread via GitHub
alamb merged PR #16122: URL: https://github.com/apache/datafusion/pull/16122 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Reduce Field Copy operations before releasing 48.0.0 [datafusion]

2025-05-28 Thread via GitHub
alamb closed issue #16121: Reduce Field Copy operations before releasing 48.0.0 URL: https://github.com/apache/datafusion/issues/16121 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Shift from Field to FieldRef for all user defined functions [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16122: URL: https://github.com/apache/datafusion/pull/16122#issuecomment-2917166418 Thanks @timsaucer -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2917168057 Second performance run looks as good / better so let's merge this in! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] Return an error on overflow in `do_append_val_inner` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16201: URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2917175705 🤖: Benchmark completed Details ``` Comparing HEAD and issue-15969-error-on-buffer-overflow Benchmark clickbench_extended.json -

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2917215746 There is also one thing i want to highlight, is that in DuckDB, a SubqueryExpr may result into 2 output expr after decorrelation, this is because they want to support this q

Re: [PR] Set Formatted TableOptions Enum [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #16166: URL: https://github.com/apache/datafusion/pull/16166#discussion_r2112474913 ## datafusion/datasource/src/file_format.rs: ## @@ -120,7 +121,26 @@ pub trait FileFormatFactory: Sync + Send + GetExt + fmt::Debug { &self, state

Re: [PR] fix: fall back on nested types for default values [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove merged PR #1799: URL: https://github.com/apache/datafusion-comet/pull/1799 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2917252077 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [I] Release DataFusion-Python 47.0.0 [datafusion-python]

2025-05-28 Thread via GitHub
timsaucer closed issue #1115: Release DataFusion-Python 47.0.0 URL: https://github.com/apache/datafusion-python/issues/1115 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] build(deps): bump object_store from 0.12.0 to 0.12.1 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] commented on PR #1127: URL: https://github.com/apache/datafusion-python/pull/1127#issuecomment-2917263077 Looks like object_store is up-to-date now, so this is no longer needed. -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] Release DataFusion 47.0.0 [datafusion-python]

2025-05-28 Thread via GitHub
timsaucer merged PR #1130: URL: https://github.com/apache/datafusion-python/pull/1130 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

Re: [PR] build(deps): bump ring from 0.17.9 to 0.17.14 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] commented on PR #1124: URL: https://github.com/apache/datafusion-python/pull/1124#issuecomment-2917263369 Looks like ring is up-to-date now, so this is no longer needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

Re: [PR] build(deps): bump object_store from 0.12.0 to 0.12.1 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] closed pull request #1127: build(deps): bump object_store from 0.12.0 to 0.12.1 URL: https://github.com/apache/datafusion-python/pull/1127 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] build(deps): bump arrow from 55.0.0 to 55.1.0 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] closed pull request #1128: build(deps): bump arrow from 55.0.0 to 55.1.0 URL: https://github.com/apache/datafusion-python/pull/1128 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

Re: [PR] build(deps): bump arrow from 55.0.0 to 55.1.0 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] commented on PR #1128: URL: https://github.com/apache/datafusion-python/pull/1128#issuecomment-2917263195 Looks like arrow is up-to-date now, so this is no longer needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] build(deps): bump ring from 0.17.9 to 0.17.14 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] closed pull request #1124: build(deps): bump ring from 0.17.9 to 0.17.14 URL: https://github.com/apache/datafusion-python/pull/1124 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2112550252 ## datafusion/optimizer/src/decorrelate_general.rs: ## @@ -0,0 +1,1137 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2917351712 🤖: Benchmark completed Details ``` Comparing HEAD and issue_16193 Benchmark clickbench_extended.json ┏━

Re: [PR] perf: Only add CopyExec if source of `ScanExec` is `native_comet` [datafusion-comet]

2025-05-28 Thread via GitHub
codecov-commenter commented on PR #1808: URL: https://github.com/apache/datafusion-comet/pull/1808#issuecomment-2917149316 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1808?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112474979 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files ba

Re: [I] Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience [datafusion]

2025-05-28 Thread via GitHub
alamb commented on issue #16188: URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2917159985 In general I agree with the premise that making the filter pushdown APIs easier to use / understand would be very valuable to DataFusion -- the goals @kosiew describe all sound w

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-28 Thread via GitHub
alamb merged PR #16165: URL: https://github.com/apache/datafusion/pull/16165 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2917168337 Thanks again @ctsk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112477825 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files ba

Re: [PR] fix: Re-enable Spark 4 tests on Linux [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove merged PR #1806: URL: https://github.com/apache/datafusion-comet/pull/1806 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112606619 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files based

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112613924 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files ba

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2917408198 Running the benchmarks again to gather more details -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2917407566 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub
jdrouet commented on code in PR #16191: URL: https://github.com/apache/datafusion/pull/16191#discussion_r2111797754 ## datafusion/execution/src/disk_manager.rs: ## @@ -32,7 +32,95 @@ use crate::memory_pool::human_readable_size; const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916192179 Tried now, it also works for the wrapper with CoalescePartitionExec when the partition is 1. ```rust diff --git a/datafusion/physical-plan/src/coalesce_partitions.r

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2111771381 ## datafusion/physical-plan/src/aggregates/no_grouping.rs: ## @@ -77,6 +77,11 @@ impl AggregateStream { let baseline_metrics = BaselineMetrics::new(&a

Re: [PR] fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra commented on PR #1785: URL: https://github.com/apache/datafusion-comet/pull/1785#issuecomment-2916451874 reverted change for field_ids -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-28 Thread via GitHub
UBarney commented on PR #15954: URL: https://github.com/apache/datafusion/pull/15954#issuecomment-2916643078 > LGTM, thank you @UBarney @berkaysynnada Thanks for reviewing. I have addressed your comment -- This is an automated message from the Apache Git Service. To respond to the

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
pepijnve commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916647205 @zhuqi-lucas @alamb I slapped together something quickly to test my cancellation hypothesis. See https://gist.github.com/pepijnve/c013a697b1869ea067e793bf3e1e115a For me

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2112141978 ## datafusion/datasource/src/test_util.rs: ## @@ -81,6 +83,8 @@ impl FileSource for MockSource { fn file_type(&self) -> &str { "mock" } + +imp

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916690342 > I thought everything that came from JVM is reusing buffers, if it's not the case than the copy should not always be added when there is a ScanExec, no? (looking at the wrap

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2112148579 ## datafusion/datasource/src/test_util.rs: ## @@ -81,6 +83,8 @@ impl FileSource for MockSource { fn file_type(&self) -> &str { "mock" } + +

Re: [PR] Eliminate Self Joins [datafusion]

2025-05-28 Thread via GitHub
atahanyorganci commented on PR #16023: URL: https://github.com/apache/datafusion/pull/16023#issuecomment-2916493090 Current implementation fails in call to `assert_valid_optimization` where schema of a optimization pass is compared against the previous state. Failure occurs when alias is re

[PR] [wip] chore: more CI work [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove opened a new pull request, #1807: URL: https://github.com/apache/datafusion-comet/pull/1807 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

[PR] Add support for mysql's drop index (`DROP INDEX idx_a ON table_a` and `ALTER TABLE table_a DROP INDEX idx_a`) [datafusion-sqlparser-rs]

2025-05-28 Thread via GitHub
vimko opened a new pull request, #1865: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1865 This PR adds support for mysql's drop index: - `DROP INDEX idx_a ON table_a` - `ALTER TABLE table_a DROP INDEX idx_a` Addresses https://github.com/apache/datafusion-sqlpar

Re: [PR] chore: Make CI rules more consistent and reduce PR coverage for Spark 3.4 [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove closed pull request #1784: chore: Make CI rules more consistent and reduce PR coverage for Spark 3.4 URL: https://github.com/apache/datafusion-comet/pull/1784 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2112246851 ## datafusion/datasource/src/test_util.rs: ## @@ -81,6 +83,8 @@ impl FileSource for MockSource { fn file_type(&self) -> &str { "mock" } + +

Re: [PR] [wip] chore: more CI work [datafusion-comet]

2025-05-28 Thread via GitHub
codecov-commenter commented on PR #1807: URL: https://github.com/apache/datafusion-comet/pull/1807#issuecomment-2916840689 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1807?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-28 Thread via GitHub
berkaysynnada merged PR #15954: URL: https://github.com/apache/datafusion/pull/15954 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [PR] fix: fall back on nested types for default values [datafusion-comet]

2025-05-28 Thread via GitHub
comphead commented on code in PR #1799: URL: https://github.com/apache/datafusion-comet/pull/1799#discussion_r2112239658 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -118,7 +119,17 @@ case class CometScanRule(session: SparkSession) extends Rule[Spar

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2112237909 ## datafusion/datasource/src/test_util.rs: ## @@ -81,6 +83,8 @@ impl FileSource for MockSource { fn file_type(&self) -> &str { "mock" } + +imp

Re: [PR] chore(deps): bump clap from 4.5.38 to 4.5.39 [datafusion]

2025-05-28 Thread via GitHub
alamb merged PR #16204: URL: https://github.com/apache/datafusion/pull/16204 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] fix: fall back on nested types for default values [datafusion-comet]

2025-05-28 Thread via GitHub
mbutrovich commented on code in PR #1799: URL: https://github.com/apache/datafusion-comet/pull/1799#discussion_r2112261140 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -118,7 +119,17 @@ case class CometScanRule(session: SparkSession) extends Rule[Sp

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112252563 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files based

Re: [PR] Handle dicts for distinct count [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #15871: URL: https://github.com/apache/datafusion/pull/15871#discussion_r2112273596 ## datafusion/functions-aggregate/src/count.rs: ## @@ -764,4 +774,49 @@ mod tests { assert_eq!(accumulator.evaluate()?, ScalarValue::Int64(Some(0)));

Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-28 Thread via GitHub
xudong963 commented on PR #15954: URL: https://github.com/apache/datafusion/pull/15954#issuecomment-2916884677 @UBarney Thank you again!! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #15980: URL: https://github.com/apache/datafusion/pull/15980#discussion_r2112276143 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -1494,6 +1494,14 @@ impl LogicalPlan { let mut param_types: HashMap> = HashMap::new(); self.a

Re: [PR] doc: add diagram to describe how DataSource, FileSource, and DataSourceExec are related [datafusion]

2025-05-28 Thread via GitHub
xudong963 merged PR #16181: URL: https://github.com/apache/datafusion/pull/16181 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #15954: URL: https://github.com/apache/datafusion/pull/15954#issuecomment-2916893985 🎉

Re: [I] Add diagrams for relationship between `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-05-28 Thread via GitHub
xudong963 closed issue #15887: Add diagrams for relationship between `FileSource`, `DataSource` and `DataSourceExec` URL: https://github.com/apache/datafusion/issues/15887

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2917443275 🤖: Benchmark completed Details ``` Comparing HEAD and issue_16193 Benchmark clickbench_extended.json ┏━

Re: [PR] chore: [native scans] Ignore Spark SQL test for string predicate pushdown [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra commented on code in PR #1768: URL: https://github.com/apache/datafusion-comet/pull/1768#discussion_r2112650217 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1330,6 +1330,25 @@ class CometExpressionSuite extends CometTestBase with Adap

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-28 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2917518553 Thanks @adriangb @Dandandan. I just started my new job this week and am a bit busy, but I will continue to push it forward this weekend. The new targets for this one may b

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-28 Thread via GitHub
Rachelint commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2112691325 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs: ## @@ -116,42 +122,60 @@ where { fn intern(&mut self, cols: &[Arr

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-28 Thread via GitHub
Rachelint commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2917542628 > I think this PR makes things better so approving. Nice work @Rachelint. Thanks @alamb, I think there are still two blocking things before merging it: - Maybe we should also compare

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
logan-keede commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917556654 Currently the error looks like: ```sql > explain SELECT e1.employee_name, e1.salary FROM employees e1 WHERE e1.salary > ( SELECT AVG(e2.salary) FROM employee

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2917562287 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2917574821 Thanks for all your help @Rachelint and congratulations on the new job

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2917647569 🤖: Benchmark completed Details ``` Comparing HEAD and improve-primitive-group-values Benchmark clickbench_extended.json ---

[PR] Add support for parameter default values in SQL Server [datafusion-sqlparser-rs]

2025-05-28 Thread via GitHub
aharpervc opened a new pull request, #1866: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1866 This PR adds support for default values for parameters, as documented here: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-function-transact-sql?view=sql-server-ver17#-

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2112282534 ## datafusion/datasource/src/test_util.rs: ## @@ -81,6 +83,8 @@ impl FileSource for MockSource { fn file_type(&self) -> &str { "mock" } + +imp

Re: [PR] chore: manual "git bisect" to try and determine when CI failures started [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove closed pull request #1804: chore: manual "git bisect" to try and determine when CI failures started URL: https://github.com/apache/datafusion-comet/pull/1804

Re: [PR] Chore: Moved strings expressions to separate file [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove merged PR #1792: URL: https://github.com/apache/datafusion-comet/pull/1792

Re: [I] union all +aggregate function in the recursive cte results an infinite loop [datafusion-python]

2025-05-28 Thread via GitHub
timsaucer commented on issue #1131: URL: https://github.com/apache/datafusion-python/issues/1131#issuecomment-2916920418 Ok to close this issue?

Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]

2025-05-28 Thread via GitHub
xudong963 commented on code in PR #16157: URL: https://github.com/apache/datafusion/pull/16157#discussion_r2112300128 ## docs/source/user-guide/sql/ddl.md: ## @@ -91,6 +93,23 @@ STORED AS PARQUET LOCATION '/mnt/nyctaxi/tripdata.parquet'; ``` +:::{note} Review Comment: >
