Re: [PR] feat: add read array support [datafusion-comet]

2025-02-28 Thread via GitHub
comphead commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r1976123048 ## spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Standardize CREATE TABLE options equals signs [datafusion-sqlparser-rs]

2025-02-28 Thread via GitHub
mvzink commented on code in PR #1751: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1751#discussion_r1975782813 ## tests/sqlparser_mysql.rs: ## @@ -1047,6 +1047,174 @@ fn parse_create_table_gencol() { mysql_and_generic().verified_stmt("CREATE TABLE t1 (a INT,

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-28 Thread via GitHub
wiedld commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1975963815 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Result<()

[PR] Split out avro, parquet, json and csv into individual crates [datafusion]

2025-02-28 Thread via GitHub
AdamGS opened a new pull request, #14951: URL: https://github.com/apache/datafusion/pull/14951 ## Which issue does this PR close? Part of #1. The PR is obviously huge, but because the files are split (keeping tests in core), github doesn't understand some of the changes as m

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-28 Thread via GitHub
shehabgamin commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2691748036 > [@shehabgamin](https://github.com/shehabgamin) / [@jayzhan211](https://github.com/jayzhan211) should we mark `Expr::Wildcard` as deprecated? I'm in support of this!

Re: [I] Remove redundant statistics from FileScanConfig [datafusion]

2025-02-28 Thread via GitHub
Standing-Man commented on issue #14937: URL: https://github.com/apache/datafusion/issues/14937#issuecomment-2691759266 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] Myst warnings [datafusion]

2025-02-28 Thread via GitHub
AmosAidoo opened a new pull request, #14952: URL: https://github.com/apache/datafusion/pull/14952 ## Which issue does this PR close? - Closes #14945. ## Rationale for this change docs/build.sh produces warnings. This change removes all those warnings and trea

Re: [PR] feat: add read array support [datafusion-comet]

2025-02-28 Thread via GitHub
codecov-commenter commented on PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#issuecomment-2691785564 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1456?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Examples: boundary analysis example for `AND/OR` conjunctions [datafusion]

2025-02-28 Thread via GitHub
clflushopt commented on code in PR #14735: URL: https://github.com/apache/datafusion/pull/14735#discussion_r1976136920 ## docs/source/library-user-guide/query-optimizer.md: ## @@ -388,3 +388,119 @@ In the following example, the `type_coercion` and `simplify_expressions` passes

Re: [PR] Update dependencies for df-ray [datafusion-ray]

2025-02-28 Thread via GitHub
andygrove merged PR #67: URL: https://github.com/apache/datafusion-ray/pull/67

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-28 Thread via GitHub
alamb commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1975962627 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Result<()>

[PR] Minor: improve documentation of `AggregateMode` [datafusion]

2025-02-28 Thread via GitHub
alamb opened a new pull request, #14946: URL: https://github.com/apache/datafusion/pull/14946 ## Which issue does this PR close? - related to https://github.com/apache/datafusion/issues/14691 ## Rationale for this change I have always found `AggregateMode` to be s

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-28 Thread via GitHub
alamb commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1975938366 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Result<()>

Re: [PR] Minor: improve documentation of `AggregateMode` [datafusion]

2025-02-28 Thread via GitHub
wiedld commented on code in PR #14946: URL: https://github.com/apache/datafusion/pull/14946#discussion_r1975989405 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -57,41 +57,53 @@ mod row_hash; mod topk; mod topk_stream; -/// Hash aggregate modes +/// Aggregation mo

Re: [I] ParallelizeSorts, a subrule of EnforceSorting optimizer, should not remove necessary coalesce. [datafusion]

2025-02-28 Thread via GitHub
wiedld commented on issue #14691: URL: https://github.com/apache/datafusion/issues/14691#issuecomment-2691661588 > So does this mean that whatever issue we were hitting has been fixed on main (in Datafusion 46?) yes, it should be.

Re: [I] ParallelizeSorts, a subrule of EnforceSorting optimizer, should not remove necessary coalesce. [datafusion]

2025-02-28 Thread via GitHub
alamb commented on issue #14691: URL: https://github.com/apache/datafusion/issues/14691#issuecomment-2691657431 So does this mean that whatever issue we were hitting has been fixed on main (in Datafusion 46?) > Update: > > * the input to the EnforceSorting is invalid. > * it

Re: [PR] Add support for aggregate expressions with filters [datafusion-sqlparser-rs]

2025-02-28 Thread via GitHub
coveralls commented on PR #585: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/585#issuecomment-2691671261 ## Pull Request Test Coverage Report for [Build 2889288145](https://coveralls.io/builds/51794474) ### Warning: This coverage report may be inaccurate. This p

Re: [I] Code clean for new datafusion-cli streaming printing logic [datafusion]

2025-02-28 Thread via GitHub
zhuqi-lucas commented on issue #14886: URL: https://github.com/apache/datafusion/issues/14886#issuecomment-2690082324 Sure @shruti2522, thanks!

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-28 Thread via GitHub
berkaysynnada commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1975931108 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Re

Re: [PR] Document SQL literal syntax and escaping [datafusion]

2025-02-28 Thread via GitHub
alamb merged PR #14934: URL: https://github.com/apache/datafusion/pull/14934

Re: [I] Remove the need for registering an ObjectStore for remote files [datafusion-python]

2025-02-28 Thread via GitHub
kylebarron commented on issue #899: URL: https://github.com/apache/datafusion-python/issues/899#issuecomment-2691422190 To answer the original question of this issue: I think it's necessary to export classes to Python to customize `object_store` instances, because people often have custom

Re: [PR] Refactor SortPushdown using the standard top-down visitor and using `EquivalenceProperties` [datafusion]

2025-02-28 Thread via GitHub
berkaysynnada commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2691426261 > I went over this PR again carefully with @wiedld and we discussed the plan changes and they seem like improvements to me > > @berkaysynnada I wonder if you would be int

Re: [I] datafusion-cli regression: explain plan output looks bad (error rendering multi-lines) [datafusion]

2025-02-28 Thread via GitHub
alamb commented on issue #14947: URL: https://github.com/apache/datafusion/issues/14947#issuecomment-2691583601 If we can't figure this out by tomorrow, I am thinking we can revert https://github.com/apache/datafusion/pull/14877. I will prepare a PR to do so as a backup

Re: [PR] Fix failing extended `sqlite`test on main / update `datafusion-testing` pin [datafusion]

2025-02-28 Thread via GitHub
alamb commented on PR #14940: URL: https://github.com/apache/datafusion/pull/14940#issuecomment-2691450162 @comphead can I beg a review for this one to get main running clean again?

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-28 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2691604815 I have a PR up to fix the regression - https://github.com/apache/datafusion/pull/14948

Re: [PR] fix: remove code duplication in native_datafusion and native_iceberg_compat implementations [datafusion-comet]

2025-02-28 Thread via GitHub
mbutrovich commented on PR #1443: URL: https://github.com/apache/datafusion-comet/pull/1443#issuecomment-2691594784 Would it be difficult to split this PR? The title makes it seem like a minor refactor to the ParquetExec instantiation, but it's actually bringing in a lot of the object stor

Re: [PR] Minor fixes to README [datafusion-ray]

2025-02-28 Thread via GitHub
vmingchen commented on PR #64: URL: https://github.com/apache/datafusion-ray/pull/64#issuecomment-2691622009 Hi @andygrove, can you take a look at this small PR as well? Thanks a lot!

Re: [PR] Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-02-28 Thread via GitHub
alamb commented on PR #14877: URL: https://github.com/apache/datafusion/pull/14877#issuecomment-2691547433 FYI we found one issue with this code here (fixed in https://github.com/apache/datafusion/pull/14921): - https://github.com/apache/datafusion/issues/14920 - https://github.com/apa

Re: [I] datafusion-cli regression: explain plan output looks bad [datafusion]

2025-02-28 Thread via GitHub
alamb commented on issue #14947: URL: https://github.com/apache/datafusion/issues/14947#issuecomment-2691564690 I think the issue is the code added in https://github.com/apache/datafusion/pull/14877 doesn't handle multiple lines

Re: [PR] Fix failing extended `sqlite`test on main / update `datafusion-testing` pin [datafusion]

2025-02-28 Thread via GitHub
alamb commented on PR #14940: URL: https://github.com/apache/datafusion/pull/14940#issuecomment-2691571198 Thanks @comphead

Re: [I] Split properties.rs into smaller modules [datafusion]

2025-02-28 Thread via GitHub
alamb closed issue #14913: Split properties.rs into smaller modules URL: https://github.com/apache/datafusion/issues/14913

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-28 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2691572889 My update here: 1. We fixed all the known bugs https://github.com/apache/datafusion/issues/14123#issuecomment-2689507558 2. I found another regression today that I think need

Re: [PR] Refactor SortPushdown using the standard top-down visitor and using `EquivalenceProperties` [datafusion]

2025-02-28 Thread via GitHub
alamb commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2691575366 > Of course, we can. My username `berkaysynnada` on discord, feel free to reach me. I reached out to find some time

Re: [I] Upgrade pyo3 and DataFusion dependencies [datafusion-ray]

2025-02-28 Thread via GitHub
andygrove closed issue #63: Upgrade pyo3 and DataFusion dependencies URL: https://github.com/apache/datafusion-ray/issues/63

Re: [PR] Fix failing extended `sqlite`test on main / update `datafusion-testing` pin [datafusion]

2025-02-28 Thread via GitHub
comphead merged PR #14940: URL: https://github.com/apache/datafusion/pull/14940

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.filter and ReadRel.best_effort_filter [datafusion]

2025-02-28 Thread via GitHub
vbarua commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1976069030 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_ref(

Re: [I] Support IntegralDivide function [datafusion-comet]

2025-02-28 Thread via GitHub
kazuyukitanimura closed issue #1422: Support IntegralDivide function URL: https://github.com/apache/datafusion-comet/issues/1422

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-28 Thread via GitHub
kazuyukitanimura merged PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-28 Thread via GitHub
kazuyukitanimura commented on PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#issuecomment-2691685680 Merged thanks @wForget

Re: [PR] test: enforce distribution is no longer inserting the coalesce [datafusion]

2025-02-28 Thread via GitHub
wiedld commented on code in PR #14949: URL: https://github.com/apache/datafusion/pull/14949#discussion_r1976064464 ## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ## @@ -3154,3 +3157,204 @@ fn optimize_away_unnecessary_repartition2() -> Result<()> {

Re: [I] ParallelizeSorts, a subrule of EnforceSorting optimizer, should not remove necessary coalesce. [datafusion]

2025-02-28 Thread via GitHub
alamb commented on issue #14691: URL: https://github.com/apache/datafusion/issues/14691#issuecomment-2691430295 I believe in https://github.com/apache/datafusion/pull/14919/files#r1975931108 @berkaysynnada is saying that the input plan is not valid: ``` "SortExec: expr=[a@0 ASC]

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.filter and ReadRel.best_effort_filter [datafusion]

2025-02-28 Thread via GitHub
jamxia155 commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1975909845 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_r

Re: [PR] Document SQL literal syntax and escaping [datafusion]

2025-02-28 Thread via GitHub
alamb commented on PR #14934: URL: https://github.com/apache/datafusion/pull/14934#issuecomment-2691416179 Thank you for the review @comphead

Re: [PR] chore: forbide `with_default_features` override existing information [datafusion]

2025-02-28 Thread via GitHub
alamb commented on code in PR #14935: URL: https://github.com/apache/datafusion/pull/14935#discussion_r1975987954 ## datafusion/core/src/execution/session_state.rs: ## @@ -1144,7 +1146,9 @@ impl SessionStateBuilder { mut self, expr_planners: Vec>, ) -> Sel

[PR] Alamb/revert cli update [datafusion]

2025-02-28 Thread via GitHub
alamb opened a new pull request, #14948: URL: https://github.com/apache/datafusion/pull/14948 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/14947 - Related to https://github.com/apache/datafusion/issues/14123 - Reopens https://git

[PR] Fix extended `sqlite` tests vai datafusion-testing pin [datafusion]

2025-02-28 Thread via GitHub
alamb opened a new pull request, #14940: URL: https://github.com/apache/datafusion/pull/14940 ## Which issue does this PR close? CI is failing on main: https://github.com/apache/datafusion/actions/runs/13591237968/job/37997646819 ## Rationale for this change The fail

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.filter and ReadRel.best_effort_filter [datafusion]

2025-02-28 Thread via GitHub
westonpace commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1975966228 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-28 Thread via GitHub
wiedld commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1975887998 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Result<()

[PR] test: enforce distribution is no longer inserting the coalesce [datafusion]

2025-02-28 Thread via GitHub
wiedld opened a new pull request, #14949: URL: https://github.com/apache/datafusion/pull/14949 In a previous reproducer, I showed how the enforce distribution was inserting the coalesce. See here: https://github.com/influxdata/arrow-datafusion/pull/58#discussion_r1976044998 On the

Re: [PR] test: enforce distribution is no longer inserting the coalesce [datafusion]

2025-02-28 Thread via GitHub
wiedld closed pull request #14949: test: enforce distribution is no longer inserting the coalesce URL: https://github.com/apache/datafusion/pull/14949

Re: [PR] Minor fixes to README [datafusion-ray]

2025-02-28 Thread via GitHub
andygrove merged PR #64: URL: https://github.com/apache/datafusion-ray/pull/64

Re: [I] ParallelizeSorts, a subrule of EnforceSorting optimizer, should not remove necessary coalesce. [datafusion]

2025-02-28 Thread via GitHub
wiedld closed issue #14691: ParallelizeSorts, a subrule of EnforceSorting optimizer, should not remove necessary coalesce. URL: https://github.com/apache/datafusion/issues/14691

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-28 Thread via GitHub
wiedld commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1976065081 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Result<()

[PR] Remove invalid bug reproducer. [datafusion]

2025-02-28 Thread via GitHub
wiedld opened a new pull request, #14950: URL: https://github.com/apache/datafusion/pull/14950 Revert "test(14691): demonstrate EnforceSorting can remove a needed coalesce (#14919)" This reverts commit 32224b48ca5f779ed2833f480f69af51c6637408.

Re: [PR] Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-02-28 Thread via GitHub
alamb commented on PR #14877: URL: https://github.com/apache/datafusion/pull/14877#issuecomment-2691547654 I believe I also found another related issue with explain plans - https://github.com/apache/datafusion/issues/14947

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-28 Thread via GitHub
jayzhan211 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2691801390 > I think this sounds like a good idea to me. I don't think we are actively trying to cause pain for downstream users, more like we struggle to find the time to properly thin

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-02-28 Thread via GitHub
vbarua commented on code in PR #13511: URL: https://github.com/apache/datafusion/pull/13511#discussion_r1976134261 ## datafusion/core/tests/dataframe/dataframe_functions.rs: ## @@ -360,14 +360,15 @@ async fn test_fn_approx_median() -> Result<()> { #[tokio::test] async fn tes

Re: [PR] perf: Reduce native shuffle memory overhead by 50% [datafusion-comet]

2025-02-28 Thread via GitHub
andygrove merged PR #1452: URL: https://github.com/apache/datafusion-comet/pull/1452

Re: [PR] chore: forbide `with_default_features` override existing information [datafusion]

2025-02-28 Thread via GitHub
irenjj commented on code in PR #14935: URL: https://github.com/apache/datafusion/pull/14935#discussion_r1976239466 ## datafusion/core/src/execution/session_state.rs: ## @@ -1081,6 +1081,8 @@ impl SessionStateBuilder { /// Create default builder with defaults for table_fac

Re: [PR] Feat: support array_compact function [datafusion-comet]

2025-02-28 Thread via GitHub
kazantsev-maksim commented on code in PR #1321: URL: https://github.com/apache/datafusion-comet/pull/1321#discussion_r1975770217 ## native/core/src/execution/planner.rs: ## @@ -830,6 +830,25 @@ impl PhysicalPlanner { )); Ok(array_has_any_expr)

[PR] chore(deps): bump the arrow-parquet group with 7 updates [datafusion]

2025-02-28 Thread via GitHub
dependabot[bot] opened a new pull request, #14930: URL: https://github.com/apache/datafusion/pull/14930 Bumps the arrow-parquet group with 7 updates: | Package | From | To | | --- | --- | --- | | [arrow](https://github.com/apache/arrow-rs) | `54.2.0` | `54.2.1` | | [arrow-buff

[PR] chore(deps): bump aws-config from 1.5.16 to 1.5.17 [datafusion]

2025-02-28 Thread via GitHub
dependabot[bot] opened a new pull request, #14931: URL: https://github.com/apache/datafusion/pull/14931 Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.5.16 to 1.5.17. Commits: see full diff in the compare view at https://github.com/smithy-lang/smithy-rs/commits

Re: [PR] DNM: test dpp support [datafusion-comet]

2025-02-28 Thread via GitHub
wForget closed pull request #1396: DNM: test dpp support URL: https://github.com/apache/datafusion-comet/pull/1396

Re: [PR] Fix the null handling for to_char function [datafusion]

2025-02-28 Thread via GitHub
jayzhan211 commented on code in PR #14908: URL: https://github.com/apache/datafusion/pull/14908#discussion_r1974559047 ## datafusion/functions/src/datetime/to_char.rs: ## @@ -663,4 +678,51 @@ mod tests { "Execution error: Format for `to_char` must be non-null Utf8,

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-28 Thread via GitHub
wForget commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1974971511 ## native/core/src/execution/planner.rs: ## @@ -922,13 +956,18 @@ impl PhysicalPlanner { Ok(DataType::Decimal128(_p2, _s2)), ) =>

Re: [PR] chore(deps): bump aws-config from 1.5.16 to 1.5.17 [datafusion]

2025-02-28 Thread via GitHub
berkaysynnada merged PR #14931: URL: https://github.com/apache/datafusion/pull/14931

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-28 Thread via GitHub
xudong963 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2690357837 > So I am hopeful we'll be able to close out the tickets tomorrow and maybe create a release branch over the weekend or Monday. > > FYI [@xudong963](https://github.com/x

Re: [PR] Prepare for 46.0.0 release: Version and Changelog [datafusion]

2025-02-28 Thread via GitHub
xudong963 commented on PR #14903: URL: https://github.com/apache/datafusion/pull/14903#issuecomment-2690360871 > Then we can do final testing on that branch What does the final testing include?

[I] Comet executor memory overriding to absurd numbers (unified mode) [datafusion-comet]

2025-02-28 Thread via GitHub
LukMRVC opened a new issue, #1460: URL: https://github.com/apache/datafusion-comet/issues/1460 ### Describe the bug When initializing a Spark session in unified mode, Comet overrides `spark.executor.memoryOverhead` to a value of over 100 GB. https://github.com/apache/d

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-28 Thread via GitHub
wForget commented on PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#issuecomment-2690250964 Unrelated failure: `org.apache.spark.sql.execution.ui.SQLAppStatusListenerMemoryLeakSuite` in `spark-sql-sql/core-1/ubuntu-24.04/spark-4.0.0-preview1/java-17` ```

[PR] Expose `build_row_filter` method [datafusion]

2025-02-28 Thread via GitHub
xudong963 opened a new pull request, #14933: URL: https://github.com/apache/datafusion/pull/14933 ## Which issue does this PR close? - Closes #. ## Rationale for this change The method is useful for users who build their own `ParquetExec`. ## What cha

Re: [PR] Prepare for 46.0.0 release: Version and Changelog [datafusion]

2025-02-28 Thread via GitHub
alamb commented on PR #14903: URL: https://github.com/apache/datafusion/pull/14903#issuecomment-2690457511 > > Then we can do final testing on that branch > > @alamb What does the final testing include? I don't think I have anything specific in mind -- maybe re-check that the d

Re: [PR] Add additional protobuf tests for plans that read parquet with projections [datafusion]

2025-02-28 Thread via GitHub
alamb commented on PR #14924: URL: https://github.com/apache/datafusion/pull/14924#issuecomment-2691030177 > thanks @alamb @blaginin and @mertak-synnada and @xudong963 did all the work!

Re: [PR] Set projection before configuring the source [datafusion]

2025-02-28 Thread via GitHub
blaginin commented on PR #14685: URL: https://github.com/apache/datafusion/pull/14685#issuecomment-2690999140 Thank you for the help! 🥹 I'll do follow up prs

Re: [PR] Use arrow IPC Stream format for spill files [datafusion]

2025-02-28 Thread via GitHub
comphead commented on PR #14868: URL: https://github.com/apache/datafusion/pull/14868#issuecomment-2691112331 Thanks @davidhewitt. I left the PR open for some time for other approvers since this is a first contribution. I do not see any objections and am planning to merge this PR, thanks again

Re: [PR] TESTING (NOT FOR MERGE) Test arrow/parquet 54.2.1 [datafusion]

2025-02-28 Thread via GitHub
alamb closed pull request #14915: TESTING (NOT FOR MERGE) Test arrow/parquet 54.2.1 URL: https://github.com/apache/datafusion/pull/14915

Re: [PR] refactor(properties): Split properties.rs into smaller modules [datafusion]

2025-02-28 Thread via GitHub
alamb merged PR #14925: URL: https://github.com/apache/datafusion/pull/14925

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-02-28 Thread via GitHub
alamb commented on code in PR #14918: URL: https://github.com/apache/datafusion/pull/14918#discussion_r1975866768 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1670,6 +1749,56 @@ mod tests { Ok(()) } +/// Check that the files listed by the table

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-02-28 Thread via GitHub
alamb commented on code in PR #14918: URL: https://github.com/apache/datafusion/pull/14918#discussion_r1975867484 ## datafusion/core/src/test/object_store.rs: ## @@ -61,3 +71,121 @@ pub fn local_unpartitioned_file(path: impl AsRef) -> ObjectMeta version: None, }

Re: [PR] perf: Reduce native shuffle memory overhead by 50% [datafusion-comet]

2025-02-28 Thread via GitHub
andygrove commented on code in PR #1452: URL: https://github.com/apache/datafusion-comet/pull/1452#discussion_r1975564762 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -829,54 +781,53 @@ impl PartitionBuffer { }); self.num_active_row

[I] Fix warnings in doc build [datafusion]

2025-02-28 Thread via GitHub
alamb opened a new issue, #14945: URL: https://github.com/apache/datafusion/issues/14945 ### Is your feature request related to a problem or challenge? If I go to ``` cd docs ./build.sh ``` I see several warnings: ``` updating environment: [new config] 60

Re: [I] TPCH queries 7,8,9, do not validate [datafusion-ray]

2025-02-28 Thread via GitHub
andygrove closed issue #65: TPCH queries 7,8,9, do not validate URL: https://github.com/apache/datafusion-ray/issues/65

Re: [PR] Change tpch validation to use `exec_sql_on_tables` [datafusion-ray]

2025-02-28 Thread via GitHub
andygrove merged PR #66: URL: https://github.com/apache/datafusion-ray/pull/66

Re: [I] `FileSource` and `DataSource` traits require deep copies [datafusion]

2025-02-28 Thread via GitHub
alamb commented on issue #14939: URL: https://github.com/apache/datafusion/issues/14939#issuecomment-2691065652 Maybe we can try a mutable API instead of a builder style API 🤔

Re: [I] Native shuffle double allocates memory [datafusion-comet]

2025-02-28 Thread via GitHub
andygrove closed issue #1448: Native shuffle double allocates memory URL: https://github.com/apache/datafusion-comet/issues/1448

Re: [I] Error projecting statistics in `DataSourceExec` [datafusion]

2025-02-28 Thread via GitHub
alamb closed issue #14905: Error projecting statistics in `DataSourceExec` URL: https://github.com/apache/datafusion/issues/14905

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-28 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2691106681 > If Expr::Wildcard no longer works, why is it still being kept around? I don't know -- maybe we should mark it deprecated > There is a concerning pattern of various

Re: [PR] Use arrow IPC Stream format for spill files [datafusion]

2025-02-28 Thread via GitHub
comphead merged PR #14868: URL: https://github.com/apache/datafusion/pull/14868

Re: [I] Epic: Statistics improvements [datafusion]

2025-02-28 Thread via GitHub
alamb commented on issue #8227: URL: https://github.com/apache/datafusion/issues/8227#issuecomment-2690840822 The process has started in - https://github.com/apache/datafusion/pull/14699

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-28 Thread via GitHub
Dandandan merged PR #14902: URL: https://github.com/apache/datafusion/pull/14902

[I] Should pruningpredicate coerce? [datafusion]

2025-02-28 Thread via GitHub
ion-elgreco opened a new issue, #14944: URL: https://github.com/apache/datafusion/issues/14944 ### Describe the bug Currently, when you pass a pruning predicate where the predicate has a different type than the targeted column, it will not prune it, even though in theory the value is cas

[I] NoSuchMethodError: PartitionedFileUtil$.splitFiles when running with Spark 3.5.5 [datafusion-comet]

2025-02-28 Thread via GitHub
andygrove opened a new issue, #1461: URL: https://github.com/apache/datafusion-comet/issues/1461 ### Describe the bug The signature of `PartitionedFileUtil$.splitFiles` changed in Spark 3.5.5 so Comet is not compatible. We need to add a shim to fix this. ### Steps to rep

Re: [PR] Change tpch validation to use `exec_sql_on_tables` [datafusion-ray]

2025-02-28 Thread via GitHub
andygrove commented on code in PR #66: URL: https://github.com/apache/datafusion-ray/pull/66#discussion_r1975841145 ## src/util.rs: ## @@ -397,6 +402,52 @@ fn print_node(plan: &Arc, indent: usize, output: &mut String) } } +async fn exec_sql(query: String, tables: Vec<(S

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-28 Thread via GitHub
wiedld commented on code in PR #14821: URL: https://github.com/apache/datafusion/pull/14821#discussion_r1975847459 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -2242,7 +2242,7 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Result<()>

Re: [PR] test: add more test cases to demonstrate handling of heterogeneous constants [datafusion]

2025-02-28 Thread via GitHub
wiedld closed pull request #14923: test: add more test cases to demonstrate handling of heterogeneous constants URL: https://github.com/apache/datafusion/pull/14923

Re: [PR] perf: Reduce native shuffle memory overhead by 50% [datafusion-comet]

2025-02-28 Thread via GitHub
andygrove commented on PR #1452: URL: https://github.com/apache/datafusion-comet/pull/1452#issuecomment-2690887992 @mbutrovich This PR is ready for review now

Re: [PR] perf: Reduce native shuffle memory overhead by 50% [datafusion-comet]

2025-02-28 Thread via GitHub
andygrove commented on code in PR #1452: URL: https://github.com/apache/datafusion-comet/pull/1452#discussion_r1975562414 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -793,10 +747,8 @@ impl PartitionBuffer { let mut repart_timer = metrics.repart_tim

Re: [PR] Set projection before configuring the source [datafusion]

2025-02-28 Thread via GitHub
alamb commented on code in PR #14685: URL: https://github.com/apache/datafusion/pull/14685#discussion_r1975526113 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -433,54 +494,13 @@ impl FileScanConfig { ); } -let proj_indices = if let Some
