[PR] chore(deps): bump clap from 4.5.35 to 4.5.36 [datafusion]

2025-04-18 Thread via GitHub
dependabot[bot] opened a new pull request, #15759: URL: https://github.com/apache/datafusion/pull/15759 Bumps [clap](https://github.com/clap-rs/clap) from 4.5.35 to 4.5.36. Release notes: sourced from clap's releases (https://github.com/clap-rs/clap/releases). v4.5.36 [4.5.

Re: [PR] chore(deps): bump clap from 4.5.35 to 4.5.36 [datafusion]

2025-04-18 Thread via GitHub
xudong963 merged PR #15759: URL: https://github.com/apache/datafusion/pull/15759 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [I] Job graph fails to display in UI [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #627: URL: https://github.com/apache/datafusion-ballista/issues/627#issuecomment-2814076120 The web UI is not part of core Ballista; closing this issue.

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-18 Thread via GitHub
xudong963 commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-2815275392 > > I don't find SQL in the file that has the following pattern > > They sometimes might have after an optimization rule that adds one extra projection? Not sure if that happe

[I] `Cargo bench --bench sql_planner` is failing [datafusion]

2025-04-18 Thread via GitHub
xudong963 opened a new issue, #15762: URL: https://github.com/apache/datafusion/issues/15762 ### Describe the bug ``` Benchmarking physical_plan_clickbench_q50: Warming up for 3. s thread 'main' panicked at datafusion/core/benches/sql_planner.rs:60:14: called `Result::unwr

[I] Memory limited nest loop join [datafusion]

2025-04-18 Thread via GitHub
2010YOUY01 opened a new issue, #15760: URL: https://github.com/apache/datafusion/issues/15760 ### Is your feature request related to a problem or challenge? The common NLJ implementation consumes constant memory. However, DataFusion's implementation is optimized for execution time, wh
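As a generic illustration of the constant-memory nested loop join mentioned in the issue above (a minimal sketch, not DataFusion's implementation), the following streams one row from each side at a time and re-scans the inner side for every outer row. The names `nested_loop_join`, `scan_inner`, and `predicate` are hypothetical and chosen only for this example.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

L = TypeVar("L")
R = TypeVar("R")

def nested_loop_join(
    outer: Iterable[L],
    scan_inner: Callable[[], Iterable[R]],
    predicate: Callable[[L, R], bool],
) -> Iterator[Tuple[L, R]]:
    """Yield matching (left, right) pairs without buffering either input.

    `scan_inner` is called once per outer row and must return a fresh
    iterator over the inner side (for example, by re-reading a file).
    Only one row from each side is held at a time, so memory use does not
    grow with input size; the cost is re-scanning the inner side.
    """
    for left in outer:
        for right in scan_inner():
            if predicate(left, right):
                yield left, right

# Usage: an inequality join, which a hash join cannot evaluate directly.
pairs = list(
    nested_loop_join(iter([1, 2, 3]), lambda: iter([2, 3, 4]), lambda l, r: l < r)
)
```

The trade-off in this sketch is time for memory: buffering one side would avoid the repeated inner scans but would no longer be bounded.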

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-18 Thread via GitHub
xudong963 commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-281524 Thank you for your review, let's go and continue to optimize.

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-18 Thread via GitHub
xudong963 commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-2815109693 > > > sql planner benchmarks > > > > > > Do we have some existing sql planner benchmarks? > > Yes, there is a sql_planner bench. https://github.com/apache/datafusi

[PR] docs: Update compatibility docs for new native scans [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove opened a new pull request, #1657: URL: https://github.com/apache/datafusion-comet/pull/1657 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] docs: Update compatibility docs for new native scans [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #1657: URL: https://github.com/apache/datafusion-comet/pull/1657#discussion_r2050730872 ## docs/source/user-guide/compatibility.md: ## @@ -93,104 +97,104 @@ Cast operations in Comet fall into three levels of support: - **Compatible**: The re

Re: [PR] docs: Update compatibility docs for new native scans [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #1657: URL: https://github.com/apache/datafusion-comet/pull/1657#discussion_r2050731258 ## docs/source/user-guide/compatibility.md: ## @@ -40,19 +40,23 @@ Comet currently has three distinct implementations of the Parquet scan operator. | `nati

Re: [PR] docs: Update compatibility docs for new native scans [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #1657: URL: https://github.com/apache/datafusion-comet/pull/1657#discussion_r2050729883 ## docs/source/user-guide/compatibility.md: ## @@ -40,19 +40,23 @@ Comet currently has three distinct implementations of the Parquet scan operator. | `nati

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#issuecomment-2813894419 I'm seeing a number of failures like this: ``` 2025-04-17T19:12:54.7914570Z - columnar shuffle on struct including nulls *** FAILED *** (352 milliseconds) 2025-04-

Re: [PR] docs: Update compatibility docs for new native scans [datafusion-comet]

2025-04-18 Thread via GitHub
mbutrovich commented on code in PR #1657: URL: https://github.com/apache/datafusion-comet/pull/1657#discussion_r2050734119 ## docs/source/user-guide/compatibility.md: ## @@ -40,19 +40,23 @@ Comet currently has three distinct implementations of the Parquet scan operator. | `nat

[PR] feat: update datafusion dependency 47 [datafusion-python]

2025-04-18 Thread via GitHub
timsaucer opened a new pull request, #1107: URL: https://github.com/apache/datafusion-python/pull/1107 # Which issue does this PR close? None # Rationale for this change DataFusion 47 is updating upstream. # What changes are included in this PR? Mostly smal

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-18 Thread via GitHub
etseidl commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2815595171 BTW, this issue is somewhat tied up with https://github.com/apache/parquet-format/pull/221. Take for example ```sql > select * from 'parquet-testing/data/float16_nonzeros_a

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-04-18 Thread via GitHub
zhuqi-lucas commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2050341651 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -662,53 +665,152 @@ impl ExternalSorter { let elapsed_compute = metrics.elapsed_compute().clone()

Re: [I] OOM when nested join + limit [datafusion]

2025-04-18 Thread via GitHub
2010YOUY01 commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2814936184 > [@alamb](https://github.com/alamb) > > I think a SortMergeJoin derivative that handles inequality could be helpful here but there are comments that SortMergeJoin is n

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-18 Thread via GitHub
alamb commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-2814089929 🤖: Benchmark completed Details ``` Comparing HEAD and speed_up_optimize_projection Benchmark clickbench_extended.json -

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-18 Thread via GitHub
xudong963 commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-2814864380 > sql planner benchmarks Do we have some existing sql planner benchmarks?

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2050293912 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -662,53 +665,152 @@ impl ExternalSorter { let elapsed_compute = metrics.elapsed_compute().clone();

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2050295792 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -662,53 +665,152 @@ impl ExternalSorter { let elapsed_compute = metrics.elapsed_compute().clone();

Re: [I] Issue with partitioned `ListingTable` [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #1239: Issue with partitioned `ListingTable` URL: https://github.com/apache/datafusion-ballista/issues/1239

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2050297829 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -662,53 +665,152 @@ impl ExternalSorter { let elapsed_compute = metrics.elapsed_compute().clone();

Re: [I] OOM when nested join + limit [datafusion]

2025-04-18 Thread via GitHub
2010YOUY01 commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2815082026 I think https://github.com/apache/datafusion/issues/15760 is needed to fix the issue.

Re: [I] [Epic] A collection of dynamic filtering related items [datafusion]

2025-04-18 Thread via GitHub
adriangb commented on issue #15512: URL: https://github.com/apache/datafusion/issues/15512#issuecomment-2814538954 @acking-you have you seen https://github.com/apache/datafusion/pull/15301?

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-04-18 Thread via GitHub
zhuqi-lucas commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2050395194 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -662,53 +665,152 @@ impl ExternalSorter { let elapsed_compute = metrics.elapsed_compute().clone()

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-04-18 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2815067483 > Thanks @zhuqi-lucas for sticking to this issue! > > I think we're close to have a PR that can be merged that improved sort performance and gets some good insights for whe

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-18 Thread via GitHub
xudong963 commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-2815281930 > > > I don't find SQL in the file that has the following pattern > > > > > > They sometimes might have after an optimization rule that adds one extra projection? Not sure

[I] Improve Display format of `BoundedWindowAggExec` [datafusion]

2025-04-18 Thread via GitHub
jayzhan211 opened a new issue, #15758: URL: https://github.com/apache/datafusion/issues/15758 ### Is your feature request related to a problem or challenge? We can see Debug format in this explain statement, turn it to Display format would be nice for readability ``` Ok(Fiel

Re: [PR] fix: handle missing field correctly in native_iceberg_compat [datafusion-comet]

2025-04-18 Thread via GitHub
parthchandra merged PR #1656: URL: https://github.com/apache/datafusion-comet/pull/1656

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
iffyio commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2050229485 ## tests/sqlparser_mssql.rs: ## @@ -2053,3 +2054,171 @@ fn parse_drop_trigger() { } ); } + +#[test] +fn parse_mssql_go_keyword() { +l

Re: [I] Add `try_new` for `LogicalPlan::Join` `Join` and others [datafusion]

2025-04-18 Thread via GitHub
kumarlokesh commented on issue #14363: URL: https://github.com/apache/datafusion/issues/14363#issuecomment-2814836741 @phisn @alamb made an attempt to address this requirement here: https://github.com/apache/datafusion/pull/15757. Please have a look.

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-18 Thread via GitHub
xudong963 commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-2814871592 > > sql planner benchmarks > > Do we have some existing sql planner benchmarks? Also, there is a real production case: ![image](https://github.com/user-attachments/

[PR] Speed up `optimize_projection` by improving `is_projection_unnecessary` [datafusion]

2025-04-18 Thread via GitHub
xudong963 opened a new pull request, #15761: URL: https://github.com/apache/datafusion/pull/15761 ## Which issue does this PR close? - Closes #. ## Rationale for this change In our prod, we found that the `optimize_projection` is slow when the number of c

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2815260149 I think this is a nice experiment. That said, I think we can better try changing the build side of the join to use `Vec`. I remember we (I) changed it to concatenate all buil

Re: [PR] Add DataFusion 47.0.0 Upgrade Guide [datafusion]

2025-04-18 Thread via GitHub
xudong963 merged PR #15749: URL: https://github.com/apache/datafusion/pull/15749

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-2815265485 > I don't find SQL in the file that has the following pattern They sometimes might have after an optimization rule that adds one extra projection? Not sure if that happens in

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-18 Thread via GitHub
timsaucer commented on code in PR #15646: URL: https://github.com/apache/datafusion/pull/15646#discussion_r2050653344 ## datafusion/core/tests/user_defined/user_defined_scalar_functions.rs: ## @@ -1367,3 +1370,346 @@ async fn register_alltypes_parquet(ctx: &SessionContext) -> R

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-18 Thread via GitHub
timsaucer commented on code in PR #15646: URL: https://github.com/apache/datafusion/pull/15646#discussion_r2050650946 ## datafusion/expr/src/udf.rs: ## @@ -293,9 +293,11 @@ where /// Arguments passed to [`ScalarUDFImpl::invoke_with_args`] when invoking a /// scalar function.

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-18 Thread via GitHub
timsaucer commented on code in PR #15646: URL: https://github.com/apache/datafusion/pull/15646#discussion_r2050662012 ## datafusion/core/tests/user_defined/user_defined_scalar_functions.rs: ## @@ -1367,3 +1370,346 @@ async fn register_alltypes_parquet(ctx: &SessionContext) -> R

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-18 Thread via GitHub
timsaucer commented on code in PR #15646: URL: https://github.com/apache/datafusion/pull/15646#discussion_r2050655982 ## datafusion/core/tests/user_defined/user_defined_scalar_functions.rs: ## @@ -1367,3 +1370,346 @@ async fn register_alltypes_parquet(ctx: &SessionContext) -> R

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-2815034641 > > sql planner benchmarks > > Do we have some existing sql planner benchmarks? Yes, there is a sql_planner bench. https://github.com/apache/datafusion/blob/main/d

Re: [PR] feat: make task distribution policies pluggable [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm merged PR #1243: URL: https://github.com/apache/datafusion-ballista/pull/1243

Re: [PR] fix: respect `ignoreNulls` flag in `first_value` and `last_value` [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on PR #1626: URL: https://github.com/apache/datafusion-comet/pull/1626#issuecomment-2815592279 @anuragmantri fyi, I already had this PR open from a while back. It does not resolve the issue but seems worth merging while we work on figuring out a test approach.

Re: [PR] docs: Update compatibility docs for new native scans [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #1657: URL: https://github.com/apache/datafusion-comet/pull/1657#discussion_r2050747143 ## docs/source/user-guide/compatibility.md: ## @@ -40,19 +40,23 @@ Comet currently has three distinct implementations of the Parquet scan operator. | `nati

Re: [PR] Add Configurable HTML Table Formatter for DataFusion DataFrames in Python [datafusion-python]

2025-04-18 Thread via GitHub
timsaucer commented on PR #1100: URL: https://github.com/apache/datafusion-python/pull/1100#issuecomment-2815615744 I'm sorry I haven't had time to look through it until now. **This looks incredible!** I want to take some time to play with it myself before I hit approve. Thank you s

Re: [I] Publish official Docker images as part of release process [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #455: URL: https://github.com/apache/datafusion-ballista/issues/455#issuecomment-2815668613 I believe this can be closed; we do publish images to GitHub Packages and Docker Hub.

Re: [I] Publish official Docker images as part of release process [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #455: Publish official Docker images as part of release process URL: https://github.com/apache/datafusion-ballista/issues/455

Re: [PR] chore: update python deps to 45 [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm merged PR #1240: URL: https://github.com/apache/datafusion-ballista/pull/1240

Re: [PR] Add S3 object store support to executor and scheduler [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm merged PR #1230: URL: https://github.com/apache/datafusion-ballista/pull/1230

[PR] chore(deps): bump crossbeam-channel from 0.5.14 to 0.5.15 in /python [datafusion-ballista]

2025-04-18 Thread via GitHub
dependabot[bot] opened a new pull request, #1244: URL: https://github.com/apache/datafusion-ballista/pull/1244 Bumps [crossbeam-channel](https://github.com/crossbeam-rs/crossbeam) from 0.5.14 to 0.5.15. Release notes: sourced from https://github.com/crossbeam-rs/crossbeam/releases

Re: [I] Add support for S3 Object Store to scheduler/executor binaries [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #1205: Add support for S3 Object Store to scheduler/executor binaries URL: https://github.com/apache/datafusion-ballista/issues/1205

Re: [PR] Testing DF 47 before release cut [datafusion-python]

2025-04-18 Thread via GitHub
timsaucer closed pull request #1104: Testing DF 47 before release cut URL: https://github.com/apache/datafusion-python/pull/1104

Re: [PR] chore: reduce log levels for few log statements [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm merged PR #1237: URL: https://github.com/apache/datafusion-ballista/pull/1237

Re: [I] pluggable task distribution policies [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #1238: pluggable task distribution policies URL: https://github.com/apache/datafusion-ballista/issues/1238

Re: [I] Ballista: Implement configuration mechanism [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #19: Ballista: Implement configuration mechanism URL: https://github.com/apache/datafusion-ballista/issues/19

Re: [PR] Show current SQL recursion limit in RecursionLimitExceeded error message [datafusion]

2025-04-18 Thread via GitHub
kumarlokesh commented on PR #15644: URL: https://github.com/apache/datafusion/pull/15644#issuecomment-2815675711 @comphead updated the PR after resolving merge conflicts.

Re: [PR] docs: Update compatibility docs for new native scans [datafusion-comet]

2025-04-18 Thread via GitHub
comphead commented on code in PR #1657: URL: https://github.com/apache/datafusion-comet/pull/1657#discussion_r2050792116 ## docs/source/user-guide/compatibility.md: ## @@ -34,25 +34,36 @@ This guide offers information about areas of functionality where there are known Comet cu

Re: [I] Expose the lower level scheduler API [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #888: URL: https://github.com/apache/datafusion-ballista/issues/888#issuecomment-2815650611 Closing this, as #1243 exposed a way to extend task scheduling. If more is needed, happy to discuss and open a new issue.

[I] [EPIC] Ballista 2025/H2 Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm opened a new issue, #1245: URL: https://github.com/apache/datafusion-ballista/issues/1245 Following the completion of #1068, it's time to propose the next steps for Ballista. In the short term, I would like to focus on the following areas: - **Improving test coverag

Re: [I] [EPIC]: Ballista Reloaded, Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #1068: URL: https://github.com/apache/datafusion-ballista/issues/1068#issuecomment-2815711646 After epic 6 months, I'm closing this issue. I would propose to have further discussion about ballista in #1245 Thank you everyone for your help, support and c

Re: [I] [EPIC]: Ballista Reloaded, Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #1068: [EPIC]: Ballista Reloaded, Roadmap Proposal URL: https://github.com/apache/datafusion-ballista/issues/1068

[PR] predicate pruning: support dictionaries [datafusion]

2025-04-18 Thread via GitHub
adriangb opened a new pull request, #15764: URL: https://github.com/apache/datafusion/pull/15764 Adds support for dictionaries to pruning predicate. I encountered this when feeding in a stats schema with dictionary encoded columns. I see no reason we can't support this.

Re: [I] Ballista: Implement configuration mechanism [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #19: URL: https://github.com/apache/datafusion-ballista/issues/19#issuecomment-2815671096 I believe this has been done in #1099

Re: [PR] chore(deps): bump crossbeam-channel from 0.5.14 to 0.5.15 in /python [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm merged PR #1244: URL: https://github.com/apache/datafusion-ballista/pull/1244

Re: [I] etcdserver: request is too large [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #416: URL: https://github.com/apache/datafusion-ballista/issues/416#issuecomment-2814090040 etcd has been removed; closing this issue.

Re: [I] Update datafusion <> homebrew instructions [datafusion]

2025-04-18 Thread via GitHub
kevinjqliu commented on issue #15751: URL: https://github.com/apache/datafusion/issues/15751#issuecomment-2813294681 cc @alamb I found this while looking into how datafusion was added to homebrew, as per https://github.com/clflushopt/tpchgen-rs/pull/121

Re: [I] Improve Display format of `BoundedWindowAggExec` [datafusion]

2025-04-18 Thread via GitHub
Standing-Man commented on issue #15758: URL: https://github.com/apache/datafusion/issues/15758#issuecomment-2815019499 take

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-04-18 Thread via GitHub
zhuqi-lucas commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2050384125 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -662,53 +665,152 @@ impl ExternalSorter { let elapsed_compute = metrics.elapsed_compute().clone()

[I] Parquet: coerce_int96 does not work for int96 in nested types with repeated names [datafusion]

2025-04-18 Thread via GitHub
mbutrovich opened a new issue, #15763: URL: https://github.com/apache/datafusion/issues/15763 ### Describe the bug The logic that coerces timestamps to a different resolution iterates through fields and uses their key in the Parquet schema as a key to match against the Arrow schema.
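The collision described in the report above can be sketched as follows. This is a minimal illustration, not DataFusion's actual code: plain Python dictionaries stand in for the Parquet and Arrow schemas, and the names `leaf_names`, `leaf_paths`, and the sample schema are hypothetical. It shows why a lookup keyed on the bare field name becomes ambiguous when nested types repeat a name, while a lookup keyed on the full path does not.

```python
# Hypothetical nested schema: two struct-like fields that both contain "ts".
# Leaf values are physical type names; dicts stand in for nested types.
schema = {
    "a": {"ts": "INT96"},
    "b": {"ts": "INT64"},
    "id": "INT32",
}

def leaf_names(node):
    """Yield (bare_name, type) for every leaf, dropping the parent path."""
    for name, child in node.items():
        if isinstance(child, dict):
            yield from leaf_names(child)
        else:
            yield name, child

def leaf_paths(node, prefix=()):
    """Yield (full_path, type) for every leaf, keeping the parent path."""
    for name, child in node.items():
        path = prefix + (name,)
        if isinstance(child, dict):
            yield from leaf_paths(child, path)
        else:
            yield path, child

by_name = dict(leaf_names(schema))  # the second "ts" overwrites the first
by_path = dict(leaf_paths(schema))  # every leaf keeps its own key

# Which fields would be selected for INT96 coercion under each keying scheme?
int96_by_name = [key for key, typ in by_name.items() if typ == "INT96"]
int96_by_path = [key for key, typ in by_path.items() if typ == "INT96"]
```

In this toy schema the name-keyed lookup loses the INT96 field entirely, while the path-keyed lookup still finds it at `("a", "ts")`.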

Re: [PR] Add S3 object store support to executor and scheduler [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on PR #1230: URL: https://github.com/apache/datafusion-ballista/pull/1230#issuecomment-2815461468 Thanks @joaoferrao, I agree with you. As a follow-up, I will try to enable S3 in the default context once I merge a few other PRs. I will look into improving the docs as well.

Re: [PR] Return field instead of datatype for `return_type_from_args` [datafusion]

2025-04-18 Thread via GitHub
timsaucer closed pull request #15728: Return field instead of datatype for `return_type_from_args` URL: https://github.com/apache/datafusion/pull/15728

Re: [I] Memory limited nest loop join [datafusion]

2025-04-18 Thread via GitHub
shruti2522 commented on issue #15760: URL: https://github.com/apache/datafusion/issues/15760#issuecomment-2815464344 take

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-18 Thread via GitHub
timsaucer commented on PR #15646: URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2815465523 I've been working through switching over to returning field instead of data type. This had many touch points. The PR has grown very large, but a significant portion of that is upda

[PR] Add try_new for LogicalPlan::Join [datafusion]

2025-04-18 Thread via GitHub
kumarlokesh opened a new pull request, #15757: URL: https://github.com/apache/datafusion/pull/15757 ## Which issue does this PR close? - Closes #14363. ## Rationale for this change ## What changes are included in this PR? This PR adds a new `Joi

Re: [PR] Add support for `PRINT` statement for SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
iffyio merged PR #1811: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1811

[PR] add support for XMLTABLE(...) [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
lovasoa opened a new pull request, #1817: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1817 adds support for xmltable(...) see https://www.postgresql.org/docs/15/functions-xml.html#FUNCTIONS-XML-PROCESSING fixes https://github.com/apache/datafusion-sqlparser-rs/i

Re: [I] Filter multiple columns from TopK using Lexicographical ordering [datafusion]

2025-04-18 Thread via GitHub
Standing-Man commented on issue #15698: URL: https://github.com/apache/datafusion/issues/15698#issuecomment-2816406328 take

Re: [I] Expose the lower level scheduler API [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #888: Expose the lower level scheduler API URL: https://github.com/apache/datafusion-ballista/issues/888

Re: [PR] User defined window functions blog post [datafusion-site]

2025-04-18 Thread via GitHub
Adez017 commented on PR #66: URL: https://github.com/apache/datafusion-site/pull/66#issuecomment-2815810736 > Thanks @Adez017 ! Let's give it another day or two for any remaining comments and then I'll plan to publish it I think we should move forward now.

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2050843298 ## src/ast/mod.rs: ## @@ -4054,6 +4054,12 @@ pub enum Statement { arguments: Vec, options: Vec, }, +/// Go (MSSQL) +//

Re: [I] [BUG] Error when adding Date32 and Int64 [datafusion]

2025-04-18 Thread via GitHub
qstommyshu commented on issue #12342: URL: https://github.com/apache/datafusion/issues/12342#issuecomment-2812968577 Oh I see, thanks @Omega359 for sending the doc. Hope you don't mind me asking another question. I'm still wondering why we want DF to support operations like `select t

[PR] minor: change log level for object store creation [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm opened a new pull request, #1247: URL: https://github.com/apache/datafusion-ballista/pull/1247 # Which issue does this PR close? Closes #. # Rationale for this change object store creation will print configuration with secret key at INFO log level, w

Re: [I] [EPIC] Ballista 2025/H2 Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
andygrove commented on issue #1245: URL: https://github.com/apache/datafusion-ballista/issues/1245#issuecomment-2816157686 There has been a lot of progress with shuffle performance in Comet that Ballista could benefit from.

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2051099886 ## datafusion/spark/src/function/math/expm1.rs: ## @@ -0,0 +1,169 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [I] [EPIC] Ballista 2025/H2 Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #1245: URL: https://github.com/apache/datafusion-ballista/issues/1245#issuecomment-2816167845 I would take shuffle related task with highest priority @andygrove was thinking of #320 and few others related to compression, schema serialization and so on, but

Re: [I] [EPIC] Ballista 2025/H2 Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
andygrove commented on issue #1245: URL: https://github.com/apache/datafusion-ballista/issues/1245#issuecomment-2816170438 There is work in progress to add a `datafusion-spark` crate in the core DataFusion repo. See https://github.com/apache/datafusion/issues/5600 and https://github.com/ap

Re: [PR] User `interleave` in hash repartitioning [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on code in PR #15768: URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051062202 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -298,25 +299,15 @@ impl BatchPartitioner { .into_iter()

Re: [I] [EPIC] Ballista 2025/H2 Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #1245: URL: https://github.com/apache/datafusion-ballista/issues/1245#issuecomment-2816176772 Would be happy to help. Will have a look at Comet.

Re: [PR] Optimize TopK with threshold filter ~1.4x speedup [datafusion]

2025-04-18 Thread via GitHub
adriangb commented on PR #15697: URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2816181258 FYI @Dandandan although very rough I put up a draft of filter pushdown in https://github.com/apache/datafusion/pull/15770. The interaction with this PR is something to think a

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-18 Thread via GitHub
jayzhan211 commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2815340158 `╦` should be UTF-8. What error did you get?

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#issuecomment-2815370919 @parthchandra @mbutrovich This PR is now ready for review.

Re: [PR] Support `Accumulator` for avg duration [datafusion]

2025-04-18 Thread via GitHub
shruti2522 commented on code in PR #15468: URL: https://github.com/apache/datafusion/pull/15468#discussion_r2050599634 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -4969,6 +4969,25 @@ select count(distinct column1), count(distinct column2) from dict_test group by

Re: [I] [Feature] Add support iceberg table [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #890: URL: https://github.com/apache/datafusion-ballista/issues/890#issuecomment-2815561069 There is an ongoing discussion about support for table formats in #1241, if anybody is interested.

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-18 Thread via GitHub
etseidl commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2815571292 More background in #3377 and #3442. It seems like additional data types were planned, but abandoned for some reason. @alamb do you think it would be safe to replace the lo

Re: [PR] Improve `simplify_expressions` rule [datafusion]

2025-04-18 Thread via GitHub
xudong963 commented on code in PR #15735: URL: https://github.com/apache/datafusion/pull/15735#discussion_r2048178156 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -188,7 +188,7 @@ impl ExprSimplifier { /// assert_eq!(expr, b_lt_2); /// ```

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-18 Thread via GitHub
berkaysynnada commented on code in PR #15566: URL: https://github.com/apache/datafusion/pull/15566#discussion_r2049226852 ## datafusion/physical-optimizer/src/push_down_filter.rs: ## @@ -0,0 +1,535 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more con

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-04-18 Thread via GitHub
clflushopt commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2811379764 In order to try and make progress on this, I decided to go with having a single function that builds all tables for a single scale factor similar to how DuckDB does it. My re
