Re: [PR] Minor: use FileScanConfig builder API in some tests [datafusion]

2025-03-03 Thread via GitHub
alamb merged PR #14938: URL: https://github.com/apache/datafusion/pull/14938 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Minor: use FileScanConfig builder API in some tests [datafusion]

2025-03-03 Thread via GitHub
alamb commented on PR #14938: URL: https://github.com/apache/datafusion/pull/14938#issuecomment-2694712233 Thank you for the review @berkaysynnada

Re: [PR] Minor: improve documentation of `AggregateMode` [datafusion]

2025-03-03 Thread via GitHub
alamb commented on PR #14946: URL: https://github.com/apache/datafusion/pull/14946#issuecomment-2694712698 Thank you for the review @2010YOUY01

Re: [PR] Minor: improve documentation of `AggregateMode` [datafusion]

2025-03-03 Thread via GitHub
alamb merged PR #14946: URL: https://github.com/apache/datafusion/pull/14946

Re: [PR] feat: instrument spawned tasks with current tracing span when `tracing` feature is enabled [datafusion]

2025-03-03 Thread via GitHub
geoffreyclaude commented on PR #14547: URL: https://github.com/apache/datafusion/pull/14547#issuecomment-2694747715 > Thank you for working on this @geoffreyclaude -- I am sorry for the delay in responding. > > Primarily my concern about this PR is that it adds new more dependencies

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-03-03 Thread via GitHub
alamb merged PR #14918: URL: https://github.com/apache/datafusion/pull/14918

[I] error: toolchain '1.85.0-x86_64-unknown-linux-gnu' is not installed on CI / verification script [datafusion]

2025-03-03 Thread via GitHub
alamb opened a new issue, #14982: URL: https://github.com/apache/datafusion/issues/14982 ### Describe the bug https://github.com/apache/datafusion/actions/runs/13634013005/job/38108199083 ``` Run cargo check --profile ci --all-targets error: toolchain '1.85.0-x86_64-unknow

Re: [PR] feat: Implementation of udf and udaf decorator [datafusion-python]

2025-03-03 Thread via GitHub
CrystalZhou0529 commented on PR #1040: URL: https://github.com/apache/datafusion-python/pull/1040#issuecomment-2694819663 Thanks for your suggestion! I totally agree that `@udf` is a better name. I'll experiment with it and provide an update soon!

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2694820798 > [@alamb](https://github.com/alamb) My pleasure > > I sent the release email to [d...@datafusion.apache.org](mailto:d...@datafusion.apache.org), did you see it? Yup!

Re: [I] error: toolchain '1.85.0-x86_64-unknown-linux-gnu' is not installed on CI / verification script [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14982: URL: https://github.com/apache/datafusion/issues/14982#issuecomment-2694821546 Looks like rustup may have just released a new version: 🤔 https://github.com/rust-lang/rustup/blob/master/CHANGELOG.md#1280---2025-03-04

Re: [PR] feat: Add div operator for fuzz testing and update expression doc [datafusion-comet]

2025-03-03 Thread via GitHub
andygrove merged PR #1464: URL: https://github.com/apache/datafusion-comet/pull/1464

Re: [PR] chore(deps): bump thiserror from 2.0.11 to 2.0.12 [datafusion]

2025-03-03 Thread via GitHub
comphead merged PR #14971: URL: https://github.com/apache/datafusion/pull/14971

[PR] Alamb/fix verification [datafusion]

2025-03-03 Thread via GitHub
alamb opened a new pull request, #14983: URL: https://github.com/apache/datafusion/pull/14983 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/14982 ## Rationale for this change I am not sure why but the verification script started f

Re: [PR] Ci fixes [datafusion-ray]

2025-03-03 Thread via GitHub
robtandy closed pull request #73: Ci fixes URL: https://github.com/apache/datafusion-ray/pull/73

Re: [PR] chore: Upgrade `rand` crate and some other minor crates [datafusion]

2025-03-03 Thread via GitHub
comphead commented on PR #14967: URL: https://github.com/apache/datafusion/pull/14967#issuecomment-2694899892 Looks like some tests relying on random data generation may fail if the `rand` version changed

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2694932150 I found an issue during validation: - https://github.com/apache/datafusion/issues/14982

[I] Fix Star and Fork buttons on docs index page [datafusion]

2025-03-03 Thread via GitHub
amoeba opened a new issue, #14984: URL: https://github.com/apache/datafusion/issues/14984 ### Describe the bug The Star and Fork buttons aren't rendering correctly on https://datafusion.apache.org. ### To Reproduce Visit https://datafusion.apache.org which looks like thi

Re: [PR] Workaround verification script error for branch-46 [datafusion]

2025-03-03 Thread via GitHub
andygrove commented on PR #14983: URL: https://github.com/apache/datafusion/pull/14983#issuecomment-2694972949 > I run into the same issue when using 46.0.0-rc1 in Comet, and this PR will not help with that. I may need to update Comet's toolchain file to match. I will try that next.

Re: [PR] _repr_ and _html_repr_ show '... and additional rows' message [datafusion-python]

2025-03-03 Thread via GitHub
Spaarsh commented on PR #1041: URL: https://github.com/apache/datafusion-python/pull/1041#issuecomment-2694647145 This is the new output: ## For ```_repr_``` ``` +-+-+ | letters | numbers | +-+-+ | A | 1 | | B | 2 |

Re: [I] Slow Physical Plan Creation for Remote Parquet Files [datafusion]

2025-03-03 Thread via GitHub
alamb closed issue #14916: Slow Physical Plan Creation for Remote Parquet Files URL: https://github.com/apache/datafusion/issues/14916

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-03-03 Thread via GitHub
alamb commented on PR #14918: URL: https://github.com/apache/datafusion/pull/14918#issuecomment-2694766133 I still could not reproduce any improvement with this PR, FWIW. I still think it is a good change so I merged it in, but it might be cool to find some benchmark results that showed the

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-03-03 Thread via GitHub
alamb commented on PR #14918: URL: https://github.com/apache/datafusion/pull/14918#issuecomment-269472 Thanks again @geoffreyclaude

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-03-03 Thread via GitHub
geoffreyclaude commented on PR #14918: URL: https://github.com/apache/datafusion/pull/14918#issuecomment-2694756767 > I tried to verify these changes but I couldn't figure out how to create an external table with explicitly listing the names via SQL. > > For posterity here is what I t

Re: [PR] chore(deps): bump async-trait from 0.1.86 to 0.1.87 [datafusion]

2025-03-03 Thread via GitHub
comphead merged PR #14973: URL: https://github.com/apache/datafusion/pull/14973

Re: [PR] fix: Executor memory overhead overriding [datafusion-comet]

2025-03-03 Thread via GitHub
LukMRVC commented on PR #1462: URL: https://github.com/apache/datafusion-comet/pull/1462#issuecomment-269486 Okay, @wForget I reverted some of my changes back to align with your proposal of overriding executor memory.

Re: [PR] chore(deps): bump pyo3 from 0.23.4 to 0.23.5 [datafusion]

2025-03-03 Thread via GitHub
comphead merged PR #14972: URL: https://github.com/apache/datafusion/pull/14972

Re: [PR] feat: Add one config to limit max disk usage for spilling queries [datafusion]

2025-03-03 Thread via GitHub
comphead commented on PR #14975: URL: https://github.com/apache/datafusion/pull/14975#issuecomment-2694847925 Nice PR thanks @2010YOUY01 I'm planning to review it today

Re: [PR] Improve `SessionStateBuilder::new` documentation [datafusion]

2025-03-03 Thread via GitHub
comphead merged PR #14980: URL: https://github.com/apache/datafusion/pull/14980

Re: [I] Reduce spilling overhead in Comet shuffle [datafusion-comet]

2025-03-03 Thread via GitHub
andygrove commented on issue #1436: URL: https://github.com/apache/datafusion-comet/issues/1436#issuecomment-2694865235 Fixed by https://github.com/apache/datafusion-comet/pull/1440 and https://github.com/apache/datafusion-comet/pull/1452

Re: [I] Reduce spilling overhead in Comet shuffle [datafusion-comet]

2025-03-03 Thread via GitHub
andygrove closed issue #1436: Reduce spilling overhead in Comet shuffle URL: https://github.com/apache/datafusion-comet/issues/1436

[PR] Ci fixes [datafusion-ray]

2025-03-03 Thread via GitHub
robtandy opened a new pull request, #73: URL: https://github.com/apache/datafusion-ray/pull/73 (no comment)

Re: [PR] Workaround verification script error for branch-46 [datafusion]

2025-03-03 Thread via GitHub
andygrove commented on PR #14983: URL: https://github.com/apache/datafusion/pull/14983#issuecomment-2694969618 I run into the same issue when using 46.0.0-rc1 in Comet, and this PR will not help with that.

Re: [PR] Workaround verification script error for branch-46 [datafusion]

2025-03-03 Thread via GitHub
alamb commented on PR #14983: URL: https://github.com/apache/datafusion/pull/14983#issuecomment-2695021703 > > I run into the same issue when using 46.0.0-rc1 in Comet, and this PR will not help with that. > > I may need to update Comet's toolchain file to match. I will try that next.

[PR] build: Use stable channel in rust-toolchain [datafusion-comet]

2025-03-03 Thread via GitHub
andygrove opened a new pull request, #1465: URL: https://github.com/apache/datafusion-comet/pull/1465 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] error: toolchain '1.85.0-x86_64-unknown-linux-gnu' is not installed on CI / verification script [datafusion]

2025-03-03 Thread via GitHub
andygrove commented on issue #14982: URL: https://github.com/apache/datafusion/issues/14982#issuecomment-2695029074 For Comet, I am testing with updating the rust-toolchain file to use channel "stable" rather than a specific version number.
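
The change described in this comment can be sketched as a rustup toolchain file. This is a minimal illustration, assuming the standard TOML toolchain-file format; the pinned value shown in the comment is the version named in the error from issue #14982, and Comet's actual file (and any other keys it carries) is not shown in this thread.

```toml
# rust-toolchain.toml (the same TOML form is accepted in a legacy `rust-toolchain` file)
[toolchain]
# A pinned toolchain would look like: channel = "1.85.0"
# Tracking the stable channel instead of a specific version number:
channel = "stable"
```

With a channel name rather than a version number, rustup resolves the toolchain to whatever the current stable release is, so builds are no longer tied to one specific installed version.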

Re: [PR] feat: Implementation of udf and udaf decorator [datafusion-python]

2025-03-03 Thread via GitHub
CrystalZhou0529 commented on PR #1040: URL: https://github.com/apache/datafusion-python/pull/1040#issuecomment-2695035830 @timsaucer Hi, I borrowed your suggested idea and managed to get it to work! I also used an LLM a bit to write the documentation. I hope it's not too confusing for users to u

[PR] build: Use "stable" channel [datafusion]

2025-03-03 Thread via GitHub
andygrove opened a new pull request, #14985: URL: https://github.com/apache/datafusion/pull/14985 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

[I] Bulid is broken due to new rustup release [datafusion-comet]

2025-03-03 Thread via GitHub
andygrove opened a new issue, #1466: URL: https://github.com/apache/datafusion-comet/issues/1466 ### Describe the bug Bulid is broken due to new rustup release ### Steps to reproduce _No response_ ### Expected behavior _No response_ ### Additional con

Re: [PR] datafusion-cli: add streaming state for printing logic [datafusion]

2025-03-03 Thread via GitHub
alamb commented on code in PR #14961: URL: https://github.com/apache/datafusion/pull/14961#discussion_r1977876884 ## datafusion-cli/src/print_format.rs: ## @@ -153,6 +155,164 @@ fn format_batches_with_maxrows( Ok(()) } +/// The state and methods for displaying output Re

[PR] feat: Add div operator for fuzz testing and update expression doc [datafusion-comet]

2025-03-03 Thread via GitHub
wForget opened a new pull request, #1464: URL: https://github.com/apache/datafusion-comet/pull/1464 ## Which issue does this PR close? Follow-up to #1422 ## Rationale for this change ## What changes are included in this PR? Add div operator for fuzz-testin

Re: [PR] feat: Add div operator for fuzz testing and update expression doc [datafusion-comet]

2025-03-03 Thread via GitHub
wForget closed pull request #1463: feat: Add div operator for fuzz testing and update expression doc URL: https://github.com/apache/datafusion-comet/pull/1463

[PR] feat: Add one config to limit max disk usage for spilling queries [datafusion]

2025-03-03 Thread via GitHub
2010YOUY01 opened a new pull request, #14975: URL: https://github.com/apache/datafusion/pull/14975 ## Which issue does this PR close? - Closes #. ## Rationale for this change For memory-limit queries, executors might write temporary results into the disk to r
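
The idea in the PR title, a configurable cap on disk used by spilling, can be sketched as a shared byte budget. This is only an illustration under stated assumptions: `SpillBudget`, `try_reserve`, and `release` are hypothetical names invented here, not the API added by PR #14975, whose design is not visible in this truncated snippet.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical tracker for bytes currently spilled to disk (not DataFusion's API).
struct SpillBudget {
    max_bytes: u64,
    used_bytes: AtomicU64,
}

impl SpillBudget {
    fn new(max_bytes: u64) -> Self {
        Self { max_bytes, used_bytes: AtomicU64::new(0) }
    }

    /// Reserves `bytes` before a spill file is written; fails if the cap would be exceeded.
    fn try_reserve(&self, bytes: u64) -> Result<(), String> {
        let mut current = self.used_bytes.load(Ordering::Relaxed);
        loop {
            // Reject on arithmetic overflow or when the new total passes the limit.
            let next = current
                .checked_add(bytes)
                .filter(|n| *n <= self.max_bytes)
                .ok_or_else(|| format!("spill limit of {} bytes exceeded", self.max_bytes))?;
            // Retry if another thread changed the counter in the meantime.
            match self.used_bytes.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Ok(()),
                Err(observed) => current = observed,
            }
        }
    }

    /// Returns bytes to the budget once a spill file is deleted.
    /// Callers must release only what they previously reserved.
    fn release(&self, bytes: u64) {
        self.used_bytes.fetch_sub(bytes, Ordering::AcqRel);
    }

    fn used(&self) -> u64 {
        self.used_bytes.load(Ordering::Acquire)
    }
}
```

A query that needs to spill would call `try_reserve` first and surface the error to the user instead of filling the disk; an atomic counter is used here so several operators can share one budget without a lock.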

[I] Eliminate limit when doing scalar aggregation query [datafusion]

2025-03-03 Thread via GitHub
niebayes opened a new issue, #14974: URL: https://github.com/apache/datafusion/issues/14974 ### Is your feature request related to a problem or challenge? In scalar aggregation queries, the result is always a single row. As a result, applying a LIMIT operator above the Aggregate opera
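
The rewrite proposed in this issue can be sketched on a toy plan tree. This is a minimal sketch, not DataFusion's optimizer: the `Plan` enum and `eliminate_redundant_limit` are hypothetical, and the rule assumes that an aggregation without GROUP BY always yields exactly one row, so a limit with no offset and a fetch of at least one cannot change the result.

```rust
/// Hypothetical, simplified plan representation (not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    Aggregate { group_by: Vec<String>, input: Box<Plan> },
    Limit { skip: usize, fetch: Option<usize>, input: Box<Plan> },
}

/// Removes a `Limit` sitting directly on a scalar aggregation when it is a no-op.
fn eliminate_redundant_limit(plan: Plan) -> Plan {
    match plan {
        Plan::Limit { skip, fetch, input } => {
            let input = eliminate_redundant_limit(*input);
            // A scalar aggregation has no GROUP BY keys and produces one row.
            let scalar_agg =
                matches!(&input, Plan::Aggregate { group_by, .. } if group_by.is_empty());
            // The single row survives only with no offset and a fetch of at least 1.
            let keeps_row = skip == 0 && fetch.map_or(true, |n| n >= 1);
            if scalar_agg && keeps_row {
                input
            } else {
                Plan::Limit { skip, fetch, input: Box::new(input) }
            }
        }
        Plan::Aggregate { group_by, input } => Plan::Aggregate {
            group_by,
            input: Box::new(eliminate_redundant_limit(*input)),
        },
        scan => scan,
    }
}
```

Note the guard: `LIMIT 0` or any non-zero offset turns the one-row result into an empty one, so those limits are kept.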

[PR] chore(deps): bump pyo3 from 0.23.4 to 0.23.5 [datafusion]

2025-03-03 Thread via GitHub
dependabot[bot] opened a new pull request, #14972: URL: https://github.com/apache/datafusion/pull/14972 Bumps [pyo3](https://github.com/pyo3/pyo3) from 0.23.4 to 0.23.5. Release notes Sourced from [pyo3's releases](https://github.com/pyo3/pyo3/releases). PyO3 0.23.5 This r

[PR] chore(deps): bump async-trait from 0.1.86 to 0.1.87 [datafusion]

2025-03-03 Thread via GitHub
dependabot[bot] opened a new pull request, #14973: URL: https://github.com/apache/datafusion/pull/14973 Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.86 to 0.1.87. Release notes Sourced from [async-trait's releases](https://github.com/dtolnay/async-trait/releases).

[PR] [branch-46] Update changelog for backports to 46.0.0 [datafusion]

2025-03-03 Thread via GitHub
xudong963 opened a new pull request, #14977: URL: https://github.com/apache/datafusion/pull/14977 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/14123 ## Rationale for this change Final update of changelog prior to making an RC to

Re: [PR] Add Upgrade Guide for DataFusion 46.0.0 [datafusion]

2025-03-03 Thread via GitHub
alamb merged PR #14891: URL: https://github.com/apache/datafusion/pull/14891

Re: [I] Eliminate limit when doing scalar aggregation query [datafusion]

2025-03-03 Thread via GitHub
niebayes commented on issue #14974: URL: https://github.com/apache/datafusion/issues/14974#issuecomment-2693687086 Appreciate any feedback

Re: [PR] feat: Add one config to limit max disk usage for spilling queries [datafusion]

2025-03-03 Thread via GitHub
2010YOUY01 commented on code in PR #14975: URL: https://github.com/apache/datafusion/pull/14975#discussion_r1977146593 ## datafusion/physical-plan/src/spill.rs: ## @@ -54,41 +52,13 @@ pub(crate) fn read_spill_as_stream( Ok(builder.build()) } -/// Spills in-memory `batche

Re: [PR] BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE [datafusion]

2025-03-03 Thread via GitHub
2010YOUY01 commented on code in PR #14922: URL: https://github.com/apache/datafusion/pull/14922#discussion_r1977094812 ## datafusion/core/src/datasource/file_format/parquet.rs: ## @@ -377,6 +377,21 @@ impl FileFormat for ParquetFormat { Ok(Arc::new(schema)) } +

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-03 Thread via GitHub
xudong963 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2693752495 > Remaining steps are: > > * Merge [Deprecate `Expr::Wildcard`  #14959](https://github.com/apache/datafusion/pull/14959) > * Make a backport and merge of Expr wildcard

Re: [PR] [branch-46] Deprecate `Expr::Wildcard` (#14959) [datafusion]

2025-03-03 Thread via GitHub
xudong963 merged PR #14976: URL: https://github.com/apache/datafusion/pull/14976

[PR] feat: Add div operator for fuzz testing and update expression doc [datafusion-comet]

2025-03-03 Thread via GitHub
wForget opened a new pull request, #1463: URL: https://github.com/apache/datafusion-comet/pull/1463 ## Which issue does this PR close? Follow-up to #1422 ## Rationale for this change ## What changes are included in this PR? Add div operator for fuzz-testin

[PR] chore(deps): bump thiserror from 2.0.11 to 2.0.12 [datafusion]

2025-03-03 Thread via GitHub
dependabot[bot] opened a new pull request, #14971: URL: https://github.com/apache/datafusion/pull/14971 Bumps [thiserror](https://github.com/dtolnay/thiserror) from 2.0.11 to 2.0.12. Release notes Sourced from [thiserror's releases](https://github.com/dtolnay/thiserror/releases).

Re: [I] Deterministic IDs for ExecutionPlan [datafusion]

2025-03-03 Thread via GitHub
xudong963 commented on issue #11364: URL: https://github.com/apache/datafusion/issues/11364#issuecomment-2693678321 Just noticed the related PR was closed, maybe we can continue to discuss the final/general way here.

Re: [PR] feat: Add div operator for fuzz testing and update expression doc [datafusion-comet]

2025-03-03 Thread via GitHub
codecov-commenter commented on PR #1464: URL: https://github.com/apache/datafusion-comet/pull/1464#issuecomment-2693820070 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1464?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Eliminate limit when doing scalar aggregation query [datafusion]

2025-03-03 Thread via GitHub
Dandandan closed issue #14974: Eliminate limit when doing scalar aggregation query URL: https://github.com/apache/datafusion/issues/14974

Re: [PR] Deprecate `Expr::Wildcard` [datafusion]

2025-03-03 Thread via GitHub
xudong963 merged PR #14959: URL: https://github.com/apache/datafusion/pull/14959

Re: [I] Eliminate limit when doing scalar aggregation query [datafusion]

2025-03-03 Thread via GitHub
niebayes commented on issue #14974: URL: https://github.com/apache/datafusion/issues/14974#issuecomment-2693906643 @Dandandan I'm afraid it's not the case. The limit operator is the top operator which only takes effect on its child. No matter of the number of rows a table has, and regardles

Re: [PR] BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE [datafusion]

2025-03-03 Thread via GitHub
zhuqi-lucas commented on code in PR #14922: URL: https://github.com/apache/datafusion/pull/14922#discussion_r1977263162 ## datafusion/core/src/datasource/file_format/parquet.rs: ## @@ -377,6 +377,21 @@ impl FileFormat for ParquetFormat { Ok(Arc::new(schema)) } +

Re: [I] Eliminate limit when doing scalar aggregation query [datafusion]

2025-03-03 Thread via GitHub
niebayes commented on issue #14974: URL: https://github.com/apache/datafusion/issues/14974#issuecomment-2693911804 @Dandandan Please see the following example: ``` sql > select * from information_schema.tables; +---++-++ | ta

Re: [I] Eliminate limit when doing scalar aggregation query [datafusion]

2025-03-03 Thread via GitHub
niebayes commented on issue #14974: URL: https://github.com/apache/datafusion/issues/14974#issuecomment-2693915944 @alamb cc

[PR] [branch-46] Deprecate `Expr::Wildcard` (#14959) [datafusion]

2025-03-03 Thread via GitHub
xudong963 opened a new pull request, #14976: URL: https://github.com/apache/datafusion/pull/14976 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/14123 - Backports https://github.com/apache/datafusion/pull/14959 ## Rationale fo

Re: [PR] Add Upgrade Guide for DataFusion 46.0.0 [datafusion]

2025-03-03 Thread via GitHub
alamb commented on PR #14891: URL: https://github.com/apache/datafusion/pull/14891#issuecomment-2694061520 Thank you @xudong963 for the approval and @comphead for the review and @shehabgamin for the content assistance! Since this PR affects only https://datafusion.apache.org/ it doe

Re: [I] Weekly Plan (Andrew Lamb) March 3, 2025 [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14978: URL: https://github.com/apache/datafusion/issues/14978#issuecomment-2694087167 DataFusion: Bugs/UX/Performance - [ ] https://github.com/apache/datafusion/pull/14677 - [ ] https://github.com/apache/datafusion/pull/14547 DataFusion: New Features

[I] Weekly Plan (Andrew Lamb) March 3, 2025 [datafusion]

2025-03-03 Thread via GitHub
alamb opened a new issue, #14978: URL: https://github.com/apache/datafusion/issues/14978 This is an attempt to organize myself and make what I plan to work on more visible ## Weekly High Level Goals - [ ] Help @xudong963 to get #14123 ready (mostly testing / herding bugs) - [ ]

Re: [I] Weekly Plan (Andrew Lamb) Feb 24, 2025 [datafusion]

2025-03-03 Thread via GitHub
alamb closed issue #14850: Weekly Plan (Andrew Lamb) Feb 24, 2025 URL: https://github.com/apache/datafusion/issues/14850

Re: [I] Weekly Plan (Andrew Lamb) Feb 24, 2025 [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14850: URL: https://github.com/apache/datafusion/issues/14850#issuecomment-2694087859 - Next week: https://github.com/apache/datafusion/issues/14978

Re: [PR] [branch-46] Deprecate `Expr::Wildcard` (#14959) [datafusion]

2025-03-03 Thread via GitHub
alamb commented on PR #14976: URL: https://github.com/apache/datafusion/pull/14976#issuecomment-2694093903 Looks great -- thanks @xudong963

Re: [PR] [branch-46] Update changelog for backports to 46.0.0 [datafusion]

2025-03-03 Thread via GitHub
xudong963 merged PR #14977: URL: https://github.com/apache/datafusion/pull/14977

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-03 Thread via GitHub
xudong963 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2694098342 All PRs have been patched to branch-46!

Re: [PR] feat: Add `array_max` function support [datafusion]

2025-03-03 Thread via GitHub
findepi commented on code in PR #14470: URL: https://github.com/apache/datafusion/pull/14470#discussion_r1977038476 ## datafusion/functions-nested/src/max.rs: ## @@ -0,0 +1,137 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] Bug: Fix multi-lines printing issue for datafusion-cli [datafusion]

2025-03-03 Thread via GitHub
alamb commented on code in PR #14954: URL: https://github.com/apache/datafusion/pull/14954#discussion_r1977405243 ## datafusion-cli/src/print_format.rs: ## @@ -209,14 +211,175 @@ impl PrintFormat { } Ok(()) } + +#[allow(clippy::too_many_arguments)] +

Re: [PR] Add note about upgrade guide into the release notes [datafusion]

2025-03-03 Thread via GitHub
alamb merged PR #14979: URL: https://github.com/apache/datafusion/pull/14979

Re: [I] Statistics: Implement `SampledDistribution` variant to `Distribution` to support estimated distributions [datafusion]

2025-03-03 Thread via GitHub
cj-zhukov commented on issue #14897: URL: https://github.com/apache/datafusion/issues/14897#issuecomment-2694328468 take

Re: [I] Statistics: Migrate to `Distribution` from `Precision` [datafusion]

2025-03-03 Thread via GitHub
cj-zhukov commented on issue #14896: URL: https://github.com/apache/datafusion/issues/14896#issuecomment-2694352019 take

Re: [I] doc: RecordBatchReceiverStreamBuilder::spawn_blocking does not abort threads [datafusion]

2025-03-03 Thread via GitHub
shruti2522 commented on issue #9152: URL: https://github.com/apache/datafusion/issues/9152#issuecomment-2694428828 take

Re: [PR] chore: forbide `with_default_features` override existing information [datafusion]

2025-03-03 Thread via GitHub
milenkovicm commented on PR #14935: URL: https://github.com/apache/datafusion/pull/14935#issuecomment-2694440302 > Thank you @irenjj -- I think this makes a lot of sense to me > > > I just wonder if we should provide with_default_features or Default::default implementation for this ca

[PR] Improve `SessionStateBuilder::new` documentation [datafusion]

2025-03-03 Thread via GitHub
alamb opened a new pull request, #14980: URL: https://github.com/apache/datafusion/pull/14980 ## Which issue does this PR close? - related to https://github.com/apache/datafusion/issues/14899 - related to https://github.com/apache/datafusion/pull/14935 ## Rationale for this c

Re: [PR] Improve `SessionStateBuilder::new` documentation [datafusion]

2025-03-03 Thread via GitHub
alamb commented on code in PR #14980: URL: https://github.com/apache/datafusion/pull/14980#discussion_r1977540685 ## datafusion/core/src/execution/session_state.rs: ## @@ -1042,9 +1045,10 @@ impl SessionStateBuilder { } } -/// Returns a new [SessionStateBuild

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-03 Thread via GitHub
xudong963 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2694451125 @alamb My pleasure I sent the release email to d...@datafusion.apache.org, did you see it?

Re: [PR] Remove redundant statistics from FileScanConfig [datafusion]

2025-03-03 Thread via GitHub
Standing-Man commented on PR #14955: URL: https://github.com/apache/datafusion/pull/14955#issuecomment-2694451044 Hi @alamb and @blaginin, I found that four tests failed due to the statistics `num_rows` and `total_byte_size`. I'm confused about how to proceed with fixing this issue, and I n

Re: [PR] chore: Update `SessionStateBuilder::with_default_features` does not replace existing features [datafusion]

2025-03-03 Thread via GitHub
alamb commented on PR #14935: URL: https://github.com/apache/datafusion/pull/14935#issuecomment-2694451094 I also made a small PR to try and clarify the docs: - https://github.com/apache/datafusion/pull/14980

Re: [PR] chore: forbide `with_default_features` override existing information [datafusion]

2025-03-03 Thread via GitHub
alamb commented on code in PR #14935: URL: https://github.com/apache/datafusion/pull/14935#discussion_r1977530272 ## datafusion/core/src/execution/session_state.rs: ## @@ -1081,14 +1081,40 @@ impl SessionStateBuilder { /// Create default builder with defaults for table_fa

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2694403217 > > Would you like to try and make the release candidate now? > > Yes, but need to wait a bit. I'm out now Sounds good -- let me know if you hit issues or want me to t

Re: [I] doc: RecordBatchReceiverStreamBuilder::spawn_blocking does not abort threads [datafusion]

2025-03-03 Thread via GitHub
shruti2522 commented on issue #9152: URL: https://github.com/apache/datafusion/issues/9152#issuecomment-2694428414 Hey @alamb, since the issue I have been working on is on hold for now, I would like to look into this in the meantime.

Re: [PR] Remove redundant statistics from FileScanConfig [datafusion]

2025-03-03 Thread via GitHub
blaginin commented on PR #14955: URL: https://github.com/apache/datafusion/pull/14955#issuecomment-2694459822 i feel it may be easier if we fix https://github.com/apache/datafusion/issues/14936 first. I was planning to do it this week, but feel free to take over (just take the issue then)

Re: [I] `FileSource` and `DataSource` traits require deep copies [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14939: URL: https://github.com/apache/datafusion/issues/14939#issuecomment-2694137152 > Should we simply avoid putting these config methods in the trait, but just as each implementation's method? 🤔 That certainly sounds like a good thing to try. -- This i

Re: [I] Table function supports non-literal args [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14958: URL: https://github.com/apache/datafusion/issues/14958#issuecomment-2694202386 It might also be time to treat table functions more generally too -- so they could refer to actual columns, for example. Doing so would likely make simplifier work fall out 🤔 -

Re: [I] Discuss: not update Cargo.toml minor/patch version? [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14962: URL: https://github.com/apache/datafusion/issues/14962#issuecomment-2694210949 > From the discussion, I can see the main motivation is to have reproducible build (against near latest dependencies) in CI. To achieve this, Cargo.lock (updated by bot) is enough.

Re: [I] Should pruningpredicate coerce? [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14944: URL: https://github.com/apache/datafusion/issues/14944#issuecomment-2694168148 > It's like cast(month_id, 'utf8') = '202502', see below: So it seems like it would be a valuable thing to apply the type coercion rewriter in the expr simplifier then.

Re: [I] Statistics: Migrate to `Distribution` from `Precision` [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14896: URL: https://github.com/apache/datafusion/issues/14896#issuecomment-2694141931 Thank you @clflushopt Given what I have seen so far I think you would do a great job at it and I would be very happy to help find time to review your PRs (though I realize

Re: [I] Fix multi-lines printing issue for datafusion-cli [datafusion]

2025-03-03 Thread via GitHub
zhuqi-lucas commented on issue #14953: URL: https://github.com/apache/datafusion/issues/14953#issuecomment-2691969110 I will fix in this ticket, and we can review and merge after the release: cc @alamb @xudong963 1. Revert the reverted PR 2. Fix multi-lines printing issue for data

Re: [I] ExternalSorter Fails to Spill Dictionaries [datafusion]

2025-03-03 Thread via GitHub
comphead closed issue #4658: ExternalSorter Fails to Spill Dictionaries URL: https://github.com/apache/datafusion/issues/4658

Re: [PR] Remove redundant statistics from FileScanConfig [datafusion]

2025-03-03 Thread via GitHub
alamb commented on PR #14955: URL: https://github.com/apache/datafusion/pull/14955#issuecomment-2694196626 Looks like there are some CI issues to address Note that @blaginin fixed some issues recently, so if you merge up from main it might be better now

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-03 Thread via GitHub
xudong963 commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2694262604 > Would you like to try and make the release candidate now? Yes, but need to wait a bit. I'm out now

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2694187272 Thanks @xudong963! I also pushed a note about the upgrade guide into the branch - https://github.com/apache/datafusion/pull/14979 Would you like to try and make the

Re: [I] Statistics: Migrate to `Distribution` from `Precision` [datafusion]

2025-03-03 Thread via GitHub
cj-zhukov commented on issue #14896: URL: https://github.com/apache/datafusion/issues/14896#issuecomment-2694351305 I had a discussion with @ozankabak about it and I'm currently working on it now. @clflushopt I'm afraid I should have mentioned this.

Re: [I] Eliminate limit when doing scalar aggregation query [datafusion]

2025-03-03 Thread via GitHub
alamb commented on issue #14974: URL: https://github.com/apache/datafusion/issues/14974#issuecomment-2694390544 I agree in this case the limit is not necessary. I tried it in postgres to double check ```sql postgres=# create table foo(x int); CREATE TABLE postgres=# i
