Re: [PR] Test: Add checks to sqllogictest temporary file creations [datafusion]

2025-08-02 Thread via GitHub
2010YOUY01 commented on code in PR #17017: URL: https://github.com/apache/datafusion/pull/17017#discussion_r2249156779 ## docs/source/user-guide/sql/window_functions.md: ## @@ -331,6 +331,8 @@ FROM employees; +-++-+ ``` +# Review Comment: CI f

Re: [PR] fix error result in execute&pre_selection [datafusion]

2025-08-02 Thread via GitHub
acking-you commented on code in PR #16930: URL: https://github.com/apache/datafusion/pull/16930#discussion_r2249183827 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -375,7 +375,19 @@ impl PhysicalExpr for BinaryExpr { // as it takes into account c

Re: [PR] fix error result in execute&pre_selection [datafusion]

2025-08-02 Thread via GitHub
alamb commented on code in PR #16930: URL: https://github.com/apache/datafusion/pull/16930#discussion_r2249196239 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -375,7 +375,46 @@ impl PhysicalExpr for BinaryExpr { // as it takes into account cases

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-08-02 Thread via GitHub
alamb commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3146416770 Thanks for checking @BlakeOrth @nuno-faria -- how do you want to proceed with this PR? it would be great to avoid the panic in the initial PR, but since it is turned off by

Re: [I] Cache Parquet Metadata [datafusion]

2025-08-02 Thread via GitHub
alamb closed issue #15582: Cache Parquet Metadata URL: https://github.com/apache/datafusion/issues/15582 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-08-02 Thread via GitHub
alamb merged PR #16971: URL: https://github.com/apache/datafusion/pull/16971

Re: [PR] fix: `ComposedPhysicalExtensionCodec` does not use the same codec as encoding when decoding [datafusion]

2025-08-02 Thread via GitHub
alamb merged PR #16986: URL: https://github.com/apache/datafusion/pull/16986

Re: [I] `ComposedPhysicalExtensionCodec` is unsound [datafusion]

2025-08-02 Thread via GitHub
alamb closed issue #16980: `ComposedPhysicalExtensionCodec` is unsound URL: https://github.com/apache/datafusion/issues/16980

Re: [PR] fix: `ComposedPhysicalExtensionCodec` does not use the same codec as encoding when decoding [datafusion]

2025-08-02 Thread via GitHub
alamb commented on PR #16986: URL: https://github.com/apache/datafusion/pull/16986#issuecomment-3146417994 Thanks again everyone

Re: [PR] fix: `ComposedPhysicalExtensionCodec` does not use the same codec as encoding when decoding [datafusion]

2025-08-02 Thread via GitHub
alamb commented on code in PR #16986: URL: https://github.com/apache/datafusion/pull/16986#discussion_r2249201460 ## datafusion/proto/src/physical_plan/mod.rs: ## @@ -2941,12 +2941,126 @@ impl PhysicalExtensionCodec for DefaultPhysicalExtensionCodec { } } +/// DataEncod

Re: [PR] Make `AsyncScalarUDFImpl::invoke_async_with_args` consistent with `ScalarUDFImpl::invoke_with_args` [datafusion]

2025-08-02 Thread via GitHub
alamb commented on PR #16902: URL: https://github.com/apache/datafusion/pull/16902#issuecomment-3146418199 I plan to merge this Monday unless anyone would like additional time to comment

Re: [PR] chore: Refactor StructsToJson serde [datafusion-comet]

2025-08-02 Thread via GitHub
codecov-commenter commented on PR #2064: URL: https://github.com/apache/datafusion-comet/pull/2064#issuecomment-3146583685 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2064?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] chore: Refactor string expression serde [datafusion-comet]

2025-08-02 Thread via GitHub
andygrove opened a new pull request, #2065: URL: https://github.com/apache/datafusion-comet/pull/2065 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/2019 ## Rationale for this change ## What changes are include

[PR] Add `prettier` to the devcontainer [datafusion]

2025-08-02 Thread via GitHub
alamb opened a new pull request, #17019: URL: https://github.com/apache/datafusion/pull/17019 ## Which issue does this PR close? - Closes #. ## Rationale for this change I suggested using dev containers on this PR https://github.com/apache/datafusion/pull/17018

Re: [PR] docs: Fix random extra bullet for 'Analytical Functions' [datafusion]

2025-08-02 Thread via GitHub
alamb commented on PR #17014: URL: https://github.com/apache/datafusion/pull/17014#issuecomment-3146435305 Somehow this caused an error in CI on main - https://github.com/apache/datafusion/actions/runs/16692610983/job/47252467412 Probably because it didn't run the `check_function_d

Re: [PR] Rewrite Nested Loop Join executor for 3.5× speed and 1% memory usage [datafusion]

2025-08-02 Thread via GitHub
2010YOUY01 commented on PR #16996: URL: https://github.com/apache/datafusion/pull/16996#issuecomment-3146452338 @UBarney I'm so sorry about that — I completely let that issue discussion slip through. For important matters, feel free to ping me multiple times or bring it up again in a PR

[PR] chore: Refactor StructsToJson serde [datafusion-comet]

2025-08-02 Thread via GitHub
andygrove opened a new pull request, #2064: URL: https://github.com/apache/datafusion-comet/pull/2064 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] build(deps): bump tokio from 1.46.1 to 1.47.0 [datafusion-python]

2025-08-02 Thread via GitHub
dependabot[bot] commented on PR #1191: URL: https://github.com/apache/datafusion-python/pull/1191#issuecomment-3146672176 Superseded by #1194.

Re: [PR] build(deps): bump tokio from 1.46.1 to 1.47.0 [datafusion-python]

2025-08-02 Thread via GitHub
dependabot[bot] closed pull request #1191: build(deps): bump tokio from 1.46.1 to 1.47.0 URL: https://github.com/apache/datafusion-python/pull/1191

Re: [I] RFC: What table provider features would be helpful in an example? [datafusion]

2025-08-02 Thread via GitHub
coracuity commented on issue #16821: URL: https://github.com/apache/datafusion/issues/16821#issuecomment-3147382664 @alamb it's definitely a kind of range join. Having generalized optimizations for that would definitely be welcome, but I think in this case I can outperform any join. Each pa

Re: [PR] Implement spark `array` function `array` [datafusion]

2025-08-02 Thread via GitHub
Standing-Man commented on code in PR #16936: URL: https://github.com/apache/datafusion/pull/16936#discussion_r2249484518 ## datafusion/spark/src/function/array/spark_array.rs: ## @@ -0,0 +1,273 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

[PR] Postgres: enhance NUMERIC/DECIMAL parsing to support negative scale [datafusion-sqlparser-rs]

2025-08-02 Thread via GitHub
IndexSeek opened a new pull request, #1990: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1990 Closes #1923 > Beginning in PostgreSQL 15, it is allowed to declare a numeric column with a negative scale. **Source:** [PostgreSQL Documentation - Numeric Types](http

Re: [PR] feat: support multiple value for pivot [datafusion-sqlparser-rs]

2025-08-02 Thread via GitHub
chenkovsky commented on code in PR #1970: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1970#discussion_r2249219058 ## src/parser/mod.rs: ## @@ -13828,7 +13840,13 @@ impl<'a> Parser<'a> { self.expect_token(&Token::LParen)?; let aggregate_function

Re: [PR] Rewrite Nested Loop Join executor for 3.5× speed and 1% memory usage [datafusion]

2025-08-02 Thread via GitHub
UBarney commented on PR #16996: URL: https://github.com/apache/datafusion/pull/16996#issuecomment-3146402155 Hi @2010YOUY01 Thanks for creating this amazing PR and for the detailed explanation on why we don't need to maintain the right_side order. This is a great optimization!

Re: [PR] fix error result in execute&pre_selection [datafusion]

2025-08-02 Thread via GitHub
acking-you commented on code in PR #16930: URL: https://github.com/apache/datafusion/pull/16930#discussion_r2249190771 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -375,7 +375,19 @@ impl PhysicalExpr for BinaryExpr { // as it takes into account c

Re: [I] [Epic] Enable parquet metadata cache by default (mini) [datafusion]

2025-08-02 Thread via GitHub
shehabgamin commented on issue #17000: URL: https://github.com/apache/datafusion/issues/17000#issuecomment-3146494278 This is how I've implemented the `FileMetadataCache` in Sail so far, in case it's helpful to anyone: https://github.com/lakehq/sail/pull/687/files#diff-6a11a7f50f3537aaf5

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-08-02 Thread via GitHub
Omega359 commented on PR #13527: URL: https://github.com/apache/datafusion/pull/13527#issuecomment-3146636245 Closing as #16970 is preferred approach

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-08-02 Thread via GitHub
Omega359 closed pull request #13527: feat: Add ConfigOptions to ScalarFunctionArgs URL: https://github.com/apache/datafusion/pull/13527

Re: [PR] Blog on Extending SQL to create own SQL Dialects [datafusion-site]

2025-08-02 Thread via GitHub
Adez017 commented on PR #97: URL: https://github.com/apache/datafusion-site/pull/97#issuecomment-3146952580 Hi @alamb , just checking in. If you need any help, let me know, as you have many overheads.

Re: [PR] Implement Spark `url` function `parse_url` [datafusion]

2025-08-02 Thread via GitHub
Standing-Man commented on code in PR #16937: URL: https://github.com/apache/datafusion/pull/16937#discussion_r2249512245 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -0,0 +1,514 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Test and fix for issue #16998: SortExec shares DynamicFilterPhysicalExpr across multiple executions [datafusion]

2025-08-02 Thread via GitHub
robertream commented on PR #17016: URL: https://github.com/apache/datafusion/pull/17016#issuecomment-3146515978 I was vibe coding this one as an experiment, so if this isn’t the right fix, feel free to throw it away, I'm only out tokens, not labor/ :) On Fri, Aug 1, 2025 at 9:32 PM

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-08-02 Thread via GitHub
AdamGS commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-3146537884 I can take the work to rebase this PR and fix the conflicts, is there interest?

Re: [PR] Chore: implement string_space as ScalarUDFImpl [datafusion-comet]

2025-08-02 Thread via GitHub
kazantsev-maksim commented on PR #2041: URL: https://github.com/apache/datafusion-comet/pull/2041#issuecomment-3146618427 Related: #2065

Re: [PR] chore: Refactor string expression serde [datafusion-comet]

2025-08-02 Thread via GitHub
codecov-commenter commented on PR #2065: URL: https://github.com/apache/datafusion-comet/pull/2065#issuecomment-3146621961 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2065?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: implement partition_statistics for HashJoinExec [datafusion]

2025-08-02 Thread via GitHub
0xPoe commented on PR #16956: URL: https://github.com/apache/datafusion/pull/16956#issuecomment-3146699924 @xudong963 Could you please take a look when you have time? Thank you!

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-08-02 Thread via GitHub
AdamGS commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-3146689387 Ready here - https://github.com/apache/datafusion/pull/17020, I think I didn't mess up the conflict resolution too much, I'll probably do another pass to make sure. I don't ful

[PR] Reviving #14411 - introducing on-demand repartitioning [datafusion]

2025-08-02 Thread via GitHub
AdamGS opened a new pull request, #17020: URL: https://github.com/apache/datafusion/pull/17020 This PR closes #14287, but is actually just a squash + rebase of #14411 by @Weijun-H., see the original PR for all the benchmarking results, rationale and thinking. AFAICT there are no open

Re: [PR] Update Scalar_functions.md [datafusion]

2025-08-02 Thread via GitHub
alamb commented on PR #17018: URL: https://github.com/apache/datafusion/pull/17018#issuecomment-3146420097 Hi @Adez017 > hi @alamb, I couldn't find any Rust file for this function. I think you can find the corresponding Rust file by searching for the function description. For

Re: [PR] Upgrade arrow/parquet to 56.0.0 [datafusion]

2025-08-02 Thread via GitHub
alamb commented on code in PR #16690: URL: https://github.com/apache/datafusion/pull/16690#discussion_r2249203274 ## datafusion/common/src/config.rs: ## @@ -601,13 +601,6 @@ config_namespace! { /// default parquet writer setting pub statistics_enabled: Option,

Re: [PR] Upgrade arrow/parquet to 56.0.0 [datafusion]

2025-08-02 Thread via GitHub
alamb commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3146420535 This one is now ready for review

Re: [PR] Address memory over-accounting in array_agg [datafusion]

2025-08-02 Thread via GitHub
2010YOUY01 commented on code in PR #16816: URL: https://github.com/apache/datafusion/pull/16816#discussion_r2249181890 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -378,7 +372,7 @@ impl Accumulator for ArrayAggAccumulator { + self .valu

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-08-02 Thread via GitHub
adriangb commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-3146547049 I think so! I haven't fully grokked what this PR means but I will say one of the pieces of DataFusion we customize is partitioning / concurrency so I'm interested to see what impact

[PR] build(deps): bump tokio from 1.46.1 to 1.47.1 [datafusion-python]

2025-08-02 Thread via GitHub
dependabot[bot] opened a new pull request, #1194: URL: https://github.com/apache/datafusion-python/pull/1194 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.46.1 to 1.47.1. Release notes Sourced from [tokio's releases](https://github.com/tokio-rs/tokio/releases). Tokio

[PR] build(deps): bump datafusion-proto from 48.0.1 to 49.0.0 [datafusion-python]

2025-08-02 Thread via GitHub
dependabot[bot] opened a new pull request, #1196: URL: https://github.com/apache/datafusion-python/pull/1196 Bumps [datafusion-proto](https://github.com/apache/datafusion) from 48.0.1 to 49.0.0. Commits https://github.com/apache/datafusion/commit/273d37a5968571900bfe9efa1ee89f9

[PR] build(deps): bump datafusion from 48.0.1 to 49.0.0 [datafusion-python]

2025-08-02 Thread via GitHub
dependabot[bot] opened a new pull request, #1197: URL: https://github.com/apache/datafusion-python/pull/1197 Bumps [datafusion](https://github.com/apache/datafusion) from 48.0.1 to 49.0.0. Commits https://github.com/apache/datafusion/commit/273d37a5968571900bfe9efa1ee89f97914da

[PR] build(deps): bump datafusion-substrait from 48.0.1 to 49.0.0 [datafusion-python]

2025-08-02 Thread via GitHub
dependabot[bot] opened a new pull request, #1195: URL: https://github.com/apache/datafusion-python/pull/1195 Bumps [datafusion-substrait](https://github.com/apache/datafusion) from 48.0.1 to 49.0.0. Commits https://github.com/apache/datafusion/commit/273d37a5968571900bfe9efa1ee

[PR] build(deps): bump datafusion-ffi from 48.0.1 to 49.0.0 [datafusion-python]

2025-08-02 Thread via GitHub
dependabot[bot] opened a new pull request, #1198: URL: https://github.com/apache/datafusion-python/pull/1198 Bumps [datafusion-ffi](https://github.com/apache/datafusion) from 48.0.1 to 49.0.0. Commits https://github.com/apache/datafusion/commit/273d37a5968571900bfe9efa1ee89f979