Re: [I] [Epic]: Google Summer of Code 2025 Improving Spilling Execution [datafusion]

2025-05-17 Thread via GitHub
2010YOUY01 commented on issue #16065: URL: https://github.com/apache/datafusion/issues/16065#issuecomment-2888233712 Welcome aboard! We're excited to collaborate with you for this GSoC project 😄 Regarding the plan, I can see the following sub-tasks: 1. Stabilize external sort

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-17 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2888248445 OK... I guess it is due to the computation of `block_id` and `block_offset`. I found a machine with similar cores and profiling. Then I found in the old and so good int

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-17 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2888248441 OK... I guess it is due to the computation of `block_id` and `block_offset`. I found a machine with similar cores and profiling. Then I found in the old and so good int

[PR] [Docs]: Added SQL example for all window functions [datafusion]

2025-05-17 Thread via GitHub
Adez017 opened a new pull request, #16074: URL: https://github.com/apache/datafusion/pull/16074 ## Which issue does this PR close? - Closes #15777 ## Rationale for this change Added SQL example for different functions in the window functions docs #

Re: [PR] [Docs]: Added SQL example for all window functions [datafusion]

2025-05-17 Thread via GitHub
Adez017 commented on PR #16074: URL: https://github.com/apache/datafusion/pull/16074#issuecomment-2888498532 @alamb could you trigger the CI? Also, I think we might need to run `./dev/update_function_docs.sh`, as I think it is halting on my machine. -- This is an automated messag

Re: [PR] Update a bunch of dependencies [datafusion]

2025-05-17 Thread via GitHub
comphead commented on PR #16070: URL: https://github.com/apache/datafusion/pull/16070#issuecomment-2888488039 this might be related to https://github.com/apache/datafusion/pull/16062

Re: [I] Combine the optimization rules `decorrelate`, `decorrelate_lateral_join`, and `decorrelate_predicate_subquery` into one. [datafusion]

2025-05-17 Thread via GitHub
duongcongtoai commented on issue #16073: URL: https://github.com/apache/datafusion/issues/16073#issuecomment-2888503454 Do you think this draft [PR](https://github.com/apache/datafusion/pull/16016/files#diff-500ed5b40952dd2bdecdd297383a15a290ac6314ea4cc6162b160ad05d01) can combine all t

[PR] feat: Add auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-05-17 Thread via GitHub
andygrove opened a new pull request, #1747: URL: https://github.com/apache/datafusion-comet/pull/1747 ## Which issue does this PR close? N/A Follows on from https://github.com/apache/datafusion-comet/pull/1746 ## Rationale for this change Rather tha

Re: [PR] fix: clippy issue after rust update to 1.87 [datafusion-ballista]

2025-05-17 Thread via GitHub
andygrove merged PR #1262: URL: https://github.com/apache/datafusion-ballista/pull/1262

Re: [PR] feat: expose submit and cancel job methods as public in scheduler [datafusion-ballista]

2025-05-17 Thread via GitHub
andygrove commented on code in PR #1260: URL: https://github.com/apache/datafusion-ballista/pull/1260#discussion_r2094123218 ## ballista/scheduler/src/scheduler_server/query_stage_scheduler.rs: ## @@ -376,9 +376,9 @@ mod tests { ) .await?; -let job_i

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2094127261 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -111,19 +120,61 @@ impl FileOpener for ParquetOpener { .create(projected_schema, Arc::clone(&se

[PR] [wip] Scan refactor #3 [datafusion-comet]

2025-05-17 Thread via GitHub
andygrove opened a new pull request, #1746: URL: https://github.com/apache/datafusion-comet/pull/1746 ## Which issue does this PR close? Follows on from https://github.com/apache/datafusion-comet/pull/1744 ## Rationale for this change TBD ## What ch

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2094128404 ## datafusion/common/src/pruning.rs: ## @@ -0,0 +1,490 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2094128256 ## datafusion/common/src/pruning.rs: ## @@ -0,0 +1,490 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-17 Thread via GitHub
Dandandan commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2888419860 So it seems re > I am confused why my benchmark for local Mac no regression for sort-tpch Q3, but the generated benchmark for linux we can reproduce the regression. I

[I] Combine the optimization rules `decorrelate`, `decorrelate_lateral_join`, and `decorrelate_predicate_subquery` into one. [datafusion]

2025-05-17 Thread via GitHub
irenjj opened a new issue, #16073: URL: https://github.com/apache/datafusion/issues/16073 ### Is your feature request related to a problem or challenge? related to: #16015 ``` It would be really nice to figure out how to combine these passes into one unified set of decorrelation

Re: [PR] Support metadata on scalar values [datafusion]

2025-05-17 Thread via GitHub
timsaucer commented on PR #16053: URL: https://github.com/apache/datafusion/pull/16053#issuecomment-2888458903 @kylebarron @paleolimbot I tested this latest push against the `test_st_point` including the additional parts that were commented out. Do you have other examples of problems you've

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-17 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2888423022 > > I am confused why my benchmark for local Mac no regression for sort-tpch Q3, but the generated benchmark for linux we can reproduce the regression. > > It may be that t

Re: [PR] [WIP] Experiment with DataFusion against Arrow with Extension DataType support [datafusion]

2025-05-17 Thread via GitHub
paleolimbot commented on PR #15663: URL: https://github.com/apache/datafusion/pull/15663#issuecomment-2888431041 Closing since this is no longer needed!

Re: [PR] [WIP] Experiment with DataFusion against Arrow with Extension DataType support [datafusion]

2025-05-17 Thread via GitHub
paleolimbot closed pull request #15663: [WIP] Experiment with DataFusion against Arrow with Extension DataType support URL: https://github.com/apache/datafusion/pull/15663

Re: [PR] [wip] Scan refactor #3 [datafusion-comet]

2025-05-17 Thread via GitHub
codecov-commenter commented on PR #1746: URL: https://github.com/apache/datafusion-comet/pull/1746#issuecomment-2888432508 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1746?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-17 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2888436428 Thank you @Dandandan @alamb. Addressed it in the latest PR; there should be no regression now.

Re: [PR] feat: metadata handling for aggregates and window functions [datafusion]

2025-05-17 Thread via GitHub
timsaucer commented on PR #15911: URL: https://github.com/apache/datafusion/pull/15911#issuecomment-2888464798 > Do you think it is feasible to update the scalar, aggregate, and window function APIs to use `FieldRef` instead of Field? That way we can avoid most string copies. Do you

Re: [PR] feat: Add auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-05-17 Thread via GitHub
codecov-commenter commented on PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#issuecomment-2888529840 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1747?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] minor: reduce error size [datafusion-python]

2025-05-17 Thread via GitHub
timsaucer merged PR #1126: URL: https://github.com/apache/datafusion-python/pull/1126

Re: [PR] adding support for Min/Max over LargeList and FixedSizeList [datafusion]

2025-05-17 Thread via GitHub
gabotechs commented on code in PR #16071: URL: https://github.com/apache/datafusion/pull/16071#discussion_r2094196073 ## datafusion/sqllogictest/test_files/fixed_size_list.slt: ## Review Comment: Really nice to get this tested! maybe it's a bit overkill to add dedicated .s

[PR] minor: release docker on release tag (not only rc) [datafusion-ballista]

2025-05-17 Thread via GitHub
milenkovicm opened a new pull request, #1264: URL: https://github.com/apache/datafusion-ballista/pull/1264 # Which issue does this PR close? Closes #. # Rationale for this change looks like docker will only be released on rc candidate (`45.0.0-rc1`) but not on full r

[PR] build(deps): bump object_store from 0.12.0 to 0.12.1 [datafusion-python]

2025-05-17 Thread via GitHub
dependabot[bot] opened a new pull request, #1127: URL: https://github.com/apache/datafusion-python/pull/1127 Bumps [object_store](https://github.com/apache/arrow-rs-object-store) from 0.12.0 to 0.12.1. Changelog Sourced from https://github.com/apache/arrow-rs-object-store/blob/main

[PR] build(deps): bump arrow from 55.0.0 to 55.1.0 [datafusion-python]

2025-05-17 Thread via GitHub
dependabot[bot] opened a new pull request, #1128: URL: https://github.com/apache/datafusion-python/pull/1128 Bumps [arrow](https://github.com/apache/arrow-rs) from 55.0.0 to 55.1.0. Release notes Sourced from https://github.com/apache/arrow-rs/releases";>arrow's releases. ar

Re: [PR] build(deps): bump object_store from 0.11.2 to 0.12.0 [datafusion-python]

2025-05-17 Thread via GitHub
dependabot[bot] commented on PR #1071: URL: https://github.com/apache/datafusion-python/pull/1071#issuecomment-2888561264 Superseded by #1127.

Re: [PR] build(deps): bump object_store from 0.11.2 to 0.12.0 [datafusion-python]

2025-05-17 Thread via GitHub
dependabot[bot] closed pull request #1071: build(deps): bump object_store from 0.11.2 to 0.12.0 URL: https://github.com/apache/datafusion-python/pull/1071

[PR] build(deps): bump pyo3-build-config from 0.24.1 to 0.25.0 [datafusion-python]

2025-05-17 Thread via GitHub
dependabot[bot] opened a new pull request, #1129: URL: https://github.com/apache/datafusion-python/pull/1129 Bumps [pyo3-build-config](https://github.com/pyo3/pyo3) from 0.24.1 to 0.25.0. Release notes Sourced from https://github.com/pyo3/pyo3/releases";>pyo3-build-config's releas

Re: [PR] Move the udf module to user_defined [datafusion-python]

2025-05-17 Thread via GitHub
timsaucer merged PR #1112: URL: https://github.com/apache/datafusion-python/pull/1112

Re: [PR] add unit tests for expression functions [datafusion-python]

2025-05-17 Thread via GitHub
timsaucer merged PR #1121: URL: https://github.com/apache/datafusion-python/pull/1121

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-17 Thread via GitHub
Dandandan commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2888271159 > OK... I guess it is due to the computation of `block_id` and `block_offset`. > > I found a machine with similar inteal cores like the testing machine and profiled. >

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-17 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2888275782 > we could enforce blocksize being a power of two so we can avoid the expensive div and mod operations? Yes, I am trying to do it.
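
Editor's illustration of the power-of-two suggestion quoted above: a minimal sketch, not code from PR #15591. The type `BlockIndexer` and its methods are hypothetical names chosen for this example. The only assumption is that a flat group index is split into a `block_id` and a `block_offset`, as the thread describes. When the block size is a power of two, the division becomes a right shift and the modulo becomes a bit mask.

```rust
/// Hypothetical helper (not part of DataFusion): splits a flat group index
/// into `(block_id, block_offset)` for a power-of-two block size.
#[derive(Debug, Clone, Copy)]
pub struct BlockIndexer {
    /// log2(block_size); shifting right by this divides by block_size.
    shift: u32,
    /// block_size - 1; masking with this takes the index modulo block_size.
    mask: usize,
}

impl BlockIndexer {
    /// Returns `None` unless `block_size` is a power of two, because the
    /// shift/mask trick is only valid in that case.
    pub fn new(block_size: usize) -> Option<Self> {
        if !block_size.is_power_of_two() {
            return None;
        }
        Some(Self {
            shift: block_size.trailing_zeros(),
            mask: block_size - 1,
        })
    }

    /// Equivalent to `(index / block_size, index % block_size)`,
    /// computed without div or mod instructions.
    pub fn split(&self, index: usize) -> (usize, usize) {
        (index >> self.shift, index & self.mask)
    }
}
```

With a block size of 1024, `split(5000)` yields `(4, 904)`, the same as `(5000 / 1024, 5000 % 1024)`. Rejecting other sizes in `new` keeps the fast path from silently producing wrong offsets.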

Re: [PR] chore(CI) Upgrade toolchain to Rust-1.87 [datafusion]

2025-05-17 Thread via GitHub
kadai0308 commented on PR #16068: URL: https://github.com/apache/datafusion/pull/16068#issuecomment-2888172345 > Thank you for this PR @kadai0308 -- very helpful > > I am not sure about the implications of `Box`ing all the variants -- I worry it will just add additional overhead for p

Re: [PR] Require space after -- to start single line comment in MySQL [datafusion-sqlparser-rs]

2025-05-17 Thread via GitHub
hansott commented on code in PR #1705: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1705#discussion_r2094095237 ## src/tokenizer.rs: ## @@ -1229,14 +1229,26 @@ impl<'a> Tokenizer<'a> { // operators '-' => { c

Re: [PR] Require space after -- to start single line comment in MySQL [datafusion-sqlparser-rs]

2025-05-17 Thread via GitHub
vimko commented on code in PR #1705: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1705#discussion_r2094040996 ## src/tokenizer.rs: ## @@ -1229,14 +1229,26 @@ impl<'a> Tokenizer<'a> { // operators '-' => { cha

Re: [PR] Update extending-operators.md [datafusion]

2025-05-17 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2888345095 @alamb cc: @xudong963 check it now

[PR] hex function [datafusion]

2025-05-17 Thread via GitHub
ajita-asthana opened a new pull request, #16077: URL: https://github.com/apache/datafusion/pull/16077 ## Which issue does this PR close? - Closes #15986 ## Rationale for this change ## What changes are included in this PR? ## Are these chang

Re: [PR] adding support for Min/Max over LargeList and FixedSizeList [datafusion]

2025-05-17 Thread via GitHub
logan-keede commented on code in PR #16071: URL: https://github.com/apache/datafusion/pull/16071#discussion_r2094229214 ## datafusion/sqllogictest/test_files/fixed_size_list.slt: ## Review Comment: > Very nice that this was shipped so fast, thanks! I did not do anyth

[PR] Semver-checks for all crate on merge and PR [datafusion]

2025-05-17 Thread via GitHub
logan-keede opened a new pull request, #16078: URL: https://github.com/apache/datafusion/pull/16078 ## Which issue does this PR close? - Closes #. ## Rationale for this change - closes #15408 - closes #13648 ## What changes are included in this PR?

Re: [PR] Semver-checks for all crate on merge and PR [datafusion]

2025-05-17 Thread via GitHub
logan-keede commented on PR #16078: URL: https://github.com/apache/datafusion/pull/16078#issuecomment-2888652078 `cargo semver-checks` is probably too heavy to run on every push, but I have kept it for testing purposes. I would also like to hear other contributors' and committers' opinions.

Re: [I] [DISCUSS] Data quality framework using DataFusion [datafusion]

2025-05-17 Thread via GitHub
jsai28 closed issue #15483: [DISCUSS] Data quality framework using DataFusion URL: https://github.com/apache/datafusion/issues/15483

[PR] Remove outdated param in macos bench guide [datafusion-comet]

2025-05-17 Thread via GitHub
ding-young opened a new pull request, #1748: URL: https://github.com/apache/datafusion-comet/pull/1748 ## Which issue does this PR close? It seems like `tpcbench.py` in `datafusion-benchmarks` no longer takes `name` as an argument. This PR removes `--name` in `benchmarking_macos.md`

Re: [I] Support Push down expression evaluation in `TableProviders` [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-2888737240 Another example that this enables: https://docs.pinot.apache.org/basics/indexing/timestamp-index

[PR] fix: describe escaped quoted identifiers [datafusion]

2025-05-17 Thread via GitHub
jfahne opened a new pull request, #16082: URL: https://github.com/apache/datafusion/pull/16082 ## Which issue does this PR close? - Closes #16017 ## Rationale for this change The dataframe `describe` method serves as a tidier way to produce standard summ

Re: [PR] feat: add RightMark Join [datafusion]

2025-05-17 Thread via GitHub
jonathanc-n closed pull request #13252: feat: add RightMark Join URL: https://github.com/apache/datafusion/pull/13252

[PR] Set `TrackConsumersPool` as default in datafusion-cli [datafusion]

2025-05-17 Thread via GitHub
ding-young opened a new pull request, #16081: URL: https://github.com/apache/datafusion/pull/16081 ## Which issue does this PR close? Part of #16065 ## Rationale for this change Currently, datafusion-cli does not provide an option to use `TrackConsumersPool` as memory p

Re: [PR] Optimize char expression [datafusion]

2025-05-17 Thread via GitHub
ajita-asthana commented on PR #16076: URL: https://github.com/apache/datafusion/pull/16076#issuecomment-2888593287 > ## Which issue does this PR close? > > > * Closes #16009 > > ## Rationale for this change > ## What changes are included in this PR? > ## Are these chan

Re: [PR] Support metadata on scalar values [datafusion]

2025-05-17 Thread via GitHub
paleolimbot commented on code in PR #16053: URL: https://github.com/apache/datafusion/pull/16053#discussion_r2094213318 ## datafusion/physical-expr/src/expressions/literal.rs: ## @@ -34,15 +36,37 @@ use datafusion_expr_common::interval_arithmetic::Interval; use datafusion_expr_

[PR] Optimize char expression [datafusion]

2025-05-17 Thread via GitHub
ajita-asthana opened a new pull request, #16076: URL: https://github.com/apache/datafusion/pull/16076 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

[PR] feat: add RightMark Join [datafusion]

2025-05-17 Thread via GitHub
jonathanc-n opened a new pull request, #13252: URL: https://github.com/apache/datafusion/pull/13252 ## Which issue does this PR close? Closes #13138 . ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [PR] feat: Fix multi-lines printing issue for datafusion-cli and add the streaming printing feature back [datafusion]

2025-05-17 Thread via GitHub
github-actions[bot] commented on PR #14954: URL: https://github.com/apache/datafusion/pull/14954#issuecomment-2888711873 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE [datafusion]

2025-05-17 Thread via GitHub
github-actions[bot] closed pull request #14922: BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE URL: https://github.com/apache/datafusion/pull/14922

[PR] Don't collect statistics by default [datafusion]

2025-05-17 Thread via GitHub
adriangb opened a new pull request, #16080: URL: https://github.com/apache/datafusion/pull/16080 Working on https://github.com/apache/datafusion/pull/16014 I think I found that we collect parquet statistics by default on `ListingTable` *despite the fact that the config option defaults to fa

Re: [PR] disable coercison for unmatched struct type [datafusion]

2025-05-17 Thread via GitHub
github-actions[bot] closed pull request #14409: disable coercison for unmatched struct type URL: https://github.com/apache/datafusion/pull/14409

Re: [PR] Make ListingTable obey `collect_statistics` config [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on PR #16080: URL: https://github.com/apache/datafusion/pull/16080#issuecomment-2888713683 cc @alamb am I missing something here?

Re: [PR] Make ListingTable obey `collect_statistics` config [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on PR #16080: URL: https://github.com/apache/datafusion/pull/16080#issuecomment-2888713483 We could also flip the default of `ListingTableOptions` which IMO is reasonable (it should match the default in `SessionConfig`) and since https://github.com/apache/datafusion/pull/

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-05-17 Thread via GitHub
github-actions[bot] commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2888711945 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Introducing mutation testing [datafusion]

2025-05-17 Thread via GitHub
github-actions[bot] closed pull request #14590: Introducing mutation testing URL: https://github.com/apache/datafusion/pull/14590

Re: [PR] refactor: do ambiguous_distinct_check in select [datafusion]

2025-05-17 Thread via GitHub
github-actions[bot] commented on PR #14180: URL: https://github.com/apache/datafusion/pull/14180#issuecomment-2888711988 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] [WIP] chore: Add detailed error for sum::coerce_type [datafusion]

2025-05-17 Thread via GitHub
github-actions[bot] closed pull request #14710: [WIP] chore: Add detailed error for sum::coerce_type URL: https://github.com/apache/datafusion/pull/14710

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2888717383 I'm sad to see this go stale 😢. Sadly, I also don't have the bandwidth to push it forward.

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-05-17 Thread via GitHub
github-actions[bot] commented on PR #14684: URL: https://github.com/apache/datafusion/pull/14684#issuecomment-2888711920 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] [Docs]: Added SQL example for all window functions [datafusion]

2025-05-17 Thread via GitHub
Adez017 commented on PR #16074: URL: https://github.com/apache/datafusion/pull/16074#issuecomment-2888720939 > To get the examples updated in md files it would be needed to update Documentation builders Thanks for the suggestion, mate. I ran `./dev/update_function_docs.sh` but I think i

Re: [I] Provide a way to enable source level statistics for tables registered in the CLI [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on issue #3774: URL: https://github.com/apache/datafusion/issues/3774#issuecomment-2888721001 I think this is now solved. There is a config option, that can be set in datafusion-cli, to collect stats or not during planning. @Dandandan @isidentical can we close the issue?

Re: [PR] Make ListingTable obey `collect_statistics` config [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on PR #16080: URL: https://github.com/apache/datafusion/pull/16080#issuecomment-2888721262 Looks like the config option was added in https://github.com/apache/datafusion/pull/3846 and it's just never agreed with `ListingTableOptions`

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on PR #16014: URL: https://github.com/apache/datafusion/pull/16014#issuecomment-2888722612 My plan for this PR now is to first resolve blockers. In particular: - https://github.com/apache/datafusion/pull/16069 - https://github.com/apache/datafusion/pull/16080 - PR

Re: [PR] Make ListingTable obey `collect_statistics` config [datafusion]

2025-05-17 Thread via GitHub
adriangb commented on PR #16080: URL: https://github.com/apache/datafusion/pull/16080#issuecomment-2888725508 cc @Dandandan since you added the config option originally in #3846

[PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-05-17 Thread via GitHub
jonathanc-n opened a new pull request, #16083: URL: https://github.com/apache/datafusion/pull/16083 ## Which issue does this PR close? - Closes #13138 . ## Rationale for this change Revamp implementation of the previous stale implementation for RightMark ##

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-05-17 Thread via GitHub
jonathanc-n commented on PR #16083: URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2888794567 cc @comphead

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-05-17 Thread via GitHub
jonathanc-n commented on code in PR #16083: URL: https://github.com/apache/datafusion/pull/16083#discussion_r2094393265 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -1009,15 +1010,27 @@ fn join_left_and_right_batch( right_side_ordered, )?; -

Re: [I] Combine the optimization rules `decorrelate`, `decorrelate_lateral_join`, and `decorrelate_predicate_subquery` into one. [datafusion]

2025-05-17 Thread via GitHub
irenjj commented on issue #16073: URL: https://github.com/apache/datafusion/issues/16073#issuecomment-201933 Cool! Thanks @duongcongtoai, maybe we can support lateral join after #16016 is merged.