Re: [I] Run all benchmarks on merge to main branch [datafusion]

2025-04-01 Thread via GitHub
Shreyaskr1409 commented on issue #15511: URL: https://github.com/apache/datafusion/issues/15511#issuecomment-2769119055 > I don't think we need to actually "benchmark" the code for each merge How about we set a tag/label for performance related PRs and run benchmark tests for those sp

[I] AQE Unable to Rewrite Joins as Broadcast Hash Joins Due to Existing CometBroadcastHashJoin Operator [datafusion-comet]

2025-04-01 Thread via GitHub
Kontinuation opened a new issue, #1589: URL: https://github.com/apache/datafusion-comet/issues/1589 ### Describe the bug AQE could transform SortMergeJoin or ShuffledHashJoin to BroadcastHashJoin dynamically after discovering that one of the Exchange operator only shuffle writes smal

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-04-01 Thread via GitHub
suibianwanwank commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2769563811 @jayzhan211 From this field, it seems there's no issue, but how to get the e.b field if we perform aggregation on the Join. Since this query requires all rows from the left ta

[I] Internal error: PhysicalExpr Column references bound error, Failure in spilling for `AggregateMode::Single` [datafusion]

2025-04-01 Thread via GitHub
rluvaton opened a new issue, #15530: URL: https://github.com/apache/datafusion/issues/15530 ### Describe the bug When using aggregate exec in single mode with spilling, if the group by expressions are not the first expressions from the previous plan, there will be a schema mismatch

Re: [PR] docs: change OSX/OS X to macOS [datafusion-comet]

2025-04-01 Thread via GitHub
andygrove merged PR #1584: URL: https://github.com/apache/datafusion-comet/pull/1584 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [I] Add documentation example for `AggregateExprBuilder` [datafusion]

2025-04-01 Thread via GitHub
Shreyaskr1409 commented on issue #15369: URL: https://github.com/apache/datafusion/issues/15369#issuecomment-2769196668 @alamb should I add this example to datafusion-examples as well? as per https://github.com/apache/datafusion/pull/15504#issuecomment-2767270135

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-04-01 Thread via GitHub
Kontinuation commented on code in PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#discussion_r2022987423 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -852,16 +1079,64 @@ impl PartitionBuffer { file: spill_data, }

Re: [I] Extend TopK early termination to partially sorted inputs [datafusion]

2025-04-01 Thread via GitHub
geoffreyclaude commented on issue #15529: URL: https://github.com/apache/datafusion/issues/15529#issuecomment-2769593513 I ran some quick [experiments on my fork](https://github.com/geoffreyclaude/datafusion/pull/3) by checking for early termination after each batch processed in the "topK"

Re: [I] March 17, 2025: This week(s) in DataFusion [datafusion]

2025-04-01 Thread via GitHub
alamb commented on issue #15269: URL: https://github.com/apache/datafusion/issues/15269#issuecomment-2769431998 And we now have explain plans on by default 😍 - https://github.com/apache/datafusion/pull/15427

Re: [PR] Migrate `datafusion/sql` tests to insta, part2 [datafusion]

2025-04-01 Thread via GitHub
alamb merged PR #15499: URL: https://github.com/apache/datafusion/pull/15499

Re: [I] Improve html table rendering formatting [datafusion-python]

2025-04-01 Thread via GitHub
AgalyaS1757 commented on issue #1078: URL: https://github.com/apache/datafusion-python/issues/1078#issuecomment-2769659726 Solution Are; 1. Modify the Data Size Limit Identify the part of the code where the 2MB limit is enforced. Introduce a user-configurable parameter (e.g., ma

Re: [I] Use global tokio runtime per executor process [datafusion-comet]

2025-04-01 Thread via GitHub
andygrove commented on issue #1590: URL: https://github.com/apache/datafusion-comet/issues/1590#issuecomment-2769667177 See https://github.com/apache/datafusion-comet/pull/1104 for a previous attempt at implementing this.

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-04-01 Thread via GitHub
alamb commented on PR #14918: URL: https://github.com/apache/datafusion/pull/14918#issuecomment-2769716046 This was referenced by @sergiimk in https://discord.com/channels/885562378132000778/1290751484807352412/1356393367566553240 (they hit the same problem and were pleased to find it fixed

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-04-01 Thread via GitHub
jayzhan211 commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2769722147 The projection is required to be in the group expression. I think these 2 queries are equivalent, but the subquery one groups by `e2.b` and the join query groups by `e1.b`.

Re: [PR] Fix duplicate unqualified Field name (schema error) on join queries [datafusion]

2025-04-01 Thread via GitHub
LiaCastaneda commented on code in PR #15438: URL: https://github.com/apache/datafusion/pull/15438#discussion_r2022571200 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1470,17 +1470,27 @@ impl ValuesFields { pub fn change_redundant_column(fields: &Fields) -> Vec { Re

Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]

2025-04-01 Thread via GitHub
ctsk commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2022894121 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1

Re: [PR] use state machine to refactor the `get_files_with_limit` method [datafusion]

2025-04-01 Thread via GitHub
xudong963 commented on PR #15521: URL: https://github.com/apache/datafusion/pull/15521#issuecomment-2769429971 Thanks @jayzhan211

Re: [I] Follow up #15432 [datafusion]

2025-04-01 Thread via GitHub
xudong963 closed issue #15519: Follow up #15432 URL: https://github.com/apache/datafusion/issues/15519

Re: [I] Weekly Plan (Andrew Lamb) March 24, 2025 [datafusion]

2025-04-01 Thread via GitHub
alamb closed issue #15393: Weekly Plan (Andrew Lamb) March 24, 2025 URL: https://github.com/apache/datafusion/issues/15393

[I] Weekly Plan (Andrew Lamb) March 31, 2025 [datafusion]

2025-04-01 Thread via GitHub
alamb opened a new issue, #15528: URL: https://github.com/apache/datafusion/issues/15528 This is an attempt to organize myself and make what I plan to work on more visible ## Weekly High Level Goals - [ ] Work on integrating tpch data generator with @clflushopt : https://github.c

Re: [I] Consolidate statistics aggregation [datafusion]

2025-04-01 Thread via GitHub
alamb commented on issue #8229: URL: https://github.com/apache/datafusion/issues/8229#issuecomment-2769446857 @xudong963 do you think we have completed this issue now?

[I] Extend TopK early termination to partially sorted inputs [datafusion]

2025-04-01 Thread via GitHub
geoffreyclaude opened a new issue, #15529: URL: https://github.com/apache/datafusion/issues/15529 ### Is your feature request related to a problem or challenge? DataFusion currently has a "TopK early termination" optimization, which speeds up queries that involve `ORDER BY` and `LIMIT

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-04-01 Thread via GitHub
jayzhan211 commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2769195062 ``` [2025-04-01T12:19:41Z DEBUG datafusion_optimizer::utils] scalar_subquery_to_join: Projection: e.b, __scalar_sq_1.CASE WHEN max(e2.a) > Int64(10) THEN Utf8("a") ELSE

Re: [I] AQE Unable to Rewrite Joins as Broadcast Hash Joins Due to Existing CometBroadcastHashJoin Operator [datafusion-comet]

2025-04-01 Thread via GitHub
mbutrovich commented on issue #1589: URL: https://github.com/apache/datafusion-comet/issues/1589#issuecomment-2769269205 Good catch, @Kontinuation! https://github.com/apache/datafusion-comet/pull/1578 has me looking at AQE wondering if there are other places where Comet isn't working with

Re: [PR] GroupsAccumulator for Duration (#15322) [datafusion]

2025-04-01 Thread via GitHub
emilk closed pull request #15522: GroupsAccumulator for Duration (#15322) URL: https://github.com/apache/datafusion/pull/15522

Re: [I] [EPIC] ClickBench Improvements (Vanity Benchmark) [datafusion]

2025-04-01 Thread via GitHub
zhuqi-lucas commented on issue #14586: URL: https://github.com/apache/datafusion/issues/14586#issuecomment-2768790003 Make Clickbench Q29 5X faster: https://github.com/apache/datafusion/issues/15524

Re: [PR] Introduce load-balanced `split_groups_by_statistics` method [datafusion]

2025-04-01 Thread via GitHub
xudong963 commented on code in PR #15473: URL: https://github.com/apache/datafusion/pull/15473#discussion_r2022286476 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -,4 +2315,163 @@ mod tests { assert_eq!(new_config.constraints, Constraints::default());

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
ctsk commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022551332 ## datafusion/physical-plan/src/sorts/sort_filters.rs: ## @@ -0,0 +1,236 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [I] TPCH unit tests failure [datafusion-ballista]

2025-04-01 Thread via GitHub
milenkovicm commented on issue #1194: URL: https://github.com/apache/datafusion-ballista/issues/1194#issuecomment-2769278754 @vmingchen is this issue closed with #1195?

Re: [I] Run all benchmarks on merge to main branch [datafusion]

2025-04-01 Thread via GitHub
Omega359 commented on issue #15511: URL: https://github.com/apache/datafusion/issues/15511#issuecomment-2769284420 > > There have been a number of issues where benchmarks stopped working and no one noticed until someone happened to try and run them > > Instead of running the benchmark,

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
ctsk commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022274853 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -186,6 +235,90 @@ impl TopK { Ok(()) } +fn calculate_dynamic_filters( +thresholds: Vec,

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
ctsk commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022327718 ## datafusion/physical-plan/src/sorts/sort_filters.rs: ## @@ -0,0 +1,236 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[I] Make Clickbench Q29 5x faster for datafusion [datafusion]

2025-04-01 Thread via GitHub
zhuqi-lucas opened a new issue, #15524: URL: https://github.com/apache/datafusion/issues/15524 ### Is your feature request related to a problem or challenge? https://github.com/user-attachments/assets/c82b798f-7c14-42e9-b6d9-b67a6b038c9d Our datafusion is 5x slower tha

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-04-01 Thread via GitHub
Kontinuation commented on code in PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#discussion_r2022553633 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -667,175 +740,322 @@ impl Debug for ShuffleRepartitioner { } } -/// The status of ap

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-04-01 Thread via GitHub
Kontinuation commented on code in PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#discussion_r2022551508 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -667,175 +740,322 @@ impl Debug for ShuffleRepartitioner { } } -/// The status of ap

[PR] ArraySort: support structs [datafusion]

2025-04-01 Thread via GitHub
cht42 opened a new pull request, #15527: URL: https://github.com/apache/datafusion/pull/15527 ## Which issue does this PR close? - Closes #15526 ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [I] Make Clickbench Q29 5x faster for datafusion [datafusion]

2025-04-01 Thread via GitHub
zhuqi-lucas commented on issue #15524: URL: https://github.com/apache/datafusion/issues/15524#issuecomment-2768994999 Thank you @jayzhan211 for the guide, I will try this!

Re: [PR] Add documentation example for `AggregateExprBuilder` [datafusion]

2025-04-01 Thread via GitHub
alamb commented on PR #15504: URL: https://github.com/apache/datafusion/pull/15504#issuecomment-2769364593 > @berkaysynnada thank you, actually I did use the datafusion-examples as reference. > > > Perhaps we can make this in datafusion-examples as well > > Yeah that could also

Re: [I] Run all benchmarks on merge to main branch [datafusion]

2025-04-01 Thread via GitHub
jayzhan211 commented on issue #15511: URL: https://github.com/apache/datafusion/issues/15511#issuecomment-2769754304 > > I don't think we need to actually "benchmark" the code for each merge. > > The issue [#5504](https://github.com/apache/datafusion/issues/5504) would require all ben

Re: [PR] Add documentation example for `AggregateExprBuilder` [datafusion]

2025-04-01 Thread via GitHub
alamb commented on code in PR #15504: URL: https://github.com/apache/datafusion/pull/15504#discussion_r2023080660 ## datafusion/physical-expr/src/aggregate.rs: ## @@ -97,6 +97,167 @@ impl AggregateExprBuilder { /// Constructs an `AggregateFunctionExpr` from the builder

Re: [PR] datafusion-cli: document reading partitioned parquet [datafusion]

2025-04-01 Thread via GitHub
alamb commented on code in PR #15505: URL: https://github.com/apache/datafusion/pull/15505#discussion_r202314 ## docs/source/user-guide/cli/datasources.md: ## @@ -126,6 +125,32 @@ select count(*) from hits; 1 row in set. Query took 0.344 seconds. ``` +**Why Wildcards Are

[PR] fix: update group by columns for merge phase after spill [datafusion]

2025-04-01 Thread via GitHub
rluvaton opened a new pull request, #15531: URL: https://github.com/apache/datafusion/pull/15531 ## Which issue does this PR close? - Closes #15530. ## Rationale for this change the PR forgot to update the group by expressions: - #13995 ## What changes are inclu

Re: [PR] ArraySort: support structs [datafusion]

2025-04-01 Thread via GitHub
alamb commented on code in PR #15527: URL: https://github.com/apache/datafusion/pull/15527#discussion_r2023103674 ## datafusion/functions-nested/src/sort.rs: ## @@ -207,9 +208,21 @@ pub fn array_sort_inner(args: &[ArrayRef]) -> Result { valid.append_null();

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-04-01 Thread via GitHub
suibianwanwank commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2769782451 > The projection is required to be in the group expression. I think these 2 queries are equivalent, but the subquery one groups by `e2.b` and the join query groups by `e1.b`.

[PR] Draft: Make Clickbench Q29 5x faster for datafusion [datafusion]

2025-04-01 Thread via GitHub
zhuqi-lucas opened a new pull request, #15532: URL: https://github.com/apache/datafusion/pull/15532 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [I] Collecting parquet without any transformations throws an exception [datafusion-comet]

2025-04-01 Thread via GitHub
l0kr commented on issue #1588: URL: https://github.com/apache/datafusion-comet/issues/1588#issuecomment-2769772035 Ah nice catch @mbutrovich! Yup, looks like a dupe 👀

Re: [PR] Test: configuration fuzzer for (external) sort queries [datafusion]

2025-04-01 Thread via GitHub
2010YOUY01 commented on code in PR #15501: URL: https://github.com/apache/datafusion/pull/15501#discussion_r2022134277 ## datafusion/core/tests/fuzz_cases/sort_query_fuzz.rs: ## @@ -0,0 +1,635 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] Migrate `datafusion/sql` tests to insta, part2 [datafusion]

2025-04-01 Thread via GitHub
alamb commented on PR #15499: URL: https://github.com/apache/datafusion/pull/15499#issuecomment-2769639137 Thanks again @qstommyshu

Re: [PR] Add documentation example for `AggregateExprBuilder` [datafusion]

2025-04-01 Thread via GitHub
alamb commented on code in PR #15504: URL: https://github.com/apache/datafusion/pull/15504#discussion_r2023660049 ## datafusion/physical-expr/src/aggregate.rs: ## @@ -97,6 +97,165 @@ impl AggregateExprBuilder { /// Constructs an `AggregateFunctionExpr` from the builder

Re: [PR] Add documentation example for `AggregateExprBuilder` [datafusion]

2025-04-01 Thread via GitHub
alamb commented on PR #15504: URL: https://github.com/apache/datafusion/pull/15504#issuecomment-2770660147 I also merged up from main to fix the CI

Re: [I] CSV data with double quotes fails [datafusion]

2025-04-01 Thread via GitHub
alamb commented on issue #439: URL: https://github.com/apache/datafusion/issues/439#issuecomment-2770467590 I think this is fixed -- datafusion can read data with double quotes now

Re: [PR] feat: fix struct of arrays [datafusion-comet]

2025-04-01 Thread via GitHub
comphead commented on PR #1592: URL: https://github.com/apache/datafusion-comet/pull/1592#issuecomment-2770697619 @parthchandra @andygrove @kazuyukitanimura

Re: [I] Physical plan refactor to support optimization rules and more efficient use of threads [datafusion]

2025-04-01 Thread via GitHub
alamb closed issue #92: Physical plan refactor to support optimization rules and more efficient use of threads URL: https://github.com/apache/datafusion/issues/92

Re: [I] Word Count [datafusion]

2025-04-01 Thread via GitHub
alamb commented on issue #197: URL: https://github.com/apache/datafusion/issues/197#issuecomment-2770485282 Make lineitem SF 10 (3 seconds!) data using tpchgen-cli https://github.com/clflushopt/tpchgen-rs ```shell tpchgen-cli -v --tables=lineitem --scale-factor=10 --format=parquet

Re: [PR] Fix duplicate unqualified Field name (schema error) on join queries [datafusion]

2025-04-01 Thread via GitHub
alamb commented on PR #15438: URL: https://github.com/apache/datafusion/pull/15438#issuecomment-2770491240 Hi @LiaCastaneda -- I believe the CI has failed on this PR due to a change in the CI actions. Can you please merge the PR up to main, which I think will address the issue?

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-01 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2770508515 FYI I will likely try and review this PR again carefully first thing tomorrow morning

[PR] chore(ci): build fails with strange error [datafusion-ballista]

2025-04-01 Thread via GitHub
milenkovicm opened a new pull request, #1222: URL: https://github.com/apache/datafusion-ballista/pull/1222 # Which issue does this PR close? Closes #. # Rationale for this change There is a spurious CI error, - https://github.com/apache/datafusion-ballista/actions/run

Re: [I] Extend TopK early termination to partially sorted inputs [datafusion]

2025-04-01 Thread via GitHub
alamb commented on issue #15529: URL: https://github.com/apache/datafusion/issues/15529#issuecomment-2770458720 There may be some overlap with this work from @adriangb (though I realize you are talking about a different optimization) - https://github.com/apache/datafusion/issues/15037

Re: [I] Add test cases for NULL in joins as key values [datafusion]

2025-04-01 Thread via GitHub
alamb closed issue #148: Add test cases for NULL in joins as key values URL: https://github.com/apache/datafusion/issues/148

Re: [I] CSV data with double quotes fails [datafusion]

2025-04-01 Thread via GitHub
alamb closed issue #439: CSV data with double quotes fails URL: https://github.com/apache/datafusion/issues/439

Re: [I] Physical plan refactor to support optimization rules and more efficient use of threads [datafusion]

2025-04-01 Thread via GitHub
alamb commented on issue #92: URL: https://github.com/apache/datafusion/issues/92#issuecomment-2770464823 I think this is no longer relevant so closing. Let's open a new ticket if there is anything actionable remaining

Re: [PR] Minor: clone and debug for FileSinkConfig [datafusion]

2025-04-01 Thread via GitHub
jayzhan211 merged PR #15516: URL: https://github.com/apache/datafusion/pull/15516

Re: [I] Building project takes a *long* time (esp compilation time for `datafusion` core crate) [datafusion]

2025-04-01 Thread via GitHub
logan-keede commented on issue #13814: URL: https://github.com/apache/datafusion/issues/13814#issuecomment-2770108919 I did some profiling. ```sh RUSTC_BOOTSTRAP=1 cargo rustc -p datafusion-catalog -- -Z self-profile -Z self-profile-events=default,args ``` ![Image](https://githu

Re: [PR] chore: update changelog for 45.0.0 [datafusion-ballista]

2025-04-01 Thread via GitHub
milenkovicm commented on PR #1218: URL: https://github.com/apache/datafusion-ballista/pull/1218#issuecomment-2770545131 just a heads up @andygrove I've added release of scheduler, executor ... docker containers when new tag created. I did try it, hopefully it will not make problems when ne

Re: [PR] chore(ci): build fails with strange error [datafusion-ballista]

2025-04-01 Thread via GitHub
milenkovicm commented on PR #1222: URL: https://github.com/apache/datafusion-ballista/pull/1222#issuecomment-2770514586 It does not look like it fixes the issue: https://github.com/apache/datafusion-ballista/actions/runs/14204595182

[PR] Migrate datafusion/sql tests to insta, part3 [datafusion]

2025-04-01 Thread via GitHub
qstommyshu opened a new pull request, #15533: URL: https://github.com/apache/datafusion/pull/15533 ## Which issue does this PR close? - Related #15397, #15497, #15499 this is a part of #15484 breaking down. - Checkout things to note of the whole migration in comments sectio

Re: [PR] feat: fix struct of arrays [datafusion-comet]

2025-04-01 Thread via GitHub
codecov-commenter commented on PR #1592: URL: https://github.com/apache/datafusion-comet/pull/1592#issuecomment-2770532403 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1592?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Migrate datafusion/sql tests to insta, part3 [datafusion]

2025-04-01 Thread via GitHub
qstommyshu commented on PR #15533: URL: https://github.com/apache/datafusion/pull/15533#issuecomment-2770233544 Hmm, there seems to be an issue with the CI pipeline. https://github.com/user-attachments/assets/1079a7c8-3e5c-4bca-9473-3c0e9fe69ec7 > sccache: error: Server s

Re: [PR] Add GreptimeDB to the "Users" in README [datafusion-sqlparser-rs]

2025-04-01 Thread via GitHub
iffyio merged PR #1788: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1788

[PR] added functionality to handle output statement [datafusion-sqlparser-rs]

2025-04-01 Thread via GitHub
dilovancelik opened a new pull request, #1790: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1790 Hey I have added a feature to handle OUTPUT Statements in the end of merge statements, which is used in MS SQL per the following issue [1789](https://github.com/apache/datafusion-

[PR] feat: fix struct of arrays [datafusion-comet]

2025-04-01 Thread via GitHub
comphead opened a new pull request, #1592: URL: https://github.com/apache/datafusion-comet/pull/1592 ## Which issue does this PR close? Closes #1551. ## Rationale for this change Fixing STRUCT of ARRAY ## What changes are included in this PR?

Re: [PR] ArraySort: support structs [datafusion]

2025-04-01 Thread via GitHub
cht42 commented on code in PR #15527: URL: https://github.com/apache/datafusion/pull/15527#discussion_r2023172721 ## datafusion/functions-nested/src/sort.rs: ## @@ -207,9 +208,21 @@ pub fn array_sort_inner(args: &[ArrayRef]) -> Result { valid.append_null();

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-04-01 Thread via GitHub
alamb commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2770127700 @scsmithr of GlareDB integrated the tpchgen library in glaredb as a table function - https://github.com/GlareDB/glaredb/pull/3549 Which is quite cool ```shell g

Re: [PR] fix: update group by columns for merge phase after spill [datafusion]

2025-04-01 Thread via GitHub
rluvaton commented on PR #15531: URL: https://github.com/apache/datafusion/pull/15531#issuecomment-2769949735 The CI failures are infra related...

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-01 Thread via GitHub
tomershaniii commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2023395604 ## src/parser/mod.rs: ## @@ -7081,18 +7029,243 @@ impl<'a> Parser<'a> { if let Token::Word(word) = self.peek_token().token {

Re: [I] Building project takes a *long* time (esp compilation time for `datafusion` core crate) [datafusion]

2025-04-01 Thread via GitHub
alamb commented on issue #13814: URL: https://github.com/apache/datafusion/issues/13814#issuecomment-2770840025 It seems like 1.86 won't be released for 3 more days: https://releases.rs/docs/1.86.0/ It would be cool to update and try it

Re: [PR] Improve spill performance: Disable re-validation of spilled files [datafusion]

2025-04-01 Thread via GitHub
alamb commented on PR #15454: URL: https://github.com/apache/datafusion/pull/15454#issuecomment-2770842980 Merged up to get latest changes and rerun CI

Re: [I] NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) [datafusion-comet]

2025-04-01 Thread via GitHub
mkgada commented on issue #1576: URL: https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2770843418 Thank you so much for looking into this though, @andygrove I will take this up with GCP folks

Re: [I] datafusion-cli: document reading partitioned parquet [datafusion]

2025-04-01 Thread via GitHub
alamb closed issue #15309: datafusion-cli: document reading partitioned parquet URL: https://github.com/apache/datafusion/issues/15309

Re: [I] feat: fix schema issues for `native reader - read STRUCT of ARRAY fields` [datafusion-comet]

2025-04-01 Thread via GitHub
comphead closed issue #1551: feat: fix schema issues for `native reader - read STRUCT of ARRAY fields` URL: https://github.com/apache/datafusion-comet/issues/1551

Re: [PR] feat: Fix struct of arrays schema issue [datafusion-comet]

2025-04-01 Thread via GitHub
comphead commented on PR #1592: URL: https://github.com/apache/datafusion-comet/pull/1592#issuecomment-2770808027 Thanks @andygrove for the review

Re: [I] Make Clickbench Q29 5x faster for datafusion [datafusion]

2025-04-01 Thread via GitHub
alamb commented on issue #15524: URL: https://github.com/apache/datafusion/issues/15524#issuecomment-2770823621 I found a duckdb implementation of a seemingly similar optimization: https://github.com/duckdb/duckdb/blob/7912713493b38b1eda162f29b7759d5024989a5f/src/optimizer/sum_rewriter.cpp

Re: [I] NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) [datafusion-comet]

2025-04-01 Thread via GitHub
mkgada commented on issue #1576: URL: https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2770838744 I am using a Spark image supplied by GCP Dataproc, thank you for checking this

Re: [I] Support zero copy hash repartitioning for Hash Aggregate [datafusion]

2025-04-01 Thread via GitHub
Rachelint commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2770996279 > I'm considering another approach. Maybe I shouldn't use filter_record_batch 🤔. It filters all the columns iteratively. I should filter the row when the accumulator merge_batch 🤔

Re: [PR] feat: respect `batchSize/workerThreads/blockingThreads` configurations for native_iceberg_compat scan [datafusion-comet]

2025-04-01 Thread via GitHub
wForget commented on code in PR #1587: URL: https://github.com/apache/datafusion-comet/pull/1587#discussion_r2023897072 ## native/core/src/parquet/mod.rs: ## @@ -650,21 +651,29 @@ pub unsafe extern "system" fn Java_org_apache_comet_parquet_Native_initRecordBat required_sch

Re: [I] NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) [datafusion-comet]

2025-04-01 Thread via GitHub
andygrove commented on issue #1576: URL: https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2770840853 I am assuming that the GCP version of Spark has some differences in these internal APIs

Re: [I] Make Clickbench Q29 5x faster for datafusion [datafusion]

2025-04-01 Thread via GitHub
zhuqi-lucas commented on issue #15524: URL: https://github.com/apache/datafusion/issues/15524#issuecomment-2770953658 Thank you @Dandandan @alamb for double-checking and confirming!

[PR] fix: fix spark/sql test failures in native_iceberg_compat [datafusion-comet]

2025-04-01 Thread via GitHub
parthchandra opened a new pull request, #1593: URL: https://github.com/apache/datafusion-comet/pull/1593 ## Which issue does this PR close? Part of #1542 ## Rationale for this change A bug in the logic of `NativeBatchReader` caused NPE and array index out of bounds erro

Re: [I] NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) [datafusion-comet]

2025-04-01 Thread via GitHub
mkgada commented on issue #1576: URL: https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2770817710 Update: spun up another cluster on Spark 3.5.3 and used the same prebuilt Comet JAR 0.7.0. Was able to get past the initial error documented here but now running into

Re: [I] NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) [datafusion-comet]

2025-04-01 Thread via GitHub
andygrove commented on issue #1576: URL: https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2770828974 > Update: spun up another cluster on Spark 3.5.3 and used the same prebuilt Comet JAR 0.7.0 > > Was able to get past the initial error documented here but now runni

Re: [PR] feat: Fix struct of arrays schema issue [datafusion-comet]

2025-04-01 Thread via GitHub
comphead merged PR #1592: URL: https://github.com/apache/datafusion-comet/pull/1592

Re: [PR] feat: respect `batchSize/workerThreads/blockingThreads` configurations for native_iceberg_compat scan [datafusion-comet]

2025-04-01 Thread via GitHub
parthchandra commented on code in PR #1587: URL: https://github.com/apache/datafusion-comet/pull/1587#discussion_r2023778763 ## native/core/src/parquet/mod.rs: ## @@ -650,21 +651,29 @@ pub unsafe extern "system" fn Java_org_apache_comet_parquet_Native_initRecordBat require

Re: [PR] datafusion-cli: document reading partitioned parquet [datafusion]

2025-04-01 Thread via GitHub
alamb merged PR #15505: URL: https://github.com/apache/datafusion/pull/15505

Re: [PR] Disable sccache action to fix gh cache issue [datafusion]

2025-04-01 Thread via GitHub
alamb merged PR #15536: URL: https://github.com/apache/datafusion/pull/15536

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-04-01 Thread via GitHub
CrystalZhou0529 commented on code in PR #14775: URL: https://github.com/apache/datafusion/pull/14775#discussion_r2023878333 ## datafusion/ffi/tests/ffi_integration.rs: ## @@ -179,4 +181,103 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn test_ffi_udaf

Re: [I] feat: Support read array type using native reader [datafusion-comet]

2025-04-01 Thread via GitHub
comphead closed issue #1454: feat: Support read array type using native reader URL: https://github.com/apache/datafusion-comet/issues/1454

[PR] feat: adding more struct/arrays tests [datafusion-comet]

2025-04-01 Thread via GitHub
comphead opened a new pull request, #1594: URL: https://github.com/apache/datafusion-comet/pull/1594 ## Which issue does this PR close? Related #1550 . ## Rationale for this change ## What changes are included in this PR? ## How are these ch

Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]

2025-04-01 Thread via GitHub
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2023116930 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition

Re: [I] Spark SQL test failures in native_iceberg_compat mode [datafusion-comet]

2025-04-01 Thread via GitHub
mbutrovich commented on issue #1542: URL: https://github.com/apache/datafusion-comet/issues/1542#issuecomment-2769901472 ``` catalyst: Passed: Total 6925, Failed 0, Errors 0, Passed 6925, Ignored 5, Canceled 1 core 1: Failed: Total 8686, Failed 47, Errors 0, Passed 8639, Ignored 277,
