Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-15 Thread via GitHub
jonathanc-n closed pull request #16210: feat: Support null aware + equijoins for `NestedLoopJoin` URL: https://github.com/apache/datafusion/pull/16210 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-15 Thread via GitHub
Dandandan commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2974698767 > @Dandandan thank you so much for pushing this forward! Really appreciate the help and collaboration. > > Since you've been looking at the code do you have any thoughts on [

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-15 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2974704013 > I think two different phases sounds good. > > I don't really like complicating FilterPushdown with post / pre "phases". Why not creating the `PushDownDynamicFilters` one

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-15 Thread via GitHub
Dandandan commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2974721937 This is for me ```diff --- i/datafusion/physical-plan/src/coalesce/mod.rs +++ w/datafusion/physical-plan/src/coalesce/mod.rs @@ -33,6 +33,9 @@ pub struct LimitedBatc

[PR] explicitly create temp path [datafusion-ballista]

2025-06-15 Thread via GitHub
Huy1Ng opened a new pull request, #1273: URL: https://github.com/apache/datafusion-ballista/pull/1273 # Which issue does this PR close? Closes #1117. # Rationale for this change Tests failed on Windows image because the path is not resolved/canonicalized before feeding into

Re: [I] decimal calculate overflow but not throw error [datafusion]

2025-06-15 Thread via GitHub
mmooyyii commented on issue #16406: URL: https://github.com/apache/datafusion/issues/16406#issuecomment-2974920966 ``` use bigdecimal::num_bigint::BigInt; use bigdecimal::{BigDecimal}; use std::fs; fn main() { let f = fs::read_to_string("/tmp/decimal.csv").unwrap();

Re: [I] Few tests fail on windows [datafusion-ballista]

2025-06-15 Thread via GitHub
Huy1Ng commented on issue #1117: URL: https://github.com/apache/datafusion-ballista/issues/1117#issuecomment-2974921643 I made a PR to fix this bug here: https://github.com/apache/datafusion-ballista/pull/1273

Re: [PR] fix: miss output ordering during projection [datafusion]

2025-06-15 Thread via GitHub
github-actions[bot] commented on PR #15683: URL: https://github.com/apache/datafusion/pull/15683#issuecomment-2974908011 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] fix!: incorrect coercion when comparing with string literals [datafusion]

2025-06-15 Thread via GitHub
github-actions[bot] commented on PR #15482: URL: https://github.com/apache/datafusion/pull/15482#issuecomment-2974908124 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-15 Thread via GitHub
Chen-Yuan-Lai commented on code in PR #16324: URL: https://github.com/apache/datafusion/pull/16324#discussion_r2148615171 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -797,14 +796,16 @@ async fn explain_physical_plan_only() { let sql = "EXPLAIN select count(*) fro

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-15 Thread via GitHub
Dandandan commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2974699280 Apart from that, I agree with the sentiment to try to eliminate it from `EnforceSorting`

Re: [I] decimal calculate overflow but not throw error [datafusion]

2025-06-15 Thread via GitHub
mmooyyii commented on issue #16406: URL: https://github.com/apache/datafusion/issues/16406#issuecomment-2975198860 https://github.com/apache/datafusion/blob/ca0b760af6137c0dbec8b07daa5f48e262420cb5/datafusion/functions-aggregate/src/sum.rs#L309 Use `*v = v.add_checked(x)?;` ?
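
For readers following this thread, here is a minimal sketch of the difference between a wrapping and a checked accumulation. It is not the DataFusion code: it uses the standard library's `i128::wrapping_add` and `i128::checked_add` as stand-ins for the Arrow helper quoted in the comment above, and the function names `sum_wrapping` and `sum_checked` are hypothetical. It also ignores decimal precision limits and only looks at raw `i128` overflow.

```rust
/// Hypothetical stand-in: a sum that silently wraps around on overflow.
fn sum_wrapping(values: &[i128]) -> i128 {
    values.iter().fold(0i128, |acc, &x| acc.wrapping_add(x))
}

/// Hypothetical stand-in: a sum that returns an error instead of wrapping.
fn sum_checked(values: &[i128]) -> Result<i128, &'static str> {
    let mut acc: i128 = 0;
    for &x in values {
        // `checked_add` yields `None` on overflow; turn that into an error
        // and propagate it with `?`, mirroring the shape of the suggestion.
        acc = acc.checked_add(x).ok_or("i128 overflow while summing")?;
    }
    Ok(acc)
}

fn main() {
    let values = [i128::MAX, 1];
    // The wrapping sum produces a value with no sign that anything went wrong.
    println!("wrapping: {}", sum_wrapping(&values));
    // The checked sum surfaces the overflow to the caller.
    println!("checked:  {:?}", sum_checked(&values));
}
```

The trade-off is that the checked variant adds a branch per element, which may matter in a hot aggregation loop; whether that cost is acceptable is for the thread to decide.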

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-15 Thread via GitHub
alamb commented on PR #16409: URL: https://github.com/apache/datafusion/pull/16409#issuecomment-2973641672 > I'm personally intrigued tbh but I'd say the DF core should be agnostic of specific data-driven architecture(like Spark) even if we do a lot of Spark integration like Sail or Comet.

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-15 Thread via GitHub
alamb commented on code in PR #16409: URL: https://github.com/apache/datafusion/pull/16409#discussion_r2147518786 ## datafusion/sqllogictest/test_files/spark/array/array.slt: ## @@ -0,0 +1,22 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

Re: [PR] Update PMC management instructions to follow new ASF process [datafusion]

2025-06-15 Thread via GitHub
xudong963 commented on PR #16417: URL: https://github.com/apache/datafusion/pull/16417#issuecomment-2973660882 https://github.com/user-attachments/assets/42ef5941-3879-4b2a-84de-51162c2f06fb I don't find any changes

Re: [I] [DISCUSSION] JOIN "task force" / project team [datafusion]

2025-06-15 Thread via GitHub
xudong963 commented on issue #15885: URL: https://github.com/apache/datafusion/issues/15885#issuecomment-2973683721 FYI, I just followed the latest paper from TUM, "Improving Unnesting of Complex Queries", and will learn about the current code in DF and read the latest PRs in DF about unnes

[PR] fix: Enable WASM compilation by making sqlparser's recursive-protection optional [datafusion]

2025-06-15 Thread via GitHub
jonmmease opened a new pull request, #16418: URL: https://github.com/apache/datafusion/pull/16418 ## Summary This PR fixes the WASM compilation issue (#13513) by making sqlparser's `recursive-protection` feature optional. This allows DataFusion to be compiled for WebAssembly targets

Re: [PR] Update Roadmap documentation [datafusion]

2025-06-15 Thread via GitHub
alamb commented on code in PR #16399: URL: https://github.com/apache/datafusion/pull/16399#discussion_r2147731773 ## docs/source/contributor-guide/roadmap.md: ## @@ -46,81 +46,12 @@ make review efficient and avoid surprises. # Quarterly Roadmap -A quarterly roadmap will be

Re: [PR] Update PMC management instructions to follow new ASF process [datafusion]

2025-06-15 Thread via GitHub
alamb commented on PR #16417: URL: https://github.com/apache/datafusion/pull/16417#issuecomment-2973734192 > https://private-user-images.githubusercontent.com/41979257/455251135-42ef5941-3879-4b2a-84de-51162c2f06fb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkI

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-15 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2148133565 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-15 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2148132009 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [I] Blog post about DataFusion Async / Stream execution model / cancellation [datafusion]

2025-06-15 Thread via GitHub
pepijnve commented on issue #16396: URL: https://github.com/apache/datafusion/issues/16396#issuecomment-2974057031 @alamb I have a first draft written up at https://github.com/apache/datafusion-site/pull/75. I'm not a great blog writer, so any help in getting this over the finish line would

[PR] Blog post on query cancellation [datafusion-site]

2025-06-15 Thread via GitHub
pepijnve opened a new pull request, #75: URL: https://github.com/apache/datafusion-site/pull/75 (no comment)

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-15 Thread via GitHub
pepijnve commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-2974061097 I've marked this as draft for now. I think I have the narrative arc I was going for in place, but the text probably still needs some editing work.

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-15 Thread via GitHub
comphead commented on PR #16409: URL: https://github.com/apache/datafusion/pull/16409#issuecomment-2973561640 I'm personally intrigued tbh but I'd say the DF core should be agnostic of specific data-driven architecture(like Spark) even if we do a lot of Spark integration like Sail or Comet.

[PR] Minor: Clean-up `bench.sh` usage message [datafusion]

2025-06-15 Thread via GitHub
2010YOUY01 opened a new pull request, #16416: URL: https://github.com/apache/datafusion/pull/16416 ## Which issue does this PR close? - Closes #. ## Rationale for this change There are a large number of benchmarks now in `bench.sh`'s help message. This PR gro

Re: [PR] Chore: implement datetime funcs as ScalarUDFImpl [datafusion-comet]

2025-06-15 Thread via GitHub
trompa commented on PR #1874: URL: https://github.com/apache/datafusion-comet/pull/1874#issuecomment-2973588238 The code now uses a macro to generate the three hour, minute, and second functions. Added a new Spark test to check that literals are folded on the JVM side.

Re: [PR] Use Tokio's task budget consistently [datafusion]

2025-06-15 Thread via GitHub
pepijnve commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2973548782 I've added a commit to this PR that: - Makes the tests more robust. Rather than hanging when there's an issue they will fail. - Removes duplication from the tests - Removes

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-15 Thread via GitHub
shehabgamin commented on PR #16409: URL: https://github.com/apache/datafusion/pull/16409#issuecomment-2973551244 > Got it @shehabgamin > > I'm seeing a lot of `slt` tests like > > ``` > #S > > > #E > #query > #L > ``` > > which not very explanato

Re: [I] Explore integration with Delta Lake [datafusion-comet]

2025-06-15 Thread via GitHub
tglanz commented on issue #174: URL: https://github.com/apache/datafusion-comet/issues/174#issuecomment-2973618491 > Now that Comet supports DataFusion's `DataSourceExec` (when `native_datafusion` scan is enabled) it should be much easier to support `delta-rs`. Is the suggestion mea

[PR] feat: consistent hash scheduling implemented as `DistributionPolicy` [datafusion-ballista]

2025-06-15 Thread via GitHub
milenkovicm opened a new pull request, #1272: URL: https://github.com/apache/datafusion-ballista/pull/1272 # Which issue does this PR close? Closes #. # Rationale for this change pluggable `DistributionPolicy` provides a way to extend scheduler task binding policies.

Re: [PR] Minor: Clean-up `bench.sh` usage message [datafusion]

2025-06-15 Thread via GitHub
alamb merged PR #16416: URL: https://github.com/apache/datafusion/pull/16416

[PR] Update PMC management instructions to follow new ASF process [datafusion]

2025-06-15 Thread via GitHub
alamb opened a new pull request, #16417: URL: https://github.com/apache/datafusion/pull/16417 ## Which issue does this PR close? - Closes #. ## Rationale for this change The [ASF process for inviting new PMC members](https://www.apache.org/dev/pmc.html#pmcmembers

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-15 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r214877 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c