Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
jayzhan211 commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037246933 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [PR] Improve performance of `last_value` by implementing special `GroupsAccumulator` [datafusion]

2025-04-10 Thread via GitHub
jayzhan211 commented on code in PR #15542: URL: https://github.com/apache/datafusion/pull/15542#discussion_r2037273906 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -291,6 +296,7 @@ impl AggregateUDFImpl for FirstValue { } } +// TODO: rename to PrimitiveGrou

Re: [PR] Remove redundant `Precision` combination code in favor of `Precision::min/max/add` [datafusion]

2025-04-10 Thread via GitHub
xudong963 merged PR #15659: URL: https://github.com/apache/datafusion/pull/15659 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
adriangb commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037334037 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] fix: recursion protection for physical plan node [datafusion]

2025-04-10 Thread via GitHub
jayzhan211 commented on PR #15600: URL: https://github.com/apache/datafusion/pull/15600#issuecomment-2792686000 Thanks @chenkovsky and others!

Re: [PR] Fix tokenization of qualified identifiers with numeric prefix. [datafusion-sqlparser-rs]

2025-04-10 Thread via GitHub
romanb commented on code in PR #1803: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1803#discussion_r2037530085 ## src/tokenizer.rs: ## @@ -895,7 +895,7 @@ impl<'a> Tokenizer<'a> { }; let mut location = state.location(); -while let Some

Re: [I] Change default Spark version to 3.5 [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove closed issue #1467: Change default Spark version to 3.5 URL: https://github.com/apache/datafusion-comet/issues/1467

Re: [PR] chore: Change default Spark version to 3.5 [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove merged PR #1620: URL: https://github.com/apache/datafusion-comet/pull/1620

[PR] chore(deps): bump crossbeam-channel from 0.5.14 to 0.5.15 [datafusion]

2025-04-10 Thread via GitHub
dependabot[bot] opened a new pull request, #15674: URL: https://github.com/apache/datafusion/pull/15674 Bumps [crossbeam-channel](https://github.com/crossbeam-rs/crossbeam) from 0.5.14 to 0.5.15. Release notes Sourced from https://github.com/crossbeam-rs/crossbeam/releases crossb

Re: [PR] feat: Add more tests for nested types combinations for `native_datafusion` [datafusion-comet]

2025-04-10 Thread via GitHub
comphead merged PR #1632: URL: https://github.com/apache/datafusion-comet/pull/1632

Re: [I] Unify `Precision::max` and `set_max_if_greater` methods [datafusion]

2025-04-10 Thread via GitHub
xudong963 closed issue #15615: Unify `Precision::max` and `set_max_if_greater` methods URL: https://github.com/apache/datafusion/issues/15615

Re: [I] stack overflow on `PhysicalPlanNode::try_from_physical_plan` [datafusion]

2025-04-10 Thread via GitHub
jayzhan211 closed issue #15087: stack overflow on `PhysicalPlanNode::try_from_physical_plan` URL: https://github.com/apache/datafusion/issues/15087

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-10 Thread via GitHub
ashdnazg commented on PR #15654: URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2792174978 @2010YOUY01 I checked your benchmark locally on my linux, Ryzen 7945HX, 3 times on each version and got main: ~58s PR: ~57s which is not much better than noise. I also ch

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
adriangb commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037417258 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-10 Thread via GitHub
ozankabak commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2793101507 > Maybe it is possible to move the recursion into the optimizer rule but still keep a `ExecutionPlan` method by making a complex call signature, maybe something like this: >

[I] Add tracing regression smoke tests [datafusion]

2025-04-10 Thread via GitHub
geoffreyclaude opened a new issue, #15672: URL: https://github.com/apache/datafusion/issues/15672 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've consid

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-10 Thread via GitHub
adriangb commented on code in PR #15566: URL: https://github.com/apache/datafusion/pull/15566#discussion_r2037441578 ## datafusion/core/tests/physical_optimizer/filter_pushdown.rs: ## @@ -0,0 +1,529 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [I] Add tracing regression smoke tests [datafusion]

2025-04-10 Thread via GitHub
geoffreyclaude commented on issue #15672: URL: https://github.com/apache/datafusion/issues/15672#issuecomment-2793173975 take

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-10 Thread via GitHub
adriangb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2793196830 > We will get this over the finish line in a few days. So you're going to make a PR to replace this one? Please do consider my example below. There's a lot more complexi

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
adriangb commented on PR #15568: URL: https://github.com/apache/datafusion/pull/15568#issuecomment-2793681651 @berkaysynnada @jayzhan211 I added a test in f59577c26 that shows how this interacts with `ParquetSource` and should help with confusion in https://github.com/apache/datafusion/pull

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-10 Thread via GitHub
mbutrovich commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2794162991 > Do you want me to share here or under separate thread? Maybe in a new discussion you could share your experience: https://github.com/apache/datafusion-comet/disc

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-10 Thread via GitHub
ashdnazg commented on PR #15654: URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2794369865 @andygrove any chance you could check Comet's performance with this alternative implementation: https://github.com/ashdnazg/datafusion/tree/pull-batch-2 ?

[PR] build(deps): bump crossbeam-channel from 0.5.13 to 0.5.15 in /examples/ffi-table-provider [datafusion-python]

2025-04-10 Thread via GitHub
dependabot[bot] opened a new pull request, #1102: URL: https://github.com/apache/datafusion-python/pull/1102 Bumps [crossbeam-channel](https://github.com/crossbeam-rs/crossbeam) from 0.5.13 to 0.5.15. Release notes Sourced from https://github.com/crossbeam-rs/crossbeam/releases c

[I] Release Comet 0.8.0 [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove opened a new issue, #1635: URL: https://github.com/apache/datafusion-comet/issues/1635 ### What is the problem the feature request solves? I would like to start a discussion about releasing the next version of Comet. Issues to resolve / PRs to merge: - https://g

Re: [I] Can we add `udtf` to `FunctionRegistry`? [datafusion]

2025-04-10 Thread via GitHub
alamb commented on issue #15095: URL: https://github.com/apache/datafusion/issues/15095#issuecomment-2794357961 Maybe we could pull FunctionRegistry into catalog or something 🤔

Re: [I] Question: why is the Visitor trait limited to statements, relations & expressions? [datafusion-sqlparser-rs]

2025-04-10 Thread via GitHub
alamb commented on issue #934: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/934#issuecomment-2794401554 Sorry for the late reply -- Option 2 above seems like a reasonable idea to me. @iffyio does that sound reasonable to you?

Re: [PR] [BLOG] tpchgen-rs: World’s fastest open source TPCH data generator, written in Rust [datafusion-site]

2025-04-10 Thread via GitHub
alamb commented on PR #67: URL: https://github.com/apache/datafusion-site/pull/67#issuecomment-2794289480 BTW I made a demo video here: https://www.youtube.com/watch?v=UYIC57hlL14

Re: [PR] [BLOG] tpchgen-rs: World’s fastest open source TPCH data generator, written in Rust [datafusion-site]

2025-04-10 Thread via GitHub
alamb commented on PR #67: URL: https://github.com/apache/datafusion-site/pull/67#issuecomment-2794290825 Thanks everyone! Onward!

Re: [PR] [BLOG] tpchgen-rs: World’s fastest open source TPCH data generator, written in Rust [datafusion-site]

2025-04-10 Thread via GitHub
alamb merged PR #67: URL: https://github.com/apache/datafusion-site/pull/67

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
adriangb commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037378369 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-10 Thread via GitHub
berkaysynnada commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2794211343 I tried to summarize our proposal with @ozankabak https://synnada.notion.site/FilterPushdown-API-Proposal-1d1f46d2dab1802e80a7e1bccec2604f?pvs=73 @adriangb @alamb

[I] Upgrade to DataFusion 47.0.0 [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove opened a new issue, #1634: URL: https://github.com/apache/datafusion-comet/issues/1634 ### What is the problem the feature request solves? DataFusion 47.0.0 has (or will have) improvements that help Comet, especially: - Implemented group accumulator for first_value, i

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-10 Thread via GitHub
andygrove commented on PR #15654: URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2794408768 > @andygrove any chance you could check Comet's performance with this alternative implementation: https://github.com/ashdnazg/datafusion/tree/pull-batch-2 ? It attempts to remove

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-10 Thread via GitHub
ozankabak commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2793246314 I think we can work on this PR, a new PR may not be necessary. We will share a design document with you describing why we think a variation of the structure @alamb proposed is like

[I] Benchmark automation [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove opened a new issue, #1636: URL: https://github.com/apache/datafusion-comet/issues/1636 ### What is the problem the feature request solves? I have been spending significant time manually running benchmarks, both during development and when preparing to release Comet. I have b

Re: [PR] fix(substrait): fix regressed edge case in renaming inner struct fields [datafusion]

2025-04-10 Thread via GitHub
Blizzara commented on code in PR #15634: URL: https://github.com/apache/datafusion/pull/15634#discussion_r2033110193 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -1061,7 +1061,7 @@ async fn roundtrip_literal_list() -> Result<()> { async fn roundtrip_lite

[I] Internal error in ExternalSorter when running with memory limit [datafusion]

2025-04-10 Thread via GitHub
DerGut opened a new issue, #15675: URL: https://github.com/apache/datafusion/issues/15675 ### Describe the bug When running a sort with a low memory limit, DataFusion can run into an internal error. I noticed this with the `SHOW ALL;` command, which is converted to `SELECT name, valu

Re: [PR] (WIP) Upgrade to arrow/parquet 55 [datafusion]

2025-04-10 Thread via GitHub
mbutrovich commented on PR #15466: URL: https://github.com/apache/datafusion/pull/15466#issuecomment-2794522838 Still seeing if this is just noise, but here are flame graphs for Q14 from my machine if anyone else wants to stare at them: This PR: ![pr](https://github.com/user-attac

Re: [PR] chore: Add manually-triggered CI jobs for testing Spark SQL with native scans [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove commented on code in PR #1624: URL: https://github.com/apache/datafusion-comet/pull/1624#discussion_r2037866933 ## .github/workflows/spark_sql_test_native_datafusion.yml: ## @@ -0,0 +1,71 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contr

Re: [I] Fix PREPARE statement tests [datafusion]

2025-04-10 Thread via GitHub
qstommyshu commented on issue #15577: URL: https://github.com/apache/datafusion/issues/15577#issuecomment-2792823355 Got it @brayanjuls, thanks for the research! I agree with your statement according to the two issues you mentioned. I think you can go ahead and fix those tests now if @

Re: [PR] Implement Future for SpawnedTask. [datafusion]

2025-04-10 Thread via GitHub
geoffreyclaude commented on code in PR #15653: URL: https://github.com/apache/datafusion/pull/15653#discussion_r2037465610 ## datafusion/common-runtime/src/common.rs: ## @@ -15,18 +15,23 @@ // specific language governing permissions and limitations // under the License. -use

Re: [PR] Fix tokenization of qualified identifiers with numeric prefix. [datafusion-sqlparser-rs]

2025-04-10 Thread via GitHub
romanb commented on code in PR #1803: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1803#discussion_r2037465505 ## tests/sqlparser_mysql.rs: ## @@ -1926,6 +1926,106 @@ fn parse_select_with_numeric_prefix_column_name() { } } +#[test] +fn parse_qualified_ide

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-10 Thread via GitHub
alamb commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2792363126 `ShortCircuitStrategy` is a pretty neat idea In my opinion, as long as the code is easy to understand, makes realistic benchmarks faster, and doesn't regress existing perfo

Re: [PR] Fix tokenization of qualified identifiers with numeric prefix. [datafusion-sqlparser-rs]

2025-04-10 Thread via GitHub
romanb commented on code in PR #1803: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1803#discussion_r2037467456 ## tests/sqlparser_mysql.rs: ## @@ -1926,6 +1926,106 @@ fn parse_select_with_numeric_prefix_column_name() { } } +#[test] +fn parse_qualified_ide

Re: [PR] fix: recursion protection for physical plan node [datafusion]

2025-04-10 Thread via GitHub
jayzhan211 merged PR #15600: URL: https://github.com/apache/datafusion/pull/15600

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
adriangb commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037348021 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

[PR] chore(deps): bump crossbeam-channel from 0.5.14 to 0.5.15 [datafusion-ballista]

2025-04-10 Thread via GitHub
dependabot[bot] opened a new pull request, #1233: URL: https://github.com/apache/datafusion-ballista/pull/1233 Bumps [crossbeam-channel](https://github.com/crossbeam-rs/crossbeam) from 0.5.14 to 0.5.15. Release notes Sourced from https://github.com/crossbeam-rs/crossbeam/releases

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-04-10 Thread via GitHub
Omega359 commented on PR #13527: URL: https://github.com/apache/datafusion/pull/13527#issuecomment-2794063751 > Another idea: > > Since [`ExecutionProps`](https://docs.rs/datafusion/latest/datafusion/execution/context/struct.ExecutionProps.html) is already threaded all the way throug

Re: [I] Support fast group accumulator for `first` and `last` [datafusion]

2025-04-10 Thread via GitHub
comphead closed issue #13998: Support fast group accumulator for `first` and `last` URL: https://github.com/apache/datafusion/issues/13998

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-10 Thread via GitHub
mixermt commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2794100210 > Wow that's phenomenal! Are you able to share some (vague if necessary) descriptions of your workload, cluster hardware, storage source, and what sort of tuning (if an

Re: [PR] feat: Add tracing regression tests [datafusion]

2025-04-10 Thread via GitHub
alamb merged PR #15673: URL: https://github.com/apache/datafusion/pull/15673

Re: [I] Add tracing regression smoke tests [datafusion]

2025-04-10 Thread via GitHub
alamb closed issue #15672: Add tracing regression smoke tests URL: https://github.com/apache/datafusion/issues/15672

Re: [PR] docs: docs for benchmarking in aws ec2 [datafusion-comet]

2025-04-10 Thread via GitHub
rluvaton commented on PR #1601: URL: https://github.com/apache/datafusion-comet/pull/1601#issuecomment-2794447417 @andygrove The configuration for comet vs spark doesn't look fair, comet has twice the memory (16gb for spark and 16gb for native) and spark benchmark only has 16gb

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-10 Thread via GitHub
adriangb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2793342766 Sounds good to me

Re: [PR] chore: Add manually-triggered CI jobs for testing Spark SQL with native scans [datafusion-comet]

2025-04-10 Thread via GitHub
kazuyukitanimura commented on code in PR #1624: URL: https://github.com/apache/datafusion-comet/pull/1624#discussion_r2037820965 ## .github/workflows/spark_sql_test_native_datafusion.yml: ## @@ -0,0 +1,71 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or mor

Re: [PR] Improve performance of `last_value` by implementing special `GroupsAccumulator` [datafusion]

2025-04-10 Thread via GitHub
comphead merged PR #15542: URL: https://github.com/apache/datafusion/pull/15542

Re: [PR] docs: docs for benchmarking in aws ec2 [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove commented on PR #1601: URL: https://github.com/apache/datafusion-comet/pull/1601#issuecomment-2794459570 > @andygrove The configuration for comet vs spark doesn't look fair, comet has twice the memory (16gb for spark and 16gb for native) and spark benchmark only has 16gb T

Re: [PR] chore: refactor v2 scan conversion [datafusion-comet]

2025-04-10 Thread via GitHub
kazuyukitanimura commented on code in PR #1621: URL: https://github.com/apache/datafusion-comet/pull/1621#discussion_r2037832773 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -197,8 +133,7 @@ class CometSparkSessionExtensions if (COME

Re: [PR] fix: fix spark/sql test failures in native_iceberg_compat [datafusion-comet]

2025-04-10 Thread via GitHub
parthchandra commented on PR #1593: URL: https://github.com/apache/datafusion-comet/pull/1593#issuecomment-2794479553 @mbutrovich @andygrove this is ready for review.

Re: [I] Add documentation on building with Spark 4 [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove closed issue #515: Add documentation on building with Spark 4 URL: https://github.com/apache/datafusion-comet/issues/515

Re: [I] Add documentation on building with Spark 4 [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove commented on issue #515: URL: https://github.com/apache/datafusion-comet/issues/515#issuecomment-2794543103 This can be closed now

Re: [PR] chore: Add manually-triggered CI jobs for testing Spark SQL with native scans [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove commented on code in PR #1624: URL: https://github.com/apache/datafusion-comet/pull/1624#discussion_r2037874965 ## .github/workflows/spark_sql_test_native_datafusion.yml: ## @@ -0,0 +1,71 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contr

Re: [PR] docs: docs for benchmarking in aws ec2 [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove commented on PR #1601: URL: https://github.com/apache/datafusion-comet/pull/1601#issuecomment-2794507699 > @andygrove The configuration for comet vs spark doesn't look fair, comet has twice the memory (16gb for spark and 16gb for native) and spark benchmark only has 16gb I

Re: [PR] Implement Future for SpawnedTask. [datafusion]

2025-04-10 Thread via GitHub
geoffreyclaude commented on PR #15653: URL: https://github.com/apache/datafusion/pull/15653#issuecomment-2794553315 > This looks really nice to me -- thank you @ashdnazg 🙏 > > Let's wait a while before merging to give @geoffreyclaude and others a chance to review it again if they wa

[PR] feat: Add tracing regression tests [datafusion]

2025-04-10 Thread via GitHub
geoffreyclaude opened a new pull request, #15673: URL: https://github.com/apache/datafusion/pull/15673 ## Which issue does this PR close? - Closes #15672. ## Rationale for this change This PR adds smoke tests to verify that the `JoinSetTracer` is correctly injected into

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-10 Thread via GitHub
berkaysynnada commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2792203825 > The TLDR is that because each filter may be allowed or not, and maybe transformed, to be pushed down into each child we end up with a matrix of filters x children that we ne

Re: [PR] fix: fix spark/sql test failures in native_iceberg_compat [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove commented on code in PR #1593: URL: https://github.com/apache/datafusion-comet/pull/1593#discussion_r2037886670 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -496,25 +526,25 @@ private int loadNextBatch() throws Throwable { List co

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
alamb commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037010293 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -283,6 +284,51 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash { ///

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-10 Thread via GitHub
berkaysynnada commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2792250700 @adriangb I don’t want you to go back and forth excessively. Since you’ve spent a lot of time on this and are supporting the current version as the easiest and most understanda

[PR] Adding support for `INHERITS` in `CREATE TABLE` parsing [datafusion-sqlparser-rs]

2025-04-10 Thread via GitHub
LucaCappelletti94 opened a new pull request, #1806: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1806 This pull request will add support for parsing the `INHERITS` keyword in the `CREATE TABLE` statements. Solves issue #1804

[PR] chore(deps): bump petgraph from 0.7.1 to 0.8.1 [datafusion]

2025-04-10 Thread via GitHub
dependabot[bot] opened a new pull request, #15669: URL: https://github.com/apache/datafusion/pull/15669 Bumps [petgraph](https://github.com/petgraph/petgraph) from 0.7.1 to 0.8.1. Release notes Sourced from petgraph's releases (https://github.com/petgraph/petgraph/releases).

[I] Missing support for `INHERITS` operation from PostgreSQL [datafusion-sqlparser-rs]

2025-04-10 Thread via GitHub
LucaCappelletti94 opened a new issue, #1804: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1804 Hi all, It looks like `sqlparser` does not currently support the [`INHERITS` operation](https://www.postgresql.org/docs/current/ddl-inherit.html). ```sql CREATE TA

[PR] Snowflake COPY INTO target columns, regular selec items and optional alias [datafusion-sqlparser-rs]

2025-04-10 Thread via GitHub
yoavcloud opened a new pull request, #1805: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1805 This PR adds support for the following in Snowflake's COPY INTO: 1. Specifying a list of columns in the destination table: ```sql COPY INTO [.] [ ( [ , ... ] ) ] ``` 2

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
berkaysynnada commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037022463 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,442 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

Re: [PR] Add support for MySQL's STRAIGHT_JOIN join operator. [datafusion-sqlparser-rs]

2025-04-10 Thread via GitHub
iffyio merged PR #1802: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1802

Re: [PR] Implement Future for SpawnedTask. [datafusion]

2025-04-10 Thread via GitHub
alamb commented on code in PR #15653: URL: https://github.com/apache/datafusion/pull/15653#discussion_r2037038176 ## datafusion/common-runtime/src/common.rs: ## @@ -77,17 +82,32 @@ impl SpawnedTask { } } +impl Future for SpawnedTask { +type Output = Result; + +fn

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-04-10 Thread via GitHub
alamb commented on PR #13527: URL: https://github.com/apache/datafusion/pull/13527#issuecomment-2792307005 Another idea: Since [`ExecutionProps`](https://docs.rs/datafusion/latest/datafusion/execution/context/struct.ExecutionProps.html) is already threaded all the way through and is

Re: [PR] chore(ci): replace `actions-rs` which are deprecated [datafusion-ballista]

2025-04-10 Thread via GitHub
milenkovicm merged PR #1222: URL: https://github.com/apache/datafusion-ballista/pull/1222 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubsc

Re: [PR] Implement Future for SpawnedTask. [datafusion]

2025-04-10 Thread via GitHub
ashdnazg commented on code in PR #15653: URL: https://github.com/apache/datafusion/pull/15653#discussion_r2037104099 ## datafusion/common-runtime/src/common.rs: ## @@ -77,17 +82,32 @@ impl SpawnedTask { } } +impl Future for SpawnedTask { +type Output = Result; + +

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-04-10 Thread via GitHub
timsaucer commented on PR #14775: URL: https://github.com/apache/datafusion/pull/14775#issuecomment-2792396725 @robtandy As someone keenly interested in the FFI work, would you be able to do a review? -- This is an automated message from the Apache Git Service. To respond to the message,

[PR] Set default metadata prefetch size to 512k [datafusion]

2025-04-10 Thread via GitHub
alamb opened a new pull request, #15670: URL: https://github.com/apache/datafusion/pull/15670 ## Which issue does this PR close? - Closes #. ## Rationale for this change Inspired by looking at some profiling results I think the metadata prefetch hint will reduce

Re: [PR] (WIP) Upgrade to arrow/parquet 55 [datafusion]

2025-04-10 Thread via GitHub
alamb commented on PR #15466: URL: https://github.com/apache/datafusion/pull/15466#issuecomment-2792347897 I tried briefly to reproduce the performance improvements reported above and it seems like I can: [q14.sql.txt](https://github.com/user-attachments/files/19683249/q14.sql.txt)

Re: [PR] Optimize BinaryExpr Evaluation with Short-Circuiting for AND/OR Operators [datafusion]

2025-04-10 Thread via GitHub
kosiew commented on PR #15648: URL: https://github.com/apache/datafusion/pull/15648#issuecomment-2792382973 hi @alamb Here are the benchmark results after incorporating @acking-you 's [enum suggestion](https://github.com/apache/datafusion/issues/15636#issuecomment-2789008443)

Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

2025-04-10 Thread via GitHub
Dandandan commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2792442938 > I don't know if you think it's a good idea? @alamb @Dandandan I think it is a pretty good idea given that evaluation is so important. -- This is an automated messag

Re: [PR] Support bounds evaluation for temporal data types [datafusion]

2025-04-10 Thread via GitHub
berkaysynnada commented on PR #14523: URL: https://github.com/apache/datafusion/pull/14523#issuecomment-2792466034 > Hi @berkaysynnada, can you take another look to move this forward? :) I will do but some tests are failing. I'll have some time probably in the evening or tomorrow morn

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
jayzhan211 commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037178254 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
jayzhan211 commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037185406 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [PR] Implement Future for SpawnedTask. [datafusion]

2025-04-10 Thread via GitHub
eshed-flarion commented on code in PR #15653: URL: https://github.com/apache/datafusion/pull/15653#discussion_r2037103221 ## datafusion/common-runtime/src/common.rs: ## @@ -77,17 +82,32 @@ impl SpawnedTask { } } +impl Future for SpawnedTask { +type Output = Result; +

Re: [PR] feat: Add more tests for nested types combinations for `native_datafusion` [datafusion-comet]

2025-04-10 Thread via GitHub
codecov-commenter commented on PR #1632: URL: https://github.com/apache/datafusion-comet/pull/1632#issuecomment-2791312100 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1632?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
jayzhan211 commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037235439 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-10 Thread via GitHub
jayzhan211 commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2037235927 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[PR] Public some projected methods in `FileScanConfig` [datafusion]

2025-04-10 Thread via GitHub
xudong963 opened a new pull request, #15671: URL: https://github.com/apache/datafusion/pull/15671 ## Which issue does this PR close? - Closes #. ## Rationale for this change While upgrading DF46, I found that these methods aren't public, but under some scenes
