Re: [I] Add `output_bytes` metrics to Explain Analyze [datafusion]

2025-06-03 Thread via GitHub
hendrikmakait commented on issue #16244: URL: https://github.com/apache/datafusion/issues/16244#issuecomment-2938796964 I'd be interested in working on this, but I might need a little guidance since I'm new to the project. -- This is an automated message from the Apache Git Service. To re

Re: [I] Library fails to parse function [datafusion-sqlparser-rs]

2025-06-03 Thread via GitHub
LucaCappelletti94 closed issue #1825: Library fails to parse function URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1825

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-03 Thread via GitHub
suibianwanwank commented on code in PR #16234: URL: https://github.com/apache/datafusion/pull/16234#discussion_r2125728466 ## datafusion/physical-expr/src/window/window_expr.rs: ## @@ -186,6 +186,10 @@ pub trait AggregateWindowExpr: WindowExpr { accumulator: &mut Box,

Re: [PR] Update tpch, clickbench, sort_tpch to mark failed queries [datafusion]

2025-06-03 Thread via GitHub
ding-young commented on code in PR #16182: URL: https://github.com/apache/datafusion/pull/16182#discussion_r2125694455 ## benchmarks/src/util/run.rs: ## @@ -138,6 +144,28 @@ impl BenchmarkRun { } } +/// Print the names of failed queries, if any +pub fn ma

Re: [I] How to write csv file to disk from a empty dataframe? [datafusion]

2025-06-03 Thread via GitHub
chenkovsky commented on issue #16240: URL: https://github.com/apache/datafusion/issues/16240#issuecomment-2938605350 take

Re: [I] Document semi join, anti semi join and more supported join types [datafusion]

2025-06-03 Thread via GitHub
niebayes commented on issue #16245: URL: https://github.com/apache/datafusion/issues/16245#issuecomment-2938545089 @jonathanc-n Feel free to take this.

Re: [I] Document semi join, anti semi join and more supported join types [datafusion]

2025-06-03 Thread via GitHub
jonathanc-n commented on issue #16245: URL: https://github.com/apache/datafusion/issues/16245#issuecomment-2938425178 @niebayes Oh I see, do you want to take this up? If not, I'd be interested in doing it; I've been looking a lot into the join implementations recently.

Re: [PR] Add `--substrait-round-trip` option in sqllogictests [datafusion]

2025-06-03 Thread via GitHub
2010YOUY01 commented on code in PR #16183: URL: https://github.com/apache/datafusion/pull/16183#discussion_r2125481070 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -102,6 +103,11 @@ async fn run_tests() -> Result<()> { // to stdout and return OK so they can co

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-03 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2938210171 48.0.0 rc1 is ready: https://github.com/apache/datafusion/tree/48.0.0-rc1

Re: [PR] Simplify FileSource / SchemaAdapterFactory API [datafusion]

2025-06-03 Thread via GitHub
xudong963 merged PR #16214: URL: https://github.com/apache/datafusion/pull/16214

Re: [PR] Prepare for 48.0.0 release: Version and Changelog [datafusion]

2025-06-03 Thread via GitHub
xudong963 merged PR #16238: URL: https://github.com/apache/datafusion/pull/16238

Re: [I] Document semi join, anti semi join and more supported join types [datafusion]

2025-06-03 Thread via GitHub
niebayes commented on issue #16245: URL: https://github.com/apache/datafusion/issues/16245#issuecomment-2938190140 @jonathanc-n See: https://datafusion.apache.org/user-guide/sql/select.html#join-clause

Re: [PR] Perf: load default Utf8View for CSV datatype [datafusion]

2025-06-03 Thread via GitHub
zhuqi-lucas commented on PR #16243: URL: https://github.com/apache/datafusion/pull/16243#issuecomment-2938172961 It looks like there is no performance improvement in the h2o_window benchmark result...

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-03 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2938168285 Thank you @pepijnve , since we have both POC for two ways: 1. Current PR unified solution, but one more exec (YieldStreamExec) exposed to customers. 2. Original PR, insert

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-03 Thread via GitHub
2010YOUY01 commented on code in PR #16234: URL: https://github.com/apache/datafusion/pull/16234#discussion_r2125366399 ## datafusion/physical-expr/src/window/aggregate.rs: ## @@ -85,6 +88,25 @@ impl PlainAggregateWindowExpr { ); } } + +// Returns t

Re: [I] Support `map_values` [datafusion-comet]

2025-06-03 Thread via GitHub
comphead closed issue #1789: Support `map_values` URL: https://github.com/apache/datafusion-comet/issues/1789

Re: [PR] chore(deps): update rand requirement from 0.8 to 0.9 [datafusion]

2025-06-03 Thread via GitHub
github-actions[bot] closed pull request #14333: chore(deps): update rand requirement from 0.8 to 0.9 URL: https://github.com/apache/datafusion/pull/14333

Re: [PR] fix: Fix incorrect logic in CometExecRule aggregate handling [datafusion-comet]

2025-06-03 Thread via GitHub
codecov-commenter commented on PR #1841: URL: https://github.com/apache/datafusion-comet/pull/1841#issuecomment-2937902307 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1841?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore(deps): update rand requirement from 0.8 to 0.9 [datafusion]

2025-06-03 Thread via GitHub
dependabot[bot] commented on PR #14333: URL: https://github.com/apache/datafusion/pull/14333#issuecomment-2938080710 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version

Re: [I] Consolidate schema adapter tests in schema_adapter_integration_tests.rs [datafusion]

2025-06-03 Thread via GitHub
xudong963 closed issue #16202: Consolidate schema adapter tests in schema_adapter_integration_tests.rs URL: https://github.com/apache/datafusion/issues/16202

Re: [PR] Track peak_mem_used in ExternalSorter [datafusion]

2025-06-03 Thread via GitHub
2010YOUY01 commented on PR #16192: URL: https://github.com/apache/datafusion/pull/16192#issuecomment-2938019298 @ding-young I also think the code to manage `reservation` + `merge_reservation` is tricky. I'll try to answer your questions by adding more doc about memory reservation management

Re: [I] Add `output_bytes` metrics to Explain Analyze [datafusion]

2025-06-03 Thread via GitHub
2010YOUY01 commented on issue #16244: URL: https://github.com/apache/datafusion/issues/16244#issuecomment-2938068143 It's a good idea. A quick reminder for someone who is willing to implement it: It's possible that multiple `Array`s share the same underlying buffer -- those `Array`s c

Re: [PR] Update tpch, clickbench, sort_tpch to mark failed queries [datafusion]

2025-06-03 Thread via GitHub
2010YOUY01 commented on code in PR #16182: URL: https://github.com/apache/datafusion/pull/16182#discussion_r2125283941 ## benchmarks/src/util/run.rs: ## @@ -138,6 +144,28 @@ impl BenchmarkRun { } } +/// Print the names of failed queries, if any +pub fn ma

Re: [PR] Add example demonstrating how Parquet encryption could be configured with KMS integration [datafusion]

2025-06-03 Thread via GitHub
adamreeve commented on code in PR #16237: URL: https://github.com/apache/datafusion/pull/16237#discussion_r2125251156 ## datafusion-examples/examples/parquet_encryption_with_kms.rs: ## @@ -0,0 +1,205 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-03 Thread via GitHub
comphead commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2125246086 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -1129,6 +1316,95 @@ pub(crate) mod tests { ) } +fn build_left_table_equijoin(

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-03 Thread via GitHub
comphead commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2125235875 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -810,6 +871,123 @@ fn build_join_indices( } } +// Find matching indices based on join `on

Re: [PR] Add example demonstrating how Parquet encryption could be configured with KMS integration [datafusion]

2025-06-03 Thread via GitHub
corwinjoy commented on code in PR #16237: URL: https://github.com/apache/datafusion/pull/16237#discussion_r2125241447 ## datafusion-examples/examples/parquet_encryption_with_kms.rs: ## @@ -0,0 +1,205 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] fix: Fix incorrect logic in CometExecRule aggregate handling [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove closed pull request #1841: fix: Fix incorrect logic in CometExecRule aggregate handling URL: https://github.com/apache/datafusion-comet/pull/1841

Re: [PR] Add iceberg-rust to user list [datafusion]

2025-06-03 Thread via GitHub
comphead merged PR #16246: URL: https://github.com/apache/datafusion/pull/16246

[PR] Add iceberg-rust to user list [datafusion]

2025-06-03 Thread via GitHub
jonathanc-n opened a new pull request, #16246: URL: https://github.com/apache/datafusion/pull/16246 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [I] Document semi join, anti semi join and more supported join types [datafusion]

2025-06-03 Thread via GitHub
jonathanc-n commented on issue #16245: URL: https://github.com/apache/datafusion/issues/16245#issuecomment-2937909505 Are there any documented joins in the documentation? I might be missing it, but I don't seem to be able to find any.

Re: [PR] chore: Enable tests in RemoveRedundantProjectsSuite.scala related to issue #242 [datafusion-comet]

2025-06-03 Thread via GitHub
codecov-commenter commented on PR #1838: URL: https://github.com/apache/datafusion-comet/pull/1838#issuecomment-2937504058 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1838?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Document semi join, anti semi join and more supported join types [datafusion]

2025-06-03 Thread via GitHub
niebayes opened a new issue, #16245: URL: https://github.com/apache/datafusion/issues/16245 We support more join types, but only some are mentioned in the documentation.

[PR] fix: Fix incorrect logic in CometExecRule aggregate handling [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove opened a new pull request, #1841: URL: https://github.com/apache/datafusion-comet/pull/1841 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] chore: Add unit test to reproduce issue #1251 [datafusion-comet]

2025-06-03 Thread via GitHub
codecov-commenter commented on PR #1840: URL: https://github.com/apache/datafusion-comet/pull/1840#issuecomment-2937774481 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1840?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] chore: Add unit test to reproduce issue #1251 [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove opened a new pull request, #1840: URL: https://github.com/apache/datafusion-comet/pull/1840 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1251 ## Rationale for this change Add test to Comet to reprod

Re: [I] Upgrade to hashbrown 0.15.1: migrate from `hashbrown::raw::RawTable` to `hashbrown::hash_table::HashTable` [datafusion]

2025-06-03 Thread via GitHub
comphead commented on issue #13433: URL: https://github.com/apache/datafusion/issues/13433#issuecomment-2937667067 Crosslinking original `raw` feature deprecation from `hashbrown` https://github.com/rust-lang/hashbrown/pull/546

Re: [PR] fix: support `map_values` [datafusion-comet]

2025-06-03 Thread via GitHub
comphead merged PR #1835: URL: https://github.com/apache/datafusion-comet/pull/1835

Re: [I] Upgrade to hashbrown 0.15.1: migrate from `hashbrown::raw::RawTable` to `hashbrown::hash_table::HashTable` [datafusion]

2025-06-03 Thread via GitHub
comphead commented on issue #13433: URL: https://github.com/apache/datafusion/issues/13433#issuecomment-2937672599 Regarding the PRs above, I think it is agreed to replace `RawTable` with `HashTable` across DataFusion?

Re: [PR] chore: IgnoreCometNativeScan on a few more Spark SQL tests [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove merged PR #1837: URL: https://github.com/apache/datafusion-comet/pull/1837

[PR] Add a documentation build step in CI [datafusion-python]

2025-06-03 Thread via GitHub
crystalxyz opened a new pull request, #1139: URL: https://github.com/apache/datafusion-python/pull/1139 # Which issue does this PR close? Closes #1138. # Rationale for this change CI should fail early if documentation generation fails. # What changes are included in t

[I] DPP exchange reuse issue when using columnar shuffle [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove opened a new issue, #1839: URL: https://github.com/apache/datafusion-comet/issues/1839 ### Describe the bug The test `SPARK-34637: DPP side broadcast query stage is created firstly` passes when using native shuffle, but fails when using columnar shuffle. ### Steps to

Re: [I] Enable more DPP Spark SQL tests [datafusion-comet]

2025-06-03 Thread via GitHub
rishvin commented on issue #1739: URL: https://github.com/apache/datafusion-comet/issues/1739#issuecomment-2937335355 Opened PR https://github.com/apache/datafusion-comet/pull/1838 to address renaming tests in RemoveRedundantProjectsSuite.scala

[PR] Postgres: Apply `ONLY` keyword per table in TRUNCATE stmt [datafusion-sqlparser-rs]

2025-06-03 Thread via GitHub
MohamedAbdeen21 opened a new pull request, #1872: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1872 The `ONLY` keyword was previously applied to the entire statement; the correct PostgreSQL behaviour is to apply the keyword per table.

Re: [I] Add CI check for documentation build [datafusion-python]

2025-06-03 Thread via GitHub
crystalxyz commented on issue #1138: URL: https://github.com/apache/datafusion-python/issues/1138#issuecomment-2937321444 I'll take it :D

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-03 Thread via GitHub
krishvishal commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r2124983624 ## datafusion/functions-nested/src/extract.rs: ## @@ -225,44 +252,37 @@ where return Ok(Arc::new(NullArray::new(array.len(; } +// Check

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-03 Thread via GitHub
krishvishal commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r2124983434 ## datafusion/functions-nested/src/extract.rs: ## @@ -213,6 +213,33 @@ fn array_element_inner(args: &[ArrayRef]) -> Result { } } +/// Adjusts a 1-base

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-03 Thread via GitHub
krishvishal commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r2124973342 ## datafusion/functions-nested/src/extract.rs: ## @@ -225,44 +252,37 @@ where return Ok(Arc::new(NullArray::new(array.len(; } +// Check

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-03 Thread via GitHub
comphead commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r2124970073 ## datafusion/functions-nested/src/extract.rs: ## @@ -225,44 +252,37 @@ where return Ok(Arc::new(NullArray::new(array.len(; } +// Check if

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-03 Thread via GitHub
krishvishal commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r2124965299 ## datafusion/functions-nested/src/extract.rs: ## @@ -225,44 +252,37 @@ where return Ok(Arc::new(NullArray::new(array.len(; } +// Check

[PR] Enable tests in RemoveRedundantProjectsSuite.scala related to issue #242 [datafusion-comet]

2025-06-03 Thread via GitHub
rishvin opened a new pull request, #1838: URL: https://github.com/apache/datafusion-comet/pull/1838 ## Which issue does this PR close? Addresses #1739 ## Rationale for this change Enables tests in `RemoveRedundantProjectsSuite.scala` related to #242. The Com

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-03 Thread via GitHub
comphead commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r2124930746 ## datafusion/functions-nested/src/extract.rs: ## @@ -213,6 +213,33 @@ fn array_element_inner(args: &[ArrayRef]) -> Result { } } +/// Adjusts a 1-based a

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-03 Thread via GitHub
comphead commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r212493 ## datafusion/functions-nested/src/extract.rs: ## @@ -225,44 +252,37 @@ where return Ok(Arc::new(NullArray::new(array.len(; } +// Check if

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-03 Thread via GitHub
krishvishal commented on PR #16203: URL: https://github.com/apache/datafusion/pull/16203#issuecomment-2937158972 @comphead I've had to add the handler because the previous fix caused wrong behavior. For example the following query currently returns: ```sql > select [named_st

[PR] chore: IgnoreCometNativeScan on a few more Spark SQL tests [datafusion-comet]

2025-06-03 Thread via GitHub
mbutrovich opened a new pull request, #1837: URL: https://github.com/apache/datafusion-comet/pull/1837 ## Which issue does this PR close? Partially addresses #1542. ## Rationale for this change ## What changes are included in this PR? Update

[PR] MySQL: `[[NOT] ENFORCED]` in CHECK constraint [datafusion-sqlparser-rs]

2025-06-03 Thread via GitHub
MohamedAbdeen21 opened a new pull request, #1870: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1870 Add support for MySQL's `[[NOT] ENFORCED]` option for CHECK constraints. docs: https://dev.mysql.com/doc/refman/8.4/en/create-table.html

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-03 Thread via GitHub
comphead commented on PR #16203: URL: https://github.com/apache/datafusion/pull/16203#issuecomment-2936960243 Thanks @krishvishal, the latest version has become much more complicated than the previous one. It may be worth checking the performance. What is the reason for adding the sp

Re: [PR] chore: IgnoreCometNativeScan on a few more Spark SQL tests [datafusion-comet]

2025-06-03 Thread via GitHub
codecov-commenter commented on PR #1837: URL: https://github.com/apache/datafusion-comet/pull/1837#issuecomment-2936987610 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1837?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: add metadata to physical literal expressions [datafusion]

2025-06-03 Thread via GitHub
timsaucer closed pull request #16053: feat: add metadata to physical literal expressions URL: https://github.com/apache/datafusion/pull/16053

Re: [I] Access Data from S3 in DeltaLake format using Ballista on Kubernetes [datafusion-ballista]

2025-06-03 Thread via GitHub
milenkovicm commented on issue #1268: URL: https://github.com/apache/datafusion-ballista/issues/1268#issuecomment-2936939066 Let me know if you need more help @janbraunsdorff

[PR] MySQL: `index_name` in FK constraints [datafusion-sqlparser-rs]

2025-06-03 Thread via GitHub
MohamedAbdeen21 opened a new pull request, #1871: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1871 Add support for `index_name` field in FK constraints in both CREATE and ALTER TABLE statements docs: https://dev.mysql.com/doc/refman/8.4/en/create-table-foreign-keys.htm

Re: [I] [EPIC] Spark SQL test failures when Comet JVM shuffle is used [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove commented on issue #1254: URL: https://github.com/apache/datafusion-comet/issues/1254#issuecomment-2936747407 > Barring AQE and DPP tests (addressed in a different PR [#1811](https://github.com/apache/datafusion-comet/pull/1811)), I am able to run these tests successfully. Not

Re: [I] Access Data from S3 in DeltaLake format using Ballista on Kubernetes [datafusion-ballista]

2025-06-03 Thread via GitHub
milenkovicm commented on issue #1268: URL: https://github.com/apache/datafusion-ballista/issues/1268#issuecomment-2936706643 As noted in #1241, there is no built-in, out-of-the-box support for the `deltalake` file format; it's up to users to integrate it if needed. I have updated https://githu

Re: [I] Access Data from S3 in DeltaLake format using Ballista on Kubernetes [datafusion-ballista]

2025-06-03 Thread via GitHub
milenkovicm closed issue #1268: Access Data from S3 in DeltaLake format using Ballista on Kubernetes URL: https://github.com/apache/datafusion-ballista/issues/1268

Re: [PR] Perf: load default Utf8View for CSV datatype [datafusion]

2025-06-03 Thread via GitHub
alamb commented on PR #16243: URL: https://github.com/apache/datafusion/pull/16243#issuecomment-2936699578 🤖: Benchmark completed Details ``` Comparing HEAD and default_utf8_for_unkown_type Benchmark h2o_window.json

Re: [PR] Perf: load default Utf8View for CSV datatype [datafusion]

2025-06-03 Thread via GitHub
alamb commented on PR #16243: URL: https://github.com/apache/datafusion/pull/16243#issuecomment-2936634009 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Chore: implement bit_count as ScalarUDFImpl [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove merged PR #1826: URL: https://github.com/apache/datafusion-comet/pull/1826

Re: [PR] Chore: implement bit_count as ScalarUDFImpl [datafusion-comet]

2025-06-03 Thread via GitHub
kazantsev-maksim commented on PR #1826: URL: https://github.com/apache/datafusion-comet/pull/1826#issuecomment-2936605214 Thanks @andygrove!

Re: [PR] Track peak_mem_used in ExternalSorter [datafusion]

2025-06-03 Thread via GitHub
ding-young commented on PR #16192: URL: https://github.com/apache/datafusion/pull/16192#issuecomment-2936483760 @2010YOUY01 Hi, I’ve been struggling a bit with tracking peak memory in SPM step, and I was wondering if I could ask for some help. ### 1. Can we add the memory for conv

Re: [PR] docs: Add documentation for native_datafusion Parquet scanner's S3 support [datafusion-comet]

2025-06-03 Thread via GitHub
parthchandra merged PR #1832: URL: https://github.com/apache/datafusion-comet/pull/1832

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-06-03 Thread via GitHub
huaxingao commented on code in PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#discussion_r2124462679 ## spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometTPCDSMicroBenchmark.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software

Re: [PR] chore: IgnoreCometNativeScan on a few more Spark SQL tests [datafusion-comet]

2025-06-03 Thread via GitHub
mbutrovich commented on PR #1837: URL: https://github.com/apache/datafusion-comet/pull/1837#issuecomment-2936363015 Draft while I do 3.4.3, 3.5.4, and 4.0.0-preview1.

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-03 Thread via GitHub
viirya commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2124440817 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -516,162 +460,240 @@ impl Display for LexOrdering { } } -impl FromIterator for LexOrdering { -

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-03 Thread via GitHub
viirya commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2124418429 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -845,7 +840,7 @@ pub struct SortExec { /// Fetch highest/lowest n results fetch: Option, /// Nor

Re: [PR] feat: add metadata to physical literal expressions [datafusion]

2025-06-03 Thread via GitHub
timsaucer commented on PR #16053: URL: https://github.com/apache/datafusion/pull/16053#issuecomment-2936285535 Closing in favor of https://github.com/apache/datafusion/pull/16170

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-03 Thread via GitHub
viirya commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2124399512 ## datafusion/sqllogictest/test_files/topk.slt: ## @@ -370,7 +370,7 @@ query TT explain select number, letter, age, number as column4, letter as column5 from part

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-03 Thread via GitHub
viirya commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2124375915 ## datafusion/physical-expr/src/equivalence/properties/mod.rs: ## @@ -190,382 +241,363 @@ impl EquivalenceProperties { &self.oeq_class } -/// Re

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-06-03 Thread via GitHub
alamb commented on PR #15980: URL: https://github.com/apache/datafusion/pull/15980#issuecomment-2936183106 Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is read

Re: [PR] minor: Refactor PhysicalPlanner::default() to avoid duplicate code [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove merged PR #1821: URL: https://github.com/apache/datafusion-comet/pull/1821

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-03 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2936193853 @zhuqi-lucas great work. I've continued playing around with alternative structures in the meantime, but I keep coming back to your `YieldStream` as the most elegant solution. It's s

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-03 Thread via GitHub
viirya commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2124345816 ## datafusion/catalog/src/listing_schema.rs: ## @@ -143,7 +141,7 @@ impl ListingSchemaProvider { order_exprs: vec![],

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-03 Thread via GitHub
alamb commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2936100737 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-03 Thread via GitHub
jonathanc-n commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2936099509 Those benchmarks look nice; the results seem to have been skewed on my computer for the other queries. -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-03 Thread via GitHub
alamb commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2936099164 > │ QQuery 1 │ 1873.46ms │335.27ms │ +5.59x faster │ Quite nice! -- This is an automated message from the Apache Git Service. To respond to the message, please

[I] Add `output_bytes` metrics to Explain Analyze [datafusion]

2025-06-03 Thread via GitHub
PokIsemaine opened a new issue, #16244: URL: https://github.com/apache/datafusion/issues/16244 ### Is your feature request related to a problem or challenge? Currently, using `Explain Analyze` seems to only provide the metric for `output_rows`. Is it possible to add a metric for the n

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-03 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2936001738 Finally, CI is green again; I think I fixed all the test cases. Next steps: 1. Update the solution for the corner cases. 2. Add performance benchmark results. -- Th

[I] Avoid unnecessary uses of CopyExec in native plan [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove opened a new issue, #1836: URL: https://github.com/apache/datafusion-comet/issues/1836 ### What is the problem the feature request solves? As pointed out in https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916690342, we currently always wrap ScanExec in a

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-03 Thread via GitHub
alamb commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2935960865 🤖: Benchmark completed Details ``` Comparing HEAD and constant_agg_window Benchmark h2o_window.json ┏━━

Re: [PR] feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove commented on code in PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#discussion_r2124216384 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -93,21 +93,63 @@ case class CometScanRule(session: SparkSession) extends Rule[Spar

Re: [PR] feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove commented on code in PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#discussion_r2124215025 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -93,21 +93,63 @@ case class CometScanRule(session: SparkSession) extends Rule[Spar

Re: [PR] feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove commented on PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#issuecomment-2935920067 @parthchandra @mbutrovich Could I get a review? I changed the scope to adding the "auto" option without changing the default. There is a manual workflow where we can run the S

Re: [I] Enabling Test "Runtime bloom filter join: do not add bloom filter if dpp filter exists on the same column" fails with IllegalStateException in AdaptiveSparkPlanExec.newQueryStage [datafusion-c

2025-06-03 Thread via GitHub
andygrove closed issue #1831: Enabling Test "Runtime bloom filter join: do not add bloom filter if dpp filter exists on the same column" fails with IllegalStateException in AdaptiveSparkPlanExec.newQueryStage URL: https://github.com/apache/datafusion-comet/issues/1831 -- This is an automated

Re: [PR] fix: Enable more Spark SQL tests [datafusion-comet]

2025-06-03 Thread via GitHub
andygrove merged PR #1834: URL: https://github.com/apache/datafusion-comet/pull/1834 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [PR] fix: support `map_values` [datafusion-comet]

2025-06-03 Thread via GitHub
codecov-commenter commented on PR #1835: URL: https://github.com/apache/datafusion-comet/pull/1835#issuecomment-2935878217 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1835?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-03 Thread via GitHub
alamb commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2935875381 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-06-03 Thread via GitHub
alamb commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2935875166 🤖: Benchmark completed Details ``` group epic_async-udf main -

Re: [I] Iceberg integration - parquet-column version conflicts [datafusion-comet]

2025-06-03 Thread via GitHub
snmvaughan commented on issue #1833: URL: https://github.com/apache/datafusion-comet/issues/1833#issuecomment-2935873947 If it isn't going to be relocated, I'd suggest we shadow those dependencies and allow Maven's dependency management to handle things -- This is an automated message f
