Re: [PR] Refactor substrait producer into multiple files [datafusion]

2025-05-20 Thread via GitHub
Blizzara commented on PR #16089: URL: https://github.com/apache/datafusion/pull/16089#issuecomment-2893532984 Thanks! I took a cursory look as well, looked good 👌 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[PR] feat: coerce from fixed size binary to binary view [datafusion]

2025-05-20 Thread via GitHub
chenkovsky opened a new pull request, #16110: URL: https://github.com/apache/datafusion/pull/16110 ## Which issue does this PR close? - Closes #15755. ## Rationale for this change cannot convert FixedSizeBinary to BinaryView ## What changes are included in this PR?

Re: [PR] fix: stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-20 Thread via GitHub
fmonjalet commented on PR #16031: URL: https://github.com/apache/datafusion/pull/16031#issuecomment-2893193784 Thanks everyone for the review and the merge!

Re: [PR] Schema adapter helper [datafusion]

2025-05-20 Thread via GitHub
kosiew commented on code in PR #16108: URL: https://github.com/apache/datafusion/pull/16108#discussion_r2097676700 ## datafusion/datasource/src/schema_adapter.rs: ## @@ -248,29 +267,11 @@ impl SchemaAdapter for DefaultSchemaAdapter { &self, file_schema: &Schema

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-20 Thread via GitHub
irenjj commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2894078258 > Thank you everyone for your opinions. Looks like my implementation is trying to wrap everything inside a single optimizer, which is hard to follow and reduces space for collabora

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
aharpervc commented on code in PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#discussion_r2098125359 ## src/parser/mod.rs: ## @@ -5204,19 +5204,79 @@ impl<'a> Parser<'a> { let (name, args) = self.parse_create_function_name_and_params()?;

[PR] fix: default values for experimental native scans [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich opened a new pull request, #1756: URL: https://github.com/apache/datafusion-comet/pull/1756 This only works for native_datafusion. Draft while I circle back to native_iceberg_compat and increase testing. ## Which issue does this PR close? Closes #1750.

[I] Schema contains qualified field name left."concat('a', 'b')" and unqualified field name "concat('a', 'b')" which would be ambiguous [datafusion]

2025-05-20 Thread via GitHub
LiaCastaneda opened a new issue, #16114: URL: https://github.com/apache/datafusion/issues/16114 ### Describe the bug Logical planning fails with the following error: ` Error: Schema error: Schema contains qualified field name left."concat('a', 'b')" and unqualified field name "conc

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098159988 ## native/core/src/execution/planner.rs: ## @@ -884,7 +884,7 @@ impl PhysicalPlanner { func_name, fun_expr,

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098163102 ## native/core/src/parquet/parquet_exec.rs: ## @@ -78,8 +78,8 @@ pub(crate) fn init_datasource_exec( )) }); -if let (Some(fil

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098161197 ## native/core/src/execution/planner.rs: ## @@ -2188,6 +2188,12 @@ impl PhysicalPlanner { .coerce_types(&input_expr_types)

Re: [I] [substrait] Build basic test suite to validate produced Substrait plans [datafusion]

2025-05-20 Thread via GitHub
alamb commented on issue #15069: URL: https://github.com/apache/datafusion/issues/15069#issuecomment-2894772297 > Don't enforce Substrait validation, and rely on contributors manually running them with the --substrait-round-trip flag We could potentially start with this approach and t

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2894778449 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Use qualified names on DELETE selections [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #16033: URL: https://github.com/apache/datafusion/pull/16033#issuecomment-2894781403 > This is something we should probably add to the DataFusion documentation somewhere 🤔 I made a PR to clarify the status here: - https://github.com/apache/datafusion/pull/161

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-20 Thread via GitHub
Dandandan commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2894800268 Looking at the earlier result > │ Q1 │ 333.47ms │375.58ms │ 1.13x slower │ This is the query ``` SELECT l_linenumber, l_partkey

[I] Support `VARIANT` type for unstructured data [datafusion]

2025-05-20 Thread via GitHub
alamb opened a new issue, #16116: URL: https://github.com/apache/datafusion/issues/16116 ### Is your feature request related to a problem or challenge? Processing semi-structured data (basically think anything that can be represented in JSON) efficiently is becoming more and more impo

[PR] Test Duration in `fuzz` tests [datafusion]

2025-05-20 Thread via GitHub
alamb opened a new pull request, #16111: URL: https://github.com/apache/datafusion/pull/16111 ## Which issue does this PR close? - Related to https://github.com/apache/datafusion/pull/15748 ## Rationale for this change I noticed there was no coverage for Duration in the a

Re: [PR] Update documentation for `datafusion.execution.collect_statistics` [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #16100: URL: https://github.com/apache/datafusion/pull/16100#issuecomment-2894408300 Thanks @leoyvens @xudong963 and @findepi

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2894406095 > │ QQuery 4 │ 589.62ms │671.68ms │ 1.14x slower │ Given Q4 seems to be unrelated to avg: https://github.com/apache/datafusion/blob/1a3917545c34e162272af9200da12

Re: [PR] Update documentation for `datafusion.execution.collect_statistics` [datafusion]

2025-05-20 Thread via GitHub
alamb merged PR #16100: URL: https://github.com/apache/datafusion/pull/16100

Re: [PR] pretty-print CREATE VIEW statements [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
lovasoa commented on PR #1855: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1855#issuecomment-2893941234 ✅ done

Re: [PR] chore: Reduce repetition in the parameter type inference tests [datafusion]

2025-05-20 Thread via GitHub
alamb commented on code in PR #16079: URL: https://github.com/apache/datafusion/pull/16079#discussion_r2098000742 ## datafusion/sql/tests/sql_integration.rs: ## @@ -55,6 +55,35 @@ use sqlparser::dialect::{Dialect, GenericDialect, HiveDialect, MySqlDialect}; mod cases; mod com

Re: [PR] Revert use file schema in parquet pruning [datafusion]

2025-05-20 Thread via GitHub
alamb commented on code in PR #16086: URL: https://github.com/apache/datafusion/pull/16086#discussion_r2098034125 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -178,7 +182,7 @@ impl FileOpener for ParquetOpener { // Build predicates for this specific file

Re: [PR] fix: correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#discussion_r2098074403 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -229,7 +232,13 @@ public NativeBatchReader(AbstractColumnReader[] columnReade

Re: [PR] fix: correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#discussion_r2098075383 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -332,9 +346,8 @@ public void init() throws URISyntaxException, IOException {

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
aharpervc commented on code in PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#discussion_r2098076680 ## src/parser/mod.rs: ## @@ -5204,19 +5204,79 @@ impl<'a> Parser<'a> { let (name, args) = self.parse_create_function_name_and_params()?;

Re: [PR] Revert use file schema in parquet pruning [datafusion]

2025-05-20 Thread via GitHub
adriangb commented on code in PR #16086: URL: https://github.com/apache/datafusion/pull/16086#discussion_r2098086027 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -178,7 +182,7 @@ impl FileOpener for ParquetOpener { // Build predicates for this specific file

Re: [PR] Handle union schema name coercion [datafusion]

2025-05-20 Thread via GitHub
gabotechs commented on code in PR #16064: URL: https://github.com/apache/datafusion/pull/16064#discussion_r2098169086 ## datafusion/core/src/physical_planner.rs: ## @@ -2711,6 +2724,47 @@ mod tests { assert_eq!(col.name(), "metric:avg"); } + +#[tokio::test] +

Re: [I] Doc changes to provide steps to run individual unit tests in Apache-Spark [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove closed issue #1751: Doc changes to provide steps to run individual unit tests in Apache-Spark URL: https://github.com/apache/datafusion-comet/issues/1751

Re: [PR] docs: Add instructions for running individual Spark SQL tests from sbt [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove merged PR #1752: URL: https://github.com/apache/datafusion-comet/pull/1752

Re: [PR] feat: support min/max for struct [datafusion]

2025-05-20 Thread via GitHub
chenkovsky commented on code in PR #15667: URL: https://github.com/apache/datafusion/pull/15667#discussion_r2097992883 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -619,6 +625,45 @@ fn min_batch(values: &ArrayRef) -> Result { }) } +fn min_max_batch_struct(arra

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
aharpervc commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2894713954 > Not sure I necessarily agree with this sentiment. I think if there are constructs in a file that aren't sql constructs, they should be stripped off from the file befor

Re: [PR] Handle union schema name coercion [datafusion]

2025-05-20 Thread via GitHub
gabotechs commented on code in PR #16064: URL: https://github.com/apache/datafusion/pull/16064#discussion_r2098183006 ## datafusion/core/src/physical_planner.rs: ## @@ -2711,6 +2724,47 @@ mod tests { assert_eq!(col.name(), "metric:avg"); } + +#[tokio::test] +

[PR] Improve the DML / DDL Documentation [datafusion]

2025-05-20 Thread via GitHub
alamb opened a new pull request, #16115: URL: https://github.com/apache/datafusion/pull/16115 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/16033 ## Rationale for this change While reviewing https://github.com/apache/d

Re: [PR] Revert use file schema in parquet pruning [datafusion]

2025-05-20 Thread via GitHub
alamb commented on code in PR #16086: URL: https://github.com/apache/datafusion/pull/16086#discussion_r2098220180 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -178,7 +182,7 @@ impl FileOpener for ParquetOpener { // Build predicates for this specific file

[PR] Minor: Add `Accumulator::return_type` to help with transition [datafusion]

2025-05-20 Thread via GitHub
alamb opened a new pull request, #16112: URL: https://github.com/apache/datafusion/pull/16112 - Draft as it builds on https://github.com/apache/datafusion/pull/15748/files ## Which issue does this PR close? - Follow up to https://github.com/apache/datafusion/pull/15911

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-05-20 Thread via GitHub
logan-keede commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2894390449 > QQuery 4 │ 589.62ms │671.68ms │ 1.14x slower < is this a cause of concern?

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-05-20 Thread via GitHub
MarcPerrier commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r2097962631 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +/

Re: [PR] [POC] feat: Add datafusion-storage [datafusion]

2025-05-20 Thread via GitHub
alamb commented on code in PR #15018: URL: https://github.com/apache/datafusion/pull/15018#discussion_r1994278280 ## datafusion/storage/src/write.rs: ## @@ -0,0 +1,79 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. S

[PR] Minor: Add `ScalarFunctionArgs::return_type` method [datafusion]

2025-05-20 Thread via GitHub
alamb opened a new pull request, #16113: URL: https://github.com/apache/datafusion/pull/16113 ## Which issue does this PR close? - Closes #. ## Rationale for this change - Similarly to https://github.com/apache/datafusion/pull/16112 I think it would be help

Re: [PR] Test Duration in aggregation `fuzz` tests [datafusion]

2025-05-20 Thread via GitHub
zhuqi-lucas commented on code in PR #16111: URL: https://github.com/apache/datafusion/pull/16111#discussion_r2097915073 ## test-utils/src/array_gen/random_data.rs: ## @@ -100,6 +106,15 @@ impl RandomNativeData for IntervalMonthDayNanoType { } } +// Restrict Duration(Seco

Re: [PR] chore(deps): bump apache-avro from 0.17.0 to 0.18.0 [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #16092: URL: https://github.com/apache/datafusion/pull/16092#issuecomment-2894613254 🤔 looks like a real test failure that needs to be investigated

Re: [I] [substrait] Build basic test suite to validate produced Substrait plans [datafusion]

2025-05-20 Thread via GitHub
gabotechs commented on issue #15069: URL: https://github.com/apache/datafusion/issues/15069#issuecomment-2894865344 > The only worry I have is that the project will not be completed and then the code to do Substrait validation would be abandoned At least at DataDog we have a big inter

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-05-20 Thread via GitHub
alamb commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r2098291930 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or m

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098402731 ## native/core/src/execution/planner.rs: ## @@ -884,7 +884,7 @@ impl PhysicalPlanner { func_name, fun_expr,

Re: [PR] fix: Add coercion rules for Float16 types [datafusion]

2025-05-20 Thread via GitHub
alamb merged PR #15816: URL: https://github.com/apache/datafusion/pull/15816

Re: [I] Type coercion does not handle `Float16` correctly [datafusion]

2025-05-20 Thread via GitHub
alamb closed issue #15815: Type coercion does not handle `Float16` correctly URL: https://github.com/apache/datafusion/issues/15815

Re: [PR] Use qualified names on DELETE selections [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #16033: URL: https://github.com/apache/datafusion/pull/16033#issuecomment-2895133818 The CI failure is unrelated to this PR - https://github.com/apache/datafusion/issues/16117 So merging in. Thanks @nuno-faria and @comphead

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2894965896 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2894965327 🤖: Benchmark completed Details ``` Comparing HEAD and concat_batches_for_sort Benchmark clickbench_extended.json --

Re: [PR] pretty-print CREATE TABLE statements [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
alamb merged PR #1854: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1854

[I] mac CI tests failing with `fatal runtime error: stack overflow` [datafusion]

2025-05-20 Thread via GitHub
alamb opened a new issue, #16117: URL: https://github.com/apache/datafusion/issues/16117 ### Describe the bug CI is failing on main and some PRs https://github.com/apache/datafusion/actions/runs/15138796927/job/42557036842 ``` thread 'tokio-runtime-worker' has over

Re: [PR] Use qualified names on DELETE selections [datafusion]

2025-05-20 Thread via GitHub
alamb merged PR #16033: URL: https://github.com/apache/datafusion/pull/16033

Re: [D] Indexing Support in DataFusion? [datafusion]

2025-05-20 Thread via GitHub
GitHub user Epicism deleted a comment on the discussion: Indexing Support in DataFusion? This is amazing! I can't wait to go through this. GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-13210271 This is an automaticall

Re: [D] Indexing Support in DataFusion? [datafusion]

2025-05-20 Thread via GitHub
GitHub user Epicism added a comment to the discussion: Indexing Support in DataFusion? This is amazing! I can't wait to go through this. GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-13210271

Re: [PR] chore(deps): bump testcontainers-modules from 0.11.6 to 0.12.0 [datafusion]

2025-05-20 Thread via GitHub
alamb closed pull request #16107: chore(deps): bump testcontainers-modules from 0.11.6 to 0.12.0 URL: https://github.com/apache/datafusion/pull/16107

Re: [PR] chore(deps): bump testcontainers-modules from 0.11.6 to 0.12.0 [datafusion]

2025-05-20 Thread via GitHub
dependabot[bot] commented on PR #16107: URL: https://github.com/apache/datafusion/pull/16107#issuecomment-2895243129 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version

Re: [PR] chore(deps): bump testcontainers from 0.23.3 to 0.24.0 [datafusion]

2025-05-20 Thread via GitHub
alamb merged PR #15989: URL: https://github.com/apache/datafusion/pull/15989

Re: [PR] feat: make error handling in indent explain consistent with that in tree [datafusion]

2025-05-20 Thread via GitHub
alamb merged PR #16097: URL: https://github.com/apache/datafusion/pull/16097

Re: [PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
codecov-commenter commented on PR #1757: URL: https://github.com/apache/datafusion-comet/pull/1757#issuecomment-2895250157 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1757?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Enable more complex type tests [datafusion-comet]

2025-05-20 Thread via GitHub
parthchandra commented on PR #1753: URL: https://github.com/apache/datafusion-comet/pull/1753#issuecomment-2895323808 Late approval. Thanks for the new tests!

Re: [PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on PR #1757: URL: https://github.com/apache/datafusion-comet/pull/1757#issuecomment-2895355782 `SQLConf.PARQUET_FIELD_ID_READ_ENABLED` is enabled in **all** Spark tests, so not sure what to do about this now.

Re: [PR] chore: Add `scanImpl` attribute to `CometScanExec` [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove merged PR #1746: URL: https://github.com/apache/datafusion-comet/pull/1746

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#issuecomment-2895677097 > Is it possible to have arrow-rs 55.1.0 in datafusion 48.0.0.? A performance improvement went in for int8/int16 which was as a result of the unsigned int issues we raised. Th

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-20 Thread via GitHub
TheBuilderJR commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2895687205 @kosiew that sounds good but please keep this working branch around on the `schema-adapter` branch. Maybe you can cherry-pick the changes onto a new branch when you break these

Re: [I] Move prepare/parameter handling tests into their own module [datafusion]

2025-05-20 Thread via GitHub
liamzwbao commented on issue #16056: URL: https://github.com/apache/datafusion/issues/16056#issuecomment-2896214930 take

[PR] [datafusion-spark] Implement `factorial` function [datafusion]

2025-05-20 Thread via GitHub
tlm365 opened a new pull request, #16125: URL: https://github.com/apache/datafusion/pull/16125 ## Which issue does this PR close? - Closes #16124 . ## Rationale for this change ## What changes are included in this PR? Implement spark-compatible `factori

[I] [datafusion-spark] Implement `factorial` function [datafusion]

2025-05-20 Thread via GitHub
tlm365 opened a new issue, #16124: URL: https://github.com/apache/datafusion/issues/16124 ### Is your feature request related to a problem or challenge? - Part of #15914 ### Describe the solution you'd like Implement spark-compatible [factorial](https://spark.apache.org

[PR] chore: Use pre created data for filter pushdown tests [datafusion]

2025-05-20 Thread via GitHub
comphead opened a new pull request, #16123: URL: https://github.com/apache/datafusion/pull/16123 ## Which issue does this PR close? - Closes #. ## Rationale for this change When working on #16062 I found random data generators based on `random` crate are prone to hav

Re: [I] commit 304488d3... (2025-02-05) broke JOIN ... USING("UPPERCASE_FIELD_NAME") [datafusion]

2025-05-20 Thread via GitHub
jfahne commented on issue #16120: URL: https://github.com/apache/datafusion/issues/16120#issuecomment-2895962861 take

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove merged PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710

Re: [PR] fix: correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#discussion_r2098991685 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1233,7 +1233,9 @@ abstract class ParquetReadSuite extends CometTestBas

Re: [PR] chore: Upgrade rand crate and some other minor crates [datafusion]

2025-05-20 Thread via GitHub
comphead commented on PR #16062: URL: https://github.com/apache/datafusion/pull/16062#issuecomment-2895914995 depends on #16123

Re: [PR] chore: Use materialized data for filter pushdown tests [datafusion]

2025-05-20 Thread via GitHub
comphead commented on code in PR #16123: URL: https://github.com/apache/datafusion/pull/16123#discussion_r2098950624 ## datafusion/core/tests/parquet/filter_pushdown.rs: ## @@ -32,50 +32,41 @@ use arrow::compute::concat_batches; use arrow::record_batch::RecordBatch; use datafu

Re: [PR] fix: default values for experimental native_datafusion scan [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098980947 ## native/core/src/execution/planner.rs: ## @@ -1108,6 +1108,44 @@ impl PhysicalPlanner { .map(|expr| self.create_expr(expr, Arc::clon

Re: [PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
comphead commented on code in PR #1757: URL: https://github.com/apache/datafusion-comet/pull/1757#discussion_r2098944769 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -41,36 +41,44 @@ import org.apache.comet.parquet.{CometParquetScan, SupportsComet}

Re: [PR] chore: Reenable nested types for CometFuzzTestSuite with int96 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on PR #1761: URL: https://github.com/apache/datafusion-comet/pull/1761#issuecomment-2896038701 Perhaps this PR should also remove this item from the compatibility guide? ``` - Reading legacy INT96 timestamps contained within complex types can produce different

Re: [PR] feat: Add support for `expm1` expression from `datafusion-spark` crate [datafusion-comet]

2025-05-20 Thread via GitHub
codecov-commenter commented on PR #1711: URL: https://github.com/apache/datafusion-comet/pull/1711#issuecomment-2896081936 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1711?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Reenable nested types for CometFuzzTestSuite with int96 [datafusion-comet]

2025-05-20 Thread via GitHub
codecov-commenter commented on PR #1761: URL: https://github.com/apache/datafusion-comet/pull/1761#issuecomment-2896101349 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1761?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] docs: Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-20 Thread via GitHub
comphead commented on code in PR #16119: URL: https://github.com/apache/datafusion/pull/16119#discussion_r2098954205 ## datafusion/core/src/lib.rs: ## @@ -488,16 +488,16 @@ //! DataFusion automatically runs each plan with multiple CPU cores using //! a [Tokio] [`Runtime`] as a

Re: [PR] chore: Reenable nested types for CometFuzzTestSuite with int96 [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on PR #1761: URL: https://github.com/apache/datafusion-comet/pull/1761#issuecomment-2895978064 This should be ready for review now.

Re: [PR] docs: Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-20 Thread via GitHub
comphead commented on code in PR #16119: URL: https://github.com/apache/datafusion/pull/16119#discussion_r2098957264 ## datafusion/core/src/lib.rs: ## @@ -311,9 +311,9 @@ //! ``` //! //! A [`TableProvider`] provides information for planning and -//! an [`ExecutionPlan`]s for

Re: [PR] pretty-print CREATE VIEW statements [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
iffyio merged PR #1855: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1855

Re: [PR] Handle optional datatypes properly in `CREATE FUNCTION` statements [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
iffyio merged PR #1826: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1826

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
iffyio commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2896488108 The illustration makes a lot of sense to me, thanks @aharpervc! I think we should be able to go with the current idea in this PR. Could you take a look at the conflicts in

[PR] Phillip/250521 fix sort unproject unparser upstream [datafusion]

2025-05-20 Thread via GitHub
phillipleblanc opened a new pull request, #16127: URL: https://github.com/apache/datafusion/pull/16127 ## Which issue does this PR close? - Closes #16126 ## Rationale for this change DataFusion turns aggregation computations from a LogicalPlan node into column references

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
iffyio commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2099256280 ## src/parser/mod.rs: ## @@ -475,6 +475,10 @@ impl<'a> Parser<'a> { if expecting_statement_delimiter && word.keyword == Keyword::E

Re: [PR] Set `TrackConsumersPool` as default in datafusion-cli [datafusion]

2025-05-20 Thread via GitHub
Copilot commented on code in PR #16081: URL: https://github.com/apache/datafusion/pull/16081#discussion_r2099329043 ## datafusion-cli/src/main.rs: ## @@ -169,9 +179,22 @@ async fn main_inner() -> Result<()> { if let Some(memory_limit) = args.memory_limit { // set m

Re: [PR] fix: coerce int96 resolution inside of list, struct, and map types [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #16058: URL: https://github.com/apache/datafusion/pull/16058#issuecomment-2893895362 > @alamb I'd like to go ahead and merge this one if there are no objections No objections here

Re: [PR] fix: coerce int96 resolution inside of list, struct, and map types [datafusion]

2025-05-20 Thread via GitHub
alamb merged PR #16058: URL: https://github.com/apache/datafusion/pull/16058

Re: [I] Parquet: coerce_int96 does not work for int96 in nested types [datafusion]

2025-05-20 Thread via GitHub
alamb closed issue #15763: Parquet: coerce_int96 does not work for int96 in nested types URL: https://github.com/apache/datafusion/issues/15763

Re: [PR] fix: coerce int96 resolution inside of list, struct, and map types [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #16058: URL: https://github.com/apache/datafusion/pull/16058#issuecomment-2893895602 Thanks everyone!

[PR] Schema adapter helper [datafusion]

2025-05-20 Thread via GitHub
kosiew opened a new pull request, #16108: URL: https://github.com/apache/datafusion/pull/16108 ## Which issue does this PR close? This is part of a series of PRs re-implementing #15295 to close #14657 by adding schema‐evolution support for listing‐based tables with nested structs in

[PR] chore(deps): bump testcontainers-modules from 0.11.6 to 0.12.0 [datafusion]

2025-05-20 Thread via GitHub
dependabot[bot] opened a new pull request, #16107: URL: https://github.com/apache/datafusion/pull/16107 Bumps [testcontainers-modules](https://github.com/testcontainers/testcontainers-rs-modules-community) from 0.11.6 to 0.12.0. Release notes Sourced from https://github.com/testco

Re: [PR] Schema adapter helper [datafusion]

2025-05-20 Thread via GitHub
kosiew commented on code in PR #16108: URL: https://github.com/apache/datafusion/pull/16108#discussion_r2097676700 ## datafusion/datasource/src/schema_adapter.rs: ## @@ -248,29 +267,11 @@ impl SchemaAdapter for DefaultSchemaAdapter { &self, file_schema: &Schema

[PR] Clean up ExternalSorter and use upstream converter [datafusion]

2025-05-20 Thread via GitHub
alamb opened a new pull request, #16109: URL: https://github.com/apache/datafusion/pull/16109 ## Which issue does this PR close? - Closes #. ## Rationale for this change While reviewing https://github.com/apache/arrow-rs/pull/7530 from @Dandandan I noticed some

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2893953391 ``` Benchmark clickbench_extended.json ┏━━┳┳━┳━━━┓ ┃ Query┃ HEA

Re: [PR] Clean up ExternalSorter and use upstream converter [datafusion]

2025-05-20 Thread via GitHub
alamb commented on code in PR #16109: URL: https://github.com/apache/datafusion/pull/16109#discussion_r2097692753 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -206,8 +201,6 @@ struct ExternalSorter { schema: SchemaRef, /// Sort expressions expr: Arc<[Physi

[I] `CollectLeft` / "right deep tree" optimization not triggered for join between 3 or more delta tables [datafusion]

2025-05-20 Thread via GitHub
aditanase opened a new issue, #16106: URL: https://github.com/apache/datafusion/issues/16106 ### Describe the bug I have a simple use case for a star-schema join between a facts table and 2 metadata tables (one of them small, another one larger but would still make sense to collect i

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-20 Thread via GitHub
kosiew commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2893829674 I propose to close this large PR and re-implement as: ## PR 1: Extract and test core SchemaAdapter helpers **Description** Pull the field‐mapping logic out of `Defaul
