Re: [I] Attach `Diagnostic` to "wrong number of arguments" error [datafusion]

2025-03-26 Thread via GitHub
Chen-Yuan-Lai commented on issue #14432: URL: https://github.com/apache/datafusion/issues/14432#issuecomment-2756926856 @eliaperantoni, sorry for the long delay. I pushed a rough implementation for the issue. I noticed that you are working on the `FnCallSpans` feature (#15276), which would

[PR] Attach diagnostic for wrong arg number error [datafusion]

2025-03-26 Thread via GitHub
Chen-Yuan-Lai opened a new pull request, #15451: URL: https://github.com/apache/datafusion/pull/15451 ## Which issue does this PR close? - Closes #14432 . ## Rationale for this change well explained in the issue. ## What changes are included in this PR?

Re: [I] Analysis to support`SortPreservingMerge` --> `ProgressiveEval` [datafusion]

2025-03-26 Thread via GitHub
xudong963 commented on issue #15191: URL: https://github.com/apache/datafusion/issues/15191#issuecomment-2756831956 > The only reason it is not needed here is because there are fewer files than `target_partitions`, so this will not work if we increase the number of files or reduce `target_p

Re: [PR] Add support for DISTINCT + ORDER BY in `ARRAY_AGG` [datafusion]

2025-03-26 Thread via GitHub
alamb merged PR #14413: URL: https://github.com/apache/datafusion/pull/14413 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [D] More thorough contribution guideline [datafusion]

2025-03-26 Thread via GitHub
GitHub user logan-keede edited a comment on the discussion: More thorough contribution guideline > iii. Collect feedback from downstream projects to reveal any possible design > issues Do we have any communication channel for collecting feedback, or announcing feature branches? @alamb What a

[PR] Docs: Formatting and Added Extra resources [datafusion]

2025-03-26 Thread via GitHub
2SpaceMasterRace opened a new pull request, #15450: URL: https://github.com/apache/datafusion/pull/15450 ## Which issue does this PR close? NIL. ## Rationale for this change This PR improves the overall quality and accessibility of the documentation by: - Enhancin

[PR] Fix roundtrip bug with empty projection in DataSourceExec [datafusion]

2025-03-26 Thread via GitHub
XiangpengHao opened a new pull request, #15449: URL: https://github.com/apache/datafusion/pull/15449 ## Which issue does this PR close? - Closes #. ## Rationale for this change The following test fails for the same reason as in #14116 ```rust #

Re: [PR] Docs: Added extra resources & fixed formatting to Concepts, Readings, Events section [datafusion]

2025-03-26 Thread via GitHub
2SpaceMasterRace closed pull request #15436: Docs: Added extra resources & fixed formatting to Concepts, Readings, Events section URL: https://github.com/apache/datafusion/pull/15436

Re: [PR] Docs: Added extra resources & fixed formatting to Concepts, Readings, Events section [datafusion]

2025-03-26 Thread via GitHub
2SpaceMasterRace commented on PR #15436: URL: https://github.com/apache/datafusion/pull/15436#issuecomment-2756575484 Oh I don't know why it keeps happening after I tried a bunch of methods. Thanks @alamb ! I'll use the commands you posted.

Re: [I] Improve performance sort TPCH q3 with Utf8Vew ( Sort-preserving merging on a single `Utf8View` ) [datafusion]

2025-03-26 Thread via GitHub
zhuqi-lucas commented on issue #15403: URL: https://github.com/apache/datafusion/issues/15403#issuecomment-2756301876 Submitted a PR for review; performance improves by about 40% for sort-tpch q3 with a single StringView column.

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
wForget commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2015410263 ## spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala: ## @@ -55,15 +55,48 @@ trait ShimCometScanExec { protected def isNeede

Re: [I] Feature: support cast `date` to `timestamp` with tz [datafusion]

2025-03-26 Thread via GitHub
friendlymatthew commented on issue #14638: URL: https://github.com/apache/datafusion/issues/14638#issuecomment-2756504786 > I believe the arrow update is in the [arrow 54.3.0 release](https://github.com/apache/arrow-rs/releases/tag/54.3.0) so once DF is upgraded to that release we can verif

[PR] Minor: fix doc for `FileGroupPartitioner` [datafusion]

2025-03-26 Thread via GitHub
xudong963 opened a new pull request, #15448: URL: https://github.com/apache/datafusion/pull/15448 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? Fix doc ## Are thes

Re: [PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-03-26 Thread via GitHub
friendlymatthew commented on code in PR #15361: URL: https://github.com/apache/datafusion/pull/15361#discussion_r2015455219 ## datafusion/functions/src/datetime/to_char.rs: ## @@ -277,7 +282,25 @@ fn _to_char_array(args: &[ColumnarValue]) -> Result { let result = forma

Re: [PR] perf: Reuse row converter during sort [datafusion]

2025-03-26 Thread via GitHub
2010YOUY01 commented on PR #15302: URL: https://github.com/apache/datafusion/pull/15302#issuecomment-2756486624 The test submodule issue should be fixed.

Re: [PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-03-26 Thread via GitHub
friendlymatthew commented on code in PR #15361: URL: https://github.com/apache/datafusion/pull/15361#discussion_r2015534358 ## datafusion/functions/src/datetime/to_char.rs: ## @@ -277,7 +282,25 @@ fn _to_char_array(args: &[ColumnarValue]) -> Result { let result = forma

Re: [PR] Add `downcast_to_source` method for `DataSourceExec` [datafusion]

2025-03-26 Thread via GitHub
xudong963 commented on PR #15416: URL: https://github.com/apache/datafusion/pull/15416#issuecomment-2756308249 @alamb Thanks for your review!

Re: [I] Why does it report an error when building the `branch-28` branch with `cargo build`? [datafusion]

2025-03-26 Thread via GitHub
mustdo-afk closed issue #15429: Why does it report an error when building the `branch-28` branch with `cargo build`? URL: https://github.com/apache/datafusion/issues/15429

Re: [I] Why does it report an error when building the `branch-28` branch with `cargo build`? [datafusion]

2025-03-26 Thread via GitHub
mustdo-afk commented on issue #15429: URL: https://github.com/apache/datafusion/issues/15429#issuecomment-2756318054 > This is a known issue - `chrono v0.4.40` broke a bunch of `arrow-rs` releases. The only fix I know of to compile the older versions is to edit the lockfile to use `chrono v

Re: [PR] Add `downcast_to_source` method for `DataSourceExec` [datafusion]

2025-03-26 Thread via GitHub
xudong963 merged PR #15416: URL: https://github.com/apache/datafusion/pull/15416

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2025-03-26 Thread via GitHub
acking-you commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2756305533 Thank you for your guidance and advice @alamb . I will try to work on these later today (I might be a bit busy right now).
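Here is a minimal sketch of the short-circuit idea discussed in issue #11212, written in plain Rust. It evaluates the left side of `AND`/`OR` first and skips the right side when the left side already decides every row. The `BoolColumn` type and the `eval_and`/`eval_or` functions are hypothetical stand-ins for an Arrow `BooleanArray` and `BinaryExpr::evaluate`. They are not DataFusion's API.

```rust
/// Hypothetical, simplified boolean column: `None` represents SQL NULL.
type BoolColumn = Vec<Option<bool>>;

/// Evaluates `lhs AND rhs`. If every left value is `false`, the result is
/// all `false` no matter what `rhs` holds, so `rhs` is never computed.
fn eval_and<F: FnOnce() -> BoolColumn>(lhs: BoolColumn, rhs: F) -> BoolColumn {
    if lhs.iter().all(|v| *v == Some(false)) {
        return lhs; // short-circuit: the right-hand closure is not called
    }
    let rhs = rhs();
    lhs.into_iter()
        .zip(rhs)
        .map(|(l, r)| match (l, r) {
            (Some(false), _) | (_, Some(false)) => Some(false),
            (Some(true), Some(true)) => Some(true),
            _ => None, // three-valued logic: NULL AND true = NULL
        })
        .collect()
}

/// Evaluates `lhs OR rhs`, skipping `rhs` when every left value is `true`.
fn eval_or<F: FnOnce() -> BoolColumn>(lhs: BoolColumn, rhs: F) -> BoolColumn {
    if lhs.iter().all(|v| *v == Some(true)) {
        return lhs;
    }
    let rhs = rhs();
    lhs.into_iter()
        .zip(rhs)
        .map(|(l, r)| match (l, r) {
            (Some(true), _) | (_, Some(true)) => Some(true),
            (Some(false), Some(false)) => Some(false),
            _ => None, // three-valued logic: NULL OR false = NULL
        })
        .collect()
}
```

The skip works because the right-hand side is passed as a closure. A real implementation would also have to weigh the cost of the `all` check against what it saves, so that queries which never short-circuit do not slow down.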

Re: [PR] feat: pushdown filter for native_iceberg_compat [datafusion-comet]

2025-03-26 Thread via GitHub
wForget commented on code in PR #1566: URL: https://github.com/apache/datafusion-comet/pull/1566#discussion_r2015421314 ## spark/src/main/scala/org/apache/comet/parquet/SourceFilterSerde.scala: ## @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] fix: Refactor CometScanRule and fix bugs [datafusion-comet]

2025-03-26 Thread via GitHub
andygrove commented on PR #1483: URL: https://github.com/apache/datafusion-comet/pull/1483#issuecomment-2738005266 Thanks for the review @parthchandra and @mbutrovich

Re: [PR] feat: pushdown filter for native_iceberg_compat [datafusion-comet]

2025-03-26 Thread via GitHub
wForget commented on code in PR #1566: URL: https://github.com/apache/datafusion-comet/pull/1566#discussion_r2015410919 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1460,6 +1460,33 @@ class ParquetReadV1Suite extends ParquetReadSuite with Adap

[PR] Improve performance sort TPCH q3 with Utf8Vew ( Sort-preserving mergi… [datafusion]

2025-03-26 Thread via GitHub
zhuqi-lucas opened a new pull request, #15447: URL: https://github.com/apache/datafusion/pull/15447 …ng on a single Utf8View ) ## Which issue does this PR close? - Closes [#15403](https://github.com/apache/datafusion/issues/15403) ## Rationale for this change Impro
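A sort-preserving merge is a k-way merge of inputs that are already sorted. The sketch below is std-only and assumes each partition is a plain `Vec<String>`. The real operator merges Arrow batches, and in this PR it merges a single `Utf8View` column. The name `sort_preserving_merge` is hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merges already-sorted partitions into one sorted output.
fn sort_preserving_merge(partitions: Vec<Vec<String>>) -> Vec<String> {
    let mut iters: Vec<std::vec::IntoIter<String>> =
        partitions.into_iter().map(|p| p.into_iter()).collect();

    // `Reverse` turns BinaryHeap's max-heap into a min-heap keyed on
    // (value, partition index); the index breaks ties deterministically.
    let mut heap = BinaryHeap::new();
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(v) = it.next() {
            heap.push(Reverse((v, i)));
        }
    }

    let mut out = Vec::new();
    while let Some(Reverse((v, i))) = heap.pop() {
        out.push(v);
        // Refill from the partition that just produced the smallest value.
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next, i)));
        }
    }
    out
}
```

Each pop compares whole strings. That comparison cost is why the key representation, such as `Utf8View` versus converted rows, matters for this operator.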

Re: [I] Migrate subtrait tests to `insta` [datafusion]

2025-03-26 Thread via GitHub
qstommyshu commented on issue #15398: URL: https://github.com/apache/datafusion/issues/15398#issuecomment-2756274122 > Hi [@blaginin](https://github.com/blaginin?rgh-link-date=2025-03-26T21%3A33%3A40.000Z) and [@alamb](https://github.com/alamb?rgh-link-date=2025-03-26T21%3A33%3A40.000Z)

[PR] Migrate optimizer tests to insta [datafusion]

2025-03-26 Thread via GitHub
qstommyshu opened a new pull request, #15446: URL: https://github.com/apache/datafusion/pull/15446 ## Which issue does this PR close? - Closes #15396 . ## Rationale for this change ## What changes are included in this PR? Migrated tests in `data

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
wForget commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2015382561 ## .github/workflows/spark_sql_test.yml: ## @@ -45,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -spark-versi

Re: [PR] Support binary temporal arithmetic with integers [datafusion]

2025-03-26 Thread via GitHub
github-actions[bot] closed pull request #13741: Support binary temporal arithmetic with integers URL: https://github.com/apache/datafusion/pull/13741

Re: [PR] Always add round robin repartitioning to leaves (data sources), benefitting unbalanced / small datasets [datafusion]

2025-03-26 Thread via GitHub
github-actions[bot] closed pull request #13707: Always add round robin repartitioning to leaves (data sources), benefitting unbalanced / small datasets URL: https://github.com/apache/datafusion/pull/13707

Re: [PR] Migrate optimizer tests to insta [datafusion]

2025-03-26 Thread via GitHub
qstommyshu commented on PR #15445: URL: https://github.com/apache/datafusion/pull/15445#issuecomment-2756254291 Ah, I accidentally merged a bunch of code from the main branch... I think it is easier for me to resolve all of this by just creating another branch and PR.

Re: [PR] Migrate datasource tests to insta [datafusion]

2025-03-26 Thread via GitHub
xudong963 merged PR #15258: URL: https://github.com/apache/datafusion/pull/15258

Re: [PR] Migrate optimizer tests to insta [datafusion]

2025-03-26 Thread via GitHub
qstommyshu closed pull request #15445: Migrate optimizer tests to insta URL: https://github.com/apache/datafusion/pull/15445

Re: [I] Move `optimize_subquery_sort` into optimizer as a new rule `EliminateSort` [datafusion]

2025-03-26 Thread via GitHub
irenjj commented on issue #15435: URL: https://github.com/apache/datafusion/issues/15435#issuecomment-2756228364 > > Using `datafusion-optimizer` in `datafusion-sql` can lead to dependency issue: > > ``` > > thread 'main' panicked at src/main.rs:84:9: > > circular dependency detecte

[PR] Migrate optimizer tests to insta [datafusion]

2025-03-26 Thread via GitHub
qstommyshu opened a new pull request, #15445: URL: https://github.com/apache/datafusion/pull/15445 ## Which issue does this PR close? - Closes #15396 . ## Rationale for this change ## What changes are included in this PR? Migrated tests in `data

Re: [I] Attach `Diagnostic` to "wrong number of arguments" error [datafusion]

2025-03-26 Thread via GitHub
Chen-Yuan-Lai commented on issue #14432: URL: https://github.com/apache/datafusion/issues/14432#issuecomment-2756209133 @prowang01 Sure! Feel free to reassign the issue

Re: [I] Spark SQL test failures in native_iceberg_compat mode [datafusion-comet]

2025-03-26 Thread via GitHub
andygrove commented on issue #1542: URL: https://github.com/apache/datafusion-comet/issues/1542#issuecomment-2744045497 I'm looking into the core3 `row index generation` errors. At least one of them is failing with NPE in Comet code: ``` Caused by: java.lang.NullPointerException

Re: [I] Move `optimize_subquery_sort` into optimizer as a new rule `EliminateSort` [datafusion]

2025-03-26 Thread via GitHub
jayzhan211 commented on issue #15435: URL: https://github.com/apache/datafusion/issues/15435#issuecomment-2756154926 > Using `datafusion-optimizer` in `datafusion-sql` can lead to dependency issue: > > ``` > thread 'main' panicked at src/main.rs:84:9: > circular dependency detec

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015100172 ## datafusion/datasource-parquet/src/source.rs: ## @@ -259,6 +261,8 @@ pub struct ParquetSource { pub(crate) metrics: ExecutionPlanMetricsSet, /// Optio

Re: [PR] Enforce JOIN plan to require condition [datafusion]

2025-03-26 Thread via GitHub
comphead commented on code in PR #15334: URL: https://github.com/apache/datafusion/pull/15334#discussion_r2009201610 ## datafusion/sqllogictest/test_files/join.slt.part: ## @@ -625,6 +625,24 @@ FROM t1 11 11 11 +# join condition is required +# TODO: query error join con

Re: [PR] Update concepts-readings-events.md [datafusion]

2025-03-26 Thread via GitHub
berkaysynnada merged PR #15440: URL: https://github.com/apache/datafusion/pull/15440

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-26 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2004029227 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") }

[PR] chore(deps): bump tokio from 1.43.0 to 1.44.1 [datafusion]

2025-03-26 Thread via GitHub
dependabot[bot] opened a new pull request, #15347: URL: https://github.com/apache/datafusion/pull/15347 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.43.0 to 1.44.1. Release notes Sourced from [tokio's releases](https://github.com/tokio-rs/tokio/releases). Tokio v1.4

Re: [PR] Add `downcast_to_source` method for `DataSourceExec` [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15416: URL: https://github.com/apache/datafusion/pull/15416#discussion_r2014969803 ## docs/source/library-user-guide/upgrading.md: ## @@ -129,6 +129,20 @@ if let Some(datasource_exec) = plan.as_any().downcast_ref::() { # */ ``` +There's also a

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015111827 ## datafusion/datasource-parquet/src/source.rs: ## @@ -587,4 +578,17 @@ impl FileSource for ParquetSource { } } } + +fn supports_dy

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015103237 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1847,6 +1848,28 @@ mod tests { writer.close().unwrap(); } +fn write_file_nu

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015098643 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Little changes "cache control" [datafusion]

2025-03-26 Thread via GitHub
alamb closed pull request #14611: Little changes "cache control" URL: https://github.com/apache/datafusion/pull/14611

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015083952 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is us
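The idea behind dynamic TopK filters, as the PR title and the comments describe it, can be sketched for `ORDER BY x ASC LIMIT k`. Once the heap holds k values, its maximum becomes a threshold, and a scan can use that threshold to skip rows or whole row groups. `TopK`, `threshold`, and `can_skip` below are hypothetical illustrations over `i64` values, not the PR's code.

```rust
use std::collections::BinaryHeap;

/// Hypothetical TopK state for `ORDER BY x ASC LIMIT k`: a max-heap that
/// holds the k smallest values seen so far.
struct TopK {
    k: usize,
    heap: BinaryHeap<i64>,
}

impl TopK {
    fn new(k: usize) -> Self {
        TopK { k, heap: BinaryHeap::new() }
    }

    fn insert(&mut self, value: i64) {
        if self.heap.len() < self.k {
            self.heap.push(value);
        } else if let Some(&max) = self.heap.peek() {
            // Replace the current largest value only if the new one is smaller.
            if value < max {
                self.heap.pop();
                self.heap.push(value);
            }
        }
    }

    /// Once the heap is full, no row with `x >= threshold` can enter the result.
    fn threshold(&self) -> Option<i64> {
        if self.heap.len() == self.k { self.heap.peek().copied() } else { None }
    }
}

/// Hypothetical pruning check against a file's or row group's minimum statistic.
fn can_skip(min_value: i64, threshold: Option<i64>) -> bool {
    matches!(threshold, Some(t) if min_value >= t)
}
```

The threshold only tightens as more rows arrive, so a filter pushed into the scan stays valid once it is published. It can only become more selective over time.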

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-03-26 Thread via GitHub
Kontinuation commented on PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#issuecomment-2755875567 I have also refactored the handling for repartitioning to a single partition (#1453), this avoids saturating the off-heap memory and fixes the OOM in the TPC-DS test. We ca

Re: [PR] Saner handling of nulls inside arrays [datafusion]

2025-03-26 Thread via GitHub
thinkharderdev commented on PR #15149: URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2739947657 > > I recommend following whatever DuckDB (or postgres do) -- there is not much value in DataFusion having different semantics from other systems > > * DuckDB doesn't ha

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-03-26 Thread via GitHub
Kontinuation commented on PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#issuecomment-2755868919 Reran TPC-H SF=100 on an m7i.4xlarge instance with `master = local[8]`. Most of the disk accesses hit the OS cache, so the slow EBS didn't affect the query performance too

Re: [PR] Add `FileScanConfigBuilder` [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15352: URL: https://github.com/apache/datafusion/pull/15352#discussion_r2015008401 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -326,14 +544,15 @@ impl FileScanConfig { /// # Parameters: /// * `object_store_url`: See [`Self::ob

Re: [PR] Migrate subtrait tests to insta [datafusion]

2025-03-26 Thread via GitHub
qstommyshu commented on PR #15444: URL: https://github.com/apache/datafusion/pull/15444#issuecomment-2755817539 Just to clarify, this PR is **NOT FINISHED YET**. I'm still awaiting an answer about the [scope](https://github.com/apache/datafusion/issues/15398#issuecomment-2755795572) of thi

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
YanivKunda commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2015042394 ## .github/workflows/spark_sql_test.yml: ## @@ -45,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -spark-ve

[PR] add cargo insta to dev dependencies [datafusion]

2025-03-26 Thread via GitHub
qstommyshu opened a new pull request, #15444: URL: https://github.com/apache/datafusion/pull/15444 ## Which issue does this PR close? - Closes #15398. ## Rationale for this change ## What changes are included in this PR? Migrated tests in da

Re: [PR] chore: Upgrade `rand` crate and some other minor crates [datafusion]

2025-03-26 Thread via GitHub
comphead commented on PR #14967: URL: https://github.com/apache/datafusion/pull/14967#issuecomment-2754648315 Depends on https://github.com/apache/arrow-rs/issues/7084

Re: [I] Migrate subtrait tests to `insta` [datafusion]

2025-03-26 Thread via GitHub
qstommyshu commented on issue #15398: URL: https://github.com/apache/datafusion/issues/15398#issuecomment-2755795572 Hi @blaginin and @alamb Just want to confirm the example test files are really just "examples", right? I also see there are more files under the *subtrait* test cases

Re: [PR] refactor(hash_join): Move JoinHashMap to separate mod [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #15419: URL: https://github.com/apache/datafusion/pull/15419#issuecomment-2755757551 🚀

Re: [PR] Add `downcast_to_source` method for `DataSourceExec` [datafusion]

2025-03-26 Thread via GitHub
xudong963 commented on code in PR #15416: URL: https://github.com/apache/datafusion/pull/15416#discussion_r2015023075 ## docs/source/library-user-guide/upgrading.md: ## @@ -129,6 +129,20 @@ if let Some(datasource_exec) = plan.as_any().downcast_ref::() { # */ ``` +There's al

Re: [PR] add manual trigger for extended tests in pull requests [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #14331: URL: https://github.com/apache/datafusion/pull/14331#issuecomment-2755784817 BTW @danila-b has a solution here: https://github.com/apache/datafusion/pull/15101

Re: [I] A 'cache control' header is missing or empty webkit [datafusion]

2025-03-26 Thread via GitHub
alamb commented on issue #14542: URL: https://github.com/apache/datafusion/issues/14542#issuecomment-2755782575 > the emogi images is not should be fix to line .the images has been wraping and frontend not looks like good What images are you referring to? Maybe you can provide

Re: [PR] Little changes "cache control" [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #14611: URL: https://github.com/apache/datafusion/pull/14611#issuecomment-2755780836 Given it is not clear what problem it is solving and it has been dormant for a while, I am going to close it. Please reopen when we can better articulate why this change is needed

Re: [PR] feat: implement GroupsAccumulator for `count(DISTINCT)` aggr [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #15324: URL: https://github.com/apache/datafusion/pull/15324#issuecomment-2755778699 I think this is still a work in progress, so marking it as a draft to clean up the review queue
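A grouped `count(DISTINCT)` accumulator has to track one set of seen values per group index, and that list of sets grows as new groups appear. The sketch below models this with std only, over `i64` values. It is hypothetical and does not implement DataFusion's actual `GroupsAccumulator` trait or its memory accounting.

```rust
use std::collections::HashSet;

/// Hypothetical grouped COUNT(DISTINCT) state: one hash set per group index.
struct CountDistinctGroups {
    seen: Vec<HashSet<i64>>,
}

impl CountDistinctGroups {
    fn new() -> Self {
        Self { seen: Vec::new() }
    }

    /// `group_indices[i]` is the group that `values[i]` belongs to. The total
    /// number of groups may grow between batches, so the state is resized first.
    fn update_batch(&mut self, values: &[Option<i64>], group_indices: &[usize], total_num_groups: usize) {
        if self.seen.len() < total_num_groups {
            self.seen.resize_with(total_num_groups, HashSet::new);
        }
        for (value, &group) in values.iter().zip(group_indices) {
            // COUNT(DISTINCT) ignores NULLs.
            if let Some(v) = value {
                self.seen[group].insert(*v);
            }
        }
    }

    /// One distinct count per group, in group-index order.
    fn evaluate(&self) -> Vec<i64> {
        self.seen.iter().map(|s| s.len() as i64).collect()
    }
}
```

Keying the state by group index rather than by group value is the point of the grouped-accumulator design: the hash aggregation operator has already mapped each row to a dense index.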

Re: [PR] Add support for DISTINCT + ORDER BY in `ARRAY_AGG` [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #14413: URL: https://github.com/apache/datafusion/pull/14413#issuecomment-2755776567 🚀

Re: [I] array_agg cannot perform both distinct and order_by [datafusion]

2025-03-26 Thread via GitHub
alamb closed issue #12371: array_agg cannot perform both distinct and order_by URL: https://github.com/apache/datafusion/issues/12371
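The semantics requested in issue #12371, `ARRAY_AGG(DISTINCT x ORDER BY x)`, can be illustrated for a single group: remove the duplicates, then order what remains. The hypothetical sketch below uses a `BTreeSet`, which does both at once. NULL handling and ordering by an expression other than `x` are out of scope here.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-group result of ARRAY_AGG(DISTINCT x ORDER BY x [DESC]).
fn array_agg_distinct_ordered(values: &[i64], descending: bool) -> Vec<i64> {
    // A BTreeSet drops duplicates and iterates in ascending order.
    let distinct: BTreeSet<i64> = values.iter().copied().collect();
    if descending {
        distinct.into_iter().rev().collect()
    } else {
        distinct.into_iter().collect()
    }
}
```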

Re: [PR] refactor(hash_join): Move JoinHashMap to separate mod [datafusion]

2025-03-26 Thread via GitHub
alamb merged PR #15419: URL: https://github.com/apache/datafusion/pull/15419

Re: [PR] fix: Unconditionally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15242: URL: https://github.com/apache/datafusion/pull/15242#discussion_r2014999225 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2679,24 +2679,16 @@ impl Union { Ok(Union { inputs, schema }) } -/// When constructing a `UNI

Re: [PR] refactor: Use SpillManager for all spilling scenarios [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #15405: URL: https://github.com/apache/datafusion/pull/15405#issuecomment-2755747018 Thank you @2010YOUY01 and @comphead

Re: [I] Use spill manager in sort merge join [datafusion]

2025-03-26 Thread via GitHub
alamb closed issue #15400: Use spill manager in sort merge join URL: https://github.com/apache/datafusion/issues/15400

Re: [PR] minor: Add new crates to labeler [datafusion]

2025-03-26 Thread via GitHub
alamb merged PR #15426: URL: https://github.com/apache/datafusion/pull/15426

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
kazuyukitanimura commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2014966200 ## .github/workflows/spark_sql_test.yml: ## @@ -45,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -sp

Re: [PR] perf: Reuse row converter during sort [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #15302: URL: https://github.com/apache/datafusion/pull/15302#issuecomment-2755694280 There appears to be a change to the testing pin in this PR as well: ![Screenshot 2025-03-26 at 4 37 14  PM](https://github.com/user-attachments/assets/c5cfb049-6e43-44ef-8d61-44e4

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2025-03-26 Thread via GitHub
alamb commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2755686534 Thank you for bringing this up again @acking-you > If we can optimize the specialized query you mentioned and not slowing down other queries, it would be nice to have it.

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #15427: URL: https://github.com/apache/datafusion/pull/15427#issuecomment-2755661874 > I think one issue with the current approach is that loading from env will break. this is a good call -- I will fix that

Re: [I] Scalars are too verbose in column name output [datafusion]

2025-03-26 Thread via GitHub
alamb commented on issue #15395: URL: https://github.com/apache/datafusion/issues/15395#issuecomment-2755667376 > always rendering data in a more compact way (the first option from my list) - I think it is a better choice too The challenge is that it will change the schema of

Re: [I] Feature: support cast `date` to `timestamp` with tz [datafusion]

2025-03-26 Thread via GitHub
Omega359 commented on issue #14638: URL: https://github.com/apache/datafusion/issues/14638#issuecomment-2755651138 I believe the arrow update is in the [arrow 54.3.0 release](https://github.com/apache/arrow-rs/releases/tag/54.3.0) so once DF is upgraded to that release we can verify it in D

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-26 Thread via GitHub
blaginin commented on PR #15427: URL: https://github.com/apache/datafusion/pull/15427#issuecomment-2755465857 I think one issue with the current approach is that loading from env will break. before: https://github.com/user-attachments/assets/bef857c6-7fa8-4852-96ea-fe7fba39cc97

Re: [PR] Add "end to end parquet reading test" for WASM [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15362: URL: https://github.com/apache/datafusion/pull/15362#discussion_r200920 ## datafusion/wasmtest/src/lib.rs: ## @@ -185,26 +206,56 @@ mod test { #[wasm_bindgen_test(unsupported = tokio::test)] async fn test_parquet_write() { -

Re: [I] Snowflake COPY INTO fails to parse with a semicolon [datafusion-sqlparser-rs]

2025-03-26 Thread via GitHub
tv42 closed issue #1519: Snowflake COPY INTO fails to parse with a semicolon URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1519

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-26 Thread via GitHub
blaginin commented on code in PR #15427: URL: https://github.com/apache/datafusion/pull/15427#discussion_r2014812090 ## datafusion-cli/tests/cli_integration.rs: ## @@ -74,6 +75,31 @@ fn cli_quick_test<'a>( assert_cmd_snapshot!(cmd); } +#[rstest] Review Comment: this

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-26 Thread via GitHub
blaginin commented on code in PR #15427: URL: https://github.com/apache/datafusion/pull/15427#discussion_r2014804810 ## datafusion-cli/tests/cli_integration.rs: ## @@ -74,6 +75,31 @@ fn cli_quick_test<'a>( assert_cmd_snapshot!(cmd); } +#[rstest] Review Comment: proba

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
YanivKunda commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2014801685 ## .github/workflows/spark_sql_test.yml: ## @@ -45,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -spark-ve

Re: [PR] feat: pushdown filter for native_iceberg_compat [datafusion-comet]

2025-03-26 Thread via GitHub
kazuyukitanimura commented on code in PR #1566: URL: https://github.com/apache/datafusion-comet/pull/1566#discussion_r2014788972 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1460,6 +1460,33 @@ class ParquetReadV1Suite extends ParquetReadSuite w

[I] `TableProvider` -> `Constraints` doc refers to nonexistent `new_from_table_constraints` [datafusion]

2025-03-26 Thread via GitHub
tv42 opened a new issue, #15443: URL: https://github.com/apache/datafusion/issues/15443 ### Describe the bug https://docs.rs/datafusion/46.0.1/datafusion/common/struct.Constraints.html#method.new_unverified > Users should use the `empty` or `new_from_table_constraints` function

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
kazuyukitanimura commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2014776467 ## .github/workflows/spark_sql_test.yml: ## @@ -45,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -sp

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-26 Thread via GitHub
goldmedal commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2014723481 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -316,6 +326,71 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] Support bounds evaluation for temporal data types [datafusion]

2025-03-26 Thread via GitHub
ch-sc commented on code in PR #14523: URL: https://github.com/apache/datafusion/pull/14523#discussion_r2014120634 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -902,6 +960,15 @@ pub fn apply_operator(op: &Operator, lhs: &Interval, rhs: &Interval) -> Result lhs.sub

Re: [PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-26 Thread via GitHub
comphead commented on code in PR #1550: URL: https://github.com/apache/datafusion-comet/pull/1550#discussion_r2014693425 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2722,7 +2721,11 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [I] Attach `Diagnostic` to "wrong number of arguments" error [datafusion]

2025-03-26 Thread via GitHub
prowang01 commented on issue #14432: URL: https://github.com/apache/datafusion/issues/14432#issuecomment-2755187725 Hi! I'm currently preparing my GSoC 2025 application and would love to contribute to this issue as a warm-up task. I understand this one involves attaching a `Diagnostic` to t

Re: [I] Scalars are too verbose in column name output [datafusion]

2025-03-26 Thread via GitHub
blaginin commented on issue #15395: URL: https://github.com/apache/datafusion/issues/15395#issuecomment-2755180609 > If we have column type, we don't need to display type for inner elements. Maybe we can work on column type first? Thank you!!! That's fair, I've created a separate tick

Re: [I] Add an option to display column types in the table [datafusion]

2025-03-26 Thread via GitHub
blaginin commented on issue #15442: URL: https://github.com/apache/datafusion/issues/15442#issuecomment-2755166494 take

Re: [I] March 17, 2025: This week(s) in DataFusion [datafusion]

2025-03-26 Thread via GitHub
alamb commented on issue #15269: URL: https://github.com/apache/datafusion/issues/15269#issuecomment-2755022667 And another blog: https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0/ (thanks @oznur-synnada !)

Re: [I] Blog for DataFusion 46.0.0 [datafusion]

2025-03-26 Thread via GitHub
alamb closed issue #15053: Blog for DataFusion 46.0.0 URL: https://github.com/apache/datafusion/issues/15053

Re: [PR] Blog post for DataFusion 46.0.0 [datafusion-site]

2025-03-26 Thread via GitHub
alamb commented on PR #64: URL: https://github.com/apache/datafusion-site/pull/64#issuecomment-2755018366 Thanks @oznur-synnada !

Re: [PR] FIX : some benchmarks are failing [datafusion]

2025-03-26 Thread via GitHub
getChan commented on code in PR #15367: URL: https://github.com/apache/datafusion/pull/15367#discussion_r2014540154 ## datafusion/core/benches/distinct_query_sql.rs: ## @@ -144,59 +141,50 @@ pub async fn create_context_sampled_data( } fn criterion_benchmark_limited_distinct_

Re: [PR] Blog post for DataFusion 46.0.0 [datafusion-site]

2025-03-26 Thread via GitHub
kevinjqliu commented on PR #64: URL: https://github.com/apache/datafusion-site/pull/64#issuecomment-2754908243 💯 https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0/

Re: [I] Move `optimize_subquery_sort` into optimizer as a new rule `EliminateSort` [datafusion]

2025-03-26 Thread via GitHub
irenjj commented on issue #15435: URL: https://github.com/apache/datafusion/issues/15435#issuecomment-2754833600 Using `datafusion-optimizer` in `datafusion-sql` can lead to dependency issue: ``` thread 'main' panicked at src/main.rs:84:9: circular dependency detected from datafusio

[PR] chore: Move optimize_subquery_sort into optimizer as a new rule Elimi… [datafusion]

2025-03-26 Thread via GitHub
irenjj opened a new pull request, #15441: URL: https://github.com/apache/datafusion/pull/15441 …nateSort ## Which issue does this PR close? - Closes #15435 ## Rationale for this change ## What changes are included in this PR? ## Are t
