Re: [PR] TEST: troubleshoot CI errors [datafusion]

2025-03-26 Thread via GitHub
kosiew closed pull request #15417: TEST: troubleshoot CI errors URL: https://github.com/apache/datafusion/pull/15417 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscr

Re: [I] Improve Spill Performance: `mmap` the spill files [datafusion]

2025-03-26 Thread via GitHub
zebsme commented on issue #15321: URL: https://github.com/apache/datafusion/issues/15321#issuecomment-2754061369 Hi, your example turns the mmap into a buffer, which is not entirely right, because the drop method would be skipped when FileDecoder drops.

Re: [I] [DISCUSS] Switch to `tree` explain by default [datafusion]

2025-03-26 Thread via GitHub
alamb commented on issue #15343: URL: https://github.com/apache/datafusion/issues/15343#issuecomment-2753832474 Here is a proposed PR: - https://github.com/apache/datafusion/pull/15427

Re: [PR] Docs: Added extra resources & fixed formatting to Concepts, Readings, Events section [datafusion]

2025-03-26 Thread via GitHub
berkaysynnada commented on PR #15424: URL: https://github.com/apache/datafusion/pull/15424#issuecomment-2754068076 Thanks @2SpaceMasterRace. Can you revert the submodule changes?

Re: [PR] Triggering extended tests through PR comment [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15101: URL: https://github.com/apache/datafusion/pull/15101#discussion_r2013788126 ## .github/workflows/extended.yml: ## @@ -127,4 +145,44 @@ jobs: cargo test --features backtrace --profile release-nonlto --test sqllogictests -- --inclu

Re: [PR] Fix link to Volcano paper [datafusion]

2025-03-26 Thread via GitHub
xudong963 merged PR #15437: URL: https://github.com/apache/datafusion/pull/15437

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2754554854 > Thank you @adriangb, this looks very exciting. I'd also like to review this in detail, especially from the perspective of API applicability for join filter pushdowns(like starburs

Re: [I] Can we add `udtf` to `FunctionRegistry`? [datafusion]

2025-03-26 Thread via GitHub
Omega359 commented on issue #15095: URL: https://github.com/apache/datafusion/issues/15095#issuecomment-2754570772 I can't see why not? Seems like it should be a simple change.

Re: [I] Move `optimize_subquery_sort` into optimizer as a new rule `EliminateSort` [datafusion]

2025-03-26 Thread via GitHub
irenjj commented on issue #15435: URL: https://github.com/apache/datafusion/issues/15435#issuecomment-2754833600 Using `datafusion-optimizer` in `datafusion-sql` can lead to dependency issue: ``` thread 'main' panicked at src/main.rs:84:9: circular dependency detected from datafusio

Re: [PR] Blog post for DataFusion 46.0.0 [datafusion-site]

2025-03-26 Thread via GitHub
oznur-synnada commented on PR #64: URL: https://github.com/apache/datafusion-site/pull/64#issuecomment-2754708985 @alamb This has been merged and is now live on the website. Please let us know if anything further is needed.

Re: [PR] FIX : some benchmarks are failing [datafusion]

2025-03-26 Thread via GitHub
getChan commented on code in PR #15367: URL: https://github.com/apache/datafusion/pull/15367#discussion_r2014540154 ## datafusion/core/benches/distinct_query_sql.rs: ## @@ -144,59 +141,50 @@ pub async fn create_context_sampled_data( } fn criterion_benchmark_limited_distinct_

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
kazuyukitanimura commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2014776467 ## .github/workflows/spark_sql_test.yml: ## @@ -45,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -sp

[I] `TableProvider` -> `Constraints` doc refers to nonexistent `new_from_table_constraints` [datafusion]

2025-03-26 Thread via GitHub
tv42 opened a new issue, #15443: URL: https://github.com/apache/datafusion/issues/15443 ### Describe the bug https://docs.rs/datafusion/46.0.1/datafusion/common/struct.Constraints.html#method.new_unverified > Users should use the `empty` or `new_from_table_constraints` function

Re: [PR] Blog post for DataFusion 46.0.0 [datafusion-site]

2025-03-26 Thread via GitHub
alamb commented on PR #64: URL: https://github.com/apache/datafusion-site/pull/64#issuecomment-2755018366 Thanks @oznur-synnada!

Re: [PR] Support bounds evaluation for temporal data types [datafusion]

2025-03-26 Thread via GitHub
ch-sc commented on code in PR #14523: URL: https://github.com/apache/datafusion/pull/14523#discussion_r2014120634 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -902,6 +960,15 @@ pub fn apply_operator(op: &Operator, lhs: &Interval, rhs: &Interval) -> Result lhs.sub

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-03-26 Thread via GitHub
goldmedal commented on code in PR #15423: URL: https://github.com/apache/datafusion/pull/15423#discussion_r2014723481 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -316,6 +326,71 @@ impl BatchPartitioner { Ok((partition, batch))

Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]

2025-03-26 Thread via GitHub
YanivKunda commented on code in PR #1565: URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2013820176 ## spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala: ## @@ -55,15 +55,15 @@ trait ShimCometScanExec { protected def isNe

Re: [PR] Blog post for DataFusion 46.0.0 [datafusion-site]

2025-03-26 Thread via GitHub
berkaysynnada commented on PR #64: URL: https://github.com/apache/datafusion-site/pull/64#issuecomment-2753878166 Thank you @alamb, @kevinjqliu, @comphead. I've applied all your suggestions.

Re: [PR] TEST: troubleshoot CI errors [datafusion]

2025-03-26 Thread via GitHub
kosiew commented on PR #15417: URL: https://github.com/apache/datafusion/pull/15417#issuecomment-2753994691 Resolved the issue by feature gating with `#[cfg(feature = "parquet")]`.
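
For readers unfamiliar with the feature gating mentioned in the comment above, here is a minimal sketch of the general Rust pattern. It is not the change made in the PR: the function names `parquet_support` and `describe` are hypothetical, and only the `parquet` feature name comes from the comment.

```rust
/// Compiled only when the crate is built with `--features parquet`.
/// (Hypothetical function; only the feature name is taken from the comment.)
#[cfg(feature = "parquet")]
pub fn parquet_support() -> &'static str {
    "enabled"
}

/// Fallback compiled when the `parquet` feature is switched off, so callers
/// still build without the optional dependency.
#[cfg(not(feature = "parquet"))]
pub fn parquet_support() -> &'static str {
    "disabled"
}

/// Code outside the gate can call the function regardless of the feature set,
/// because exactly one of the two definitions above always exists.
pub fn describe() -> String {
    format!("parquet support: {}", parquet_support())
}
```

Gating both a definition and its fallback keeps code that depends on an optional feature from breaking builds (or tests) that run without it.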

[PR] Lia/fix duplicate schema names on multiple joins queriesa [datafusion]

2025-03-26 Thread via GitHub
LiaCastaneda opened a new pull request, #15433: URL: https://github.com/apache/datafusion/pull/15433 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

[I] Duplicate unqualified schema names on queries with multiple JOIN [datafusion]

2025-03-26 Thread via GitHub
LiaCastaneda opened a new issue, #15439: URL: https://github.com/apache/datafusion/issues/15439 ### Describe the bug 👋 I'm getting the error : `Schema contains duplicate unqualified field name "id:1" ` This occurs in queries involving multiple JOINs when using the subs

Re: [PR] Fix link to Volcano paper [datafusion]

2025-03-26 Thread via GitHub
JackKelly commented on PR #15437: URL: https://github.com/apache/datafusion/pull/15437#issuecomment-2754367882 (There's a slightly higher quality scan of this paper [here](https://cs-people.bu.edu/mathan/reading-groups/papers-classics/encapsulation-volcano.pdf). But I'd suggest keeping the

Re: [PR] Blog post for DataFusion 46.0.0 [datafusion-site]

2025-03-26 Thread via GitHub
berkaysynnada merged PR #64: URL: https://github.com/apache/datafusion-site/pull/64

Re: [I] Blog for DataFusion 46.0.0 [datafusion]

2025-03-26 Thread via GitHub
alamb closed issue #15053: Blog for DataFusion 46.0.0 URL: https://github.com/apache/datafusion/issues/15053

Re: [I] March 17, 2025: This week(s) in DataFusion [datafusion]

2025-03-26 Thread via GitHub
alamb commented on issue #15269: URL: https://github.com/apache/datafusion/issues/15269#issuecomment-2755022667 And another blog: https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0/ (thanks @oznur-synnada!)

Re: [PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-26 Thread via GitHub
comphead commented on code in PR #1550: URL: https://github.com/apache/datafusion-comet/pull/1550#discussion_r2014693425 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2722,7 +2721,11 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

[PR] Update concepts-readings-events.md [datafusion]

2025-03-26 Thread via GitHub
berkaysynnada opened a new pull request, #15440: URL: https://github.com/apache/datafusion/pull/15440 ## Which issue does this PR close? - Closes #. ## Rationale for this change update the release list with new 46.0.0 post ## What changes are includ

Re: [PR] Blog post for DataFusion 46.0.0 [datafusion-site]

2025-03-26 Thread via GitHub
kevinjqliu commented on PR #64: URL: https://github.com/apache/datafusion-site/pull/64#issuecomment-2754908243 💯 https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0/

[PR] chore: Move optimize_subquery_sort into optimizer as a new rule Elimi… [datafusion]

2025-03-26 Thread via GitHub
irenjj opened a new pull request, #15441: URL: https://github.com/apache/datafusion/pull/15441 …nateSort ## Which issue does this PR close? - Closes #15435 ## Rationale for this change ## What changes are included in this PR? ## Are t

Re: [I] Add an option to display column types in the table [datafusion]

2025-03-26 Thread via GitHub
blaginin commented on issue #15442: URL: https://github.com/apache/datafusion/issues/15442#issuecomment-2755166494 take

Re: [I] Scalars are too verbose in column name output [datafusion]

2025-03-26 Thread via GitHub
blaginin commented on issue #15395: URL: https://github.com/apache/datafusion/issues/15395#issuecomment-2755180609 > If we have column type, we don't need to display type for inner elements. Maybe we can work on column type first? Thank you!!! That's fair, I've created a separate tick

Re: [I] Attach `Diagnostic` to "wrong number of arguments" error [datafusion]

2025-03-26 Thread via GitHub
prowang01 commented on issue #14432: URL: https://github.com/apache/datafusion/issues/14432#issuecomment-2755187725 Hi! I'm currently preparing my GSoC 2025 application and would love to contribute to this issue as a warm-up task. I understand this one involves attaching a `Diagnostic` to t

Re: [PR] feat: pushdown filter for native_iceberg_compat [datafusion-comet]

2025-03-26 Thread via GitHub
kazuyukitanimura commented on code in PR #1566: URL: https://github.com/apache/datafusion-comet/pull/1566#discussion_r2014788972 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1460,6 +1460,33 @@ class ParquetReadV1Suite extends ParquetReadSuite w

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-26 Thread via GitHub
blaginin commented on code in PR #15427: URL: https://github.com/apache/datafusion/pull/15427#discussion_r2014804810 ## datafusion-cli/tests/cli_integration.rs: ## @@ -74,6 +75,31 @@ fn cli_quick_test<'a>( assert_cmd_snapshot!(cmd); } +#[rstest] Review Comment: proba

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
YanivKunda commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2014801685 ## .github/workflows/spark_sql_test.yml: ## @@ -45,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -spark-ve

Re: [PR] Little changes "cache control" [datafusion]

2025-03-26 Thread via GitHub
alamb closed pull request #14611: Little changes "cache control" URL: https://github.com/apache/datafusion/pull/14611

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015098643 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015103237 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1847,6 +1848,28 @@ mod tests { writer.close().unwrap(); } +fn write_file_nu

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015111827 ## datafusion/datasource-parquet/src/source.rs: ## @@ -587,4 +578,17 @@ impl FileSource for ParquetSource { } } } + +fn supports_dy

Re: [PR] Add `downcast_to_source` method for `DataSourceExec` [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15416: URL: https://github.com/apache/datafusion/pull/15416#discussion_r2014969803 ## docs/source/library-user-guide/upgrading.md: ## @@ -129,6 +129,20 @@ if let Some(datasource_exec) = plan.as_any().downcast_ref::<DataSourceExec>() { # */ ``` +There's also a
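
The `as_any().downcast_ref` idiom quoted in the diff above can be sketched with the standard library alone. This is an illustration under stated assumptions, not DataFusion's API: the `Plan` trait, the `DataSourceNode` and `FilterNode` types, and the `source_name` helper are all hypothetical stand-ins.

```rust
use std::any::Any;

// Hypothetical stand-in for a plan trait; this is NOT DataFusion's `ExecutionPlan`.
trait Plan {
    /// Expose the concrete type behind the trait object for downcasting.
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical node types used only for this illustration.
struct DataSourceNode {
    name: String,
}
struct FilterNode;

impl Plan for DataSourceNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Plan for FilterNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the source name if `plan` is a `DataSourceNode`, otherwise `None`.
fn source_name(plan: &dyn Plan) -> Option<&str> {
    plan.as_any()
        .downcast_ref::<DataSourceNode>()
        .map(|node| node.name.as_str())
}
```

`downcast_ref` returns `None` when the concrete type does not match, so callers handle the mismatch explicitly rather than panicking.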

[PR] chore(deps): bump tokio from 1.43.0 to 1.44.1 [datafusion]

2025-03-26 Thread via GitHub
dependabot[bot] opened a new pull request, #15347: URL: https://github.com/apache/datafusion/pull/15347 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.43.0 to 1.44.1. Release notes: sourced from [tokio's releases](https://github.com/tokio-rs/tokio/releases). Tokio v1.4

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-26 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2004029227 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") }

Re: [PR] Update concepts-readings-events.md [datafusion]

2025-03-26 Thread via GitHub
berkaysynnada merged PR #15440: URL: https://github.com/apache/datafusion/pull/15440

Re: [PR] Migrate datasource tests to insta [datafusion]

2025-03-26 Thread via GitHub
xudong963 merged PR #15258: URL: https://github.com/apache/datafusion/pull/15258

[PR] Migrate optimizer tests to insta [datafusion]

2025-03-26 Thread via GitHub
qstommyshu opened a new pull request, #15446: URL: https://github.com/apache/datafusion/pull/15446 ## Which issue does this PR close? - Closes #15396 . ## Rationale for this change ## What changes are included in this PR? Migrated tests in `data

Re: [I] Migrate subtrait tests to `insta` [datafusion]

2025-03-26 Thread via GitHub
qstommyshu commented on issue #15398: URL: https://github.com/apache/datafusion/issues/15398#issuecomment-2756274122 > Hi @blaginin and @alamb

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
wForget commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2015382561 ## .github/workflows/spark_sql_test.yml: ## @@ -45,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -spark-versi

Re: [PR] Support binary temporal arithmetic with integers [datafusion]

2025-03-26 Thread via GitHub
github-actions[bot] closed pull request #13741: Support binary temporal arithmetic with integers URL: https://github.com/apache/datafusion/pull/13741

Re: [PR] Always add round robin repartitioning to leaves (data sources), benefitting unbalanced / small datasets [datafusion]

2025-03-26 Thread via GitHub
github-actions[bot] closed pull request #13707: Always add round robin repartitioning to leaves (data sources), benefitting unbalanced / small datasets URL: https://github.com/apache/datafusion/pull/13707

Re: [PR] Migrate optimizer tests to insta [datafusion]

2025-03-26 Thread via GitHub
qstommyshu commented on PR #15445: URL: https://github.com/apache/datafusion/pull/15445#issuecomment-2756254291 Ah, I accidentally merged a whole bunch of code from the main branch... I think it is easier for me to resolve all of this by just creating another branch and PR.

Re: [PR] feat: pushdown filter for native_iceberg_compat [datafusion-comet]

2025-03-26 Thread via GitHub
wForget commented on code in PR #1566: URL: https://github.com/apache/datafusion-comet/pull/1566#discussion_r2015410919 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1460,6 +1460,33 @@ class ParquetReadV1Suite extends ParquetReadSuite with Adap

[PR] Improve performance sort TPCH q3 with Utf8Vew ( Sort-preserving mergi… [datafusion]

2025-03-26 Thread via GitHub
zhuqi-lucas opened a new pull request, #15447: URL: https://github.com/apache/datafusion/pull/15447 …ng on a single Utf8View ) ## Which issue does this PR close? - Closes [#15403](https://github.com/apache/datafusion/issues/15403) ## Rationale for this change Impro

Re: [PR] fix: Refactor CometScanRule and fix bugs [datafusion-comet]

2025-03-26 Thread via GitHub
andygrove commented on PR #1483: URL: https://github.com/apache/datafusion-comet/pull/1483#issuecomment-2738005266 Thanks for the review @parthchandra and @mbutrovich

Re: [PR] Saner handling of nulls inside arrays [datafusion]

2025-03-26 Thread via GitHub
thinkharderdev commented on PR #15149: URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2739947657 > > I recommend following whatever DuckDB (or postgres do) -- there is not much value in DataFusion having different semantics from other systems > > * DuckDB doesn't ha

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-03-26 Thread via GitHub
Kontinuation commented on PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#issuecomment-2755868919 Reran TPC-H SF=100 on an m7i.4xlarge instance with `master = local[8]`. Most of the disk accesses hit the OS cache, so the slow EBS didn't affect the query performance too

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-03-26 Thread via GitHub
Kontinuation commented on PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#issuecomment-2755875567 I have also refactored the handling for repartitioning to a single partition (#1453), which avoids saturating the off-heap memory and fixes the OOM in the TPC-DS test. We ca

Re: [PR] Add `FileScanConfigBuilder` [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15352: URL: https://github.com/apache/datafusion/pull/15352#discussion_r2015008401 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -326,14 +544,15 @@ impl FileScanConfig { /// # Parameters: /// * `object_store_url`: See [`Self::ob

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015083952 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is us

Re: [I] Attach `Diagnostic` to "wrong number of arguments" error [datafusion]

2025-03-26 Thread via GitHub
Chen-Yuan-Lai commented on issue #14432: URL: https://github.com/apache/datafusion/issues/14432#issuecomment-2756209133 @prowang01 Sure! Feel free to reassign the issue

[PR] Migrate optimizer tests to insta [datafusion]

2025-03-26 Thread via GitHub
qstommyshu opened a new pull request, #15445: URL: https://github.com/apache/datafusion/pull/15445 ## Which issue does this PR close? - Closes #15396 . ## Rationale for this change ## What changes are included in this PR? Migrated tests in `data

Re: [I] Move `optimize_subquery_sort` into optimizer as a new rule `EliminateSort` [datafusion]

2025-03-26 Thread via GitHub
irenjj commented on issue #15435: URL: https://github.com/apache/datafusion/issues/15435#issuecomment-2756228364 > > Using `datafusion-optimizer` in `datafusion-sql` can lead to dependency issue: > > ``` > > thread 'main' panicked at src/main.rs:84:9: > > circular dependency detecte

Re: [PR] Migrate optimizer tests to insta [datafusion]

2025-03-26 Thread via GitHub
qstommyshu closed pull request #15445: Migrate optimizer tests to insta URL: https://github.com/apache/datafusion/pull/15445

Re: [PR] perf: Reuse row converter during sort [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #15302: URL: https://github.com/apache/datafusion/pull/15302#issuecomment-2755694280 There appears to be a change to the testing pin in this PR as well: ![Screenshot 2025-03-26 at 4 37 14  PM](https://github.com/user-attachments/assets/c5cfb049-6e43-44ef-8d61-44e4

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
kazuyukitanimura commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2014966200 ## .github/workflows/spark_sql_test.yml: ## @@ -45,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -sp

Re: [I] Feature: support cast `date` to `timestamp` with tz [datafusion]

2025-03-26 Thread via GitHub
Omega359 commented on issue #14638: URL: https://github.com/apache/datafusion/issues/14638#issuecomment-2755651138 I believe the arrow update is in the [arrow 54.3.0 release](https://github.com/apache/arrow-rs/releases/tag/54.3.0) so once DF is upgraded to that release we can verify it in D

Re: [I] Move `optimize_subquery_sort` into optimizer as a new rule `EliminateSort` [datafusion]

2025-03-26 Thread via GitHub
jayzhan211 commented on issue #15435: URL: https://github.com/apache/datafusion/issues/15435#issuecomment-2756154926 > Using `datafusion-optimizer` in `datafusion-sql` can lead to dependency issue: > > ``` > thread 'main' panicked at src/main.rs:84:9: > circular dependency detec

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-26 Thread via GitHub
blaginin commented on PR #15427: URL: https://github.com/apache/datafusion/pull/15427#issuecomment-2755465857 I think one issue with the current approach is that loading from env will break. before: https://github.com/user-attachments/assets/bef857c6-7fa8-4852-96ea-fe7fba39cc97

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-26 Thread via GitHub
blaginin commented on code in PR #15427: URL: https://github.com/apache/datafusion/pull/15427#discussion_r2014812090 ## datafusion-cli/tests/cli_integration.rs: ## @@ -74,6 +75,31 @@ fn cli_quick_test<'a>( assert_cmd_snapshot!(cmd); } +#[rstest] Review Comment: this

Re: [PR] minor: Add new crates to labeler [datafusion]

2025-03-26 Thread via GitHub
alamb merged PR #15426: URL: https://github.com/apache/datafusion/pull/15426

Re: [I] Use spill manager in sort merge join [datafusion]

2025-03-26 Thread via GitHub
alamb closed issue #15400: Use spill manager in sort merge join URL: https://github.com/apache/datafusion/issues/15400

Re: [PR] refactor: Use SpillManager for all spilling scenarios [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #15405: URL: https://github.com/apache/datafusion/pull/15405#issuecomment-2755747018 Thank you @2010YOUY01 and @comphead

Re: [I] Scalars are too verbose in column name output [datafusion]

2025-03-26 Thread via GitHub
alamb commented on issue #15395: URL: https://github.com/apache/datafusion/issues/15395#issuecomment-2755667376 > always rendering data in a more compact way (the first option from my list) - I think it is a better choice too The challenge is that it will change the schema of

Re: [I] Spark SQL test failures in native_iceberg_compat mode [datafusion-comet]

2025-03-26 Thread via GitHub
andygrove commented on issue #1542: URL: https://github.com/apache/datafusion-comet/issues/1542#issuecomment-2744045497 I'm looking into the core3 `row index generation` errors. At least one of them is failing with NPE in Comet code: ``` Caused by: java.lang.NullPointerException

Re: [I] Snowflake COPY INTO fails to parse with a semicolon [datafusion-sqlparser-rs]

2025-03-26 Thread via GitHub
tv42 closed issue #1519: Snowflake COPY INTO fails to parse with a semicolon URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1519

Re: [PR] Add "end to end parquet reading test" for WASM [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15362: URL: https://github.com/apache/datafusion/pull/15362#discussion_r200920 ## datafusion/wasmtest/src/lib.rs: ## @@ -185,26 +206,56 @@ mod test { #[wasm_bindgen_test(unsupported = tokio::test)] async fn test_parquet_write() { -

Re: [PR] Enforce JOIN plan to require condition [datafusion]

2025-03-26 Thread via GitHub
comphead commented on code in PR #15334: URL: https://github.com/apache/datafusion/pull/15334#discussion_r2009201610 ## datafusion/sqllogictest/test_files/join.slt.part: ## @@ -625,6 +625,24 @@ FROM t1 11 11 11 +# join condition is required +# TODO: query error join con

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-26 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015100172 ## datafusion/datasource-parquet/src/source.rs: ## @@ -259,6 +261,8 @@ pub struct ParquetSource { pub(crate) metrics: ExecutionPlanMetricsSet, /// Optio

Re: [PR] fix: Unconditionally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-26 Thread via GitHub
alamb commented on code in PR #15242: URL: https://github.com/apache/datafusion/pull/15242#discussion_r2014999225 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2679,24 +2679,16 @@ impl Union { Ok(Union { inputs, schema }) } -/// When constructing a `UNI

Re: [PR] refactor(hash_join): Move JoinHashMap to separate mod [datafusion]

2025-03-26 Thread via GitHub
alamb merged PR #15419: URL: https://github.com/apache/datafusion/pull/15419

Re: [PR] Add support for DISTINCT + ORDER BY in `ARRAY_AGG` [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #14413: URL: https://github.com/apache/datafusion/pull/14413#issuecomment-2755776567 🚀

Re: [I] array_agg cannot perform both distinct and order_by [datafusion]

2025-03-26 Thread via GitHub
alamb closed issue #12371: array_agg cannot perform both distinct and order_by URL: https://github.com/apache/datafusion/issues/12371

Re: [PR] Little changes "cache control" [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #14611: URL: https://github.com/apache/datafusion/pull/14611#issuecomment-2755780836 Given that it is not clear what problem this is solving and it has been dormant for a while, I am going to close it. Please reopen when we can better articulate why this change is needed

Re: [PR] feat: implement GroupsAccumulator for `count(DISTINCT)` aggr [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #15324: URL: https://github.com/apache/datafusion/pull/15324#issuecomment-2755778699 I think this is still a work in progress, so I am marking it as a draft to clean up the review queue.

Re: [I] A 'cache control' header is missing or empty webkit [datafusion]

2025-03-26 Thread via GitHub
alamb commented on issue #14542: URL: https://github.com/apache/datafusion/issues/14542#issuecomment-2755782575 > the emogi images is not should be fix to line .the images has been wraping and frontend not looks like good What images are you referring to? Maybe you can provide

Re: [PR] add manual trigger for extended tests in pull requests [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #14331: URL: https://github.com/apache/datafusion/pull/14331#issuecomment-2755784817 BTW @danila-b has a solution here: https://github.com/apache/datafusion/pull/15101

Re: [PR] refactor(hash_join): Move JoinHashMap to separate mod [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #15419: URL: https://github.com/apache/datafusion/pull/15419#issuecomment-2755757551 🚀

Re: [PR] Add `downcast_to_source` method for `DataSourceExec` [datafusion]

2025-03-26 Thread via GitHub
xudong963 commented on code in PR #15416: URL: https://github.com/apache/datafusion/pull/15416#discussion_r2015023075 ## docs/source/library-user-guide/upgrading.md: ## @@ -129,6 +129,20 @@ if let Some(datasource_exec) = plan.as_any().downcast_ref::() { # */ ``` +There's al

Re: [I] Migrate subtrait tests to `insta` [datafusion]

2025-03-26 Thread via GitHub
qstommyshu commented on issue #15398: URL: https://github.com/apache/datafusion/issues/15398#issuecomment-2755795572 Hi @blaginin and @alamb Just want to confirm the example test files are really just "examples", right? I also see there are more files under the *subtrait* test cases

Re: [PR] chore: Upgrade `rand` crate and some other minor crates [datafusion]

2025-03-26 Thread via GitHub
comphead commented on PR #14967: URL: https://github.com/apache/datafusion/pull/14967#issuecomment-2754648315 Depends on https://github.com/apache/arrow-rs/issues/7084

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-26 Thread via GitHub
alamb commented on PR #15427: URL: https://github.com/apache/datafusion/pull/15427#issuecomment-2755661874 > I think one issue with the current approach is that loading from env will break. This is a good call -- I will fix that

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2025-03-26 Thread via GitHub
alamb commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2755686534 Thank you for bringing this up again @acking-you > If we can optimize the specialized query you mentioned and not slowing down other queries, it would be nice to have it.

[PR] add cargo insta to dev dependencies [datafusion]

2025-03-26 Thread via GitHub
qstommyshu opened a new pull request, #15444: URL: https://github.com/apache/datafusion/pull/15444 ## Which issue does this PR close? - Closes #15398. ## Rationale for this change ## What changes are included in this PR? Migrated tests in da

Re: [PR] added fallback using reflection for backward-compatibility [datafusion-comet]

2025-03-26 Thread via GitHub
YanivKunda commented on code in PR #1573: URL: https://github.com/apache/datafusion-comet/pull/1573#discussion_r2015042394 ## .github/workflows/spark_sql_test.yml: ## @@ -45,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -spark-ve

Re: [PR] Migrate subtrait tests to insta [datafusion]

2025-03-26 Thread via GitHub
qstommyshu commented on PR #15444: URL: https://github.com/apache/datafusion/pull/15444#issuecomment-2755817539 Just to clarify, this PR is **NOT FINISHED YET**. I'm still waiting for an answer on the [scope](https://github.com/apache/datafusion/issues/15398#issuecomment-2755795572) of thi

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2025-03-26 Thread via GitHub
acking-you commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2756305533 Thank you for your guidance and advice @alamb. I will try to work on these later today (I might be a bit busy right now).

Re: [PR] feat: pushdown filter for native_iceberg_compat [datafusion-comet]

2025-03-26 Thread via GitHub
wForget commented on code in PR #1566: URL: https://github.com/apache/datafusion-comet/pull/1566#discussion_r2015421314 ## spark/src/main/scala/org/apache/comet/parquet/SourceFilterSerde.scala: ## @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] Add `downcast_to_source` method for `DataSourceExec` [datafusion]

2025-03-26 Thread via GitHub
xudong963 merged PR #15416: URL: https://github.com/apache/datafusion/pull/15416

Re: [I] Why does it report an error when building the `branch-28` branch with `cargo build`? [datafusion]

2025-03-26 Thread via GitHub
mustdo-afk commented on issue #15429: URL: https://github.com/apache/datafusion/issues/15429#issuecomment-2756318054 > This is a known issue - `chrono v0.4.40` broke a bunch of `arrow-rs` releases. The only fix I know of to compile the older versions is to edit the lockfile to use `chrono v

Re: [I] Why does it report an error when building the `branch-28` branch with `cargo build`? [datafusion]

2025-03-26 Thread via GitHub
mustdo-afk closed issue #15429: Why does it report an error when building the `branch-28` branch with `cargo build`? URL: https://github.com/apache/datafusion/issues/15429

[PR] Support computing statistics for FileGroup [datafusion]

2025-03-26 Thread via GitHub
xudong963 opened a new pull request, #15432: URL: https://github.com/apache/datafusion/pull/15432 ## Which issue does this PR close? - Follow up: https://github.com/apache/datafusion/pull/15379 ## Rationale for this change ## What changes are included in t
