Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-29 Thread via GitHub
adriangb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2114102421 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +127,984 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files bas

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-29 Thread via GitHub
adriangb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2114104284 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +127,984 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files bas

Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-29 Thread via GitHub
xudong963 merged PR #16191: URL: https://github.com/apache/datafusion/pull/16191 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [I] Make `DiskManagerBuilder` to construct DiskManagers [datafusion]

2025-05-29 Thread via GitHub
xudong963 closed issue #15319: Make `DiskManagerBuilder` to construct DiskManagers URL: https://github.com/apache/datafusion/issues/15319

Re: [I] Avoid re-implementing expression simplification in pruning.rs [datafusion]

2025-05-29 Thread via GitHub
xudong963 commented on issue #16004: URL: https://github.com/apache/datafusion/issues/16004#issuecomment-2919485666 The existing expression simplification code is used to optimize logical exprs. The `expr` mentioned in the issue is a physical expr, and I think @adriangb's expr is also physical

Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub
hendrikmakait commented on PR #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2919634086 @iffyio: CI should be fixed now

Re: [PR] Return an error on overflow in `do_append_val_inner` [datafusion]

2025-05-29 Thread via GitHub
alamb merged PR #16201: URL: https://github.com/apache/datafusion/pull/16201

Re: [I] Return an error instead of panic in `ByteGroupValueBuilder::do_append_val_inner` [datafusion]

2025-05-29 Thread via GitHub
alamb closed issue #15969: Return an error instead of panic in `ByteGroupValueBuilder::do_append_val_inner` URL: https://github.com/apache/datafusion/issues/15969

Re: [I] Treat truncated parquet stats as inexact [datafusion]

2025-05-29 Thread via GitHub
CookiePieWw commented on issue #15976: URL: https://github.com/apache/datafusion/issues/15976#issuecomment-2920132245 Hi :) I've spent some time on this and found the problem in `get_col_stats` https://github.com/apache/datafusion/blob/2c2f225926958b6abf06b01fcfb594017531043c/datafusion/d

[PR] fix: map parquet field_id correctly (native_iceberg_compat) [datafusion-comet]

2025-05-29 Thread via GitHub
parthchandra opened a new pull request, #1815: URL: https://github.com/apache/datafusion-comet/pull/1815 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1542 ## Rationale for this change Parquet files may have field id's and na

Re: [PR] chore: [native scans] Ignore Spark SQL test for string predicate pushdown [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove merged PR #1768: URL: https://github.com/apache/datafusion-comet/pull/1768

Re: [I] Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience [datafusion]

2025-05-29 Thread via GitHub
xudong963 commented on issue #16188: URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2919458473 I haven't had a chance to read through all of the Filter pushdown APIs yet, but I will within a week and then give some feedback.

Re: [PR] Fix ScalarStructBuilder::build() for an empty struct [datafusion]

2025-05-29 Thread via GitHub
xudong963 merged PR #16205: URL: https://github.com/apache/datafusion/pull/16205

Re: [I] Run DataFusion benchmarks regularly and track performance history over time [datafusion]

2025-05-29 Thread via GitHub
xudong963 commented on issue #5504: URL: https://github.com/apache/datafusion/issues/5504#issuecomment-2919472815 Another option is to use https://benchmark.clickhouse.com/ directly, changing some code to fit DataFusion's benchmark aims

Re: [PR] Change default SQL mapping for `VARCAHR` from `Utf8` to `Utf8View` [datafusion]

2025-05-29 Thread via GitHub
zhuqi-lucas commented on PR #16142: URL: https://github.com/apache/datafusion/pull/16142#issuecomment-2919746777 Thank you @xudong963 for review!

Re: [PR] chore: Speed up "PR Builds" CI workflows [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove commented on PR #1807: URL: https://github.com/apache/datafusion-comet/pull/1807#issuecomment-2919939803 Thanks for the reviews @parthchandra and @comphead

Re: [I] Stop duplicating Apple Silicon Spark 3.4 PR builds [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove closed issue #1783: Stop duplicating Apple Silicon Spark 3.4 PR builds URL: https://github.com/apache/datafusion-comet/issues/1783

Re: [I] Run subsets of Comet scalatest suites in parallel [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove closed issue #1800: Run subsets of Comet scalatest suites in parallel URL: https://github.com/apache/datafusion-comet/issues/1800

Re: [PR] chore: Bump DataFusion to git rev 2c2f225 [datafusion-comet]

2025-05-29 Thread via GitHub
codecov-commenter commented on PR #1814: URL: https://github.com/apache/datafusion-comet/pull/1814#issuecomment-2919931772 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1814?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: enable map_values testing since we fall back on nested types for defa… [datafusion-comet]

2025-05-29 Thread via GitHub
codecov-commenter commented on PR #1813: URL: https://github.com/apache/datafusion-comet/pull/1813#issuecomment-2919942250 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1813?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] WIP: Test DataFusion with experimental parquet pushdown [datafusion]

2025-05-29 Thread via GitHub
alamb commented on PR #16208: URL: https://github.com/apache/datafusion/pull/16208#issuecomment-2919955178 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [I] [native_iceberg_compat] Spark SQL core-2 "filter pushdown - StringPredicate" failure [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove closed issue #1767: [native_iceberg_compat] Spark SQL core-2 "filter pushdown - StringPredicate" failure URL: https://github.com/apache/datafusion-comet/issues/1767

Re: [I] [native_iceberg_compat] Spark SQL core-2 "filter pushdown - StringPredicate" failure [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove commented on issue #1767: URL: https://github.com/apache/datafusion-comet/issues/1767#issuecomment-2920364682 Fixed in https://github.com/apache/datafusion-comet/pull/1768

Re: [PR] feat: Add auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove commented on PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#issuecomment-2920369648 > Not sure why this would cause the ci failures that we see here. Maybe defer this until some more of the known issues are fixed? Some of the tests need updating because

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-05-29 Thread via GitHub
andygrove commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2920373095 > Since updating Comet to use latest DataFusion (pinned dependency), we have been seeing regular but intermittent CI failures that we are still trying to debug. This is

Re: [I] Expr formatting missing parentheses [datafusion]

2025-05-29 Thread via GitHub
hendrikmakait commented on issue #16054: URL: https://github.com/apache/datafusion/issues/16054#issuecomment-2920043776 I've had a closer look, we simply wrap `- {foo}` in parentheses regardless of whether that's strictly necessary. Following the same principle for `BinaryExpr` won't look a

Re: [PR] WIP: Test DataFusion with experimental parquet pushdown [datafusion]

2025-05-29 Thread via GitHub
alamb commented on PR #16208: URL: https://github.com/apache/datafusion/pull/16208#issuecomment-2920053295 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_test_filter_pushdown Benchmark clickbench_extended.json ---

Re: [PR] WIP: Test DataFusion with experimental parquet pushdown [datafusion]

2025-05-29 Thread via GitHub
alamb commented on PR #16208: URL: https://github.com/apache/datafusion/pull/16208#issuecomment-2920064834 > 🤖: Benchmark completed Hmm that is somewhat depressing. I will investigate tomorrow

Re: [I] Treat truncated parquet stats as inexact [datafusion]

2025-05-29 Thread via GitHub
alamb commented on issue #15976: URL: https://github.com/apache/datafusion/issues/15976#issuecomment-2920485084 > But I didn't find a method to access the ..exact flags in StatisticsConverter, so my plan is to first add functions similar to row_group_mins to the converter to extract the fla

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2919558936 > Now I'm **really** confused, where in the code there is reuse of buffers? The key structs are `MutableBuffer` and `ParquetMutableBuffer`.

Re: [PR] Add support for mysql's drop index (`DROP INDEX idx_a ON table_a` and `ALTER TABLE table_a DROP INDEX idx_a`) [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub
vimko commented on PR #1865: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1865#issuecomment-2919372439 @iffyio Indeed, I didn't notice that. Thank you very much for reminding me.

Re: [PR] Change default SQL mapping for `VARCAHR` from `Utf8` to `Utf8View` [datafusion]

2025-05-29 Thread via GitHub
xudong963 commented on PR #16142: URL: https://github.com/apache/datafusion/pull/16142#issuecomment-2919399177 I'm reviewing

Re: [PR] feat: support inability to yield for loop when it's not using Tokio MPSC (RecordBatchReceiverStream) [datafusion]

2025-05-29 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2920543295 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] feat: support inability to yield for loop when it's not using Tokio MPSC (RecordBatchReceiverStream) [datafusion]

2025-05-29 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2920533806 > Thank you @alamb , it's surprising that performance has no regression, even faster for clickbench_partitioned, it may due to we yield for each partition running, and those make the p

Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub
alamb commented on PR #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2920524140 I restarted the checks

Re: [PR] feat: support inability to yield for loop when it's not using Tokio MPSC (RecordBatchReceiverStream) [datafusion]

2025-05-29 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2920564055 🤖: Benchmark completed Details ``` Comparing HEAD and issue_16193 Benchmark cancellation.json ┏

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-29 Thread via GitHub
hendrikmakait commented on code in PR #16207: URL: https://github.com/apache/datafusion/pull/16207#discussion_r2114223556 ## datafusion/expr/src/expr.rs: ## @@ -330,7 +331,7 @@ pub enum Expr { /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt AggregateFunction(A

Re: [PR] chore: [native scans] Ignore Spark SQL test for string predicate pushdown [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove commented on code in PR #1768: URL: https://github.com/apache/datafusion-comet/pull/1768#discussion_r2114220564 ## spark/src/test/scala/org/apache/comet/CometStringExpressionSuite.scala: ## @@ -20,87 +20,68 @@ package org.apache.comet import org.apache.spark.sql.Co

Re: [PR] chore: [native scans] Ignore Spark SQL test for string predicate pushdown [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove commented on code in PR #1768: URL: https://github.com/apache/datafusion-comet/pull/1768#discussion_r2114221607 ## spark/src/test/scala/org/apache/comet/CometStringExpressionSuite.scala: ## @@ -114,92 +95,91 @@ class CometStringExpressionSuite extends CometTestBase {

Re: [PR] chore: [native scans] Ignore Spark SQL test for string predicate pushdown [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove commented on PR #1768: URL: https://github.com/apache/datafusion-comet/pull/1768#issuecomment-2919781336 Could I get a review @parthchandra @mbutrovich

[PR] enable map_values testing since we fall back on nested types for defa… [datafusion-comet]

2025-05-29 Thread via GitHub
zhuqi-lucas opened a new pull request, #1813: URL: https://github.com/apache/datafusion-comet/pull/1813 …ult values ## Which issue does this PR close? Part of Closes [#1789](https://github.com/apache/datafusion-comet/issues/1789) We should fix map_values when

Re: [PR] enable map_values testing since we fall back on nested types for defa… [datafusion-comet]

2025-05-29 Thread via GitHub
mbutrovich commented on PR #1813: URL: https://github.com/apache/datafusion-comet/pull/1813#issuecomment-2919813707 I thought we were blocked on a DataFusion fix for this. https://github.com/apache/datafusion/pull/16203 Maybe I'm getting the map issues mixed up.

Re: [PR] Return an error on overflow in `do_append_val_inner` [datafusion]

2025-05-29 Thread via GitHub
alamb commented on PR #16201: URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2919242427 🤖: Benchmark completed Details ``` Comparing HEAD and issue-15969-error-on-buffer-overflow Benchmark clickbench_extended.json -

Re: [PR] chore: enable map_values testing since we fall back on nested types for defa… [datafusion-comet]

2025-05-29 Thread via GitHub
zhuqi-lucas commented on PR #1813: URL: https://github.com/apache/datafusion-comet/pull/1813#issuecomment-2919840718 Thank you @mbutrovich for the review. I just noticed that a fix in Apache DataFusion is in progress: https://github.com/apache/datafusion/pull/16203 But it seems we a

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-29 Thread via GitHub
hendrikmakait commented on code in PR #16207: URL: https://github.com/apache/datafusion/pull/16207#discussion_r2114253916 ## datafusion/expr/src/expr.rs: ## @@ -330,7 +331,7 @@ pub enum Expr { /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt AggregateFunction(A

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-29 Thread via GitHub
xudong963 commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2113890230 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +127,984 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files ba

Re: [PR] Return an error on overflow in `do_append_val_inner` [datafusion]

2025-05-29 Thread via GitHub
alamb commented on PR #16201: URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2919640903 🚀

[PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-05-29 Thread via GitHub
jonathanc-n opened a new pull request, #16210: URL: https://github.com/apache/datafusion/pull/16210 ## Which issue does this PR close? - Closes #16051 . ## Rationale for this change Supporting null aware loop join in the case that cost is cheaper than Hash join.

Re: [PR] fix: map parquet field_id correctly (native_iceberg_compat) [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove commented on code in PR #1815: URL: https://github.com/apache/datafusion-comet/pull/1815#discussion_r2114468324 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -236,7 +236,7 @@ public NativeBatchReader(AbstractColumnReader[] columnReader

[PR] WIP: Test DataFusion with experimental parquet pushdown [datafusion]

2025-05-29 Thread via GitHub
alamb opened a new pull request, #16208: URL: https://github.com/apache/datafusion/pull/16208 This PR is for testing DataFusion with the code in the following PR - https://github.com/apache/arrow-rs/pull/7513 I want to run 2 experiments: 1. Does using `IncremntalRecordBatchBuilde

[PR] chore: Bump DF to 2c2f225 [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove opened a new pull request, #1814: URL: https://github.com/apache/datafusion-comet/pull/1814 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

[I] docs: Instructions for running sbt do not work [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove opened a new issue, #1812: URL: https://github.com/apache/datafusion-comet/issues/1812 ### Describe the bug I tried using sbt to run individual tests, and it failed to compile Spark due to OOM. The following worked, so we should update the docs to provide the memory s

Re: [I] Expr formatting missing parentheses [datafusion]

2025-05-29 Thread via GitHub
hendrikmakait commented on issue #16054: URL: https://github.com/apache/datafusion/issues/16054#issuecomment-2919976201 It looks like we also misplace parentheses when combining binary expressions and unary expressions: ``` > select -(1+2); +-+ | (- In

Re: [I] Spark-compatible CAST operation [datafusion]

2025-05-29 Thread via GitHub
alamb commented on issue #11201: URL: https://github.com/apache/datafusion/issues/11201#issuecomment-2919978927 The key will be to find some way to control which PhysicalExpr is used during physical planning. There might already be an API to do this (provide a custom planner for Expr

[PR] Fix SchemaDisplay for nested `BinaryExpr` [datafusion]

2025-05-29 Thread via GitHub
hendrikmakait opened a new pull request, #16209: URL: https://github.com/apache/datafusion/pull/16209 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16054. ## Rationale for this change ## What changes are included i

Re: [I] Blog post about parquet vs custom file formats [datafusion]

2025-05-29 Thread via GitHub
alamb commented on issue #16149: URL: https://github.com/apache/datafusion/issues/16149#issuecomment-2919981992 3x faster for Q21 is pretty neat to see

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-29 Thread via GitHub
kczimm commented on code in PR #15980: URL: https://github.com/apache/datafusion/pull/15980#discussion_r2114359645 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -1494,6 +1494,14 @@ impl LogicalPlan { let mut param_types: HashMap> = HashMap::new(); self.

Re: [PR] fix: translate missing or corrupt file exceptions, fall back if asked to ignore [datafusion-comet]

2025-05-29 Thread via GitHub
parthchandra merged PR #1765: URL: https://github.com/apache/datafusion-comet/pull/1765

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-05-29 Thread via GitHub
alamb commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-291747 Thank you @zhuqi-lucas

Re: [PR] chore: Speed up "PR Builds" CI workflows [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove merged PR #1807: URL: https://github.com/apache/datafusion-comet/pull/1807

Re: [PR] fix: map parquet field_id correctly (native_iceberg_compat) [datafusion-comet]

2025-05-29 Thread via GitHub
codecov-commenter commented on PR #1815: URL: https://github.com/apache/datafusion-comet/pull/1815#issuecomment-2920187518 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1815?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: support inability to yield for loop when it's not using Tokio MPSC (RecordBatchReceiverStream) [datafusion]

2025-05-29 Thread via GitHub
alamb commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2114732484 ## datafusion/physical-plan/src/aggregates/no_grouping.rs: ## @@ -77,6 +77,9 @@ impl AggregateStream { let baseline_metrics = BaselineMetrics::new(&agg.metr

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-29 Thread via GitHub
alamb commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2920591397 - I think https://github.com/apache/datafusion/pull/16196 will solve this problem. - Notes: https://github.com/apache/datafusion/pull/16196#pullrequestreview-2879569007

Re: [I] Queries with exchange reuse sometimes fail in Comet [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove commented on issue #1798: URL: https://github.com/apache/datafusion-comet/issues/1798#issuecomment-2920604833 Here is a related PR to Spark that helped fix the problem in Spark RAPIDS. https://github.com/apache/spark/commit/52e3cf9ff50b4209e29cb06df09b1ef3a18bc83b

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-29 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2113711082 ## datafusion/optimizer/src/decorrelate_general.rs: ## @@ -0,0 +1,1137 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] Return an error on overflow in `do_append_val_inner` [datafusion]

2025-05-29 Thread via GitHub
alamb commented on PR #16201: URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2919156519 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-05-29 Thread via GitHub
zhuqi-lucas commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2919165846 I'd like to take this issue and give it a try. Feel free to reassign it if I don't submit a PR for a long time.

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-05-29 Thread via GitHub
zhuqi-lucas commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2919165981 take

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-05-29 Thread via GitHub
zhuqi-lucas commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2919171527 1. As a first step, I have run an experiment that rewrites the clickbench partition to support page_index; details: https://github.com/apache/datafusion/issues/1614

Re: [PR] fix: translate missing or corrupt file exceptions in NativeUtil, fall back native scans if asked to ignore [datafusion-comet]

2025-05-29 Thread via GitHub
mbutrovich commented on PR #1765: URL: https://github.com/apache/datafusion-comet/pull/1765#issuecomment-2919178189 It's changing an exception message that it isn't even matching: ``` Expected :"...o bypass this error.[]" Actual :"...o bypass this error.[ (of class java.lang.Strin

Re: [PR] Adds support for mysql's drop index [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub
dmzmk commented on code in PR #1864: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1864#discussion_r2113709060 ## src/parser/mod.rs: ## @@ -6249,6 +6249,11 @@ impl<'a> Parser<'a> { loc ); } +// Mysql requires table sp

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-29 Thread via GitHub
kosiew commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2113115289 ## datafusion/datasource/src/test_util.rs: ## @@ -81,6 +83,8 @@ impl FileSource for MockSource { fn file_type(&self) -> &str { "mock" } + +im

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-29 Thread via GitHub
rluvaton commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2918699562 Now I'm **really** confused, where in the code there is reuse of buffers?

Re: [I] Timeouts reading "large" files from object stores over "slow" connections [datafusion]

2025-05-29 Thread via GitHub
alamb commented on issue #15067: URL: https://github.com/apache/datafusion/issues/15067#issuecomment-2919147319 There is a fix upstream that will help this symptom substantially which will be in the next version of object store 0.12.0: - https://github.com/apache/arrow-rs-object-store/iss

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-29 Thread via GitHub
irenjj commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918752095 > ~we can implement the tree formatter for this node type to visualize it~ The tree formatter is used only in datafusion-cli; for sqllogictest, we only use indent explain. 🤣

Re: [PR] Set Formatted TableOptions Enum [datafusion]

2025-05-29 Thread via GitHub
berkaysynnada commented on code in PR #16166: URL: https://github.com/apache/datafusion/pull/16166#discussion_r2113297575 ## datafusion/datasource/src/file_format.rs: ## @@ -120,7 +121,26 @@ pub trait FileFormatFactory: Sync + Send + GetExt + fmt::Debug { &self,

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-29 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918703742 I created a temp branch here to combine the 2 PRs https://github.com/duongcongtoai/arrow-datafusion/blob/14554-subquery-unnest-framework-fixed-planner/datafusion/sqllogictest/test

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-29 Thread via GitHub
rluvaton commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2918851846 Also, isn't buffer reuse only part of the reason to copy? For instance, if the Spark side calls the Arrow release function in the FFI while the `Array` is still being used for exam

Re: [PR] Adds support for mysql's drop index [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub
iffyio commented on code in PR #1864: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1864#discussion_r2113591715 ## src/parser/mod.rs: ## @@ -6249,6 +6249,11 @@ impl<'a> Parser<'a> { loc ); } +// Mysql requires table s

Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub
iffyio commented on PR #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2918865945 @hendrikmakait could you take a look at the CI failures?

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-05-29 Thread via GitHub
Dandandan commented on code in PR #16207: URL: https://github.com/apache/datafusion/pull/16207#discussion_r2113426682 ## datafusion/expr/src/expr.rs: ## @@ -330,7 +331,7 @@ pub enum Expr { /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt AggregateFunction(Aggre

Re: [I] Blog post about parquet vs custom file formats [datafusion]

2025-05-29 Thread via GitHub
zhuqi-lucas commented on issue #16149: URL: https://github.com/apache/datafusion/issues/16149#issuecomment-2918761743 > > A fun experiment might be to "fix" the clickbench partitioned dataset by > > > resorting and writing with page indexes (could use a bunch of DataFusion COPY comman

Re: [PR] Add support for mysql's drop index (`DROP INDEX idx_a ON table_a` and `ALTER TABLE table_a DROP INDEX idx_a`) [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub
iffyio commented on PR #1865: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1865#issuecomment-2918876058 Thanks @vimko! FYI, there's a similar PR for the `DROP INDEX idx_a on table_a` feature; we'll probably look to land that one and rebase this on top of it.

Re: [PR] Add support for parameter default values in SQL Server [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub
iffyio merged PR #1866: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1866

[I] native_iceberg_compat uses hard-coded config values [datafusion-comet]

2025-05-29 Thread via GitHub
parthchandra opened a new issue, #1816: URL: https://github.com/apache/datafusion-comet/issues/1816 ### Describe the bug In native_iceberg_compat, the initialization hard-codes the following configuration flags: ``` conf.set("spark.sql.parquet.binaryAsString", "false");

Re: [I] variance/stddev sometimes produces incorrect results [datafusion]

2025-05-29 Thread via GitHub
Jefffrey closed issue #13247: variance/stddev sometimes produces incorrect results URL: https://github.com/apache/datafusion/issues/13247

Re: [I] variance/stddev sometimes produces incorrect results [datafusion]

2025-05-29 Thread via GitHub
Jefffrey commented on issue #13247: URL: https://github.com/apache/datafusion/issues/13247#issuecomment-2920708500 Resolved by #13248

Re: [PR] fix: map parquet field_id correctly (native_iceberg_compat) [datafusion-comet]

2025-05-29 Thread via GitHub
parthchandra commented on PR #1815: URL: https://github.com/apache/datafusion-comet/pull/1815#issuecomment-2920712348 @mbutrovich I need your help here. CI is failing due to tests involving default values. Should these tests be enabled?

Re: [PR] fix: map parquet field_id correctly (native_iceberg_compat) [datafusion-comet]

2025-05-29 Thread via GitHub
parthchandra commented on code in PR #1815: URL: https://github.com/apache/datafusion-comet/pull/1815#discussion_r2114833029 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -236,7 +236,7 @@ public NativeBatchReader(AbstractColumnReader[] columnRea

Re: [PR] chore: Bump DataFusion to git rev 2c2f225 [datafusion-comet]

2025-05-29 Thread via GitHub
andygrove merged PR #1814: URL: https://github.com/apache/datafusion-comet/pull/1814

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-05-29 Thread via GitHub
jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2114981517 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -810,6 +871,123 @@ fn build_join_indices( } } +// Find matching indices based on join

Re: [I] Queries with exchange reuse sometimes fail in Comet [datafusion-comet]

2025-05-29 Thread via GitHub
mbutrovich commented on issue #1798: URL: https://github.com/apache/datafusion-comet/issues/1798#issuecomment-2920626829 > Here is a related PR to Spark that helped fix the problem in Spark RAPIDS. > > [apache/spark@52e3cf9](https://github.com/apache/spark/commit/52e3cf9ff50b4209e29c

Re: [PR] chore: enable map_values testing since we fall back on nested types for defa… [datafusion-comet]

2025-05-29 Thread via GitHub
zhuqi-lucas commented on PR #1813: URL: https://github.com/apache/datafusion-comet/pull/1813#issuecomment-2920887659 Thank you @comphead for the explanation!

Re: [I] Support `map_values` [datafusion-comet]

2025-05-29 Thread via GitHub
comphead commented on issue #1789: URL: https://github.com/apache/datafusion-comet/issues/1789#issuecomment-2920869741 UPD: apparently DF checks validity at runtime and Comet does not.

Re: [I] Should we introduce property testing? [datafusion]

2025-05-29 Thread via GitHub
Jefffrey commented on issue #5512: URL: https://github.com/apache/datafusion/issues/5512#issuecomment-2920679988 Although this issue is older, I'll close it as a duplicate in favour of #12569, as the newer one adds some extra details (like linking to an example PR where proptesting could be be

Re: [I] Should we introduce property testing? [datafusion]

2025-05-29 Thread via GitHub
Jefffrey closed issue #5512: Should we introduce property testing? URL: https://github.com/apache/datafusion/issues/5512

Re: [I] Support `map_values` [datafusion-comet]

2025-05-29 Thread via GitHub
comphead commented on issue #1789: URL: https://github.com/apache/datafusion-comet/issues/1789#issuecomment-2920864833 Datafusion test ``` let batch = session_ctx .read_parquet("/tmp/t1/part-0-340a4bdf-2e2c-42a8-a38a-01b47ab7d3c0-c000.snappy.parquet",

Re: [PR] fix: Fix `EquivalenceClass` calculation for Union queries [datafusion]

2025-05-29 Thread via GitHub
chenkovsky commented on code in PR #16185: URL: https://github.com/apache/datafusion/pull/16185#discussion_r2115020974 ## datafusion/physical-expr/src/equivalence/class.rs: ## @@ -422,6 +423,60 @@ impl EquivalenceGroup { self.bridge_classes() } +/// Returns a

Re: [I] Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` [datafusion]

2025-05-29 Thread via GitHub
xudong963 closed issue #15096: Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` URL: https://github.com/apache/datafusion/issues/15096
