Re: [I] Remove nulls in joins during hash table lookup [datafusion]

2025-04-21 Thread via GitHub
Dandandan commented on issue #15784: URL: https://github.com/apache/datafusion/issues/15784#issuecomment-2817826955 Additionally, I think there is an optimization that pushes null filters down. Maybe it would be worth testing whether the filter can be kept from being pushed down and executed in the join itself (

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-04-21 Thread via GitHub
shruti2522 commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2817805364 Ready for review @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [I] Improve SQL parser recursion limit error message [datafusion]

2025-04-21 Thread via GitHub
xudong963 closed issue #15623: Improve SQL parser recursion limit error message URL: https://github.com/apache/datafusion/issues/15623

Re: [PR] Show current SQL recursion limit in RecursionLimitExceeded error message [datafusion]

2025-04-21 Thread via GitHub
xudong963 merged PR #15644: URL: https://github.com/apache/datafusion/pull/15644

Re: [PR] Add `statistics_by_partition API` to ExecutionPlan [datafusion]

2025-04-21 Thread via GitHub
berkaysynnada commented on PR #15503: URL: https://github.com/apache/datafusion/pull/15503#issuecomment-2817844462 > I may start a new branch based on the branch to experiment with @berkaysynnada's suggestion to see if there are some challenges, then we can decide the next direction. /cc @a

Re: [PR] Enforce JOIN plan to require condition [datafusion]

2025-04-21 Thread via GitHub
niebayes commented on PR #15334: URL: https://github.com/apache/datafusion/pull/15334#issuecomment-2817844756 I don't understand this change. Previously, we had a SQL query: ``` select * from t where (select sid from t) = (select a from t limit 1) order by ts ``` which defini

Re: [PR] fix: clickbench type err [datafusion]

2025-04-21 Thread via GitHub
Weijun-H commented on code in PR #15773: URL: https://github.com/apache/datafusion/pull/15773#discussion_r2051622814 ## benchmarks/queries/clickbench/extended.sql: ## @@ -4,4 +4,4 @@ SELECT "BrowserCountry", COUNT(DISTINCT "SocialNetwork"), COUNT(DISTINCT "HitCo SELECT "Socia

Re: [PR] fix: clickbench type err [datafusion]

2025-04-21 Thread via GitHub
Weijun-H commented on code in PR #15773: URL: https://github.com/apache/datafusion/pull/15773#discussion_r2051622969 ## benchmarks/queries/clickbench/README.md: ## @@ -155,7 +155,7 @@ WHERE THEN split_part(split_part("URL", 'resolution=', 2), '&', 1)::INT ELSE

Re: [PR] feat: ORDER BY ALL [datafusion]

2025-04-21 Thread via GitHub
berkaysynnada commented on code in PR #15772: URL: https://github.com/apache/datafusion/pull/15772#discussion_r2052112805 ## datafusion/expr/src/expr.rs: ## @@ -701,6 +701,24 @@ impl TryCast { } } +/// OrderBy Expressions +pub enum OrderByExprs { +OrderByExprVec(Vec)

Re: [I] Remove nulls in joins during hash table lookup [datafusion]

2025-04-21 Thread via GitHub
ctsk commented on issue #15784: URL: https://github.com/apache/datafusion/issues/15784#issuecomment-2817904944 That does make sense. I found the [filter_null_join_keys](https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/filter_null_join_keys.rs) rule which takes care of
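
The reasoning behind filtering null join keys can be illustrated with a small sketch. This is not DataFusion's `filter_null_join_keys` rule or its `HashJoinExec`; `hash_join` is a hypothetical helper over plain vectors, assuming an inner equi-join where `NULL = NULL` is never true, so rows with a null key can be dropped on either side without changing the result.

```rust
use std::collections::HashMap;

/// Hypothetical inner equi-join over nullable keys (illustration only).
/// Returns (build_row, probe_row) index pairs for matching keys.
fn hash_join(build: &[Option<i32>], probe: &[Option<i32>]) -> Vec<(usize, usize)> {
    // Build side: map key -> row indices, skipping NULL keys entirely.
    let mut table: HashMap<i32, Vec<usize>> = HashMap::new();
    for (i, key) in build.iter().enumerate() {
        if let Some(k) = key {
            table.entry(*k).or_default().push(i);
        }
    }
    // Probe side: NULL keys are dropped before the hash table lookup.
    let mut out = Vec::new();
    for (j, key) in probe.iter().enumerate() {
        if let Some(k) = key {
            if let Some(rows) = table.get(k) {
                out.extend(rows.iter().map(|&i| (i, j)));
            }
        }
    }
    out
}

fn main() {
    let build = [Some(1), None, Some(2), Some(1)];
    let probe = [None, Some(1), Some(3), Some(2)];
    // Rows with NULL keys on either side never appear in the output.
    println!("{:?}", hash_join(&build, &probe));
}
```

Whether the null check is cheaper as a separate filter below the join or as part of the lookup is exactly the question raised in this thread; the sketch only shows that both placements give the same rows.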

Re: [PR] fix: clickbench type err [datafusion]

2025-04-21 Thread via GitHub
Weijun-H commented on code in PR #15773: URL: https://github.com/apache/datafusion/pull/15773#discussion_r2052113962 ## benchmarks/queries/clickbench/README.md: ## @@ -155,7 +155,7 @@ WHERE THEN split_part(split_part("URL", 'resolution=', 2), '&', 1)::INT ELSE

Re: [I] Reuse Rows allocation in SortPreservingMergeStream / `RowCursorStream` [datafusion]

2025-04-21 Thread via GitHub
acking-you commented on issue #15720: URL: https://github.com/apache/datafusion/issues/15720#issuecomment-2817918126 # Overall Implementation Adjust `RowCursorStream` to become the owner of `Rows` with continuous reuse, requiring each partition to maintain two `Rows` instances (subsequent

[PR] Minor: fix flaky test in `aggregate.slt` [datafusion]

2025-04-21 Thread via GitHub
xudong963 opened a new pull request, #15786: URL: https://github.com/apache/datafusion/pull/15786 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [I] `Cargo bench --bench sql_planner` is failing [datafusion]

2025-04-21 Thread via GitHub
xudong963 closed issue #15753: `Cargo bench --bench sql_planner` is failing URL: https://github.com/apache/datafusion/issues/15753

Re: [PR] fix: clickbench type err [datafusion]

2025-04-21 Thread via GitHub
xudong963 commented on PR #15773: URL: https://github.com/apache/datafusion/pull/15773#issuecomment-2817936843 Thank you

Re: [PR] fix: clickbench type err [datafusion]

2025-04-21 Thread via GitHub
xudong963 merged PR #15773: URL: https://github.com/apache/datafusion/pull/15773

Re: [PR] replace reassign_predicate_columns helper with PhysicalExpr::with_schema [datafusion]

2025-04-21 Thread via GitHub
berkaysynnada commented on PR #15779: URL: https://github.com/apache/datafusion/pull/15779#issuecomment-2817948487 > Hi @adriangb. I've a suggestion. As I said, if I don't misestimate the need, this requirement has arisen in some other places as well. So, let's solve it for all --

Re: [PR] Improve `ListingTable` / `ListingTableOptions` docs [datafusion]

2025-04-21 Thread via GitHub
Weijun-H commented on code in PR #15767: URL: https://github.com/apache/datafusion/pull/15767#discussion_r2052133457 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -479,11 +505,13 @@ impl ListingOptions { } /// Infer the schema of the files at the given pa

Re: [PR] replace reassign_predicate_columns helper with PhysicalExpr::with_schema [datafusion]

2025-04-21 Thread via GitHub
berkaysynnada commented on code in PR #15779: URL: https://github.com/apache/datafusion/pull/15779#discussion_r2052131746 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -333,6 +333,15 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

[PR] Minor: eliminate unnecessary struct creation in session state build. [datafusion]

2025-04-21 Thread via GitHub
Rachelint opened a new pull request, #15800: URL: https://github.com/apache/datafusion/pull/15800 ## Which issue does this PR close? - Closes #. ## Rationale for this change I found some unnecessary struct creation in `SessionStateBuilder::build` due to using `un

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
myrust-go commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2819848742 Can it support more character sets? We also need this > > > We need to support delimiter with &[u8] or similar in arrow's csv reader. You only take the first u8 so obvio

Re: [PR] chore : migrated all the UDFS to invoke_with_args [datafusion]

2025-04-21 Thread via GitHub
github-actions[bot] commented on PR #14779: URL: https://github.com/apache/datafusion/pull/14779#issuecomment-2819872173 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
myrust-go commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2819889539 Or fully support utf8?

Re: [PR] Add Configurable HTML Table Formatter for DataFusion DataFrames in Python [datafusion-python]

2025-04-21 Thread via GitHub
kosiew commented on PR #1100: URL: https://github.com/apache/datafusion-python/pull/1100#issuecomment-2819926946 @timsaucer thanks for the review and merge.

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-04-21 Thread via GitHub
github-actions[bot] commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2819872465 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Add union_tag scalar function [datafusion]

2025-04-21 Thread via GitHub
github-actions[bot] commented on PR #14687: URL: https://github.com/apache/datafusion/pull/14687#issuecomment-2819872276 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] chore: Return NativeType instead of DataType for get_example_types [datafusion]

2025-04-21 Thread via GitHub
github-actions[bot] commented on PR #14778: URL: https://github.com/apache/datafusion/pull/14778#issuecomment-2819872232 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-04-21 Thread via GitHub
github-actions[bot] commented on PR #14366: URL: https://github.com/apache/datafusion/pull/14366#issuecomment-2819872416 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Implement faster join traversal [datafusion]

2025-04-21 Thread via GitHub
github-actions[bot] closed pull request #14539: Implement faster join traversal URL: https://github.com/apache/datafusion/pull/14539

Re: [PR] Support logic optimize rule to pass the case that Utf8view datatype combined with Utf8 datatype [datafusion]

2025-04-21 Thread via GitHub
zhuqi-lucas commented on code in PR #15239: URL: https://github.com/apache/datafusion/pull/15239#discussion_r2053405630 ## datafusion/common/src/dfschema.rs: ## @@ -564,6 +564,7 @@ impl DFSchema { } /// Check to see if fields in 2 Arrow schemas are compatible +#[

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
jayzhan211 commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2820193611 If the delimiter is `&[u8]`, I think any kind of delimiter should be able to be supported. The only concern is how to design it optimally for all the cases

Re: [PR] Minor: eliminate unnecessary struct creation in session state build [datafusion]

2025-04-21 Thread via GitHub
Rachelint commented on PR #15800: URL: https://github.com/apache/datafusion/pull/15800#issuecomment-2820236609 > 👍 > > I'm wondering if there is a clippy rule for this case I think it is a good idea; I am trying to do it the same way it was done for `Arc::clone`

Re: [I] Potential flaky tests [datafusion]

2025-04-21 Thread via GitHub
bikbov commented on issue #15789: URL: https://github.com/apache/datafusion/issues/15789#issuecomment-2820242739 take

Re: [PR] Minor: eliminate unnecessary struct creation in session state build [datafusion]

2025-04-21 Thread via GitHub
xudong963 merged PR #15800: URL: https://github.com/apache/datafusion/pull/15800

Re: [PR] Minor: eliminate unnecessary struct creation in session state build [datafusion]

2025-04-21 Thread via GitHub
xudong963 commented on PR #15800: URL: https://github.com/apache/datafusion/pull/15800#issuecomment-2820252438 Thank you

Re: [PR] Support logic optimize rule to pass the case that Utf8view datatype combined with Utf8 datatype [datafusion]

2025-04-21 Thread via GitHub
xudong963 commented on code in PR #15239: URL: https://github.com/apache/datafusion/pull/15239#discussion_r2053359938 ## datafusion/common/src/dfschema.rs: ## @@ -564,6 +564,7 @@ impl DFSchema { } /// Check to see if fields in 2 Arrow schemas are compatible +#[de

Re: [PR] Enable repartitioning on MemTable. [datafusion]

2025-04-21 Thread via GitHub
wiedld commented on code in PR #15409: URL: https://github.com/apache/datafusion/pull/15409#discussion_r2053131373 ## datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs: ## @@ -520,7 +520,9 @@ async fn group_by_string_test( let expected = compute_counts(&input, column_name)

Re: [PR] Enable repartitioning on MemTable. [datafusion]

2025-04-21 Thread via GitHub
wiedld commented on code in PR #15409: URL: https://github.com/apache/datafusion/pull/15409#discussion_r2053132567 ## datafusion/datasource/src/memory.rs: ## @@ -902,4 +1130,319 @@ mod tests { Ok(()) } + +fn batch(row_size: usize) -> RecordBatch { +le

Re: [PR] Enable repartitioning on MemTable. [datafusion]

2025-04-21 Thread via GitHub
wiedld commented on PR #15409: URL: https://github.com/apache/datafusion/pull/15409#issuecomment-2820043803 Took me a while to circle back to this PR. I believe it's now ready for re-review @2010YOUY01

Re: [I] Single partition shuffle should not use partition buffers [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove closed issue #1497: Single partition shuffle should not use partition buffers URL: https://github.com/apache/datafusion-comet/issues/1497

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-21 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2052745403 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -982,18 +980,6 @@ impl TableProvider for ListingTable { return Ok(TableProviderFilt

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-21 Thread via GitHub
ozankabak commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2819068976 The "ideal" flow in DF is to check for ordering during planning and choose specialized executors (and accumulators) based on this information. We don't do this in all cases yet, bu

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-21 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2052747672 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -5006,7 +5006,7 @@ SELECT column5, avg(column1) FROM d GROUP BY column5; query I?? SELECT column5,

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-21 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2052746932 ## datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt: ## @@ -81,11 +81,15 @@ EXPLAIN select a from t_pushdown where b > 2 ORDER BY a; logical_

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-21 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r205275 ## datafusion/datasource-parquet/src/source.rs: ## @@ -589,4 +559,49 @@ impl FileSource for ParquetSource { } } } + +fn try_pushdow

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-21 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2052755734 ## datafusion/datasource-parquet/src/source.rs: ## @@ -589,4 +559,49 @@ impl FileSource for ParquetSource { } } } + +fn try_pushdow

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-21 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2052756433 ## datafusion/datasource-parquet/src/source.rs: ## @@ -253,18 +251,18 @@ use object_store::ObjectStore; /// [`RecordBatch`]: arrow::record_batch::RecordBatch //

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-21 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2052755948 ## datafusion/datasource-parquet/src/source.rs: ## @@ -589,4 +559,49 @@ impl FileSource for ParquetSource { } } } + +fn try_pushdow

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-21 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2052761831 ## datafusion/physical-optimizer/src/push_down_filter.rs: ## @@ -382,7 +383,7 @@ impl PhysicalOptimizerRule for PushdownFilter { context .

[PR] Speed up `optimize_projection` [datafusion]

2025-04-21 Thread via GitHub
xudong963 opened a new pull request, #15787: URL: https://github.com/apache/datafusion/pull/15787 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [I] incorrect range frame implementation [datafusion]

2025-04-21 Thread via GitHub
suibianwanwank commented on issue #15714: URL: https://github.com/apache/datafusion/issues/15714#issuecomment-2817976931 I agree that supporting expressions in the window frame boundaries would improve compatibility with the SQL standard. Currently, DataFusion uses a cumulative algori

Re: [PR] Set HashJoin seed [datafusion]

2025-04-21 Thread via GitHub
Weijun-H commented on code in PR #15783: URL: https://github.com/apache/datafusion/pull/15783#discussion_r2052164212 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -86,6 +86,9 @@ use datafusion_physical_expr_common::physical_expr::fmt_sql; use futures::{ready, Stream

Re: [PR] Speed up `optimize_projection` [datafusion]

2025-04-21 Thread via GitHub
xudong963 commented on PR #15787: URL: https://github.com/apache/datafusion/pull/15787#issuecomment-2818018652 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.8.0-1016-gcp #18-Ubuntu SM

Re: [PR] Minor: fix flaky test in `aggregate.slt` [datafusion]

2025-04-21 Thread via GitHub
xudong963 merged PR #15786: URL: https://github.com/apache/datafusion/pull/15786

Re: [PR] Minor: fix flaky test in `aggregate.slt` [datafusion]

2025-04-21 Thread via GitHub
xudong963 commented on PR #15786: URL: https://github.com/apache/datafusion/pull/15786#issuecomment-2818070749 > LGTM! I’ve left a few queries that might potentially require `rowsort`: > > https://github.com/apache/datafusion/blob/8a193c22f6c3f771bff16cb8f9608d764d268c59/datafusion/sq

[PR] [branch-47] toolchain 1.84 compatibility [datafusion]

2025-04-21 Thread via GitHub
gabotechs opened a new pull request, #15790: URL: https://github.com/apache/datafusion/pull/15790 Reverts https://github.com/apache/datafusion/pull/15625, as changes shipped there are incompatible with the Rust toolchain

[PR] chore(deps): bump sqllogictest from 0.28.0 to 0.28.1 [datafusion]

2025-04-21 Thread via GitHub
dependabot[bot] opened a new pull request, #15788: URL: https://github.com/apache/datafusion/pull/15788 Bumps [sqllogictest](https://github.com/risinglightdb/sqllogictest-rs) from 0.28.0 to 0.28.1. Release notes Sourced from https://github.com/risinglightdb/sqllogictest-rs/releases

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-21 Thread via GitHub
ozankabak commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2818317038 Hi @rluvaton -- thanks for the PR. @berkaysynnada and I are checking out this PR and we have some questions. @berkaysynnada will soon post here and we can iterate together.

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-21 Thread via GitHub
berkaysynnada commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2818813306 Thank you @rluvaton. I had some difficulty understanding what this PR actually solves. If you can share a real case to demonstrate how this order in metadata works in a rea

Re: [PR] chore: Enable CometFuzzTestSuite int96 test for experimental native scans (without complex types) [datafusion-comet]

2025-04-21 Thread via GitHub
codecov-commenter commented on PR #1664: URL: https://github.com/apache/datafusion-comet/pull/1664#issuecomment-2818820900 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1664?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Enable CometFuzzTestSuite int96 test for experimental native scans (without complex types) [datafusion-comet]

2025-04-21 Thread via GitHub
mbutrovich commented on code in PR #1664: URL: https://github.com/apache/datafusion-comet/pull/1664#discussion_r2052598103 ## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ## @@ -206,10 +206,15 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSpa

[I] Add tests for map types to CometFuzzTestSuite [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove opened a new issue, #1665: URL: https://github.com/apache/datafusion-comet/issues/1665 ### What is the problem the feature request solves? `CometFuzzTestSuite` currently has tests for structs and arrays, but not maps. This issue is for adding maps. ### Describe

Re: [PR] [branch-47] toolchain 1.84 compatibility [datafusion]

2025-04-21 Thread via GitHub
gabotechs closed pull request #15790: [branch-47] toolchain 1.84 compatibility URL: https://github.com/apache/datafusion/pull/15790

Re: [PR] [branch-47] toolchain 1.84 compatibility [datafusion]

2025-04-21 Thread via GitHub
gabotechs commented on PR #15790: URL: https://github.com/apache/datafusion/pull/15790#issuecomment-2818081708 oops, wrong repo, closing!

Re: [PR] Speed up `optimize_projection` [datafusion]

2025-04-21 Thread via GitHub
jayzhan211 commented on code in PR #15787: URL: https://github.com/apache/datafusion/pull/15787#discussion_r2052256953 ## datafusion/optimizer/src/optimize_projections/mod.rs: ## @@ -785,13 +785,24 @@ fn rewrite_projection_given_requirements( /// Projection is unnecessary, when

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
guojidan commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2818171149 > btw, why do you need this kind of delimiter support? Is converting them to `,` an option for your use-case? in minio test project [mint](https://github.com/minio/mint/

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
guojidan commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2818174774 > We need to support delimiter with &[u8] or similar in arrow's csv reader. You only take the first u8 so obviously there is an error. > > ``` > let r = "╦".as_bytes(
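
The point quoted above, that a multi-byte character cannot serve as a delimiter when only its first byte is used, can be sketched as follows. This is an illustration only and not arrow's CSV reader API; `split_on` is a hypothetical helper that takes the delimiter as a byte slice.

```rust
/// Hypothetical helper: split `line` on a delimiter given as a byte slice
/// rather than a single byte (illustration only, no quoting or escaping).
fn split_on<'a>(line: &'a [u8], delim: &[u8]) -> Vec<&'a [u8]> {
    assert!(!delim.is_empty(), "delimiter must not be empty");
    let mut fields = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i + delim.len() <= line.len() {
        if &line[i..i + delim.len()] == delim {
            // Full delimiter matched: close the current field.
            fields.push(&line[start..i]);
            i += delim.len();
            start = i;
        } else {
            i += 1;
        }
    }
    // Whatever follows the last delimiter is the final field.
    fields.push(&line[start..]);
    fields
}

fn main() {
    let delim = "╦".as_bytes();
    // U+2566 takes three bytes in UTF-8, so one u8 cannot represent it.
    assert_eq!(delim.len(), 3);

    let line = "a╦b╦c".as_bytes();

    // Splitting on the first byte only leaves the other two bytes in the data.
    let naive: Vec<&[u8]> = line.split(|&b| b == delim[0]).collect();
    assert_ne!(naive[1], &b"b"[..]);

    // Splitting on the whole byte sequence recovers the intended fields.
    let fields = split_on(line, delim);
    for f in &fields {
        println!("{}", String::from_utf8_lossy(f));
    }
}
```

A real reader would also have to handle quoting, escapes, and record terminators, which is where the design concern raised elsewhere in this thread comes in.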

Re: [PR] Minor: fix flaky test in `aggregate.slt` [datafusion]

2025-04-21 Thread via GitHub
xudong963 commented on PR #15786: URL: https://github.com/apache/datafusion/pull/15786#issuecomment-2817985685 CI in main is broken: https://github.com/apache/datafusion/actions/runs/14570294254/job/40866408433 And the PR will fix it.

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
jayzhan211 commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2818160274 btw, why do you need this kind of delimiter support? Is converting them to `,` an option for your use-case?

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-21 Thread via GitHub
berkaysynnada commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2052242110 ## datafusion/datasource-parquet/src/source.rs: ## @@ -589,4 +559,49 @@ impl FileSource for ParquetSource { } } } + +fn try_pu

Re: [PR] Add Configurable HTML Table Formatter for DataFusion DataFrames in Python [datafusion-python]

2025-04-21 Thread via GitHub
timsaucer merged PR #1100: URL: https://github.com/apache/datafusion-python/pull/1100

Re: [PR] Add Configurable HTML Table Formatter for DataFusion DataFrames in Python [datafusion-python]

2025-04-21 Thread via GitHub
timsaucer commented on PR #1100: URL: https://github.com/apache/datafusion-python/pull/1100#issuecomment-2818259829 I've tested this locally and it works well. I'd love to see the documentation, but I see that's tracked on another issue since this one is so large. Thank you so much for the

Re: [I] Move DataFrame HTML Rendering to Configurable Python Formatter [datafusion-python]

2025-04-21 Thread via GitHub
timsaucer closed issue #1096: Move DataFrame HTML Rendering to Configurable Python Formatter URL: https://github.com/apache/datafusion-python/issues/1096

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-21 Thread via GitHub
berkaysynnada commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2818261990 @adriangb I'll complete reviewing this after merging other open PR's.

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-21 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2050966075 ## tests/sqlparser_mssql.rs: ## @@ -2053,3 +2054,171 @@ fn parse_drop_trigger() { } ); } + +#[test] +fn parse_mssql_go_keyword() { +

Re: [I] Release Comet 0.8.0 [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove commented on issue #1635: URL: https://github.com/apache/datafusion-comet/issues/1635#issuecomment-2818474349 I created a Google doc where we can collaborate on a blog post. https://docs.google.com/document/d/1vznwLvyIPiILTPAS_24WAvWuZ4qdrV4pPdUoQghvpao/edit?usp=sharing -

Re: [PR] Minor: remove unused logic for limit pushdown [datafusion]

2025-04-21 Thread via GitHub
zhuqi-lucas commented on PR #15730: URL: https://github.com/apache/datafusion/pull/15730#issuecomment-2818499171 Thank you @berkaysynnada for review.

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-21 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2052449901 ## src/parser/mod.rs: ## @@ -618,6 +632,7 @@ impl<'a> Parser<'a> { // `COMMENT` is snowflake specific https://docs.snowflake.com/en

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-21 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2052450841 ## src/parser/mod.rs: ## @@ -15064,6 +15117,38 @@ impl<'a> Parser<'a> { })) } +/// Parse [Statement::Go] +fn parse_go(&mut se

Re: [PR] Minor: remove unused logic for limit pushdown [datafusion]

2025-04-21 Thread via GitHub
berkaysynnada merged PR #15730: URL: https://github.com/apache/datafusion/pull/15730

Re: [I] Optimize native shuffle for single partition case [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove closed issue #1453: Optimize native shuffle for single partition case URL: https://github.com/apache/datafusion-comet/issues/1453

Re: [I] Test suite is not always testing `native_datafusion` [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove commented on issue #1538: URL: https://github.com/apache/datafusion-comet/issues/1538#issuecomment-2818533684 I think that this can be closed now

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-21 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2052462208 ## src/parser/mod.rs: ## @@ -15064,6 +15117,38 @@ impl<'a> Parser<'a> { })) } +/// Parse [Statement::Go] +fn parse_go(&mut se

Re: [I] Test suite is not always testing `native_datafusion` [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove closed issue #1538: Test suite is not always testing `native_datafusion` URL: https://github.com/apache/datafusion-comet/issues/1538

[PR] Enable CometFuzzTestSuite int96 test for experimental native scans (without complex types) [datafusion-comet]

2025-04-21 Thread via GitHub
mbutrovich opened a new pull request, #1664: URL: https://github.com/apache/datafusion-comet/pull/1664 ## Which issue does this PR close? N/A ## Rationale for this change #1652 added better int96 support for experimental native scans (relying on CometCast

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-21 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2052471264 ## src/dialect/mssql.rs: ## @@ -116,7 +116,17 @@ impl Dialect for MsSqlDialect { true } -fn is_column_alias(&self, kw: &Keyword,

Re: [I] Re-implement memory management in native shuffle writer [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove commented on issue #1446: URL: https://github.com/apache/datafusion-comet/issues/1446#issuecomment-2818553181 I think that this is out of date now

Re: [I] Re-implement memory management in native shuffle writer [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove closed issue #1446: Re-implement memory management in native shuffle writer URL: https://github.com/apache/datafusion-comet/issues/1446

Re: [PR] chore: Enable CometFuzzTestSuite int96 test for experimental native scans (without complex types) [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove merged PR #1664: URL: https://github.com/apache/datafusion-comet/pull/1664

[PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-21 Thread via GitHub
gabotechs opened a new pull request, #15794: URL: https://github.com/apache/datafusion/pull/15794 ## Which issue does this PR close? - Closes #13864. ## Rationale for this change The `consumer.rs` file grew a bit too big (~3400 LOC). Good thing is that it

[I] Migrate datafusion-cli tests to insta [datafusion]

2025-04-21 Thread via GitHub
blaginin opened a new issue, #15795: URL: https://github.com/apache/datafusion/issues/15795 In https://github.com/apache/datafusion/issues/15178, we're switching hard-coded constants in tests to `insta`. This issue targets updating **datafusion-cli tests** (`datafusion-cli/`).

Re: [PR] fix: update row groups count in internal metrics accumulator [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove merged PR #1658: URL: https://github.com/apache/datafusion-comet/pull/1658

Re: [I] [Discussion] Efficient Row Selection for Multi-Engine Support [datafusion]

2025-04-21 Thread via GitHub
XiangpengHao commented on issue #14816: URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2819130637 > Is there any way in datafusion that we get different iterators for each of the file partitions we do when we create the physical plan? Not sure if this is what you

Re: [PR] fix: Shuffle should maintain insertion order [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove merged PR #1660: URL: https://github.com/apache/datafusion-comet/pull/1660

Re: [I] nit: Hash Partitioning Shuffle Writes Last Batches First [datafusion-comet]

2025-04-21 Thread via GitHub
andygrove closed issue #1659: nit: Hash Partitioning Shuffle Writes Last Batches First URL: https://github.com/apache/datafusion-comet/issues/1659

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-21 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2052785862 ## src/parser/mod.rs: ## @@ -4055,6 +4070,44 @@ impl<'a> Parser<'a> { ) } +/// Look backwards in the token stream and expect that

Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-21 Thread via GitHub
mbutrovich commented on issue #1648: URL: https://github.com/apache/datafusion-comet/issues/1648#issuecomment-2819405668 I profiled it and we're getting crushed in OS mutexes in the allocator. I noticed there's some support for mimalloc in Comet already that's not really documented. DF ena
