Re: [I] Support `merge` for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on issue #15290: URL: https://github.com/apache/datafusion/issues/15290#issuecomment-2732391652 There is a proposal: https://github.com/apache/datafusion/pull/15296 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
UBarney commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2000992780 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,423 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstGroupsAccumulator +wh

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2732103208 Thanks @aectaan -- what is the error message that you get?

Re: [PR] fix: Unconditionally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-18 Thread via GitHub
Omega359 commented on code in PR #15242: URL: https://github.com/apache/datafusion/pull/15242#discussion_r2001009802 ## datafusion/sqllogictest/test_files/union_by_name.slt: ## @@ -287,3 +287,137 @@ SELECT '0' as c UNION ALL BY NAME SELECT 0 as c; 0 0 + +# Regression tes

Re: [PR] Migrate datasource tests to insta [datafusion]

2025-03-18 Thread via GitHub
blaginin commented on code in PR #15258: URL: https://github.com/apache/datafusion/pull/15258#discussion_r2001416176 ## datafusion/core/Cargo.toml: ## @@ -126,6 +126,7 @@ datafusion-physical-plan = { workspace = true } datafusion-sql = { workspace = true } flate2 = { version =

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001379803 ## datafusion/core/tests/parquet/schema.rs: ## @@ -153,7 +151,15 @@ async fn schema_merge_can_preserve_metadata() { let actual = df.collect().await.unwrap();

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001384434 ## datafusion/core/tests/sql/path_partition.rs: ## @@ -145,16 +146,7 @@ async fn parquet_distinct_partition_col() -> Result<()> { .collect() .awai

Re: [PR] minor: fix `data/sqlite` link [datafusion]

2025-03-18 Thread via GitHub
sdht0 commented on PR #15286: URL: https://github.com/apache/datafusion/pull/15286#issuecomment-2733877823 Ah fixed it thanks.

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15253: URL: https://github.com/apache/datafusion/pull/15253#discussion_r2001108185 ## datafusion/sqllogictest/test_files/explain_tree.slt: ## @@ -202,53 +202,48 @@ physical_plan 02)│ AggregateExec │ 03)│

Re: [PR] Refactor file schema type coercions [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on PR #15268: URL: https://github.com/apache/datafusion/pull/15268#issuecomment-2733344214 Thank you all!

Re: [PR] Refactor file schema type coercions [datafusion]

2025-03-18 Thread via GitHub
xudong963 merged PR #15268: URL: https://github.com/apache/datafusion/pull/15268

Re: [I] Support `merge` for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on issue #15290: URL: https://github.com/apache/datafusion/issues/15290#issuecomment-2732132955 > Will require an accurate distribution (not just an approximation Yes, it depends on whether each distribution is accurate, if they're, the merged distribution should b

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
Dandandan commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2000574008 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,423 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstGroupsAccumulator

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15253: URL: https://github.com/apache/datafusion/pull/15253#discussion_r2000832270 ## datafusion/sqllogictest/test_files/explain_tree.slt: ## @@ -202,53 +202,48 @@ physical_plan 02)│ AggregateExec │ 03)│

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2733529188 Converting to a draft until we have spawn service - https://github.com/apache/arrow-rs/pull/7253

Re: [PR] Remove inline table scan analyzer rule [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on code in PR #15201: URL: https://github.com/apache/datafusion/pull/15201#discussion_r2000904176 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -1571,14 +1571,18 @@ async fn with_column_join_same_columns() -> Result<()> { assert_snapshot!(

Re: [PR] Triggering extended tests through PR comment [datafusion]

2025-03-18 Thread via GitHub
Omega359 commented on PR #15101: URL: https://github.com/apache/datafusion/pull/15101#issuecomment-2733251482 Is this ready for review or is there something outstanding for it to be still in draft?

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
irenjj commented on PR #15253: URL: https://github.com/apache/datafusion/pull/15253#issuecomment-2733266739 Thanks @alamb, @jayzhan211 and @xudong963 for your review, here are two points that remain unclear: 1. For GROUP BY, is it necessary to preserve the row index -- for more informati

Re: [PR] minor: fix `data/sqlite` link [datafusion]

2025-03-18 Thread via GitHub
sdht0 commented on PR #15286: URL: https://github.com/apache/datafusion/pull/15286#issuecomment-2733900724 Weird that I received an email about another comment from Weijun-H but can't see it here?

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
irenjj commented on PR #15253: URL: https://github.com/apache/datafusion/pull/15253#issuecomment-2732945218 > Is it possible to modify `Display` for Expr for explain statement? I haven't tried it, not sure how it will affect other logic.

Re: [PR] feat: Fix multi-lines printing issue for datafusion-cli and add the streaming printing feature back [datafusion]

2025-03-18 Thread via GitHub
blaginin commented on code in PR #14954: URL: https://github.com/apache/datafusion/pull/14954#discussion_r2001448048 ## datafusion-cli/tests/cli_integration.rs: ## @@ -51,6 +51,163 @@ fn init() { ["--command", "show datafusion.execution.batch_size", "--format", "json", "-q

Re: [I] Require Comet 0.6 Docker image for Spark 3.5.5 [datafusion-comet]

2025-03-18 Thread via GitHub
RaghavendraGanesh commented on issue #1509: URL: https://github.com/apache/datafusion-comet/issues/1509#issuecomment-2733053103 Thanks @andygrove, will give it a try.

Re: [D] enum Expr extension on logical level [datafusion]

2025-03-18 Thread via GitHub
GitHub user bertvermeiren edited a discussion: enum Expr extension on logical level Hi, In order to write some additional logical and physical plan implementations, we do have to create some kind of "composite" logical expression. Nowadays in the code base you have already existing expressi

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001378037 ## datafusion/core/tests/parquet/schema.rs: ## @@ -82,7 +69,18 @@ async fn schema_merge_ignores_metadata_by_default() { .unwrap(); let actual = df.col

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001380462 ## datafusion/core/tests/parquet/schema.rs: ## @@ -167,7 +173,15 @@ async fn schema_merge_can_preserve_metadata() { assert_eq!(actual.clone(), expected_metadat

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001378793 ## datafusion/core/tests/parquet/schema.rs: ## @@ -97,7 +95,18 @@ async fn schema_merge_ignores_metadata_by_default() { .collect() .await

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001386538 ## datafusion/core/tests/sql/path_partition.rs: ## @@ -430,21 +381,7 @@ async fn parquet_multiple_partitions() -> Result<()> { .collect() .await?;

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001387282 ## datafusion/core/tests/sql/select.rs: ## @@ -30,23 +30,7 @@ async fn test_list_query_parameters() -> Result<()> { .with_param_values(vec![ScalarValue::fr

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001385323 ## datafusion/core/tests/sql/path_partition.rs: ## @@ -313,18 +294,7 @@ async fn csv_filter_with_file_nonstring_col() -> Result<()> { .collect()

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001388172 ## datafusion/core/tests/sql/select.rs: ## @@ -114,33 +72,7 @@ async fn test_prepare_statement() -> Result<()> { let dataframe = dataframe.with_param_values(pa

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001376856 ## datafusion/core/tests/parquet/custom_reader.rs: ## @@ -96,17 +97,15 @@ async fn route_data_access_ops_to_parquet_file_reader_factory() { let task_ctx = ses

Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-18 Thread via GitHub
blaginin commented on code in PR #15255: URL: https://github.com/apache/datafusion/pull/15255#discussion_r2000955601 ## datafusion/core/tests/user_defined/expr_planner.rs: ## @@ -73,52 +73,62 @@ async fn plan_and_collect(sql: &str) -> Result> { ctx.sql(sql).await?.collect(

Re: [PR] Add CatalogProvider and SchemaProvider to FFI Crate [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15280: URL: https://github.com/apache/datafusion/pull/15280#issuecomment-2732820890 🎉

Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-18 Thread via GitHub
blaginin commented on code in PR #15255: URL: https://github.com/apache/datafusion/pull/15255#discussion_r2000968071 ## datafusion/core/tests/user_defined/user_defined_table_functions.rs: ## @@ -34,11 +34,19 @@ use datafusion::physical_plan::{collect, ExecutionPlan}; use datafu

Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-18 Thread via GitHub
blaginin commented on code in PR #15255: URL: https://github.com/apache/datafusion/pull/15255#discussion_r2000967075 ## datafusion/core/tests/user_defined/user_defined_window_functions.rs: ## @@ -57,30 +57,38 @@ const BOUNDED_WINDOW_QUERY: &str = odd_counter(val) OVER (P

[I] Update all github workflow to use actions tied to sha hashes [datafusion]

2025-03-18 Thread via GitHub
Omega359 opened a new issue, #15298: URL: https://github.com/apache/datafusion/issues/15298 ### Is your feature request related to a problem or challenge? A recent [supply chain attack](https://arstechnica.com/information-technology/2025/03/supply-chain-attack-exposing-credentials-aff

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2000753514 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") }

Re: [I] Update all github workflow to use actions tied to sha hashes [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15298: URL: https://github.com/apache/datafusion/issues/15298#issuecomment-2734055350 Thank you @Omega359 -- I agree this is very important

Re: [PR] Fix predicate pushdown for custom SchemaAdapters [datafusion]

2025-03-18 Thread via GitHub
alamb commented on code in PR #15263: URL: https://github.com/apache/datafusion/pull/15263#discussion_r200126 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -224,6 +224,327 @@ mod tests { ) } +#[tokio::test] +async fn test_pushdown_

Re: [I] Add support for S3 Object Store in default binaries [datafusion-ballista]

2025-03-18 Thread via GitHub
milenkovicm commented on issue #1205: URL: https://github.com/apache/datafusion-ballista/issues/1205#issuecomment-2734056479 Once we have S3 support, it should work with MinIO. Would you be interested in contributing, @fithisux?

Re: [I] Update all github workflow to use actions tied to sha hashes [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15298: URL: https://github.com/apache/datafusion/issues/15298#issuecomment-2734058015 I think this is a good first issue as the write up is clear and there is an example to follow

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2734059396 Thanks @aectaan

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2734063672 I wonder if you can get a pure SQL (`datafusion-cli` based) reproducer? Or does it require creating and configuring a custom context / optimizer rules 🤔

[PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 opened a new pull request, #15296: URL: https://github.com/apache/datafusion/pull/15296 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/15290 ## Rationale for this change See issue #15290 ## What change

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
2010YOUY01 commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2000593852 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,423 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstGroupsAccumulator

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
irenjj commented on code in PR #15253: URL: https://github.com/apache/datafusion/pull/15253#discussion_r2000877388 ## datafusion/sqllogictest/test_files/explain_tree.slt: ## @@ -202,53 +202,48 @@ physical_plan 02)│ AggregateExec │ 03)│ │

Re: [PR] chore(deps): bump async-trait from 0.1.87 to 0.1.88 [datafusion]

2025-03-18 Thread via GitHub
xudong963 merged PR #15294: URL: https://github.com/apache/datafusion/pull/15294

Re: [D] Does DataFusion Support JSON Path Filtering Like `jsonb_path_exists` in PostgreSQL? [datafusion]

2025-03-18 Thread via GitHub
GitHub user dadepo added a comment to the discussion: Does DataFusion Support JSON Path Filtering Like `jsonb_path_exists` in PostgreSQL? > You could create a function called `jsonb_path_exists` that takes a binary > column and a json path string perhaps? I think what I am missing is how this

Re: [I] [EPIC] A collection of tickets for improved WASM support in DataFusion [datafusion]

2025-03-18 Thread via GitHub
savaliyabhargav commented on issue #13815: URL: https://github.com/apache/datafusion/issues/13815#issuecomment-2732178094 @matthewmturner Yes, sure, I am interested. Can you please give me more detail about it?

[I] Support `merge` for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 opened a new issue, #15290: URL: https://github.com/apache/datafusion/issues/15290 ### Is your feature request related to a problem or challenge? I'm working on the ticket: https://github.com/apache/datafusion/issues/10316. Given that, we'll replace all `Precision` wit

Re: [PR] chore: Update links for released version [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove commented on code in PR #1540: URL: https://github.com/apache/datafusion-comet/pull/1540#discussion_r2001302504 ## docs/source/user-guide/kubernetes.md: ## @@ -65,31 +65,31 @@ metadata: spec: type: Scala mode: cluster - image: ghcr.io/apache/datafusion-comet:sp

Re: [I] Add support for S3 Object Store in default binaries [datafusion-ballista]

2025-03-18 Thread via GitHub
fithisux commented on issue #1205: URL: https://github.com/apache/datafusion-ballista/issues/1205#issuecomment-2733646600 I would rather have MinIO support because they provide fully working Docker images that you can incorporate into a full-fledged Docker Compose educational stack.

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
aectaan commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2732112476 @alamb it's related to common subexpr eliminate: ``` called `Result::unwrap()` on an `Err` value: Context("Optimizer rule 'common_sub_expression_eliminate' failed", SchemaE

[PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-03-18 Thread via GitHub
kosiew opened a new pull request, #15295: URL: https://github.com/apache/datafusion/pull/15295 ## Which issue does this PR close? - Closes #14757. ## Rationale for this change This PR introduces a `NestedStructSchemaAdapter` to improve schema evolution handling in DataFu

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-03-18 Thread via GitHub
kosiew commented on code in PR #15295: URL: https://github.com/apache/datafusion/pull/15295#discussion_r2000486806 ## datafusion/datasource/src/nested_schema_adapter.rs: ## @@ -0,0 +1,582 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
irenjj commented on code in PR #15253: URL: https://github.com/apache/datafusion/pull/15253#discussion_r2001153997 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -801,6 +803,16 @@ impl DisplayAs for AggregateExec { } } Display

Re: [PR] chore: Attach Diagnostic to "incompatible type in unary expression" error [datafusion]

2025-03-18 Thread via GitHub
onlyjackfrost commented on code in PR #15209: URL: https://github.com/apache/datafusion/pull/15209#discussion_r2001158621 ## datafusion/sql/src/expr/unary_op.rs: ## @@ -45,7 +45,18 @@ impl SqlToRel<'_, S> { { Ok(operand) } e

Re: [PR] Add support for `RAISE` statement [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
iffyio commented on code in PR #1766: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1766#discussion_r2001170116 ## src/ast/mod.rs: ## @@ -2256,6 +2256,57 @@ impl fmt::Display for ConditionalStatements { } } +/// A `RAISE` statement. +/// +/// Examples: +//

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
irenjj commented on code in PR #15253: URL: https://github.com/apache/datafusion/pull/15253#discussion_r2001176277 ## datafusion/expr/src/expr.rs: ## @@ -2596,6 +2612,176 @@ impl Display for SchemaDisplay<'_> { } } +struct SqlDisplay<'a>(&'a Expr); +impl Display for SqlD

Re: [PR] Add support for `RAISE` statement [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
iffyio merged PR #1766: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1766

Re: [PR] chore: Attach Diagnostic to "incompatible type in unary expression" error [datafusion]

2025-03-18 Thread via GitHub
eliaperantoni commented on code in PR #15209: URL: https://github.com/apache/datafusion/pull/15209#discussion_r2001166961 ## datafusion/sql/src/expr/unary_op.rs: ## @@ -45,7 +45,18 @@ impl SqlToRel<'_, S> { { Ok(operand) } e

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
blaginin commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2001206529 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,424 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccum

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
blaginin commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2001210201 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,424 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccum

Re: [I] Timeouts reading "large" files from object stores over "slow" connections [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15067: URL: https://github.com/apache/datafusion/issues/15067#issuecomment-2733546979 I am convinced this issue would be solved with automatic retries - https://github.com/apache/arrow-rs/issues/7242

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
iffyio commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2000290321 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") } Self::L

[PR] chore(deps): bump uuid from 1.15.1 to 1.16.0 [datafusion]

2025-03-18 Thread via GitHub
dependabot[bot] opened a new pull request, #15292: URL: https://github.com/apache/datafusion/pull/15292 Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.15.1 to 1.16.0. Release notes Sourced from https://github.com/uuid-rs/uuid/releases (uuid's releases). v1.16.0 What'

[PR] chore(deps): bump rust_decimal from 1.36.0 to 1.37.0 [datafusion]

2025-03-18 Thread via GitHub
dependabot[bot] opened a new pull request, #15293: URL: https://github.com/apache/datafusion/pull/15293 Bumps [rust_decimal](https://github.com/paupino/rust-decimal) from 1.36.0 to 1.37.0. Release notes Sourced from https://github.com/paupino/rust-decimal/releases rust_decimal's

[PR] chore(deps): bump async-trait from 0.1.87 to 0.1.88 [datafusion]

2025-03-18 Thread via GitHub
dependabot[bot] opened a new pull request, #15294: URL: https://github.com/apache/datafusion/pull/15294 Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.87 to 0.1.88. Release notes Sourced from https://github.com/dtolnay/async-trait/releases async-trait's rel

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-03-18 Thread via GitHub
suibianwanwank commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2731893665 @alamb Hi, I have updated the PR title to start with "fix," but it seems that the "bug" label has not been added. The PR is now ready for review. Please take a look at you

Re: [PR] Minor: consistently apply `clippy::clone_on_ref_ptr` in all crates [datafusion]

2025-03-18 Thread via GitHub
berkaysynnada merged PR #15284: URL: https://github.com/apache/datafusion/pull/15284

Re: [PR] Minor: consistently apply `clippy::clone_on_ref_ptr` in all crates [datafusion]

2025-03-18 Thread via GitHub
berkaysynnada commented on PR #15284: URL: https://github.com/apache/datafusion/pull/15284#issuecomment-2731952749 Thank you @alamb and @comphead

Re: [I] Upgrade Guide for DataFusion 46 does not include the array signatures change [datafusion]

2025-03-18 Thread via GitHub
alamb closed issue #15105: Upgrade Guide for DataFusion 46 does not include the array signatures change URL: https://github.com/apache/datafusion/issues/15105

Re: [PR] Fix predicate pushdown for custom SchemaAdapters [datafusion]

2025-03-18 Thread via GitHub
adriangb commented on code in PR #15263: URL: https://github.com/apache/datafusion/pull/15263#discussion_r2001600091 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -224,6 +224,327 @@ mod tests { ) } +#[tokio::test] +async fn test_pushdo

Re: [PR] Add upgrade notes for array signatures [datafusion]

2025-03-18 Thread via GitHub
alamb merged PR #15237: URL: https://github.com/apache/datafusion/pull/15237

Re: [PR] build: Use unique name for surefire artifacts [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove merged PR #1544: URL: https://github.com/apache/datafusion-comet/pull/1544

[PR] docs: Use a shallow clone for Spark SQL tests. [datafusion-comet]

2025-03-18 Thread via GitHub
mbutrovich opened a new pull request, #1547: URL: https://github.com/apache/datafusion-comet/pull/1547 ## Which issue does this PR close? Closes #. ## Rationale for this change We don't need the whole Spark repository to run Spark SQL tests. Current

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2734534317 > Unfortunately `datafusion-cli` parser fails at this request: doesn't like opening brace before WHERE - that's why I made it as a test. Maybe I'm missing something. Maybe yo

Re: [PR] refactor: Move view and stream from `datasource` to `catalog` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on code in PR #15260: URL: https://github.com/apache/datafusion/pull/15260#discussion_r2001661867 ## datafusion/catalog/Cargo.toml: ## @@ -35,17 +35,18 @@ arrow = { workspace = true } async-trait = { workspace = true } dashmap = { workspace = true } datafusion

Re: [PR] chore: Update links for released version [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on code in PR #1540: URL: https://github.com/apache/datafusion-comet/pull/1540#discussion_r2001712113 ## docs/source/user-guide/kubernetes.md: ## @@ -65,31 +65,31 @@ metadata: spec: type: Scala mode: cluster - image: ghcr.io/apache/datafusion-comet:spa

Re: [PR] minor: fix `data/sqlite` link [datafusion]

2025-03-18 Thread via GitHub
alamb merged PR #15286: URL: https://github.com/apache/datafusion/pull/15286

Re: [PR] WIP: Test arrow-rs 54.3.0 upgrade [datafusion]

2025-03-18 Thread via GitHub
alamb closed pull request #15285: WIP: Test arrow-rs 54.3.0 upgrade URL: https://github.com/apache/datafusion/pull/15285

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r2001744891 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -61,13 +61,15 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wit

Re: [PR] Support logic optimize rule to pass the case that Utf8view datatype combined with Utf8 datatype [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15239: URL: https://github.com/apache/datafusion/pull/15239#issuecomment-2734365565 Thanks again@

Re: [PR] wip: Update benchmark results for 0.7.0 release [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove commented on code in PR #1548: URL: https://github.com/apache/datafusion-comet/pull/1548#discussion_r2001820424 ## README.md: ## @@ -46,23 +46,23 @@ The following chart shows the time it takes to run the 22 TPC-H queries against using a single executor with 8 cores.

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15253: URL: https://github.com/apache/datafusion/pull/15253#issuecomment-2734461898 > Thanks @alamb, @jayzhan211 and @xudong963 for your review, here are two points that remain unclear: > > 1. For GROUP BY, is it necessary to preserve the row ind

Re: [PR] CI Red: Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jonahgao commented on code in PR #15300: URL: https://github.com/apache/datafusion/pull/15300#discussion_r2002313134 ## datafusion/sqllogictest/test_files/union.slt: ## @@ -907,11 +907,56 @@ SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y; 20 40 +quer

Re: [PR] CI Red: Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jonahgao merged PR #15300: URL: https://github.com/apache/datafusion/pull/15300

Re: [I] [EPIC] Attach `Diagnostic` to more errors [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on issue #14429: URL: https://github.com/apache/datafusion/issues/14429#issuecomment-2735199806 Hi, is this still a potential GSoC project? It looks like many of the tickets in this epic have an open pull request and are close to completion. If you know of any other areas t

Re: [I] Investigate TPC-H q4 hanging when not enough memory is allocated [datafusion-comet]

2025-03-18 Thread via GitHub
Kontinuation commented on issue #1523: URL: https://github.com/apache/datafusion-comet/issues/1523#issuecomment-2735209828 The query blocked because we don't have enough blocking threads configured for the tokio runtime. In the merge phase, each spill file will be wrapped by a

Re: [I] [EPIC] A collection of tickets for improved WASM support in DataFusion [datafusion]

2025-03-18 Thread via GitHub
matthewmturner commented on issue #13815: URL: https://github.com/apache/datafusion/issues/13815#issuecomment-2735185316 For the WASM UDFs they just need some more real world testing / benchmarking. To be honest, the other points @alamb mentioned would probably better benefit the DataFusio

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002335941 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distr

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002299377 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distr

Re: [I] [DISCUSS] Release DataFusion `46.0.1` Patch or `46.1.0` minor release (March 2025) [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on issue #15151: URL: https://github.com/apache/datafusion/issues/15151#issuecomment-2735217322 Just a reminder: we can do a final release today. It seems to require a PMC member to do the last steps. cc @alamb.

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on PR #15296: URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2735210417 > I think eventually it would be nice to add some tests for this code Yes, as the ticket description said: I'll do it after we are consistent.

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
UBarney commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2002297890 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,424 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccumu

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
iffyio commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2001562854 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") } Self::L
