Re: [PR] Fix predicate pushdown for custom SchemaAdapters [datafusion]

2025-03-18 Thread via GitHub
adriangb commented on code in PR #15263: URL: https://github.com/apache/datafusion/pull/15263#discussion_r2001600091 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -224,6 +224,327 @@ mod tests { ) } +#[tokio::test] +async fn test_pushdo

Re: [PR] Add upgrade notes for array signatures [datafusion]

2025-03-18 Thread via GitHub
alamb merged PR #15237: URL: https://github.com/apache/datafusion/pull/15237 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] build: Use unique name for surefire artifacts [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove merged PR #1544: URL: https://github.com/apache/datafusion-comet/pull/1544

[PR] docs: Use a shallow clone for Spark SQL tests. [datafusion-comet]

2025-03-18 Thread via GitHub
mbutrovich opened a new pull request, #1547: URL: https://github.com/apache/datafusion-comet/pull/1547 ## Which issue does this PR close? Closes #. ## Rationale for this change We don't need the whole Spark repository to run Spark SQL tests. Current

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2734534317 > Unfortunately the `datafusion-cli` parser fails on this request: it doesn't like the opening brace before WHERE - that's why I made it a test. Maybe I'm missing something. Maybe yo

Re: [PR] refactor: Move view and stream from `datasource` to `catalog` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on code in PR #15260: URL: https://github.com/apache/datafusion/pull/15260#discussion_r2001661867 ## datafusion/catalog/Cargo.toml: ## @@ -35,17 +35,18 @@ arrow = { workspace = true } async-trait = { workspace = true } dashmap = { workspace = true } datafusion

Re: [PR] chore: Update links for released version [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on code in PR #1540: URL: https://github.com/apache/datafusion-comet/pull/1540#discussion_r2001712113 ## docs/source/user-guide/kubernetes.md: ## @@ -65,31 +65,31 @@ metadata: spec: type: Scala mode: cluster - image: ghcr.io/apache/datafusion-comet:spa

Re: [PR] minor: fix `data/sqlite` link [datafusion]

2025-03-18 Thread via GitHub
alamb merged PR #15286: URL: https://github.com/apache/datafusion/pull/15286

Re: [PR] WIP: Test arrow-rs 54.3.0 upgrade [datafusion]

2025-03-18 Thread via GitHub
alamb closed pull request #15285: WIP: Test arrow-rs 54.3.0 upgrade URL: https://github.com/apache/datafusion/pull/15285

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r2001744891 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -61,13 +61,15 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wit

Re: [PR] Support logic optimize rule to pass the case that Utf8view datatype combined with Utf8 datatype [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15239: URL: https://github.com/apache/datafusion/pull/15239#issuecomment-2734365565 Thanks again@

Re: [PR] wip: Update benchmark results for 0.7.0 release [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove commented on code in PR #1548: URL: https://github.com/apache/datafusion-comet/pull/1548#discussion_r2001820424 ## README.md: ## @@ -46,23 +46,23 @@ The following chart shows the time it takes to run the 22 TPC-H queries against using a single executor with 8 cores.

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15253: URL: https://github.com/apache/datafusion/pull/15253#issuecomment-2734461898 > Thanks @alamb, @jayzhan211 and @xudong963 for your review, here are two points that remain unclear: > > 1. For GROUP BY, is it necessary to preserve the row ind

Re: [PR] CI Red: Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jonahgao commented on code in PR #15300: URL: https://github.com/apache/datafusion/pull/15300#discussion_r2002313134 ## datafusion/sqllogictest/test_files/union.slt: ## @@ -907,11 +907,56 @@ SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y; 20 40 +quer

Re: [PR] CI Red: Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jonahgao merged PR #15300: URL: https://github.com/apache/datafusion/pull/15300

Re: [I] [EPIC] Attach `Diagnostic` to more errors [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on issue #14429: URL: https://github.com/apache/datafusion/issues/14429#issuecomment-2735199806 Hi, is this still a potential GSoC project? It looks like many of the tickets in this epic have an open pull request and are close to completion. If you know of any other areas t

Re: [I] Investigate TPC-H q4 hanging when not enough memory is allocated [datafusion-comet]

2025-03-18 Thread via GitHub
Kontinuation commented on issue #1523: URL: https://github.com/apache/datafusion-comet/issues/1523#issuecomment-2735209828 The query blocked because we don't have enough blocking threads configured for the tokio runtime. In the merge phase, each spill file will be wrapped by a

Re: [I] [EPIC] A collection of tickets for improved WASM support in DataFusion [datafusion]

2025-03-18 Thread via GitHub
matthewmturner commented on issue #13815: URL: https://github.com/apache/datafusion/issues/13815#issuecomment-2735185316 For the WASM UDFs they just need some more real world testing / benchmarking. To be honest, the other points @alamb mentioned would probably better benefit the DataFusio

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002335941 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distr
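
The snippet merges two distributions next to `compute_variance`. For readers new to merging statistics, one standard way to combine two (count, mean, variance) summaries, for example from two partitions, is the pairwise mean/variance update. The Rust sketch below is illustrative only. `Summary`, `from_values` and `merge` are hypothetical names, not the `Distribution` API in PR #15296.

```rust
/// Hypothetical summary of a sample; not DataFusion's `Distribution` type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Summary {
    count: f64,
    mean: f64,
    /// Sum of squared deviations from the mean.
    m2: f64,
}

impl Summary {
    /// Builds a summary directly from raw values.
    fn from_values(values: &[f64]) -> Self {
        if values.is_empty() {
            return Summary { count: 0.0, mean: 0.0, m2: 0.0 };
        }
        let count = values.len() as f64;
        let mean = values.iter().sum::<f64>() / count;
        let m2 = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>();
        Summary { count, mean, m2 }
    }

    /// Population variance (divides by `count`); `None` for an empty sample.
    fn variance(&self) -> Option<f64> {
        if self.count == 0.0 {
            None
        } else {
            Some(self.m2 / self.count)
        }
    }
}

/// Merges two summaries with the pairwise mean/variance update.
fn merge(a: Summary, b: Summary) -> Summary {
    let count = a.count + b.count;
    if count == 0.0 {
        return a;
    }
    let delta = b.mean - a.mean;
    // The merged mean moves toward `b` in proportion to b's share of the rows.
    let mean = a.mean + delta * b.count / count;
    // Within-group spread plus a correction for the gap between the two means.
    let m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count;
    Summary { count, mean, m2 }
}
```

Merging the summaries of `[1, 2, 3]` and `[4, 5]` gives the same count, mean and variance as summarizing `[1, 2, 3, 4, 5]` directly. The merged summary never needs the raw rows.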

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002299377 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distr

Re: [I] [DISCUSS] Release DataFusion `46.0.1` Patch or `46.1.0` minor release (March 2025) [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on issue #15151: URL: https://github.com/apache/datafusion/issues/15151#issuecomment-2735217322 Just a reminder: we can do a final release today, but it seems to require a PMC member to do the last steps. cc @alamb.

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on PR #15296: URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2735210417 > I think eventually it would be nice to add some tests for this code Yes, as the ticket description said: I'll do it after we are consistent.

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
UBarney commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2002297890 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,424 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccumu
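
The idea behind a specialized `GroupsAccumulator` for `first_value` is to keep one slot per group in a flat vector indexed by group id, instead of one accumulator object per group. The Rust sketch below is a simplified, hypothetical illustration. `FirstValueGroups` is not the PR's `FirstPrimitiveGroupsAccumulator`. The method names only echo the `GroupsAccumulator` trait, with simplified signatures and no null or filter handling. For each group, it keeps the value whose ordering key is smallest.

```rust
/// Hypothetical per-group "first value" state: one slot per group id.
struct FirstValueGroups<T: Copy> {
    /// For each group: (ordering key, value) of the earliest row seen so far.
    slots: Vec<Option<(i64, T)>>,
}

impl<T: Copy> FirstValueGroups<T> {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    /// Processes one batch; `group_indices[i]` is the group of row `i`.
    fn update_batch(
        &mut self,
        values: &[T],
        order_keys: &[i64],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        // Grow the state vector when new groups appear.
        if self.slots.len() < total_num_groups {
            self.slots.resize(total_num_groups, None);
        }
        for ((&value, &key), &group) in values.iter().zip(order_keys).zip(group_indices) {
            // Replace only on a strictly smaller key, so on ties the earliest-seen row wins.
            let replace = match self.slots[group] {
                Some((best_key, _)) => key < best_key,
                None => true,
            };
            if replace {
                self.slots[group] = Some((key, value));
            }
        }
    }

    /// Returns the first value per group (`None` for groups with no rows).
    fn evaluate(&self) -> Vec<Option<T>> {
        self.slots.iter().map(|slot| slot.map(|(_, value)| value)).collect()
    }
}
```

Because the state is a single `Vec`, each batch is processed in one tight loop over rows. Per-group dispatch is avoided, which is where such specializations usually gain their speed.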

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
iffyio commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2001562854 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") } Self::L

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
mvzink commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2001539812 ## src/ast/mod.rs: ## @@ -7910,6 +7910,8 @@ pub enum ContextModifier { Local, /// `SESSION` identifier Session, +/// `GLOBAL` identif
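
The snippets above show `ContextModifier` gaining a `GLOBAL` variant alongside `LOCAL` and `SESSION`, together with a `Display` impl that writes an empty string for one case. The sketch below is a simplified, hypothetical mirror of that enum, not sqlparser-rs's actual code. The name of the no-modifier variant (`None`), the trailing-space convention and the `render_set` helper are all assumptions.

```rust
use std::fmt;

/// Simplified, hypothetical mirror of sqlparser-rs's `ContextModifier`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ContextModifier {
    /// No modifier was written (assumed variant name).
    None,
    /// `LOCAL` identifier
    Local,
    /// `SESSION` identifier
    Session,
    /// `GLOBAL` identifier
    Global,
}

impl fmt::Display for ContextModifier {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumption: each keyword carries its own trailing space, so `None`
        // renders as nothing and callers need no special case.
        match self {
            Self::None => write!(f, ""),
            Self::Local => write!(f, "LOCAL "),
            Self::Session => write!(f, "SESSION "),
            Self::Global => write!(f, "GLOBAL "),
        }
    }
}

/// Hypothetical helper that re-renders `SET [modifier] name = value`.
fn render_set(modifier: ContextModifier, name: &str, value: &str) -> String {
    format!("SET {modifier}{name} = {value}")
}
```

With this convention, `render_set(ContextModifier::Global, "max_connections", "10")` yields `SET GLOBAL max_connections = 10`, and the unmodified form round-trips as `SET x = 1`.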

Re: [PR] feat: Native support utf8view for regex string operators [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15275: URL: https://github.com/apache/datafusion/pull/15275#issuecomment-2734067285 Thanks again @zhuqi-lucas

Re: [PR] feat: Native support utf8view for regex string operators [datafusion]

2025-03-18 Thread via GitHub
alamb merged PR #15275: URL: https://github.com/apache/datafusion/pull/15275

Re: [I] Add SQL examples to window functions: `nth_value`, etc [datafusion]

2025-03-18 Thread via GitHub
sageraven1 commented on issue #13399: URL: https://github.com/apache/datafusion/issues/13399#issuecomment-2734096284 > [@sageraven1](https://github.com/sageraven1) , Are you still working on this? I see that PR was marked as stale and got closed. If you aren't working on this, I would like
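
SQL examples for `nth_value` need to show one key behavior. With `ORDER BY` and a frame of `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, `nth_value(x, n)` returns NULL until the frame holds at least `n` rows, then returns the `n`-th row's value. The Rust sketch below models only that frame. The function name is hypothetical, and rows are assumed already sorted. The SQL default frame when `ORDER BY` is present is `RANGE ...`, which would also include peer rows with equal sort keys.

```rust
/// `nth_value(value, n)` over `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`,
/// for rows already sorted by the window's ORDER BY. `n` is 1-based; NULL is `None`.
/// SQL engines reject n = 0; this sketch simply yields NULLs for it.
fn nth_value_running<T: Clone>(values: &[T], n: usize) -> Vec<Option<T>> {
    (0..values.len())
        .map(|i| {
            // The frame for row i is values[0..=i], i.e. i + 1 rows.
            if n >= 1 && i + 1 >= n {
                Some(values[n - 1].clone())
            } else {
                None
            }
        })
        .collect()
}
```

For example, over the sorted values `10, 20, 30, 40`, `nth_value(x, 2)` produces `NULL, 20, 20, 20`.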

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on code in PR #15253: URL: https://github.com/apache/datafusion/pull/15253#discussion_r2001842580 ## datafusion/expr/src/expr.rs: ## @@ -64,6 +64,15 @@ use sqlparser::ast::{ /// /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt /// +/// # Printing Expre

Re: [PR] Blog post on Parquet pruning in datafusion [datafusion-site]

2025-03-18 Thread via GitHub
comphead commented on code in PR #60: URL: https://github.com/apache/datafusion-site/pull/60#discussion_r2002075851 ## content/blog/2025-03-18-parquet-pruning.md: ## @@ -0,0 +1,111 @@ +--- +layout: post +title: Parquet pruning in DataFusion: Read Only What Matters +date: 2025-03

Re: [I] Build failure in flight_sql.rs [datafusion-ballista]

2025-03-18 Thread via GitHub
ahmedriza commented on issue #895: URL: https://github.com/apache/datafusion-ballista/issues/895#issuecomment-2734822308 > It looks like this issue has been fixed, is it ok to close this issue? Sure, thanks

Re: [I] Doc: Add an example how to test Comet in K8s against user defined query [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on issue #1546: URL: https://github.com/apache/datafusion-comet/issues/1546#issuecomment-2734792143 The Spark-PI example is based on an in-memory RDD, so Comet Scan cannot be tested; we need to prepare our own test case which reads data from a local source and check the logs conta

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r2002090347 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -763,7 +766,8 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wi

Re: [PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on code in PR #1550: URL: https://github.com/apache/datafusion-comet/pull/1550#discussion_r2002258552 ## spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala: ## @@ -63,4 +61,67 @@ class CometNativeReaderSuite extends CometTestBase with Ada

Re: [I] feat: fix schema issues for `native reader - read STRUCT of ARRAY fields` [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on issue #1551: URL: https://github.com/apache/datafusion-comet/issues/1551#issuecomment-2735107639 Both datafusion and iceberg compat fail on ``` native reader - read STRUCT of ARRAY fields - native_datafusion *** FAILED *** (191 milliseconds) org.apache.sp

Re: [PR] Improve collection during repr and repr_html [datafusion-python]

2025-03-18 Thread via GitHub
kosiew commented on code in PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036#discussion_r2002243419 ## python/tests/test_dataframe.py: ## @@ -1191,13 +1192,17 @@ def add_with_parameter(df_internal, value: Any) -> DataFrame: def test_dataframe_repr_html(df)

Re: [PR] Always add round robin repartitioning to leaves (data sources), benefitting unbalanced / small datasets [datafusion]

2025-03-18 Thread via GitHub
github-actions[bot] commented on PR #13707: URL: https://github.com/apache/datafusion/pull/13707#issuecomment-2735135422 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or
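
The title names round-robin repartitioning. In that technique, each incoming batch goes to the next output partition in turn, so batches from one small or unevenly sized source still spread across all partitions. Below is a minimal, hypothetical sketch with plain values standing in for record batches. It is not DataFusion's `RepartitionExec`.

```rust
/// Distributes `batches` across `num_partitions` outputs in round-robin order.
/// Plain values stand in for record batches; hypothetical helper.
fn round_robin<T>(batches: Vec<T>, num_partitions: usize) -> Vec<Vec<T>> {
    assert!(num_partitions > 0, "need at least one output partition");
    let mut partitions: Vec<Vec<T>> = (0..num_partitions).map(|_| Vec::new()).collect();
    for (i, batch) in batches.into_iter().enumerate() {
        // Batch i goes to partition i mod N, cycling through the outputs.
        partitions[i % num_partitions].push(batch);
    }
    partitions
}
```

Partition sizes differ by at most one batch. Balance is only by batch count, though: batches of very different row counts can still leave partitions uneven in rows.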

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
qazxcdswe123 commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2735134447 @alamb On head commit. ```csv col_int32,col_int64,col_uint32,col_utf8 1,1,0,a 2,2,1,b 3,3,2,c 4,4,3,d 5,5,4,e 6,6,5,f 7,7,6,g 8,8,7,h 9,9

Re: [PR] Support binary temporal arithmetic with integers [datafusion]

2025-03-18 Thread via GitHub
github-actions[bot] commented on PR #13741: URL: https://github.com/apache/datafusion/pull/13741#issuecomment-2735135388 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] chore(deps): bump rust_decimal from 1.36.0 to 1.37.0 [datafusion]

2025-03-18 Thread via GitHub
alamb merged PR #15293: URL: https://github.com/apache/datafusion/pull/15293

Re: [I] Upgrade Guide for DataFusion 46 does not include the array signatures change [datafusion]

2025-03-18 Thread via GitHub
alamb closed issue #15105: Upgrade Guide for DataFusion 46 does not include the array signatures change URL: https://github.com/apache/datafusion/issues/15105

Re: [I] Update all github workflow to use actions tied to sha hashes [datafusion]

2025-03-18 Thread via GitHub
Jiashu-Hu commented on issue #15298: URL: https://github.com/apache/datafusion/issues/15298#issuecomment-2734276811 take

Re: [PR] Fix predicate pushdown for custom SchemaAdapters [datafusion]

2025-03-18 Thread via GitHub
adriangb commented on PR #15263: URL: https://github.com/apache/datafusion/pull/15263#issuecomment-2734206909 2620c6a46fba979d3c7a17a1c9cb32f52c1d5b1b 😄

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001387282 ## datafusion/core/tests/sql/select.rs: ## @@ -30,23 +30,7 @@ async fn test_list_query_parameters() -> Result<()> { .with_param_values(vec![ScalarValue::fr

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001387628 ## datafusion/core/tests/sql/select.rs: ## @@ -66,33 +50,7 @@ async fn test_named_query_parameters() -> Result<()> { ])? .collect() .awai

Re: [I] Dynamic pruning filters from TopK state [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15037: URL: https://github.com/apache/datafusion/issues/15037#issuecomment-2734575745 > Does anyone have a handle on how we might implement this? I was thinking we’d need to add a method to exec operators called `apply_filter` but that basically sends down the addi
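
The core idea behind dynamic pruning from TopK state applies to `ORDER BY x ASC LIMIT k` with a single sort key. Once the heap holds `k` rows, its current largest value acts as a threshold. No row, and no row group whose minimum `x` is at least that threshold, can change the result, so the scan could skip them. Below is a minimal stdlib sketch. `TopK`, `threshold` and `can_skip` are hypothetical. How DataFusion would actually push the threshold down (for example via the `apply_filter` idea quoted above) is still under discussion.

```rust
use std::collections::BinaryHeap;

/// Tracks the `k` smallest values seen (as for `ORDER BY x ASC LIMIT k`).
/// Hypothetical type; not DataFusion's `TopK` operator.
struct TopK {
    k: usize,
    /// Max-heap, so the root is the largest of the current k smallest values.
    heap: BinaryHeap<i64>,
}

impl TopK {
    fn new(k: usize) -> Self {
        Self { k, heap: BinaryHeap::new() }
    }

    fn insert(&mut self, value: i64) {
        if self.heap.len() < self.k {
            self.heap.push(value);
        } else if let Some(&largest) = self.heap.peek() {
            // Only a strictly smaller value can displace the current largest.
            if value < largest {
                self.heap.pop();
                self.heap.push(value);
            }
        }
    }

    /// Once the heap is full, no value >= this threshold can enter the result.
    fn threshold(&self) -> Option<i64> {
        if self.k > 0 && self.heap.len() == self.k {
            self.heap.peek().copied()
        } else {
            None
        }
    }

    /// Whether a row group whose minimum is `min` can be skipped entirely.
    /// Ties can be dropped because LIMIT may return any k qualifying rows.
    fn can_skip(&self, min: i64) -> bool {
        matches!(self.threshold(), Some(t) if min >= t)
    }
}
```

The threshold only tightens as better rows arrive. A filter derived from it therefore becomes more selective over time, which is what makes it "dynamic".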

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
Omega359 commented on code in PR #59: URL: https://github.com/apache/datafusion-site/pull/59#discussion_r2002096546 ## content/blog/2025-03-11-ordering-analysis.md: ## @@ -291,6 +291,53 @@ Following third and fourth constraints for the simplified table, the succinct va `[time_

Re: [I] Migrate datasource tests to `insta` [datafusion]

2025-03-18 Thread via GitHub
sreshu commented on issue #15246: URL: https://github.com/apache/datafusion/issues/15246#issuecomment-2734911692 take

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001378037 ## datafusion/core/tests/parquet/schema.rs: ## @@ -82,7 +69,18 @@ async fn schema_merge_ignores_metadata_by_default() { .unwrap(); let actual = df.col

[PR] feat: simplify regex wildcard pattern [datafusion]

2025-03-18 Thread via GitHub
waynexia opened a new pull request, #15299: URL: https://github.com/apache/datafusion/pull/15299 ## Which issue does this PR close? - Closes #. ## Rationale for this change Simplify dumb regex cases like `~ '.*'` or `!~ '.*'`. ## What changes are in
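
Why `~ '.*'` is safe to simplify: an unanchored `.*` matches the empty string at the start of any input. So for every non-NULL value, `x ~ '.*'` is true and `x !~ '.*'` is false, and only NULL inputs produce NULL. The Rust sketch below encodes that truth table, with `Option<bool>` standing in for SQL's three-valued logic. The helper is hypothetical and is not the rewrite implemented in `simplify_regex_expr`.

```rust
/// Result of `value ~ '.*'` (or `value !~ '.*'` when `negated`).
/// `None` models SQL NULL. Hypothetical helper for illustration only.
fn match_any_pattern(value: Option<&str>, negated: bool) -> Option<bool> {
    // An unanchored `.*` always finds a (possibly empty) match, even for "" or
    // strings containing newlines, so only NULL-ness matters.
    value.map(|_| !negated)
}
```

In a `WHERE` clause, NULL and false both drop the row. There, `x ~ '.*'` behaves like `x IS NOT NULL` and `x !~ '.*'` behaves like `false`.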

Re: [PR] Remove inline table scan analyzer rule [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 merged PR #15201: URL: https://github.com/apache/datafusion/pull/15201

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002037023 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distribut

[PR] Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 opened a new pull request, #15300: URL: https://github.com/apache/datafusion/pull/15300 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [PR] Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on code in PR #15300: URL: https://github.com/apache/datafusion/pull/15300#discussion_r2002212215 ## datafusion/sqllogictest/test_files/union.slt: ## @@ -907,11 +907,56 @@ SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y; 20 40 +qu

Re: [I] Doc: Add an example how to test Comet in K8s against user defined query [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on issue #1546: URL: https://github.com/apache/datafusion-comet/issues/1546#issuecomment-2734719437 Apache Spark also relies on SparkPI example in their documentation https://spark.apache.org/docs/3.5.4/running-on-kubernetes.html

Re: [PR] Remove inline table scan analyzer rule [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on PR #15201: URL: https://github.com/apache/datafusion/pull/15201#issuecomment-2734924833 Thanks @alamb

Re: [I] [EPIC] Redesign DataFusion main page [datafusion]

2025-03-18 Thread via GitHub
sreshu commented on issue #14389: URL: https://github.com/apache/datafusion/issues/14389#issuecomment-2734913948 Are you thinking of more modifications @alamb ?

[I] Improve representation of `LIKE ALL` and variants [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
mvzink opened a new issue, #1770: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1770 Currently, expressions such as `... LIKE ALL(...)` are parsed as an `Expr::Like` with the `pattern` being an `Expr::Function` with a `name` of `"ALL"`. It seems preferable to parse them as a
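
Semantically, `x LIKE ALL (p1, p2, ...)` is true only if `x` matches every pattern, while `LIKE ANY` needs just one match. This is why a dedicated representation reads better than a function named `ALL`. The sketch below is hypothetical. `LikeQuantifier` and the helper functions are not part of sqlparser-rs, which only parses these expressions and does not evaluate them. NULL handling and `ESCAPE` are omitted, and `%`/`_` follow the standard SQL LIKE wildcards.

```rust
/// Quantifier for `x LIKE ANY (...)` / `x LIKE ALL (...)`.
/// Hypothetical type; the issue notes `ALL(...)` is currently parsed as a function call.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LikeQuantifier {
    Any,
    All,
}

/// SQL `LIKE` with `%` (any run of characters) and `_` (exactly one character).
/// No ESCAPE clause and no NULL handling.
fn like(value: &str, pattern: &str) -> bool {
    let v: Vec<char> = value.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    // dp[i][j] is true when v[..i] matches p[..j].
    let mut dp = vec![vec![false; p.len() + 1]; v.len() + 1];
    dp[0][0] = true;
    for j in 1..=p.len() {
        // Only a run of leading `%` can match the empty string.
        dp[0][j] = dp[0][j - 1] && p[j - 1] == '%';
    }
    for i in 1..=v.len() {
        for j in 1..=p.len() {
            dp[i][j] = match p[j - 1] {
                // `%` matches nothing (dp[i][j-1]) or absorbs one more char (dp[i-1][j]).
                '%' => dp[i][j - 1] || dp[i - 1][j],
                '_' => dp[i - 1][j - 1],
                c => dp[i - 1][j - 1] && v[i - 1] == c,
            };
        }
    }
    dp[v.len()][p.len()]
}

/// Evaluates `value LIKE ANY/ALL (patterns...)`.
fn like_quantified(value: &str, quantifier: LikeQuantifier, patterns: &[&str]) -> bool {
    match quantifier {
        LikeQuantifier::Any => patterns.iter().any(|p| like(value, p)),
        LikeQuantifier::All => patterns.iter().all(|p| like(value, p)),
    }
}
```

For example, `'abc' LIKE ALL ('a%', '%c')` is true, but `'abc' LIKE ALL ('a%', '%d')` is false, while `'abc' LIKE ANY ('x%', '%c')` is true.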

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2001991786 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") }

Re: [PR] Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on code in PR #15300: URL: https://github.com/apache/datafusion/pull/15300#discussion_r2002213580 ## datafusion/sqllogictest/test_files/union.slt: ## @@ -907,11 +907,56 @@ SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y; 20 40 +qu

Re: [PR] feat: simplify regex wildcard pattern [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on code in PR #15299: URL: https://github.com/apache/datafusion/pull/15299#discussion_r2002218628 ## datafusion/optimizer/src/simplify_expressions/regex.rs: ## @@ -43,6 +45,23 @@ pub fn simplify_regex_expr( let mode = OperatorMode::new(&op); if l

Re: [PR] Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on code in PR #15300: URL: https://github.com/apache/datafusion/pull/15300#discussion_r2002215012 ## datafusion/sqllogictest/test_files/union.slt: ## @@ -907,11 +907,56 @@ SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y; 20 40 +qu

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
akurmustafa commented on PR #59: URL: https://github.com/apache/datafusion-site/pull/59#issuecomment-2734727273 > Thanks @akurmustafa > > I noticed some small format issues when reviewing this so I pushed [e10c17f](https://github.com/apache/datafusion-site/commit/e10c17f0f99b83d8e15cd

Re: [PR] docs: various improvements to tuning guide [datafusion-comet]

2025-03-18 Thread via GitHub
parthchandra commented on code in PR #1525: URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2002219014 ## docs/source/user-guide/tuning.md: ## @@ -141,30 +191,22 @@ It must be set before the Spark context is created. You can enable or disable Co at runtim

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2025-03-18 Thread via GitHub
parthchandra commented on PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#issuecomment-2735075699 > @parthchandra Consulting a question: In the current compilation script `dev/release/build-release-comet.sh`, the final invocation of the compilation command is `core-amd64-l

[PR] Blog post on Parquet filter pushdown [datafusion-site]

2025-03-18 Thread via GitHub
XiangpengHao opened a new pull request, #61: URL: https://github.com/apache/datafusion-site/pull/61 This is the sequel to #60; we should merge that one before merging this. Feel free to edit! @alamb

Re: [PR] chore: Update links for released version [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove commented on code in PR #1540: URL: https://github.com/apache/datafusion-comet/pull/1540#discussion_r2001586049 ## docs/source/user-guide/kubernetes.md: ## @@ -65,31 +65,31 @@ metadata: spec: type: Scala mode: cluster - image: ghcr.io/apache/datafusion-comet:sp

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2001587043 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") }

Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15255: URL: https://github.com/apache/datafusion/pull/15255#issuecomment-2734485532 Thanks @shruti2522 and @blaginin 🙏

Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-18 Thread via GitHub
alamb merged PR #15255: URL: https://github.com/apache/datafusion/pull/15255

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15253: URL: https://github.com/apache/datafusion/pull/15253#issuecomment-2734479401 I pushed a commit to add more documentation explaining the different options for printing `Expr`s I also pushed a commit to rename `sql_name` to `human_display` as that seemed m

Re: [PR] chore: Update links for released version [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove merged PR #1540: URL: https://github.com/apache/datafusion-comet/pull/1540

Re: [PR] Blog post on Parquet pruning in datafusion [datafusion-site]

2025-03-18 Thread via GitHub
alamb commented on code in PR #60: URL: https://github.com/apache/datafusion-site/pull/60#discussion_r2001947189 ## content/blog/2025-03-18-parquet-pruning.md: ## @@ -0,0 +1,111 @@ +--- +layout: post +title: Parquet pruning in DataFusion: Read Only What Matters +date: 2025-03-18

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001388172 ## datafusion/core/tests/sql/select.rs: ## @@ -114,33 +72,7 @@ async fn test_prepare_statement() -> Result<()> { let dataframe = dataframe.with_param_values(pa

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
mvzink commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2001988225 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") } Self::L

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
UBarney commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2002243574 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,424 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccumu

[PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-18 Thread via GitHub
comphead opened a new pull request, #1550: URL: https://github.com/apache/datafusion-comet/pull/1550 ## Which issue does this PR close? Part of #1454 . ## Rationale for this change Enable Iceberg compat type tests, add more tests for complex types ## What

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002296888 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distr

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
Omega359 commented on code in PR #59: URL: https://github.com/apache/datafusion-site/pull/59#discussion_r2002097220 ## content/blog/2025-03-11-ordering-analysis.md: ## @@ -291,6 +291,53 @@ Following third and fourth constraints for the simplified table, the succinct va `[time_

Re: [PR] Migrate user_defined tests to insta [datafusion]

2025-03-18 Thread via GitHub
alamb commented on code in PR #15255: URL: https://github.com/apache/datafusion/pull/15255#discussion_r2001657619 ## datafusion/core/tests/user_defined/user_defined_table_functions.rs: ## @@ -34,11 +34,19 @@ use datafusion::physical_plan::{collect, ExecutionPlan}; use datafusio

Re: [PR] Add upgrade notes for array signatures [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15237: URL: https://github.com/apache/datafusion/pull/15237#issuecomment-2734190774 Thanks again @jkosh44

Re: [I] Datafusion can't seem to cast evolving structs [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #14757: URL: https://github.com/apache/datafusion/issues/14757#issuecomment-2734594945 Looks like @kosiew took a shot in - https://github.com/apache/datafusion/pull/15295

Re: [PR] shell script to collect Benchmarks [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15144: URL: https://github.com/apache/datafusion/pull/15144#issuecomment-2734597583 > Just noticed that benchmarks/README.md is not included in prettier CI check, is that intended? I don't think it is intended And I am sorry I haven't had time to test out

Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15276: URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2734586032 Sounds like a good idea to me -- thanks @eliaperantoni I think it would be most helpful to: 1. Use the new API in at least one of the built in functions so 1) the API is

Re: [PR] wip: Update benchmark results for 0.7.0 release [datafusion-comet]

2025-03-18 Thread via GitHub
codecov-commenter commented on PR #1548: URL: https://github.com/apache/datafusion-comet/pull/1548#issuecomment-2734624914 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1548?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [D] Does DataFusion Support JSON Path Filtering Like `jsonb_path_exists` in PostgreSQL? [datafusion]

2025-03-18 Thread via GitHub
GitHub user alamb added a comment to the discussion: Does DataFusion Support JSON Path Filtering Like `jsonb_path_exists` in PostgreSQL? I think you can use [`ScalarUDF::call`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.ScalarUDF.html#method.call) ```rust let json_path_ex

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001386223 ## datafusion/core/tests/sql/path_partition.rs: ## @@ -390,15 +349,7 @@ async fn csv_grouping_by_partition() -> Result<()> { .collect() .await?;

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001376856 ## datafusion/core/tests/parquet/custom_reader.rs: ## @@ -96,17 +97,15 @@ async fn route_data_access_ops_to_parquet_file_reader_factory() { let task_ctx = ses

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001384839 ## datafusion/core/tests/sql/path_partition.rs: ## @@ -275,18 +267,7 @@ async fn csv_filter_with_file_col() -> Result<()> { .collect() .await?;

Re: [PR] minor: fix `data/sqlite` link [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on PR #15286: URL: https://github.com/apache/datafusion/pull/15286#issuecomment-2732874285 You can run `prettier -w xxx` to fix the ci

Re: [PR] Support logic optimize rule to pass the case that Utf8view datatype combined with Utf8 datatype [datafusion]

2025-03-18 Thread via GitHub
alamb merged PR #15239: URL: https://github.com/apache/datafusion/pull/15239

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
aectaan commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2734472619 Thank you @alamb! No, it doesn't require anything custom. Unfortunately `datafusion-cli` parser fails at this request: doesn't like opening brace before WHERE - that's wh

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
MohamedAbdeen21 commented on PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#issuecomment-2734552546 > i.e. adding a `scope` to `SetAssignment`. That will be a more substantial change to parsing itself and could be a separate PR IMO. I agree, it's gonna req

[PR] doc: Renew `kubernetes.md` [datafusion-comet]

2025-03-18 Thread via GitHub
comphead opened a new pull request, #1549: URL: https://github.com/apache/datafusion-comet/pull/1549 ## Which issue does this PR close? Related to #1546. ## Rationale for this change ## What changes are included in this PR? ## How are these

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
alamb commented on PR #59: URL: https://github.com/apache/datafusion-site/pull/59#issuecomment-2734731781 > Thanks @alamb, I was working on to add the example you gave ("DataFusion can find / use orderings based on query intermediates"). Should we add this to the document what do you think?
