Re: [I] Implement a scalar function for creating ScalarValue::Map [datafusion]

2024-07-05 Thread via GitHub
goldmedal commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2211676307 > The datatype of key and value should be known before `invoke`, so we can get the corresponding Builder based on the data type Indeed. I'll try to use MapBuilder. Thank

Re: [PR] Infer count() aggregation is not null [datafusion]

2024-07-05 Thread via GitHub
jonahgao commented on code in PR #11256: URL: https://github.com/apache/datafusion/pull/11256#discussion_r1667250109 ## datafusion/expr/src/expr_schema.rs: ## @@ -322,10 +322,16 @@ impl ExprSchemable for Expr { } } Expr::Cast(Cast { exp

Re: [PR] Infer count() aggregation is not null [datafusion]

2024-07-05 Thread via GitHub
jonahgao merged PR #11256: URL: https://github.com/apache/datafusion/pull/11256 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] Change array agg result from empty list to null if no row qualifed [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 commented on code in PR #11299: URL: https://github.com/apache/datafusion/pull/11299#discussion_r1667248774 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -2744,28 +2735,84 @@ SELECT ARRAY_AGG([1]) [[1]] -# test_approx_percentile_cont_decimal_supp

Re: [PR] Change array agg result from empty list to null if no row qualifed [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 commented on code in PR #11299: URL: https://github.com/apache/datafusion/pull/11299#discussion_r1667248774 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -2744,28 +2735,84 @@ SELECT ARRAY_AGG([1]) [[1]] -# test_approx_percentile_cont_decimal_supp

[PR] Change array agg semantic for empty result from empty list to empty row [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 opened a new pull request, #11299: URL: https://github.com/apache/datafusion/pull/11299 ## Which issue does this PR close? As @findepi pointed out in https://github.com/apache/datafusion/issues/11274#issuecomment-2211364784 that most of the aggregate function does not retu

Re: [PR] Impl a general get results from stats [datafusion]

2024-07-05 Thread via GitHub
Rachelint commented on PR #11261: URL: https://github.com/apache/datafusion/pull/11261#issuecomment-2211608009 > Thank you @Rachelint ❤️ > > > It is mainly due to AggregateUDFImpl is defined in expr crate, and the needed PhysicalExpr is defined in datafusion-physical-expr-common crate

Re: [PR] use safe cast in propagate_constraints [datafusion]

2024-07-05 Thread via GitHub
Lordworms commented on PR #11297: URL: https://github.com/apache/datafusion/pull/11297#issuecomment-2211600860 seems like a arrow-rs related problem, but we can use safe cast to avoid it temporarily -- This is an automated message from the Apache Git Service. To respond to the message, pl

[PR] Implement TPCH substrait integration teset, support tpch_3 [datafusion]

2024-07-05 Thread via GitHub
Lordworms opened a new pull request, #11298: URL: https://github.com/apache/datafusion/pull/11298 ## Which issue does this PR close? part of #10710 Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these

[PR] use safe cast in propagate_constraints [datafusion]

2024-07-05 Thread via GitHub
Lordworms opened a new pull request, #11297: URL: https://github.com/apache/datafusion/pull/11297 ## Which issue does this PR close? Closes #11252 ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [I] Customizable Nullability for UDAF [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 commented on issue #11274: URL: https://github.com/apache/datafusion/issues/11274#issuecomment-2211569477 Another thing to note that is the `nullability` of the function is different from the final result but the information for optimizer. We can still tell optimizer that t

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667223626 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -362,15 +363,17 @@ fn rountrip_aggregate() -> Result<()> { false, )?],

Re: [I] Customizable Nullability for UDAF [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 commented on issue #11274: URL: https://github.com/apache/datafusion/issues/11274#issuecomment-2211550511 I agree that we don't need non-null for `array_agg`, `min/max`, but we may need it for count variant like `approx_count_distinct` or other user-defined function that expect n

Re: [PR] fix: Optimize some functions to rewrite dictionary-encoded strings [datafusion-comet]

2024-07-05 Thread via GitHub
vaibhawvipul commented on PR #627: URL: https://github.com/apache/datafusion-comet/pull/627#issuecomment-2211549537 > There is a test failure: > > ``` > Cause: org.apache.comet.CometNativeException: Unsupported data type Dictionary(Int32, Binary) for function md5 > ``` fi

Re: [PR] Implement UDF Plan [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 commented on PR #11263: URL: https://github.com/apache/datafusion/pull/11263#issuecomment-2211544997 > > I would prefer to have `plan_extract` and `plan_position` separately. > > Here is a PR for planning position: #11243 > > It isn't clear to me what we should do wit

Re: [PR] Implement user defined planner for position [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 commented on code in PR #11243: URL: https://github.com/apache/datafusion/pull/11243#discussion_r1667216951 ## datafusion/functions/src/unicode/planner.rs: ## @@ -0,0 +1,36 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] Impl a general get results from stats [datafusion]

2024-07-05 Thread via GitHub
edmondop commented on PR #11261: URL: https://github.com/apache/datafusion/pull/11261#issuecomment-2211541999 The draft implementation seems reasonable and all checks are passing 🎉 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] Infer count() aggregation is not null [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 commented on code in PR #11256: URL: https://github.com/apache/datafusion/pull/11256#discussion_r1667215728 ## datafusion/expr/src/expr_schema.rs: ## @@ -322,10 +322,16 @@ impl ExprSchemable for Expr { } } Expr::Cast(Cast { e

Re: [I] Implement a scalar function for creating ScalarValue::Map [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2211534980 > The reason I didn't use MapBuilder is that I can't find a smart way to get the corresponding ArrayBuilder for the arguments ( MapBuilder requires two initial builders for t

Re: [PR] feat: support `COUNT()` [datafusion]

2024-07-05 Thread via GitHub
jayzhan211 commented on PR #11229: URL: https://github.com/apache/datafusion/pull/11229#issuecomment-2211525858 > @jayzhan211 -- would you mind taking a quick look now to see if this approach is inline w/ what you were thinking? It's super rough, but just hoping to get feedback on the gener

Re: [PR] Fix data page statistics when all rows are null in a data page [datafusion]

2024-07-05 Thread via GitHub
efredine commented on code in PR #11295: URL: https://github.com/apache/datafusion/pull/11295#discussion_r1667208743 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -600,6 +601,31 @@ make_data_page_stats_iterator!( Index::DOUBLE, f64 ); R

Re: [PR] chore: Add Miri workflow [datafusion-comet]

2024-07-05 Thread via GitHub
codecov-commenter commented on PR #636: URL: https://github.com/apache/datafusion-comet/pull/636#issuecomment-2211497380 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/636?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] chore: Add Miri workflow [datafusion-comet]

2024-07-05 Thread via GitHub
andygrove commented on code in PR #636: URL: https://github.com/apache/datafusion-comet/pull/636#discussion_r1667181751 ## core/src/execution/sort.rs: ## @@ -165,7 +165,7 @@ where // because they are defined as Vec> ptr::copy_non

Re: [PR] chore: Add Miri workflow [datafusion-comet]

2024-07-05 Thread via GitHub
andygrove commented on code in PR #636: URL: https://github.com/apache/datafusion-comet/pull/636#discussion_r1667181686 ## core/src/execution/datafusion/spark_hash.rs: ## @@ -89,10 +89,7 @@ pub(crate) fn spark_compatible_murmur3_hash>(data: T, seed: u32) // data is &[u8] so

Re: [PR] chore: Add Miri workflow [datafusion-comet]

2024-07-05 Thread via GitHub
andygrove commented on code in PR #636: URL: https://github.com/apache/datafusion-comet/pull/636#discussion_r1667181549 ## core/Cargo.toml: ## @@ -97,6 +97,7 @@ twox-hash = "1.6.3" [features] default = [] +nightly = [] Review Comment: We have some references to this feat

Re: [PR] test: Add CometTPCDSQueryTestSuite [datafusion-comet]

2024-07-05 Thread via GitHub
viirya commented on PR #628: URL: https://github.com/apache/datafusion-comet/pull/628#issuecomment-2211450881 Thanks @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [I] Sorted rows returned by DataFusion sort operator could be in different order than Spark on irrelevant columns [datafusion-comet]

2024-07-05 Thread via GitHub
viirya closed issue #629: Sorted rows returned by DataFusion sort operator could be in different order than Spark on irrelevant columns URL: https://github.com/apache/datafusion-comet/issues/629 -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] test: Add CometTPCDSQueryTestSuite [datafusion-comet]

2024-07-05 Thread via GitHub
viirya merged PR #628: URL: https://github.com/apache/datafusion-comet/pull/628 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

[PR] Add user_defined_sql_planners(..) to FunctionRegistry [datafusion]

2024-07-05 Thread via GitHub
Omega359 opened a new pull request, #11296: URL: https://github.com/apache/datafusion/pull/11296 ## Which issue does this PR close? Closes #11294 ## Rationale for this change Adding the ability to retrieve the list of registered user defined sql functions.

Re: [PR] minor: replace .downcast_ref::().is_some() with .is::() [datafusion-comet]

2024-07-05 Thread via GitHub
andygrove merged PR #635: URL: https://github.com/apache/datafusion-comet/pull/635 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

Re: [I] Improve CometExchange metrics [datafusion-comet]

2024-07-05 Thread via GitHub
viirya commented on issue #625: URL: https://github.com/apache/datafusion-comet/issues/625#issuecomment-2211423534 Okay, I think `shuffle write time` is more like a e2e measure for the time on entire shuffle operation. `shuffleNativeTotalTime` focuses only on the time spent on native code

Re: [PR] add short circuit in BinaryExpr [datafusion]

2024-07-05 Thread via GitHub
Dandandan commented on code in PR #11247: URL: https://github.com/apache/datafusion/pull/11247#discussion_r1667162101 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -257,7 +257,50 @@ impl PhysicalExpr for BinaryExpr { fn evaluate(&self, batch: &RecordBatch) ->

Re: [PR] add short circuit in BinaryExpr [datafusion]

2024-07-05 Thread via GitHub
Dandandan commented on code in PR #11247: URL: https://github.com/apache/datafusion/pull/11247#discussion_r1667162101 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -257,7 +257,50 @@ impl PhysicalExpr for BinaryExpr { fn evaluate(&self, batch: &RecordBatch) ->

Re: [PR] fix: correctly handle Substrait windows with rows bounds (and validate executability of test plans) [datafusion]

2024-07-05 Thread via GitHub
Blizzara commented on code in PR #11278: URL: https://github.com/apache/datafusion/pull/11278#discussion_r1667159836 ## datafusion/substrait/tests/testdata/data.csv: ## @@ -1,3 +1,3 @@ a,b,c,d,e,f -1,2.0,2020-01-01,false,4294967296,'a' -3,4.5,2020-01-01,true,2147483648,'b' \ No

Re: [I] Improve CometExchange metrics [datafusion-comet]

2024-07-05 Thread via GitHub
viirya commented on issue #625: URL: https://github.com/apache/datafusion-comet/issues/625#issuecomment-2211414231 Hmm, I think we already have `shuffleNativeTotalTime` which is `shuffle write time`, no? -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] feat: support `COUNT()` [datafusion]

2024-07-05 Thread via GitHub
tshauck commented on PR #11229: URL: https://github.com/apache/datafusion/pull/11229#issuecomment-2211407574 @jayzhan211 -- would you mind taking a quick look now to see if this approach is inline w/ what you were thinking? It's super rough, but just hoping to get feedback on the general sh

[PR] Fix data page statistics when all rows are null in a data page [datafusion]

2024-07-05 Thread via GitHub
efredine opened a new pull request, #11295: URL: https://github.com/apache/datafusion/pull/11295 ## Which issue does this PR close? Closes #11280. ## Rationale for this change When all rows for a data page are null the min and max statistics should be present but null. S

Re: [PR] feat: ANSI support for Add [datafusion-comet]

2024-07-05 Thread via GitHub
planga82 commented on code in PR #616: URL: https://github.com/apache/datafusion-comet/pull/616#discussion_r1667141077 ## core/src/execution/datafusion/expressions/binary.rs: ## @@ -0,0 +1,202 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] feat: ANSI support for Add [datafusion-comet]

2024-07-05 Thread via GitHub
planga82 commented on code in PR #616: URL: https://github.com/apache/datafusion-comet/pull/616#discussion_r1667140189 ## core/src/execution/datafusion/expressions/binary.rs: ## @@ -0,0 +1,202 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

[I] Add supporting functions for user defined sql planners to FunctionRegistry [datafusion]

2024-07-05 Thread via GitHub
Omega359 opened a new issue, #11294: URL: https://github.com/apache/datafusion/issues/11294 ### Is your feature request related to a problem or challenge? Currently with the function registry trait you can register a user defined sql planner to the FunctionRegistry but there is no abi

Re: [I] Add supporting functions for user defined sql planners to FunctionRegistry [datafusion]

2024-07-05 Thread via GitHub
Omega359 commented on issue #11294: URL: https://github.com/apache/datafusion/issues/11294#issuecomment-2211393261 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] Implement user defined planner for extract [datafusion]

2024-07-05 Thread via GitHub
Omega359 commented on PR #11215: URL: https://github.com/apache/datafusion/pull/11215#issuecomment-2211387323 The issue boils down to SqlToRel::new_with_options(..) not setting the planners. Unfortunately, I can't find a way to retrieve the list of planners set on the context to set them my

[PR] Fix count() docs [datafusion]

2024-07-05 Thread via GitHub
findepi opened a new pull request, #11293: URL: https://github.com/apache/datafusion/pull/11293 The count aggregate was documented to count null values, but it does not do that. The implemented behavior is correct, so let's fix docs. -- This is an automated message from the Apache Git

Re: [PR] test: Add CometTPCDSQueryTestSuite [datafusion-comet]

2024-07-05 Thread via GitHub
viirya commented on code in PR #628: URL: https://github.com/apache/datafusion-comet/pull/628#discussion_r1667114976 ## spark/src/test/scala/org/apache/spark/sql/CometSQLQueryTestHelper.scala: ## @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] test: Add CometTPCDSQueryTestSuite [datafusion-comet]

2024-07-05 Thread via GitHub
viirya commented on code in PR #628: URL: https://github.com/apache/datafusion-comet/pull/628#discussion_r1667114976 ## spark/src/test/scala/org/apache/spark/sql/CometSQLQueryTestHelper.scala: ## @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Infer count() aggregation is not null [datafusion]

2024-07-05 Thread via GitHub
findepi commented on PR #11256: URL: https://github.com/apache/datafusion/pull/11256#issuecomment-2211376092 thanks @jonahgao for your review! updated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[PR] Remove unnecessary qualified names [datafusion]

2024-07-05 Thread via GitHub
findepi opened a new pull request, #11292: URL: https://github.com/apache/datafusion/pull/11292 Leverage existing imports. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] Infer count() aggregation is not null [datafusion]

2024-07-05 Thread via GitHub
findepi commented on code in PR #11256: URL: https://github.com/apache/datafusion/pull/11256#discussion_r1667103815 ## datafusion/expr/src/expr_schema.rs: ## @@ -322,10 +322,16 @@ impl ExprSchemable for Expr { } } Expr::Cast(Cast { expr

Re: [I] Customizable Nullability for UDAF [datafusion]

2024-07-05 Thread via GitHub
findepi commented on issue #11274: URL: https://github.com/apache/datafusion/issues/11274#issuecomment-2211364784 Just thinking more about this. SQL spec > If no row qualifies, then the result of COUNT is 0 (zero), and the result of any other aggregate function is the null value.

[I] Incorrect results in aggregation queries with grouping sets [datafusion]

2024-07-05 Thread via GitHub
thinkharderdev opened a new issue, #11291: URL: https://github.com/apache/datafusion/issues/11291 ### Describe the bug The `InputOrderMode` in `AggregateExec` does not take proper account of grouping sets and can cause incorrect aggregation results in cases where we have a constant e

Re: [PR] [draft] Add `LogicalType`, try to support user-defined types [datafusion]

2024-07-05 Thread via GitHub
notfilippo commented on PR #11160: URL: https://github.com/apache/datafusion/pull/11160#issuecomment-2211324761 > So I can more fully help plan / communicate / coordinate this one. Sounds good! Feel free to follow up here / on slack / on discord 😄 I've also noticed that this po

[PR] WIP: add miri workflow [datafusion-comet]

2024-07-05 Thread via GitHub
andygrove opened a new pull request, #636: URL: https://github.com/apache/datafusion-comet/pull/636 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] Implement user defined planner for extract [datafusion]

2024-07-05 Thread via GitHub
Omega359 commented on PR #11215: URL: https://github.com/apache/datafusion/pull/11215#issuecomment-2211321035 Adding this clause to the plan_to_sql.rs:roundtrip_expr() test showcases the issue: ``` ( TableReference::bare("person"), r#"extract(year from

Re: [PR] Implement user defined planner for extract [datafusion]

2024-07-05 Thread via GitHub
Omega359 commented on PR #11215: URL: https://github.com/apache/datafusion/pull/11215#issuecomment-2211314004 I just reset my branch with upstream main just now and I'm getting this in 4 of my tests: ``` network_time_parsing_should_work::case_2 stdout Error: Result fai

Re: [PR] fix: correctly handle Substrait windows with rows bounds (and validate executability of test plans) [datafusion]

2024-07-05 Thread via GitHub
alamb commented on code in PR #11278: URL: https://github.com/apache/datafusion/pull/11278#discussion_r1667071792 ## datafusion/substrait/tests/testdata/data.csv: ## @@ -1,3 +1,3 @@ a,b,c,d,e,f -1,2.0,2020-01-01,false,4294967296,'a' -3,4.5,2020-01-01,true,2147483648,'b' \ No ne

Re: [I] [DISCUSSION] Potentially consolidate Expr::IsNotUnknown, `IsKnown`, etc [datafusion]

2024-07-05 Thread via GitHub
alamb commented on issue #11282: URL: https://github.com/apache/datafusion/issues/11282#issuecomment-2211304845 I like the idea of desugaring (I think that would map to consolidating some of the Enum variants) 🚀 -- This is an automated message from the Apache Git Service. To respond to t

Re: [PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-05 Thread via GitHub
alamb commented on code in PR #11290: URL: https://github.com/apache/datafusion/pull/11290#discussion_r1667067174 ## docs/source/library-user-guide/using-the-dataframe-api.md: ## @@ -19,129 +19,236 @@ # Using the DataFrame API -## What is a DataFrame +## What is a DataFrame

[PR] Improve and test dataframe API examples in docs [datafusion]

2024-07-05 Thread via GitHub
alamb opened a new pull request, #11290: URL: https://github.com/apache/datafusion/pull/11290 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/11172 ## Rationale for this change I am trying to make the documentation better and exa

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-05 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667019179 ## datafusion/functions-aggregate/src/nth_value.rs: ## @@ -430,3 +428,176 @@ impl NthValueAccumulator { Ok(()) } } + +/// This is a wrapper struct

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-05 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667017516 ## datafusion/functions-aggregate/src/nth_value.rs: ## @@ -19,152 +19,150 @@ //! that can evaluated at runtime during query execution use std::any::Any; -use

[PR] minor: replace .downcast_ref::().is_some() with .is::() [datafusion-comet]

2024-07-05 Thread via GitHub
andygrove opened a new pull request, #635: URL: https://github.com/apache/datafusion-comet/pull/635 ## Which issue does this PR close? N/A ## Rationale for this change Minor code cleanup. We can check the type of an `Any` using `is::()` instead of actuall

Re: [I] Separate Spark-compatibility expressions into a library [datafusion-comet]

2024-07-05 Thread via GitHub
andygrove commented on issue #630: URL: https://github.com/apache/datafusion-comet/issues/630#issuecomment-2211239482 Thank you for filing this @Blizzara. I have been meaning to suggest this as well. I think it would be very valuable to provide a crate with spark-compatible expressions but

Re: [I] Investigate possible unsafe use of get_unchecked in spark_compatible_murmur3_hash [datafusion-comet]

2024-07-05 Thread via GitHub
andygrove commented on issue #633: URL: https://github.com/apache/datafusion-comet/issues/633#issuecomment-2211236446 test execution::datafusion::shuffle_writer::test::test_insert_larger_batch ... error: Undefined Behavior: trying to retag from <789872535> for SharedReadOnly permission at

Re: [PR] Improve volatile expression handling in `CommonSubexprEliminate` [datafusion]

2024-07-05 Thread via GitHub
peter-toth commented on code in PR #11265: URL: https://github.com/apache/datafusion/pull/11265#discussion_r1667023665 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -917,27 +912,36 @@ struct ExprIdentifierVisitor<'a, 'n> { /// Record item that used when trave

Re: [PR] feat: enable "substring" as a UDF in addition to "substr" [datafusion]

2024-07-05 Thread via GitHub
Lordworms commented on PR #11277: URL: https://github.com/apache/datafusion/pull/11277#issuecomment-2211225604 Seems like a reasonable change to me, Thank you @Blizzara -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [I] Implement a scalar function for creating ScalarValue::Map [datafusion]

2024-07-05 Thread via GitHub
goldmedal commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2211228822 > I think following the model for `named_struct` makes the most sense here > > From my reading of https://docs.rs/arrow/latest/arrow/array/struct.MapArray.html the idea

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-05 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667017516 ## datafusion/functions-aggregate/src/nth_value.rs: ## @@ -19,152 +19,150 @@ //! that can evaluated at runtime during query execution use std::any::Any; -use

[I] Run miri checks in CI [datafusion-comet]

2024-07-05 Thread via GitHub
andygrove opened a new issue, #634: URL: https://github.com/apache/datafusion-comet/issues/634 ### What is the problem the feature request solves? We should run the miri checker in CI to help ensure that our unsafe code is actually safe. See https://github.com/rust-lang/miri fo

Re: [PR] Fail on optimization cycles [datafusion]

2024-07-05 Thread via GitHub
findepi commented on PR #11288: URL: https://github.com/apache/datafusion/pull/11288#issuecomment-2211219703 Tests fail indicating that `optimize_projections` can loop. Need further investigation -- This is an automated message from the Apache Git Service. To respond to the message, plea

[I] Investigate possible unsafe use of get_unchecked in spark_compatible_murmur3_hash [datafusion-comet]

2024-07-05 Thread via GitHub
andygrove opened a new issue, #633: URL: https://github.com/apache/datafusion-comet/issues/633 ### Describe the bug I ran the miri checker and it warned about a possible unsafe call to get_unchecked that we should investigate. I will post full details shortly. ### Steps

Re: [PR] add short circuit in BinaryExpr [datafusion]

2024-07-05 Thread via GitHub
alamb commented on PR #11247: URL: https://github.com/apache/datafusion/pull/11247#issuecomment-2211217568 Here are my benchmark results ``` Benchmark tpch_mem.json ┏━━┳━━━┳━━━┳━━━┓

Re: [PR] Improve volatile expression handling in `CommonSubexprEliminate` [datafusion]

2024-07-05 Thread via GitHub
peter-toth commented on PR #11265: URL: https://github.com/apache/datafusion/pull/11265#issuecomment-2211216696 > It may also be worth writing a few .slt tests showing this in action and how it interacts with multiple subexpressions Sure, I will add some tests tomorrow. -- This is

Re: [PR] fix: correctly handle Substrait windows with rows bounds (and validate executability of test plans) [datafusion]

2024-07-05 Thread via GitHub
Blizzara commented on code in PR #11278: URL: https://github.com/apache/datafusion/pull/11278#discussion_r1667027643 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -523,24 +523,6 @@ async fn roundtrip_arithmetic_ops() -> Result<()> { Ok(()) } -#[tok

Re: [PR] Improve volatile expression handling in `CommonSubexprEliminate` [datafusion]

2024-07-05 Thread via GitHub
peter-toth commented on code in PR #11265: URL: https://github.com/apache/datafusion/pull/11265#discussion_r1667004909 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -1838,4 +1851,53 @@ mod test { Ok(()) } + +#[test] +fn test_volatile() ->

Re: [PR] fix: correctly handle Substrait windows with rows bounds (and validate executability of test plans) [datafusion]

2024-07-05 Thread via GitHub
Blizzara commented on code in PR #11278: URL: https://github.com/apache/datafusion/pull/11278#discussion_r1667027578 ## datafusion/substrait/tests/testdata/data.csv: ## @@ -1,3 +1,3 @@ a,b,c,d,e,f -1,2.0,2020-01-01,false,4294967296,'a' -3,4.5,2020-01-01,true,2147483648,'b' \ No

Re: [PR] extract statistics read for struct array in parquet [datafusion]

2024-07-05 Thread via GitHub
Lordworms commented on PR #11289: URL: https://github.com/apache/datafusion/pull/11289#issuecomment-2211201377 @alamb I'll try to add support for struct in DataPage in next PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] Implement non-nested struct statistic extraction [datafusion]

2024-07-05 Thread via GitHub
Lordworms closed pull request #10730: Implement non-nested struct statistic extraction URL: https://github.com/apache/datafusion/pull/10730 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] fix: correctly handle Substrait windows with rows bounds (and validate executability of test plans) [datafusion]

2024-07-05 Thread via GitHub
alamb commented on code in PR #11278: URL: https://github.com/apache/datafusion/pull/11278#discussion_r1667023252 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -523,24 +523,6 @@ async fn roundtrip_arithmetic_ops() -> Result<()> { Ok(()) } -#[tokio:

Re: [PR] Improve volatile expression handling in `CommonSubexprEliminate` [datafusion]

2024-07-05 Thread via GitHub
peter-toth commented on code in PR #11265: URL: https://github.com/apache/datafusion/pull/11265#discussion_r1667022866 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -917,27 +912,36 @@ struct ExprIdentifierVisitor<'a, 'n> { /// Record item that used when trave

Re: [PR] Improve volatile expression handling in `CommonSubexprEliminate` [datafusion]

2024-07-05 Thread via GitHub
peter-toth commented on code in PR #11265: URL: https://github.com/apache/datafusion/pull/11265#discussion_r1667004909 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -1838,4 +1851,53 @@ mod test { Ok(()) } + +#[test] +fn test_volatile() ->

Re: [I] [DISCUSSION] Potentially consolidate Expr::IsNotUnknown, `IsKnown`, etc [datafusion]

2024-07-05 Thread via GitHub
findepi commented on issue #11282: URL: https://github.com/apache/datafusion/issues/11282#issuecomment-2211198155 Which in turn means we need those duplicated constructs on the SQL layer, but we could then desugar and replace `IS [NOT] UNKNOWN` with `IS [NOT] NULL` at runtime. -- This is

[PR] extract statistics read for struct array in parquet [datafusion]

2024-07-05 Thread via GitHub
Lordworms opened a new pull request, #11289: URL: https://github.com/apache/datafusion/pull/11289 ## Which issue does this PR close? Closes #10609 ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [PR] Convert `nth_value` to UDAF [datafusion]

2024-07-05 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667024463 ## datafusion/sql/src/expr/function.rs: ## @@ -415,9 +415,11 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { ) -> Result { // check udaf first

Re: [PR] Convert udaf nth value [datafusion]

2024-07-05 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667019179 ## datafusion/functions-aggregate/src/nth_value.rs: ## @@ -430,3 +428,176 @@ impl NthValueAccumulator { Ok(()) } } + +/// This is a wrapper struct

[I] Produce optimal plans, remove `max_passes` config [datafusion]

2024-07-05 Thread via GitHub
findepi opened a new issue, #11286: URL: https://github.com/apache/datafusion/issues/11286 ### Is your feature request related to a problem or challenge? `max_passes` (defaults to 3) defines how many times a given optimization rule can be applied to a plan tree. This artificially l

Re: [I] HashJoin for nested types give wrong results [datafusion]

2024-07-05 Thread via GitHub
comphead closed issue #11233: HashJoin for nested types give wrong results URL: https://github.com/apache/datafusion/issues/11233 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] fix: correctly handle Substrait windows with rows bounds (and validate executability of test plans) [datafusion]

2024-07-05 Thread via GitHub
alamb commented on code in PR #11278: URL: https://github.com/apache/datafusion/pull/11278#discussion_r1667022434 ## datafusion/substrait/tests/testdata/data.csv: ## @@ -1,3 +1,3 @@ a,b,c,d,e,f -1,2.0,2020-01-01,false,4294967296,'a' -3,4.5,2020-01-01,true,2147483648,'b' \ No ne

[PR] Fail on optimization cycles [datafusion]

2024-07-05 Thread via GitHub
findepi opened a new pull request, #11288: URL: https://github.com/apache/datafusion/pull/11288 Fixes https://github.com/apache/datafusion/issues/11285. See there for rationale. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] Add LogicalPlanSignature and use in the optimizer loop [datafusion]

2024-07-05 Thread via GitHub
findepi commented on code in PR #5623: URL: https://github.com/apache/datafusion/pull/5623#discussion_r1667008120 ## datafusion/optimizer/src/optimizer.rs: ## @@ -330,15 +324,14 @@ impl Optimizer { } log_plan(&format!("Optimized plan (pass {i})"), &new_

[PR] Convert udaf nth value [datafusion]

2024-07-05 Thread via GitHub
jcsherin opened a new pull request, #11287: URL: https://github.com/apache/datafusion/pull/11287 ## Which issue does this PR close? Closes #11284. ## Rationale for this change ## What changes are included in this PR? - Converts `nth_value` to UDAF

Re: [PR] Improve volatile expression handling in `CommonSubexprEliminate` [datafusion]

2024-07-05 Thread via GitHub
peter-toth commented on code in PR #11265: URL: https://github.com/apache/datafusion/pull/11265#discussion_r1667023792 ## datafusion/expr/src/expr.rs: ## @@ -1413,12 +1413,16 @@ impl Expr { .unwrap() } +/// Returns true if the expression node is volatile,

Re: [PR] Convert udaf nth value [datafusion]

2024-07-05 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667017516 ## datafusion/functions-aggregate/src/nth_value.rs: ## @@ -19,152 +19,150 @@ //! that can evaluated at runtime during query execution use std::any::Any; -use

Re: [PR] Convert udaf nth value [datafusion]

2024-07-05 Thread via GitHub
jcsherin commented on code in PR #11287: URL: https://github.com/apache/datafusion/pull/11287#discussion_r1667015451 ## datafusion/functions-aggregate/src/nth_value.rs: ## @@ -19,152 +19,150 @@ //! that can evaluated at runtime during query execution use std::any::Any; -use

Re: [PR] Improve volatile expression handling in `CommonSubexprEliminate` [datafusion]

2024-07-05 Thread via GitHub
peter-toth commented on code in PR #11265: URL: https://github.com/apache/datafusion/pull/11265#discussion_r1667004909 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -1838,4 +1851,53 @@ mod test { Ok(()) } + +#[test] +fn test_volatile() ->

Re: [PR] Improve volatile expression handling in `CommonSubexprEliminate` [datafusion]

2024-07-05 Thread via GitHub
peter-toth commented on code in PR #11265: URL: https://github.com/apache/datafusion/pull/11265#discussion_r1667004909 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -1838,4 +1851,53 @@ mod test { Ok(()) } + +#[test] +fn test_volatile() ->

Re: [PR] Improve volatile expression handling in `CommonSubexprEliminate` [datafusion]

2024-07-05 Thread via GitHub
alamb commented on code in PR #11265: URL: https://github.com/apache/datafusion/pull/11265#discussion_r1666988190 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -1838,4 +1851,53 @@ mod test { Ok(()) } + +#[test] +fn test_volatile() -> Resu

[I] Avoid masking errors and producing suboptimal plans [datafusion]

2024-07-05 Thread via GitHub
findepi opened a new issue, #11285: URL: https://github.com/apache/datafusion/issues/11285 ### Is your feature request related to a problem or challenge? `Optimizer` returns early when a rewrite produces a certain plan for the second time. Producing same plan again is considered a

Re: [PR] initial prettier unparse [datafusion]

2024-07-05 Thread via GitHub
goldmedal commented on code in PR #11186: URL: https://github.com/apache/datafusion/pull/11186#discussion_r1667004917 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -314,3 +310,78 @@ fn test_table_references_in_plan_to_sql() { "SELECT \"table\".id, \"table\".\"valu

Re: [I] [DISCUSSION] Potentially consolidate Expr::IsNotUnknown, `IsKnown`, etc [datafusion]

2024-07-05 Thread via GitHub
findepi commented on issue #11282: URL: https://github.com/apache/datafusion/issues/11282#issuecomment-2211173292 Indeed, from SQL 2016 spec ``` ::= [ IS [ NOT ] ] ::= TRUE | FALSE | UNKNOWN ``` I didn't know about this syntax. Than

Re: [PR] Fix running examples readme [datafusion]

2024-07-05 Thread via GitHub
comphead commented on PR #11225: URL: https://github.com/apache/datafusion/pull/11225#issuecomment-2211174548 > This branch sadly has some merge conflicts that need to be resolved before we can merge it: > There is no conflicted files though, hope rebase solves it @findepi --

  1   2   3   >