Re: [I] SQL/PGQ or even GQL support [datafusion]

2025-01-31 Thread via GitHub
georgiy-belyanin commented on issue #13545: URL: https://github.com/apache/datafusion/issues/13545#issuecomment-2626672813 Probably, it's worth starting from something simple related to SQL/PGQ. SQL/PGQ DDL sounds easy enough to be implemented using parser extensions. Then `CREATE PROPERTY

Re: [I] Simple Functions [datafusion]

2025-01-31 Thread via GitHub
davidhewitt commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2626830164 Thanks, looks similar to what we've done in `datafusion-functions-json` but we tried to handle dictionaries without casting them away. I guess we should just cast them and

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-31 Thread via GitHub
JanKaul commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2626849059 Thanks for having a look. I have to dig a bit deeper to really understand where the issue is. I think this PR is a very pragmatic approach that solves the most important use c

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-31 Thread via GitHub
tustvold commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2626852759 > As this problem seems to only occur when executing the physical plan. Assuming this was a mistake we made at InfluxData, we had issues with plan time concurrency that were c

Re: [I] column types must match schema types occurs after `unnest_columns` on another column [datafusion]

2025-01-31 Thread via GitHub
duongcongtoai commented on issue #14218: URL: https://github.com/apache/datafusion/issues/14218#issuecomment-2626875293 duplicate with https://github.com/apache/datafusion/issues/6057 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

Re: [I] `make_array` -> `unnest` w/ dict-encoded strings fails [datafusion]

2025-01-31 Thread via GitHub
duongcongtoai commented on issue #6057: URL: https://github.com/apache/datafusion/issues/6057#issuecomment-2626880570 take

Re: [I] bug: array_position not working as expected [datafusion]

2025-01-31 Thread via GitHub
duongcongtoai commented on issue #6694: URL: https://github.com/apache/datafusion/issues/6694#issuecomment-2626879276 take

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-01-31 Thread via GitHub
chenkovsky commented on PR #14362: URL: https://github.com/apache/datafusion/pull/14362#issuecomment-2626887571 > One thing I didn't understand from your PR @chenkovsky: you got it to work only by modifying TableScan? I had trouble understanding when I was grokking your PR. If you have a better

Re: [PR] Provide user-defined invariants for logical node extensions. [datafusion]

2025-01-31 Thread via GitHub
alamb commented on code in PR #14329: URL: https://github.com/apache/datafusion/pull/14329#discussion_r1937048640 ## datafusion/expr/src/logical_plan/extension.rs: ## @@ -54,6 +57,22 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync { /// Return the output schem

Re: [I] Enable TableScan to return multiple arbitrary table references [datafusion]

2025-01-31 Thread via GitHub
phisn commented on issue #14358: URL: https://github.com/apache/datafusion/issues/14358#issuecomment-2626936190 Note: This would be an API-breaking change because we have to change a public field in the LogicalPlan.

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-31 Thread via GitHub
rohitrastogi commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1936870081 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +

[I] Create UNION plan node with correct schema [datafusion]

2025-01-31 Thread via GitHub
findepi opened a new issue, #14380: URL: https://github.com/apache/datafusion/issues/14380 ### Is your feature request related to a problem or challenge? When building a query plan `union()` behavior doesn't type-coerce the expressions and it [takes the left node's schema](https://gi

Re: [PR] Provide documentation of expose APIs to enable handling of type coercion at UNION plan construction. [datafusion]

2025-01-31 Thread via GitHub
findepi commented on code in PR #12142: URL: https://github.com/apache/datafusion/pull/12142#discussion_r1937064978 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1331,7 +1331,17 @@ pub fn validate_unique_names<'a>( }) } -/// Union two logical plans. +/// Union t

Re: [PR] Provide documentation of expose APIs to enable handling of type coercion at UNION plan construction. [datafusion]

2025-01-31 Thread via GitHub
findepi commented on code in PR #12142: URL: https://github.com/apache/datafusion/pull/12142#discussion_r1937065101 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1331,7 +1331,17 @@ pub fn validate_unique_names<'a>( }) } -/// Union two logical plans. +/// Union t

Re: [PR] Fix UNION field nullability tracking [datafusion]

2025-01-31 Thread via GitHub
findepi commented on code in PR #14356: URL: https://github.com/apache/datafusion/pull/14356#discussion_r1937065602 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2645,6 +2643,106 @@ pub struct Union { pub schema: DFSchemaRef, } +impl Union { +/// Constructs new

Re: [PR] Fix UNION field nullability tracking [datafusion]

2025-01-31 Thread via GitHub
findepi commented on code in PR #14356: URL: https://github.com/apache/datafusion/pull/14356#discussion_r1937065983 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2645,6 +2643,106 @@ pub struct Union { pub schema: DFSchemaRef, } +impl Union { +/// Constructs new

[I] Format for Value renders incorrect escaping of quote characters in BigQuery [datafusion-sqlparser-rs]

2025-01-31 Thread via GitHub
graup opened a new issue, #1695: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1695 `impl fmt::Display for EscapeQuotedString` (which is used when formatting a Value expr) escapes quote characters by doubling them (like `''` or `""`), which is a syntax error in BigQuery. T
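A minimal sketch of the two escaping styles the issue contrasts, assuming the goal is BigQuery-compatible output. The functions `escape_by_doubling` and `escape_with_backslash` are hypothetical illustrations, not sqlparser's `EscapeQuotedString` API; the sketch relies on BigQuery string literals accepting backslash escapes such as `\'` and `\\`, while the issue reports that doubled quotes are rejected.

```rust
/// Hypothetical helper (not sqlparser's API): standard-SQL style escaping,
/// where a quote character inside the literal is written twice.
fn escape_by_doubling(s: &str, quote: char) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push(quote);
    for c in s.chars() {
        if c == quote {
            out.push(quote); // `'` becomes `''`
        }
        out.push(c);
    }
    out.push(quote);
    out
}

/// Hypothetical helper (not sqlparser's API): backslash style escaping,
/// where the quote and the backslash itself are prefixed with `\`.
fn escape_with_backslash(s: &str, quote: char) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push(quote);
    for c in s.chars() {
        if c == quote || c == '\\' {
            out.push('\\'); // `'` becomes `\'`, `\` becomes `\\`
        }
        out.push(c);
    }
    out.push(quote);
    out
}

fn main() {
    let s = "it's";
    println!("{}", escape_by_doubling(s, '\''));    // 'it''s'
    println!("{}", escape_with_backslash(s, '\'')); // 'it\'s'
}
```

Escaping the backslash as well as the quote keeps the output unambiguous: without it, an input ending in `\` would escape the closing quote.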

Re: [I] `make_array` -> `unnest` w/ dict-encoded strings fails [datafusion]

2025-01-31 Thread via GitHub
duongcongtoai commented on issue #6057: URL: https://github.com/apache/datafusion/issues/6057#issuecomment-2626964540 some work has been going on with unnest, and somehow this bug can no longer be reproduced ``` async fn test_unnest_used_to_be_broken() { let ctx = SessionContext

Re: [PR] Add RETURNS TABLE() support for CREATE FUNCTION in Postgresql [datafusion-sqlparser-rs]

2025-01-31 Thread via GitHub
alamb commented on PR #1687: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1687#issuecomment-2626964878 Thanks @remysaissy and @iffyio

Re: [PR] Add RETURNS TABLE() support for CREATE FUNCTION in Postgresql [datafusion-sqlparser-rs]

2025-01-31 Thread via GitHub
alamb commented on PR #1687: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1687#issuecomment-2626965748 It looks like there are some real test failures

Re: [PR] Fix UNION field nullability tracking [datafusion]

2025-01-31 Thread via GitHub
findepi commented on code in PR #14356: URL: https://github.com/apache/datafusion/pull/14356#discussion_r1937072353 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2645,6 +2643,106 @@ pub struct Union { pub schema: DFSchemaRef, } +impl Union { +/// Constructs new

Re: [PR] Fix UNION field nullability tracking [datafusion]

2025-01-31 Thread via GitHub
findepi commented on code in PR #14356: URL: https://github.com/apache/datafusion/pull/14356#discussion_r1937076628 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2645,6 +2643,106 @@ pub struct Union { pub schema: DFSchemaRef, } +impl Union { +/// Constructs new

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-31 Thread via GitHub
JanKaul commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2626715496 @tustvold I've adapted the [example](https://github.com/JanKaul/cpu-io-executor/blob/main/src/buffered_stream.rs) to somewhat resemble the parquet reader. It tries to mimic its behav

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-31 Thread via GitHub
tustvold commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2626724671 I think you misunderstand: the parquet reader does not do streaming fetches at all; it uses `get_range` and `get_ranges`.

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-31 Thread via GitHub
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626728678 > First of all, we need an out-of-the-box behavior for each function that is, at the same time, easy to customize. The Signature Coercible now becomes problematic if we foll

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-31 Thread via GitHub
tustvold commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1936924661 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// o

Re: [PR] Improve speed of `median` by implementing special `GroupsAccumulator` [datafusion]

2025-01-31 Thread via GitHub
alamb commented on PR #13681: URL: https://github.com/apache/datafusion/pull/13681#issuecomment-2626987546 🚀 I am quite pleased to see this make the 45 release

Re: [PR] feat: implement `invoke_with_args` for `struct` and `named_struct` [datafusion]

2025-01-31 Thread via GitHub
alamb commented on code in PR #14276: URL: https://github.com/apache/datafusion/pull/14276#discussion_r1937088967 ## datafusion/functions/src/core/named_struct.rs: ## @@ -203,12 +137,19 @@ impl ScalarUDFImpl for NamedStructFunc { } -fn invoke_batch( Rev

Re: [PR] Remove use of deprecated dict_id in datafusion-proto (#14173) [datafusion]

2025-01-31 Thread via GitHub
cj-zhukov commented on PR #14227: URL: https://github.com/apache/datafusion/pull/14227#issuecomment-2627001851 > It seems that `datafusion/core/src/physical_optimizer/enforce_sorting.rs` somehow is added to this PR accidentally (it was moved on main recently, so perhaps this happened during

Re: [I] Querying Parquet file specifically with a predicate returns invalid data error but works in other situations [datafusion]

2025-01-31 Thread via GitHub
jhorstmann commented on issue #14281: URL: https://github.com/apache/datafusion/issues/14281#issuecomment-2627342594 Having looked into thrift some time ago, this reminds me of this [inconsistency about boolean encodings](https://github.com/jhorstmann/compact-thrift/issues/7), where boolea

[PR] DFParser should skip unsupported COPY INTO [datafusion]

2025-01-31 Thread via GitHub
osipovartem opened a new pull request, #14382: URL: https://github.com/apache/datafusion/pull/14382 ## Which issue does this PR close? Closes apache/datafusion#14372 ## What changes are included in this PR? This PR makes it possible to skip **[COPY INTO](https://docs.snowflake.com/

[I] `Case` coercion of Structs loses field names (regression introduced in DataFusion 44) [datafusion]

2025-01-31 Thread via GitHub
alamb opened a new issue, #14383: URL: https://github.com/apache/datafusion/issues/14383 ### Describe the bug Found while working on https://github.com/apache/datafusion/issues/14154 (it may be the same). When two structs are coerced as part of case, the field names are lost (

Re: [I] Type Coercion fails for List with inner type struct which has large/view types [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #14154: URL: https://github.com/apache/datafusion/issues/14154#issuecomment-2627390626 Well, I found another bug when working on this one - https://github.com/apache/datafusion/issues/14383 I still haven't found a reproducer for this one. But I have another

Re: [I] `Case` coercion of Structs loses field names [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #14383: URL: https://github.com/apache/datafusion/issues/14383#issuecomment-2627393447 Actually, I tested this in DataFusion 43 and it also fails.

[PR] Alamb/undo removal [datafusion]

2025-01-31 Thread via GitHub
alamb opened a new pull request, #14388: URL: https://github.com/apache/datafusion/pull/14388 ## Which issue does this PR close? - related to https://github.com/apache/datafusion/pull/14370 ## Rationale for this change As @andygrove mentioned https://github.com/apache/d

Re: [I] Regression: `Invalid comparison operation: Utf8 == Utf8View` error during LEFT ANTI JOIN [datafusion]

2025-01-31 Thread via GitHub
rkrishn7 commented on issue #13510: URL: https://github.com/apache/datafusion/issues/13510#issuecomment-2627821535 Awesome, thanks @alamb! Your fix makes sense to me 👍🏾

[I] [EPIC] Redesign Datafusion main page [datafusion]

2025-01-31 Thread via GitHub
comphead opened a new issue, #14389: URL: https://github.com/apache/datafusion/issues/14389 ### Is your feature request related to a problem or challenge? Users currently feel a little bit lost when dealing with the DataFusion main page https://github.com/apache/datafusion. IMHO the

Re: [PR] minor: remove unused is_sorted method from utils [datafusion]

2025-01-31 Thread via GitHub
alamb commented on code in PR #14370: URL: https://github.com/apache/datafusion/pull/14370#discussion_r1937622258 ## datafusion/common/src/utils/mod.rs: ## @@ -769,20 +769,6 @@ pub fn set_difference, S: Borrow>( .collect() } -/// Checks whether the given index sequen

[I] Add FFI wrappers for TableProvider::insert_into [datafusion]

2025-01-31 Thread via GitHub
davisp opened a new issue, #14390: URL: https://github.com/apache/datafusion/issues/14390 ### Is your feature request related to a problem or challenge? The current `datafusion_ffi` wrappers do not have support for `TableProvider::insert_into`. ### Describe the solution you'd l

Re: [I] Querying Parquet file specifically with a predicate returns invalid data error but works in other situations [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #14281: URL: https://github.com/apache/datafusion/issues/14281#issuecomment-2627834027 @jhorstmann made a fix: - https://github.com/apache/datafusion/issues/14281 @senyosimpson any chance you can verify that the change in https://github.com/apache/datafu

[PR] Wrap TableProvider::insert_into [datafusion]

2025-01-31 Thread via GitHub
davisp opened a new pull request, #14391: URL: https://github.com/apache/datafusion/pull/14391 This method was missing from the FFI bindings for use in datafusion-python extensions. ## Which issue does this PR close? Closes #14390. ## Rationale for this change Add

Re: [I] Add Span to Tokens, AST nodes [datafusion-sqlparser-rs]

2025-01-31 Thread via GitHub
alamb commented on issue #161: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/161#issuecomment-2627837655 Good call, thanks @mkarbo. Let's continue coordinating the remaining features / work on https://github.com/apache/datafusion-sqlparser-rs/issues/1548

Re: [PR] Make TypedString preserve quote style [datafusion-sqlparser-rs]

2025-01-31 Thread via GitHub
alamb commented on PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#issuecomment-2627835959 Werd! The PR / code 🚂 keeps on running. Thanks again @iffyio for all you do to keep this repo moving forward

Re: [PR] Fix UNION field nullability tracking [datafusion]

2025-01-31 Thread via GitHub
findepi merged PR #14356: URL: https://github.com/apache/datafusion/pull/14356

Re: [I] External Error prefix is repeated multiple times [datafusion]

2025-01-31 Thread via GitHub
Omega359 commented on issue #14080: URL: https://github.com/apache/datafusion/issues/14080#issuecomment-2627285157 https://github.com/apache/datafusion-testing/blob/5b424aefd7f6bf198220c37f59d39dbb25b47695/data/sqlite/random/groupby/slt_good_12.slt#L19568 is an example.

Re: [PR] Feature: AggregateMonotonicity [datafusion]

2025-01-31 Thread via GitHub
berkaysynnada commented on code in PR #14271: URL: https://github.com/apache/datafusion/pull/14271#discussion_r1937284064 ## datafusion/sqllogictest/test_files/window.slt: ## @@ -5452,3 +5452,89 @@ order by c1, c2, rank1, rank2; statement ok drop table t1; + + +# Set-Monoton

Re: [I] Incorrect result for IS NOT NULL predicate over UNION ALL query [datafusion]

2025-01-31 Thread via GitHub
findepi closed issue #14352: Incorrect result for IS NOT NULL predicate over UNION ALL query URL: https://github.com/apache/datafusion/issues/14352

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub
andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627483100 > With that in mind, a joint effort on something in the main DataFusion repo or a `datafusion-contrib` repo could both work, and we are open to either option. I am -1 on

[PR] Alamb/fix field coercion [datafusion]

2025-01-31 Thread via GitHub
alamb opened a new pull request, #14384: URL: https://github.com/apache/datafusion/pull/14384 ## Which issue does this PR close? - Fixes https://github.com/apache/datafusion/issues/14383 ## Rationale for this change When coercing fields for a struct, we shouldn't rename t

Re: [PR] Do not rename struct fields when coercing types in `CASE` [datafusion]

2025-01-31 Thread via GitHub
alamb commented on code in PR #14384: URL: https://github.com/apache/datafusion/pull/14384#discussion_r1937405278 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -961,23 +961,31 @@ fn struct_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub
andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627484867 > 3. I'm not sure we want to say yes to spark but no to other udf suites. This is a valid point also.

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub
andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627492221 It seems to me that we already have an Apache DataFusion project that provides Spark-compatible DataFusion expressions (Comet). I think @shehabgamin's main concern is tha

Re: [PR] Do not rename struct fields when coercing types in `CASE` [datafusion]

2025-01-31 Thread via GitHub
alamb commented on code in PR #14384: URL: https://github.com/apache/datafusion/pull/14384#discussion_r1937410660 ## datafusion/sqllogictest/test_files/case.slt: ## @@ -308,3 +308,113 @@ NULL NULL false statement ok drop table foo + + +# Test coercion of inner struct field n

Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]

2025-01-31 Thread via GitHub
xudong963 commented on PR #14207: URL: https://github.com/apache/datafusion/pull/14207#issuecomment-2627501242 Thanks, @alamb. I'm on vacation and will reply ASAP.

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub
comphead commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627708479 > > > With that in mind, a joint effort on something in the main DataFusion repo or a `datafusion-contrib` repo could both work, and we are open to either option. > > > >

Re: [PR] feat: metadata columns [datafusion]

2025-01-31 Thread via GitHub
adriangb commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1937553927 ## datafusion-testing: ## Review Comment: This needs to be reverted

Re: [PR] Do not rename struct fields when coercing types in `CASE` [datafusion]

2025-01-31 Thread via GitHub
alamb commented on code in PR #14384: URL: https://github.com/apache/datafusion/pull/14384#discussion_r1937555136 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -961,23 +961,31 @@ fn struct_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option

Re: [PR] feat: metadata columns [datafusion]

2025-01-31 Thread via GitHub
adriangb commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1937555727 ## datafusion/core/src/execution/session_state.rs: ## @@ -105,11 +105,11 @@ use uuid::Uuid; /// # #[tokio::main] /// # async fn main() -> Result<()> { ///

Re: [PR] minor: remove unused is_sorted method from utils [datafusion]

2025-01-31 Thread via GitHub
andygrove commented on code in PR #14370: URL: https://github.com/apache/datafusion/pull/14370#discussion_r1937564039 ## datafusion/common/src/utils/mod.rs: ## @@ -769,20 +769,6 @@ pub fn set_difference, S: Borrow>( .collect() } -/// Checks whether the given index se

Re: [PR] Provide user-defined invariants for logical node extensions. [datafusion]

2025-01-31 Thread via GitHub
wiedld commented on code in PR #14329: URL: https://github.com/apache/datafusion/pull/14329#discussion_r1937693958 ## datafusion/expr/src/logical_plan/extension.rs: ## @@ -54,6 +57,22 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync { /// Return the output sche

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-31 Thread via GitHub
alamb commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2627866762 I don't have a strong opinion about how exactly the coercion rules should be changed. It does seem to me that we keep churning / thrashing on coercion (and introducing regressi

[PR] Parse SET GLOBAL variable modifier for MySQL [datafusion-sqlparser-rs]

2025-01-31 Thread via GitHub
mvzink opened a new pull request, #1696: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1696 This also stops rewriting `SESSION` away. Closes #1694

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628065146 > Perhaps as an alternative we could setup a datafusion-udfs (pick an appropriate name) under the apache umbrella and managed by datafusion pmc's where this could live? Just a thoug

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628070067 In terms of testing, I think some combination of sqllogictest / gold data style tests and maybe even real-spark runs in the [`extended.yml`](https://github.com/apache/datafusion/blo

Re: [I] Add Span to Tokens, AST nodes [datafusion-sqlparser-rs]

2025-01-31 Thread via GitHub
mkarbo commented on issue #161: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/161#issuecomment-2627046981 @alamb, can we close this one?

[PR] Add tests for coercing structs [datafusion]

2025-01-31 Thread via GitHub
alamb opened a new pull request, #14381: URL: https://github.com/apache/datafusion/pull/14381 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/14154 ## Rationale for this change We don't have test coverage for having to coerce `List(Str

Re: [I] External Error prefix is repeated multiple times [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #14080: URL: https://github.com/apache/datafusion/issues/14080#issuecomment-2627015581 > [@Omega359](https://github.com/Omega359) [@gz](https://github.com/gz) Can you provide me with code that can reproduce a bug? I want to use it for testing. I think the test

Re: [PR] Remove use of deprecated dict_id in datafusion-proto (#14173) [datafusion]

2025-01-31 Thread via GitHub
cj-zhukov commented on code in PR #14227: URL: https://github.com/apache/datafusion/pull/14227#discussion_r1937115713 ## datafusion/proto-common/proto/datafusion_common.proto: ## @@ -108,8 +108,7 @@ message Field { // for complex data types like structs, unions repeated Fi

Re: [PR] Add RETURNS TABLE() support for CREATE FUNCTION in Postgresql [datafusion-sqlparser-rs]

2025-01-31 Thread via GitHub
remysaissy commented on PR #1687: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1687#issuecomment-2627274710 @alamb, I don't see how the `parse_join_constraint_unnest_alias` (nightly build failure) is related to this commit since the difference is related to a Join becoming a

Re: [I] Regression: `Invalid comparison operation: Utf8 == Utf8View` error during LEFT ANTI JOIN [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #13510: URL: https://github.com/apache/datafusion/issues/13510#issuecomment-2627641993 I looked a bit at this. I think the issue is that the join type coercion is resolving the `ON` clause in terms of the output schema (not the input schema). I am working on a

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub
Omega359 commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627663730 > > With that in mind, a joint effort on something in the main DataFusion repo or a `datafusion-contrib` repo could both work, and we are open to either option. > > I am -

Re: [PR] Support `array_union` scalar expr [datafusion-comet]

2025-01-31 Thread via GitHub
dharanad commented on PR #1362: URL: https://github.com/apache/datafusion-comet/pull/1362#issuecomment-2627696735 There are cases where ordering is different between spark and df ``` == Results == !== Correct Answer - 1000 ==== Spark Answ

Re: [PR] Do not rename struct fields when coercing types in `CASE` [datafusion]

2025-01-31 Thread via GitHub
alamb commented on code in PR #14384: URL: https://github.com/apache/datafusion/pull/14384#discussion_r1937544188 ## datafusion/sqllogictest/test_files/case.slt: ## @@ -308,3 +308,113 @@ NULL NULL false statement ok drop table foo + + +# Test coercion of inner struct field n

Re: [I] `make_array` -> `unnest` w/ dict-encoded strings fails [datafusion]

2025-01-31 Thread via GitHub
crepererum commented on issue #6057: URL: https://github.com/apache/datafusion/issues/6057#issuecomment-2627012783 Then I suggest we add a regression test and close this issue.

Re: [PR] Remove use of deprecated dict_id in datafusion-proto (#14173) [datafusion]

2025-01-31 Thread via GitHub
alamb commented on code in PR #14227: URL: https://github.com/apache/datafusion/pull/14227#discussion_r193740 ## datafusion/sqllogictest/test_files/copy.slt: ## @@ -538,26 +538,6 @@ select * from validate_arrow_file; 1 Foo 2 Bar -# Copy from dict encoded values to single

Re: [I] column types must match schema types occurs after `unnest_columns` on another column [datafusion]

2025-01-31 Thread via GitHub
duongcongtoai commented on issue #14218: URL: https://github.com/apache/datafusion/issues/14218#issuecomment-2627041906 This error happens when the provider's schema is not aligned with the schema of the record batch produced by it, in particular column `__delta_rs_path` has type Utf8 accor

Re: [PR] Feature: AggregateMonotonicity [datafusion]

2025-01-31 Thread via GitHub
berkaysynnada commented on code in PR #14271: URL: https://github.com/apache/datafusion/pull/14271#discussion_r1937319791 ## datafusion/sqllogictest/test_files/window.slt: ## @@ -5452,3 +5452,89 @@ order by c1, c2, rank1, rank2; statement ok drop table t1; + + +# Set-Monoton

[PR] chore: Adding commit activity badge [datafusion]

2025-01-31 Thread via GitHub
comphead opened a new pull request, #14386: URL: https://github.com/apache/datafusion/pull/14386 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] chore: Adding commit activity badge [datafusion]

2025-01-31 Thread via GitHub
comphead commented on PR #14386: URL: https://github.com/apache/datafusion/pull/14386#issuecomment-2627787043 Also adding open issues: https://github.com/user-attachments/assets/9d3b9611-85a6-4949-b183-9851568359fc

[PR] Fix join type coercion [datafusion]

2025-01-31 Thread via GitHub
alamb opened a new pull request, #14387: URL: https://github.com/apache/datafusion/pull/14387 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/13510 ## Rationale for this change Fix a bug ## What changes are included in this PR?

Re: [I] Regression: `Invalid comparison operation: Utf8 == Utf8View` error during LEFT ANTI JOIN [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #13510: URL: https://github.com/apache/datafusion/issues/13510#issuecomment-2627780385 BTW @rkrishn7 narrowed in on the fact that the inputs are `UNNAMED_TABLE` which is a key observation. Thank you @rkrishn7 🙏

Re: [PR] feat: Add array reading support to native_datafusion scan [datafusion-comet]

2025-01-31 Thread via GitHub
andygrove commented on PR #1324: URL: https://github.com/apache/datafusion-comet/pull/1324#issuecomment-2627792750 `DisableAQECometAsyncShuffleSuite` test `columnar shuffle on array` is failing when calling `comet::execution::shuffle::row::builder_to_array` due to: ``` org.apache.

Re: [I] Regression: `Invalid comparison operation: Utf8 == Utf8View` error during LEFT ANTI JOIN [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #13510: URL: https://github.com/apache/datafusion/issues/13510#issuecomment-2627796240 Here is a fix: - https://github.com/apache/datafusion/pull/14387

Re: [PR] Add regexp_extract func [datafusion]

2025-01-31 Thread via GitHub
SKY-ALIN commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1937604422 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [I] Use parquet crate for decoding Parquet data into Arrow arrays [datafusion-comet]

2025-01-31 Thread via GitHub
andygrove commented on issue #1040: URL: https://github.com/apache/datafusion-comet/issues/1040#issuecomment-2627796948 Closing this since we now have code in main for using ParquetExec

Re: [I] Use parquet crate for decoding Parquet data into Arrow arrays [datafusion-comet]

2025-01-31 Thread via GitHub
andygrove closed issue #1040: Use parquet crate for decoding Parquet data into Arrow arrays URL: https://github.com/apache/datafusion-comet/issues/1040

Re: [I] Support complex datatypes in Comet Scan [datafusion-comet]

2025-01-31 Thread via GitHub
andygrove closed issue #434: Support complex datatypes in Comet Scan URL: https://github.com/apache/datafusion-comet/issues/434

Re: [I] Support complex datatypes in Comet Scan [datafusion-comet]

2025-01-31 Thread via GitHub
andygrove commented on issue #434: URL: https://github.com/apache/datafusion-comet/issues/434#issuecomment-2627799934 @mattwparas We are now actively working on supporting reading complex types from Parquet. We have this partially working in main behind feature flags, and the epic to track

Re: [PR] chore: Adding commit activity badge [datafusion]

2025-01-31 Thread via GitHub
comphead merged PR #14386: URL: https://github.com/apache/datafusion/pull/14386

Re: [PR] Fix regression list Type Coercion List with inner type struct which has large/view types [datafusion]

2025-01-31 Thread via GitHub
alamb closed pull request #14385: Fix regression list Type Coercion List with inner type struct which has large/view types URL: https://github.com/apache/datafusion/pull/14385

Re: [I] datafusion-cli not installed [datafusion]

2025-01-31 Thread via GitHub
comphead closed issue #9294: datafusion-cli not installed URL: https://github.com/apache/datafusion/issues/9294

Re: [I] datafusion-cli not installed [datafusion]

2025-01-31 Thread via GitHub
comphead commented on issue #9294: URL: https://github.com/apache/datafusion/issues/9294#issuecomment-2627807294 Closing the issue as it doesn't seem to require any actions anymore, feel free to reopen if needed

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2627806359 Here is a proposed fix for https://github.com/apache/datafusion/issues/13510 - https://github.com/apache/datafusion/pull/14387

Re: [PR] feat: Speed up `struct` and `named_struct` using `invoke_with_args` [datafusion]

2025-01-31 Thread via GitHub
pepijnve commented on code in PR #14276: URL: https://github.com/apache/datafusion/pull/14276#discussion_r1937117565 ## datafusion/functions/src/core/named_struct.rs: ## @@ -203,12 +137,19 @@ impl ScalarUDFImpl for NamedStructFunc { } -fn invoke_batch(

Re: [PR] feat: Speed up `struct` and `named_struct` using `invoke_with_args` [datafusion]

2025-01-31 Thread via GitHub
alamb commented on PR #14276: URL: https://github.com/apache/datafusion/pull/14276#issuecomment-2627005764 I also took the liberty of merging up from main to get the CI to run again.

Re: [PR] Remove use of deprecated dict_id in datafusion-proto (#14173) [datafusion]

2025-01-31 Thread via GitHub
alamb commented on code in PR #14227: URL: https://github.com/apache/datafusion/pull/14227#discussion_r1937107757 ## datafusion/proto-common/proto/datafusion_common.proto: ## @@ -108,8 +108,7 @@ message Field { // for complex data types like structs, unions repeated Field

Re: [I] Type Coercion fails for List with inner type struct which has large/view types [datafusion]

2025-01-31 Thread via GitHub
alamb commented on issue #14154: URL: https://github.com/apache/datafusion/issues/14154#issuecomment-2627404261 Ok, I have a datafusion only reproducer: ```sql create or replace table t as values ( 100, -- column1 int (so the case isn'

Re: [I] `Case` coercion of Structs loses field names [datafusion]

2025-01-31 Thread via GitHub
ion-elgreco commented on issue #14383: URL: https://github.com/apache/datafusion/issues/14383#issuecomment-2627405210 I have actually seen this, but thought it was a visual bug 😮

Re: [PR] bug: Fix NULL handling in array_slice, introduce `NullHandling` enum to `Signature` [datafusion]

2025-01-31 Thread via GitHub
jkosh44 commented on code in PR #14289: URL: https://github.com/apache/datafusion/pull/14289#discussion_r1937340392 ## datafusion/physical-expr/src/scalar_function.rs: ## @@ -186,6 +187,15 @@ impl PhysicalExpr for ScalarFunctionExpr { .map(|e| e.evaluate(batch))
