Re: [I] RecursionLimitedExceeded received for some sqlite test queries [datafusion]

2025-01-12 Thread via GitHub
tlm365 commented on issue #14091: URL: https://github.com/apache/datafusion/issues/14091#issuecomment-2585649879 Looks like the error is coming from `sqlparser-rs` and not `datafusion` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[PR] Add two new methods in ScalarFunction `return_type_from_args` and `is_nullable_from_args_nullable` [datafusion]

2025-01-12 Thread via GitHub
jayzhan211 opened a new pull request, #14094: URL: https://github.com/apache/datafusion/pull/14094 ## Which issue does this PR close? Part of #13717 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Add two new methods in ScalarFunction `return_type_from_args` and `is_nullable_from_args_nullable` [datafusion]

2025-01-12 Thread via GitHub
jayzhan211 commented on code in PR #14094: URL: https://github.com/apache/datafusion/pull/14094#discussion_r1912405518 ## datafusion/expr/src/udf.rs: ## @@ -209,10 +214,15 @@ impl ScalarUDF { self.inner.invoke(args) } +#[allow(deprecated)] pub fn is_null

Re: [PR] Add two new methods in ScalarFunction `return_type_from_args` and `is_nullable_from_args_nullable` [datafusion]

2025-01-12 Thread via GitHub
jayzhan211 commented on code in PR #14094: URL: https://github.com/apache/datafusion/pull/14094#discussion_r1912405731 ## datafusion/expr/src/udf.rs: ## @@ -342,6 +352,14 @@ pub struct ScalarFunctionArgs<'a> { pub return_type: &'a DataType, } +#[derive(Debug)] +pub struc

Re: [PR] Add two new methods in ScalarFunction `return_type_from_args` and `is_nullable_from_args_nullable` [datafusion]

2025-01-12 Thread via GitHub
jayzhan211 commented on code in PR #14094: URL: https://github.com/apache/datafusion/pull/14094#discussion_r1912406060 ## datafusion/functions/src/core/named_struct.rs: ## @@ -44,11 +44,18 @@ fn named_struct_expr(args: &[ColumnarValue]) -> Result { .chunks_exact(2)

Re: [PR] Add two new methods in ScalarFunction `return_type_from_args` and `is_nullable_from_args_nullable` [datafusion]

2025-01-12 Thread via GitHub
jayzhan211 commented on code in PR #14094: URL: https://github.com/apache/datafusion/pull/14094#discussion_r1912405839 ## datafusion/functions/src/core/arrow_cast.rs: ## @@ -86,22 +87,36 @@ impl ScalarUDFImpl for ArrowCastFunc { } fn return_type(&self, _arg_types: &[

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-12 Thread via GitHub
alamb commented on PR #14083: URL: https://github.com/apache/datafusion/pull/14083#issuecomment-2585695133 Converting to draft given the dependency situation we have uncovered

Re: [PR] Move JoinSelection into datafusion-physical-optimizer crate (#14073) [datafusion]

2025-01-12 Thread via GitHub
alamb commented on code in PR #14085: URL: https://github.com/apache/datafusion/pull/14085#discussion_r1912430563 ## datafusion/physical-optimizer/src/join_selection.rs: ## @@ -903,7 +900,7 @@ mod tests_statistical { let original_schema = join.schema();

Re: [PR] Add two new methods in ScalarFunction `return_type_from_args` and `is_nullable_from_args_nullable` [datafusion]

2025-01-12 Thread via GitHub
alamb commented on code in PR #14094: URL: https://github.com/apache/datafusion/pull/14094#discussion_r1912432276 ## datafusion/expr/src/udf.rs: ## @@ -342,6 +352,14 @@ pub struct ScalarFunctionArgs<'a> { pub return_type: &'a DataType, } +#[derive(Debug)] +pub struct Ret

Re: [PR] Minor: use hashmap for `physical_exprs_contains` and move `PhysicalExprRef` to `physical-expr-common` [datafusion]

2025-01-12 Thread via GitHub
alamb commented on code in PR #14081: URL: https://github.com/apache/datafusion/pull/14081#discussion_r1912434472 ## datafusion/physical-expr/src/physical_expr.rs: ## @@ -48,31 +47,24 @@ pub fn physical_exprs_bag_equal( lhs: &[Arc], rhs: &[Arc], ) -> bool { -// TO

Re: [I] RecursionLimitedExceeded received for some sqlite test queries [datafusion]

2025-01-12 Thread via GitHub
tlm365 commented on issue #14091: URL: https://github.com/apache/datafusion/issues/14091#issuecomment-2585666884 take

Re: [PR] Add two new methods in ScalarFunction `return_type_from_args` and `is_nullable_from_args_nullable` [datafusion]

2025-01-12 Thread via GitHub
findepi commented on PR #14094: URL: https://github.com/apache/datafusion/pull/14094#issuecomment-2585670790 As stated in https://github.com/apache/datafusion/pull/13717#discussion_r1888341956 , this new method doesn't necessarily simplify anything. Can you please fill "Rationale

Re: [I] Add a hint about normalization in error message [datafusion]

2025-01-12 Thread via GitHub
findepi commented on issue #14089: URL: https://github.com/apache/datafusion/issues/14089#issuecomment-2585678389 > I think the root cause is that SQL is largely case insensitive SQL spec seems to say that unquoted (not delimited) identifiers are equivalent to their upper-case delimi

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (groupby support) [datafusion]

2025-01-12 Thread via GitHub
alamb merged PR #13996: URL: https://github.com/apache/datafusion/pull/13996

Re: [I] Jan 1, 2025: This week(s) in DataFusion [datafusion]

2025-01-12 Thread via GitHub
alamb commented on issue #13970: URL: https://github.com/apache/datafusion/issues/13970#issuecomment-2585687492 Some really great work by @zhuqi-lucas to add the H2O.ai benchmark to the repo: - https://github.com/apache/datafusion/pull/13996 Also thank you @2010YOUY01 for the ass

[PR] Add recursion limit configuration to `DFParser` [datafusion]

2025-01-12 Thread via GitHub
tlm365 opened a new pull request, #14095: URL: https://github.com/apache/datafusion/pull/14095 ## Which issue does this PR close? Closes #14091. ## Rationale for this change ## What changes are included in this PR? Set parser config, now support up to 1

Re: [PR] chore: deprecate `ValuesExec` in favour of `MemoryExec` [datafusion]

2025-01-12 Thread via GitHub
alamb commented on PR #14032: URL: https://github.com/apache/datafusion/pull/14032#issuecomment-2585689667 Thanks again @jonathanc-n and @jayzhan211 -- I merged up to resolve a conflict and will plan to merge when the tests are clean

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (groupby support) [datafusion]

2025-01-12 Thread via GitHub
zhuqi-lucas commented on PR #13996: URL: https://github.com/apache/datafusion/pull/13996#issuecomment-2585689067 Thank you @alamb !

Re: [I] Async User Defined Functions (UDF) [datafusion]

2025-01-12 Thread via GitHub
alamb commented on issue #6518: URL: https://github.com/apache/datafusion/issues/6518#issuecomment-2585692537 BTW @Omega359 pointed out in Discord that there is something seemingly similar looking in Arroyo (I think thanks to @ https://github.com/ArroyoSystems/arroyo/blob/4014db4824

Re: [I] Async User Defined Functions (UDF) [datafusion]

2025-01-12 Thread via GitHub
alamb commented on issue #6518: URL: https://github.com/apache/datafusion/issues/6518#issuecomment-2585694146 > ```rust > let sql = r#" > CREATE FUNCTION an_llm_function(STRING) > RETURNS STRING > LANGUAGE MODEL > AS 'microsoft/phi-4' > "#; > > ctx.sql(sql).await?.sho

Re: [PR] Add telemetry.sh to list of use cases [datafusion]

2025-01-12 Thread via GitHub
alamb merged PR #14090: URL: https://github.com/apache/datafusion/pull/14090

Re: [I] Jan 1, 2025: This week(s) in DataFusion [datafusion]

2025-01-12 Thread via GitHub
alamb commented on issue #13970: URL: https://github.com/apache/datafusion/issues/13970#issuecomment-2585709942 A cool PR from @chenkovsky adding metadata support: - https://github.com/apache/datafusion/pull/14057

Re: [PR] feat: metadata columns [datafusion]

2025-01-12 Thread via GitHub
alamb commented on code in PR #14057: URL: https://github.com/apache/datafusion/pull/14057#discussion_r1912435293 ## datafusion/common/src/dfschema.rs: ## @@ -106,37 +106,175 @@ pub type DFSchemaRef = Arc; /// ``` #[derive(Debug, Clone, PartialEq, Eq)] pub struct DFSchema { +

Re: [I] metadata column support [datafusion]

2025-01-12 Thread via GitHub
alamb commented on issue #13975: URL: https://github.com/apache/datafusion/issues/13975#issuecomment-2585710471 @nishikinocurtis -- perhaps you can look at the proposed PR / API here: - https://github.com/apache/datafusion/pull/14057

Re: [PR] feat: metadata columns [datafusion]

2025-01-12 Thread via GitHub
alamb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2585711106 Something other people have asked for in the past (which I can't find now) is the ability to know what file a particular row came from in a listing table that combines multiple files

Re: [I] Deprecate `ValuesExec` and use `MemoryExec` [datafusion]

2025-01-12 Thread via GitHub
alamb closed issue #13968: Deprecate `ValuesExec` and use `MemoryExec` URL: https://github.com/apache/datafusion/issues/13968

Re: [I] No efficient way to load a subset of files from partitioned table [datafusion]

2025-01-12 Thread via GitHub
alamb commented on issue #8906: URL: https://github.com/apache/datafusion/issues/8906#issuecomment-2585712406 Another approach could be if we support metadata columns here: - https://github.com/apache/datafusion/pull/14057 Would be to expose the filename as a metadata column from a

Re: [PR] chore: deprecate `ValuesExec` in favour of `MemoryExec` [datafusion]

2025-01-12 Thread via GitHub
alamb commented on PR #14032: URL: https://github.com/apache/datafusion/pull/14032#issuecomment-2585712553 🚀

Re: [PR] chore: deprecate `ValuesExec` in favour of `MemoryExec` [datafusion]

2025-01-12 Thread via GitHub
alamb merged PR #14032: URL: https://github.com/apache/datafusion/pull/14032

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-12 Thread via GitHub
alamb commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2585712897 Looks like this PR is ready to go so I'll merge it in. Let's handle any follow on work with subsequent PRs. Thanks @tlm365 @comphead and @jayzhan211

Re: [PR] minor: Add link to example for `AsyncSchemaProvider` [datafusion]

2025-01-12 Thread via GitHub
alamb commented on PR #14062: URL: https://github.com/apache/datafusion/pull/14062#issuecomment-2585713025 Thanks again @berkaysynnada

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-12 Thread via GitHub
alamb merged PR #14020: URL: https://github.com/apache/datafusion/pull/14020

Re: [PR] minor: Add link to example for `AsyncSchemaProvider` [datafusion]

2025-01-12 Thread via GitHub
alamb merged PR #14062: URL: https://github.com/apache/datafusion/pull/14062

Re: [PR] feat: add support for `LogicalPlan::DML(...)` serde [datafusion]

2025-01-12 Thread via GitHub
alamb commented on code in PR #14079: URL: https://github.com/apache/datafusion/pull/14079#discussion_r1912440272 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -375,6 +375,32 @@ async fn roundtrip_logical_plan_sort() -> Result<()> { Ok(()) } +#[tokio::

Re: [I] Add support for complex nested types in List Arrays and Struct Arrays in `avro_to_arrow` [datafusion]

2025-01-12 Thread via GitHub
alamb commented on issue #11342: URL: https://github.com/apache/datafusion/issues/11342#issuecomment-2585728667 Started collecting this and other avro related items in https://github.com/apache/datafusion/issues/14096

Re: [PR] Add two new methods in ScalarFunction `return_type_from_args` and `is_nullable_from_args_nullable` [datafusion]

2025-01-12 Thread via GitHub
jayzhan211 commented on PR #14094: URL: https://github.com/apache/datafusion/pull/14094#issuecomment-2585751923 > I am also not sure about only supporting string args, that is likely a regression in behavior for some users (For example, maybe they look for constant integers as well)

Re: [PR] Add two new methods in ScalarFunction `return_type_from_args` and `is_nullable_from_args_nullable` [datafusion]

2025-01-12 Thread via GitHub
jayzhan211 commented on PR #14094: URL: https://github.com/apache/datafusion/pull/14094#issuecomment-2585752891 ```rust #[derive(Debug)] pub enum ReturnTypeArgs<'a> { /// information known at logical planning time /// Note you can get get type and nullability for each arg

Re: [I] Add a hint about normalization in error message [datafusion]

2025-01-12 Thread via GitHub
alamb commented on issue #14089: URL: https://github.com/apache/datafusion/issues/14089#issuecomment-2585752736 I agree with @findepi -- I think of the automatic lowercasing in DataFusion as the implementation detail of how the case-normalization is implemented.

Re: [I] Optimized spill file format [datafusion]

2025-01-12 Thread via GitHub
alamb commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-2585752275 FYI I am working with @totoroyyb on the arrow IPC work, in case anyone is interested or has time to help: - https://github.com/apache/arrow-rs/pull/6938#issuecomment-2585751118

Re: [PR] feat: metadata columns [datafusion]

2025-01-12 Thread via GitHub
adriangb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2585758539 We want this as well to hide "special" internal columns we create to speed up JSON columns. +1 for the feature!

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-12 Thread via GitHub
gatesn commented on code in PR #14074: URL: https://github.com/apache/datafusion/pull/14074#discussion_r1912472076 ## datafusion/common/src/stats.rs: ## @@ -170,24 +170,63 @@ impl Precision { pub fn add(&self, other: &Precision) -> Precision { match (self, other)

Re: [I] [EPIC] Run full sqllogic / sqlite test suite against DataFusion [datafusion]

2025-01-12 Thread via GitHub
Omega359 commented on issue #13811: URL: https://github.com/apache/datafusion/issues/13811#issuecomment-2585787881 Issue for avg(distinct) support - https://github.com/apache/datafusion/issues/2408 This I believe would be a useful addition if someone is able to take it on.

Re: [I] [EPIC] Run full sqllogic / sqlite test suite against DataFusion [datafusion]

2025-01-12 Thread via GitHub
Omega359 commented on issue #13811: URL: https://github.com/apache/datafusion/issues/13811#issuecomment-2585788355 Many of the cast to Int issues are caused by the fact that in sqlite int is a flexible width data type - 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.

Re: [I] Optimize `Token::make_word` [datafusion-sqlparser-rs]

2025-01-12 Thread via GitHub
LorrensP-2158466 commented on issue #1588: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1588#issuecomment-2585725385 I don't know if this is useful, but I remember watching a video from Strager: [Perfect Hash Tables](https://youtu.be/DMQ_HcNSOAI?si=pTcWhy_D2-wBIkHb). He ha

[I] [Epic] A Collection of AVRO related tickets [datafusion]

2025-01-12 Thread via GitHub
alamb opened a new issue, #14096: URL: https://github.com/apache/datafusion/issues/14096 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] Add two new methods in ScalarFunction `return_type_from_args` and `is_nullable_from_args_nullable` [datafusion]

2025-01-12 Thread via GitHub
jayzhan211 commented on PR #14094: URL: https://github.com/apache/datafusion/pull/14094#issuecomment-2585750558 > Multiple APIs (Nullability and return type) Great, I also want this too.

[PR] Feat/use uv python management [datafusion-python]

2025-01-12 Thread via GitHub
timsaucer opened a new pull request, #994: URL: https://github.com/apache/datafusion-python/pull/994 # Which issue does this PR close? Closes #977 # Rationale for this change As described in the issue, there is a confusing mix of pip and conda for dependency management.

[I] Consider using upstream arrow-avro reader [datafusion]

2025-01-12 Thread via GitHub
alamb opened a new issue, #14097: URL: https://github.com/apache/datafusion/issues/14097 ### Is your feature request related to a problem or challenge? Currently DataFusion has its own avro --> arrow implementation in: https://github.com/apache/datafusion/blob/54a5d3fd3f98997f048c2e3e

Re: [PR] WIP Upgrade to arrow-rs/parquet `54.0.0` [datafusion]

2025-01-12 Thread via GitHub
alamb commented on PR #13663: URL: https://github.com/apache/datafusion/pull/13663#issuecomment-2585734688 > I would appreciate this PR getting in before the next datafusion release 🙏 Yes I hope to -- if anyone has time to help get it in shape / make a new PR it would be most apprecia

Re: [PR] WIP Upgrade to arrow-rs/parquet `54.0.0` [datafusion]

2025-01-12 Thread via GitHub
alamb commented on code in PR #13663: URL: https://github.com/apache/datafusion/pull/13663#discussion_r1912451041 ## datafusion/common/src/config.rs: ## @@ -1653,10 +1649,6 @@ config_namespace_with_hashmap! { /// Sets bloom filter number of distinct values. If NULL, use

Re: [PR] feat: add support for `LogicalPlan::DML(...)` serde [datafusion]

2025-01-12 Thread via GitHub
milenkovicm commented on code in PR #14079: URL: https://github.com/apache/datafusion/pull/14079#discussion_r1912456843 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -375,6 +375,32 @@ async fn roundtrip_logical_plan_sort() -> Result<()> { Ok(()) } +#[t

Re: [PR] feat: add support for `LogicalPlan::DML(...)` serde [datafusion]

2025-01-12 Thread via GitHub
milenkovicm commented on PR #14079: URL: https://github.com/apache/datafusion/pull/14079#issuecomment-2585744176 it should be resolved now, thanks @alamb

Re: [PR] Add a ColumnStatistics::Sum [datafusion]

2025-01-12 Thread via GitHub
alamb commented on code in PR #14074: URL: https://github.com/apache/datafusion/pull/14074#discussion_r1912441506 ## datafusion/physical-plan/src/joins/cross_join.rs: ## @@ -636,18 +661,25 @@ mod tests { distinct_count: Precision::Exact(5),

Re: [PR] Minor: use hashmap for `physical_exprs_contains` and move `PhysicalExprRef` to `physical-expr-common` [datafusion]

2025-01-12 Thread via GitHub
jayzhan211 commented on PR #14081: URL: https://github.com/apache/datafusion/pull/14081#issuecomment-2585754652 Thanks @alamb

Re: [PR] Minor: use hashmap for `physical_exprs_contains` and move `PhysicalExprRef` to `physical-expr-common` [datafusion]

2025-01-12 Thread via GitHub
jayzhan211 merged PR #14081: URL: https://github.com/apache/datafusion/pull/14081

Re: [PR] feat: metadata columns [datafusion]

2025-01-12 Thread via GitHub
adriangb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2585764005 My only question is if "metadata" is the right name for these columns? Could it be "system" columns or something like that?

[I] Spaceship operator (<=>) not supported [datafusion]

2025-01-12 Thread via GitHub
ion-elgreco opened a new issue, #14098: URL: https://github.com/apache/datafusion/issues/14098 ### Is your feature request related to a problem or challenge? I am trying to add support for Generated Columns in Delta-rs. For this to be functional, we need the spaceship operator to be i

Re: [PR] Correctly look for end delimiter dollar quoted string [datafusion-sqlparser-rs]

2025-01-12 Thread via GitHub
hansott commented on PR #1650: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1650#issuecomment-2585844156 @iffyio Updated, thanks for your feedback! 🥇

[I] sqlite query results in Internal error: predicate did not evaluate to an array [datafusion]

2025-01-12 Thread via GitHub
Omega359 opened a new issue, #14099: URL: https://github.com/apache/datafusion/issues/14099 ### Describe the bug Seen in datafusion-testing/data/sqlite/random/expr/slt_good_102.slt: ``` # Datafusion - Datafusion expected results: # Datafusion - Expected - -704522 query

Re: [I] [EPIC] A collection of items to improve developer / CI speed [datafusion]

2025-01-12 Thread via GitHub
Omega359 commented on issue #13813: URL: https://github.com/apache/datafusion/issues/13813#issuecomment-2585836320 > good find! > > > 2024 edition though... > > but maybe it's not a bad thing? https://doc.rust-lang.org/edition-guide/editions/#editions-do-not-split-the-ecosystem

Re: [I] sqlite query results in Internal error: predicate did not evaluate to an array [datafusion]

2025-01-12 Thread via GitHub
Omega359 commented on issue #14099: URL: https://github.com/apache/datafusion/issues/14099#issuecomment-2585829889 This occurs about 60 times in the sqlite tests. A few more examples: ```sql SELECT ALL - 42 * ( COUNT ( * ) ) AS col2, - CASE WHEN NOT + 39 IS NOT NULL THEN + SUM ( DI

Re: [I] sql odd case of rounding compared to duckdb and postgresql [datafusion]

2025-01-12 Thread via GitHub
Omega359 closed issue #13781: sql odd case of rounding compared to duckdb and postgresql URL: https://github.com/apache/datafusion/issues/13781

Re: [I] sql result discrepency with sqlite, postgres and duckdb bug #2 [datafusion]

2025-01-12 Thread via GitHub
Omega359 closed issue #13782: sql result discrepency with sqlite, postgres and duckdb bug #2 URL: https://github.com/apache/datafusion/issues/13782

Re: [I] sql odd case of rounding compared to duckdb and postgresql [datafusion]

2025-01-12 Thread via GitHub
Omega359 commented on issue #13781: URL: https://github.com/apache/datafusion/issues/13781#issuecomment-2585831002 Resolved as fixed with change 'as REAL' to 'AS FLOAT8' in sqlite test files

Re: [I] sql result discrepency with sqlite, postgres and duckdb [datafusion]

2025-01-12 Thread via GitHub
Omega359 closed issue #13780: sql result discrepency with sqlite, postgres and duckdb URL: https://github.com/apache/datafusion/issues/13780

Re: [I] sql result discrepency with sqlite, postgres and duckdb bug #2 [datafusion]

2025-01-12 Thread via GitHub
Omega359 commented on issue #13782: URL: https://github.com/apache/datafusion/issues/13782#issuecomment-2585831103 Resolved as fixed with change 'as REAL' to 'AS FLOAT8' in sqlite test files

Re: [I] sql result discrepency with sqlite, postgres and duckdb [datafusion]

2025-01-12 Thread via GitHub
Omega359 commented on issue #13780: URL: https://github.com/apache/datafusion/issues/13780#issuecomment-2585831382 Resolved as fixed with change 'as REAL' to 'AS FLOAT8' in sqlite test files

[PR] fix: correct LZ0 to LZO in compression options [datafusion-python]

2025-01-12 Thread via GitHub
kosiew opened a new pull request, #995: URL: https://github.com/apache/datafusion-python/pull/995 # Which issue does this PR close? Closes #979. # Rationale for this change Correct confusion of lz0 for lzo # What changes are included in this PR?

Re: [PR] fix: incorrect NATURAL/USING JOIN schema [datafusion]

2025-01-12 Thread via GitHub
jonahgao commented on code in PR #14102: URL: https://github.com/apache/datafusion/pull/14102#discussion_r1912748655 ## datafusion/expr/src/utils.rs: ## @@ -379,14 +379,12 @@ fn get_exprs_except_skipped( } } -/// Resolves an `Expr::Wildcard` to a collection of `Expr::Col

Re: [PR] fix: incorrect NATURAL/USING JOIN schema [datafusion]

2025-01-12 Thread via GitHub
jonahgao commented on code in PR #14102: URL: https://github.com/apache/datafusion/pull/14102#discussion_r1912749696 ## datafusion/expr/src/utils.rs: ## @@ -705,27 +711,20 @@ pub fn exprlist_to_fields<'a>( .map(|e| match e { Expr::Wildcard { qualifier, opti

Re: [PR] fix: incorrect NATURAL/USING JOIN schema [datafusion]

2025-01-12 Thread via GitHub
jonahgao commented on code in PR #14102: URL: https://github.com/apache/datafusion/pull/14102#discussion_r1912741556 ## datafusion/expr/src/utils.rs: ## @@ -379,14 +379,12 @@ fn get_exprs_except_skipped( } } -/// Resolves an `Expr::Wildcard` to a collection of `Expr::Col

[PR] fix: incorrect NATURAL/USING JOIN schema [datafusion]

2025-01-12 Thread via GitHub
jonahgao opened a new pull request, #14102: URL: https://github.com/apache/datafusion/pull/14102 ## Which issue does this PR close? Closes #14058. ## Rationale for this change ## What changes are included in this PR? When expanding unqualified wildcard

Re: [PR] fix: speed up ConcurrentHashMap#computeIfAbsent of JDK8 [datafusion-comet]

2025-01-12 Thread via GitHub
SteNicholas commented on PR #1245: URL: https://github.com/apache/datafusion-comet/pull/1245#issuecomment-2586329462 @mbutrovich, is it better to close this pull request and reopen if needed?

Re: [PR] fix: incorrect NATURAL/USING JOIN schema [datafusion]

2025-01-12 Thread via GitHub
jonahgao commented on code in PR #14102: URL: https://github.com/apache/datafusion/pull/14102#discussion_r1912755777 ## datafusion/expr/src/utils.rs: ## @@ -379,14 +379,12 @@ fn get_exprs_except_skipped( } } -/// Resolves an `Expr::Wildcard` to a collection of `Expr::Col

Re: [PR] Add recursion limit configuration to `DFParser` [datafusion]

2025-01-12 Thread via GitHub
2010YOUY01 commented on PR #14095: URL: https://github.com/apache/datafusion/pull/14095#issuecomment-2586368050 Thank you, this fix makes sense to me. I think we can add the bug reproducer to `sqllogictest` for the regression test. Although this will be covered in the extended tests,

Re: [PR] fix: incorrect NATURAL/USING JOIN schema [datafusion]

2025-01-12 Thread via GitHub
jonahgao commented on code in PR #14102: URL: https://github.com/apache/datafusion/pull/14102#discussion_r1912765058 ## datafusion/expr/src/utils.rs: ## @@ -705,27 +711,20 @@ pub fn exprlist_to_fields<'a>( .map(|e| match e { Expr::Wildcard { qualifier, opti

Re: [PR] feat: handle different placing of type names [datafusion-sqlparser-rs]

2025-01-12 Thread via GitHub
github-actions[bot] closed pull request #1470: feat: handle different placing of type names URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1470

Re: [I] parquet RowGroup pruning for `Dictionary(Decimal)` type incorrect [datafusion]

2025-01-12 Thread via GitHub
kosiew commented on issue #13821: URL: https://github.com/apache/datafusion/issues/13821#issuecomment-2586088480 Would it be a good approach to fix this by coercing both sides of BinaryExpr like `col = CAST(1 AS DECIMAL(4, 1))` to a common type?

Re: [PR] fix: correct LZ0 to LZO in compression options [datafusion-python]

2025-01-12 Thread via GitHub
kosiew commented on PR #995: URL: https://github.com/apache/datafusion-python/pull/995#issuecomment-2586418645 Encountered weird CI build error ``` Run ruff check --output-format=github python/ Would reformat: python/tests/test_functions.py 1 file would be reformatted, 35 fil

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-12 Thread via GitHub
andygrove merged PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-12 Thread via GitHub
andygrove commented on code in PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179#discussion_r1912521481 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2529,4 +2529,21 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-12 Thread via GitHub
berkaysynnada commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2585899506 > FYI @suremarc @berkaysynnada / @ozankabak as this changes statistics and I think you are already working on things related to that: We've started to refactor. The desig

Re: [PR] Correctly look for end delimiter dollar quoted string [datafusion-sqlparser-rs]

2025-01-12 Thread via GitHub
iffyio merged PR #1650: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1650

Re: [I] String sqllogictest error when running the test with `complete` [datafusion]

2025-01-12 Thread via GitHub
Omega359 commented on issue #12752: URL: https://github.com/apache/datafusion/issues/12752#issuecomment-2585932150 I believe this ticket can be closed with the fix in #13142. I didn't encounter this with the sqlite tests. @jayzhan211

Re: [PR] Simplify the return type of `sql_select_to_rex()` [datafusion]

2025-01-12 Thread via GitHub
jonahgao commented on PR #14088: URL: https://github.com/apache/datafusion/pull/14088#issuecomment-2586031501 Thanks @findepi for the review.

Re: [PR] Simplify the return type of `sql_select_to_rex()` [datafusion]

2025-01-12 Thread via GitHub
jonahgao merged PR #14088: URL: https://github.com/apache/datafusion/pull/14088

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-12 Thread via GitHub
andygrove commented on code in PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179#discussion_r1912546988 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2529,4 +2529,21 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] feat: metadata columns [datafusion]

2025-01-12 Thread via GitHub
adriangb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2585948359 I guess the naming doesn't really hurt our use case so okay let's go with that if it means something in the domain in general 👍🏻

[I] Update rust version to 1.81.0 (MSRV) [datafusion]

2025-01-12 Thread via GitHub
Omega359 opened a new issue, #14100: URL: https://github.com/apache/datafusion/issues/14100 ### Is your feature request related to a problem or challenge? Rust 1.84.0 is out. As per the policy outlined in the README.md file, DataFusion supports the last 4 minor Rust versions. That means bu

Re: [I] Use Row Format in SortExec [datafusion]

2025-01-12 Thread via GitHub
Lordworms commented on issue #7053: URL: https://github.com/apache/datafusion/issues/7053#issuecomment-2586009190 Current design is 1. substitute `SendableRecordBatchStream` between `SortPreservingMergeExec` and `SortExec` to `RowOrColumnStream` ```Rust pub enum RowOrColumn {

Re: [I] Automatically run sqlitetests regularly (but not with all PRs) to DataFusion [datafusion]

2025-01-12 Thread via GitHub
Omega359 commented on issue #13967: URL: https://github.com/apache/datafusion/issues/13967#issuecomment-2585941891 take

Re: [I] String sqllogictest error when running the test with `complete` [datafusion]

2025-01-12 Thread via GitHub
berkaysynnada commented on issue #12752: URL: https://github.com/apache/datafusion/issues/12752#issuecomment-2585933832 `cargo test --test sqllogictests -- --complete` is still generating the same error

[PR] test: Add explicit test for null and empty arrays with array_remove [datafusion-comet]

2025-01-12 Thread via GitHub
andygrove opened a new pull request, #1270: URL: https://github.com/apache/datafusion-comet/pull/1270 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1269 ## Rationale for this change Improve test coverage for a

Re: [PR] feat: metadata columns [datafusion]

2025-01-12 Thread via GitHub
Omega359 commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2585929480 Metadata column is the name I'm familiar with in other systems. For example, [spark](https://spark.apache.org/docs/3.5.4/api/java/org/apache/spark/sql/connector/catalog/MetadataColum

[PR] Add sqlite sqllogictest run to extended.yml [datafusion]

2025-01-12 Thread via GitHub
Omega359 opened a new pull request, #14101: URL: https://github.com/apache/datafusion/pull/14101 ## Which issue does this PR close? Closes #13967 ## Rationale for this change Run sqlite test suite on push to main. ## What changes are included in this PR

[PR] Feat: Support array_intersect [datafusion-comet]

2025-01-12 Thread via GitHub
erenavsarogullari opened a new pull request, #1271: URL: https://github.com/apache/datafusion-comet/pull/1271 ## Which issue does this PR close? Related to Epic: https://github.com/apache/datafusion-comet/issues/1042 array_intersect: `select array_intersect(array(1, 2, 3), array(2, 3, 4

Re: [PR] Minor: Add a link to RecordBatchStreamAdapter to `SendableRecordBatchStream` [datafusion]

2025-01-12 Thread via GitHub
jonahgao merged PR #14084: URL: https://github.com/apache/datafusion/pull/14084