Re: [I] Overflow happened on: -2147483648 % -1 [datafusion]

2025-03-05 Thread via GitHub
kazuyukitanimura commented on issue #14771: URL: https://github.com/apache/datafusion/issues/14771#issuecomment-2702976157 > Do we also need to consider the overflow behavior of -2147483648 / -1? No for DataFusion, as it is matching Postgres behavior. But yes for Comet

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-03-05 Thread via GitHub
shehabgamin commented on PR #14392: URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2703064824 > Would you have bandwidth to help with adding some initial tests in the Comet repo? I think if we have some examples then it will be easier for others to contribute. Yeah

Re: [PR] Document guidelines for physical operator yielding [datafusion]

2025-03-05 Thread via GitHub
berkaysynnada commented on code in PR #15030: URL: https://github.com/apache/datafusion/pull/15030#discussion_r1982848377 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -260,13 +260,30 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { /// used. //

Re: [PR] Document guidelines for physical operator yielding [datafusion]

2025-03-05 Thread via GitHub
berkaysynnada commented on code in PR #15030: URL: https://github.com/apache/datafusion/pull/15030#discussion_r1982854173 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -260,13 +260,30 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { /// used. //

Re: [I] Overflow happened on: -2147483648 % -1 [datafusion]

2025-03-05 Thread via GitHub
wForget commented on issue #14771: URL: https://github.com/apache/datafusion/issues/14771#issuecomment-2703080393 > No for data fusion, as it is matching with the Postgres behavior. But Yes for comet to be matched with Spark In spark, the `/` operation will be converted to `double/dec

Re: [PR] fix: nested window function [datafusion]

2025-03-05 Thread via GitHub
chenkovsky commented on PR #15033: URL: https://github.com/apache/datafusion/pull/15033#issuecomment-2702448317 > > @alamb I think I have to add a BFS visitor in sqlparse crate, how do you feel about it? > > i think that would mean it will take at least another month to fix this issu

Re: [I] Overflow happened on: -2147483648 % -1 [datafusion]

2025-03-05 Thread via GitHub
wForget commented on issue #14771: URL: https://github.com/apache/datafusion/issues/14771#issuecomment-2702649260 > I just saw this got fixed in arrow-rs [apache/arrow-rs#7159](https://github.com/apache/arrow-rs/pull/7159) Do we also need to consider the overflow behavior of `-2147483

Re: [PR] feat: add spark_signed_integer_remainder native function for compatibility with spark [datafusion-comet]

2025-03-05 Thread via GitHub
wForget closed pull request #1416: feat: add spark_signed_integer_remainder native function for compatibility with spark URL: https://github.com/apache/datafusion-comet/pull/1416 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [I] Overflow happened on: -2147483648 % -1 [datafusion-comet]

2025-03-05 Thread via GitHub
wForget closed issue #1412: Overflow happened on: -2147483648 % -1 URL: https://github.com/apache/datafusion-comet/issues/1412

Re: [I] Support Push down expression evaluation in `TableProviders` [datafusion]

2025-03-05 Thread via GitHub
adriangb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-2702641301 @cetra3 suggested that maybe this can be done with a rewrite of the PhysicalPlan? I guess the main issue would be that you don't know anything about the file except the file pa

Re: [PR] Fix array_has_all and array_has_any with empty array [datafusion]

2025-03-05 Thread via GitHub
alan910127 commented on code in PR #15039: URL: https://github.com/apache/datafusion/pull/15039#discussion_r1982608655 ## datafusion/functions-nested/src/array_has.rs: ## @@ -439,6 +439,14 @@ fn array_has_all_and_any_dispatch( ) -> Result { let haystack = as_generic_list_a

[PR] Order Requirement Analysis [datafusion-site]

2025-03-05 Thread via GitHub
akurmustafa opened a new pull request, #58: URL: https://github.com/apache/datafusion-site/pull/58 As per the [discussion](https://github.com/apache/datafusion/issues/14836#issuecomment-2702045671). I am moving some of my previous blog posts to the `DataFusion` website. Please feel free to

Re: [PR] fix: type checking [datafusion-python]

2025-03-05 Thread via GitHub
chenkovsky commented on PR #993: URL: https://github.com/apache/datafusion-python/pull/993#issuecomment-2702850399 @timsaucer could you please review it again?

Re: [I] Support Push down expression evaluation in `TableProviders` [datafusion]

2025-03-05 Thread via GitHub
adriangb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-2702471043 > I note that the IO/CPU is already intertwined when implementing something like filter pushdown in parquet, so I am not sure also pushing down expressions makes the problem wor

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-05 Thread via GitHub
kazuyukitanimura commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r1982384947 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -60,14 +60,16 @@ object QueryPlanSerde extends Logging with ShimQueryPlanS

Re: [PR] Refactor test suite in EnforceDistribution, to use standard test config. [datafusion]

2025-03-05 Thread via GitHub
alamb commented on PR #15010: URL: https://github.com/apache/datafusion/pull/15010#issuecomment-2702431428 BTW @blaginin is proposing to add `insta` in this PR: - https://github.com/apache/datafusion/pull/13672 If we merge that, some of these tests may become much easier to maintai

Re: [I] Avoid casting columns when comparing ints and strings [datafusion]

2025-03-05 Thread via GitHub
alan910127 commented on issue #15035: URL: https://github.com/apache/datafusion/issues/15035#issuecomment-2702482042 take

Re: [PR] Move `UnwrapCastInComparison` into `Simplifier` [datafusion]

2025-03-05 Thread via GitHub
alamb commented on code in PR #15012: URL: https://github.com/apache/datafusion/pull/15012#discussion_r1982386907 ## datafusion/optimizer/src/unwrap_cast_in_comparison.rs: ## @@ -1,1418 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more contributo

Re: [PR] fix: nested window function [datafusion]

2025-03-05 Thread via GitHub
chenkovsky commented on PR #15033: URL: https://github.com/apache/datafusion/pull/15033#issuecomment-2702420018 @alamb I think I have to add a BFS visitor in sqlparse crate, how do you feel about it?

Re: [PR] feat: Add support for rpad [datafusion-comet]

2025-03-05 Thread via GitHub
andygrove merged PR #1470: URL: https://github.com/apache/datafusion-comet/pull/1470

Re: [PR] chore: Refactor CometScanRule to avoid duplication and improve fallback messages [datafusion-comet]

2025-03-05 Thread via GitHub
kazuyukitanimura commented on code in PR #1474: URL: https://github.com/apache/datafusion-comet/pull/1474#discussion_r1982416352 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -116,143 +113,124 @@ class CometSparkSessionExtensions

Re: [PR] Enable Dataframe to be converted into views which can be used in register_table [datafusion-python]

2025-03-05 Thread via GitHub
kosiew commented on code in PR #1016: URL: https://github.com/apache/datafusion-python/pull/1016#discussion_r1982570721 ## src/dataframe.rs: ## @@ -50,6 +52,22 @@ use crate::{ expr::{sort_expr::PySortExpr, PyExpr}, }; +#[pyclass(name = "TableProvider", module = "datafus

Re: [PR] chore: Upgrade to Spark 3.5.4 [datafusion-comet]

2025-03-05 Thread via GitHub
andygrove merged PR #1471: URL: https://github.com/apache/datafusion-comet/pull/1471

Re: [I] Add Memory Profiling Functionality [datafusion]

2025-03-05 Thread via GitHub
2010YOUY01 commented on issue #14510: URL: https://github.com/apache/datafusion/issues/14510#issuecomment-2702809806 > I am interested in this function. > > [Tag memory that is allocated through the buffer manager, and add duckdb_memory() function by Mytherin · Pull Request #10496 ·

Re: [PR] Reject `RESPECT NULLS` and `IGNORE NULLS` for aggregate functions [datafusion]

2025-03-05 Thread via GitHub
vbarua commented on code in PR #15014: URL: https://github.com/apache/datafusion/pull/15014#discussion_r1982693060 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -5893,15 +5889,11 @@ SELECT FIRST_VALUE(column1 ORDER BY column2) FROM t; NULL -query I +query e

Re: [I] External sort failing with modest memory limit [datafusion]

2025-03-05 Thread via GitHub
Kontinuation commented on issue #15028: URL: https://github.com/apache/datafusion/issues/15028#issuecomment-2702915602 I have tried the repro. This is more like a problem of Parquet writer, and not strongly related to sorting. I made small tweaks to the repro code to expose the status

[PR] Fix array_has_all and array_has_any with empty array [datafusion]

2025-03-05 Thread via GitHub
LuQQiu opened a new pull request, #15039: URL: https://github.com/apache/datafusion/pull/15039 ## Which issue does this PR close? - Closes #15038 ## What changes are included in this PR? Fix array_has_any and array_has_all with empty array input ## Are these chang

Re: [PR] Fix array_has_all and array_has_any with empty array [datafusion]

2025-03-05 Thread via GitHub
LuQQiu commented on PR #15039: URL: https://github.com/apache/datafusion/pull/15039#issuecomment-2702483400 @jayzhan211 Could you help approve the test workflows and review the PR? Thanks in advance

Re: [I] Support different `EXPLAIN` formats via SQL [datafusion]

2025-03-05 Thread via GitHub
waynexia commented on issue #15021: URL: https://github.com/apache/datafusion/issues/15021#issuecomment-2702483758 I can implement the JSON part

Re: [PR] docs: Improve documentation for running stability plan tests [datafusion-comet]

2025-03-05 Thread via GitHub
andygrove commented on PR #1469: URL: https://github.com/apache/datafusion-comet/pull/1469#issuecomment-2702515411 Thanks for the review @kazuyukitanimura

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-05 Thread via GitHub
comphead commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r1982427638 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -60,14 +60,16 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wit

Re: [PR] docs: Improve documentation for running stability plan tests [datafusion-comet]

2025-03-05 Thread via GitHub
andygrove merged PR #1469: URL: https://github.com/apache/datafusion-comet/pull/1469

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-03-05 Thread via GitHub
kazuyukitanimura commented on PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#issuecomment-2702515744 @EmilyMatt I see that plan stability tests are failing. Do you plan to update the golden files?

Re: [I] Documentation regarding running/regenerating stability test plans [datafusion-comet]

2025-03-05 Thread via GitHub
andygrove closed issue #1393: Documentation regarding running/regenerating stability test plans URL: https://github.com/apache/datafusion-comet/issues/1393

Re: [I] Support Push down expression evaluation in `TableProviders` [datafusion]

2025-03-05 Thread via GitHub
cetra3 commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-2702699865 I guess what I was getting at is that maybe we could use a `PhysicalOptimizerRule` to do this sort of thing. However currently all the OptimizerRule traits are non async, which

Re: [I] A complete solution for stable and safe sort with spill [datafusion]

2025-03-05 Thread via GitHub
2010YOUY01 commented on issue #14692: URL: https://github.com/apache/datafusion/issues/14692#issuecomment-2702697221 This is a reproducer for an external sort query failure under a very low memory limit https://github.com/apache/datafusion/issues/15028 Discord discussion: https://discord.

Re: [I] [Epic] DataFusion Blogs [datafusion]

2025-03-05 Thread via GitHub
akurmustafa commented on issue #14836: URL: https://github.com/apache/datafusion/issues/14836#issuecomment-2702704499 > I think the content looks good to me -- if you are open to updates / edits, I think porting them to the Datafusion blog sounds like a good idea to me. > > I als

Re: [PR] fix: nested window function [datafusion]

2025-03-05 Thread via GitHub
alamb commented on PR #15033: URL: https://github.com/apache/datafusion/pull/15033#issuecomment-2702422268 > @alamb I think I have to add a BFS visitor in sqlparse crate, how do you feel about it? i think that would mean it will take at least another month to fix this issue - as we w

Re: [I] array_has_any(column, []) with empty array throws RowConverter column schema mismatch, expected Utf8 got Int64 [datafusion]

2025-03-05 Thread via GitHub
jayzhan211 commented on issue #15038: URL: https://github.com/apache/datafusion/issues/15038#issuecomment-2702435176 we should return `false` for this query

Re: [PR] Fix array_has_all and array_has_any with empty array [datafusion]

2025-03-05 Thread via GitHub
LuQQiu commented on code in PR #15039: URL: https://github.com/apache/datafusion/pull/15039#discussion_r1982505448 ## datafusion/functions-nested/src/array_has.rs: ## @@ -439,6 +439,14 @@ fn array_has_all_and_any_dispatch( ) -> Result { let haystack = as_generic_list_array

Re: [I] Add support for `rpad` [datafusion-comet]

2025-03-05 Thread via GitHub
andygrove closed issue #1468: Add support for `rpad` URL: https://github.com/apache/datafusion-comet/issues/1468

Re: [PR] Adjust physical optimizer rule order, put `ProjectionPushdown` at last [datafusion]

2025-03-05 Thread via GitHub
xudong963 commented on code in PR #15040: URL: https://github.com/apache/datafusion/pull/15040#discussion_r1982671720 ## datafusion/sqllogictest/test_files/window.slt: ## @@ -2833,13 +2833,12 @@ logical_plan 06)--Projection: CAST(annotated_data_infinite.inc_col AS Int64

[PR] Adjust physical optimizer rule order, put `ProjectionPushdown` at last [datafusion]

2025-03-05 Thread via GitHub
xudong963 opened a new pull request, #15040: URL: https://github.com/apache/datafusion/pull/15040 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

Re: [PR] Implement tree explain for `DataSourceExec` [datafusion]

2025-03-05 Thread via GitHub
comphead commented on code in PR #15029: URL: https://github.com/apache/datafusion/pull/15029#discussion_r1982674514 ## datafusion/datasource/src/memory.rs: ## @@ -425,25 +425,20 @@ impl DataSource for MemorySourceConfig { } } DisplayFo

Re: [PR] chore: Refactor CometScanRule to avoid duplication and improve fallback messages [datafusion-comet]

2025-03-05 Thread via GitHub
kazuyukitanimura commented on code in PR #1474: URL: https://github.com/apache/datafusion-comet/pull/1474#discussion_r1982451500 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -116,140 +116,111 @@ class CometSparkSessionExtensions

Re: [PR] feat: rand expression support [datafusion-comet]

2025-03-05 Thread via GitHub
kazuyukitanimura commented on PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199#issuecomment-2702519134 @akupchinskiy do you plan to resolve the conflicts?

Re: [PR] chore: Refactor CometScanRule to avoid duplication and improve fallback messages [datafusion-comet]

2025-03-05 Thread via GitHub
kazuyukitanimura commented on code in PR #1474: URL: https://github.com/apache/datafusion-comet/pull/1474#discussion_r1982461249 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -116,143 +113,124 @@ class CometSparkSessionExtensions

Re: [PR] Implement tree explain for `DataSourceExec` [datafusion]

2025-03-05 Thread via GitHub
comphead commented on PR #15029: URL: https://github.com/apache/datafusion/pull/15029#issuecomment-2702782556 Btw I like this simple approach to calc batch sizes, we can also reuse it in https://github.com/apache/datafusion/issues/14510 perhaps to sum up all incoming or outcoming batches.

Re: [PR] Make Substrait Schema Structs always non-nullable [datafusion]

2025-03-05 Thread via GitHub
vbarua commented on PR #15011: URL: https://github.com/apache/datafusion/pull/15011#issuecomment-2702795062 Thanks for chasing this up here and in the core Substrait spec @amoeba 🙇

Re: [I] Overflow happened on: -2147483648 % -1 [datafusion-comet]

2025-03-05 Thread via GitHub
wForget commented on issue #1412: URL: https://github.com/apache/datafusion-comet/issues/1412#issuecomment-2702650035 > I just saw this got fixed in arrow-rs [apache/arrow-rs#7159](https://github.com/apache/arrow-rs/pull/7159) Yes, I will close this issue.

Re: [PR] Implement tree explain for `DataSourceExec` [datafusion]

2025-03-05 Thread via GitHub
alamb commented on code in PR #15029: URL: https://github.com/apache/datafusion/pull/15029#discussion_r1982398221 ## datafusion/datasource/src/memory.rs: ## @@ -425,25 +425,20 @@ impl DataSource for MemorySourceConfig { } } DisplayForma

Re: [PR] fix: nested window function [datafusion]

2025-03-05 Thread via GitHub
chenkovsky commented on PR #15033: URL: https://github.com/apache/datafusion/pull/15033#issuecomment-2702473080 > > @alamb I think I have to add a BFS visitor in sqlparse crate, how do you feel about it? > > > > i think that would mean it will take at least another month to fi

Re: [PR] Fix array_has_all and array_has_any with empty array [datafusion]

2025-03-05 Thread via GitHub
westonpace commented on code in PR #15039: URL: https://github.com/apache/datafusion/pull/15039#discussion_r1982443721 ## datafusion/functions-nested/src/array_has.rs: ## @@ -439,6 +439,14 @@ fn array_has_all_and_any_dispatch( ) -> Result { let haystack = as_generic_list_a

Re: [PR] _repr_ and _html_repr_ show '... and additional rows' message [datafusion-python]

2025-03-05 Thread via GitHub
Spaarsh commented on code in PR #1041: URL: https://github.com/apache/datafusion-python/pull/1041#discussion_r1980972650 ## src/dataframe.rs: ## @@ -90,59 +90,108 @@ impl PyDataFrame { } fn __repr__(&self, py: Python) -> PyDataFusionResult { -let df = self.d

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-03-05 Thread via GitHub
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1981271099 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2243,9 +2245,9 @@ select array_sort(column1, 'DESC', 'NULLS LAST') from arrays_values; [30, 29, 28, 27

Re: [I] More accurate memory accounting in external sort [datafusion]

2025-03-05 Thread via GitHub
2010YOUY01 commented on issue #14748: URL: https://github.com/apache/datafusion/issues/14748#issuecomment-2700722243 There is a small optimization that can be done after we have accurate memory accounting https://github.com/apache/datafusion/pull/15017#issuecomment-2700720460

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-03-05 Thread via GitHub
blaginin commented on PR #14684: URL: https://github.com/apache/datafusion/pull/14684#issuecomment-2700735451 Hey, I still want to merge this one. https://github.com/apache/datafusion/pull/14653 has indeed made the benches much faster, but I think it's still good to make the logical plan sm

Re: [PR] Move `UnwrapCastInComparison` into `Simplifier` [datafusion]

2025-03-05 Thread via GitHub
jayzhan211 commented on code in PR #15012: URL: https://github.com/apache/datafusion/pull/15012#discussion_r1981245810 ## datafusion/optimizer/src/unwrap_cast_in_comparison.rs: ## @@ -1,1418 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more contr

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-03-05 Thread via GitHub
mkarbo commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2700200275 Sent you an e-mail :+1:

Re: [I] Expose to `AccumulatorArgs` whether all the groups are sorted [datafusion]

2025-03-05 Thread via GitHub
jayzhan211 commented on issue #14991: URL: https://github.com/apache/datafusion/issues/14991#issuecomment-2700784561 ``` query TT explain select col_i32, sum(col_u32) sum_col_u32 from (select * from test_table order by col_i32 limit 10) group by col_i32 logical_plan 01)Pr

[I] Update python min version to 3.9 [datafusion-python]

2025-03-05 Thread via GitHub
timsaucer opened a new issue, #1042: URL: https://github.com/apache/datafusion-python/issues/1042 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Python 3.8 is past end of life since October: https://devguide.python.org/versio

Re: [I] Expr simplifier doesn't simplify exprs that are same if you swap lhs with rhs regardless of cycles [datafusion]

2025-03-05 Thread via GitHub
ion-elgreco commented on issue #14943: URL: https://github.com/apache/datafusion/issues/14943#issuecomment-2700751765 > I think it will not be present in DataFusion 46 (we have the RC out for voting now) Ah pity!

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-03-05 Thread via GitHub
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1981277730 ## datafusion/functions-nested/src/sort.rs: ## @@ -175,6 +204,7 @@ pub fn array_sort_inner(args: &[ArrayRef]) -> Result { for i in 0..row_count {

Re: [PR] add manual trigger for extended tests in pull requests [datafusion]

2025-03-05 Thread via GitHub
alamb commented on code in PR #14331: URL: https://github.com/apache/datafusion/pull/14331#discussion_r1981316805 ## .github/workflows/extended.yml: ## @@ -33,16 +33,46 @@ on: push: branches: - main + issue_comment: +types: [created] + +permissions: + pull-r

Re: [PR] fix: External sort failing on an edge case [datafusion]

2025-03-05 Thread via GitHub
alamb commented on code in PR #15017: URL: https://github.com/apache/datafusion/pull/15017#discussion_r1981241095 ## datafusion/core/tests/memory_limit/mod.rs: ## @@ -468,6 +468,31 @@ async fn test_stringview_external_sort() { let _ = df.collect().await.expect("Query execut

Re: [I] `schema_force_view_type` configuration not working for `CREATE EXTERNAL TABLE` [datafusion]

2025-03-05 Thread via GitHub
alamb commented on issue #14909: URL: https://github.com/apache/datafusion/issues/14909#issuecomment-2700708838 Thank you @zhuqi-lucas If you change the example slightly (so the column names are not explicitly listed) then the type is correctly set to Utf8View ```sql > CREA

Re: [PR] fix: External sort failing on an edge case [datafusion]

2025-03-05 Thread via GitHub
2010YOUY01 commented on code in PR #15017: URL: https://github.com/apache/datafusion/pull/15017#discussion_r1981260336 ## datafusion/core/tests/memory_limit/mod.rs: ## @@ -468,6 +468,31 @@ async fn test_stringview_external_sort() { let _ = df.collect().await.expect("Query e

Re: [PR] _repr_ and _html_repr_ show '... and additional rows' message [datafusion-python]

2025-03-05 Thread via GitHub
kosiew commented on code in PR #1041: URL: https://github.com/apache/datafusion-python/pull/1041#discussion_r1980965152 ## src/dataframe.rs: ## @@ -90,59 +90,108 @@ impl PyDataFrame { } fn __repr__(&self, py: Python) -> PyDataFusionResult { -let df = self.df

Re: [PR] BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE [datafusion]

2025-03-05 Thread via GitHub
alamb commented on code in PR #14922: URL: https://github.com/apache/datafusion/pull/14922#discussion_r1981262973 ## datafusion-examples/examples/dataframe.rs: ## @@ -59,7 +59,8 @@ use tempfile::tempdir; #[tokio::main] async fn main() -> Result<()> { // The SessionContext

Re: [PR] fix: External sort failing on an edge case [datafusion]

2025-03-05 Thread via GitHub
2010YOUY01 commented on PR #15017: URL: https://github.com/apache/datafusion/pull/15017#issuecomment-2700720460 > Thank you @2010YOUY01 > > While this likely will result in slightly slower performance in some cases (as there is additional spilling) making sure the queries won't error

Re: [PR] Reuse alias if possible [datafusion]

2025-03-05 Thread via GitHub
alamb commented on PR #14781: URL: https://github.com/apache/datafusion/pull/14781#issuecomment-2700721523 Interestingly, I was working on a very similar PR last night: - https://github.com/apache/datafusion/pull/15008

Re: [PR] Minor: cleanup unused code [datafusion]

2025-03-05 Thread via GitHub
alamb commented on code in PR #15016: URL: https://github.com/apache/datafusion/pull/15016#discussion_r1981151334 ## datafusion/common/src/scalar/mod.rs: ## @@ -2764,15 +2764,6 @@ impl ScalarValue { Ok(scalars) } -// TODO: Support more types after other Scala

Re: [PR] Minor: add method `SessionStateBuilder::new_with_default_features()` [datafusion]

2025-03-05 Thread via GitHub
alamb merged PR #14998: URL: https://github.com/apache/datafusion/pull/14998

Re: [I] Add method to SessionStateBuilder that has all the defaut features [datafusion]

2025-03-05 Thread via GitHub
alamb closed issue #14981: Add method to SessionStateBuilder that has all the defaut features URL: https://github.com/apache/datafusion/issues/14981

Re: [I] Weekly Plan (Andrew Lamb) March 3, 2025 [datafusion]

2025-03-05 Thread via GitHub
alamb commented on issue #14978: URL: https://github.com/apache/datafusion/issues/14978#issuecomment-2700569370 DataFusion: Bugs/UX/Performance - [ ] https://github.com/apache/datafusion/pull/14331 - [ ] https://github.com/apache/datafusion/pull/14684 DataFusion: New Feat

Re: [I] discuss: Introduce `datafusion-storage` as datafusion's own storage interface [datafusion]

2025-03-05 Thread via GitHub
Xuanwo commented on issue #14854: URL: https://github.com/apache/datafusion/issues/14854#issuecomment-2700577333 > I'm willing to kick off a PoC first so the community can get a sense of this. Kicked off at https://github.com/apache/datafusion/pull/15018

Re: [PR] Count wildcard alias [datafusion]

2025-03-05 Thread via GitHub
alamb commented on code in PR #14927: URL: https://github.com/apache/datafusion/pull/14927#discussion_r1980359449 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -2469,9 +2470,53 @@ async fn test_count_wildcard_on_sort() -> Result<()> { .explain(false, false)?

[PR] [POC] feat: Add datafusion-storage [datafusion]

2025-03-05 Thread via GitHub
Xuanwo opened a new pull request, #15018: URL: https://github.com/apache/datafusion/pull/15018 ## Which issue does this PR close? PoC for https://github.com/apache/datafusion/issues/14854 ## Rationale for this change This PR is basically a PoC for the `datafusion-storage`

Re: [PR] Move `UnwrapCastInComparison` into `Simplifier` [datafusion]

2025-03-05 Thread via GitHub
alamb commented on code in PR #15012: URL: https://github.com/apache/datafusion/pull/15012#discussion_r1981203517 ## datafusion/optimizer/src/unwrap_cast_in_comparison.rs: ## @@ -1,1418 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more contributo

Re: [PR] Refactor test suite in EnforceDistribution, to use standard test config. [datafusion]

2025-03-05 Thread via GitHub
alamb commented on code in PR #15010: URL: https://github.com/apache/datafusion/pull/15010#discussion_r1981211727 ## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ## @@ -371,46 +371,91 @@ macro_rules! plans_matches_expected { } } +fn test_suite_defaul

[PR] fix: External sort failing on an edge case [datafusion]

2025-03-05 Thread via GitHub
2010YOUY01 opened a new pull request, #15017: URL: https://github.com/apache/datafusion/pull/15017 ## Which issue does this PR close? - Closes #. NA ## Rationale for this change I came across a sorting query with a memory limit that fails indefinitely, here i

Re: [PR] Reject `RESPECT NULLS` and `IGNORE NULLS` for aggregate functions [datafusion]

2025-03-05 Thread via GitHub
alamb commented on code in PR #15014: URL: https://github.com/apache/datafusion/pull/15014#discussion_r1981180551 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -5893,15 +5889,11 @@ SELECT FIRST_VALUE(column1 ORDER BY column2) FROM t; NULL -query I +query er

[PR] Enable unit tests in tpch.rs to load TPCH tables [datafusion-ballista]

2025-03-05 Thread via GitHub
vmingchen opened a new pull request, #1195: URL: https://github.com/apache/datafusion-ballista/pull/1195 Partial fix to https://github.com/apache/datafusion-ballista/issues/1194 The `.tbl` files generated by `tpch-gen.sh` have an additional trailing column that needs to be special-treat

[I] Plans that have multiple output partitions at the top level fail [datafusion-ray]

2025-03-05 Thread via GitHub
adragomir opened a new issue, #75: URL: https://github.com/apache/datafusion-ray/issues/75 I have a physical plan that looks like this when it enters ray ``` [ output_partitions: 16]ProjectionExec: expr=[...] [ output_partitions: 16] RayStageExec[0] (output_partiti

Re: [I] Unsupported Arrow Vector for export: class org.apache.arrow.vector.complex.ListVector [datafusion-comet]

2025-03-05 Thread via GitHub
comphead commented on issue #1289: URL: https://github.com/apache/datafusion-comet/issues/1289#issuecomment-2701540775 Yeah, looks like the problem is in the shuffle Manager to create a local test ``` class CometNativeReaderSuite extends CometTestBase with AdaptiveSparkPla

Re: [PR] feat: Upgrade to DataFusion 46.0.0-rc2 [datafusion-comet]

2025-03-05 Thread via GitHub
andygrove commented on code in PR #1423: URL: https://github.com/apache/datafusion-comet/pull/1423#discussion_r1981864217 ## native/core/src/parquet/mod.rs: ## @@ -687,16 +675,30 @@ pub unsafe extern "system" fn Java_org_apache_comet_parquet_Native_initRecordBat Sp

Re: [I] Implement `tree` explain for `AggregateExec` [datafusion]

2025-03-05 Thread via GitHub
zebsme commented on issue #15024: URL: https://github.com/apache/datafusion/issues/15024#issuecomment-2701655209 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

Re: [PR] Add snapshot testing to CLI & set up AWS mock [datafusion]

2025-03-05 Thread via GitHub
blaginin commented on PR #13672: URL: https://github.com/apache/datafusion/pull/13672#issuecomment-2701668145 Because CLI integration tests don't break very often, I moved them to the extended tests https://github.com/blaginin/datafusion/actions/runs/13679702615/job/38249126001#step:

Re: [PR] Reuse alias if possible [datafusion]

2025-03-05 Thread via GitHub
blaginin commented on PR #14781: URL: https://github.com/apache/datafusion/pull/14781#issuecomment-2701663777 yes, @alamb, I think we ran into the same issue with `unnest` 😀 - I'm happy to keep working on mine unless you want to take over?

[I] Avoid casting columns when comparing ints and strings [datafusion]

2025-03-05 Thread via GitHub
alamb opened a new issue, #15035: URL: https://github.com/apache/datafusion/issues/15035 ### Is your feature request related to a problem or challenge? - related to https://github.com/apache/datafusion/issues/14944 The usecase is a predicate like this, where month_id is an integ

Re: [I] Should `PruningPredicate` coerce? [datafusion]

2025-03-05 Thread via GitHub
alamb commented on issue #14944: URL: https://github.com/apache/datafusion/issues/14944#issuecomment-2701725282 Also filed a ticket for unwrapping that particular comparison expression - https://github.com/apache/datafusion/issues/15035

Re: [PR] add manual trigger for extended tests in pull requests [datafusion]

2025-03-05 Thread via GitHub
Omega359 commented on PR #14331: URL: https://github.com/apache/datafusion/pull/14331#issuecomment-2701726107 I came across [this article](https://dev.to/zirkelc/trigger-github-workflow-for-comment-on-pull-request-45l2) when looking into the issue. Looks like they use [an action](https://g

Re: [PR] fix: Executor memory overhead overriding [datafusion-comet]

2025-03-05 Thread via GitHub
andygrove merged PR #1462: URL: https://github.com/apache/datafusion-comet/pull/1462

Re: [I] Comet executor memory overriding to absurd numbers (unified mode) [datafusion-comet]

2025-03-05 Thread via GitHub
andygrove closed issue #1460: Comet executor memory overriding to absurd numbers (unified mode) URL: https://github.com/apache/datafusion-comet/issues/1460

Re: [I] Release sqlparser-rs version `0.55.0` [datafusion-sqlparser-rs]

2025-03-05 Thread via GitHub
alamb commented on issue #1671: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1671#issuecomment-2701821359 Filed a ticket for the next release here: - https://github.com/apache/datafusion-sqlparser-rs/issues/1756

[I] Release sqlparser-rs version `0.56.0` [datafusion-sqlparser-rs]

2025-03-05 Thread via GitHub
alamb opened a new issue, #1756: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1756 Follow on to - https://github.com/apache/datafusion-sqlparser-rs/issues/1671 This ticket tracks creating the next sqlparser release (mostly so others can follow along) **Targ

Re: [I] Can we do a release? [datafusion-sqlparser-rs]

2025-03-05 Thread via GitHub
alamb closed issue #1740: Can we do a release? URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1740
