Re: [PR] Minor: Remove redundant implementation of `StringArrayType` [datafusion]

2025-01-07 Thread via GitHub
alamb merged PR #14023: URL: https://github.com/apache/datafusion/pull/14023 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Minor: Remove redundant implementation of `StringArrayType` [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14023: URL: https://github.com/apache/datafusion/pull/14023#issuecomment-2575055613 Thanks again @tlm365

Re: [I] Support pruning on `starts_with` [datafusion]

2025-01-07 Thread via GitHub
alamb commented on issue #14027: URL: https://github.com/apache/datafusion/issues/14027#issuecomment-2575059095 I think the rewrite would look something like ```sql select * from my_file where starts_with(col, 'http://') ``` Rewritten to the equivalent of ```sql
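The rewritten predicate is cut off in this archive snippet. As a hedged illustration of the general idea only (not the issue's actual proposal), values that start with a given prefix form one contiguous range in byte-wise order, so a `starts_with` filter can be checked against min/max statistics like a range predicate. The helpers `prefix_upper_bound` and `may_contain_prefix` are hypothetical names, not DataFusion APIs:

```rust
/// Exclusive upper bound of all byte strings starting with `prefix`,
/// or `None` when no finite bound exists (empty prefix, or all bytes 0xFF).
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut upper = prefix.to_vec();
    // Trailing 0xFF bytes cannot be incremented: drop them and carry left.
    while let Some(&last) = upper.last() {
        if last == u8::MAX {
            upper.pop();
        } else {
            *upper.last_mut().unwrap() += 1;
            return Some(upper);
        }
    }
    None
}

/// Returns false only when the container's min/max statistics prove that no
/// value can start with `prefix`; true means "cannot prune".
/// Equivalent to the range test: max >= prefix AND min < upper_bound(prefix).
fn may_contain_prefix(min: &[u8], max: &[u8], prefix: &[u8]) -> bool {
    if max < prefix {
        return false;
    }
    match prefix_upper_bound(prefix) {
        Some(upper) => min < upper.as_slice(),
        None => true,
    }
}
```

Working on bytes rather than `&str` sidesteps the fact that incrementing the last byte can produce invalid UTF-8.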

Re: [I] Memory account not adding up in SortExec [datafusion]

2025-01-07 Thread via GitHub
alamb commented on issue #10073: URL: https://github.com/apache/datafusion/issues/10073#issuecomment-2575063192 > This is not surprising to me. During the sort we are probably building a string array and probably using some kind of resize-on-append string building that is doubling and we en
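The quoted hypothesis is that memory is held by a buffer that grows on append, so the allocated size can run ahead of the bytes actually written. A minimal Rust sketch of that gap (not SortExec's code; both function names are hypothetical) compares an append-only `String` with one sized up front. The standard library only guarantees `capacity() >= len()`; the exact growth factor is an implementation detail:

```rust
/// Appends `count` copies of `item` to a String that grows on demand and
/// returns (bytes written, bytes allocated).
fn append_growing(item: &str, count: usize) -> (usize, usize) {
    let mut buf = String::new();
    for _ in 0..count {
        // When capacity runs out, the buffer reallocates to a size chosen by
        // String's amortized growth policy, not to exactly the bytes needed.
        buf.push_str(item);
    }
    (buf.len(), buf.capacity())
}

/// Same workload, but the final size is reserved up front.
fn append_presized(item: &str, count: usize) -> (usize, usize) {
    let mut buf = String::with_capacity(item.len() * count);
    for _ in 0..count {
        buf.push_str(item); // no reallocation: capacity already covers the total
    }
    (buf.len(), buf.capacity())
}
```

Any accounting that estimates memory from `len()` (bytes written) rather than `capacity()` (bytes allocated) can therefore under-count the real footprint of such buffers.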

[PR] Start new line if \r and dialect is postgres [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
hansott opened a new pull request, #1647: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1647 Currently the tokenizer throws an error for ```js insert into cats_2 (petname) values ('foo'),--\r(version()||'\n'); ``` this is because postgres treats \r as a separ
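A minimal sketch of the behaviour the PR title describes, written as a standalone helper rather than sqlparser's actual tokenizer internals (`line_comment_end` and its `cr_ends_line` flag are hypothetical): a `--` comment ends at `\n`, and additionally at `\r` when the dialect treats a bare carriage return as a line break.

```rust
/// Returns the byte offset just past a `--` line comment that begins at `start`.
/// The comment ends at `\n`, or also at `\r` when `cr_ends_line` is true.
/// This sketch consumes the terminator as part of the comment.
fn line_comment_end(sql: &str, start: usize, cr_ends_line: bool) -> usize {
    let rest = &sql[start..];
    let end = rest
        .char_indices()
        .find(|&(_, c)| c == '\n' || (cr_ends_line && c == '\r'))
        .map(|(i, c)| i + c.len_utf8())
        // No terminator: the comment runs to the end of the input.
        .unwrap_or(rest.len());
    start + end
}
```

With `cr_ends_line` set, the text after `\r` is tokenized as SQL again instead of being consumed as part of the comment.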

Re: [PR] chore: extract agg_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
rluvaton commented on PR #1224: URL: https://github.com/apache/datafusion-comet/pull/1224#issuecomment-2574740848 Done, I really think some contributor with access should merge those one by one and fix the conflicts, as otherwise it will take a lot of back and forth

Re: [I] Jan 1, 2025: This week(s) in DataFusion [datafusion]

2025-01-07 Thread via GitHub
alamb commented on issue #13970: URL: https://github.com/apache/datafusion/issues/13970#issuecomment-2575082619 This is a pretty cool API from @westonpace making it easier to implement remote (async) catalogs: https://github.com/apache/datafusion/pull/13800

Re: [PR] feat: add `AsyncCatalogProvider` helpers for asynchronous catalogs [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #13800: URL: https://github.com/apache/datafusion/pull/13800#discussion_r1905333290 ## datafusion/catalog/src/async.rs: ## @@ -0,0 +1,747 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [I] Introduce ProjectionMask To Allow Nested Projection Pushdown [datafusion]

2025-01-07 Thread via GitHub
alamb commented on issue #2581: URL: https://github.com/apache/datafusion/issues/2581#issuecomment-2575099075 > Apologies, I should have checked the example value. 10_000 shows what I mean: Ah, yes, in this case the [UnwrapCastInComparison](https://docs.rs/datafusion/latest/datafusio
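The snippet is truncated, but the rule it mentions, `UnwrapCastInComparison`, rewrites comparisons such as `CAST(col AS BIGINT) = 10000` on an `INT` column so that the literal is cast to the column's type instead of casting every column value; this is valid only when the literal fits in the narrower type. A toy sketch of that check follows; the `Expr` enum and `unwrap_cast_in_eq` are hypothetical simplifications, not DataFusion's `Expr` or the rule's implementation:

```rust
use std::convert::TryFrom;

/// Hypothetical, heavily simplified expression tree (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A column whose native type is a 32-bit integer.
    ColumnI32(String),
    /// `CAST(<expr> AS BIGINT)`
    CastToI64(Box<Expr>),
    LiteralI64(i64),
    LiteralI32(i32),
    Eq(Box<Expr>, Box<Expr>),
}

/// Rewrites `CAST(col AS BIGINT) = <i64 literal>` to `col = <i32 literal>`
/// when the literal fits in i32; otherwise returns the expression unchanged.
fn unwrap_cast_in_eq(expr: Expr) -> Expr {
    if let Expr::Eq(left, right) = &expr {
        if let (Expr::CastToI64(inner), Expr::LiteralI64(v)) = (&**left, &**right) {
            // Only a plain i32 column can be compared directly with an i32 literal.
            if let (Expr::ColumnI32(_), Ok(narrow)) = (&**inner, i32::try_from(*v)) {
                return Expr::Eq(inner.clone(), Box::new(Expr::LiteralI32(narrow)));
            }
        }
    }
    expr
}
```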

Re: [PR] Add support for MySQL's INSERT INTO ... SET syntax [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
alamb commented on PR #1641: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1641#issuecomment-2575101680 Thank you for these PRs @yoavcloud and thank you for reviewing and keeping them moving @iffyio

[PR] Feat: Add support for `array_min`, `array_max`, `sort_array`, `array_zip` & `array_union` [datafusion-comet]

2025-01-07 Thread via GitHub
dharanad opened a new pull request, #1227: URL: https://github.com/apache/datafusion-comet/pull/1227 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1042 ## Rationale for this change ## What changes are included

Re: [PR] Improve performance of `reverse` function [datafusion]

2025-01-07 Thread via GitHub
tlm365 commented on code in PR #14025: URL: https://github.com/apache/datafusion/pull/14025#discussion_r1905054494 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -116,14 +115,23 @@ pub fn reverse(args: &[ArrayRef]) -> Result { } } -fn reverse_impl<'a, T: OffsetSi
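The diff itself is truncated. As a hedged sketch of one common way to make a string kernel like `reverse` cheaper (not necessarily what this PR does), rows can be written into one contiguous output buffer with offsets, loosely mirroring Arrow's string layout, instead of allocating a `String` per row. `ReversedStrings` and `reverse_into_buffer` are hypothetical:

```rust
/// Output layout loosely modelled on Arrow string arrays: one contiguous
/// UTF-8 buffer plus offsets, with a validity flag per row.
struct ReversedStrings {
    data: String,
    offsets: Vec<usize>, // row i occupies data[offsets[i]..offsets[i + 1]]
    valid: Vec<bool>,
}

impl ReversedStrings {
    fn get(&self, i: usize) -> Option<&str> {
        if self.valid[i] {
            Some(&self.data[self.offsets[i]..self.offsets[i + 1]])
        } else {
            None
        }
    }
}

fn reverse_into_buffer(values: &[Option<&str>]) -> ReversedStrings {
    // Reversing by char preserves byte length, so the buffer is sized exactly.
    let total: usize = values.iter().flatten().map(|s| s.len()).sum();
    let mut data = String::with_capacity(total);
    let mut offsets = Vec::with_capacity(values.len() + 1);
    let mut valid = Vec::with_capacity(values.len());
    offsets.push(0);
    for v in values {
        if let Some(s) = v {
            data.extend(s.chars().rev());
        }
        valid.push(v.is_some());
        offsets.push(data.len()); // null rows get an empty slot
    }
    ReversedStrings { data, offsets, valid }
}
```

This reverses Unicode scalar values, not grapheme clusters, so combining sequences would be split.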

[PR] FLUP #13810 [datafusion]

2025-01-07 Thread via GitHub
cht42 opened a new pull request, #14034: URL: https://github.com/apache/datafusion/pull/14034 ## Which issue does this PR close? Closes/FLUP #13809 ## Rationale for this change FLUP for https://github.com/apache/datafusion/pull/13810, this fix will also handle f

Re: [PR] MsSQL SET for session params [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
yoavcloud commented on PR #1646: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1646#issuecomment-2575195680 @alamb done

Re: [PR] Simplify error handling in case.rs (#13990) [datafusion]

2025-01-07 Thread via GitHub
cj-zhukov commented on PR #14033: URL: https://github.com/apache/datafusion/pull/14033#issuecomment-2576860655 @alamb Andrew, I noticed the build fails unless I import `datafusion_common::DataFusionError`, which is not used in my PR changes but appears necessary for compatibility with the m

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-07 Thread via GitHub
berkaysynnada commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1906625182 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -540,6 +557,33 @@ impl LexRequirement { .collect(), ) } + +///
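The new doc comment is truncated here. Assuming `collapse` produces a duplicate-free ordering that keeps only the first sort key for each expression (so `[a ASC, b DESC, a DESC]` becomes `[a ASC, b DESC]`), a simplified sketch with a hypothetical `SortKey` type, taking the input by value so surviving entries are moved rather than cloned, might look like:

```rust
/// Hypothetical stand-in for a physical sort expression.
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    expr: String, // stand-in for a physical expression
    descending: bool,
}

/// Keeps only the first sort key for each expression, moving survivors.
fn collapse(keys: Vec<SortKey>) -> Vec<SortKey> {
    let mut out: Vec<SortKey> = Vec::with_capacity(keys.len());
    for key in keys {
        // A later key on an already-seen expression cannot change the order:
        // rows tied on everything before it already share that expression's value.
        if !out.iter().any(|k| k.expr == key.expr) {
            out.push(key);
        }
    }
    out
}
```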

Re: [PR] chore: extract predicate_functions expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
viirya commented on PR #1218: URL: https://github.com/apache/datafusion-comet/pull/1218#issuecomment-2576900702 Thanks @rluvaton @andygrove

Re: [PR] Unparsing optimized (> 2 inputs) unions [datafusion]

2025-01-07 Thread via GitHub
goldmedal commented on code in PR #14031: URL: https://github.com/apache/datafusion/pull/14031#discussion_r1906575417 ## datafusion/sql/src/unparser/plan.rs: ## @@ -729,12 +722,16 @@ impl Unparser<'_> { .map(|input| self.select_to_sql_expr(input, query))
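For context, an optimized plan can contain a single union node with more than two inputs, while SQL set operations in sqlparser's AST are binary (`left`/`right`), so the unparser has to nest them. A minimal sketch of that folding, using a hypothetical `SetExpr` enum rather than sqlparser's type and assuming `UNION ALL` semantics:

```rust
/// Hypothetical, simplified set-expression AST (not sqlparser's `SetExpr`).
#[derive(Debug, PartialEq)]
enum SetExpr {
    Select(String),
    UnionAll(Box<SetExpr>, Box<SetExpr>),
}

/// Folds an n-ary union into a left-deep chain of binary unions.
/// Returns None for an empty input list.
fn nest_union(inputs: Vec<SetExpr>) -> Option<SetExpr> {
    inputs
        .into_iter()
        .reduce(|left, right| SetExpr::UnionAll(Box::new(left), Box::new(right)))
}

/// Renders the nested form back to SQL text.
fn to_sql(e: &SetExpr) -> String {
    match e {
        SetExpr::Select(s) => s.clone(),
        SetExpr::UnionAll(l, r) => format!("{} UNION ALL {}", to_sql(l), to_sql(r)),
    }
}
```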

Re: [PR] chore: extract predicate_functions expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
viirya merged PR #1218: URL: https://github.com/apache/datafusion-comet/pull/1218

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-07 Thread via GitHub
2010YOUY01 commented on code in PR #14026: URL: https://github.com/apache/datafusion/pull/14026#discussion_r1906209324 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -985,6 +985,77 @@ impl OptimizerRule for PushDownFilter { }
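The rule in the diff is truncated. A commonly cited correctness condition for pushing a filter below a window operator, stated here as an assumption rather than this PR's exact logic, is that the filter may reference only columns that are `PARTITION BY` keys of every window expression in the operator. `can_push_below_window` is a hypothetical helper:

```rust
use std::collections::HashSet;

/// A filter can move below a window operator only if every column it references
/// is a PARTITION BY key of every window expression: such a filter keeps or drops
/// whole partitions, and window results depend only on rows in their own partition.
fn can_push_below_window<'a>(
    filter_columns: &[&'a str],
    partition_keys_per_window: &[Vec<&'a str>],
) -> bool {
    // Columns that are partition keys for all window expressions.
    let mut common: Option<HashSet<&'a str>> = None;
    for keys in partition_keys_per_window {
        let set: HashSet<&'a str> = keys.iter().copied().collect();
        common = Some(match common {
            None => set,
            Some(prev) => prev.intersection(&set).copied().collect(),
        });
    }
    // No window expressions or no referenced columns: stay conservative.
    let common = common.unwrap_or_default();
    !filter_columns.is_empty() && filter_columns.iter().all(|c| common.contains(c))
}
```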

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-07 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1906516715 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,25 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_parq

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-07 Thread via GitHub
korowa merged PR #13995: URL: https://github.com/apache/datafusion/pull/13995

Re: [I] Schema error when spilling with multiple aggregations [datafusion]

2025-01-07 Thread via GitHub
korowa closed issue #13949: Schema error when spilling with multiple aggregations URL: https://github.com/apache/datafusion/issues/13949

Re: [PR] Improve performance of `reverse` function [datafusion]

2025-01-07 Thread via GitHub
2010YOUY01 commented on code in PR #14025: URL: https://github.com/apache/datafusion/pull/14025#discussion_r1906203641 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -116,14 +115,23 @@ pub fn reverse(args: &[ArrayRef]) -> Result { } } -fn reverse_impl<'a, T: Offs

Re: [PR] chore: deprecate `ValuesExec` in favour of `MemoryExec` [datafusion]

2025-01-07 Thread via GitHub
jonathanc-n commented on code in PR #14032: URL: https://github.com/apache/datafusion/pull/14032#discussion_r1906250554 ## datafusion/core/src/physical_planner.rs: ## @@ -466,6 +467,7 @@ impl DefaultPhysicalPlanner { .collect::>>>()

Re: [PR] chore: extract strings file to `strings_func` like in spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1215: URL: https://github.com/apache/datafusion-comet/pull/1215
