Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-07 Thread via GitHub
berkaysynnada commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1906625182 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -540,6 +557,33 @@ impl LexRequirement { .collect(), ) } + +///

Re: [PR] chore: extract predicate_functions expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
viirya commented on PR #1218: URL: https://github.com/apache/datafusion-comet/pull/1218#issuecomment-2576900702 Thanks @rluvaton @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] chore: extract predicate_functions expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
viirya merged PR #1218: URL: https://github.com/apache/datafusion-comet/pull/1218

Re: [PR] Unparsing optimized (> 2 inputs) unions [datafusion]

2025-01-07 Thread via GitHub
goldmedal commented on code in PR #14031: URL: https://github.com/apache/datafusion/pull/14031#discussion_r1906575417 ## datafusion/sql/src/unparser/plan.rs: ## @@ -729,12 +722,16 @@ impl Unparser<'_> { .map(|input| self.select_to_sql_expr(input, query))

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-07 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1906516715 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,25 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_parq

Re: [PR] Simplify error handling in case.rs (#13990) [datafusion]

2025-01-07 Thread via GitHub
cj-zhukov commented on PR #14033: URL: https://github.com/apache/datafusion/pull/14033#issuecomment-2576860655 @alamb Andrew, I noticed the build fails unless I import `datafusion_common::DataFusionError`, which is not used in my PR changes but appears necessary for compatibility with the m

Re: [I] Schema error when spilling with multiple aggregations [datafusion]

2025-01-07 Thread via GitHub
korowa closed issue #13949: Schema error when spilling with multiple aggregations URL: https://github.com/apache/datafusion/issues/13949

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-07 Thread via GitHub
korowa merged PR #13995: URL: https://github.com/apache/datafusion/pull/13995

Re: [PR] chore: extract strings file to `strings_func` like in spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1215: URL: https://github.com/apache/datafusion-comet/pull/1215

Re: [PR] chore: deprecate `ValuesExec` in favour of `MemoryExec` [datafusion]

2025-01-07 Thread via GitHub
jonathanc-n commented on code in PR #14032: URL: https://github.com/apache/datafusion/pull/14032#discussion_r1906250554 ## datafusion/core/src/physical_planner.rs: ## @@ -466,6 +467,7 @@ impl DefaultPhysicalPlanner { .collect::>>>()

Re: [PR] Improve perfomance of `reverse` function [datafusion]

2025-01-07 Thread via GitHub
2010YOUY01 commented on code in PR #14025: URL: https://github.com/apache/datafusion/pull/14025#discussion_r1906203641 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -116,14 +115,23 @@ pub fn reverse(args: &[ArrayRef]) -> Result { } } -fn reverse_impl<'a, T: Offs

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-07 Thread via GitHub
2010YOUY01 commented on code in PR #14026: URL: https://github.com/apache/datafusion/pull/14026#discussion_r1906209324 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -985,6 +985,77 @@ impl OptimizerRule for PushDownFilter { }

Re: [PR] chore: Release Ballista 43.0.0 [datafusion-ballista]

2025-01-07 Thread via GitHub
andygrove merged PR #1156: URL: https://github.com/apache/datafusion-ballista/pull/1156

Re: [I] Ballista 43.0.0 Release [datafusion-ballista]

2025-01-07 Thread via GitHub
andygrove closed issue #974: Ballista 43.0.0 Release URL: https://github.com/apache/datafusion-ballista/issues/974

Re: [PR] chore: Upgrade to DataFusion 44.0.0 from 44.0.0 RC2 [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1232: URL: https://github.com/apache/datafusion-comet/pull/1232

Re: [I] Move CPU Bound Tasks off Tokio Threadpool [datafusion]

2025-01-07 Thread via GitHub
djanderson commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2576419792 @tustvold, I gave this a thorough read today and plan to test out the approach soon. The only thing I'm a little flaky on is the use of the call to `use_current_thread

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-07 Thread via GitHub
parthchandra commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2576342088 > > Finally, can we include two more things (either in spark_parquet_options or in some parquet_conversion_context struct) which has the conversion and type promotion optio

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-07 Thread via GitHub
parthchandra commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2576305496 > > Also, can we (for the moment), simply call spark cast directly in parquet support instead of duplicating code. Then, we can override the cast operations that are parque

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14026: URL: https://github.com/apache/datafusion/pull/14026#issuecomment-2576259000 This looks amazing @nuno-faria -- thank you -- I plan to review it carefully tomorrow

Re: [I] Add `InSubqueryExec` support [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove commented on issue #121: URL: https://github.com/apache/datafusion-comet/issues/121#issuecomment-2576247721 I'm going to pick this issue up. I am currently studying Spark's code to understand how it works without Comet.

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-07 Thread via GitHub
mbutrovich commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2576256559 > Finally, can we include two more things (either in spark_parquet_options or in some parquet_conversion_context struct) which has the conversion and type promotion options t

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1906059983 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream { reader

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14038: URL: https://github.com/apache/datafusion/pull/14038#issuecomment-2576231781 I pushed https://github.com/apache/datafusion/pull/14038/commits/a44acfdb3af5bf0082c277de6ee7e09e92251a49 to this PR which had the content of the suggestion from @akurmustafa on http

[PR] chore: use datafusion from crates.io [datafusion-comet]

2025-01-07 Thread via GitHub
rluvaton opened a new pull request, #1232: URL: https://github.com/apache/datafusion-comet/pull/1232 DataFusion 44.0.0 was released to crates.io, therefore I'm using it instead of the release candidate

Re: [PR] chore: extract strings file to `strings_func` like in spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
viirya commented on code in PR #1215: URL: https://github.com/apache/datafusion-comet/pull/1215#discussion_r1906040640 ## native/spark-expr/src/string_funcs/string_space.rs: ## @@ -0,0 +1,103 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] chore: extract strings file to `strings_func` like in spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
codecov-commenter commented on PR #1215: URL: https://github.com/apache/datafusion-comet/pull/1215#issuecomment-2576216434 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1215?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Exponential planning time (100s of seconds) with `UNION` and `ORDER BY` queries [datafusion]

2025-01-07 Thread via GitHub
alamb commented on issue #13748: URL: https://github.com/apache/datafusion/issues/13748#issuecomment-2576212517 I have a few PRs open to make the fields non pub. I have also been studying the code -- my first POC will be to figure out how to avoid calling `normalized_oeq_class()` so much as

Re: [PR] Encapsulate fields of `EquivalenceProperties` [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14040: URL: https://github.com/apache/datafusion/pull/14040#discussion_r1906019498 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -124,15 +124,15 @@ use itertools::Itertools; /// ``` #[derive(Debug, Clone)] pub struct Equivalen

[PR] Encapsulate fields of `EquivalenceProperties` [datafusion]

2025-01-07 Thread via GitHub
alamb opened a new pull request, #14040: URL: https://github.com/apache/datafusion/pull/14040 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13748 ## Rationale for this change As a first part of optimizing equivalence / ordering c

Re: [PR] Feat/ffi enter tokio runtime [datafusion]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #13937: URL: https://github.com/apache/datafusion/pull/13937#discussion_r1906007056 ## datafusion/ffi/src/lib.rs: ## @@ -26,5 +26,14 @@ pub mod session_config; pub mod table_provider; pub mod table_source; +/// Returns the major version of

Re: [PR] Encapsulate fields of `EquivalenceGroup` [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14039: URL: https://github.com/apache/datafusion/pull/14039#discussion_r1906004274 ## datafusion/physical-expr/src/equivalence/class.rs: ## @@ -323,11 +323,10 @@ impl Display for EquivalenceClass { } } -/// An `EquivalenceGroup` is a collec

[PR] Encapsulate fields of `EquivalenceGroup` [datafusion]

2025-01-07 Thread via GitHub
alamb opened a new pull request, #14039: URL: https://github.com/apache/datafusion/pull/14039 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13748 ## Rationale for this change As a first part of optimizing equivalence / ordering c

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1905987391 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -409,6 +409,22 @@ impl LexOrdering { .map(PhysicalSortExpr::from) .collect()

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1905986687 ## datafusion/physical-expr/src/equivalence/ordering.rs: ## @@ -207,19 +212,6 @@ impl IntoIterator for OrderingEquivalenceClass { } } -/// This function cons

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1905985754 ## datafusion/physical-expr/src/equivalence/mod.rs: ## @@ -41,14 +41,9 @@ pub use properties::{ /// It will also filter out entries that are ordered if the next ent

[PR] chore: Release Ballista 43.0.0 [datafusion-ballista]

2025-01-07 Thread via GitHub
milenkovicm opened a new pull request, #1156: URL: https://github.com/apache/datafusion-ballista/pull/1156 # Which issue does this PR close? Closes #974. # Rationale for this change # What changes are included in this PR? # Are there any user-facin

Re: [I] FFI Execution Plans that spawn threads panic [datafusion]

2025-01-07 Thread via GitHub
kevinjqliu commented on issue #13851: URL: https://github.com/apache/datafusion/issues/13851#issuecomment-2576127057 > For the segmentation error above it is probably because you would need to build datafusion-python with that same fix to the FFI API - it is a breaking change. Thats

[PR] Refactor into `LexOrdering::collapse`, avoid clone [datafusion]

2025-01-07 Thread via GitHub
alamb opened a new pull request, #14038: URL: https://github.com/apache/datafusion/pull/14038 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13748 ## Rationale for this change While working to encapsulate the sort order co

Re: [PR] chore: extract datetime_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1222: URL: https://github.com/apache/datafusion-comet/pull/1222

Re: [PR] [comet-parquet-exec] Disable DPP in stability tests when full native scan is enabled [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1230: URL: https://github.com/apache/datafusion-comet/pull/1230

Re: [PR] Complete encapsulatug `OrderingEquivalenceClass` (make fields non pub) [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14037: URL: https://github.com/apache/datafusion/pull/14037#discussion_r1905777670 ## datafusion/physical-expr/src/equivalence/ordering.rs: ## @@ -53,24 +53,33 @@ impl OrderingEquivalenceClass { self.orderings.clear(); } -/// Cr

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2576060934 > Also, can we (for the moment), simply call spark cast directly in parquet support instead of duplicating code. Then, we can override the cast operations that are parquet spe

Re: [PR] [comet-parquet-exec] Disable DPP in stability tests when full native scan is enabled [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove commented on code in PR #1230: URL: https://github.com/apache/datafusion-comet/pull/1230#discussion_r1905927899 ## spark/src/test/scala/org/apache/spark/sql/comet/CometPlanStabilitySuite.scala: ## @@ -263,11 +263,13 @@ trait CometPlanStabilitySuite extends DisableAdap

[PR] fix: Set scan implementation choice via environment variable [datafusion-comet]

2025-01-07 Thread via GitHub
parthchandra opened a new pull request, #1231: URL: https://github.com/apache/datafusion-comet/pull/1231 Makes it easier to switch scan implementation types during development by setting the environment variable `NATIVE_SCAN_IMPL= {native | native_full | native_recordbatch }` Default re

Re: [PR] Unparsing optimized (> 2 inputs) unions [datafusion]

2025-01-07 Thread via GitHub
MohamedAbdeen21 commented on PR #14031: URL: https://github.com/apache/datafusion/pull/14031#issuecomment-2576003567 Thanks for the suggestion @goldmedal, updated accordingly

Re: [PR] [comet-parquet-exec] Disable DPP in stability tests [datafusion-comet]

2025-01-07 Thread via GitHub
parthchandra commented on code in PR #1230: URL: https://github.com/apache/datafusion-comet/pull/1230#discussion_r1905896252 ## spark/src/test/scala/org/apache/spark/sql/comet/CometPlanStabilitySuite.scala: ## @@ -263,11 +263,13 @@ trait CometPlanStabilitySuite extends DisableA

Re: [PR] test: Add plan execution during tests for bounded source [datafusion]

2025-01-07 Thread via GitHub
avkirilishin commented on PR #14013: URL: https://github.com/apache/datafusion/pull/14013#issuecomment-2575981457 Sorry for the confusion, that was my mistake. Of course, this PR only addresses bounded sources. I’ve updated the title and commit message accordingly. > Do you also plan

Re: [I] FFI Execution Plans that spawn threads panic [datafusion]

2025-01-07 Thread via GitHub
timsaucer commented on issue #13851: URL: https://github.com/apache/datafusion/issues/13851#issuecomment-2575975404 Hi Kevin, I'm back to work this week but have a bit of a backlog right now to get through. For the segmentation error above it is probably because you would need to build data

[PR] [comet-parquet-exec] Disable DPP in stability tests [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove opened a new pull request, #1230: URL: https://github.com/apache/datafusion-comet/pull/1230 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-07 Thread via GitHub
jatin510 commented on code in PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179#discussion_r1905885168 ## native/core/src/execution/planner.rs: ## @@ -735,6 +736,36 @@ impl PhysicalPlanner { )); Ok(array_has_expr)

Re: [PR] [comet-parquet-exec] fix: fix various bugs in casting between struct types [datafusion-comet]

2025-01-07 Thread via GitHub
parthchandra commented on code in PR #1226: URL: https://github.com/apache/datafusion-comet/pull/1226#discussion_r1905866905 ## native/spark-expr/src/cast.rs: ## @@ -817,17 +818,28 @@ fn cast_struct_to_struct( cast_options: &SparkCastOptions, ) -> DataFusionResult { m

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-07 Thread via GitHub
korowa commented on code in PR #13995: URL: https://github.com/apache/datafusion/pull/13995#discussion_r1905842942 ## datafusion/core/src/dataframe/mod.rs: ## @@ -2743,6 +2754,143 @@ mod tests { Ok(()) } +// test for https://github.com/apache/datafusion/issue

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-07 Thread via GitHub
korowa commented on code in PR #13995: URL: https://github.com/apache/datafusion/pull/13995#discussion_r1905846854 ## datafusion/core/src/dataframe/mod.rs: ## @@ -43,6 +38,10 @@ use crate::physical_plan::{ ExecutionPlan, SendableRecordBatchStream, }; use crate::prelude::S

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove commented on code in PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179#discussion_r1905845993 ## native/core/src/execution/planner.rs: ## @@ -735,6 +736,36 @@ impl PhysicalPlanner { )); Ok(array_has_expr)

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-07 Thread via GitHub
jatin510 commented on PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179#issuecomment-2575891718 > There seems to be a difference in null handling between DataFusion and Spark that is causing the tests to fail. > > This test passes: > > ```scala > checkSpar

[PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-07 Thread via GitHub
mbutrovich opened a new pull request, #1229: URL: https://github.com/apache/datafusion-comet/pull/1229 (no comment)

Re: [PR] Start new line if \r in Postgres dialect [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
iffyio merged PR #1647: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1647

Re: [PR] Support pluralized time units [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
iffyio commented on code in PR #1630: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1630#discussion_r1905821060 ## tests/sqlparser_common.rs: ## @@ -5374,10 +5396,49 @@ fn parse_interval_all() { verified_only_select("SELECT INTERVAL '1' MINUTE TO SECOND");

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-07 Thread via GitHub
crepererum commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1905787146 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream { r

Re: [PR] Simplify error handling in case.rs (#13990) [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14033: URL: https://github.com/apache/datafusion/pull/14033#issuecomment-2575793893 Thank you @cj-zhukov ! It seems as if there is a compile error in this PR: https://github.com/apache/datafusion/actions/runs/12646530220/job/35237388507?pr=14033 ``` e

Re: [PR] Complete encapsulatug `OrderingEquivalenceClass` (make fields non pub) [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14037: URL: https://github.com/apache/datafusion/pull/14037#discussion_r1905776819 ## datafusion/physical-expr/src/equivalence/ordering.rs: ## @@ -39,7 +39,7 @@ use arrow_schema::SortOptions; /// ordering. In this case, we say that these orderings

[PR] Complete encapsulatug `OrderingEquivalenceClass` (make fields non pub) [datafusion]

2025-01-07 Thread via GitHub
alamb opened a new pull request, #14037: URL: https://github.com/apache/datafusion/pull/14037 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13748 ## Rationale for this change As a first part of optimizing equivalence / or

Re: [PR] chore: extract agg_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1224: URL: https://github.com/apache/datafusion-comet/pull/1224

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-07 Thread via GitHub
jeffreyssmith2nd commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1905736433 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream {

[I] [comet-parquet-exec] Track remaining test failures in POC 1 & 2 [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove opened a new issue, #1228: URL: https://github.com/apache/datafusion-comet/issues/1228 ### What is the problem the feature request solves? I thought it would be useful to have one issue to track all the test failures that we are currently working on resolving ## POC 1

Re: [PR] Fix error on `array_distinct` when input is empty #13810 [datafusion]

2025-01-07 Thread via GitHub
comphead commented on code in PR #14034: URL: https://github.com/apache/datafusion/pull/14034#discussion_r1905726601 ## datafusion/functions-nested/src/set_ops.rs: ## @@ -513,9 +513,6 @@ fn general_array_distinct( array: &GenericListArray, field: &FieldRef, ) -> Resul

Re: [PR] chore: set validation and type hint for ffi tableprovider [datafusion-python]

2025-01-07 Thread via GitHub
timsaucer merged PR #983: URL: https://github.com/apache/datafusion-python/pull/983

Re: [PR] Add support for ClickHouse `FORMAT` on `INSERT` [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
bombsimon commented on code in PR #1628: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1628#discussion_r1905462357 ## src/ast/dml.rs: ## @@ -547,7 +561,15 @@ impl Display for Insert { write!(f, "{source}")?; } -if self.source.is_non

Re: [PR] feat: support enable_url_table config [datafusion-python]

2025-01-07 Thread via GitHub
timsaucer commented on PR #980: URL: https://github.com/apache/datafusion-python/pull/980#issuecomment-2575307105 Thank you for the addition! Would it be simpler to just expose the function on the session context instead of making it part of the initialization?

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-07 Thread via GitHub
crepererum commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1905684304 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream { r

Re: [PR] Added references to IDE documentation for dev containers [datafusion]

2025-01-07 Thread via GitHub
alamb merged PR #14014: URL: https://github.com/apache/datafusion/pull/14014

Re: [PR] Added references to IDE documentation for dev containers [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14014: URL: https://github.com/apache/datafusion/pull/14014#issuecomment-2575659015 Thanks @Omega359 and @goldmedal

Re: [I] Document how to use `.devcontainer` [datafusion]

2025-01-07 Thread via GitHub
alamb closed issue #13969: Document how to use `.devcontainer` URL: https://github.com/apache/datafusion/issues/13969

[I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-07 Thread via GitHub
jeffreyssmith2nd opened a new issue, #14036: URL: https://github.com/apache/datafusion/issues/14036 ### Describe the bug **TLDR; Reading many large Parquet files can prevent a query from being cancelled.** We have a customer that is running a query similar to the following

Re: [PR] [comet-parquet-exec] fix: fix various bugs in casting between struct types [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1226: URL: https://github.com/apache/datafusion-comet/pull/1226

Re: [PR] Correctly look for end delimiter dollar quoted string [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
hansott commented on PR #1650: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1650#issuecomment-2575575853 Adding some more tests first

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-07 Thread via GitHub
ion-elgreco commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1905619163 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,25 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1905611807 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,25 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_

Re: [PR] [comet-parquet-exec] fix: fix various bugs in casting between struct types [datafusion-comet]

2025-01-07 Thread via GitHub
mbutrovich commented on code in PR #1226: URL: https://github.com/apache/datafusion-comet/pull/1226#discussion_r1905609592 ## native/spark-expr/src/cast.rs: ## @@ -817,17 +818,28 @@ fn cast_struct_to_struct( cast_options: &SparkCastOptions, ) -> DataFusionResult { mat

Re: [PR] Support pluralized time units [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
wugeer commented on code in PR #1630: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1630#discussion_r1905597631 ## src/parser/mod.rs: ## @@ -2353,14 +2355,30 @@ impl<'a> Parser<'a> { }; Ok(DateTimeField::Week(week_day))

Re: [PR] Fix error on `array_distinct` when input is empty #13810 [datafusion]

2025-01-07 Thread via GitHub
cht42 commented on PR #14034: URL: https://github.com/apache/datafusion/pull/14034#issuecomment-2575530332 > Thanks @cht42 ! > > What does FLUP stand for 🤔 My google fu doesn't seem to be able to find anything relevant: https://www.google.com/search?q=flup&oq=flup Means: **F**o

Re: [PR] chore: extract agg_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove commented on PR #1224: URL: https://github.com/apache/datafusion-comet/pull/1224#issuecomment-2575518378 > Done, I really think some contributor with an access should merge those one by one and fix those conflicts as otherwise it will take a lot of back and forth and the conflict

[PR] Correctly look for end delimiter dollar quoted string [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
hansott opened a new pull request, #1650: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1650 Currently the tokenizer throws an error for ```sql SELECT $abc$x$ab$abc$ ``` The logic is also quite difficult to read so I made it a bit simpler. -- This is an au
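The snippet above is truncated, so the actual fix in the PR is not visible here. As a rough illustration of the problem only, a minimal Python sketch (not the sqlparser-rs tokenizer; the function name `read_dollar_quoted` is hypothetical) can find the end of a dollar-quoted string by searching for the full closing delimiter, so that a partial match such as `$ab` inside the body is treated as ordinary text. It assumes PostgreSQL-style `$tag$ ... $tag$` quoting and does not validate the characters allowed in a tag.

```python
def read_dollar_quoted(sql: str, start: int) -> tuple[str, str, int]:
    """Read a dollar-quoted string whose opening '$' is at sql[start].

    Returns (tag, body, index just past the closing delimiter).
    Hypothetical helper for illustration; tag characters are not validated.
    """
    if start >= len(sql) or sql[start] != "$":
        raise ValueError("expected '$' at start position")

    # The opening delimiter runs from the first '$' to the next '$'.
    tag_end = sql.find("$", start + 1)
    if tag_end == -1:
        raise ValueError("unterminated opening tag")
    tag = sql[start + 1 : tag_end]
    delimiter = f"${tag}$"

    # Search for the complete closing delimiter, so a partial match
    # like '$ab' inside the body does not end the string early.
    body_start = tag_end + 1
    close = sql.find(delimiter, body_start)
    if close == -1:
        raise ValueError("unterminated dollar-quoted string")
    return tag, sql[body_start:close], close + len(delimiter)


if __name__ == "__main__":
    query = "SELECT $abc$x$ab$abc$"
    print(read_dollar_quoted(query, query.index("$")))
```

With the query from the PR description, the opening tag is `abc` and the body is `x$ab`; the `$ab$` inside the body is skipped because it is not the full `$abc$` delimiter.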

Re: [PR] feat: support enable_url_table config [datafusion-python]

2025-01-07 Thread via GitHub
chenkovsky commented on PR #980: URL: https://github.com/apache/datafusion-python/pull/980#issuecomment-2575455417 > Thank you for the addition! Would it be simpler to just expose the function on the session context instead of making it part of the initialization? Even if you do think it's

[PR] Add support for MS-SQL BEGIN/END TRY/CATCH [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
yoavcloud opened a new pull request, #1649: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1649 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Add support for ClickHouse `FORMAT` on `INSERT` [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
bombsimon commented on code in PR #1628: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1628#discussion_r1905449986 ## src/ast/query.rs: ## @@ -2465,14 +2465,25 @@ impl fmt::Display for GroupByExpr { #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]

Re: [I] multiply overflow in stats.rs [datafusion]

2025-01-07 Thread via GitHub
LindaSummer commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-2575392114 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [I] multiply overflow in stats.rs [datafusion]

2025-01-07 Thread via GitHub
LindaSummer commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-2575391891 > Hi Edward - Thanks for taking a look at this! I do not have the ability to do that however if you add a comment here with just the word 'take' a github action will assign

Re: [PR] Feature/single source exec [datafusion]

2025-01-07 Thread via GitHub
mertak-synnada commented on PR #14035: URL: https://github.com/apache/datafusion/pull/14035#issuecomment-2575383133 Closed since this PR was opened prematurely -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] chore: deprecate `ValuesExec` in favour of `MemoryExec` [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14032: URL: https://github.com/apache/datafusion/pull/14032#discussion_r1905509497 ## datafusion/physical-plan/src/values.rs: ## @@ -34,6 +34,7 @@ use datafusion_execution::TaskContext; use datafusion_physical_expr::EquivalenceProperties; /// E

Re: [PR] Feature/single source exec [datafusion]

2025-01-07 Thread via GitHub
mertak-synnada closed pull request #14035: Feature/single source exec URL: https://github.com/apache/datafusion/pull/14035 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [I] multiply overflow in stats.rs [datafusion]

2025-01-07 Thread via GitHub
Omega359 commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-2575373984 Hi Edward - Thanks for taking a look at this! I do not have the ability to do that however if you add a comment here with just the word 'take' a github action will assign it to

Re: [PR] Improve perfomance of `reverse` function [datafusion]

2025-01-07 Thread via GitHub
tlm365 commented on code in PR #14025: URL: https://github.com/apache/datafusion/pull/14025#discussion_r1905501646 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -116,14 +115,23 @@ pub fn reverse(args: &[ArrayRef]) -> Result { } } -fn reverse_impl<'a, T: OffsetSi

Re: [PR] FLUP #13810 [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14034: URL: https://github.com/apache/datafusion/pull/14034#issuecomment-2575369809 Thanks @cht42 ! What does FLUP stand for 🤔 My google fu doesn't seem to be able to find anything relevant: https://www.google.com/search?q=flup&oq=flup -- This is an automat
