Re: [I] supports_filters_pushdown is invoked more than once on a single Custom Data Source [datafusion]

2025-01-07 Thread via GitHub
cisaacson commented on issue #13994: URL: https://github.com/apache/datafusion/issues/13994#issuecomment-2575344845 > You can also declare that the data source does not support pushing down any filters, and then, within a custom optimization rule similar to PushDownFilter, push the filters

[PR] Correctly look for end delimiter dollar quoted string [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
hansott opened a new pull request, #1650: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1650 Currently the tokenizer throws an error for ```sql SELECT $abc$x$ab$abc$ ``` The logic is also quite difficult to read so I made it a bit simpler. -- This is an au

Re: [PR] chore: extract agg_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove commented on PR #1224: URL: https://github.com/apache/datafusion-comet/pull/1224#issuecomment-2575518378 > Done, I really think some contributor with an access should merge those one by one and fix those conflicts as otherwise it will take a lot of back and forth and the conflict

Re: [PR] Correctly look for end delimiter dollar quoted string [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
hansott commented on PR #1650: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1650#issuecomment-2575575853 Adding some more tests first -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] MsSQL SET for session params [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
alamb commented on PR #1646: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1646#issuecomment-2575103207 Hi @yoavcloud -- it looks like this PR has some conflicts -- is there any chance you can resolve them?

Re: [I] sql result discrepency with sqlite, postgres and duckdb [datafusion]

2025-01-07 Thread via GitHub
alamb commented on issue #13780: URL: https://github.com/apache/datafusion/issues/13780#issuecomment-2575115298 First of all, very nice 🕵️ work ! > 'AS REAL' -> 'AS DOUBLE' > > This would better match the actual types being tested and would fix many of the failing results. Alo

Re: [PR] Unparsing optimized (> 2 inputs) unions [datafusion]

2025-01-07 Thread via GitHub
goldmedal commented on PR #14031: URL: https://github.com/apache/datafusion/pull/14031#issuecomment-2575129897 Thanks for working on this, @MohamedAbdeen21 Instead of applying optimization rules when roundtrip tests, I prefer to construct the expected plan manually. Just like other test

[PR] enum DropBehavior [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
stepancheg opened a new pull request, #1648: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1648 There's `` definition [in the grammar](https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#drop-behavior), so corresponding enum seems appropriate: ```

Re: [PR] Improve perfomance of `reverse` function [datafusion]

2025-01-07 Thread via GitHub
tlm365 commented on code in PR #14025: URL: https://github.com/apache/datafusion/pull/14025#discussion_r1905501646 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -116,14 +115,23 @@ pub fn reverse(args: &[ArrayRef]) -> Result { } } -fn reverse_impl<'a, T: OffsetSi

Re: [PR] Add support for ClickHouse `FORMAT` on `INSERT` [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
bombsimon commented on code in PR #1628: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1628#discussion_r1905449986 ## src/ast/query.rs: ## @@ -2465,14 +2465,25 @@ impl fmt::Display for GroupByExpr { #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]

Re: [I] multiply overflow in stats.rs [datafusion]

2025-01-07 Thread via GitHub
LindaSummer commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-2575392114 take

Re: [I] multiply overflow in stats.rs [datafusion]

2025-01-07 Thread via GitHub
LindaSummer commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-2575391891 > Hi Edward - Thanks for taking a look at this! I do not have the ability to do that however if you add a comment here with just the word 'take' a github action will assign

Re: [PR] Fix error on `array_distinct` when input is empty #13810 [datafusion]

2025-01-07 Thread via GitHub
comphead commented on code in PR #14034: URL: https://github.com/apache/datafusion/pull/14034#discussion_r1905726601 ## datafusion/functions-nested/src/set_ops.rs: ## @@ -513,9 +513,6 @@ fn general_array_distinct( array: &GenericListArray, field: &FieldRef, ) -> Resul

[I] [comet-parquet-exec] Track remaining test failures in POC 1 & 2 [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove opened a new issue, #1228: URL: https://github.com/apache/datafusion-comet/issues/1228 ### What is the problem the feature request solves? I thought it would be useful to have one issue to track all the test failures that we are currently working on resolving ## POC 1

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-07 Thread via GitHub
jeffreyssmith2nd commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1905736433 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream {

Re: [PR] chore: extract agg_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1224: URL: https://github.com/apache/datafusion-comet/pull/1224

Re: [PR] [comet-parquet-exec] fix: fix various bugs in casting between struct types [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1226: URL: https://github.com/apache/datafusion-comet/pull/1226

Re: [PR] Support pluralized time units [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
wugeer commented on code in PR #1630: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1630#discussion_r1905597631 ## src/parser/mod.rs: ## @@ -2353,14 +2355,30 @@ impl<'a> Parser<'a> { }; Ok(DateTimeField::Week(week_day))

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1905611807 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,25 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_

Re: [PR] [comet-parquet-exec] fix: fix various bugs in casting between struct types [datafusion-comet]

2025-01-07 Thread via GitHub
mbutrovich commented on code in PR #1226: URL: https://github.com/apache/datafusion-comet/pull/1226#discussion_r1905609592 ## native/spark-expr/src/cast.rs: ## @@ -817,17 +818,28 @@ fn cast_struct_to_struct( cast_options: &SparkCastOptions, ) -> DataFusionResult { mat

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-07 Thread via GitHub
ion-elgreco commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1905619163 ## python/datafusion/dataframe.py: ## @@ -620,16 +620,25 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write

[I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-07 Thread via GitHub
jeffreyssmith2nd opened a new issue, #14036: URL: https://github.com/apache/datafusion/issues/14036 ### Describe the bug **TLDR; Reading many large Parquet files can prevent a query from being cancelled.** We have a customer that is running a query similar to the following

Re: [PR] Improve perfomance of `reverse` function [datafusion]

2025-01-07 Thread via GitHub
2010YOUY01 commented on code in PR #14025: URL: https://github.com/apache/datafusion/pull/14025#discussion_r1905442653 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -116,14 +115,23 @@ pub fn reverse(args: &[ArrayRef]) -> Result { } } -fn reverse_impl<'a, T: Offs

[PR] Chore/single source exec [datafusion]

2025-01-07 Thread via GitHub
mertak-synnada opened a new pull request, #14035: URL: https://github.com/apache/datafusion/pull/14035 ## Which issue does this PR close? Closes #13838. ## Rationale for this change This PR merges all Data sources into one Execution Plan, named `DataSourceExec` a

Re: [PR] Add arrow cast [datafusion-python]

2025-01-07 Thread via GitHub
timsaucer merged PR #962: URL: https://github.com/apache/datafusion-python/pull/962

Re: [PR] release datafusion-python 42.1.0 [datafusion-python]

2025-01-07 Thread via GitHub
timsaucer commented on PR #930: URL: https://github.com/apache/datafusion-python/pull/930#issuecomment-2575300355 I recommend closing this PR since we have already moved on to 43.

Re: [PR] Fix small issues in pyproject.toml [datafusion-python]

2025-01-07 Thread via GitHub
timsaucer merged PR #976: URL: https://github.com/apache/datafusion-python/pull/976

Re: [I] Add type hints and capsule validation for FFI Table providers [datafusion-python]

2025-01-07 Thread via GitHub
timsaucer closed issue #950: Add type hints and capsule validation for FFI Table providers URL: https://github.com/apache/datafusion-python/issues/950

Re: [PR] FLUP #13810 [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14034: URL: https://github.com/apache/datafusion/pull/14034#issuecomment-2575369809 Thanks @cht42 ! What does FLUP stand for 🤔 My google fu doesn't seem to be able to find anything relevant: https://www.google.com/search?q=flup&oq=flup

Re: [I] multiply overflow in stats.rs [datafusion]

2025-01-07 Thread via GitHub
Omega359 commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-2575373984 Hi Edward - Thanks for taking a look at this! I do not have the ability to do that however if you add a comment here with just the word 'take' a github action will assign it to

[PR] Complete encapsulatug `OrderingEquivalenceClass` (make fields non pub) [datafusion]

2025-01-07 Thread via GitHub
alamb opened a new pull request, #14037: URL: https://github.com/apache/datafusion/pull/14037 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13748 ## Rationale for this change As a first part of optimizing equivalence / or

Re: [PR] Complete encapsulatug `OrderingEquivalenceClass` (make fields non pub) [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14037: URL: https://github.com/apache/datafusion/pull/14037#discussion_r1905776819 ## datafusion/physical-expr/src/equivalence/ordering.rs: ## @@ -39,7 +39,7 @@ use arrow_schema::SortOptions; /// ordering. In this case, we say that these orderings

[PR] Add support for MS-SQL BEGIN/END TRY/CATCH [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
yoavcloud opened a new pull request, #1649: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1649 (no comment)

Re: [PR] feat: support enable_url_table config [datafusion-python]

2025-01-07 Thread via GitHub
chenkovsky commented on PR #980: URL: https://github.com/apache/datafusion-python/pull/980#issuecomment-2575455417 > Thank you for the addition! Would it be simpler to just expose the function on the session context instead of making it part of the initialization? Even if you do think it's

Re: [PR] Add support for ClickHouse `FORMAT` on `INSERT` [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
bombsimon commented on code in PR #1628: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1628#discussion_r1905454728 ## src/keywords.rs: ## @@ -931,7 +931,7 @@ pub const RESERVED_FOR_TABLE_ALIAS: &[Keyword] = &[ Keyword::PREWHERE, // for ClickHouse SELECT

Re: [PR] Feature/single source exec [datafusion]

2025-01-07 Thread via GitHub
mertak-synnada closed pull request #14035: Feature/single source exec URL: https://github.com/apache/datafusion/pull/14035

Re: [PR] chore: deprecate `ValuesExec` in favour of `MemoryExec` [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14032: URL: https://github.com/apache/datafusion/pull/14032#discussion_r1905509497 ## datafusion/physical-plan/src/values.rs: ## @@ -34,6 +34,7 @@ use datafusion_execution::TaskContext; use datafusion_physical_expr::EquivalenceProperties; /// E

Re: [PR] Feature/single source exec [datafusion]

2025-01-07 Thread via GitHub
mertak-synnada commented on PR #14035: URL: https://github.com/apache/datafusion/pull/14035#issuecomment-2575383133 Closed since this PR was opened prematurely.

Re: [PR] Fix error on `array_distinct` when input is empty #13810 [datafusion]

2025-01-07 Thread via GitHub
cht42 commented on PR #14034: URL: https://github.com/apache/datafusion/pull/14034#issuecomment-2575530332 > Thanks @cht42 ! > > What does FLUP stand for 🤔 My google fu doesn't seem to be able to find anything relevant: https://www.google.com/search?q=flup&oq=flup Means: **F**o

Re: [PR] Simplify error handling in case.rs (#13990) [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14033: URL: https://github.com/apache/datafusion/pull/14033#issuecomment-2575793893 Thank you @cj-zhukov ! It seems as if there is a compile error in this PR: https://github.com/apache/datafusion/actions/runs/12646530220/job/35237388507?pr=14033 ``` e

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-07 Thread via GitHub
crepererum commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1905787146 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream { r

Re: [PR] MsSQL SET for session params [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
alamb commented on code in PR #1646: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1646#discussion_r1905447833 ## src/parser/mod.rs: ## @@ -10307,11 +10307,67 @@ impl<'a> Parser<'a> { snapshot: None, session: false, }

Re: [PR] MsSQL SET for session params [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
alamb commented on PR #1646: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1646#issuecomment-2575293097 fyi @iffyio

Re: [PR] Support async iteration of RecordBatchStream [datafusion-python]

2025-01-07 Thread via GitHub
timsaucer commented on code in PR #975: URL: https://github.com/apache/datafusion-python/pull/975#discussion_r1905447211 ## python/datafusion/record_batch.py: ## @@ -59,18 +59,22 @@ def __init__(self, record_batch_stream: df_internal.RecordBatchStream) -> None: def next(

Re: [PR] Added references to IDE documentation for dev containers [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14014: URL: https://github.com/apache/datafusion/pull/14014#issuecomment-2575659015 Thanks @Omega359 and @goldmedal

Re: [PR] Added references to IDE documentation for dev containers [datafusion]

2025-01-07 Thread via GitHub
alamb merged PR #14014: URL: https://github.com/apache/datafusion/pull/14014

Re: [I] Document how to use `.devcontainer` [datafusion]

2025-01-07 Thread via GitHub
alamb closed issue #13969: Document how to use `.devcontainer` URL: https://github.com/apache/datafusion/issues/13969

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-07 Thread via GitHub
crepererum commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1905684304 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream { r

Re: [PR] Add support for ClickHouse `FORMAT` on `INSERT` [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
bombsimon commented on code in PR #1628: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1628#discussion_r1905462357 ## src/ast/dml.rs: ## @@ -547,7 +561,15 @@ impl Display for Insert { write!(f, "{source}")?; } -if self.source.is_non

Re: [PR] feat: support enable_url_table config [datafusion-python]

2025-01-07 Thread via GitHub
timsaucer commented on PR #980: URL: https://github.com/apache/datafusion-python/pull/980#issuecomment-2575307105 Thank you for the addition! Would it be simpler to just expose the function on the session context instead of making it part of the initialization?

Re: [PR] chore: set validation and type hint for ffi tableprovider [datafusion-python]

2025-01-07 Thread via GitHub
timsaucer merged PR #983: URL: https://github.com/apache/datafusion-python/pull/983

[PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-07 Thread via GitHub
mbutrovich opened a new pull request, #1229: URL: https://github.com/apache/datafusion-comet/pull/1229 (no comment)

Re: [PR] Support pluralized time units [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
iffyio commented on code in PR #1630: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1630#discussion_r1905821060 ## tests/sqlparser_common.rs: ## @@ -5374,10 +5396,49 @@ fn parse_interval_all() { verified_only_select("SELECT INTERVAL '1' MINUTE TO SECOND");

Re: [PR] Start new line if \r in Postgres dialect [datafusion-sqlparser-rs]

2025-01-07 Thread via GitHub
iffyio merged PR #1647: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1647

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-07 Thread via GitHub
jatin510 commented on PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179#issuecomment-2575891718 > There seems to be a difference in null handling between DataFusion and Spark that is causing the tests to fail. > > This test passes: > > ```scala > checkSpar

Re: [PR] [comet-parquet-exec] Disable DPP in stability tests [datafusion-comet]

2025-01-07 Thread via GitHub
parthchandra commented on code in PR #1230: URL: https://github.com/apache/datafusion-comet/pull/1230#discussion_r1905896252 ## spark/src/test/scala/org/apache/spark/sql/comet/CometPlanStabilitySuite.scala: ## @@ -263,11 +263,13 @@ trait CometPlanStabilitySuite extends DisableA

[PR] [comet-parquet-exec] Disable DPP in stability tests [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove opened a new pull request, #1230: URL: https://github.com/apache/datafusion-comet/pull/1230 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] FFI Execution Plans that spawn threads panic [datafusion]

2025-01-07 Thread via GitHub
timsaucer commented on issue #13851: URL: https://github.com/apache/datafusion/issues/13851#issuecomment-2575975404 Hi Kevin, I'm back to work this week but have a bit of a backlog right now to get through. For the segmentation error above it is probably because you would need to build data

Re: [PR] Unparsing optimized (> 2 inputs) unions [datafusion]

2025-01-07 Thread via GitHub
MohamedAbdeen21 commented on PR #14031: URL: https://github.com/apache/datafusion/pull/14031#issuecomment-2576003567 Thanks for the suggestion @goldmedal, updated accordingly

[PR] Encapsulate fields of `EquivalenceGroup` [datafusion]

2025-01-07 Thread via GitHub
alamb opened a new pull request, #14039: URL: https://github.com/apache/datafusion/pull/14039 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13748 ## Rationale for this change As a first part of optimizing equivalence / ordering c

Re: [PR] Encapsulate fields of `EquivalenceGroup` [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14039: URL: https://github.com/apache/datafusion/pull/14039#discussion_r1906004274 ## datafusion/physical-expr/src/equivalence/class.rs: ## @@ -323,11 +323,10 @@ impl Display for EquivalenceClass { } } -/// An `EquivalenceGroup` is a collec

Re: [PR] Feat/ffi enter tokio runtime [datafusion]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #13937: URL: https://github.com/apache/datafusion/pull/13937#discussion_r1906007056 ## datafusion/ffi/src/lib.rs: ## @@ -26,5 +26,14 @@ pub mod session_config; pub mod table_provider; pub mod table_source; +/// Returns the major version of

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-07 Thread via GitHub
korowa commented on code in PR #13995: URL: https://github.com/apache/datafusion/pull/13995#discussion_r1905846854 ## datafusion/core/src/dataframe/mod.rs: ## @@ -43,6 +38,10 @@ use crate::physical_plan::{ ExecutionPlan, SendableRecordBatchStream, }; use crate::prelude::S

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove commented on code in PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179#discussion_r1905845993 ## native/core/src/execution/planner.rs: ## @@ -735,6 +736,36 @@ impl PhysicalPlanner { )); Ok(array_has_expr)

Re: [PR] feat: add support for array_remove expression [datafusion-comet]

2025-01-07 Thread via GitHub
jatin510 commented on code in PR #1179: URL: https://github.com/apache/datafusion-comet/pull/1179#discussion_r1905885168 ## native/core/src/execution/planner.rs: ## @@ -735,6 +736,36 @@ impl PhysicalPlanner { )); Ok(array_has_expr)

[PR] fix: Set scan implementation choice via environment variable [datafusion-comet]

2025-01-07 Thread via GitHub
parthchandra opened a new pull request, #1231: URL: https://github.com/apache/datafusion-comet/pull/1231 Makes it easier to switch scan implementation types during development by setting the environment variable `NATIVE_SCAN_IMPL= {native | native_full | native_recordbatch }` Default re

Re: [I] FFI Execution Plans that spawn threads panic [datafusion]

2025-01-07 Thread via GitHub
kevinjqliu commented on issue #13851: URL: https://github.com/apache/datafusion/issues/13851#issuecomment-2576127057 > For the segmentation error above it is probably because you would need to build datafusion-python with that same fix to the FFI API - it is a breaking change. Thats

[PR] chore: Release Ballista 43.0.0 [datafusion-ballista]

2025-01-07 Thread via GitHub
milenkovicm opened a new pull request, #1156: URL: https://github.com/apache/datafusion-ballista/pull/1156 # Which issue does this PR close? Closes #974. # Rationale for this change # What changes are included in this PR? # Are there any user-facin

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1905986687 ## datafusion/physical-expr/src/equivalence/ordering.rs: ## @@ -207,19 +212,6 @@ impl IntoIterator for OrderingEquivalenceClass { } } -/// This function cons

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1905985754 ## datafusion/physical-expr/src/equivalence/mod.rs: ## @@ -41,14 +41,9 @@ pub use properties::{ /// It will also filter out entries that are ordered if the next ent

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1905987391 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -409,6 +409,22 @@ impl LexOrdering { .map(PhysicalSortExpr::from) .collect()

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-07 Thread via GitHub
korowa commented on code in PR #13995: URL: https://github.com/apache/datafusion/pull/13995#discussion_r1905842942 ## datafusion/core/src/dataframe/mod.rs: ## @@ -2743,6 +2754,143 @@ mod tests { Ok(()) } +// test for https://github.com/apache/datafusion/issue

Re: [PR] [comet-parquet-exec] fix: fix various bugs in casting between struct types [datafusion-comet]

2025-01-07 Thread via GitHub
parthchandra commented on code in PR #1226: URL: https://github.com/apache/datafusion-comet/pull/1226#discussion_r1905866905 ## native/spark-expr/src/cast.rs: ## @@ -817,17 +818,28 @@ fn cast_struct_to_struct( cast_options: &SparkCastOptions, ) -> DataFusionResult { m

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2576060934 > Also, can we (for the moment), simply call spark cast directly in parquet support instead of duplicating code. Then, we can override the cast operations that are parquet spe

Re: [PR] test: Add plan execution during tests for bounded source [datafusion]

2025-01-07 Thread via GitHub
avkirilishin commented on PR #14013: URL: https://github.com/apache/datafusion/pull/14013#issuecomment-2575981457 Sorry for the confusion, that was my mistake. Of course, this PR only addresses bounded sources. I’ve updated the title and commit message accordingly. > Do you also plan

Re: [PR] [comet-parquet-exec] Disable DPP in stability tests when full native scan is enabled [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove commented on code in PR #1230: URL: https://github.com/apache/datafusion-comet/pull/1230#discussion_r1905927899 ## spark/src/test/scala/org/apache/spark/sql/comet/CometPlanStabilitySuite.scala: ## @@ -263,11 +263,13 @@ trait CometPlanStabilitySuite extends DisableAdap

[PR] Refactor into `LexOrdering::collapse`, avoid clone [datafusion]

2025-01-07 Thread via GitHub
alamb opened a new pull request, #14038: URL: https://github.com/apache/datafusion/pull/14038 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13748 ## Rationale for this change While working to encapsulate the sort order co

Re: [I] Move CPU Bound Tasks off Tokio Threadpool [datafusion]

2025-01-07 Thread via GitHub
djanderson commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2576419792 @tustvold, I gave this a thorough read today and plan to test out the approach soon. The only thing I'm a little flaky on is the use of the call to `use_current_thread

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1906059983 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream { reader

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-07 Thread via GitHub
mbutrovich commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2576256559 > Finally, can we include two more things (either in spark_parquet_options or in some parquet_conversion_context struct) which has the conversion and type promition options t

Re: [I] Add `InSubqueryExec` support [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove commented on issue #121: URL: https://github.com/apache/datafusion-comet/issues/121#issuecomment-2576247721 I'm going to pick this issue up. I am currently studying Spark's code to understand how it works without Comet.

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14026: URL: https://github.com/apache/datafusion/pull/14026#issuecomment-2576259000 This looks amazing @nuno-faria -- thank you -- I plan to review it carefully tomorrow

Re: [PR] Complete encapsulatug `OrderingEquivalenceClass` (make fields non pub) [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14037: URL: https://github.com/apache/datafusion/pull/14037#discussion_r1905777670 ## datafusion/physical-expr/src/equivalence/ordering.rs: ## @@ -53,24 +53,33 @@ impl OrderingEquivalenceClass { self.orderings.clear(); } -/// Cr

Re: [PR] [comet-parquet-exec] Disable DPP in stability tests when full native scan is enabled [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1230: URL: https://github.com/apache/datafusion-comet/pull/1230

Re: [PR] chore: extract datetime_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1222: URL: https://github.com/apache/datafusion-comet/pull/1222

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-07 Thread via GitHub
parthchandra commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2576305496 > > Also, can we (for the moment), simply call spark cast directly in parquet support instead of duplicating code. Then, we can override the cast operations that are parque

Re: [I] Exponential planning time (100s of seconds) with `UNION` and `ORDER BY` queries [datafusion]

2025-01-07 Thread via GitHub
alamb commented on issue #13748: URL: https://github.com/apache/datafusion/issues/13748#issuecomment-2576212517 I have a few PRs open to make the fields non pub. I have also been studying the code -- my first POC will be to figure out how to avoid calling `normalized_oeq_class()` so much as

[PR] chore: use datafusion from crates.io [datafusion-comet]

2025-01-07 Thread via GitHub
rluvaton opened a new pull request, #1232: URL: https://github.com/apache/datafusion-comet/pull/1232 DataFusion 44.0.0 was released to crates.io, therefore I'm using it instead of the release candidate

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-07 Thread via GitHub
alamb commented on PR #14038: URL: https://github.com/apache/datafusion/pull/14038#issuecomment-2576231781 I pushed https://github.com/apache/datafusion/pull/14038/commits/a44acfdb3af5bf0082c277de6ee7e09e92251a49 to this PR which had the content of the suggestion from @akurmustafa on http

Re: [PR] Encapsulate fields of `EquivalenceProperties` [datafusion]

2025-01-07 Thread via GitHub
alamb commented on code in PR #14040: URL: https://github.com/apache/datafusion/pull/14040#discussion_r1906019498 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -124,15 +124,15 @@ use itertools::Itertools; /// ``` #[derive(Debug, Clone)] pub struct Equivalen

[PR] Encapsulate fields of `EquivalenceProperties` [datafusion]

2025-01-07 Thread via GitHub
alamb opened a new pull request, #14040: URL: https://github.com/apache/datafusion/pull/14040 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13748 ## Rationale for this change As a first part of optimizing equivalence / ordering c

Re: [PR] chore: extract strings file to `strings_func` like in spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
codecov-commenter commented on PR #1215: URL: https://github.com/apache/datafusion-comet/pull/1215#issuecomment-2576216434 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1215?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: extract strings file to `strings_func` like in spark grouping [datafusion-comet]

2025-01-07 Thread via GitHub
viirya commented on code in PR #1215: URL: https://github.com/apache/datafusion-comet/pull/1215#discussion_r1906040640 ## native/spark-expr/src/string_funcs/string_space.rs: ## @@ -0,0 +1,103 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-07 Thread via GitHub
parthchandra commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2576342088 > > Finally, can we include two more things (either in spark_parquet_options or in some parquet_conversion_context struct) which has the conversion and type promition optio

Re: [PR] chore: Release Ballista 43.0.0 [datafusion-ballista]

2025-01-07 Thread via GitHub
andygrove merged PR #1156: URL: https://github.com/apache/datafusion-ballista/pull/1156

Re: [I] Ballista 43.0.0 Release [datafusion-ballista]

2025-01-07 Thread via GitHub
andygrove closed issue #974: Ballista 43.0.0 Release URL: https://github.com/apache/datafusion-ballista/issues/974

Re: [PR] chore: Upgrade to DataFusion 44.0.0 from 44.0.0 RC2 [datafusion-comet]

2025-01-07 Thread via GitHub
andygrove merged PR #1232: URL: https://github.com/apache/datafusion-comet/pull/1232

Re: [I] Define extension API for user-defined invariants. [datafusion]

2025-01-07 Thread via GitHub
alamb commented on issue #14029: URL: https://github.com/apache/datafusion/issues/14029#issuecomment-2575052672 > But that would still leave the logical plan invariant extensions for consideration. I would recommend adding a method to: https://docs.rs/datafusion/latest/datafusion/log
