Re: [PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-04 Thread via GitHub
blaginin commented on code in PR #15937: URL: https://github.com/apache/datafusion/pull/15937#discussion_r2072657525 ## datafusion/optimizer/src/propagate_empty_relation.rs: ## @@ -280,8 +291,7 @@ mod tests { .project(vec![binary_expr(lit(1), Operator::Plus, lit(1))

Re: [PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-04 Thread via GitHub
qstommyshu commented on code in PR #15937: URL: https://github.com/apache/datafusion/pull/15937#discussion_r2072659106 ## datafusion/optimizer/src/propagate_empty_relation.rs: ## @@ -280,8 +291,7 @@ mod tests { .project(vec![binary_expr(lit(1), Operator::Plus, lit(1

Re: [PR] add benchmark code for `Reuse rows in row cursor stream` [datafusion]

2025-05-04 Thread via GitHub
acking-you commented on PR #15913: URL: https://github.com/apache/datafusion/pull/15913#issuecomment-2849308320 Thank you for your code review, @alamb. I've added more performance test cases! -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] Update extending-operators.md [datafusion]

2025-05-04 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2849308705 > Could you please refer to the error lints in CI? Such as > > ``` > error[E0433]: failed to resolve: use of undeclared type `Statistics` > --> datafusion/core/src/lib.r

Re: [PR] Added SQL Example for `Aggregate Functions` [datafusion]

2025-05-04 Thread via GitHub
Adez017 commented on PR #15778: URL: https://github.com/apache/datafusion/pull/15778#issuecomment-2849310226 Hi @alamb @xudong963, I think it's not a big PR to deal with; it just needs a small inspection, and it would be helpful to deal with it. Thank you.

Re: [PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-04 Thread via GitHub
qstommyshu commented on code in PR #15937: URL: https://github.com/apache/datafusion/pull/15937#discussion_r2072660969 ## datafusion/optimizer/src/test/mod.rs: ## @@ -242,6 +226,24 @@ pub fn assert_optimized_plan_eq_display_indent( assert_eq!(formatted_plan, expected); }

Re: [PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-04 Thread via GitHub
qstommyshu commented on code in PR #15937: URL: https://github.com/apache/datafusion/pull/15937#discussion_r2072661482 ## datafusion/optimizer/src/test/mod.rs: ## @@ -242,6 +226,24 @@ pub fn assert_optimized_plan_eq_display_indent( assert_eq!(formatted_plan, expected); }

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-04 Thread via GitHub
iffyio commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2072719447 ## src/parser/mod.rs: ## @@ -475,6 +475,10 @@ impl<'a> Parser<'a> { if expecting_statement_delimiter && word.keyword == Keyword::E

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-04 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2072717456 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_

Re: [PR] Added support for `CREATE DOMAIN` [datafusion-sqlparser-rs]

2025-05-04 Thread via GitHub
iffyio merged PR #1830: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1830

Re: [PR] Allow stored procedures to be defined without `BEGIN`/`END` [datafusion-sqlparser-rs]

2025-05-04 Thread via GitHub
iffyio commented on PR #1834: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1834#issuecomment-2849446927 @aharpervc could you take a look at this PR? It seems to have commits from other PRs in it, like the CREATE TRIGGER one, so it is hard to tell what's being introduced in this one.

Re: [PR] Resolved bug in `parse_function_arg` [datafusion-sqlparser-rs]

2025-05-04 Thread via GitHub
iffyio commented on code in PR #1826: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1826#discussion_r2072720003 ## src/parser/mod.rs: ## @@ -5199,13 +5199,20 @@ impl<'a> Parser<'a> { // parse: [ argname ] argtype let mut name = None; +l

Re: [PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-04 Thread via GitHub
xudong963 commented on PR #15937: URL: https://github.com/apache/datafusion/pull/15937#issuecomment-2849853927 Thank you all

Re: [PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-04 Thread via GitHub
xudong963 merged PR #15937: URL: https://github.com/apache/datafusion/pull/15937

Re: [PR] Update extending-operators.md [datafusion]

2025-05-04 Thread via GitHub
xudong963 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2849865580 > Thanks, mate, but I think it shows for the `lib.rs` file, and we didn't make any changes there. Does this mean that we need to change over there? If you take a look at the

Re: [PR] feat: ORDER BY ALL [datafusion]

2025-05-04 Thread via GitHub
PokIsemaine commented on code in PR #15772: URL: https://github.com/apache/datafusion/pull/15772#discussion_r2072647351 ## datafusion/sql/src/expr/order_by.rs: ## @@ -61,13 +57,27 @@ impl SqlToRel<'_, S> { None => input_schema, }; -let mut expr_ve

[PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-04 Thread via GitHub
hsiang-c opened a new pull request, #1715: URL: https://github.com/apache/datafusion-comet/pull/1715 ## Which issue does this PR close? Closes #. https://github.com/apache/datafusion-comet/issues/1685 ## Rationale for this change Run Iceberg Spark' tests a

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-04 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2072756448 ## .github/workflows/iceberg_spark_test.yml: ## @@ -0,0 +1,81 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-04 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2072756556 ## .github/actions/setup-iceberg-builder/action.yaml: ## @@ -0,0 +1,63 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

Re: [I] [datafusion-spark] Implement `ceil` function [datafusion]

2025-05-04 Thread via GitHub
irenjj commented on issue #15916: URL: https://github.com/apache/datafusion/issues/15916#issuecomment-2849729993 take

Re: [I] Spark executors failing occasionally on SIGSEGV [datafusion-comet]

2025-05-04 Thread via GitHub
mixermt commented on issue #1714: URL: https://github.com/apache/datafusion-comet/issues/1714#issuecomment-2849234775 Now I see that the failure occurred in a specific job from our regression set. I will try to replay it multiple times to narrow down the issue.

Re: [PR] feat: ORDER BY ALL [datafusion]

2025-05-04 Thread via GitHub
xudong963 commented on code in PR #15772: URL: https://github.com/apache/datafusion/pull/15772#discussion_r2072637668 ## datafusion/sql/src/expr/order_by.rs: ## @@ -61,13 +57,27 @@ impl SqlToRel<'_, S> { None => input_schema, }; -let mut expr_vec

Re: [PR] feat: ORDER BY ALL [datafusion]

2025-05-04 Thread via GitHub
xudong963 commented on code in PR #15772: URL: https://github.com/apache/datafusion/pull/15772#discussion_r2072637962 ## datafusion/sql/src/expr/order_by.rs: ## @@ -61,13 +57,27 @@ impl SqlToRel<'_, S> { None => input_schema, }; -let mut expr_vec

Re: [I] Add imdb 10 rows slt test [datafusion]

2025-05-04 Thread via GitHub
xudong963 commented on issue #15934: URL: https://github.com/apache/datafusion/issues/15934#issuecomment-2849263595 I notice we don't have the least tpc-ds tests in sqllogictest, maybe it's better to add them

Re: [I] [DISCUSSION] JOIN "task force" / project team [datafusion]

2025-05-04 Thread via GitHub
xudong963 commented on issue #15885: URL: https://github.com/apache/datafusion/issues/15885#issuecomment-2849264701 > > > not sure if it will help direction, cost nothing to share :) [Debunking the Myth of Join Ordering: Toward Robust SQL Analytics](https://arxiv.org/abs/2502.15181) > >

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-04 Thread via GitHub
iffyio commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2072716889 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_str(), N

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-05-04 Thread via GitHub
NevroHelios commented on PR #15841: URL: https://github.com/apache/datafusion/pull/15841#issuecomment-2849377862 Is there anything else I am missing or need to add? @alamb @xudong963
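
For readers unfamiliar with the distinction named in the PR title above, here is a minimal sketch of how `Option::unwrap_or` and `Option::unwrap_or_else` differ in evaluation. It is not taken from the PR; `expensive_default`, `eager`, and `lazy` are hypothetical names invented for illustration, and the call counter exists only to make the difference observable.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a costly default value; it records each invocation.
fn expensive_default(calls: &Cell<u32>) -> String {
    calls.set(calls.get() + 1);
    String::from("default")
}

/// Eager form: the argument to `unwrap_or` is evaluated before the call,
/// so the default is built even when `value` is `Some`.
fn eager(value: Option<String>, calls: &Cell<u32>) -> String {
    value.unwrap_or(expensive_default(calls))
}

/// Lazy form: the closure passed to `unwrap_or_else` runs only when `value` is `None`.
fn lazy(value: Option<String>, calls: &Cell<u32>) -> String {
    value.unwrap_or_else(|| expensive_default(calls))
}

fn main() {
    let eager_calls = Cell::new(0);
    let lazy_calls = Cell::new(0);

    let a = eager(Some("present".to_string()), &eager_calls);
    let b = lazy(Some("present".to_string()), &lazy_calls);

    println!("eager: {a}, default built {} time(s)", eager_calls.get());
    println!("lazy:  {b}, default built {} time(s)", lazy_calls.get());
}
```

The lazy form is mainly worthwhile when building the default allocates or does real work; for cheap constants the two are effectively equivalent.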

[I] Spark executors failing occasionally on SIGSEGV [datafusion-comet]

2025-05-04 Thread via GitHub
mixermt opened a new issue, #1714: URL: https://github.com/apache/datafusion-comet/issues/1714 Hi, we are experiencing occasional failures of Spark executors ``` │ # A fatal error has been detected by the Java Runtime Environment:

Re: [PR] Update extending-operators.md [datafusion]

2025-05-04 Thread via GitHub
xudong963 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2849259474 Could you please refer to the error lints in CI? Such as ``` error[E0433]: failed to resolve: use of undeclared type `Statistics` --> datafusion/core/src/lib.rs:1365:12

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-04 Thread via GitHub
codecov-commenter commented on PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#issuecomment-2849378438 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1710?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] support `regexp_match` as predicate [datafusion]

2025-05-04 Thread via GitHub
juju4 commented on issue #15872: URL: https://github.com/apache/datafusion/issues/15872#issuecomment-2849384492 Ah, regexp_like() is good. Thanks @Omega359. But it confuses me why some functions work in both SELECT and WHERE (regexp_like) and some do not (regexp_match, probably others

Re: [D] outlier, time compare or frequency analysis operators in datafusion? [datafusion]

2025-05-04 Thread via GitHub
GitHub user juju4 added a comment to the discussion: outlier, time compare or frequency analysis operators in datafusion? So the best approach that is currently available is standard deviation (stddev), correct? https://datafusion.apache.org/user-guide/sql/aggregate_functions.html#stddev GitHu

Re: [I] Spark executors failing occasionally on SIGSEGV [datafusion-comet]

2025-05-04 Thread via GitHub
mixermt commented on issue #1714: URL: https://github.com/apache/datafusion-comet/issues/1714#issuecomment-2849354985 Another observation: the failure happens while reading data from an Iceberg table. Either we have a version mismatch, jar hell, or some other reason still unknown to me

Re: [PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-04 Thread via GitHub
blaginin commented on code in PR #15937: URL: https://github.com/apache/datafusion/pull/15937#discussion_r2072785654 ## datafusion/optimizer/src/test/mod.rs: ## @@ -242,6 +226,24 @@ pub fn assert_optimized_plan_eq_display_indent( assert_eq!(formatted_plan, expected); } +

Re: [PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-04 Thread via GitHub
blaginin commented on code in PR #15937: URL: https://github.com/apache/datafusion/pull/15937#discussion_r2072785470 ## datafusion/optimizer/src/propagate_empty_relation.rs: ## @@ -280,8 +291,7 @@ mod tests { .project(vec![binary_expr(lit(1), Operator::Plus, lit(1))

Re: [PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-04 Thread via GitHub
Copilot commented on code in PR #15937: URL: https://github.com/apache/datafusion/pull/15937#discussion_r2072657169 ## datafusion/optimizer/src/test/mod.rs: ## @@ -242,6 +226,24 @@ pub fn assert_optimized_plan_eq_display_indent( assert_eq!(formatted_plan, expected); } +#

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-04 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072857296 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/prim_op.rs: ## @@ -43,8 +46,8 @@ where T: ArrowPrimitiveType + Send, F: Fn(&m

[I] Optimize TopK memory usage by slicing batches as they come in [datafusion]

2025-05-04 Thread via GitHub
adriangb opened a new issue, #15940: URL: https://github.com/apache/datafusion/issues/15940 ### Is your feature request related to a problem or challenge? We encountered a case with dictionaries flowing into a TopK operator where we hit key overflow errors. We believe this is because

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-04 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072859602 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -507,3 +509,157 @@ pub(crate) fn slice_and_maybe_filter( Ok(sliced_a

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-04 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072859439 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/prim_op.rs: ## @@ -198,4 +232,28 @@ where fn size(&self) -> usize { self.

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-04 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072859880 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -507,3 +509,157 @@ pub(crate) fn slice_and_maybe_filter( Ok(sliced_a

Re: [I] Optimize TopK memory usage by slicing batches as they come in [datafusion]

2025-05-04 Thread via GitHub
adriangb commented on issue #15940: URL: https://github.com/apache/datafusion/issues/15940#issuecomment-2850001961 Hmm I went to look at the code and `maybe_compact` is basically doing this already. Looking into why we hit this now. Could be a bug on our end or in DF.

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-04 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072895547 ## datafusion/functions-aggregate/src/correlation.rs: ## @@ -448,6 +448,9 @@ impl GroupsAccumulator for CorrelationGroupsAccumulator { let n = match emi

Re: [I] Parquet predicate filters fail with "Invalid comparison operation: Utf8View <= Utf8" [datafusion]

2025-05-04 Thread via GitHub
adriangb commented on issue #15920: URL: https://github.com/apache/datafusion/issues/15920#issuecomment-2849047722 Thanks for reporting. We've been making changes to apply predicate pushdown filters against physical file schemas (as opposed to the table schema without partition columns)

Re: [I] Support metadata columns (`location`, `size`, `last_modified`) in `ListingTableProvider` [datafusion]

2025-05-04 Thread via GitHub
phillipleblanc commented on issue #15173: URL: https://github.com/apache/datafusion/issues/15173#issuecomment-2849049906 > And then making another stream wrapper that did the projecting Actually, yeah - that would work quite nicely I think. If we changed FileStream's interface from re

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-04 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2072538338 ## datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt: ## @@ -81,11 +81,15 @@ EXPLAIN select a from t_pushdown where b > 2 ORDER BY a; logical_

[I] Move physical plan filter pushdown optimizer rule to avoid adding unnecessary nodes [datafusion]

2025-05-04 Thread via GitHub
adriangb opened a new issue, #15938: URL: https://github.com/apache/datafusion/issues/15938 See https://github.com/apache/datafusion/pull/15769#discussion_r2070804563. The thought is that if we just re-arrange optimizer rule order the extra `CoalesceBatchesExec` and `RepartitionExec`

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-04 Thread via GitHub
adriangb commented on PR #15769: URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2849130181 I updated the order of the pushdown rules in this PR, it worked to get rid of the extra nodes. I've also added the upgrade guide and the pushdown preview is being shown in th

[I] Memory leak in `datafusion-cli` [datafusion]

2025-05-04 Thread via GitHub
2010YOUY01 opened a new issue, #15939: URL: https://github.com/apache/datafusion/issues/15939 ### Describe the bug When running several memory-consuming queries in datafusion-cli, the system memory is not released after each query ends; the memory usage from different queries is ac

Re: [PR] Substrait: Handle inner map fields in schema renaming [datafusion]

2025-05-04 Thread via GitHub
cht42 commented on code in PR #15869: URL: https://github.com/apache/datafusion/pull/15869#discussion_r2072584358 ## datafusion/substrait/tests/cases/substrait_validations.rs: ## @@ -61,16 +61,41 @@ mod tests { let proto_plan = read_json("tests/tes

Re: [PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-04 Thread via GitHub
qstommyshu commented on PR #15937: URL: https://github.com/apache/datafusion/pull/15937#issuecomment-2849155781 Hi @alamb, @blaginin. This PR is ready for review.