Re: [PR] Implementation for regex_instr [datafusion]

2025-05-03 Thread via GitHub
nirnayroy commented on PR #15928: URL: https://github.com/apache/datafusion/pull/15928#issuecomment-2849046029 > Thank you. I'm wondering what's the reference system for this function's behavior (like postgres or others) The reference system for this function's behaviour is [post

Re: [PR] feat: support min/max for struct [datafusion]

2025-05-03 Thread via GitHub
chenkovsky commented on code in PR #15667: URL: https://github.com/apache/datafusion/pull/15667#discussion_r2072519832 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -619,6 +625,45 @@ fn min_batch(values: &ArrayRef) -> Result { }) } +fn min_max_batch_struct(arra

Re: [PR] feat: support min/max for struct [datafusion]

2025-05-03 Thread via GitHub
chenkovsky commented on code in PR #15667: URL: https://github.com/apache/datafusion/pull/15667#discussion_r2072517249 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -619,6 +625,45 @@ fn min_batch(values: &ArrayRef) -> Result { }) } +fn min_max_batch_struct(arra

Re: [PR] Implementation for regex_instr [datafusion]

2025-05-03 Thread via GitHub
2010YOUY01 commented on PR #15928: URL: https://github.com/apache/datafusion/pull/15928#issuecomment-2848931103 Thank you. I'm wondering what's the reference system for this function's behavior (like postgres or others) -- This is an automated message from the Apache Git Service. To respo

Re: [PR] refactor filter pushdown apis [datafusion]

2025-05-03 Thread via GitHub
adriangb commented on PR #15801: URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2848894671 @alamb there are several in-flight PRs now. They all interact, and there will be things we want to tweak in the resulting merged change, but could we start merging them, starting with t

Re: [PR] refactor filter pushdown apis [datafusion]

2025-05-03 Thread via GitHub
adriangb commented on PR #15801: URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2848840198 > However, the existing ExecutionPlan APIs are general-purpose, self-explanatory, and simple, whereas these new ones feel harder to understand and are quite specific to the current

Re: [PR] chore(deps): bump tokio-util from 0.7.14 to 0.7.15 [datafusion]

2025-05-03 Thread via GitHub
Dandandan merged PR #15918: URL: https://github.com/apache/datafusion/pull/15918 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-03 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2072445465 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -218,43 +219,57 @@ LOCATION 'test_files/scratch/push_down_filter/t.parquet'; query TT explain

Re: [PR] fix query results for predicates referencing partition columns and data columns [datafusion]

2025-05-03 Thread via GitHub
adriangb commented on code in PR #15935: URL: https://github.com/apache/datafusion/pull/15935#discussion_r2072417846 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -464,16 +464,16 @@ impl FileFormat for ParquetFormat { fn supports_filters_pushdown( &sel

Re: [PR] Added SQL Example for `Aggregate Functions` [datafusion]

2025-05-03 Thread via GitHub
Adez017 commented on PR #15778: URL: https://github.com/apache/datafusion/pull/15778#issuecomment-2848745150 could anyone help ? @alamb @xudong963 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] Update extending-operators.md [datafusion]

2025-05-03 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2848744801 could anyone please help here ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072440427 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/accumulate.rs: ## @@ -212,7 +229,66 @@ impl NullState { /// /// resets the in

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072440252 ## datafusion/functions-aggregate/src/average.rs: ## @@ -667,8 +668,8 @@ where partial_counts, opt_filter, total_num_group

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072440207 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -17,29 +17,52 @@ //! Vectorized [`GroupsAccumulator`] +use std::collections::VecDeque; + use arr

Re: [PR] chore: fix CI job name [datafusion-comet]

2025-05-03 Thread via GitHub
codecov-commenter commented on PR #1712: URL: https://github.com/apache/datafusion-comet/pull/1712#issuecomment-2848707865 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1712?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072425201 ## datafusion/common/src/config.rs: ## @@ -405,6 +405,18 @@ config_namespace! { /// in joins can reduce memory usage when joining large /// tab

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-03 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2072423389 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_

Re: [I] Speedup character_length [datafusion]

2025-05-03 Thread via GitHub
Dandandan closed issue #15930: Speedup character_length URL: https://github.com/apache/datafusion/issues/15930 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e

Re: [PR] Speedup `character_length` [datafusion]

2025-05-03 Thread via GitHub
Dandandan merged PR #15931: URL: https://github.com/apache/datafusion/pull/15931 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-03 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2072419795 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_

Re: [PR] Added support for CREATE DOMAIN and its test suite [datafusion-sqlparser-rs]

2025-05-03 Thread via GitHub
LucaCappelletti94 commented on code in PR #1830: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1830#discussion_r2072420020 ## tests/sqlparser_postgres.rs: ## @@ -5080,6 +5080,111 @@ fn test_escaped_string_literal() { } } +#[test] +fn parse_create_domain()

Re: [PR] Added support for CREATE DOMAIN and its test suite [datafusion-sqlparser-rs]

2025-05-03 Thread via GitHub
LucaCappelletti94 commented on code in PR #1830: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1830#discussion_r2072419701 ## src/parser/mod.rs: ## @@ -5894,6 +5896,47 @@ impl<'a> Parser<'a> { Ok(owner) } +/// ```sql +/// CREATE DOMAIN

Re: [PR] Added support for CREATE DOMAIN and its test suite [datafusion-sqlparser-rs]

2025-05-03 Thread via GitHub
LucaCappelletti94 commented on code in PR #1830: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1830#discussion_r2072419395 ## src/ast/mod.rs: ## @@ -3968,6 +3968,19 @@ pub enum Statement { owned_by: Option, }, /// ```sql +/// CREATE DOMAIN n

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072417977 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -250,4 +288,30 @@ pub trait GroupsAccumulator: Send { /// This function is called once per batch,

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-03 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2072418073 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_

Re: [PR] perf: Add memory profiling [datafusion-comet]

2025-05-03 Thread via GitHub
andygrove commented on code in PR #1702: URL: https://github.com/apache/datafusion-comet/pull/1702#discussion_r2072417946 ## native/core/src/execution/jni_api.rs: ## @@ -359,6 +367,41 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( // Retriev

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-03 Thread via GitHub
adriangb commented on PR #15769: URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2848681802 If you can make a PR to change the order of the rules and open that ticket, I would appreciate it; I won't be able to for several hours 🙏 -- This is an automated message from the Ap

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-05-03 Thread via GitHub
kevinjqliu commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2848681742 Sounds good to me! `SELECT * FROM lineitem(1.0)` makes sense. `SELECT 1 FROM tpchgen(1.0)` looks a bit odd, but I can't think of a better alternative. -- This is an autom

Re: [PR] Add support for `DENY` statements [datafusion-sqlparser-rs]

2025-05-03 Thread via GitHub
iffyio commented on code in PR #1836: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1836#discussion_r2072414916 ## src/parser/mod.rs: ## @@ -12987,23 +12991,34 @@ impl<'a> Parser<'a> { /// Parse a GRANT statement. pub fn parse_grant(&mut self) -> Resul

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072414793 ## benchmarks/queries/clickbench/extended.sql: ## @@ -5,3 +5,4 @@ SELECT "SocialSourceNetworkID", "RegionID", COUNT(*), AVG("Age"), AVG("ParamPric SELECT "Clie

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-03 Thread via GitHub
iffyio commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2072413695 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_str(), N

Re: [PR] fix query results for predicates referencing partition columns and data columns [datafusion]

2025-05-03 Thread via GitHub
berkaysynnada commented on code in PR #15935: URL: https://github.com/apache/datafusion/pull/15935#discussion_r2072414250 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -464,16 +464,16 @@ impl FileFormat for ParquetFormat { fn supports_filters_pushdown(

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-03 Thread via GitHub
berkaysynnada commented on PR #15769: URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2848670746 > Is there anything I'm missing? Is this what you meant by `the mistake`? I meant the need of this PR. > Is there anything I'm missing? https://github.com/apache

Re: [PR] Added support for CREATE DOMAIN and its test suite [datafusion-sqlparser-rs]

2025-05-03 Thread via GitHub
iffyio commented on code in PR #1830: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1830#discussion_r2072411675 ## src/parser/mod.rs: ## @@ -5894,6 +5896,47 @@ impl<'a> Parser<'a> { Ok(owner) } +/// ```sql +/// CREATE DOMAIN name [ AS ]

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-03 Thread via GitHub
adriangb commented on PR #15769: URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2848665962 > @adriangb do you have time to address the last suggestions? I understand the mistake here, and I think we should take this in asap I am going to try to address the last roun

Re: [PR] fix query results for predicates referencing partition columns and data columns [datafusion]

2025-05-03 Thread via GitHub
adriangb commented on code in PR #15935: URL: https://github.com/apache/datafusion/pull/15935#discussion_r2072411339 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -464,16 +464,16 @@ impl FileFormat for ParquetFormat { fn supports_filters_pushdown( &sel

Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-05-03 Thread via GitHub
iffyio merged PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] Add extended query for checking improvement for blocked groups optimization [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on PR #15936: URL: https://github.com/apache/datafusion/pull/15936#issuecomment-2848659361 Thanks @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Add extended query for checking improvement for blocked groups optimization [datafusion]

2025-05-03 Thread via GitHub
Rachelint merged PR #15936: URL: https://github.com/apache/datafusion/pull/15936 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

[PR] Migrate Optimizer tests to insta, part4 [datafusion]

2025-05-03 Thread via GitHub
qstommyshu opened a new pull request, #15937: URL: https://github.com/apache/datafusion/pull/15937 ## Which issue does this PR close? - Related #15396 , #15446, #15884, #15893 ## Rationale for this change ## What changes are included in this PR?

Re: [PR] feat: metadata handling for aggregates and window functions [datafusion]

2025-05-03 Thread via GitHub
timsaucer commented on PR #15911: URL: https://github.com/apache/datafusion/pull/15911#issuecomment-2848636584 FYI @paleolimbot @crystalxyz since this impacts both of your work -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-03 Thread via GitHub
berkaysynnada commented on PR #15769: URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2848631745 @adriangb do you have time to address the last suggestions? I understand the mistake here, and I think we should take this in asap -- This is an automated message from the Ap

Re: [PR] Add extended query for checking improvement for blocked groups optimization [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on PR #15936: URL: https://github.com/apache/datafusion/pull/15936#issuecomment-2848629484 @alamb @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848629234 @alamb I have submitted a PR with the newly added query for this PR: #15936 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

[PR] Add extended query for checking improvement for blocked groups optimization [datafusion]

2025-05-03 Thread via GitHub
Rachelint opened a new pull request, #15936: URL: https://github.com/apache/datafusion/pull/15936 ## Which issue does this PR close? - Closes #. ## Rationale for this change Need a simple enough query for checking the improvement of #15591 ## What chang

Re: [PR] fix query results for predicates referencing partition columns and data columns [datafusion]

2025-05-03 Thread via GitHub
berkaysynnada commented on code in PR #15935: URL: https://github.com/apache/datafusion/pull/15935#discussion_r2072391618 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -464,16 +464,16 @@ impl FileFormat for ParquetFormat { fn supports_filters_pushdown(

Re: [PR] fix query results for predicates referencing partition columns and data columns [datafusion]

2025-05-03 Thread via GitHub
adriangb commented on PR #15935: URL: https://github.com/apache/datafusion/pull/15935#issuecomment-2848579359 cc @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] fix query results for predicates referencing partition columns and data columns [datafusion]

2025-05-03 Thread via GitHub
adriangb commented on code in PR #15935: URL: https://github.com/apache/datafusion/pull/15935#discussion_r2072373837 ## datafusion/datasource-parquet/src/row_filter.rs: ## @@ -662,14 +660,11 @@ mod test { assert!(!can_expr_be_pushed_down_with_schemas( &expr

Re: [I] Wrong query results for filters that involve partition columns and data file columns and `pushdown_filters` is enabled [datafusion]

2025-05-03 Thread via GitHub
adriangb commented on issue #15912: URL: https://github.com/apache/datafusion/issues/15912#issuecomment-2848574101 Looking at the git blame... it looks like I introduced the bug in https://github.com/apache/datafusion/pull/15263. Feeling guilty so here is the fix: https://github.com/apac

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848573594 🤖: Benchmark completed Details ``` Comparing HEAD and intermeidate-result-blocked-approach Benchmark clickbench_extended.json -

[PR] fix query results for predicates referencing partition columns and data columns [datafusion]

2025-05-03 Thread via GitHub
adriangb opened a new pull request, #15935: URL: https://github.com/apache/datafusion/pull/15935 Fixes #15912 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848571591 > Given we have past evidence this approach will work I think we could merge it before the queries really sped up. My big concern is that we have a plan to eventually avoid both si

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-05-03 Thread via GitHub
goldmedal commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2848568844 Thanks @2010YOUY01 for the suggestions. > 1. Use the term `(selection) bitmap` instead of `selection vector` to avoid confusion. I believe `selection vector` commonly refers
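
To illustrate the terminology in the suggestion quoted above, here is a minimal sketch, assuming the common usage in which a selection bitmap holds one flag per row and a selection vector holds the indices of the selected rows. The function names are hypothetical and are not taken from DataFusion or from PR #15423.

```rust
/// Convert a selection bitmap (one flag per row) into a selection vector
/// (the indices of the rows whose flag is set).
/// Hypothetical helper for illustration only; not a DataFusion API.
fn bitmap_to_selection_vector(bitmap: &[bool]) -> Vec<usize> {
    bitmap
        .iter()
        .enumerate()
        .filter_map(|(row, &selected)| selected.then_some(row))
        .collect()
}

/// Convert a selection vector back into a bitmap covering `num_rows` rows.
/// Panics if any index is >= `num_rows`.
/// Hypothetical helper for illustration only; not a DataFusion API.
fn selection_vector_to_bitmap(indices: &[usize], num_rows: usize) -> Vec<bool> {
    let mut bitmap = vec![false; num_rows];
    for &row in indices {
        bitmap[row] = true;
    }
    bitmap
}
```

The bitmap has a fixed size proportional to the row count, while the selection vector grows with the number of selected rows, which is one reason the two terms are worth keeping apart.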

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848565268 > What would be required to improve the performance for one or more of the real clickbench queries? Implementing group management for other data types? Yes. Usually high c

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848561993 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Speedup `character_length` [datafusion]

2025-05-03 Thread via GitHub
alamb commented on PR #15931: URL: https://github.com/apache/datafusion/pull/15931#issuecomment-2848561984 🤖: Benchmark completed Details ``` group main speedup_character_length ---

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848559813 > I am back from holiday, and continue to work on this today. Welcome back! > Emmm... As expected, the newly added query is not covered. I think I should submit a new PR fo

Re: [PR] Speedup `character_length` [datafusion]

2025-05-03 Thread via GitHub
alamb commented on PR #15931: URL: https://github.com/apache/datafusion/pull/15931#issuecomment-2848558610 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~

Re: [PR] feat: support min/max for struct [datafusion]

2025-05-03 Thread via GitHub
alamb commented on code in PR #15667: URL: https://github.com/apache/datafusion/pull/15667#discussion_r2072364948 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/nulls.rs: ## @@ -193,6 +193,14 @@ pub fn set_nulls_dyn(input: &dyn Array, nulls: Option) -

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848557998 > 🤖: Benchmark completed > Details > > ``` > Comparing HEAD and intermeidate-result-blocked-approach > > Benchmark clickbench_extended.json

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848554286 🤖: Benchmark completed Details ``` Comparing HEAD and intermeidate-result-blocked-approach Benchmark clickbench_extended.json -

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848554248 > how is the benchmark triggered and can we run clickbench extended too? > > upd: I didn't find improvement for extended query locally Also for the new added one? ``

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848551270 > > 🤖: Benchmark completed > > (I am surprised this PR didn't yield better results, I am rerunning now to see if the results are reproducible I am back from holiday, an

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848551267 > > 🤖: Benchmark completed > > (I am surprised this PR didn't yield better results, I am rerunning now to see if the results are reproducible I am back from holiday, an

Re: [PR] docs: Label `bloom_filter_on_read` as a reading config [datafusion]

2025-05-03 Thread via GitHub
alamb commented on PR #15933: URL: https://github.com/apache/datafusion/pull/15933#issuecomment-2848549315 Thanks again @nuno-faria -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] docs: Label `bloom_filter_on_read` as a reading config [datafusion]

2025-05-03 Thread via GitHub
alamb merged PR #15933: URL: https://github.com/apache/datafusion/pull/15933 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848548914 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848548029 > 🤖: Benchmark completed (I am surprised this PR didn't yield better results, I am rerunning now to see) -- This is an automated message from the Apache Git Service. To respond

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-03 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848547823 > how is the benchmark triggered and can we run clickbench extended too? I am using some scripts in https://github.com/alamb/datafusion-benchmarking on a gcp machine (I haven't

Re: [PR] Improve sqllogictest error reporting [datafusion]

2025-05-03 Thread via GitHub
2010YOUY01 commented on code in PR #15905: URL: https://github.com/apache/datafusion/pull/15905#discussion_r2072352105 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -234,15 +235,45 @@ async fn run_test_file( runner.with_column_validator(strict_column_validator);

Re: [I] Add imdb 10 rows slt test [datafusion]

2025-05-03 Thread via GitHub
kumarlokesh commented on issue #15934: URL: https://github.com/apache/datafusion/issues/15934#issuecomment-2848478868 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] support simple/cross lateral joins [datafusion]

2025-05-03 Thread via GitHub
jayzhan211 commented on PR #14595: URL: https://github.com/apache/datafusion/pull/14595#issuecomment-2848475409 @skyzh Hi, I addressed the comment mentioned above in https://github.com/skyzh/datafusion/pull/1; if it looks good to you, we can merge this PR. -- This is an automated message from