Re: [I] [substrait] Build basic test suite to validate produced Substrait plans [datafusion]

2025-05-19 Thread via GitHub
gabotechs commented on issue #15069: URL: https://github.com/apache/datafusion/issues/15069#issuecomment-2893161350 That sounds reasonable, although given the current state there's a lot of ignoring to be done in the Substrait validation mode, some numbers here: currently out of 7302 `query

Re: [PR] Implementation for regex_instr [datafusion]

2025-05-19 Thread via GitHub
nirnayroy commented on PR #15928: URL: https://github.com/apache/datafusion/pull/15928#issuecomment-2893026630 fixed formatting error in workflow -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] Resolved bug in `parse_function_arg` [datafusion-sqlparser-rs]

2025-05-19 Thread via GitHub
LucaCappelletti94 commented on code in PR #1826: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1826#discussion_r2096954711 ## tests/sqlparser_postgres.rs: ## @@ -4098,6 +4099,219 @@ fn parse_update_in_with_subquery() { pg_and_generic().verified_stmt(r#"WITH "

Re: [PR] Mysql: Add `SRID` column option [datafusion-sqlparser-rs]

2025-05-19 Thread via GitHub
iffyio commented on code in PR #1852: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1852#discussion_r2096922347 ## src/parser/mod.rs: ## @@ -16571,6 +16575,23 @@ mod tests { } } +#[test] +fn test_mysql_srid_create_table() { +let sql

Re: [PR] Resolved bug in `parse_function_arg` [datafusion-sqlparser-rs]

2025-05-19 Thread via GitHub
iffyio commented on code in PR #1826: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1826#discussion_r2096911544 ## src/parser/mod.rs: ## @@ -5200,12 +5200,26 @@ impl<'a> Parser<'a> { // parse: [ argname ] argtype let mut name = None; let

Re: [PR] Optimize performance of `string::ascii` function [datafusion]

2025-05-19 Thread via GitHub
tlm365 commented on code in PR #16087: URL: https://github.com/apache/datafusion/pull/16087#discussion_r2096902385 ## datafusion/functions/benches/ascii.rs: ## @@ -0,0 +1,116 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-19 Thread via GitHub
iffyio commented on code in PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#discussion_r2096898613 ## src/parser/mod.rs: ## @@ -5204,19 +5204,79 @@ impl<'a> Parser<'a> { let (name, args) = self.parse_create_function_name_and_params()?;

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-19 Thread via GitHub
duongcongtoai commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2892904604 In this case, we may need at least two new optimizer passes 🤔: `SubqueryDecorrelation` and `DelimGetRemoval`. It also looks like this [PR](https://github.com/apache/datafusion/pu

Re: [PR] Update criterion requirement from 0.5 to 0.6 in /sqlparser_bench [datafusion-sqlparser-rs]

2025-05-19 Thread via GitHub
iffyio merged PR #1857: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1857

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-19 Thread via GitHub
duongcongtoai commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2892878744 Thank you everyone for your opinions. Looks like my implementation is trying to wrap everything inside a single optimizer, which is hard to follow and reduces space for coll

Re: [PR] chore: Upgrade rand crate and some other minor crates [datafusion]

2025-05-19 Thread via GitHub
comphead commented on PR #16062: URL: https://github.com/apache/datafusion/pull/16062#issuecomment-2892876543 Finally there is a green build. Please do not merge it until I fix ignored tests which will be in a separate PR

Re: [PR] pretty-print CREATE VIEW statements [datafusion-sqlparser-rs]

2025-05-19 Thread via GitHub
iffyio commented on PR #1855: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1855#issuecomment-2892870653 @lovasoa when you get the time could you merge in latest from main to fix up the ci failure in this and the #1854 PRs?

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-19 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2892754141 > 🤖: Benchmark completed > > Details > > ``` > Comparing HEAD and concat_batches_for_sort > > Benchmark clickbench_extended.json > --

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-19 Thread via GitHub
zhuqi-lucas commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2096766180 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -674,16 +676,35 @@ impl ExternalSorter { return self.sort_batch_stream(batch, metrics, reserv

Re: [PR] Secure GitHub Actions by using specific SHA hashes [datafusion]

2025-05-19 Thread via GitHub
github-actions[bot] commented on PR #15306: URL: https://github.com/apache/datafusion/pull/15306#issuecomment-2892705292 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] [POC] feat: Add datafusion-storage [datafusion]

2025-05-19 Thread via GitHub
github-actions[bot] commented on PR #15018: URL: https://github.com/apache/datafusion/pull/15018#issuecomment-2892705347 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Error `NamedStructField should be rewritten in OperatorToFunction with subquery` if query is wrapped in view [datafusion]

2025-05-19 Thread via GitHub
ahirner closed issue #10764: Error `NamedStructField should be rewritten in OperatorToFunction with subquery` if query is wrapped in view URL: https://github.com/apache/datafusion/issues/10764

Re: [I] Open Variant Type for semi-structured data [datafusion]

2025-05-19 Thread via GitHub
ianmcook commented on issue #10987: URL: https://github.com/apache/datafusion/issues/10987#issuecomment-2892638977 There's a discussion happening on the Arrow dev mailing list about adding Variant as a canonical extension type in the Arrow spec. Input from the DataFusion developer community

Re: [PR] fix: correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-19 Thread via GitHub
codecov-commenter commented on PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#issuecomment-2892635123 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1755?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: make error handling in indent explain consistent with that in tree [datafusion]

2025-05-19 Thread via GitHub
chenkovsky commented on code in PR #16097: URL: https://github.com/apache/datafusion/pull/16097#discussion_r2096658640 ## datafusion/core/src/physical_planner.rs: ## @@ -1757,6 +1757,12 @@ impl DefaultPhysicalPlanner { ))); } +if !e.logical_optimi

Re: [PR] doc: fix indent format explain [datafusion]

2025-05-19 Thread via GitHub
comphead merged PR #16085: URL: https://github.com/apache/datafusion/pull/16085

Re: [PR] fix : correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-19 Thread via GitHub
parthchandra commented on PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#issuecomment-2892542148 @mbutrovich @andygrove Spark test fixes for native_iceberg_compat

[PR] fix : correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-19 Thread via GitHub
parthchandra opened a new pull request, #1755: URL: https://github.com/apache/datafusion-comet/pull/1755 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1542 Closes #. ## Rationale for this change This addresses test fai

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-19 Thread via GitHub
irenjj commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2892527051 > There is one thing we know for sure should be implemented: detecting which nodes in the LogicalPlan AST are dependent join nodes. However, we don't need to create a new LogicalP

Re: [PR] chore: Enable more complex type tests [datafusion-comet]

2025-05-19 Thread via GitHub
codecov-commenter commented on PR #1753: URL: https://github.com/apache/datafusion-comet/pull/1753#issuecomment-2892394556 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1753?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: coerce int96 resolution inside of list, struct, and map types [datafusion]

2025-05-19 Thread via GitHub
andygrove commented on PR #16058: URL: https://github.com/apache/datafusion/pull/16058#issuecomment-2892388867 @alamb I'd like to go ahead and merge this one if there are no objections

Re: [I] [native_datafusion] Spark SQL failure "select nested field from a complex map key using map_keys" [datafusion-comet]

2025-05-19 Thread via GitHub
andygrove commented on issue #1754: URL: https://github.com/apache/datafusion-comet/issues/1754#issuecomment-2892387185 Thanks @comphead!

Re: [PR] Implementation for regex_instr [datafusion]

2025-05-19 Thread via GitHub
nirnayroy commented on PR #15928: URL: https://github.com/apache/datafusion/pull/15928#issuecomment-289230 fixed the clippy errors showing up in the workflow

Re: [I] [native_datafusion] Spark SQL failure "select nested field from a complex map key using map_keys" [datafusion-comet]

2025-05-19 Thread via GitHub
comphead commented on issue #1754: URL: https://github.com/apache/datafusion-comet/issues/1754#issuecomment-2892335517 Do you want me to take this, @andygrove?

Re: [PR] fix: Add coercion rules for Float16 types [datafusion]

2025-05-19 Thread via GitHub
etseidl commented on PR #15816: URL: https://github.com/apache/datafusion/pull/15816#issuecomment-2892365766 Thanks @alamb!

Re: [I] [native_datafusion] Spark SQL failure "select nested field from a complex map key using map_keys" [datafusion-comet]

2025-05-19 Thread via GitHub
comphead commented on issue #1754: URL: https://github.com/apache/datafusion-comet/issues/1754#issuecomment-2892357390 A similar issue also exists for map_values; hopefully the fix can address both.

Re: [I] [native_datafusion] Spark SQL failure "select nested field from a complex map key using map_keys" [datafusion-comet]

2025-05-19 Thread via GitHub
andygrove commented on issue #1754: URL: https://github.com/apache/datafusion-comet/issues/1754#issuecomment-2892277712 I am guessing that we need to implement logic for casting between maps, similar to what @comphead did for lists in https://github.com/apache/datafusion-comet/pull/1687

Re: [PR] Revert use file schema in parquet pruning [datafusion]

2025-05-19 Thread via GitHub
adriangb commented on code in PR #16086: URL: https://github.com/apache/datafusion/pull/16086#discussion_r2095856518 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -156,10 +162,20 @@ mod tests { source = source .with_pushd

[I] [native_datafusion] Spark SQL failure "select nested field from a complex map key using map_keys" [datafusion-comet]

2025-05-19 Thread via GitHub
andygrove opened a new issue, #1754: URL: https://github.com/apache/datafusion-comet/issues/1754 ### Describe the bug Repro: ``` ignore("read map[struct, struct] from parquet") { assume(usingDataSourceExec(conf)) withTempPath { dir => // create in

Re: [I] Support data source sampling with TABLESAMPLE [datafusion]

2025-05-19 Thread via GitHub
theirix commented on issue #13563: URL: https://github.com/apache/datafusion/issues/13563#issuecomment-2892193833 Sorry for the long absence. After datafusion-sqlparser-rs gained SQL support for tablesamples in [0.54](https://github.com/apache/datafusion-sqlparser-rs/pull/1566), I am going

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-19 Thread via GitHub
aharpervc commented on PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#issuecomment-2892157468 It seems the CI failure for the lint job was addressed by https://github.com/apache/datafusion-sqlparser-rs/pull/1856. I've rebased again, all good 👍

[PR] Enable more tests [datafusion-comet]

2025-05-19 Thread via GitHub
andygrove opened a new pull request, #1753: URL: https://github.com/apache/datafusion-comet/pull/1753 ## Which issue does this PR close? N/A ## Rationale for this change Enable some complex tests that were previously ignored ## What changes are incl

Re: [I] Add metadata support for Aggregate and Window Functions [datafusion]

2025-05-19 Thread via GitHub
timsaucer closed issue #15902: Add metadata support for Aggregate and Window Functions URL: https://github.com/apache/datafusion/issues/15902

Re: [PR] feat: metadata handling for aggregates and window functions [datafusion]

2025-05-19 Thread via GitHub
timsaucer merged PR #15911: URL: https://github.com/apache/datafusion/pull/15911

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-19 Thread via GitHub
Dandandan commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2096369131 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -674,16 +676,35 @@ impl ExternalSorter { return self.sort_batch_stream(batch, metrics, reservat

Re: [PR] Optimize performance of `string::ascii` function [datafusion]

2025-05-19 Thread via GitHub
alamb commented on code in PR #16087: URL: https://github.com/apache/datafusion/pull/16087#discussion_r2096357959 ## datafusion/functions/src/string/ascii.rs: ## @@ -103,19 +103,22 @@ impl ScalarUDFImpl for AsciiFunc { fn calculate_ascii<'a, V>(array: V) -> Result where -

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2892029694 🤖: Benchmark completed Details ``` Comparing HEAD and avg_duration_ga Benchmark clickbench_extended.json ┏━

Re: [I] [native_datafusion] No support for default values for Parquet columns [datafusion-comet]

2025-05-19 Thread via GitHub
mbutrovich commented on issue #1750: URL: https://github.com/apache/datafusion-comet/issues/1750#issuecomment-2892026436 Taking a look at this today.

Re: [PR] docs: Add instructions for running individual Spark SQL tests from sbt [datafusion-comet]

2025-05-19 Thread via GitHub
coderfender commented on PR #1752: URL: https://github.com/apache/datafusion-comet/pull/1752#issuecomment-2892016663 Thank you for the approval

Re: [PR] Make `SessionContext::register_parquet` obey `collect_statistics` config [datafusion]

2025-05-19 Thread via GitHub
adriangb commented on PR #16080: URL: https://github.com/apache/datafusion/pull/16080#issuecomment-2891982600 Will try to fix the tests later today, but it may be tomorrow

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2891939602 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2891939456 🤖: Benchmark completed Details ``` Comparing HEAD and concat_batches_for_sort Benchmark clickbench_extended.json --

Re: [PR] Update documentation for `datafusion.execution.collect_statistics` [datafusion]

2025-05-19 Thread via GitHub
leoyvens commented on code in PR #16100: URL: https://github.com/apache/datafusion/pull/16100#discussion_r2096272547 ## datafusion/common/src/config.rs: ## @@ -292,7 +292,9 @@ config_namespace! { /// target batch size is determined by the configuration setting

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-19 Thread via GitHub
duongcongtoai commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2891917519 There is one thing we know for sure should be implemented: detecting which nodes in the LogicalPlan AST are dependent join nodes. However, we don't need to create a new Log

Re: [I] Blog about DataFusion correlated subquery support [datafusion]

2025-05-19 Thread via GitHub
alamb commented on issue #16084: URL: https://github.com/apache/datafusion/issues/16084#issuecomment-2891905096 @Adez017 -- thank you -- maybe you can help review the post. I think we need to do more of the work before writing a post about it so this probably can't start for another few we

Re: [PR] chore: Remove SMJ experimental status in docs [datafusion]

2025-05-19 Thread via GitHub
comphead commented on PR #16072: URL: https://github.com/apache/datafusion/pull/16072#issuecomment-2891899456 Thanks @alamb

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2891868596 @logan-keede just finished up a benchmark for this PR here; - https://github.com/apache/datafusion/pull/16105 I will merge this PR up from main and run the benchmarks on it to

Re: [PR] minor: Add benchmark query and corresponding documentation for Average Duration [datafusion]

2025-05-19 Thread via GitHub
alamb merged PR #16105: URL: https://github.com/apache/datafusion/pull/16105

Re: [I] Add Extended Clickbench benchmark for avg(duration) [datafusion]

2025-05-19 Thread via GitHub
alamb closed issue #15949: Add Extended Clickbench benchmark for avg(duration) URL: https://github.com/apache/datafusion/issues/15949

Re: [PR] minor: Add benchmark query and corresponding documentation for Average Duration [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #16105: URL: https://github.com/apache/datafusion/pull/16105#issuecomment-2891864844 ```shell ./bench.sh run clickbench_extended ... Q8: SELECT "RegionID", "UserAgent", "OS", AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ResponseStartTiming")) as avg_res

Re: [PR] chore(CI) Upgrade toolchain to Rust-1.87 [datafusion]

2025-05-19 Thread via GitHub
alamb merged PR #16068: URL: https://github.com/apache/datafusion/pull/16068

Re: [I] Update workspace / CI to Rust 1.87 [datafusion]

2025-05-19 Thread via GitHub
alamb closed issue #16061: Update workspace / CI to Rust 1.87 URL: https://github.com/apache/datafusion/issues/16061

Re: [I] [Epic] Remove Sort Merge Join Experimental status [datafusion]

2025-05-19 Thread via GitHub
alamb closed issue #9846: [Epic] Remove Sort Merge Join Experimental status URL: https://github.com/apache/datafusion/issues/9846

Re: [PR] chore: Remove SMJ experimental status in docs [datafusion]

2025-05-19 Thread via GitHub
alamb merged PR #16072: URL: https://github.com/apache/datafusion/pull/16072

Re: [PR] docs: Add instructions for running individual Spark SQL tests from sbt [datafusion-comet]

2025-05-19 Thread via GitHub
andygrove commented on code in PR #1752: URL: https://github.com/apache/datafusion-comet/pull/1752#discussion_r2096248604 ## docs/source/contributor-guide/spark-sql-tests.md: ## @@ -65,6 +65,15 @@ ENABLE_COMET=true build/sbt "hive/testOnly * -- -l org.apache.spark.tags.Extende

Re: [PR] fix: stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-19 Thread via GitHub
alamb merged PR #16031: URL: https://github.com/apache/datafusion/pull/16031

Re: [I] Stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-19 Thread via GitHub
alamb closed issue #16030: Stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators URL: https://github.com/apache/datafusion/issues/16030

Re: [PR] fix: stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #16031: URL: https://github.com/apache/datafusion/pull/16031#issuecomment-2891850119 Thanks again everyone

Re: [PR] fix: Add coercion rules for Float16 types [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #15816: URL: https://github.com/apache/datafusion/pull/15816#issuecomment-2891848455 I took the liberty of adding some basic slt tests as well to get this PR moving and ready for merge

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2891843039 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2891842936 🤖: Benchmark completed Details ``` Comparing HEAD and concat_batches_for_sort Benchmark sort_tpch.json ┏━━━

Re: [PR] minor: Add benchmark query and corresponding documentation for Average Duration [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #16105: URL: https://github.com/apache/datafusion/pull/16105#issuecomment-2891815932 I am just running this quickly locally and then I'll merge it in

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2891815391 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] minor: Add benchmark query and corresponding documentation for Average Duration [datafusion]

2025-05-19 Thread via GitHub
alamb commented on code in PR #16105: URL: https://github.com/apache/datafusion/pull/16105#discussion_r2096222012 ## benchmarks/queries/clickbench/README.md: ## @@ -192,10 +193,46 @@ Results look like +-+--+--+--+ ``` +### Q8: Average Late

Re: [PR] add_docs_to_run_single_test [datafusion-comet]

2025-05-19 Thread via GitHub
coderfender commented on PR #1752: URL: https://github.com/apache/datafusion-comet/pull/1752#issuecomment-2891795103 @andygrove, please take a look whenever you get a chance. Thank you

[PR] feat: disable task stage plan binary cache [datafusion-ballista]

2025-05-19 Thread via GitHub
milenkovicm opened a new pull request, #1266: URL: https://github.com/apache/datafusion-ballista/pull/1266 # Which issue does this PR close? Closes #. # Rationale for this change In some cases stage plan should not be cached, as task plan may change, as there is no easy

Re: [I] Add imdb 10 rows slt test [datafusion]

2025-05-19 Thread via GitHub
alamb closed issue #15934: Add imdb 10 rows slt test URL: https://github.com/apache/datafusion/issues/15934

Re: [PR] Added SLT tests for IMDB benchmark queries [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #16067: URL: https://github.com/apache/datafusion/pull/16067#issuecomment-2891785070 I merged up and ran this locally and everything looks good still

Re: [PR] Added SLT tests for IMDB benchmark queries [datafusion]

2025-05-19 Thread via GitHub
alamb merged PR #16067: URL: https://github.com/apache/datafusion/pull/16067

[PR] add_docs_to_run_single_test [datafusion-comet]

2025-05-19 Thread via GitHub
coderfender opened a new pull request, #1752: URL: https://github.com/apache/datafusion-comet/pull/1752 ## Which issue does this PR close? Closes #1751 ## Rationale for this change Doc changes to help users run needed unit tests ## What changes are include

Re: [PR] Added SLT tests for IMDB benchmark queries [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #16067: URL: https://github.com/apache/datafusion/pull/16067#issuecomment-2891784584 🚀 Thank you @kumarlokesh and @jayzhan211 for the review

[I] Doc changes to provide steps to run individual unit tests in Apache-Spark [datafusion-comet]

2025-05-19 Thread via GitHub
coderfender opened a new issue, #1751: URL: https://github.com/apache/datafusion-comet/issues/1751 Doc changes to provide steps to run individual unit tests in Apache-Spark

Re: [PR] Label Spark functions PRs with spark label [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #16095: URL: https://github.com/apache/datafusion/pull/16095#issuecomment-2891779107 Welcome back @findepi 🤗

Re: [PR] Label Spark functions PRs with spark label [datafusion]

2025-05-19 Thread via GitHub
alamb merged PR #16095: URL: https://github.com/apache/datafusion/pull/16095

Re: [PR] Fix temp dir leak in tests [datafusion]

2025-05-19 Thread via GitHub
alamb merged PR #16094: URL: https://github.com/apache/datafusion/pull/16094

Re: [PR] fix: stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #16031: URL: https://github.com/apache/datafusion/pull/16031#issuecomment-2891772639 I plan to merge this PR to main once the CI has passed

Re: [PR] feat: metadata handling for aggregates and window functions [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #15911: URL: https://github.com/apache/datafusion/pull/15911#issuecomment-2891771248 I just merged a fix for CI on main, and remerged this PR. Hopefully it will now be good to go

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-19 Thread via GitHub
suibianwanwank commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2891769413 I'm not sure if I fully understand your point. In this paper, the decorrelation should be based on `DependentJoin`, as can be seen from the example relational algebra diagram

Re: [PR] Refactor substrait producer into multiple files [datafusion]

2025-05-19 Thread via GitHub
alamb merged PR #16089: URL: https://github.com/apache/datafusion/pull/16089

Re: [PR] Refactor substrait producer into multiple files [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #16089: URL: https://github.com/apache/datafusion/pull/16089#issuecomment-2891767670 Since this is just moving code around and is likely uncontroversial I am going to merge it in without waiting the customary 24 hours

Re: [PR] Fix CI on main: Add window function examples in code [datafusion]

2025-05-19 Thread via GitHub
alamb merged PR #16102: URL: https://github.com/apache/datafusion/pull/16102

Re: [PR] Fix CI on main: Add window function examples in code [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #16102: URL: https://github.com/apache/datafusion/pull/16102#issuecomment-2891764217 Thank you for the quick review @andygrove -- merging this in to unblock CI (we have a pile of PRs waiting to merge)

Re: [I] [Epic]: Google Summer of Code 2025 Improving Spilling Execution [datafusion]

2025-05-19 Thread via GitHub
alamb commented on issue #16065: URL: https://github.com/apache/datafusion/issues/16065#issuecomment-2891762929 > Stabilize external sort and aggregate. In my opinion, I suggest starting and finishing with external sort -- having a robust and performant external sort can be a key

Re: [PR] chore: Reduce repetition in the parameter type inference tests [datafusion]

2025-05-19 Thread via GitHub
jsai28 commented on PR #16079: URL: https://github.com/apache/datafusion/pull/16079#issuecomment-2891756937 @alamb Yep I just need to fix that clippy issue and then I'll add it to the rest of the tests. I'll probably just do it in this PR itself

Re: [PR] feat: Add auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-05-19 Thread via GitHub
parthchandra commented on PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#issuecomment-2891757995 Looking good so far. Will do a final review once it is ready.

Re: [PR] feat: metadata handling for aggregates and window functions [datafusion]

2025-05-19 Thread via GitHub
timsaucer commented on PR #15911: URL: https://github.com/apache/datafusion/pull/15911#issuecomment-2891757264 > > > Do you think it is feasible to update the scalar, aggregate, and window function APIs to use `FieldRef` instead of Field? That way we can avoid most string copies. > >

Re: [PR] chore: Add `scanImpl` attribute to `CometScanExec` [datafusion-comet]

2025-05-19 Thread via GitHub
andygrove commented on code in PR #1746: URL: https://github.com/apache/datafusion-comet/pull/1746#discussion_r2096187472 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2376,12 +2375,26 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Optimize performance of `string::ascii` function [datafusion]

2025-05-19 Thread via GitHub
Dandandan commented on code in PR #16087: URL: https://github.com/apache/datafusion/pull/16087#discussion_r2096162977 ## datafusion/functions/src/string/ascii.rs: ## @@ -103,19 +106,29 @@ impl ScalarUDFImpl for AsciiFunc { fn calculate_ascii<'a, V>(array: V) -> Result where

Re: [PR] feat: metadata handling for aggregates and window functions [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #15911: URL: https://github.com/apache/datafusion/pull/15911#issuecomment-2891735853 > > Do you think it is feasible to update the scalar, aggregate, and window function APIs to use `FieldRef` instead of Field? That way we can avoid most string copies. > > Do yo

Re: [PR] chore: Remove SMJ experimental status in docs [datafusion]

2025-05-19 Thread via GitHub
alamb commented on PR #16072: URL: https://github.com/apache/datafusion/pull/16072#issuecomment-2891733882 I also looked around for more docs about sort merge join or experimental and could not find them

Re: [PR] Optimize performance of `string::ascii` function [datafusion]

2025-05-19 Thread via GitHub
Dandandan commented on code in PR #16087: URL: https://github.com/apache/datafusion/pull/16087#discussion_r2096168025 ## datafusion/functions/src/string/ascii.rs: ## @@ -103,19 +103,22 @@ impl ScalarUDFImpl for AsciiFunc { fn calculate_ascii<'a, V>(array: V) -> Result where

Re: [PR] chore: More refactoring of type checking logic [datafusion-comet]

2025-05-19 Thread via GitHub
andygrove commented on code in PR #1744: URL: https://github.com/apache/datafusion-comet/pull/1744#discussion_r2096163931 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2518,6 +2518,15 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] chore: Add `scanImpl` attribute to `CometScanExec` [datafusion-comet]

2025-05-19 Thread via GitHub
parthchandra commented on code in PR #1746: URL: https://github.com/apache/datafusion-comet/pull/1746#discussion_r2096122217 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2376,12 +2375,26 @@ object QueryPlanSerde extends Logging with CometExprShim

Re: [PR] Optimize performance of `string::ascii` function [datafusion]

2025-05-19 Thread via GitHub
Dandandan commented on code in PR #16087: URL: https://github.com/apache/datafusion/pull/16087#discussion_r2096161845 ## datafusion/functions/src/string/ascii.rs: ## @@ -103,19 +103,22 @@ impl ScalarUDFImpl for AsciiFunc { fn calculate_ascii<'a, V>(array: V) -> Result where

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-19 Thread via GitHub
duongcongtoai commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2891716695 I wonder if creating `LogicalPlan::DependentJoin` makes sense in our case, if the only place it is being used is inside the optimizer that does the decorrelation. In du
