Re: [PR] Fix TopK Sort incorrectly pushed down past Join with anti join [datafusion]

2025-07-01 Thread via GitHub
zhuqi-lucas commented on code in PR #16641: URL: https://github.com/apache/datafusion/pull/16641#discussion_r2178965043 ## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ## @@ -668,6 +668,15 @@ fn handle_hash_join( plan: &HashJoinExec, parent_requi

Re: [PR] Reuse Rows allocation in RowCursorStream [datafusion]

2025-07-01 Thread via GitHub
Dandandan commented on PR #16647: URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3026648547 FYI @zhuqi-lucas @acking-you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-07-01 Thread via GitHub
zhuqi-lucas commented on code in PR #16630: URL: https://github.com/apache/datafusion/pull/16630#discussion_r2179270018 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -288,6 +288,64 @@ impl CursorArray for StringViewArray { } } +/// Todo use arrow-rs side api aft
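The PR title refers to comparing `StringViewArray` values through an inline key. In the Arrow view layout, a string of up to 12 bytes is stored entirely inside its 16-byte view, so two such strings can be ordered by a single integer comparison. Below is a minimal sketch of that idea. It assumes a hypothetical `inline_sort_key` helper and is not the arrow-rs or DataFusion API. The helper zero-pads the bytes to 12 and appends the length as a big-endian tie-breaker, which makes `u128` order match byte-wise lexicographic order.

```rust
use std::cmp::Ordering;

/// Strings up to this many bytes are stored inline in an Arrow string view.
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical helper (not an arrow-rs API): pack a short string into a u128
/// whose integer order equals the lexicographic byte order of the string.
/// Layout, most significant first: 12 zero-padded data bytes, then the length
/// as a big-endian u32 so a shorter prefix sorts before a longer string.
fn inline_sort_key(s: &[u8]) -> Option<u128> {
    if s.len() > MAX_INLINE_LEN {
        return None; // not inlined: caller must fall back to a full comparison
    }
    let mut buf = [0u8; 16];
    buf[..s.len()].copy_from_slice(s);
    buf[12..].copy_from_slice(&(s.len() as u32).to_be_bytes());
    Some(u128::from_be_bytes(buf))
}

/// Compare via the packed keys when both inputs are short, else byte-wise.
fn compare(a: &[u8], b: &[u8]) -> Ordering {
    match (inline_sort_key(a), inline_sort_key(b)) {
        (Some(ka), Some(kb)) => ka.cmp(&kb),
        _ => a.cmp(b),
    }
}

fn main() {
    let words: [&[u8]; 4] = [b"apple", b"app", b"b", b"app\0"];
    // The fast path must agree with plain lexicographic comparison.
    for x in words {
        for y in words {
            assert_eq!(compare(x, y), x.cmp(y));
        }
    }
    println!("packed-key order matches byte order");
}
```

The payoff is that the common short-string case avoids touching the data buffers at all. Longer strings still need the fallback path.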

Re: [PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-07-01 Thread via GitHub
zhuqi-lucas commented on PR #16630: URL: https://github.com/apache/datafusion/pull/16630#issuecomment-3026686262 > Thanks @zhuqi-lucas -- this looks wonderful > > I kicked off some benchmark runs. I think we'll be ready to merge once the benchmarks are done and we add some tests. >

Re: [PR] docs: Update benchmark results for 0.9.0 [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove merged PR #1959: URL: https://github.com/apache/datafusion-comet/pull/1959

Re: [I] Wrong results when `pushdown_filters` is enabled (starting in 48.0.0) [datafusion]

2025-07-01 Thread via GitHub
ianthetechie commented on issue #16588: URL: https://github.com/apache/datafusion/issues/16588#issuecomment-3024817979 @adriangb The dataset is loaded here: https://github.com/ianthetechie/datafusion-predicate-pushdown-mre/blob/0abc24cd2f728a88e7496f0294554f0858d7c79b/src/context.rs#L12. It

Re: [I] Run DataFusion benchmarks regularly and track performance history over time [datafusion]

2025-07-01 Thread via GitHub
alamb commented on issue #5504: URL: https://github.com/apache/datafusion/issues/5504#issuecomment-3024823095 I just ran the clickbench benchmark scripts against the last 100 or so commits of DataFusion -- you can see the results here https://alamb.github.io/datafusion-benchmarking/

Re: [I] Make iceberg integration tests optional [datafusion-comet]

2025-07-01 Thread via GitHub
parthchandra commented on issue #1964: URL: https://github.com/apache/datafusion-comet/issues/1964#issuecomment-3024824685 > One idea I have is to only run these workflows if the PR title contains `[iceberg]`. I like this idea. Run either the Spark SQL tests or the Iceberg tests dep

Re: [I] Make iceberg integration tests optional [datafusion-comet]

2025-07-01 Thread via GitHub
hsiang-c commented on issue #1964: URL: https://github.com/apache/datafusion-comet/issues/1964#issuecomment-3024829533 @andygrove @parthchandra We can make it optional for now and meanwhile I'll try to split the tests into smaller chunks to speed up CI.

Re: [I] Update ClickBench benchmarks with next DataFusion [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on issue #14587: URL: https://github.com/apache/datafusion/issues/14587#issuecomment-3025036814 This can likely be closed; I'll create a new ticket for DF 49.

Re: [PR] feat: implement predicate adaptation for nested structs [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on code in PR #16589: URL: https://github.com/apache/datafusion/pull/16589#discussion_r2178261730 ## datafusion/physical-expr-adapter/src/schema_rewriter.rs: ## @@ -97,13 +101,111 @@ impl<'a> PhysicalExprSchemaRewriter<'a> { &self, expr: Arc,

Re: [I] Update ClickBench benchmarks with DataFusion 49.0.0 (When Published) [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on issue #16643: URL: https://github.com/apache/datafusion/issues/16643#issuecomment-3025048655 Given the number of improvements that have been made to DF in the last 3 releases, I'm expecting that DF should show an improvement in the ClickBench benchmark (even with https:

Re: [PR] [ignore] Remove Comet ANSI config [datafusion-comet]

2025-07-01 Thread via GitHub
codecov-commenter commented on PR #1969: URL: https://github.com/apache/datafusion-comet/pull/1969#issuecomment-3025047223 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1969?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-07-01 Thread via GitHub
friendlymatthew commented on PR #15361: URL: https://github.com/apache/datafusion/pull/15361#issuecomment-3025056915 > @friendlymatthew - I think this actually should be reopened and just be updated to the previous version of the code then merged in - I think it's a worthy addition to DF. I

[PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-07-01 Thread via GitHub
friendlymatthew opened a new pull request, #15361: URL: https://github.com/apache/datafusion/pull/15361 - Closes https://github.com/apache/datafusion/issues/14536 ## Rationale for this change Datafusion currently errs when attempting to format a date using time-related specifie

[I] Update ClickBench benchmarks with DataFusion 49.0.0 (When Published) [datafusion]

2025-07-01 Thread via GitHub
Omega359 opened a new issue, #16643: URL: https://github.com/apache/datafusion/issues/16643 ### Is your feature request related to a problem or challenge? Follow on to https://github.com/apache/datafusion/issues/14246 Requires https://github.com/apache/datafusion/

Re: [I] Support `from_unixtime(ts, [fmt])` [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on issue #16577: URL: https://github.com/apache/datafusion/issues/16577#issuecomment-3025061285 > Right now it is > > ``` > from_unixtime(expression[, timezone]) > ``` > > So I guess it will be like > > ``` > from_unixtime(expression[, fmt, tim

Re: [I] is it possible to make a async UDTF that read rows from other database? [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on issue #16601: URL: https://github.com/apache/datafusion/issues/16601#issuecomment-3025066808 Please use the discussions to ask questions :)

Re: [PR] Migrate core test to insta, part 2 [datafusion]

2025-07-01 Thread via GitHub
blaginin commented on code in PR #16617: URL: https://github.com/apache/datafusion/pull/16617#discussion_r2178600922 ## datafusion/core/tests/physical_optimizer/sanity_checker.rs: ## @@ -445,10 +443,15 @@ async fn test_bounded_window_agg_no_sort_requirement() -> Result<()> {

Re: [PR] respect parquet filter pushdown config in scan [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on PR #16646: URL: https://github.com/apache/datafusion/pull/16646#issuecomment-3025642930 I can confirm this new test fails without the change: ``` ❯ cargo test -p datafusion-sqllogictest --test sqllogictests -- parquet_filter_pushdown Compiling datafusion

[PR] Run Iceberg Spark tests only when PR title contains [iceberg] [datafusion-comet]

2025-07-01 Thread via GitHub
hsiang-c opened a new pull request, #1976: URL: https://github.com/apache/datafusion-comet/pull/1976 ## Which issue does this PR close? Closes #. https://github.com/apache/datafusion-comet/issues/1964 ## Rationale for this change Make Iceberg Spark CI opti

Re: [PR] Run Iceberg Spark tests only when PR title contains [iceberg] [datafusion-comet]

2025-07-01 Thread via GitHub
hsiang-c commented on code in PR #1976: URL: https://github.com/apache/datafusion-comet/pull/1976#discussion_r2178604906 ## .github/workflows/iceberg_spark_test.yml: ## @@ -41,6 +41,7 @@ env: jobs: iceberg-spark-sql: +if: contains(github.event.pull_request.title, '[ice

Re: [PR] Support explain tree format debug for benchmark debug [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #16604: URL: https://github.com/apache/datafusion/pull/16604#issuecomment-3025591249 Thanks again @zhuqi-lucas

Re: [PR] Migrate core test to insta, part 2 [datafusion]

2025-07-01 Thread via GitHub
blaginin commented on code in PR #16617: URL: https://github.com/apache/datafusion/pull/16617#discussion_r2178602504 ## datafusion/core/tests/physical_optimizer/sanity_checker.rs: ## @@ -445,10 +443,15 @@ async fn test_bounded_window_agg_no_sort_requirement() -> Result<()> {

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-01 Thread via GitHub
alamb commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2178596921 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,402 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

[PR] Reuse Rows in RowCursorStream [datafusion]

2025-07-01 Thread via GitHub
Dandandan opened a new pull request, #16647: URL: https://github.com/apache/datafusion/pull/16647 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [PR] respect parquet filter pushdown config in scan [datafusion]

2025-07-01 Thread via GitHub
alamb commented on code in PR #16646: URL: https://github.com/apache/datafusion/pull/16646#discussion_r2178609697 ## datafusion/datasource-parquet/src/source.rs: ## @@ -343,9 +343,7 @@ impl ParquetSource { } /// If true, the predicate will be used during the parquet

Re: [I] Wrong results when `pushdown_filters` is enabled (starting in 48.0.0) [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on issue #16588: URL: https://github.com/apache/datafusion/issues/16588#issuecomment-3025643782 https://github.com/apache/datafusion/pull/16646

Re: [PR] respect parquet filter pushdown config in scan [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on code in PR #16646: URL: https://github.com/apache/datafusion/pull/16646#discussion_r2178616664 ## datafusion/datasource-parquet/src/source.rs: ## @@ -647,6 +651,7 @@ impl FileSource for ParquetSource { None => conjunction(allowed_filters.iter()

Re: [PR] respect parquet filter pushdown config in scan [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on code in PR #16646: URL: https://github.com/apache/datafusion/pull/16646#discussion_r2178616357 ## datafusion/datasource-parquet/src/source.rs: ## @@ -343,9 +343,7 @@ impl ParquetSource { } /// If true, the predicate will be used during the parqu

Re: [I] [EPIC] Implement expressions as ScalarUDFImpl [datafusion-comet]

2025-07-01 Thread via GitHub
parthchandra commented on issue #1819: URL: https://github.com/apache/datafusion-comet/issues/1819#issuecomment-3025402294 @Kimahriman what you say makes sense. For `GetStructField` and `GetArrayStructFields` it may not be useful to convert to ScalarUDFImpl

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-07-01 Thread via GitHub
huaxingao commented on PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#issuecomment-3024594602 Thanks @andygrove @kazuyukitanimura

Re: [PR] feat: Add from_unixtime support [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove commented on PR #1943: URL: https://github.com/apache/datafusion-comet/pull/1943#issuecomment-3024627259 Here is a suggested test to add to `CometFuzzTestSuite`: ```scala test("from_unix_time") { val df = spark.read.parquet(filename) df.createOrReplaceTemp

Re: [PR] Restore topk filtering tests [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on PR #16501: URL: https://github.com/apache/datafusion/pull/16501#issuecomment-3024615890 @Dandandan @alamb @AdamGS can you review and verify you agree with the proposed change to the tests? Would love to merge this and close this chapter in preparation for the next rele

Re: [PR] perf: Reuse row converter during sort [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on PR #15302: URL: https://github.com/apache/datafusion/pull/15302#issuecomment-3024777988 Was there a benchmark created and run that covered this change? I'm wondering if it's noteworthy enough to include in the [DF 49 blog post](https://github.com/apache/datafusion/issu

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3024465732 Personally, I prefer option 2; however, it does involve more work for maintainers, at least at the point of syncing main into LTS. Backporting of changes/fixes from main -> LTS

Re: [PR] Auto start testcontainers for `datafusion-cli` [datafusion]

2025-07-01 Thread via GitHub
blaginin commented on PR #16644: URL: https://github.com/apache/datafusion/pull/16644#issuecomment-3025840494 fyi @Omega359 -- as you brought test containers initially 👀

Re: [I] Wrong results when `pushdown_filters` is enabled (starting in 48.0.0) [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on issue #16588: URL: https://github.com/apache/datafusion/issues/16588#issuecomment-3024719052 @ianthetechie I looked at your example repo (thanks so much for providing an MRE) but I don't see the data anywhere. Maybe I'm missing something but is `categories` a known dat

Re: [PR] feat: Add from_unixtime support [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove commented on code in PR #1943: URL: https://github.com/apache/datafusion-comet/pull/1943#discussion_r2177902711 ## spark/src/main/scala/org/apache/comet/serde/unixtime.scala: ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[I] Investigate Guava duplicated classes [datafusion-comet]

2025-07-01 Thread via GitHub
kazuyukitanimura opened a new issue, #1968: URL: https://github.com/apache/datafusion-comet/issues/1968 ### Describe the bug We now have duplicated classes that we ignore ``` com.google.guava guava

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-07-01 Thread via GitHub
kazuyukitanimura commented on code in PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#discussion_r2178160745 ## pom.xml: ## @@ -1074,9 +1074,19 @@ under the License. javax.annotation.meta.TypeQualifierNickname

Re: [PR] Support multiple ordered array_agg aggregations [datafusion]

2025-07-01 Thread via GitHub
alamb commented on code in PR #16625: URL: https://github.com/apache/datafusion/pull/16625#discussion_r2178510104 ## datafusion/sqllogictest/test_files/group_by.slt: ## @@ -2903,7 +2909,8 @@ logical_plan physical_plan 01)ProjectionExec: expr=[country@0 as country, first_value

Re: [PR] Support explain tree format debug for benchmark debug [datafusion]

2025-07-01 Thread via GitHub
alamb merged PR #16604: URL: https://github.com/apache/datafusion/pull/16604

Re: [I] Support explain tree format debug for benchmark debug [datafusion]

2025-07-01 Thread via GitHub
alamb closed issue #16603: Support explain tree format debug for benchmark debug URL: https://github.com/apache/datafusion/issues/16603

Re: [PR] Support multiple ordered array_agg aggregations [datafusion]

2025-07-01 Thread via GitHub
ozankabak commented on PR #16625: URL: https://github.com/apache/datafusion/pull/16625#issuecomment-3025538219 Agreed, I think this will need some time to brew. As I said previously, I hope to get some more eyes on this in the short term (maybe early next week)

Re: [PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #16630: URL: https://github.com/apache/datafusion/pull/16630#issuecomment-3025612675 🤖: Benchmark completed Details ``` Comparing HEAD and fast_sort_with_inlined_fast_key Benchmark clickbench_extended.json --

Re: [PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #16630: URL: https://github.com/apache/datafusion/pull/16630#issuecomment-3025612783 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #16630: URL: https://github.com/apache/datafusion/pull/16630#issuecomment-3025615238 🤖: Benchmark completed Details ``` Comparing HEAD and fast_sort_with_inlined_fast_key Benchmark sort_tpch.json

Re: [PR] fix: [ignore] Remove auto scan fallback for Spark 4.0.0 [datafusion-comet]

2025-07-01 Thread via GitHub
codecov-commenter commented on PR #1975: URL: https://github.com/apache/datafusion-comet/pull/1975#issuecomment-3025599293 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1975?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Migrate core test to insta, part 2 [datafusion]

2025-07-01 Thread via GitHub
blaginin commented on code in PR #16617: URL: https://github.com/apache/datafusion/pull/16617#discussion_r2178589084 ## datafusion/core/tests/physical_optimizer/limited_distinct_aggregation.rs: ## @@ -162,24 +173,31 @@ async fn test_single_global() -> Result<()> { Some(

[PR] respect parquet filter pushdown config in scan [datafusion]

2025-07-01 Thread via GitHub
adriangb opened a new pull request, #16646: URL: https://github.com/apache/datafusion/pull/16646 Fixes #16625

Re: [PR] respect parquet filter pushdown config in scan [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on PR #16646: URL: https://github.com/apache/datafusion/pull/16646#issuecomment-3025629159 I'm trying to understand why this isn't caught by `datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt`. Might have to wait until tomorrow.

Re: [PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on PR #15361: URL: https://github.com/apache/datafusion/pull/15361#issuecomment-3024951447 @friendlymatthew - I think this actually should be reopened and just be updated to the previous version of the code then merged in - I think it's a worthy addition to DF. I think th

Re: [I] from_unixtime does not work for large values [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on issue #16594: URL: https://github.com/apache/datafusion/issues/16594#issuecomment-3025120468 DF delegates to arrow for this which uses Chrono. https://docs.rs/chrono/latest/src/chrono/naive/date/mod.rs.html#1431 Switching to a different library such as jiff will likely

Re: [I] Feature Request: Implement `MATCH_RECOGNIZE` for Advanced Pattern Matching [datafusion]

2025-07-01 Thread via GitHub
geoffreyclaude commented on issue #13583: URL: https://github.com/apache/datafusion/issues/13583#issuecomment-3025188787 take

Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-3025182224 Thank you for working on this

Re: [I] Add ANSI support for Remainder [datafusion-comet]

2025-07-01 Thread via GitHub
rishvin commented on issue #532: URL: https://github.com/apache/datafusion-comet/issues/532#issuecomment-3025165684 Opened PR: https://github.com/apache/datafusion-comet/pull/1971

[PR] chore: ANSI support for remainder operation. [datafusion-comet]

2025-07-01 Thread via GitHub
rishvin opened a new pull request, #1971: URL: https://github.com/apache/datafusion-comet/pull/1971 ## Which issue does this PR close? Closes #532 ## Rationale for this change Spark in ANSI mode can throw division by zero exception for remainder, however
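The rationale above depends on how Spark evaluates integer `%` when the divisor is zero. Below is a minimal Rust sketch of those semantics for 32-bit integers. It uses a hypothetical `spark_remainder` function and `DivideByZero` error type, neither of which is taken from Comet's `modulo_expr.rs`. With ANSI mode off, a zero divisor yields NULL (`None`); with ANSI mode on, it is an error. The sketch uses `wrapping_rem` because Java, and therefore Spark, evaluates `i32::MIN % -1` as `0` instead of overflowing.

```rust
/// Hypothetical stand-in for Spark's DIVIDE_BY_ZERO error.
#[derive(Debug, PartialEq)]
struct DivideByZero;

/// Hypothetical sketch of Spark-style `a % b` on 32-bit ints.
/// - divisor zero, non-ANSI: NULL (None)
/// - divisor zero, ANSI: error
/// - otherwise: truncated remainder (sign follows the dividend, as in Java);
///   wrapping_rem makes i32::MIN % -1 return 0 instead of panicking.
fn spark_remainder(a: i32, b: i32, ansi_mode: bool) -> Result<Option<i32>, DivideByZero> {
    if b == 0 {
        return if ansi_mode { Err(DivideByZero) } else { Ok(None) };
    }
    Ok(Some(a.wrapping_rem(b)))
}

fn main() {
    println!("{:?}", spark_remainder(-7, 3, false)); // Ok(Some(-1))
    println!("{:?}", spark_remainder(5, 0, false)); // Ok(None)
    println!("{:?}", spark_remainder(5, 0, true)); // Err(DivideByZero)
}
```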

Re: [PR] Make `GenericDialect` support from-first syntax [datafusion-sqlparser-rs]

2025-07-01 Thread via GitHub
alamb commented on PR #1911: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1911#issuecomment-3025197204 The code keeps on rolling

Re: [I] Fix listagg-collation.sql test in Spark 4.0.0 [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove commented on issue #1947: URL: https://github.com/apache/datafusion-comet/issues/1947#issuecomment-3025198415 Debug output showing the plans: ``` AdaptiveSparkPlan isFinalPlan=false +- ObjectHashAggregate(keys=[], functions=[listagg(c1#97, null, collate(c1#97, utf8_bi

Re: [I] Feature Request: Implement `MATCH_RECOGNIZE` for Advanced Pattern Matching [datafusion]

2025-07-01 Thread via GitHub
geoffreyclaude commented on issue #13583: URL: https://github.com/apache/datafusion/issues/13583#issuecomment-3025194205 I've got a pretty terrible hackathon POC which more or less works. I'll clean it up and hopefully open up a draft PR by the end of the week for follow up discussions!

Re: [PR] chore: ANSI support for remainder operation. [datafusion-comet]

2025-07-01 Thread via GitHub
rishvin commented on code in PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971#discussion_r2178359783 ## native/spark-expr/src/math_funcs/modulo_expr.rs: ## @@ -0,0 +1,513 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] [datafusion-spark] Implement Spark `luhn_check` function [datafusion]

2025-07-01 Thread via GitHub
shehabgamin commented on PR #16580: URL: https://github.com/apache/datafusion/pull/16580#issuecomment-3025703724 > Will review by Tuesday! > > Thank you @tlm365 🚀 Looks like this went back into draft mode, feel free to ping me whenever for review!

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-3025176564 Thank you @corwinjoy

Re: [PR] Run Iceberg Spark tests only when PR title contains [iceberg] [datafusion-comet]

2025-07-01 Thread via GitHub
hsiang-c commented on code in PR #1976: URL: https://github.com/apache/datafusion-comet/pull/1976#discussion_r2178692973 ## .github/workflows/iceberg_spark_test.yml: ## @@ -41,6 +41,7 @@ env: jobs: iceberg-spark-sql: +if: contains(github.event.pull_request.title, '[ice

Re: [I] Support GROUP BY and DISTINCT with FixedSizeList values [datafusion]

2025-07-01 Thread via GitHub
findepi commented on issue #16442: URL: https://github.com/apache/datafusion/issues/16442#issuecomment-3024541889 also needs https://github.com/apache/arrow-rs/pull/7789

Re: [PR] Restore topk filtering tests [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on PR #16501: URL: https://github.com/apache/datafusion/pull/16501#issuecomment-3024609927 Since there seems to be agreement on the path forward I pushed 514ab74e0 which I think achieves the goal by simply changing `SELECT *` to `SELECT `. Then we can continue to assert th

Re: [I] Make iceberg integration tests optional [datafusion-comet]

2025-07-01 Thread via GitHub
kazuyukitanimura commented on issue #1964: URL: https://github.com/apache/datafusion-comet/issues/1964#issuecomment-3024768690 Is it possible to split the tests to make each run shorter and easier to retry? @hsiang-c There were issues with Iceberg integrations, so I think CI is important.

Re: [PR] Convert Option> to Vec [datafusion]

2025-07-01 Thread via GitHub
findepi commented on code in PR #16615: URL: https://github.com/apache/datafusion/pull/16615#discussion_r2178199885 ## datafusion/expr/src/expr.rs: ## @@ -1175,26 +1175,26 @@ impl Exists { /// User Defined Aggregate Function /// -/// See [`udaf::AggregateUDF`] for more infor

Re: [I] Release Comet 0.9.0 (June/July 2025) [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove commented on issue #1856: URL: https://github.com/apache/datafusion-comet/issues/1856#issuecomment-3024975347 Blog post PR: https://github.com/apache/datafusion-site/pull/78

Re: [PR] perf: Optimize `AvgDecimalGroupsAccumulator` [datafusion-comet]

2025-07-01 Thread via GitHub
parthchandra commented on code in PR #1893: URL: https://github.com/apache/datafusion-comet/pull/1893#discussion_r2178204401 ## native/spark-expr/src/agg_funcs/avg_decimal.rs: ## @@ -341,29 +337,16 @@ impl AvgDecimalGroupsAccumulator { } } -fn is_overflow(&se

Re: [I] Release Comet 0.9.0 (June/July 2025) [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove commented on issue #1856: URL: https://github.com/apache/datafusion-comet/issues/1856#issuecomment-3024971566 I have started the release vote

[I] Release Comet 0.10.0 (August 2025) [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove opened a new issue, #1970: URL: https://github.com/apache/datafusion-comet/issues/1970 ### What is the problem the feature request solves? I would like to start planning the Comet 0.10.0 release. Let's use this issue to co-ordinate on any issues or PRs to resolve for the rel

Re: [PR] fix: enable listagg test for Spark 4.0.0 [wip] [datafusion-comet]

2025-07-01 Thread via GitHub
codecov-commenter commented on PR #1965: URL: https://github.com/apache/datafusion-comet/pull/1965#issuecomment-3024897081 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1965?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Enable `auto` scan mode for Spark 4.0.0 [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove opened a new issue, #1967: URL: https://github.com/apache/datafusion-comet/issues/1967 ### What is the problem the feature request solves? For the initial Spark 4.0.0 support, we disabled the `auto` scan mode: ```scala private def selectScan(scanExec: FileSourceSc

Re: [PR] fix: support scalar function nested in get_field in Unparser [datafusion]

2025-07-01 Thread via GitHub
alamb merged PR #16610: URL: https://github.com/apache/datafusion/pull/16610

Re: [PR] fix: support scalar function nested in get_field in Unparser [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #16610: URL: https://github.com/apache/datafusion/pull/16610#issuecomment-3025428193 Thanks @chenkovsky and @goldmedal

[PR] fix: [ignore] Remove auto scan fallback for Spark 4.0.0 [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove opened a new pull request, #1975: URL: https://github.com/apache/datafusion-comet/pull/1975 ## Which issue does this PR close? Closes #. ## Rationale for this change Testing / scoping out future work. ## What changes are included in this P

Re: [PR] Fix TopK Sort incorrectly pushed down past Join with anti join [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on code in PR #16641: URL: https://github.com/apache/datafusion/pull/16641#discussion_r2177988731 ## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ## @@ -668,6 +668,15 @@ fn handle_hash_join( plan: &HashJoinExec, parent_required

Re: [PR] Implementation for regex_instr [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on PR #15928: URL: https://github.com/apache/datafusion/pull/15928#issuecomment-3025423773 I'll pull this branch later this week and run the tests but in general this PR is looking pretty good! I left a few comments/suggestions for a few things I found from a quick review

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-3025443600 High level it is described in https://github.com/apache/datafusion/issues/15037 I think @adriangb is considering a writeup about it directly here: - https://github.com/apache

Re: [PR] restore topk pre-filtering of batches and make sort query fuzzer less sensitive to expected non determinism [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #16501: URL: https://github.com/apache/datafusion/pull/16501#issuecomment-3025439277 Amazing -- thank you @adriangb and @Dandandan

Re: [PR] fix: enable listagg test for Spark 4.0.0 [wip] [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove closed pull request #1965: fix: enable listagg test for Spark 4.0.0 [wip] URL: https://github.com/apache/datafusion-comet/pull/1965

Re: [PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #16630: URL: https://github.com/apache/datafusion/pull/16630#issuecomment-3025481489 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [I] Wrong results when `pushdown_filters` is enabled (starting in 48.0.0) [datafusion]

2025-07-01 Thread via GitHub
adriangb commented on issue #16588: URL: https://github.com/apache/datafusion/issues/16588#issuecomment-3025452309 I have a fix

Re: [PR] Apply pre-selection and computation skipping to short-circuit optimization [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #15694: URL: https://github.com/apache/datafusion/pull/15694#issuecomment-3025430123 > Is there a benchmark showcasing this change (and #15462) against previous behaviour? I'd like to showcase this a bit in the [DF 49 blog post](https://github.com/apache/datafusion/iss

Re: [I] Wrong results when `pushdown_filters` is enabled (starting in 48.0.0) [datafusion]

2025-07-01 Thread via GitHub
alamb commented on issue #16588: URL: https://github.com/apache/datafusion/issues/16588#issuecomment-3025434537 Thank you for looking into this @adriangb

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-3025461502 Awesome, thanks. Yes, I think a full blog post for this would be amazing but I'd still like to have it in the release blog post with a short summary and maybe a few stats.

Re: [I] SQL Unparser Can Not Process Struct Of Struct Access When Generating Sql [datafusion]

2025-07-01 Thread via GitHub
alamb closed issue #16607: SQL Unparser Can Not Process Struct Of Struct Access When Generating Sql URL: https://github.com/apache/datafusion/issues/16607

Re: [PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-07-01 Thread via GitHub
alamb commented on code in PR #16630: URL: https://github.com/apache/datafusion/pull/16630#discussion_r2178499065 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -288,6 +288,64 @@ impl CursorArray for StringViewArray { } } +/// Todo use arrow-rs side api after: <

Re: [PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-07-01 Thread via GitHub
alamb commented on PR #15361: URL: https://github.com/apache/datafusion/pull/15361#issuecomment-3025499081 Thanks @Omega359 and @friendlymatthew -- when you are happy with it, please ping me and I will give it a final review

Re: [PR] chore(deps): bump substrait from 0.57.0 to 0.58.0 [datafusion]

2025-07-01 Thread via GitHub
comphead merged PR #16640: URL: https://github.com/apache/datafusion/pull/16640

Re: [PR] feat: Add from_unixtime support [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove commented on code in PR #1943: URL: https://github.com/apache/datafusion-comet/pull/1943#discussion_r2177922388 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1605,6 +1605,41 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] Convert Option> to Vec [datafusion]

2025-07-01 Thread via GitHub
ViggoC commented on code in PR #16615: URL: https://github.com/apache/datafusion/pull/16615#discussion_r2178075914 ## datafusion/expr/src/expr.rs: ## @@ -1175,26 +1175,26 @@ impl Exists { /// User Defined Aggregate Function /// -/// See [`udaf::AggregateUDF`] for more inform

[I] Remove `COMET_ANSI_MODE_ENABLED` config [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove opened a new issue, #1974: URL: https://github.com/apache/datafusion-comet/issues/1974 ### What is the problem the feature request solves? `COMET_ANSI_MODE_ENABLED` was added early on during testing with Spark 4.0.0 to ensure that queries would fall back to Spark by default

Re: [PR] [ignore] Remove Comet ANSI config [datafusion-comet]

2025-07-01 Thread via GitHub
andygrove closed pull request #1969: [ignore] Remove Comet ANSI config URL: https://github.com/apache/datafusion-comet/pull/1969

[I] Reuse CometArrowAllocator instead of creating a new RootAllocator [datafusion-comet]

2025-07-01 Thread via GitHub
EmilyMatt opened a new issue, #1972: URL: https://github.com/apache/datafusion-comet/issues/1972 ### What is the problem the feature request solves? Just removes the allocator from the CometArrowConverters, since we already have a RootAllocator in the package file ### Describe

Re: [PR] Implementation for regex_instr [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on code in PR #15928: URL: https://github.com/apache/datafusion/pull/15928#discussion_r2178421432 ## datafusion/functions/src/regex/mod.rs: ## @@ -60,6 +65,34 @@ pub mod expr_fn { super::regexp_match().call(args) } +/// Returns index of reg

Re: [PR] Implementation for regex_instr [datafusion]

2025-07-01 Thread via GitHub
Omega359 commented on code in PR #15928: URL: https://github.com/apache/datafusion/pull/15928#discussion_r2178423784 ## datafusion/functions/src/regex/mod.rs: ## @@ -89,7 +122,44 @@ pub fn functions() -> Vec> { vec![ regexp_count(), regexp_match(), +

[PR] Improve display format of BoundedWindowAggExec [datafusion]

2025-07-01 Thread via GitHub
geetanshjuneja opened a new pull request, #16645: URL: https://github.com/apache/datafusion/pull/16645 ## Which issue does this PR close? Closes #15758 - Closes #. ## Rationale for this change We can see Debug format in this explain statement, turn it to Displa