[I] update rust edition to 2024 [datafusion-ballista]

2025-06-14 Thread via GitHub
milenkovicm opened a new issue, #1271: URL: https://github.com/apache/datafusion-ballista/issues/1271 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** to keep up with latest rust edition we need to update rust to 2024 **D

Re: [PR] Improve ability to cancel queries quickly [datafusion]

2025-06-14 Thread via GitHub
pepijnve commented on PR #16301: URL: https://github.com/apache/datafusion/pull/16301#issuecomment-2973145860 Superseded by #16398 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Improve ability to cancel queries quickly [datafusion]

2025-06-14 Thread via GitHub
pepijnve closed pull request #16301: Improve ability to cancel queries quickly URL: https://github.com/apache/datafusion/pull/16301

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-14 Thread via GitHub
UBarney commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2147410263 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet, ///

[PR] fix: respect inexact flags in row group metadata [datafusion]

2025-06-14 Thread via GitHub
CookiePieWw opened a new pull request, #16412: URL: https://github.com/apache/datafusion/pull/16412 ## Which issue does this PR close? - Closes #15976. ## Rationale for this change As title, wrap the max and min stats with `Inexact` or `Exact` respecting

Re: [PR] Eliminate Self Joins [datafusion]

2025-06-14 Thread via GitHub
jonathanc-n commented on PR #16023: URL: https://github.com/apache/datafusion/pull/16023#issuecomment-2972877511 I will try to take a look at this this weekend @atahanyorganci

Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on PR #16390: URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2972940062 > After some thinking I realized that it will still not need `null_equals_null` support in the nestedloopjoin this will happen at planner time, and it is more of a question on whet

Re: [PR] Use Tokio's task budget consistently [datafusion]

2025-06-14 Thread via GitHub
pepijnve commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2973035858 Thinking about it some more. The evaluation type is intended to describe how the operator computes record batches itself: lazy on demand, or by driving things itself. I’m kind of tr

Re: [PR] Use Tokio's task budget consistently [datafusion]

2025-06-14 Thread via GitHub
pepijnve commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2973036248 Open to suggestions on better names for these properties.

[PR] chore: use DF scalar functions for StartsWith, EndsWith, Contains, DF LikeExpr [datafusion-comet]

2025-06-14 Thread via GitHub
mbutrovich opened a new pull request, #1887: URL: https://github.com/apache/datafusion-comet/pull/1887 ## Which issue does this PR close? Closes #. ## Rationale for this change The existing code uses deprecated kernels. I am working on Utf8View support fo

Re: [I] Dynamic pruning filters from TopK state (optimize `ORDER BY LIMIT` queries) [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on issue #15037: URL: https://github.com/apache/datafusion/issues/15037#issuecomment-2973296403 Really close now!

Re: [PR] chore: use DF scalar functions for StartsWith, EndsWith, Contains, DF LikeExpr [datafusion-comet]

2025-06-14 Thread via GitHub
codecov-commenter commented on PR #1887: URL: https://github.com/apache/datafusion-comet/pull/1887#issuecomment-2973321758 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1887?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-14 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2973326482 @Dandandan thank you so much for pushing this forward! Really appreciate the help and collaboration. Since you've been looking at the code do you have any thoughts on https:/

Re: [PR] feat: Support Parquet writer options [datafusion-python]

2025-06-14 Thread via GitHub
nuno-faria commented on code in PR #1123: URL: https://github.com/apache/datafusion-python/pull/1123#discussion_r2147168692 ## python/datafusion/dataframe.py: ## @@ -704,38 +694,135 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def wr

[PR] build(deps): bump datafusion-ffi from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-14 Thread via GitHub
dependabot[bot] opened a new pull request, #1147: URL: https://github.com/apache/datafusion-python/pull/1147 Bumps [datafusion-ffi](https://github.com/apache/datafusion) from 47.0.0 to 48.0.0. Commits https://github.com/apache/datafusion/commit/33a32d4382bee7e3c705d0f55d05c24a1

[PR] build(deps): bump datafusion-substrait from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-14 Thread via GitHub
dependabot[bot] opened a new pull request, #1148: URL: https://github.com/apache/datafusion-python/pull/1148 Bumps [datafusion-substrait](https://github.com/apache/datafusion) from 47.0.0 to 48.0.0. Commits https://github.com/apache/datafusion/commit/33a32d4382bee7e3c705d0f55d0

[PR] build(deps): bump prost-types from 0.13.5 to 0.14.0 [datafusion-python]

2025-06-14 Thread via GitHub
dependabot[bot] opened a new pull request, #1152: URL: https://github.com/apache/datafusion-python/pull/1152 Bumps [prost-types](https://github.com/tokio-rs/prost) from 0.13.5 to 0.14.0. Changelog Sourced from https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md prost-types'

[PR] build(deps): bump object_store from 0.12.1 to 0.12.2 [datafusion-python]

2025-06-14 Thread via GitHub
dependabot[bot] opened a new pull request, #1149: URL: https://github.com/apache/datafusion-python/pull/1149 Bumps [object_store](https://github.com/apache/arrow-rs-object-store) from 0.12.1 to 0.12.2. Changelog Sourced from https://github.com/apache/arrow-rs-object-store/blob/main

[PR] build(deps): bump datafusion-proto from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-14 Thread via GitHub
dependabot[bot] opened a new pull request, #1151: URL: https://github.com/apache/datafusion-python/pull/1151 Bumps [datafusion-proto](https://github.com/apache/datafusion) from 47.0.0 to 48.0.0. Commits https://github.com/apache/datafusion/commit/33a32d4382bee7e3c705d0f55d05c24

[PR] build(deps): bump datafusion from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-14 Thread via GitHub
dependabot[bot] opened a new pull request, #1150: URL: https://github.com/apache/datafusion-python/pull/1150 Bumps [datafusion](https://github.com/apache/datafusion) from 47.0.0 to 48.0.0. Commits https://github.com/apache/datafusion/commit/33a32d4382bee7e3c705d0f55d05c24a115a2

[PR] build(deps): bump prost from 0.13.5 to 0.14.0 [datafusion-python]

2025-06-14 Thread via GitHub
dependabot[bot] opened a new pull request, #1153: URL: https://github.com/apache/datafusion-python/pull/1153 Bumps [prost](https://github.com/tokio-rs/prost) from 0.13.5 to 0.14.0. Changelog Sourced from https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md prost's changelog

Re: [PR] Apply filter early in TopK [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on PR #16408: URL: https://github.com/apache/datafusion/pull/16408#issuecomment-2973088391 I applied the changes to https://github.com/apache/datafusion/pull/15770 let's continue there

Re: [PR] Apply filter early in TopK [datafusion]

2025-06-14 Thread via GitHub
Dandandan closed pull request #16408: Apply filter early in TopK URL: https://github.com/apache/datafusion/pull/16408

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2973096992 @alamb could you rerun the benchmarks? Maybe also run the topk benchmark (`topk_tpch`).

Re: [PR] Reduce some cloning [datafusion]

2025-06-14 Thread via GitHub
Dandandan merged PR #16404: URL: https://github.com/apache/datafusion/pull/16404

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-14 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2972849206 Let's merge your PR into here :)

Re: [PR] chore(deps): bump syn from 2.0.102 to 2.0.103 [datafusion]

2025-06-14 Thread via GitHub
comphead merged PR #16393: URL: https://github.com/apache/datafusion/pull/16393

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-14 Thread via GitHub
comphead commented on code in PR #16409: URL: https://github.com/apache/datafusion/pull/16409#discussion_r2147442023 ## datafusion/sqllogictest/test_files/spark/datetime/current_timezone.slt: ## @@ -0,0 +1,22 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-14 Thread via GitHub
shehabgamin commented on code in PR #16409: URL: https://github.com/apache/datafusion/pull/16409#discussion_r2147445532 ## datafusion/sqllogictest/test_files/spark/datetime/current_timezone.slt: ## @@ -0,0 +1,22 @@ +# Licensed to the Apache Software Foundation (ASF) under one +#

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-14 Thread via GitHub
comphead commented on PR #16409: URL: https://github.com/apache/datafusion/pull/16409#issuecomment-2973512315 Got it @shehabgamin I'm seeing a lot of `slt` tests like ``` #S #E #query #L ``` which are not very explanatory. For testing `spark` integ

Re: [PR] Add design process section to the docs [datafusion]

2025-06-14 Thread via GitHub
comphead commented on code in PR #16397: URL: https://github.com/apache/datafusion/pull/16397#discussion_r2147429880 ## docs/source/contributor-guide/index.md: ## @@ -108,6 +108,26 @@ Features above) prior to acceptance include: [extensions list]: ../library-user-guide/extensio

Re: [PR] fix: Fixed error handling for `generate_series/range` [datafusion]

2025-06-14 Thread via GitHub
comphead commented on code in PR #16391: URL: https://github.com/apache/datafusion/pull/16391#discussion_r2147430655 ## datafusion/functions-table/src/generate_series.rs: ## @@ -197,11 +197,18 @@ impl TableFunctionImpl for GenerateSeriesFuncImpl { } let mut n

Re: [PR] fix: Fixed error handling for `generate_series/range` [datafusion]

2025-06-14 Thread via GitHub
comphead commented on code in PR #16391: URL: https://github.com/apache/datafusion/pull/16391#discussion_r2147430611 ## datafusion/functions-table/src/generate_series.rs: ## @@ -197,11 +197,18 @@ impl TableFunctionImpl for GenerateSeriesFuncImpl { } let mut n

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-14 Thread via GitHub
shehabgamin commented on PR #16409: URL: https://github.com/apache/datafusion/pull/16409#issuecomment-2973501683 > Thanks @shehabgamin appreciate if you can provide more insight on the PR details. It looks like a lot of tests commented out mostly from `spark` domain and PR rationale didn't

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-14 Thread via GitHub
comphead commented on PR #16409: URL: https://github.com/apache/datafusion/pull/16409#issuecomment-2973513172 https://github.com/apache/datafusion-comet/blob/6bf80b107cc1574cb7f259719d0aa203e387efc4/spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala#L182 This is

Re: [PR] fix: Fixed error handling for `generate_series/range` [datafusion]

2025-06-14 Thread via GitHub
jonathanc-n commented on code in PR #16391: URL: https://github.com/apache/datafusion/pull/16391#discussion_r2147445833 ## datafusion/functions-table/src/generate_series.rs: ## @@ -197,11 +197,18 @@ impl TableFunctionImpl for GenerateSeriesFuncImpl { } let mu

Re: [PR] [datafusion-spark] Example of using Spark compatible function library [datafusion]

2025-06-14 Thread via GitHub
comphead merged PR #16384: URL: https://github.com/apache/datafusion/pull/16384

Re: [I] [datafusion-spark] Example of using Spark compatible function library [datafusion]

2025-06-14 Thread via GitHub
comphead closed issue #15915: [datafusion-spark] Example of using Spark compatible function library URL: https://github.com/apache/datafusion/issues/15915

Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-14 Thread via GitHub
comphead closed issue #16383: Can't publish datafusion-spark crate due to error URL: https://github.com/apache/datafusion/issues/16383

Re: [PR] Simplify expressions passed to table functions [datafusion]

2025-06-14 Thread via GitHub
comphead merged PR #16388: URL: https://github.com/apache/datafusion/pull/16388

Re: [PR] fix: Fixed error handling for `generate_series/range` [datafusion]

2025-06-14 Thread via GitHub
comphead commented on code in PR #16391: URL: https://github.com/apache/datafusion/pull/16391#discussion_r2147446520 ## datafusion/functions-table/src/generate_series.rs: ## @@ -197,11 +197,18 @@ impl TableFunctionImpl for GenerateSeriesFuncImpl { } let mut n

Re: [PR] Use Tokio's task budget consistently [datafusion]

2025-06-14 Thread via GitHub
ozankabak commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2973514966 > Thinking about it some more. The evaluation type is intended to describe how the operator computes record batches itself: lazy on demand, or by driving things itself. I’m kind of

Re: [I] Table function supports non-literal args [datafusion]

2025-06-14 Thread via GitHub
comphead closed issue #14958: Table function supports non-literal args URL: https://github.com/apache/datafusion/issues/14958

Re: [I] Improve sql planing performance (optimize `try_process_unnest`) [datafusion]

2025-06-14 Thread via GitHub
comphead closed issue #16242: Improve sql planing performance (optimize `try_process_unnest`) URL: https://github.com/apache/datafusion/issues/16242

Re: [PR] Add fast paths for try_process_unnest [datafusion]

2025-06-14 Thread via GitHub
comphead merged PR #16389: URL: https://github.com/apache/datafusion/pull/16389

Re: [PR] fix: Fixed error handling for `generate_series/range` [datafusion]

2025-06-14 Thread via GitHub
jonathanc-n commented on code in PR #16391: URL: https://github.com/apache/datafusion/pull/16391#discussion_r2147447123 ## datafusion/functions-table/src/generate_series.rs: ## @@ -197,11 +197,18 @@ impl TableFunctionImpl for GenerateSeriesFuncImpl { } let mu

Re: [PR] Add note in upgrade guide about changes to `Expr::Scalar` in 48.0.0 [datafusion]

2025-06-14 Thread via GitHub
comphead commented on PR #16360: URL: https://github.com/apache/datafusion/pull/16360#issuecomment-2973517490 https://github.com/apache/datafusion/issues/16414

[I] Automate `Upgrade Guide` on top of most recent `deprecated` methods [datafusion]

2025-06-14 Thread via GitHub
comphead opened a new issue, #16414: URL: https://github.com/apache/datafusion/issues/16414 Thanks @alamb I'm creating a ticket to generate an `upgrade` guide or part of it from the `deprecated` attribute _Originally posted by @comphead in https://github.com/apache/datafusion/pull

Re: [I] Automate `Upgrade Guide` on top of most recent `deprecated` methods [datafusion]

2025-06-14 Thread via GitHub
comphead commented on issue #16414: URL: https://github.com/apache/datafusion/issues/16414#issuecomment-2973518335 It would be nice having the `upgrade guide` updated automatically by analyzing `deprecated` attributes by `since` param and attach the details to the guide by running a script,

[I] Add documentation to clarify algorithms for Mark Joins [datafusion]

2025-06-14 Thread via GitHub
jonathanc-n opened a new issue, #16415: URL: https://github.com/apache/datafusion/issues/16415 ### Is your feature request related to a problem or challenge? > that would be nice to show as an example, it is challenging to read algorithms for upcoming contributors, but it can be done

Re: [I] Add documentation to clarify algorithms for Mark Joins [datafusion]

2025-06-14 Thread via GitHub
jonathanc-n commented on issue #16415: URL: https://github.com/apache/datafusion/issues/16415#issuecomment-2973518470 take

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-14 Thread via GitHub
jonathanc-n commented on code in PR #16083: URL: https://github.com/apache/datafusion/pull/16083#discussion_r2147448608 ## datafusion/physical-plan/src/joins/symmetric_hash_join.rs: ## @@ -818,6 +822,20 @@ where .collect(); (build_indices, probe_ind

Re: [PR] Apply filter early in TopK [datafusion]

2025-06-14 Thread via GitHub
adriangb commented on PR #16408: URL: https://github.com/apache/datafusion/pull/16408#issuecomment-2972860809 Can you make a PR on our fork and we merge this into the main PR?

Re: [PR] Apply filter early in TopK [datafusion]

2025-06-14 Thread via GitHub
adriangb commented on PR #16408: URL: https://github.com/apache/datafusion/pull/16408#issuecomment-2972862984 Or you can just push to the main PR, I gave you write access to our fork :) My one question is: how does this optimization play with filter pushdown? If a child plan accepted

Re: [PR] Apply filter early in TopK [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on PR #16408: URL: https://github.com/apache/datafusion/pull/16408#issuecomment-2972888776 > Or you can just push to the main PR, I gave you write access to our fork :) > > My one question is: how does this optimization play with filter pushdown? If a child plan ac

Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-14 Thread via GitHub
jonathanc-n commented on PR #16390: URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2973341692 @Dandandan Yes, it was mentioned in the original pull request. I concluded that if it is better for the nested loop join to take an equijoin condition (where one table's rows are

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-14 Thread via GitHub
comphead commented on code in PR #16083: URL: https://github.com/apache/datafusion/pull/16083#discussion_r2147425621 ## datafusion/physical-plan/src/joins/symmetric_hash_join.rs: ## @@ -818,6 +822,20 @@ where .collect(); (build_indices, probe_indice

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-14 Thread via GitHub
comphead commented on PR #16083: URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2973494086 Thanks for this contribution, I'm planning to keep this PR open a little longer to see if there is any other feedback

Re: [PR] Update Roadmap documentation [datafusion]

2025-06-14 Thread via GitHub
comphead commented on code in PR #16399: URL: https://github.com/apache/datafusion/pull/16399#discussion_r2147428069 ## docs/source/contributor-guide/roadmap.md: ## @@ -46,81 +46,12 @@ make review efficient and avoid surprises. # Quarterly Roadmap -A quarterly roadmap will

Re: [PR] feat: add SchemaProvider::table_type(table_name: &str) [datafusion]

2025-06-14 Thread via GitHub
comphead commented on code in PR #16401: URL: https://github.com/apache/datafusion/pull/16401#discussion_r2147428867 ## datafusion/catalog/src/schema.rs: ## @@ -54,6 +55,14 @@ pub trait SchemaProvider: Debug + Sync + Send { name: &str, ) -> Result>, DataFusionError

Re: [PR] feat: add SchemaProvider::table_type(table_name: &str) [datafusion]

2025-06-14 Thread via GitHub
comphead commented on code in PR #16401: URL: https://github.com/apache/datafusion/pull/16401#discussion_r2147429310 ## datafusion/catalog/src/schema.rs: ## @@ -54,6 +55,14 @@ pub trait SchemaProvider: Debug + Sync + Send { name: &str, ) -> Result>, DataFusionError

Re: [PR] Add design process section to the docs [datafusion]

2025-06-14 Thread via GitHub
comphead commented on code in PR #16397: URL: https://github.com/apache/datafusion/pull/16397#discussion_r2147429475 ## docs/source/contributor-guide/index.md: ## @@ -108,6 +108,26 @@ Features above) prior to acceptance include: [extensions list]: ../library-user-guide/extensio

[PR] chore: release datafusion 47.0.0 [datafusion-ballista]

2025-06-14 Thread via GitHub
milenkovicm opened a new pull request, #1269: URL: https://github.com/apache/datafusion-ballista/pull/1269 # Which issue does this PR close? Closes #. # Rationale for this change generate a changelog for ballista 47 release # What changes are included in this

Re: [PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-06-14 Thread via GitHub
theirix commented on code in PR #16325: URL: https://github.com/apache/datafusion/pull/16325#discussion_r2146926102 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4714,3 +4714,115 @@ fn test_using_join_wildcard_schema() { ] ); } + +#[test] Review Comment:

Re: [PR] fix: Move `null_equals_null` todo in `NestedLoopJoin` to Physical Planner [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on PR #16390: URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2973547633 > @Dandandan Yes, it was mentioned in the original pull request. I concluded that if it is better for the nested loop join to take an equijoin condition (where one table's rows are

Re: [PR] Support remaining pipe operators [datafusion-sqlparser-rs]

2025-06-14 Thread via GitHub
iffyio commented on code in PR #1879: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1879#discussion_r2147453124 ## src/parser/mod.rs: ## @@ -9947,6 +9947,51 @@ impl<'a> Parser<'a> { Ok(IdentWithAlias { ident, alias }) } +/// Parse `identifier [

Re: [PR] fix: Fixed error handling for `generate_series/range` [datafusion]

2025-06-14 Thread via GitHub
jonathanc-n commented on code in PR #16391: URL: https://github.com/apache/datafusion/pull/16391#discussion_r2147461767 ## datafusion/functions-table/src/generate_series.rs: ## @@ -197,11 +197,18 @@ impl TableFunctionImpl for GenerateSeriesFuncImpl { } let mu

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2972642194 Results of applying this in TopK itself: https://github.com/apache/datafusion/pull/16408#issue-3145903557

Re: [I] Blog post about parquet vs custom file formats [datafusion]

2025-06-14 Thread via GitHub
JigaoLuo commented on issue #16149: URL: https://github.com/apache/datafusion/issues/16149#issuecomment-2972640469 Hi @alamb @zhuqi-lucas , I recently encountered an issue and it is very nice. Thanks. I am also curious: **Why would uncompressed Parquet be considered an optimization

Re: [PR] Chore: implement hour func as ScalarUDFImpl [datafusion-comet]

2025-06-14 Thread via GitHub
codecov-commenter commented on PR #1874: URL: https://github.com/apache/datafusion-comet/pull/1874#issuecomment-2972716900 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1874?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-14 Thread via GitHub
shehabgamin commented on PR #16409: URL: https://github.com/apache/datafusion/pull/16409#issuecomment-2972618052 Not sure if it makes sense to commit the script I used, so I'll paste it here for now: ``` """ WARNING: - This script extracts only basic, straightforward tests.

[PR] chore: generate basic spark function tests [datafusion]

2025-06-14 Thread via GitHub
shehabgamin opened a new pull request, #16409: URL: https://github.com/apache/datafusion/pull/16409 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [PR] Use Tokio's task budget consistently [datafusion]

2025-06-14 Thread via GitHub
ozankabak commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2972621416 > While I was writing this I started wondering if evaluation type should be a per child thing. In my spawn experiment branch for instance hash join is eager for the build side, but

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-14 Thread via GitHub
Chen-Yuan-Lai commented on code in PR #16324: URL: https://github.com/apache/datafusion/pull/16324#discussion_r2146791719 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() { let formatted = arrow::util::prett

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2972552039 I am reapplying my PR on top of this branch, I'll report my results.

Re: [PR] Chore: implement hour func as ScalarUDFImpl [datafusion-comet]

2025-06-14 Thread via GitHub
trompa commented on PR #1874: URL: https://github.com/apache/datafusion-comet/pull/1874#issuecomment-2972493584 @mbutrovich I've locally moved the hour/minute/seconds implementation to a macro. I could potentially reuse it for other functions I don't see implemented: day/month/year

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2972769559 Combined with changes in my PR (https://github.com/apache/datafusion/pull/16408), this is looking sweet for the TopK benchmarks: ``` Benchmark run_topk_tpch.json -

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-14 Thread via GitHub
mbutrovich commented on code in PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#discussion_r2146947020 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -307,6 +307,18 @@ object CometConf extends ShimCometConf { .booleanConf .crea

[PR] Add topk benchmark [datafusion]

2025-06-14 Thread via GitHub
Dandandan opened a new pull request, #16410: URL: https://github.com/apache/datafusion/pull/16410 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

[I] Add TopK benchmark [datafusion]

2025-06-14 Thread via GitHub
Dandandan opened a new issue, #16411: URL: https://github.com/apache/datafusion/issues/16411 ### Is your feature request related to a problem or challenge? We want to extend topk test to test it a bit more extensively. ### Describe the solution you'd like Add a command to

Re: [PR] Add topk_tpch benchmark [datafusion]

2025-06-14 Thread via GitHub
Dandandan merged PR #16410: URL: https://github.com/apache/datafusion/pull/16410

Re: [I] Add TopK benchmark [datafusion]

2025-06-14 Thread via GitHub
Dandandan closed issue #16411: Add TopK benchmark URL: https://github.com/apache/datafusion/issues/16411

Re: [PR] Add topk_tpch benchmark [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on PR #16410: URL: https://github.com/apache/datafusion/pull/16410#issuecomment-2972782892 Thanks for the quick review @adriangb

[PR] Apply filter in TopK [datafusion]

2025-06-14 Thread via GitHub
Dandandan opened a new pull request, #16408: URL: https://github.com/apache/datafusion/pull/16408 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2146805677 ## datafusion/common/src/config.rs: ## @@ -614,6 +614,13 @@ config_namespace! { /// during aggregations, if possible pub enable_topk_aggregatio

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-14 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2146810315 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Optimize TopK with threshold filter ~1.4x speedup [datafusion]

2025-06-14 Thread via GitHub
Dandandan closed pull request #15697: Optimize TopK with threshold filter ~1.4x speedup URL: https://github.com/apache/datafusion/pull/15697

Re: [PR] Optimize TopK with threshold filter ~1.4x speedup [datafusion]

2025-06-14 Thread via GitHub
Dandandan commented on PR #15697: URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2972592373 This will be replaced by https://github.com/apache/datafusion/pull/16408 with the PR from @adriangb

[I] [Blog] Proposal: Add categorical-tags to blogs for better navigation [datafusion]

2025-06-14 Thread via GitHub
JigaoLuo opened a new issue, #16407: URL: https://github.com/apache/datafusion/issues/16407 Hi datafusion team, First, thank you for consistently publishing [high-quality blogs](https://datafusion.apache.org/blog/)! I appreciate the effort behind them. Feedback & Suggestions:

[PR] chore: update datafusion to 48 [datafusion-ballista]

2025-06-14 Thread via GitHub
milenkovicm opened a new pull request, #1270: URL: https://github.com/apache/datafusion-ballista/pull/1270 # Which issue does this PR close? Closes #. # Rationale for this change - keep up with datafusion release # What changes are included in this PR?