Re: [PR] feat(spark): implement Spark `map` function `map_from_entries` [datafusion]

2025-09-27 Thread via GitHub
comphead commented on code in PR #17779: URL: https://github.com/apache/datafusion/pull/17779#discussion_r2384270261 ## datafusion/sqllogictest/test_files/spark/map/map_from_entries.slt: ## @@ -0,0 +1,119 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or mor

Re: [I] Mismatched `min` value for -0.0 and 0.0 [datafusion-comet]

2025-09-27 Thread via GitHub
comphead commented on issue #2448: URL: https://github.com/apache/datafusion-comet/issues/2448#issuecomment-3341996202 I don't see the normalized code being called; still checking, but Spark and DF have different views on min ``` scala> spark.sql("select min(a) from (select cast(-0.0 as flo

Re: [I] Potential performance regression with `parquet 56.1.0` / data ranges [datafusion]

2025-09-27 Thread via GitHub
XiangpengHao commented on issue #17575: URL: https://github.com/apache/datafusion/issues/17575#issuecomment-3341985618 I don't remember either (I thought it was fixed, but obviously not...) -- This is an automated message from the Apache Git Service. To respond to the message, please log o

[PR] build(deps): bump arrow from 56.1.0 to 56.2.0 [datafusion-python]

2025-09-27 Thread via GitHub
dependabot[bot] opened a new pull request, #1255: URL: https://github.com/apache/datafusion-python/pull/1255 Bumps [arrow](https://github.com/apache/arrow-rs) from 56.1.0 to 56.2.0. Release notes Sourced from [arrow's releases](https://github.com/apache/arrow-rs/releases). ar

Re: [PR] chore: refactor usage of `reassign_predicate_columns` [datafusion]

2025-09-27 Thread via GitHub
rkrishn7 commented on PR #17703: URL: https://github.com/apache/datafusion/pull/17703#issuecomment-3341968990 @alamb Updated w/ your suggestions!

Re: [PR] chore: refactor usage of `reassign_predicate_columns` [datafusion]

2025-09-27 Thread via GitHub
rkrishn7 commented on code in PR #17703: URL: https://github.com/apache/datafusion/pull/17703#discussion_r2384274536 ## datafusion/physical-expr/src/utils/mod.rs: ## @@ -238,22 +238,26 @@ pub fn collect_columns(expr: &Arc) -> HashSet { columns } -/// Re-assign column in

Re: [PR] chore: refactor usage of `reassign_predicate_columns` [datafusion]

2025-09-27 Thread via GitHub
rkrishn7 commented on code in PR #17703: URL: https://github.com/apache/datafusion/pull/17703#discussion_r2384272903 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -764,24 +751,24 @@ impl FileScanConfig { eq_properties: &mut EquivalenceProperties, sche

Re: [PR] feat(spark): implement Spark `map` function `map_from_entries` [datafusion]

2025-09-27 Thread via GitHub
comphead commented on code in PR #17779: URL: https://github.com/apache/datafusion/pull/17779#discussion_r2384270558 ## datafusion/sqllogictest/test_files/spark/map/map_from_entries.slt: ## Review Comment: Yeah, there is a test from ([], null) we can also have otherwise (nu

Re: [PR] Impl spark bit not function [datafusion]

2025-09-27 Thread via GitHub
kazantsev-maksim closed pull request #17155: Impl spark bit not function URL: https://github.com/apache/datafusion/pull/17155

Re: [PR] Optimize CASE expressions by removing WHEN false branches [datafusion]

2025-09-27 Thread via GitHub
petern48 commented on PR #17628: URL: https://github.com/apache/datafusion/pull/17628#issuecomment-3341936613 I don't mind taking it if you'd rather save your time for other things. Should be pretty straightforward for me since I did the previous WHEN TRUE PR. I'll leave that up to you @ala

[PR] Feat: Add sha1 function impl [datafusion-comet]

2025-09-27 Thread via GitHub
kazantsev-maksim opened a new pull request, #2471: URL: https://github.com/apache/datafusion-comet/pull/2471 ## Which issue does this PR close? Part of: https://github.com/apache/datafusion-comet/issues/2443

Re: [PR] Make most `pyclasses` frozen [datafusion-python]

2025-09-27 Thread via GitHub
timsaucer commented on PR #1252: URL: https://github.com/apache/datafusion-python/pull/1252#issuecomment-3341824898 This and https://github.com/apache/datafusion-python/pull/1253 appear to be duplicates. @kosiew and @ntjohnson1 do you have thoughts about merging or closing one?

Re: [PR] Make most `pyclasses` frozen [datafusion-python]

2025-09-27 Thread via GitHub
ntjohnson1 commented on PR #1252: URL: https://github.com/apache/datafusion-python/pull/1252#issuecomment-3341836589 > This and #1253 appear to be duplicates. @kosiew and @ntjohnson1 do you have thoughts about merging or closing one? #1253 looks way more comprehensive so should be me

Re: [PR] chore: Use checked arithmetic in unified memory pool accounting [WIP] [datafusion-comet]

2025-09-27 Thread via GitHub
andygrove closed pull request #2454: chore: Use checked arithmetic in unified memory pool accounting [WIP] URL: https://github.com/apache/datafusion-comet/pull/2454

Re: [I] Can't read a directory of CSV files / CSV schema evolution: incorrect number of fields for line 1, expected 17 got 20 [datafusion]

2025-09-27 Thread via GitHub
EeshanBembi commented on issue #17516: URL: https://github.com/apache/datafusion/issues/17516#issuecomment-3341707275 Yes @alamb, sorry I was not able to work on this earlier; I have addressed the review comments now!

Re: [PR] feat: data source sampling via extension [datafusion]

2025-09-27 Thread via GitHub
theirix commented on code in PR #17633: URL: https://github.com/apache/datafusion/pull/17633#discussion_r2384019257 ## datafusion-examples/examples/table_sample.rs: ## @@ -0,0 +1,1353 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] feat: data source sampling via extension [datafusion]

2025-09-27 Thread via GitHub
vegarsti commented on code in PR #17633: URL: https://github.com/apache/datafusion/pull/17633#discussion_r2383936943 ## datafusion-examples/examples/table_sample.rs: ## @@ -0,0 +1,1252 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

Re: [I] Make `generate_series` compatible with PostgreSQL [datafusion]

2025-09-27 Thread via GitHub
Jefffrey commented on issue #13316: URL: https://github.com/apache/datafusion/issues/13316#issuecomment-3341572652 Closed by #13540 (following duckdb behaviour)

Re: [I] Make `generate_series` compatible with PostgreSQL [datafusion]

2025-09-27 Thread via GitHub
Jefffrey closed issue #13316: Make `generate_series` compatible with PostgreSQL URL: https://github.com/apache/datafusion/issues/13316

Re: [I] Slow aggregrate query with `array_agg`, Polars is 4 times faster for equal query [datafusion]

2025-09-27 Thread via GitHub
duongcongtoai commented on issue #17446: URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3341564409 ``` hyperfine "~/proj/rust/build/release-nonlto/datafusion-cli-opt -f report.sql" "~/proj/rust/build/release-nonlto/datafusion-cli-49.0.0 -f report.sql" "~/proj/rust/b

[PR] More decimal 32/64 support - type coercsion and misc gaps [datafusion]

2025-09-27 Thread via GitHub
AdamGS opened a new pull request, #17808: URL: https://github.com/apache/datafusion/pull/17808 ## Which issue does this PR close? This is a followup of #17501 and #17489 (Should I open a dedicated issue?). The changes are taken out of https://github.com/apache/datafusion/pull/17

Re: [I] `sql_planner` benchmark panic'ing on main [datafusion]

2025-09-27 Thread via GitHub
pepijnve commented on issue #17801: URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3341545330 I’ll try to understand what was broken in this change later today.

Re: [I] Result from `array_has` depends on runtime array type [datafusion]

2025-09-27 Thread via GitHub
Jefffrey commented on issue #16459: URL: https://github.com/apache/datafusion/issues/16459#issuecomment-3341485635 I tried reproducing this, both on current main (2f54f3033263bd00a7694de68b065f0d1e899243) and also on e1716f91c9794f2717130963c27f5e4202e9abe2 which was the latest commit on `

Re: [I] `sql_planner` benchmark panic'ing on main [datafusion]

2025-09-27 Thread via GitHub
alamb commented on issue #17801: URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3341442438 You beat me to it: ``` (venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ git bisect bad e5dcc8c04f9559f8af6efea3c7ff8202f3c1c618 is the first bad commit

Re: [PR] feat: data source sampling via extension [datafusion]

2025-09-27 Thread via GitHub
theirix commented on code in PR #17633: URL: https://github.com/apache/datafusion/pull/17633#discussion_r2384016887 ## datafusion-examples/examples/table_sample.rs: ## @@ -0,0 +1,1252 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] Blog: Add blog post about DataFusion 50.0.0 release [datafusion-site]

2025-09-27 Thread via GitHub
mbutrovich commented on code in PR #115: URL: https://github.com/apache/datafusion-site/pull/115#discussion_r2382572800 ## content/blog/2025-09-29-datafusion-50.0.0.md: ## @@ -0,0 +1,413 @@ +--- +layout: post +title: Apache DataFusion 50.0.0 Released +date: 2025-09-29 +author: p

Re: [PR] Feat : Bringing in support for map_filter expression. [datafusion-comet]

2025-09-27 Thread via GitHub
codetyri0n closed pull request #2236: Feat : Bringing in support for map_filter expression. URL: https://github.com/apache/datafusion-comet/pull/2236

Re: [I] `sql_planner` benchmark panic'ing on main [datafusion]

2025-09-27 Thread via GitHub
pepijnve commented on issue #17801: URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3341394840 Bisect points to https://github.com/apache/datafusion/commit/e5dcc8c04f9559f8af6efea3c7ff8202f3c1c618

Re: [PR] dev: Add Apache license check to the lint script [datafusion]

2025-09-27 Thread via GitHub
2010YOUY01 commented on code in PR #17787: URL: https://github.com/apache/datafusion/pull/17787#discussion_r2383969174 ## .github/workflows/dev.yml: ## @@ -33,7 +33,10 @@ jobs: name: Check License Header steps: - uses: actions/checkout@08c6903cd8c0fde910a37f8832

Re: [PR] dev: Add Apache license check to the lint script [datafusion]

2025-09-27 Thread via GitHub
2010YOUY01 commented on PR #17787: URL: https://github.com/apache/datafusion/pull/17787#issuecomment-3341352688 Thanks for the review @comphead @alamb

Re: [I] `sql_planner` benchmark panic'ing on main [datafusion]

2025-09-27 Thread via GitHub
pepijnve commented on issue #17801: URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3341345796 🤦‍♂️ internal clone `main` != upstream `main`. Running again on `5bbdb7eb114d19e9a45f993af4ab2d2535c0cfbc`

Re: [I] `sql_planner` benchmark panic'ing on main [datafusion]

2025-09-27 Thread via GitHub
pepijnve commented on issue #17801: URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3341344131 I ran `cargo bench --profile dev --bench sql_planner -- physical_plan_tpcds_all` locally this morning. No panic on `0d60ccae40d0e8f2d22c15fafb01c5d4be8c63a6`.

Re: [PR] chore: refactor usage of `reassign_predicate_columns` [datafusion]

2025-09-27 Thread via GitHub
rkrishn7 commented on PR #17703: URL: https://github.com/apache/datafusion/pull/17703#issuecomment-3339934947 Hey @alamb sorry I've been quite busy this week. I plan to address your comments either today or tomorrow!

Re: [PR] feat: data source sampling via extension [datafusion]

2025-09-26 Thread via GitHub
vegarsti commented on code in PR #17633: URL: https://github.com/apache/datafusion/pull/17633#discussion_r2383934243 ## datafusion-examples/examples/table_sample.rs: ## @@ -0,0 +1,1252 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

Re: [PR] feat: data source sampling via extension [datafusion]

2025-09-26 Thread via GitHub
vegarsti commented on code in PR #17633: URL: https://github.com/apache/datafusion/pull/17633#discussion_r2383933775 ## datafusion-examples/examples/table_sample.rs: ## @@ -0,0 +1,1252 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

Re: [PR] docs: update wasmtest README with instructions for Apple silicon [datafusion]

2025-09-26 Thread via GitHub
Jefffrey merged PR #17755: URL: https://github.com/apache/datafusion/pull/17755

Re: [I] Add `array_min` function support [datafusion]

2025-09-26 Thread via GitHub
Jefffrey commented on issue #14416: URL: https://github.com/apache/datafusion/issues/14416#issuecomment-3341207807 Implemented by #16574

[PR] chore: remove dialect settings in SLT tests that are outdated [datafusion]

2025-09-26 Thread via GitHub
Jefffrey opened a new pull request, #17807: URL: https://github.com/apache/datafusion/pull/17807 Closes #16516 Closes #15719 These two issues have been supported for a while it seems, with default generic SQL dialect allowing filters on aggregates. Fix the SLT tests to not need to

Re: [I] Add `array_min` function support [datafusion]

2025-09-26 Thread via GitHub
Jefffrey closed issue #14416: Add `array_min` function support URL: https://github.com/apache/datafusion/issues/14416

Re: [I] Add SedonaDB to list of known issues [datafusion]

2025-09-26 Thread via GitHub
Jefffrey closed issue #17794: Add SedonaDB to list of known issues URL: https://github.com/apache/datafusion/issues/17794

Re: [PR] docs: Add SedonaDB as known user of Apache DataFusion [datafusion]

2025-09-26 Thread via GitHub
Jefffrey merged PR #17806: URL: https://github.com/apache/datafusion/pull/17806

Re: [PR] docs: Add SedonaDB as known user of Apache DataFusion [datafusion]

2025-09-26 Thread via GitHub
jiayuasu commented on code in PR #17806: URL: https://github.com/apache/datafusion/pull/17806#discussion_r2383848553 ## docs/source/user-guide/introduction.md: ## @@ -121,11 +120,11 @@ Here are some active projects using DataFusion: - [Parseable] Log storage and observability p

Re: [PR] docs: Add SedonaDB as known user of Apache DataFusion [datafusion]

2025-09-26 Thread via GitHub
petern48 commented on code in PR #17806: URL: https://github.com/apache/datafusion/pull/17806#discussion_r2383810825 ## docs/source/user-guide/introduction.md: ## @@ -121,11 +120,11 @@ Here are some active projects using DataFusion: - [Parseable] Log storage and observability p

Re: [PR] perf: boolean group values implementations [datafusion]

2025-09-26 Thread via GitHub
kosiew commented on code in PR #17726: URL: https://github.com/apache/datafusion/pull/17726#discussion_r2383807325 ## datafusion/physical-plan/src/aggregates/group_values/multi_group_by/boolean.rs: ## @@ -0,0 +1,475 @@ +// Licensed to the Apache Software Foundation (ASF) under o

[PR] docs: Add SedonaDB as known user to Apache DataFusion [datafusion]

2025-09-26 Thread via GitHub
petern48 opened a new pull request, #17806: URL: https://github.com/apache/datafusion/pull/17806 ## Which issue does this PR close? - Closes #17794 ## Rationale for this change It's public knowledge now! ## What changes are included in this PR?

Re: [PR] [Backport] Fix potential overflow when we print verbose physical plan [datafusion]

2025-09-26 Thread via GitHub
zhuqi-lucas merged PR #17804: URL: https://github.com/apache/datafusion/pull/17804

Re: [PR] chore(deps): bump prost from 0.13.5 to 0.14.1 in the proto group [datafusion]

2025-09-26 Thread via GitHub
dependabot[bot] commented on PR #17752: URL: https://github.com/apache/datafusion/pull/17752#issuecomment-3340897860 This pull request was built based on a group rule. Closing it will not ignore any of these versions in future pull requests. To ignore these dependencies, configure [ig

Re: [PR] minor: Improve the log message of `CometTestBase#checkCometOperators` [datafusion-comet]

2025-09-26 Thread via GitHub
mbutrovich merged PR #2458: URL: https://github.com/apache/datafusion-comet/pull/2458

Re: [PR] docs: refine `AggregateUDFImpl::is_ordered_set_aggregate` documentation [datafusion]

2025-09-26 Thread via GitHub
Jefffrey commented on code in PR #17805: URL: https://github.com/apache/datafusion/pull/17805#discussion_r2383774417 ## datafusion/expr/src/udaf.rs: ## @@ -742,18 +742,23 @@ pub trait AggregateUDFImpl: Debug + DynEq + DynHash + Send + Sync { /// If this function is ordered

Re: [PR] Fix potential overflow when we print verbose physical plan [datafusion]

2025-09-26 Thread via GitHub
Jefffrey commented on code in PR #17798: URL: https://github.com/apache/datafusion/pull/17798#discussion_r2383733928 ## datafusion/core/src/physical_planner.rs: ## @@ -2135,7 +2135,15 @@ impl DefaultPhysicalPlanner { "Optimized physical plan:\n{}\n", di

[PR] [Backport] Fix potential overflow when we print verbose physical plan [datafusion]

2025-09-26 Thread via GitHub
zhuqi-lucas opened a new pull request, #17804: URL: https://github.com/apache/datafusion/pull/17804 ## Which issue does this PR close? Backport: Fix potential overflow when we print verbose physical plan https://github.com/apache/datafusion/pull/17798 ## Rationale for this c

Re: [PR] feat: support table sample [datafusion]

2025-09-26 Thread via GitHub
github-actions[bot] closed pull request #16505: feat: support table sample URL: https://github.com/apache/datafusion/pull/16505

[PR] chore(deps): bump taiki-e/install-action from 2.62.8 to 2.62.9 [datafusion]

2025-09-26 Thread via GitHub
dependabot[bot] opened a new pull request, #17799: URL: https://github.com/apache/datafusion/pull/17799 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.8 to 2.62.9. Release notes Sourced from https://github.com/taiki-e/install-action/releases t

Re: [PR] fix: ignore `DataType::Null` in possible types during csv type inference [datafusion]

2025-09-26 Thread via GitHub
dqkqd commented on PR #17796: URL: https://github.com/apache/datafusion/pull/17796#issuecomment-3341017167 The test failed. It ensures that an empty table has its columns inferred as `Utf8`. DuckDB does the same, so I think this is correct. ```bash D CREATE TABLE empty AS SELECT

Re: [PR] Add support for Float16 type in substrait [datafusion]

2025-09-26 Thread via GitHub
github-actions[bot] commented on PR #16793: URL: https://github.com/apache/datafusion/pull/16793#issuecomment-3341010919 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Fix potential overflow when we print verbose physical plan [datafusion]

2025-09-26 Thread via GitHub
zhuqi-lucas commented on PR #17798: URL: https://github.com/apache/datafusion/pull/17798#issuecomment-3340986856 Thank you @alamb and @Jefffrey for review. I think it's valuable for 50.1.0, let me create a backport to 50.1.0. > Thanks @zhuqi-lucas > > Do you think it would be

Re: [I] TDigest quantile estimate algorithm correctness [datafusion]

2025-09-26 Thread via GitHub
Jefffrey commented on issue #17803: URL: https://github.com/apache/datafusion/issues/17803#issuecomment-3340944096 More recent code link: https://github.com/apache/datafusion/blob/8c8e5651d0ca3d4b41c1de2ed66d3624263acdbc/datafusion/functions-aggregate-common/src/tdigest.rs#L531-L532 S

Re: [I] approx_percentile_cont panics because array is not ordered [datafusion]

2025-09-26 Thread via GitHub
Jefffrey commented on issue #4259: URL: https://github.com/apache/datafusion/issues/4259#issuecomment-3340947056 > [@Jefffrey](https://github.com/Jefffrey) Thanks for catching up this old issue. Feel free to go ahead. I may not have time to follow this for now Raised #17803 to track,

[I] TDigest quantile estimate algorithm correctness [datafusion]

2025-09-26 Thread via GitHub
Jefffrey opened a new issue, #17803: URL: https://github.com/apache/datafusion/issues/17803 > BTW, the estimate quantile algorithm doesn't follow the `paper`, any reason for this? > > https://github.com/apache/arrow-datafusion/blob/df8aa7a2e2a6f54acfbfed336b84144256fb7ff8/datafusion/phy

Re: [PR] feat(spark): implement Spark `make_dt_interval` function [datafusion]

2025-09-26 Thread via GitHub
Jefffrey commented on PR #17728: URL: https://github.com/apache/datafusion/pull/17728#issuecomment-3340915831 Thanks @davidlghellin

Re: [PR] chore(deps): bump sysinfo from 0.37.0 to 0.37.1 [datafusion]

2025-09-26 Thread via GitHub
Jefffrey merged PR #17800: URL: https://github.com/apache/datafusion/pull/17800

Re: [PR] feat(spark): implement Spark `map` function `map_from_entries` [datafusion]

2025-09-26 Thread via GitHub
Jefffrey commented on code in PR #17779: URL: https://github.com/apache/datafusion/pull/17779#discussion_r2383731056 ## datafusion/sqllogictest/test_files/spark/map/map_from_entries.slt: ## Review Comment: Perhaps add tests for `SELECT map_from_entries(NULL)` and also when

Re: [PR] minor: Skip calculating per-task memory limit when in off-heap mode [datafusion-comet]

2025-09-26 Thread via GitHub
mbutrovich commented on code in PR #2462: URL: https://github.com/apache/datafusion-comet/pull/2462#discussion_r2383376194 ## spark/src/main/scala/org/apache/comet/CometExecIterator.scala: ## @@ -93,6 +93,14 @@ class CometExecIterator( } val protobufSparkConfigs = buil

Re: [PR] chore(deps): bump prost from 0.13.5 to 0.14.1 in the proto group [datafusion]

2025-09-26 Thread via GitHub
Jefffrey closed pull request #17752: chore(deps): bump prost from 0.13.5 to 0.14.1 in the proto group URL: https://github.com/apache/datafusion/pull/17752

Re: [PR] chore(deps): bump taiki-e/install-action from 2.62.8 to 2.62.9 [datafusion]

2025-09-26 Thread via GitHub
Jefffrey merged PR #17799: URL: https://github.com/apache/datafusion/pull/17799

Re: [PR] [DOCS] Add dbt Fusion engine and R2 Query Engine to "Known Users" [datafusion]

2025-09-26 Thread via GitHub
alamb commented on PR #17793: URL: https://github.com/apache/datafusion/pull/17793#issuecomment-3340158764 🚀 📰 Thank you @dataders and @comphead

Re: [I] `SessionContext` propagation in physical plan proto functions [datafusion]

2025-09-26 Thread via GitHub
alamb closed issue #17596: `SessionContext` propagation in physical plan proto functions URL: https://github.com/apache/datafusion/issues/17596

Re: [I] ExecutionMemoryPool errors releasing more memory than allocated [datafusion-comet]

2025-09-26 Thread via GitHub
parthchandra commented on issue #2453: URL: https://github.com/apache/datafusion-comet/issues/2453#issuecomment-3340848003 Either way, tremendous bit of debugging!

Re: [PR] feat: Display window function's alias name in output column [datafusion]

2025-09-26 Thread via GitHub
alamb merged PR #17788: URL: https://github.com/apache/datafusion/pull/17788

Re: [PR] fix: ignore `DataType::Null` in possible types during csv type inference [datafusion]

2025-09-26 Thread via GitHub
dqkqd commented on code in PR #17796: URL: https://github.com/apache/datafusion/pull/17796#discussion_r2383659583 ## datafusion/datasource-csv/src/file_format.rs: ## @@ -593,15 +593,23 @@ fn build_schema_helper(names: Vec, types: &[HashSet]) -> Schem .zip(types)

Re: [PR] fix: ignore `DataType::Null` in possible types during csv type inference [datafusion]

2025-09-26 Thread via GitHub
dqkqd commented on code in PR #17796: URL: https://github.com/apache/datafusion/pull/17796#discussion_r2383668162 ## datafusion/core/src/datasource/file_format/csv.rs: ## @@ -470,6 +471,47 @@ mod tests { Ok(()) } +#[tokio::test] +async fn test_infer_schem

Re: [PR] Chore: Used DataFusion impl of bit_get function [datafusion-comet]

2025-09-26 Thread via GitHub
andygrove merged PR #2466: URL: https://github.com/apache/datafusion-comet/pull/2466

Re: [I] Regression: SQL parser became less tolerant to columns that match SQL keywords [datafusion]

2025-09-26 Thread via GitHub
sergiimk commented on issue #17802: URL: https://github.com/apache/datafusion/issues/17802#issuecomment-3340754640 Bisected a version of `sqlparser` that introduced this issue and created a ticket: https://github.com/apache/datafusion-sqlparser-rs/issues/2049

[PR] chore: support specific query [datafusion-benchmarks]

2025-09-26 Thread via GitHub
comphead opened a new pull request, #23: URL: https://github.com/apache/datafusion-benchmarks/pull/23 Added `--query` command line argument: A new optional parameter that accepts an integer representing the specific query number to run (1-based indexing). Modified the main function si

[I] Regression: SQL parser became less tolerant to columns that match SQL keywords [datafusion]

2025-09-26 Thread via GitHub
sergiimk opened a new issue, #17802: URL: https://github.com/apache/datafusion/issues/17802 ### Describe the bug Latest release (between `49.0.2` and `50.0.0`) seems to introduce a regression in SQL parser that causes it to be less tolerant of column names that collide with SQL keywo

Re: [PR] Reduce cloning in LogicalPlanBuilder [datafusion]

2025-09-26 Thread via GitHub
findepi commented on code in PR #17675: URL: https://github.com/apache/datafusion/pull/17675#discussion_r2382983761 ## datafusion/expr/src/expr_rewriter/mod.rs: ## @@ -214,26 +214,29 @@ pub fn strip_outer_reference(expr: Expr) -> Expr { /// Returns plan with expressions coerced

Re: [I] ExecutionMemoryPool errors releasing more memory than allocated [datafusion-comet]

2025-09-26 Thread via GitHub
andygrove commented on issue #2453: URL: https://github.com/apache/datafusion-comet/issues/2453#issuecomment-3340627340 more debug logging, which now more clearly shows what is happening: ``` 25/09/26 15:50:30 INFO CometTaskMemoryManager: Task 1561 requested 734496 bytes 25/09/

Re: [PR] minor: Skip calculating per-task memory limit when in off-heap mode [datafusion-comet]

2025-09-26 Thread via GitHub
mbutrovich merged PR #2462: URL: https://github.com/apache/datafusion-comet/pull/2462

[PR] Fix potential overflow when we print verbose physical plan [datafusion]

2025-09-26 Thread via GitHub
zhuqi-lucas opened a new pull request, #17798: URL: https://github.com/apache/datafusion/pull/17798 ## Which issue does this PR close? When we upgraded to Apache DataFusion 50, we hit an error when debug logging was enabled: ```rust thread 'tokio-runtime-worker' has overflowed its s

Re: [PR] Blog: Add blog post about DataFusion 50.0.0 release [datafusion-site]

2025-09-26 Thread via GitHub
alamb commented on PR #115: URL: https://github.com/apache/datafusion-site/pull/115#issuecomment-3339550277 Also, is it ok if I put contributors names next to the features as we have done in past releases? I think that is a nice acknowledgment to the community as well as serves as addition

[PR] ignore [datafusion-comet]

2025-09-26 Thread via GitHub
andygrove opened a new pull request, #2468: URL: https://github.com/apache/datafusion-comet/pull/2468 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] build: ignore [datafusion-comet]

2025-09-26 Thread via GitHub
andygrove closed pull request #2468: build: ignore URL: https://github.com/apache/datafusion-comet/pull/2468 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-m

Re: [PR] Blog: Add blog post about DataFusion 50.0.0 release [datafusion-site]

2025-09-26 Thread via GitHub
nuno-faria commented on PR #115: URL: https://github.com/apache/datafusion-site/pull/115#issuecomment-3340033431 > Thanks @nuno-faria I think we need to include a `Known issues` section and point users to the upcoming hotfix release and what's in it. > > Just point to [apache/datafu

Re: [I] Interest in partial parsing and sql formatting? [datafusion-sqlparser-rs]

2025-09-26 Thread via GitHub
alamb commented on issue #1392: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1392#issuecomment-3340197047 cc @iffyio and @yoavcloud as they are the primary maintainers these days -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [PR] Display qualifiers in EXPLAIN [datafusion]

2025-09-26 Thread via GitHub
findepi commented on code in PR #17645: URL: https://github.com/apache/datafusion/pull/17645#discussion_r2383479380 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -182,9 +182,9 @@ async fn csv_explain_plans() { actual, @r" Explain [plan_type:Utf8,

Re: [PR] Display qualifiers in EXPLAIN [datafusion]

2025-09-26 Thread via GitHub
findepi commented on PR #17645: URL: https://github.com/apache/datafusion/pull/17645#issuecomment-3340499323 I see how it's controversial. Maybe it could go behind a session property. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] Display qualifiers in EXPLAIN [datafusion]

2025-09-26 Thread via GitHub
findepi commented on code in PR #17645: URL: https://github.com/apache/datafusion/pull/17645#discussion_r2383481367 ## datafusion/core/tests/dataframe/dataframe_functions.rs: ## @@ -1310,8 +1310,8 @@ async fn test_count_wildcard() -> Result<()> { @r" Sort: count(*)

Re: [PR] Display qualifiers in EXPLAIN [datafusion]

2025-09-26 Thread via GitHub
findepi closed pull request #17645: Display qualifiers in EXPLAIN URL: https://github.com/apache/datafusion/pull/17645 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [PR] Display qualifiers in EXPLAIN [datafusion]

2025-09-26 Thread via GitHub
findepi commented on code in PR #17645: URL: https://github.com/apache/datafusion/pull/17645#discussion_r2383478286 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -1863,7 +1863,7 @@ async fn with_column_renamed_join() -> Result<()> { assert_snapshot!( df_renamed.

Re: [PR] minor: Skip calculating per-task memory limit when in off-heap mode [datafusion-comet]

2025-09-26 Thread via GitHub
andygrove commented on code in PR #2462: URL: https://github.com/apache/datafusion-comet/pull/2462#discussion_r2383448563 ## spark/src/main/scala/org/apache/comet/CometExecIterator.scala: ## @@ -93,6 +93,14 @@ class CometExecIterator( } val protobufSparkConfigs = build

Re: [I] `sql_planner` benchmark panic'ing on main [datafusion]

2025-09-26 Thread via GitHub
alamb commented on issue #17801: URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3340482694 I ran this on the most recent commit on branch-50 (DataFusion 50.0.0) and it works. I am now running `git bisect` to find the bad commit -- This is an automated message from

Re: [PR] feat: change `datafusion-proto` to use `TaskContext` rather than `SessionContext` for physical plan serialization [datafusion]

2025-09-26 Thread via GitHub
alamb commented on PR #17601: URL: https://github.com/apache/datafusion/pull/17601#issuecomment-3340457394 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [PR] feat: change `datafusion-proto` to use `TaskContext` rather than `SessionContext` for physical plan serialization [datafusion]

2025-09-26 Thread via GitHub
alamb merged PR #17601: URL: https://github.com/apache/datafusion/pull/17601 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Snowflake: ALTER USER and KeyValueOptions Refactoring [datafusion-sqlparser-rs]

2025-09-26 Thread via GitHub
alamb commented on PR #2035: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2035#issuecomment-3340455452 > That is so cool! Thank you @alamb and @iffyio! Welcome aboard! -- This is an automated message from the Apache Git Service. To respond to the message, please l

[I] Dropping a GlobalRef in a detached thread [datafusion-comet]

2025-09-26 Thread via GitHub
andygrove opened a new issue, #2470: URL: https://github.com/apache/datafusion-comet/issues/2470 ### Describe the bug When running benchmarks with reduced off-heap memory allocated, I see these warnings. I am not sure if this is something we can fix, or is just the result of a drop h

Re: [PR] fix: ignore `DataType::Null` in possible types during csv type inference [datafusion]

2025-09-26 Thread via GitHub
alamb commented on code in PR #17796: URL: https://github.com/apache/datafusion/pull/17796#discussion_r2383402440 ## datafusion/datasource-csv/src/file_format.rs: ## @@ -593,15 +593,23 @@ fn build_schema_helper(names: Vec, types: &[HashSet]) -> Schem .zip(types)

Re: [PR] [branch-50] Backport change to avoid debug symbols in ci builds to 50.0.0 [datafusion]

2025-09-26 Thread via GitHub
alamb merged PR #17795: URL: https://github.com/apache/datafusion/pull/17795 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

[PR] Chore: Used DataFusion impl of bit_get function [datafusion-comet]

2025-09-26 Thread via GitHub
kazantsev-maksim opened a new pull request, #2466: URL: https://github.com/apache/datafusion-comet/pull/2466 ## Which issue does this PR close? ## What changes are included in this PR? ## How are these changes tested? Tested with existing unit tests -- This is
