Re: [PR] Add support for MSSQL IF/ELSE statements. [datafusion-sqlparser-rs]

2025-04-05 Thread via GitHub
iffyio merged PR #1791: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1791 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [I] Getting started guide for new users (who want to use DataFusion in their project) [datafusion]

2025-04-05 Thread via GitHub
aaryyya commented on issue #7014: URL: https://github.com/apache/datafusion/issues/7014#issuecomment-2781239569 Hey is this still active?

[I] Redundant Repartition: `RoundRobinBatch` Followed by `Hash` in Physical Plans [datafusion]

2025-04-05 Thread via GitHub
UBarney opened a new issue, #15601: URL: https://github.com/apache/datafusion/issues/15601 DataFusion's physical plan optimizer can generate plans containing a sequence of two RepartitionExec operators: `RepartitionExec(Hash)` directly consuming the output of `RepartitionExec(RoundRobinBatc

Re: [I] Blog post about TopK filter pushdown [datafusion]

2025-04-05 Thread via GitHub
aaryyya commented on issue #15513: URL: https://github.com/apache/datafusion/issues/15513#issuecomment-2781244647 take

Re: [PR] Improve performance of `last_value` by implementing special `GroupsAccumulator` [datafusion]

2025-04-05 Thread via GitHub
UBarney commented on PR #15542: URL: https://github.com/apache/datafusion/pull/15542#issuecomment-2781205780 @comphead Thanks for reviewing. I have split this PR. This PR only contains performance improvements. After this PR is merged, I will start a refactor PR to handle renames and code m

Re: [PR] fix: recursion protection for physical plan node [datafusion]

2025-04-05 Thread via GitHub
chenkovsky commented on PR #15600: URL: https://github.com/apache/datafusion/pull/15600#issuecomment-2781233257 @milenkovicm could you please help review this PR.

Re: [PR] Comet 0.7.0 [datafusion-site]

2025-04-05 Thread via GitHub
andygrove merged PR #63: URL: https://github.com/apache/datafusion-site/pull/63

Re: [PR] simplify `array_has` UDF to `InList` expr when haystack is constant [datafusion]

2025-04-05 Thread via GitHub
davidhewitt commented on code in PR #15354: URL: https://github.com/apache/datafusion/pull/15354#discussion_r2008383822 ## datafusion/functions-nested/src/array_has.rs: ## @@ -121,6 +123,43 @@ impl ScalarUDFImpl for ArrayHas { Ok(DataType::Boolean) } +fn simp

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-04-05 Thread via GitHub
blaginin commented on PR #14684: URL: https://github.com/apache/datafusion/pull/14684#issuecomment-2730403354 Want to merge https://github.com/apache/datafusion/pull/14781 and https://github.com/apache/datafusion/pull/15159 first so making this as a draft

[I] Improve performance if min/max aggregates for Durations [datafusion]

2025-04-05 Thread via GitHub
alamb opened a new issue, #15317: URL: https://github.com/apache/datafusion/issues/15317 ### Is your feature request related to a problem or challenge? @svranesevic implemented basic support for Min/Max Duration types in this PR: ❤ - https://github.com/apache/datafusion/pull/153

Re: [PR] feat: instrument spawned tasks with current tracing span when `tracing` feature is enabled [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #14547: URL: https://github.com/apache/datafusion/pull/14547#issuecomment-2738119213 (this is a very nice first contribution @geoffreyclaude -- thank you for sticking with it 🙏 )

Re: [PR] Migrate dataframe tests to `insta` [datafusion]

2025-04-05 Thread via GitHub
blaginin commented on code in PR #15262: URL: https://github.com/apache/datafusion/pull/15262#discussion_r1998926847 ## datafusion/core/tests/dataframe/dataframe_functions.rs: ## @@ -75,34 +75,28 @@ async fn create_test_table() -> Result { } /// Executes an expression on the

Re: [PR] Migrate-substrait-tests-to-insta, part2 [datafusion]

2025-04-05 Thread via GitHub
blaginin commented on PR #15480: URL: https://github.com/apache/datafusion/pull/15480#issuecomment-2764289843 Hey again, thanks for working on that 🙏 > can you merge main into this branch please? to remove extra diff Just to explain, the current PR diff is quite large because it

Re: [PR] Migrate datafusion/sql tests to insta, part5 [datafusion]

2025-04-05 Thread via GitHub
qstommyshu commented on code in PR #15567: URL: https://github.com/apache/datafusion/pull/15567#discussion_r2027712552 ## datafusion/sql/tests/sql_integration.rs: ## @@ -3388,26 +3389,15 @@ fn ident_normalization_parser_options_ident_normalization() -> ParserOptions { } }

Re: [I] Speed up hash partitioning [datafusion]

2025-04-05 Thread via GitHub
alamb commented on issue #6822: URL: https://github.com/apache/datafusion/issues/6822#issuecomment-2746394343 Thanks for checking this out @zebsme I don't really have any other ideas

Re: [PR] WIP: Add `FileScanConfigBuilder` [datafusion]

2025-04-05 Thread via GitHub
blaginin commented on PR #15352: URL: https://github.com/apache/datafusion/pull/15352#issuecomment-2749431193 also fyi @AdamGS 👀

Re: [PR] Improve the calculation of statistics in `statistics_from_parquet_meta_calc` [datafusion]

2025-04-05 Thread via GitHub
xudong963 commented on PR #15289: URL: https://github.com/apache/datafusion/pull/15289#issuecomment-2742811385 > Instead of either "try for all" or "skip at all", isn't better to only go over the columns which has statistics.is_some() ? In fact, even if a column in `table_schema` does

Re: [PR] Fix predicate pushdown for custom SchemaAdapters [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15263: URL: https://github.com/apache/datafusion/pull/15263#issuecomment-2737802791 🚀

Re: [PR] Always use `PartitionMode::Auto` in planner [datafusion]

2025-04-05 Thread via GitHub
ozankabak commented on code in PR #15339: URL: https://github.com/apache/datafusion/pull/15339#discussion_r2009150221 ## datafusion/sqllogictest/test_files/explain_tree.slt: ## @@ -345,63 +345,68 @@ FROM physical_plan 01)┌───┐ -02)│CoalesceBat

[PR] Sf grant account privileges [datafusion-sqlparser-rs]

2025-04-05 Thread via GitHub
yoavcloud opened a new pull request, #1794: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1794 This PR adds parsing support for more Snowflake GRANT statement options, specifically, privileges on account global objects such as connections, external volumes, etc.

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
geoffreyclaude commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2024431842 ## datafusion/physical-plan/src/sorts/sort_filters.rs: ## @@ -0,0 +1,297 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] Move `DataSink` to `datasource` and add session crate [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15371: URL: https://github.com/apache/datafusion/pull/15371#issuecomment-2748732753 I will review this carefully later today

Re: [PR] (WIP) Upgrading to arrow 55 [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15466: URL: https://github.com/apache/datafusion/pull/15466#issuecomment-2778654942 Benchmark completed Details ``` Comparing HEAD and alamb_test_upgrade_54 Benchmark tpch_mem_sf1.json ┏━

Re: [PR] documentation :: quick-start.md sample source code correction [datafusion-ballista]

2025-04-05 Thread via GitHub
milenkovicm merged PR #1213: URL: https://github.com/apache/datafusion-ballista/pull/1213

Re: [PR] Remove inline table scan analyzer rule [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15201: URL: https://github.com/apache/datafusion/pull/15201#issuecomment-2740798896 > I will port the tests on our next iteration, but would appreciate a recommendation for a good suite to add them to as I rewrite them - they are currently only validating the `InlineT

Re: [PR] Blog post on Parquet pruning in datafusion [datafusion-site]

2025-04-05 Thread via GitHub
alamb commented on PR #60: URL: https://github.com/apache/datafusion-site/pull/60#issuecomment-2741062448 Thanks @XiangpengHao and everyone who helped review. I updated the publishing date to today and changed some capitalization for consistency so off we go!

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-04-05 Thread via GitHub
xudong963 commented on PR #15432: URL: https://github.com/apache/datafusion/pull/15432#issuecomment-2768150465 Thanks for your review! Let's go

Re: [PR] Add doc for the `statistics_from_parquet_meta_calc method` [datafusion]

2025-04-05 Thread via GitHub
alamb commented on code in PR #15330: URL: https://github.com/apache/datafusion/pull/15330#discussion_r2006654113 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -797,10 +797,34 @@ pub async fn fetch_statistics( statistics_from_parquet_meta_calc(&metadata, table_

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2022023507 ## datafusion/physical-plan/src/filter.rs: ## @@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec { } try_embed_projection(projection, self)

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-04-05 Thread via GitHub
westhide closed pull request #1212: feat: add test to check for `ctx.read_json()` URL: https://github.com/apache/datafusion-ballista/pull/1212

Re: [PR] fix: write hive partitions for any int/uint/float [datafusion]

2025-04-05 Thread via GitHub
alamb merged PR #15337: URL: https://github.com/apache/datafusion/pull/15337

Re: [PR] Introduce load-balanced `split_groups_by_statistics` method [datafusion]

2025-04-05 Thread via GitHub
xudong963 commented on code in PR #15473: URL: https://github.com/apache/datafusion/pull/15473#discussion_r2020499604 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -575,6 +575,95 @@ impl FileScanConfig { }) } +/// Splits file groups into new groups

Re: [PR] Migrate datafusion/sql tests to insta, part4 [datafusion]

2025-04-05 Thread via GitHub
qstommyshu commented on code in PR #15548: URL: https://github.com/apache/datafusion/pull/15548#discussion_r2025673832 ## datafusion/sql/tests/sql_integration.rs: ## @@ -2928,37 +3062,74 @@ fn select_groupby_orderby() { date_trunc('month', birth_date) AS "birth_date" FROM

Re: [PR] feat: implement GroupsAccumulator for `count(DISTINCT)` aggr [datafusion]

2025-04-05 Thread via GitHub
Dandandan commented on code in PR #15324: URL: https://github.com/apache/datafusion/pull/15324#discussion_r2005790136 ## datafusion/functions-aggregate/src/count.rs: ## @@ -752,10 +761,245 @@ impl Accumulator for DistinctCountAccumulator { } } +/// GroupsAccumulator for

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-05 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2020157010 ## datafusion/proto/src/physical_plan/to_proto.rs: ## @@ -210,6 +212,7 @@ pub fn serialize_physical_expr( value: &Arc, codec: &dyn PhysicalExtensionCode

Re: [I] Implement withField and dropField for struct types [datafusion-comet]

2025-04-05 Thread via GitHub
andygrove commented on issue #813: URL: https://github.com/apache/datafusion-comet/issues/813#issuecomment-2775866450 It sounds like we just need to add tests and update documentation. I have added this to the 0.8.0 milestone.

Re: [I] Implement authorization in DataFusion [datafusion]

2025-04-05 Thread via GitHub
adriangb commented on issue #15192: URL: https://github.com/apache/datafusion/issues/15192#issuecomment-2777223602 We do something along these lines. Not for security per-se so I'm not going to vouch for it being secure but I think in principle it is: ```rust fn register_filtered_v

Re: [PR] Draft: Make Clickbench Q29 5x faster for datafusion [datafusion]

2025-04-05 Thread via GitHub
zhuqi-lucas commented on PR #15532: URL: https://github.com/apache/datafusion/pull/15532#issuecomment-2771316983 More than 5X faster for clickbench Q29 with this PR: ```rust cargo run --profile release-nonlto --target aarch64-apple-darwin --bin dfbench -- clickbench -p benchmark

[PR] Improve performance of `last_value` by implementing special `GroupsAccumulator` [datafusion]

2025-04-05 Thread via GitHub
UBarney opened a new pull request, #15542: URL: https://github.com/apache/datafusion/pull/15542 ## Which issue does this PR close? - Closes #13998. ## Rationale for this change | benchmark sql

Re: [I] Make ClickBench Q23 Go Faster [datafusion]

2025-04-05 Thread via GitHub
Dandandan commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740754487 Ah actually, the query given by @xudong963 is slightly off, I think it should be the following (without the explicit join): ``` > EXPLAIN (WITH ids AS (SELECT row_id,

Re: [I] Use `SpillManager` in `AggregateExec` and `SortMergeJoinExec` [datafusion]

2025-04-05 Thread via GitHub
alamb closed issue #15374: Use `SpillManager` in `AggregateExec` and `SortMergeJoinExec` URL: https://github.com/apache/datafusion/issues/15374

Re: [PR] chore: update changelog for 45.0.0 [datafusion-ballista]

2025-04-05 Thread via GitHub
andygrove commented on code in PR #1218: URL: https://github.com/apache/datafusion-ballista/pull/1218#discussion_r2021855140 ## CHANGELOG.md: ## @@ -19,6 +19,33 @@ # Changelog +## [45.0.0](https://github.com/apache/datafusion-ballista/tree/44.0.0) (2025-03-30) Review Com

[I] Blog post about TopK filter pushdown [datafusion]

2025-04-05 Thread via GitHub
alamb opened a new issue, #15513: URL: https://github.com/apache/datafusion/issues/15513 ### Is your feature request related to a problem or challenge? @adriangb has a great PR to add dynamic filtering with TopK in DataFusion here: - https://github.com/apache/datafusion/pull/15301

Re: [I] Improve memory pool configuration code, documentation, and tests [datafusion-comet]

2025-04-05 Thread via GitHub
andygrove commented on issue #1560: URL: https://github.com/apache/datafusion-comet/issues/1560#issuecomment-2741121492 Correction: fair_unified was using unified memory after all, but only utilizing 20% of allocated off-heap memory with default settings

Re: [PR] FIX : some benchmarks are failing [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15367: URL: https://github.com/apache/datafusion/pull/15367#issuecomment-2767065311 Let's get this in and keep iterating on the benchmarks

Re: [I] Ballista: Partition columns are duplicated in protobuf decoding. [datafusion-ballista]

2025-04-05 Thread via GitHub
iho commented on issue #484: URL: https://github.com/apache/datafusion-ballista/issues/484#issuecomment-2774514596 Hi Can I work on it?

[PR] fix decimal precision issue in simplify expression optimize rule [datafusion]

2025-04-05 Thread via GitHub
jayzhan211 opened a new pull request, #15588: URL: https://github.com/apache/datafusion/pull/15588 ## Which issue does this PR close? - Closes #15174 . ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [I] Set DataFusion runtime configurations through SQL interface [datafusion]

2025-04-05 Thread via GitHub
kumarlokesh commented on issue #15552: URL: https://github.com/apache/datafusion/issues/15552#issuecomment-2780371752 take

[PR] Enhance: simplify x=x [datafusion]

2025-04-05 Thread via GitHub
ding-young opened a new pull request, #15589: URL: https://github.com/apache/datafusion/pull/15589 ## Which issue does this PR close? - Closes #15387 ## Rationale for this change ## What changes are included in this PR? This pr adds a rule i

[PR] chore: rm duplicated `JoinOn` type [datafusion]

2025-04-05 Thread via GitHub
jayzhan211 opened a new pull request, #15590: URL: https://github.com/apache/datafusion/pull/15590 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [I] Regression with compound field access and join schema [datafusion]

2025-04-05 Thread via GitHub
kosiew commented on issue #15549: URL: https://github.com/apache/datafusion/issues/15549#issuecomment-2775023421 @alexwilcoxson-rel #15556 contains the slt tests which do not fail.

Re: [PR] Blog post on Parquet filter pushdown [datafusion-site]

2025-04-05 Thread via GitHub
comphead commented on code in PR #61: URL: https://github.com/apache/datafusion-site/pull/61#discussion_r2006161294 ## content/blog/2025-03-21-parquet-pushdown.md: ## @@ -0,0 +1,259 @@ +--- +layout: post +title: Efficient Filter Pushdown in Parquet +date: 2025-03-21 +author: Xia

Re: [I] Improve performance if min/max aggregates for Durations [datafusion]

2025-04-05 Thread via GitHub
alamb commented on issue #15317: URL: https://github.com/apache/datafusion/issues/15317#issuecomment-2737899174 I think this is a pretty good first issue as it is well described and self contained

Re: [PR] chore: Enable Comet explicitly in `CometTPCDSQueryTestSuite` [datafusion-comet]

2025-04-05 Thread via GitHub
andygrove commented on code in PR #1559: URL: https://github.com/apache/datafusion-comet/pull/1559#discussion_r2008011066 ## spark/src/test/scala/org/apache/spark/sql/CometTPCDSQueryTestSuite.scala: ## @@ -213,7 +221,7 @@ class CometTPCDSQueryTestSuite extends QueryTest with TP

Re: [PR] chore: Update links for released version [datafusion-comet]

2025-04-05 Thread via GitHub
parthchandra commented on code in PR #1540: URL: https://github.com/apache/datafusion-comet/pull/1540#discussion_r2001397547 ## docs/source/user-guide/kubernetes.md: ## @@ -65,31 +65,31 @@ metadata: spec: type: Scala mode: cluster - image: ghcr.io/apache/datafusion-comet

Re: [PR] feat: Support serde for JsonSource PhysicalPlan [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15311: URL: https://github.com/apache/datafusion/pull/15311#issuecomment-2740937315 Thanks @westhide For anyone following along: - https://github.com/apache/datafusion/pull/15335

Re: [PR] Improved error for expand wildcard rule [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15287: URL: https://github.com/apache/datafusion/pull/15287#issuecomment-2746384487 > This PR seems to have broken `main`. It is an easy fix (error messages do not match). > > @alamb -- Extended tests break very frequently these days. We should prioritize compl

Re: [I] Consolidate statistics aggregation [datafusion]

2025-04-05 Thread via GitHub
xudong963 commented on issue #8229: URL: https://github.com/apache/datafusion/issues/8229#issuecomment-2769492303 @alamb nearly, I'll check the union stats tomorrow

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-04-05 Thread via GitHub
Kontinuation commented on PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#issuecomment-2743369835 > Dumb question: is `take_record_batch` an option here (it looks like interleave can actually coalesce a vector of `RecordBatch`. What would the performance implication be

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2025-04-05 Thread via GitHub
acking-you commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2753584617 @alamb I sincerely apologize for not revisiting this issue or pushing forward with that [PR](https://github.com/apache/datafusion/pull/11247) for such a long time. However, t

Re: [I] array_has_any(column, []) with empty array throws RowConverter column schema mismatch, expected Utf8 got Int64 [datafusion]

2025-04-05 Thread via GitHub
alamb closed issue #15038: array_has_any(column, []) with empty array throws RowConverter column schema mismatch, expected Utf8 got Int64 URL: https://github.com/apache/datafusion/issues/15038

Re: [I] Avoid evaluating filters when they can be discarded purely from statistics [datafusion]

2025-04-05 Thread via GitHub
matthewmturner commented on issue #15425: URL: https://github.com/apache/datafusion/issues/15425#issuecomment-2752619613 I think this is similar to something I recently asked on Discord - except I had in mind using only the metadata stats for queries like "SELECT MAX(timestamp) FROM quotes"

Re: [I] Support RangePartitioning with native shuffle [datafusion-comet]

2025-04-05 Thread via GitHub
andygrove commented on issue #458: URL: https://github.com/apache/datafusion-comet/issues/458#issuecomment-2762956253 > Hi @andygrove > May I ask why we decide not support RangePartitioning ? and will it be supported in the near future ? > Thanks We didn't decide not to suppor

Re: [I] ColumnarBatch cannot be cast to class InternalRow [datafusion-comet]

2025-04-05 Thread via GitHub
andygrove closed issue #1314: ColumnarBatch cannot be cast to class InternalRow URL: https://github.com/apache/datafusion-comet/issues/1314

Re: [PR] chore: Remove redundant shims for getFailOnError [datafusion-comet]

2025-04-05 Thread via GitHub
codecov-commenter commented on PR #1608: URL: https://github.com/apache/datafusion-comet/pull/1608#issuecomment-2778929701 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1608?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Make `DiskManagerBuilder` [datafusion]

2025-04-05 Thread via GitHub
alamb opened a new issue, #15319: URL: https://github.com/apache/datafusion/issues/15319 ### Is your feature request related to a problem or challenge? Creating a `DiskManager` today is somewhat awkward as described by @2010YOUY01 in https://github.com/apache/datafusion/pull/14975#d

Re: [PR] feat(sql): add diagnostic for wrong number of function arguments [datafusion]

2025-04-05 Thread via GitHub
prowang01 commented on PR #15490: URL: https://github.com/apache/datafusion/pull/15490#issuecomment-2767268386 > This PR has several CI failures so marking as a draft while they are addressed. (I do this to make it easier to see what PRs are waiting on review) Thank you for the feedba

Re: [PR] Introduce load-balanced `split_groups_by_statistics` method [datafusion]

2025-04-05 Thread via GitHub
xudong963 commented on code in PR #15473: URL: https://github.com/apache/datafusion/pull/15473#discussion_r2024732164 ## datafusion/datasource/src/mod.rs: ## @@ -313,6 +314,78 @@ async fn find_first_newline( Ok(index) } +/// Generates test files with min-max statistics i

Re: [PR] chore: update changelog for 45.0.0 [datafusion-ballista]

2025-04-05 Thread via GitHub
milenkovicm commented on PR #1218: URL: https://github.com/apache/datafusion-ballista/pull/1218#issuecomment-2764703687 @andygrove if you agree we could release 45.0.0 as we're getting a bit behind datafusion

Re: [PR] updatted github action by change version tag to sha hashes [datafusion]

2025-04-05 Thread via GitHub
Jiashu-Hu commented on PR #15315: URL: https://github.com/apache/datafusion/pull/15315#issuecomment-2741526839 > Well that is unfortunate. I wonder if the apache regex is correct - the one in the error message is not, should be `.*\/.*@[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-04-05 Thread via GitHub
UBarney commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2007338741 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,420 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccumu

Re: [PR] docs: various improvements to tuning guide [datafusion-comet]

2025-04-05 Thread via GitHub
andygrove commented on code in PR #1525: URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2004365325 ## docs/source/user-guide/tuning.md: ## @@ -17,18 +17,96 @@ specific language governing permissions and limitations under the License. --> -# Tuning Guid

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-04-05 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2000753514 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") }

Re: [I] Apache DataFusion Google Summer of Code (GSoC) 2025 Application Guidelines [datafusion]

2025-04-05 Thread via GitHub
oznur-synnada commented on issue #14577: URL: https://github.com/apache/datafusion/issues/14577#issuecomment-203524 > Hello, [@ozankabak](https://github.com/ozankabak). May I ask for new discord invite link? I would like to take a look on ongoing discussion on project ideas, but the lin

Re: [PR] fix: aggregation corner case [datafusion]

2025-04-05 Thread via GitHub
jayzhan211 commented on code in PR #15457: URL: https://github.com/apache/datafusion/pull/15457#discussion_r2026221040 ## datafusion/functions-table/src/generate_series.rs: ## @@ -138,12 +138,15 @@ impl TableProvider for GenerateSeriesTable { async fn scan( &self,

Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]

2025-04-05 Thread via GitHub
Dandandan commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2020558418 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-04-05 Thread via GitHub
UBarney commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2001076145 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,423 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstGroupsAccumulator Re

Re: [PR] Migrate physical plan tests to `insta` (Part-2) [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15364: URL: https://github.com/apache/datafusion/pull/15364#issuecomment-2749125261 > @alamb I updated everything. There are a few failing tests but they are failing due to time (lib) not getting compiled. This is unusual as tests passed the last time and I did not ad

Re: [I] [Rust] [datafusion] Allow integration in non libc environments [datafusion]

2025-04-05 Thread via GitHub
arpity22 commented on issue #102: URL: https://github.com/apache/datafusion/issues/102#issuecomment-2741300881 take

Re: [PR] feat: Add union_by_name, union_by_name_distinct to DataFrame api [datafusion]

2025-04-05 Thread via GitHub
alamb merged PR #15489: URL: https://github.com/apache/datafusion/pull/15489

[I] DeltaLake integration not working (Python) (FFI Table providers not working) [datafusion-python]

2025-04-05 Thread via GitHub
riziles opened a new issue, #1077: URL: https://github.com/apache/datafusion-python/issues/1077 ### Describe the bug After upgrading to deltalake (Python) 0.25.1, this basic example fails. Was working fine before. ```python from deltalake import DeltaTable, write_deltalake

Re: [PR] feat: pushdown filter for native_iceberg_compat [datafusion-comet]

2025-04-05 Thread via GitHub
wForget commented on code in PR #1566: URL: https://github.com/apache/datafusion-comet/pull/1566#discussion_r2016032409 ## spark/src/main/scala/org/apache/comet/parquet/SourceFilterSerde.scala: ## @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [I] Support zero copy hash repartitioning for Hash Join [datafusion]

2025-04-05 Thread via GitHub
goldmedal commented on issue #15382: URL: https://github.com/apache/datafusion/issues/15382#issuecomment-2751311833 > * Add a mode that outputs selection vectors (for now let's use dense boolean arrays so it can be added to `RecordBatch`) in `RepartitionExec`. The array outputs `true` f

[PR] perf: Introduce sort prefix computation for early TopK exit optimization on partially sorted input [datafusion]

2025-04-05 Thread via GitHub
geoffreyclaude opened a new pull request, #15563: URL: https://github.com/apache/datafusion/pull/15563 ## Which issue does this PR close? - Closes #15529. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [I] Emit warning with attached `Diagnostic` when doing `= NULL` [datafusion]

2025-04-05 Thread via GitHub
changsun20 commented on issue #14434: URL: https://github.com/apache/datafusion/issues/14434#issuecomment-2755932600 Quick update: I was a bit busy with schoolwork last week, but I’ll try to fix this ticket this week. Thanks for your patience!

Re: [PR] chore: Fix some inconsistencies in memory pool configuration [datafusion-comet]

2025-04-05 Thread via GitHub
viirya commented on code in PR #1561: URL: https://github.com/apache/datafusion-comet/pull/1561#discussion_r2008080558 ## spark/src/main/scala/org/apache/comet/CometExecIterator.scala: ## @@ -63,9 +64,28 @@ class CometExecIterator( }.toArray private val plan = { val c

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-04-05 Thread via GitHub
jayzhan211 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016084730 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn a

Re: [PR] perf: replace `merge` `uninitiated_partitions` `VecDeque` with custom fixed size queue [datafusion]

2025-04-05 Thread via GitHub
ctsk commented on PR #15562: URL: https://github.com/apache/datafusion/pull/15562#issuecomment-2776775364 As I *just* learned, VecDeque already has an API (e.g. `VecDeque::insert`) that lets you use it as a fixed-size queue. Is that insufficient for this use case?

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-04-05 Thread via GitHub
aectaan commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2743057983 @alamb What's the reason to cast numeric columns to i64/u64 and not to the smallest compatible type? Why not try something like this: ```rust macro_rules! try_parse_

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-04-05 Thread via GitHub
milenkovicm commented on code in PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#discussion_r2020224134 ## ballista/scheduler/src/state/task_manager.rs: ## @@ -524,24 +524,35 @@ impl TaskManager pub(crate) async fn launch_multi_task( &self,

[I] Enable `tree` explain by default [datafusion]

2025-04-05 Thread via GitHub
alamb opened a new issue, #15343: URL: https://github.com/apache/datafusion/issues/15343 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [D] More thorough contribution guideline [datafusion]

2025-04-05 Thread via GitHub
GitHub user logan-keede added a comment to the discussion: More thorough contribution guideline > iii. Collect feedback from downstream projects to reveal any possible design > issues Do we have any communication channel for collecting feedback, or announcing feature branch? @alamb What are

Re: [PR] Add documentation example for `AggregateExprBuilder` [datafusion]

2025-04-05 Thread via GitHub
Shreyaskr1409 commented on code in PR #15504: URL: https://github.com/apache/datafusion/pull/15504#discussion_r2021131776 ## datafusion/physical-expr/src/aggregate.rs: ## @@ -97,6 +97,167 @@ impl AggregateExprBuilder { /// Constructs an `AggregateFunctionExpr` from the buil

Re: [PR] Fix predicate pushdown for custom SchemaAdapters [datafusion]

2025-04-05 Thread via GitHub
alamb commented on PR #15263: URL: https://github.com/apache/datafusion/pull/15263#issuecomment-2734535392 Thanks for bearing with me here

Re: [I] Add documentation about how to plan custom expressions [datafusion]

2025-04-05 Thread via GitHub
alamb commented on issue #15267: URL: https://github.com/apache/datafusion/issues/15267#issuecomment-2744196342 There might be an example in https://github.com/apache/datafusion/tree/main/datafusion-examples you could port to the docs

Re: [PR] Fix duplicate unqualified Field name (schema error) on join queries [datafusion]

2025-04-05 Thread via GitHub
alamb merged PR #15438: URL: https://github.com/apache/datafusion/pull/15438

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-05 Thread via GitHub
mvzink commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2025390474 ## src/parser/mod.rs: ## @@ -7081,18 +7029,243 @@ impl<'a> Parser<'a> { if let Token::Word(word) = self.peek_token().token {

Re: [PR] feat: introduce hadoop mini cluster to test native scan on hdfs [datafusion-comet]

2025-04-05 Thread via GitHub
parthchandra commented on code in PR #1556: URL: https://github.com/apache/datafusion-comet/pull/1556#discussion_r2006524311 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometReadBenchmark.scala: ## @@ -63,6 +65,7 @@ object CometReadBenchmark extends CometBenchmarkBase

Re: [PR] GroupsAccumulator for Duration (#15322) [datafusion]

2025-04-05 Thread via GitHub
emilk commented on PR #15522: URL: https://github.com/apache/datafusion/pull/15522#issuecomment-2768532598 Sorry, wrong repository

Re: [I] RecordBatchReader respect `COMET_BATCH_SIZE` [datafusion-comet]

2025-04-05 Thread via GitHub
andygrove closed issue #1571: RecordBatchReader respect `COMET_BATCH_SIZE` URL: https://github.com/apache/datafusion-comet/issues/1571
