Re: [I] feature: adapt predicate pushdown for mismatched nested/struct schemas [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on issue #16565: URL: https://github.com/apache/datafusion/issues/16565#issuecomment-3035790931 Question: will we be able to handle something like `Dict(UInt32, List(List(Struct(...`? -- This is an automated message from the Apache Git Service. To respond to the mes

[PR] Improve `ScalarUDFImpl::equals` Default Behavior and Add Unit Tests for Custom Equality [datafusion]

2025-07-04 Thread via GitHub
kosiew opened a new pull request, #16681: URL: https://github.com/apache/datafusion/pull/16681 ## Which issue does this PR close? - Closes [#16677](https://github.com/apache/datafusion/issues/16677) ## Rationale for this change The default implementation of `ScalarUDF

Re: [I] Release DataFusion `48.0.1` [datafusion]

2025-07-04 Thread via GitHub
alamb commented on issue #16486: URL: https://github.com/apache/datafusion/issues/16486#issuecomment-3035205437 Ok, I have backported the three identified issues and I will now create a release candidate

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
alamb commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3035285253 This is amazing -- thank you @zhuqi-lucas and @2010YOUY01 -- I will review this asap, but as today is a holiday in the US I may not have a chance to do so until tomorrow.

Re: [PR] [branch-48] Prepare 48.0.1 ad CHANGELOG [datafusion]

2025-07-04 Thread via GitHub
alamb merged PR #16679: URL: https://github.com/apache/datafusion/pull/16679

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3035225130 > This post is great, I find the content easy to follow. > > I have a suggestion for the first paragraph though: perhaps we should emphasize the motivation more clearly at t

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3035299849 Thank you @alamb, I will keep polishing it before you review!

Re: [I] feature: adapt predicate pushdown for mismatched nested/struct schemas [datafusion]

2025-07-04 Thread via GitHub
kosiew commented on issue #16565: URL: https://github.com/apache/datafusion/issues/16565#issuecomment-3035480855 @adriangb, > incorporating the struct-aware casting logic from #16371 into `CastExpr` and `TryCastExpr` Yes. I think it is a necessary step to explore the feasibilit

Re: [PR] refactor: shrink `SchemaError` [datafusion]

2025-07-04 Thread via GitHub
crepererum commented on PR #16653: URL: https://github.com/apache/datafusion/pull/16653#issuecomment-3035482230 Rebased after #16672.

[PR] Simplify HTML Formatter Style Handling Using Script Injection [datafusion-python]

2025-07-04 Thread via GitHub
kosiew opened a new pull request, #1177: URL: https://github.com/apache/datafusion-python/pull/1177 ## Which issue does this PR close? - Closes #1171. ## Rationale for this change This change simplifies the logic for injecting HTML styles in the `DataFrameHtmlFormatter`

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2185314221 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -214,41 +238,39 @@ impl TopK { let mut selected_rows = None; -if let Some(filter) = self.

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2185318951 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -214,41 +238,39 @@ impl TopK { let mut selected_rows = None; -if let Some(filter) = self.f

Re: [PR] Comet 0.9.0 [datafusion-site]

2025-07-04 Thread via GitHub
andygrove commented on code in PR #78: URL: https://github.com/apache/datafusion-site/pull/78#discussion_r2185674170 ## content/blog/2025-07-01-datafusion-comet-0.9.0.md: ## @@ -0,0 +1,176 @@ +--- +layout: post +title: Apache DataFusion Comet 0.9.0 Release +date: 2025-07-01 +aut

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036724762 So should I fix this PR's conflicts? It seems like this PR has a chance to be merged.

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
2010YOUY01 commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3035056285 This post is great, I find the content easy to follow. I have a suggestion for the first paragraph though: perhaps we should emphasize the motivation more clearly at the begi

Re: [I] ScalarUDFImpl::equals default implementation is error-prone [datafusion]

2025-07-04 Thread via GitHub
alamb commented on issue #16677: URL: https://github.com/apache/datafusion/issues/16677#issuecomment-3035199021 🤔 That is a tricky business. Maybe ScalarUDFImpl needs some way to have `eq` called on it too 🤔

[PR] [branch-48] Prepare 48.0.1 [datafusion]

2025-07-04 Thread via GitHub
alamb opened a new pull request, #16679: URL: https://github.com/apache/datafusion/pull/16679 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/16486 - Related to https://github.com/apache/datafusion/issues/16626 ## Rationale for

[PR] Add link to upgrade guide in changelog script [datafusion]

2025-07-04 Thread via GitHub
alamb opened a new pull request, #16680: URL: https://github.com/apache/datafusion/pull/16680 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16626 ## Rationale for this change As @jonmmease pointed out, for people seeing t

[PR] Remove unused AggregateUDF struct [datafusion]

2025-07-04 Thread via GitHub
ViggoC opened a new pull request, #16683: URL: https://github.com/apache/datafusion/pull/16683 AggregateUDF is not used.

[I] `DataSourceExec` is projecting/reading unused columns from Parquet files for recursive queries [datafusion]

2025-07-04 Thread via GitHub
debajyoti-truefoundry opened a new issue, #16684: URL: https://github.com/apache/datafusion/issues/16684 ### Describe the bug I am on datafusion 47. ```rust use arrow::array::Int64Array; use arrow::datatypes::{DataType, Field, Schema}; use arrow::record_batch::RecordBat

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-07-04 Thread via GitHub
LiaCastaneda commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2184901077 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -1039,12 +1196,50 @@ async fn collect_left_input( let data = JoinLeftData::new( hashma

Re: [PR] Add a note about Boxing errors in upgrade guide [datafusion]

2025-07-04 Thread via GitHub
alamb commented on code in PR #16673: URL: https://github.com/apache/datafusion/pull/16673#discussion_r2184855966 ## docs/source/library-user-guide/upgrading.md: ## @@ -24,6 +24,38 @@ **Note:** DataFusion `49.0.0` has not been released yet. The information provided in this sec

Re: [PR] Add a note about Boxing errors in upgrade guide [datafusion]

2025-07-04 Thread via GitHub
alamb closed pull request #16673: Add a note about Boxing errors in upgrade guide URL: https://github.com/apache/datafusion/pull/16673

Re: [PR] Add a note about Boxing errors in upgrade guide [datafusion]

2025-07-04 Thread via GitHub
alamb commented on PR #16673: URL: https://github.com/apache/datafusion/pull/16673#issuecomment-3035186129 @kosiew included this content in - https://github.com/apache/datafusion/pull/16672

Re: [PR] Change tag and policy names to `ObjectName` instead of `Ident` [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
eliaperantoni commented on code in PR #1892: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1892#discussion_r2184969127 ## src/parser/mod.rs: ## @@ -7866,7 +7866,7 @@ impl<'a> Parser<'a> { } pub(crate) fn parse_tag(&mut self) -> Result { -let na

Re: [PR] refactor: shrink `SchemaError` [datafusion]

2025-07-04 Thread via GitHub
crepererum commented on PR #16653: URL: https://github.com/apache/datafusion/pull/16653#issuecomment-3035427853 > Thanks @crepererum for taking care of that, I suspect it is motivated by clippy complains [rust-lang.github.io/rust-clippy/v0.0.212#large_enum_variant](https://rust-lang.github.

Re: [PR] fix: create file for empty stream [datafusion]

2025-07-04 Thread via GitHub
chenkovsky commented on code in PR #16342: URL: https://github.com/apache/datafusion/pull/16342#discussion_r2185571139 ## datafusion/datasource/src/file_sink_config.rs: ## @@ -77,13 +79,34 @@ pub trait FileSink: DataSink { .runtime_env() .object_store(&

[PR] Nick/readme update [datafusion-python]

2025-07-04 Thread via GitHub
ntjohnson1 opened a new pull request, #1179: URL: https://github.com/apache/datafusion-python/pull/1179 # Which issue does this PR close? Closes #1178 # Rationale for this change Explained in issue. Trying to bring develop instructions up to current experience getting started

[I] How To Develop Misses Edge cases [datafusion-python]

2025-07-04 Thread via GitHub
ntjohnson1 opened a new issue, #1178: URL: https://github.com/apache/datafusion-python/issues/1178 **Describe the bug** Running the steps for how to develop on a clean mac leads to some missing package errors. 1. Running maturin develop command initial errors that `protoc` can't be fo

[PR] Partially implement MATCH_RECOGNIZE for Advanced Pattern Matching [datafusion]

2025-07-04 Thread via GitHub
geoffreyclaude opened a new pull request, #16685: URL: https://github.com/apache/datafusion/pull/16685 ## Which issue does this PR close? - Draft attempt at #13583. ## How to review? - The sqllogic tests in the `datafusion/sqllogictest/test_files/match_recognize` cover e

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2185669579 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -214,41 +238,39 @@ impl TopK { let mut selected_rows = None; -if let Some(filter) = self.f

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
ding-young commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036770691 @rluvaton If you’d like, I can send a PR to your (fork's) branch that resolves the merge conflicts, since I already have one. Anyway there were only minor diffs to handle when I rebased

Re: [PR] fix: sqllogictest runner label condition mismatch [datafusion]

2025-07-04 Thread via GitHub
gabotechs commented on code in PR #16633: URL: https://github.com/apache/datafusion/pull/16633#discussion_r2185709428 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -243,7 +243,7 @@ async fn run_test_file_substrait_round_trip( }; setup_scratch_dir(&relative_pat

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036807094 I would appreciate it, it would greatly help me

Re: [PR] Change tag and policy names to `ObjectName` [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
iffyio merged PR #1892: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1892

Re: [PR] Simplify HTML Formatter Style Handling Using Script Injection [datafusion-python]

2025-07-04 Thread via GitHub
timsaucer merged PR #1177: URL: https://github.com/apache/datafusion-python/pull/1177

Re: [PR] fix: create file for empty stream [datafusion]

2025-07-04 Thread via GitHub
brunal commented on code in PR #16342: URL: https://github.com/apache/datafusion/pull/16342#discussion_r2185262228 ## datafusion/datasource/src/file_sink_config.rs: ## @@ -77,13 +79,34 @@ pub trait FileSink: DataSink { .runtime_env() .object_store(&conf

Re: [I] Add DOM-guarded CSS/JS injection to DataFrameHtmlFormatter to prevent duplicate style/script inserts [datafusion-python]

2025-07-04 Thread via GitHub
timsaucer closed issue #1171: Add DOM-guarded CSS/JS injection to DataFrameHtmlFormatter to prevent duplicate style/script inserts URL: https://github.com/apache/datafusion-python/issues/1171

Re: [PR] fix: create file for empty stream [datafusion]

2025-07-04 Thread via GitHub
brunal commented on PR #16342: URL: https://github.com/apache/datafusion/pull/16342#issuecomment-3036097695 User-facing breakage: one can no longer explicitly write an empty RecordBatch that has a schema. The tests in the PR don't have a schema, so they don't reveal the issue.

Re: [PR] fix: create file for empty stream [datafusion]

2025-07-04 Thread via GitHub
brunal commented on PR #16342: URL: https://github.com/apache/datafusion/pull/16342#issuecomment-3036100543 How can I send a rollback of this PR?

[PR] Revert "fix: create file for empty stream" [datafusion]

2025-07-04 Thread via GitHub
brunal opened a new pull request, #16682: URL: https://github.com/apache/datafusion/pull/16682 Reverts apache/datafusion#16342 After that change, one cannot write an empty RecordBatch with a schema to parquet anymore. Indeed, the logic added tries to write an empty recordbatch

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on PR #16433: URL: https://github.com/apache/datafusion/pull/16433#issuecomment-3036131650 I did a bench run, confounding results: ``` ┏━━┳━━┳━━┳━━━┓ ┃ Query┃ main ┃ topk-filters ┃Change ┃ ┡

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2185350825 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -214,41 +238,39 @@ impl TopK { let mut selected_rows = None; -if let Some(filter) = self.

Re: [PR] fix: create file for empty stream [datafusion]

2025-07-04 Thread via GitHub
chenkovsky commented on code in PR #16342: URL: https://github.com/apache/datafusion/pull/16342#discussion_r2185358781 ## datafusion/datasource/src/file_sink_config.rs: ## @@ -77,13 +79,34 @@ pub trait FileSink: DataSink { .runtime_env() .object_store(&

Re: [PR] Extend binary coercion rules to support Decimal arithmetic operations with integer(signed and unsigned) types [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on PR #16668: URL: https://github.com/apache/datafusion/pull/16668#issuecomment-3038036678 note to self: check if these improvements trickle down into the physical optimizers

[PR] Support for ClickHouse CREATE TABLE .... Engine = MergeTree() [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
solontsev opened a new pull request, #1925: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1925 Closes #1853

Re: [PR] Support for ClickHouse CREATE TABLE .... Engine = MergeTree() [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
solontsev commented on PR #1925: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1925#issuecomment-3038044031 I made it more permissive by allowing empty parentheses. I'd appreciate your feedback on this

Re: [I] Support Push down expression evaluation in `TableProviders` [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-3038010228 > > Related blog from [@gatesn](https://github.com/gatesn) > > https://blog.spiraldb.com/what-if-we-just-didnt-decompress-it/ > > I now see how the vision of this can c

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186669292 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186696545 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,251 @@ +## Extending Parquet with Embedded Indexes and Accelerating Query Processing with Dat

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3038122113 Thank you @alamb! I addressed the first round of comments, but the image is still not added to the content because it does not display well in my local environment.

Re: [PR] Refactor SortMergeJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
comphead commented on code in PR #16675: URL: https://github.com/apache/datafusion/pull/16675#discussion_r2185865853 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2032,7 +2034,10 @@ impl SortMergeJoinStream { let record_batch = concat_bat

Re: [PR] Refactor SortMergeJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
Standing-Man commented on code in PR #16675: URL: https://github.com/apache/datafusion/pull/16675#discussion_r2186251681 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2032,7 +2034,10 @@ impl SortMergeJoinStream { let record_batch = concat

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2186109216 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -319,19 +342,75 @@ impl TopK { /// (a > 2 OR (a = 2 AND b < 3)) /// ``` fn update_filter(&mut s

[PR] chore(devcontainer): use debian's `protobuf-compiler` package [datafusion]

2025-07-04 Thread via GitHub
fvj opened a new pull request, #16687: URL: https://github.com/apache/datafusion/pull/16687 ## Which issue does this PR close? None. It's too small of a change for an issue, in my opinion. Happy to retroactively create one, though. ## Rationale for this change unfortunat

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3037856509 > https://docs.google.com/presentation/d/1aFjTLEDJyDqzFZHgcmRxecCvLKKXV2OvyEpTQFCNZPw/edit?slide=id.g33d7337a5a0_0_85 Thank you @alamb for review and great suggestions! I wi

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186508081 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186552221 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186527671 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186543546 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186572247 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Refactor StreamJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
Standing-Man commented on code in PR #16674: URL: https://github.com/apache/datafusion/pull/16674#discussion_r2184687258 ## datafusion/physical-plan/src/joins/symmetric_hash_join.rs: ## @@ -1375,7 +1375,7 @@ impl SymmetricHashJoinStream { } Some

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3034853896 I am not an expert at blog writing, so folks are welcome to help polish it together. Thanks a lot! cc @alamb

Re: [I] Blog Post for Accelerating Query Processing with Specialized Indexes [datafusion]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on issue #16372: URL: https://github.com/apache/datafusion/issues/16372#issuecomment-3034853243 Submitted the draft blog: https://github.com/apache/datafusion-site/pull/79 I am not an expert at blog writing, so folks are welcome to help polish it together. Thanks a lot!

Re: [PR] Refactor StreamJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
2010YOUY01 commented on code in PR #16674: URL: https://github.com/apache/datafusion/pull/16674#discussion_r2184635238 ## datafusion/physical-plan/src/joins/symmetric_hash_join.rs: ## @@ -1375,7 +1375,7 @@ impl SymmetricHashJoinStream { } Some((

Re: [PR] Refactor SortMergeJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
2010YOUY01 commented on code in PR #16675: URL: https://github.com/apache/datafusion/pull/16675#discussion_r2184636583 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2032,7 +2034,10 @@ impl SortMergeJoinStream { let record_batch = concat_b

[PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas opened a new pull request, #79: URL: https://github.com/apache/datafusion-site/pull/79 Try to blog our work for the custom parquet example for datafusion: https://github.com/apache/datafusion/pull/16395 And also close this ticket: https://github.com/apache/dat

Re: [PR] Cascaded spill merge and re-spill [datafusion]

2025-07-04 Thread via GitHub
2010YOUY01 closed pull request #15610: Cascaded spill merge and re-spill URL: https://github.com/apache/datafusion/pull/15610

Re: [PR] Cascaded spill merge and re-spill [datafusion]

2025-07-04 Thread via GitHub
2010YOUY01 commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-3034782742 > > tested my fuzz tests with this pr and all of them are failing currently > > Update: I think the failure is not due to this PR's implementation, instead it's caused by `F

Re: [PR] Reuse Rows allocation in RowCursorStream [datafusion]

2025-07-04 Thread via GitHub
Dandandan merged PR #16647: URL: https://github.com/apache/datafusion/pull/16647

Re: [I] Reuse Rows allocation in SortPreservingMergeStream / `RowCursorStream` [datafusion]

2025-07-04 Thread via GitHub
Dandandan closed issue #15720: Reuse Rows allocation in SortPreservingMergeStream / `RowCursorStream` URL: https://github.com/apache/datafusion/issues/15720

Re: [I] Blog Post for Accelerating Query Processing with Specialized Indexes [datafusion]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on issue #16372: URL: https://github.com/apache/datafusion/issues/16372#issuecomment-3034802176 I will submit a draft blog soon, thank you @alamb!

Re: [I] Blog Post for Accelerating Query Processing with Specialized Indexes [datafusion]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on issue #16372: URL: https://github.com/apache/datafusion/issues/16372#issuecomment-3034801355 take

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3034804227 Thank you @alamb, I will submit a draft blog soon in: https://github.com/apache/datafusion/issues/16372

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
2010YOUY01 commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3034804482 https://github.com/apache/datafusion/pull/15700#discussion_r2041372025 I have an idea to fix this concern: adding a max merge degree configuration, if either a. SPM's est

Re: [PR] Refactor SortMergeJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
Standing-Man commented on code in PR #16675: URL: https://github.com/apache/datafusion/pull/16675#discussion_r2184734711 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2032,7 +2034,10 @@ impl SortMergeJoinStream { let record_batch = concat

Re: [PR] Add SchemaAdapterFactory Support for ListingTable with Schema Evolution and Mapping [datafusion]

2025-07-04 Thread via GitHub
kosiew merged PR #16583: URL: https://github.com/apache/datafusion/pull/16583

Re: [I] Datafusion can't seem to cast evolving structs [datafusion]

2025-07-04 Thread via GitHub
kosiew closed issue #14757: Datafusion can't seem to cast evolving structs URL: https://github.com/apache/datafusion/issues/14757

Re: [PR] Refactor error handling to use boxed errors for DataFusionError variants [datafusion]

2025-07-04 Thread via GitHub
kosiew merged PR #16672: URL: https://github.com/apache/datafusion/pull/16672

Re: [PR] Fix duplicate field name error in Join::try_new_with_project_input during physical planning [datafusion]

2025-07-04 Thread via GitHub
gabotechs commented on code in PR #16454: URL: https://github.com/apache/datafusion/pull/16454#discussion_r2184812398 ## datafusion/core/src/physical_planner.rs: ## @@ -1502,6 +1521,64 @@ fn get_null_physical_expr_pair( Ok((Arc::new(null_value), physical_name)) } +/// Qu

Re: [PR] Add reproducer for tpch Q16 deserialization bug [datafusion]

2025-07-04 Thread via GitHub
gabotechs commented on code in PR #16662: URL: https://github.com/apache/datafusion/pull/16662#discussion_r2184807380 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1736,3 +1737,55 @@ async fn roundtrip_physical_plan_node() { let _ = plan.execute(0, ct

Re: [PR] [branch-48] Set the default value of datafusion.execution.collect_statistics to true #16447 [datafusion]

2025-07-04 Thread via GitHub
alamb commented on PR #16659: URL: https://github.com/apache/datafusion/pull/16659#issuecomment-3035158627 Thanks again @dmitriibugakov and @blaginin

Re: [I] Continue optimizing the CursorValues compare for StringViewArray [datafusion]

2025-07-04 Thread via GitHub
alamb closed issue #16629: Continue optimizing the CursorValues compare for StringViewArray URL: https://github.com/apache/datafusion/issues/16629

Re: [PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-07-04 Thread via GitHub
alamb merged PR #16630: URL: https://github.com/apache/datafusion/pull/16630

Re: [PR] [branch-48] Set the default value of datafusion.execution.collect_statistics to true #16447 [datafusion]

2025-07-04 Thread via GitHub
alamb merged PR #16659: URL: https://github.com/apache/datafusion/pull/16659

Re: [PR] [branch-48] fix: column indices in FFI partition evaluator (#16480) [datafusion]

2025-07-04 Thread via GitHub
alamb commented on PR #16657: URL: https://github.com/apache/datafusion/pull/16657#issuecomment-3035160095 FYI @timsaucer I am backporting this and making a 48.0.1 RC

Re: [PR] [branch-48] fix: column indices in FFI partition evaluator (#16480) [datafusion]

2025-07-04 Thread via GitHub
alamb merged PR #16657: URL: https://github.com/apache/datafusion/pull/16657

Re: [PR] docs: Update Maven links for 0.9.0 release [datafusion-comet]

2025-07-04 Thread via GitHub
andygrove merged PR #1988: URL: https://github.com/apache/datafusion-comet/pull/1988

Re: [PR] Add support for NULL escape char in pattern match searches [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
iffyio merged PR #1913: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1913

[PR] Add the missing equivalence info for filter pushdown [datafusion]

2025-07-04 Thread via GitHub
liamzwbao opened a new pull request, #16686: URL: https://github.com/apache/datafusion/pull/16686 ## Which issue does this PR close? - Closes #16563. ## Rationale for this change Add the missing equivalence info so that optimizer can pick it up when pruni

Re: [PR] Refactor SortMergeJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
comphead commented on code in PR #16675: URL: https://github.com/apache/datafusion/pull/16675#discussion_r2185864775 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2032,7 +2034,10 @@ impl SortMergeJoinStream { let record_batch = concat_bat

Re: [PR] Update to Rust 1.88 [datafusion]

2025-07-04 Thread via GitHub
Dandandan merged PR #16663: URL: https://github.com/apache/datafusion/pull/16663

Re: [I] Update workspace to use Rust 1.88 [datafusion]

2025-07-04 Thread via GitHub
Dandandan closed issue #16655: Update workspace to use Rust 1.88 URL: https://github.com/apache/datafusion/issues/16655

[I] Update release scripts to publish Comet jars for Spark 4.0.0 [datafusion-comet]

2025-07-04 Thread via GitHub
andygrove opened a new issue, #1989: URL: https://github.com/apache/datafusion-comet/issues/1989 ### What is the problem the feature request solves? We currently only publish jars to Maven for Spark 3.4 and 3.5. We should now include 4.0 See `dev/release/build-release-comet.sh`

[PR] Align Snowflake dialect to new test of reserved keywords [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
yoavcloud opened a new pull request, #1924: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1924 We've encountered a problem parsing statements like `SELECT 1, sort FROM tbl` which are valid in Snowflake. The reason is that in the Snowflake dialect, supports_projection_tr

Re: [PR] Add the missing equivalence info for filter pushdown [datafusion]

2025-07-04 Thread via GitHub
liamzwbao commented on code in PR #16686: URL: https://github.com/apache/datafusion/pull/16686#discussion_r2186078102 ## datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs: ## @@ -289,7 +289,7 @@ fn test_no_pushdown_through_aggregates() { Ok: - F

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on PR #16433: URL: https://github.com/apache/datafusion/pull/16433#issuecomment-3037193167 I am taking a look now, see if I can find a thing

[PR] docs: Update Maven links for 0.9.0 release [datafusion-comet]

2025-07-04 Thread via GitHub
andygrove opened a new pull request, #1988: URL: https://github.com/apache/datafusion-comet/pull/1988 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] Bug: the new filter pushdown optimizer rule in physical layer will miss the equivalence info in filter [datafusion]

2025-07-04 Thread via GitHub
liamzwbao commented on issue #16563: URL: https://github.com/apache/datafusion/issues/16563#issuecomment-3037228593 I put up a fix in #16686. It works but not sure whether this is the best place to apply the change.

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2186106170 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -319,19 +342,75 @@ impl TopK { /// (a > 2 OR (a = 2 AND b < 3)) /// ``` fn update_filter(&mut s
