Re: [PR] Align Snowflake dialect to new test of reserved keywords [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
iffyio merged PR #1924: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1924 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] Perform type coercion for corr aggregate function [datafusion]

2025-07-04 Thread via GitHub
kumarlokesh commented on PR #15776: URL: https://github.com/apache/datafusion/pull/15776#issuecomment-3038251223 @alamb thanks for the review! Updated the PR to address review comments.

Re: [PR] Add support for dropping multiple columns in Snowflake [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
iffyio merged PR #1918: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1918

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3038122113 Thank you @alamb! Addressed comments for the first round, but the image has still not been added to the content because it does not display well in my local environment.

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186696545 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,251 @@ +## Extending Parquet with Embedded Indexes and Accelerating Query Processing with Dat

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186669292 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Support for ClickHouse CREATE TABLE .... Engine = MergeTree() [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
solontsev commented on PR #1925: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1925#issuecomment-3038044031 I made it more permissive by allowing empty parentheses. I'd appreciate your feedback on this.

Re: [PR] Extend binary coercion rules to support Decimal arithmetic operations with integer(signed and unsigned) types [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on PR #16668: URL: https://github.com/apache/datafusion/pull/16668#issuecomment-3038036678 note to self: check if these improvements trickle down into the physical optimizers

[PR] Support for ClickHouse CREATE TABLE .... Engine = MergeTree() [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
solontsev opened a new pull request, #1925: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1925 Closes #1853

Re: [I] Support Push down expression evaluation in `TableProviders` [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-3038010228 > > Related blog from [@gatesn](https://github.com/gatesn) > > https://blog.spiraldb.com/what-if-we-just-didnt-decompress-it/ > > I now see how the vision of this can c

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186572247 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186552221 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186543546 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186527671 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186508081 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3037856509 > https://docs.google.com/presentation/d/1aFjTLEDJyDqzFZHgcmRxecCvLKKXV2OvyEpTQFCNZPw/edit?slide=id.g33d7337a5a0_0_85 Thank you @alamb for review and great suggestions! I wi

Re: [PR] Simplify HTML Formatter Style Handling Using Script Injection [datafusion-python]

2025-07-04 Thread via GitHub
kosiew commented on PR #1177: URL: https://github.com/apache/datafusion-python/pull/1177#issuecomment-3037605845 You're welcome.

Re: [PR] Refactor SortMergeJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
Standing-Man commented on code in PR #16675: URL: https://github.com/apache/datafusion/pull/16675#discussion_r2186251681 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2032,7 +2034,10 @@ impl SortMergeJoinStream { let record_batch = concat

[PR] chore(devcontainer): use debian's `protobuf-compiler` package [datafusion]

2025-07-04 Thread via GitHub
fvj opened a new pull request, #16687: URL: https://github.com/apache/datafusion/pull/16687 ## Which issue does this PR close? None. It's too small of a change for an issue, in my opinion. Happy to retroactively create one, though. ## Rationale for this change unfortunat

Re: [PR] Add support for Arrow Dictionary type in Substrait [datafusion]

2025-07-04 Thread via GitHub
jkosh44 commented on code in PR #16608: URL: https://github.com/apache/datafusion/pull/16608#discussion_r2186149622 ## datafusion/substrait/src/variation_const.rs: ## @@ -53,6 +53,10 @@ pub const DATE_64_TYPE_VARIATION_REF: u32 = 1; pub const DEFAULT_CONTAINER_TYPE_VARIATION_RE

Re: [PR] Update to Rust 1.88 [datafusion]

2025-07-04 Thread via GitHub
Dandandan merged PR #16663: URL: https://github.com/apache/datafusion/pull/16663

Re: [I] Update workspace to use Rust 1.88 [datafusion]

2025-07-04 Thread via GitHub
Dandandan closed issue #16655: Update workspace to use Rust 1.88 URL: https://github.com/apache/datafusion/issues/16655

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2186109216 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -319,19 +342,75 @@ impl TopK { /// (a > 2 OR (a = 2 AND b < 3)) /// ``` fn update_filter(&mut s

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2186107501 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -319,19 +342,75 @@ impl TopK { /// (a > 2 OR (a = 2 AND b < 3)) /// ``` fn update_filter(&mut s

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2186106170 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -319,19 +342,75 @@ impl TopK { /// (a > 2 OR (a = 2 AND b < 3)) /// ``` fn update_filter(&mut s
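
The doc comment quoted in these review snippets shows a filter of the form `(a > 2 OR (a = 2 AND b < 3))`. A minimal sketch of how such a predicate can be read, assuming (this is not stated in the truncated snippet) a sort order of `a DESC, b ASC` and a threshold row `(2, 3)`: the predicate is then the same as a lexicographic "sorts strictly before the threshold" check. The function names below are hypothetical and are not taken from the PR.

```rust
use std::cmp::Reverse;

/// The predicate from the quoted doc comment, written out literally.
fn passes_filter(a: i64, b: i64) -> bool {
    a > 2 || (a == 2 && b < 3)
}

/// The same check as a lexicographic tuple comparison.
/// Assumption: ORDER BY a DESC, b ASC with threshold row (2, 3).
/// `Reverse` flips the ordering of `a` so that larger values sort first.
fn sorts_before_threshold(a: i64, b: i64) -> bool {
    (Reverse(a), b) < (Reverse(2), 3)
}

fn main() {
    // Exhaustively compare both formulations on a small grid of values.
    for a in 0..5 {
        for b in 0..6 {
            assert_eq!(passes_filter(a, b), sorts_before_threshold(a, b));
        }
    }
    println!("predicate and tuple comparison agree");
}
```

Under that assumed sort order, a row can only enter the current top-K if it passes this check, which is why a filter of this shape can be used to skip rows early.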

Re: [I] Bug: the new filter pushdown optimizer rule in physical layer will miss the equivalence info in filter [datafusion]

2025-07-04 Thread via GitHub
liamzwbao commented on issue #16563: URL: https://github.com/apache/datafusion/issues/16563#issuecomment-3037228593 I put up a fix in #16686. It works, but I am not sure whether this is the best place to apply the change.

Re: [PR] Add the missing equivalence info for filter pushdown [datafusion]

2025-07-04 Thread via GitHub
liamzwbao commented on code in PR #16686: URL: https://github.com/apache/datafusion/pull/16686#discussion_r2186078102 ## datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs: ## @@ -289,7 +289,7 @@ fn test_no_pushdown_through_aggregates() { Ok: - F

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on PR #16433: URL: https://github.com/apache/datafusion/pull/16433#issuecomment-3037193167 I am taking a look now, see if I can find a thing

Re: [PR] fix: sqllogictest runner label condition mismatch [datafusion]

2025-07-04 Thread via GitHub
lliangyu-lin commented on code in PR #16633: URL: https://github.com/apache/datafusion/pull/16633#discussion_r2186054590 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -243,7 +243,7 @@ async fn run_test_file_substrait_round_trip( }; setup_scratch_dir(&relative_

[PR] Align Snowflake dialect to new test of reserved keywords [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
yoavcloud opened a new pull request, #1924: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1924 We've encountered a problem parsing statements like `SELECT 1, sort FROM tbl` which are valid in Snowflake. The reason is that in the Snowflake dialect, supports_projection_tr

[PR] Add the missing equivalence info for filter pushdown [datafusion]

2025-07-04 Thread via GitHub
liamzwbao opened a new pull request, #16686: URL: https://github.com/apache/datafusion/pull/16686 ## Which issue does this PR close? - Closes #16563. ## Rationale for this change Add the missing equivalence info so that optimizer can pick it up when pruni

Re: [PR] Add support for NULL escape char in pattern match searches [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
iffyio merged PR #1913: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1913

Re: [PR] docs: Update Maven links for 0.9.0 release [datafusion-comet]

2025-07-04 Thread via GitHub
andygrove merged PR #1988: URL: https://github.com/apache/datafusion-comet/pull/1988

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
alamb commented on code in PR #79: URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2185866801 ## content/blog/datafusion-custom-parquet-index.md: ## @@ -0,0 +1,232 @@ +## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes + +It’s a commo

Re: [PR] Refactor SortMergeJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
comphead commented on code in PR #16675: URL: https://github.com/apache/datafusion/pull/16675#discussion_r2185865853 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2032,7 +2034,10 @@ impl SortMergeJoinStream { let record_batch = concat_bat

Re: [PR] Refactor SortMergeJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
comphead commented on code in PR #16675: URL: https://github.com/apache/datafusion/pull/16675#discussion_r2185864775 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2032,7 +2034,10 @@ impl SortMergeJoinStream { let record_batch = concat_bat

[I] Update release scripts to publish Comet jars for Spark 4.0.0 [datafusion-comet]

2025-07-04 Thread via GitHub
andygrove opened a new issue, #1989: URL: https://github.com/apache/datafusion-comet/issues/1989 ### What is the problem the feature request solves? We currently only publish jars to Maven for Spark 3.4 and 3.5. We should now include 4.0 See `dev/release/build-release-comet.sh`

[PR] docs: Update Maven links for 0.9.0 release [datafusion-comet]

2025-07-04 Thread via GitHub
andygrove opened a new pull request, #1988: URL: https://github.com/apache/datafusion-comet/pull/1988 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Comet 0.9.0 [datafusion-site]

2025-07-04 Thread via GitHub
andygrove merged PR #78: URL: https://github.com/apache/datafusion-site/pull/78

Re: [PR] Change tag and policy names to `ObjectName` [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
iffyio merged PR #1892: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1892

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036807094 I would appreciate it, it would greatly help me

Re: [PR] fix: sqllogictest runner label condition mismatch [datafusion]

2025-07-04 Thread via GitHub
gabotechs commented on code in PR #16633: URL: https://github.com/apache/datafusion/pull/16633#discussion_r2185709428 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -243,7 +243,7 @@ async fn run_test_file_substrait_round_trip( }; setup_scratch_dir(&relative_pat

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
ding-young commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036770691 @rluvaton If you’d like, I can send a PR to your (fork's) branch that resolves merge conflicts since I already have one. Anyway there were only minor diffs to handle when I rebased

Re: [PR] Comet 0.9.0 [datafusion-site]

2025-07-04 Thread via GitHub
andygrove commented on code in PR #78: URL: https://github.com/apache/datafusion-site/pull/78#discussion_r2185674170 ## content/blog/2025-07-01-datafusion-comet-0.9.0.md: ## @@ -0,0 +1,176 @@ +--- +layout: post +title: Apache DataFusion Comet 0.9.0 Release +date: 2025-07-01 +aut

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-04 Thread via GitHub
rluvaton commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3036724762 So should I fix this PR's conflicts? It seems like this PR has a chance to be merged.

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2185669579 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -214,41 +238,39 @@ impl TopK { let mut selected_rows = None; -if let Some(filter) = self.f

[PR] Partially implement MATCH_RECOGNIZE for Advanced Pattern Matching [datafusion]

2025-07-04 Thread via GitHub
geoffreyclaude opened a new pull request, #16685: URL: https://github.com/apache/datafusion/pull/16685 ## Which issue does this PR close? - Draft attempt at #13583. ## How to review? - The sqllogic tests in the `datafusion/sqllogictest/test_files/match_recognize` cover e

[PR] Nick/readme update [datafusion-python]

2025-07-04 Thread via GitHub
ntjohnson1 opened a new pull request, #1179: URL: https://github.com/apache/datafusion-python/pull/1179 # Which issue does this PR close? Closes #1178 # Rationale for this change Explained in issue. Trying to bring develop instructions up to current experience getting started

[I] How To Develop Misses Edge cases [datafusion-python]

2025-07-04 Thread via GitHub
ntjohnson1 opened a new issue, #1178: URL: https://github.com/apache/datafusion-python/issues/1178 **Describe the bug** Running the steps for how to develop on a clean mac leads to some missing package errors. 1. Running maturin develop command initial errors that `protoc` can't be fo

Re: [PR] fix: create file for empty stream [datafusion]

2025-07-04 Thread via GitHub
chenkovsky commented on code in PR #16342: URL: https://github.com/apache/datafusion/pull/16342#discussion_r2185571139 ## datafusion/datasource/src/file_sink_config.rs: ## @@ -77,13 +79,34 @@ pub trait FileSink: DataSink { .runtime_env() .object_store(&

[I] `DataSourceExec` is projecting/reading unused columns from Parquet files for recursive queries [datafusion]

2025-07-04 Thread via GitHub
debajyoti-truefoundry opened a new issue, #16684: URL: https://github.com/apache/datafusion/issues/16684 ### Describe the bug I am on datafusion 47. ```rust use arrow::array::Int64Array; use arrow::datatypes::{DataType, Field, Schema}; use arrow::record_batch::RecordBat

[PR] Remove unused AggregateUDF struct [datafusion]

2025-07-04 Thread via GitHub
ViggoC opened a new pull request, #16683: URL: https://github.com/apache/datafusion/pull/16683 AggregateUDF is not used.

Re: [PR] fix: create file for empty stream [datafusion]

2025-07-04 Thread via GitHub
chenkovsky commented on code in PR #16342: URL: https://github.com/apache/datafusion/pull/16342#discussion_r2185358781 ## datafusion/datasource/src/file_sink_config.rs: ## @@ -77,13 +79,34 @@ pub trait FileSink: DataSink { .runtime_env() .object_store(&

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2185350825 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -214,41 +238,39 @@ impl TopK { let mut selected_rows = None; -if let Some(filter) = self.

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2185318951 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -214,41 +238,39 @@ impl TopK { let mut selected_rows = None; -if let Some(filter) = self.f

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
Dandandan commented on code in PR #16433: URL: https://github.com/apache/datafusion/pull/16433#discussion_r2185314221 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -214,41 +238,39 @@ impl TopK { let mut selected_rows = None; -if let Some(filter) = self.

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on PR #16433: URL: https://github.com/apache/datafusion/pull/16433#issuecomment-3036131650 I did a bench run, confounding results: ``` ┏━━┳━━┳━━┳━━━┓ ┃ Query┃ main ┃ topk-filters ┃Change ┃ ┡

[PR] Revert "fix: create file for empty stream" [datafusion]

2025-07-04 Thread via GitHub
brunal opened a new pull request, #16682: URL: https://github.com/apache/datafusion/pull/16682 Reverts apache/datafusion#16342 After that change, one cannot write an empty RecordBatch with a schema to parquet anymore. Indeed, the logic added tries to write an empty recordbatch

Re: [PR] fix: create file for empty stream [datafusion]

2025-07-04 Thread via GitHub
brunal commented on PR #16342: URL: https://github.com/apache/datafusion/pull/16342#issuecomment-3036100543 How can I send a rollback of this PR?

Re: [PR] fix: create file for empty stream [datafusion]

2025-07-04 Thread via GitHub
brunal commented on PR #16342: URL: https://github.com/apache/datafusion/pull/16342#issuecomment-3036097695 User facing breakage: one cannot explicitly write an empty recordbatch that has a schema anymore. The tests in the PR don't have a schema so they don't reveal the issue.

Re: [PR] fix: create file for empty stream [datafusion]

2025-07-04 Thread via GitHub
brunal commented on code in PR #16342: URL: https://github.com/apache/datafusion/pull/16342#discussion_r2185262228 ## datafusion/datasource/src/file_sink_config.rs: ## @@ -77,13 +79,34 @@ pub trait FileSink: DataSink { .runtime_env() .object_store(&conf

Re: [PR] Simplify HTML Formatter Style Handling Using Script Injection [datafusion-python]

2025-07-04 Thread via GitHub
timsaucer merged PR #1177: URL: https://github.com/apache/datafusion-python/pull/1177

Re: [I] Add DOM-guarded CSS/JS injection to DataFrameHtmlFormatter to prevent duplicate style/script inserts [datafusion-python]

2025-07-04 Thread via GitHub
timsaucer closed issue #1171: Add DOM-guarded CSS/JS injection to DataFrameHtmlFormatter to prevent duplicate style/script inserts URL: https://github.com/apache/datafusion-python/issues/1171

Re: [PR] refactor: shrink `SchemaError` [datafusion]

2025-07-04 Thread via GitHub
crepererum merged PR #16653: URL: https://github.com/apache/datafusion/pull/16653

Re: [PR] Improve `ScalarUDFImpl::equals` Default Behavior and Add Unit Tests for Custom Equality [datafusion]

2025-07-04 Thread via GitHub
findepi commented on code in PR #16681: URL: https://github.com/apache/datafusion/pull/16681#discussion_r2185166539 ## datafusion/expr/src/test/udf_equals.rs: ## @@ -0,0 +1,187 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [I] feature: adapt predicate pushdown for mismatched nested/struct schemas [datafusion]

2025-07-04 Thread via GitHub
adriangb commented on issue #16565: URL: https://github.com/apache/datafusion/issues/16565#issuecomment-3035790931 Question: will we be able to handle something like `Dict(UInt32, List(List(Struct(...`?

[PR] Simplify HTML Formatter Style Handling Using Script Injection [datafusion-python]

2025-07-04 Thread via GitHub
kosiew opened a new pull request, #1177: URL: https://github.com/apache/datafusion-python/pull/1177 ## Which issue does this PR close? - Closes #1171. ## Rationale for this change This change simplifies the logic for injecting HTML styles in the `DataFrameHtmlFormatter`

Re: [PR] refactor: shrink `SchemaError` [datafusion]

2025-07-04 Thread via GitHub
crepererum commented on PR #16653: URL: https://github.com/apache/datafusion/pull/16653#issuecomment-3035482230 Rebased after #16672.

Re: [I] feature: adapt predicate pushdown for mismatched nested/struct schemas [datafusion]

2025-07-04 Thread via GitHub
kosiew commented on issue #16565: URL: https://github.com/apache/datafusion/issues/16565#issuecomment-3035480855 @adriangb, > incorporating the struct-aware casting logic from #16371 into `CastExpr` and `TryCastExpr` Yes. I think it is a necessary step to explore the feasibilit

Re: [PR] refactor: shrink `SchemaError` [datafusion]

2025-07-04 Thread via GitHub
crepererum commented on PR #16653: URL: https://github.com/apache/datafusion/pull/16653#issuecomment-3035427853 > Thanks @crepererum for taking care of that, I suspect it is motivated by clippy complains [rust-lang.github.io/rust-clippy/v0.0.212#large_enum_variant](https://rust-lang.github.
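
The quoted comment attributes the refactor to clippy's `large_enum_variant` lint. As a hedged illustration of what that lint is about (the enums below are hypothetical and are not the actual `SchemaError` definition or the change made in the PR): an enum is as large as its largest variant, so boxing a large payload shrinks every value of the enum.

```rust
use std::mem::size_of;

// Hypothetical error enum with one large inline variant.
// Every value of this type reserves room for the 256-byte payload.
#[allow(dead_code, clippy::large_enum_variant)]
enum UnboxedError {
    Small(u8),
    Large([u8; 256]),
}

// Same shape, but the large payload lives on the heap behind a pointer,
// so the enum itself only needs space for the pointer plus a tag.
#[allow(dead_code)]
enum BoxedError {
    Small(u8),
    Large(Box<[u8; 256]>),
}

fn main() {
    println!("unboxed: {} bytes", size_of::<UnboxedError>());
    println!("boxed:   {} bytes", size_of::<BoxedError>());
    assert!(size_of::<BoxedError>() < size_of::<UnboxedError>());
}
```

The trade-off is one extra heap allocation when the large variant is constructed, in exchange for a smaller type on the common path, which matters for `Result<T, E>` values that are passed around frequently.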

Re: [PR] Change tag and policy names to `ObjectName` instead of `Ident` [datafusion-sqlparser-rs]

2025-07-04 Thread via GitHub
eliaperantoni commented on code in PR #1892: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1892#discussion_r2184969127 ## src/parser/mod.rs: ## @@ -7866,7 +7866,7 @@ impl<'a> Parser<'a> { } pub(crate) fn parse_tag(&mut self) -> Result { -let na

[PR] Improve `ScalarUDFImpl::equals` Default Behavior and Add Unit Tests for Custom Equality [datafusion]

2025-07-04 Thread via GitHub
kosiew opened a new pull request, #16681: URL: https://github.com/apache/datafusion/pull/16681 ## Which issue does this PR close? - Closes [#16677](https://github.com/apache/datafusion/issues/16677) ## Rationale for this change The default implementation of `ScalarUDF

Re: [PR] [branch-48] Prepare 48.0.1 and CHANGELOG [datafusion]

2025-07-04 Thread via GitHub
alamb merged PR #16679: URL: https://github.com/apache/datafusion/pull/16679

[PR] Add link to upgrade guide in changelog script [datafusion]

2025-07-04 Thread via GitHub
alamb opened a new pull request, #16680: URL: https://github.com/apache/datafusion/pull/16680 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16626 ## Rationale for this change As @jonmmease pointed out, for people seeing t

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3035299849 Thank you @alamb, I will keep polishing it before you review!

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
alamb commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3035285253 This is amazing -- thank you @zhuqi-lucas and @2010YOUY01 -- I will review this asap, but as today is a holiday in the US I may not have a chance to do so until tomorrow.

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-07-04 Thread via GitHub
LiaCastaneda commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2184901077 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -1039,12 +1196,50 @@ async fn collect_left_input( let data = JoinLeftData::new( hashma

[PR] [branch-48] Prepare 48.0.1 [datafusion]

2025-07-04 Thread via GitHub
alamb opened a new pull request, #16679: URL: https://github.com/apache/datafusion/pull/16679 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/16486 - Related to https://github.com/apache/datafusion/issues/16626 ## Rationale for

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3035225130 > This post is great, I find the content easy to follow. > > I have a suggestion for the first paragraph though: perhaps we should emphasize the motivation more clearly at t

Re: [I] Release DataFusion `48.0.1` [datafusion]

2025-07-04 Thread via GitHub
alamb commented on issue #16486: URL: https://github.com/apache/datafusion/issues/16486#issuecomment-3035205437 Ok, I have backported the three identified issues and I will now create a release candidate

Re: [I] ScalarUDFImpl::equals default implementation is error-prone [datafusion]

2025-07-04 Thread via GitHub
alamb commented on issue #16677: URL: https://github.com/apache/datafusion/issues/16677#issuecomment-3035199021 🤔 That is a tricky business Maybe ScalarUDFImpl needs some way to have `eq` called on it too 🤔

Re: [PR] Add a note about Boxing errors in upgrade guide [datafusion]

2025-07-04 Thread via GitHub
alamb closed pull request #16673: Add a note about Boxing errors in upgrade guide URL: https://github.com/apache/datafusion/pull/16673

Re: [PR] Add a note about Boxing errors in upgrade guide [datafusion]

2025-07-04 Thread via GitHub
alamb commented on PR #16673: URL: https://github.com/apache/datafusion/pull/16673#issuecomment-3035186129 @kosiew included this content in - https://github.com/apache/datafusion/pull/16672

Re: [PR] Add a note about Boxing errors in upgrade guide [datafusion]

2025-07-04 Thread via GitHub
alamb commented on code in PR #16673: URL: https://github.com/apache/datafusion/pull/16673#discussion_r2184855966 ## docs/source/library-user-guide/upgrading.md: ## @@ -24,6 +24,38 @@ **Note:** DataFusion `49.0.0` has not been released yet. The information provided in this sec

Re: [PR] [branch-48] fix: column indices in FFI partition evaluator (#16480) [datafusion]

2025-07-04 Thread via GitHub
alamb merged PR #16657: URL: https://github.com/apache/datafusion/pull/16657

Re: [PR] [branch-48] fix: column indices in FFI partition evaluator (#16480) [datafusion]

2025-07-04 Thread via GitHub
alamb commented on PR #16657: URL: https://github.com/apache/datafusion/pull/16657#issuecomment-3035160095 FYI @timsaucer I am backporting this and making a 48.0.1 RC

Re: [PR] [branch-48] Set the default value of datafusion.execution.collect_statistics to true #16447 [datafusion]

2025-07-04 Thread via GitHub
alamb commented on PR #16659: URL: https://github.com/apache/datafusion/pull/16659#issuecomment-3035158627 Thanks again @dmitriibugakov and @blaginin

Re: [PR] [branch-48] Set the default value of datafusion.execution.collect_statistics to true #16447 [datafusion]

2025-07-04 Thread via GitHub
alamb merged PR #16659: URL: https://github.com/apache/datafusion/pull/16659

Re: [PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-07-04 Thread via GitHub
alamb merged PR #16630: URL: https://github.com/apache/datafusion/pull/16630

Re: [I] Continue optimizing the CursorValues compare for StringViewArray [datafusion]

2025-07-04 Thread via GitHub
alamb closed issue #16629: Continue optimizing the CursorValues compare for StringViewArray URL: https://github.com/apache/datafusion/issues/16629

Re: [PR] Fix duplicate field name error in Join::try_new_with_project_input during physical planning [datafusion]

2025-07-04 Thread via GitHub
gabotechs commented on code in PR #16454: URL: https://github.com/apache/datafusion/pull/16454#discussion_r2184812398 ## datafusion/core/src/physical_planner.rs: ## @@ -1502,6 +1521,64 @@ fn get_null_physical_expr_pair( Ok((Arc::new(null_value), physical_name)) } +/// Qu

Re: [PR] Add reproducer for tpch Q16 deserialization bug [datafusion]

2025-07-04 Thread via GitHub
gabotechs commented on code in PR #16662: URL: https://github.com/apache/datafusion/pull/16662#discussion_r2184807380 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1736,3 +1737,55 @@ async fn roundtrip_physical_plan_node() { let _ = plan.execute(0, ct

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
2010YOUY01 commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3035056285 This post is great, I find the content easy to follow. I have a suggestion for the first paragraph though: perhaps we should emphasize the motivation more clearly at the begi

Re: [PR] Refactor SortMergeJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
Standing-Man commented on code in PR #16675: URL: https://github.com/apache/datafusion/pull/16675#discussion_r2184734711 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2032,7 +2034,10 @@ impl SortMergeJoinStream { let record_batch = concat

Re: [PR] Refactor StreamJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
Standing-Man commented on code in PR #16674: URL: https://github.com/apache/datafusion/pull/16674#discussion_r2184687258 ## datafusion/physical-plan/src/joins/symmetric_hash_join.rs: ## @@ -1375,7 +1375,7 @@ impl SymmetricHashJoinStream { } Some

Re: [PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3034853896 I am not expert for blog, welcome folks to polish it together, thanks a lot! cc @alamb

Re: [I] Blog Post for Accelerating Query Processing with Specialized Indexes [datafusion]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on issue #16372: URL: https://github.com/apache/datafusion/issues/16372#issuecomment-3034853243 Submit the draft blog: https://github.com/apache/datafusion-site/pull/79 I am not expert for blog, welcome folks to polish it together, thanks a lot!

[PR] Draft : Accelerating Query Processing in DataFusion with Embedded Parquet Indexes [datafusion-site]

2025-07-04 Thread via GitHub
zhuqi-lucas opened a new pull request, #79: URL: https://github.com/apache/datafusion-site/pull/79 Try to blog our work for the custom parquet example for datafusion: https://github.com/apache/datafusion/pull/16395 And also close this ticket: https://github.com/apache/dat

Re: [PR] Refactor SortMergeJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
2010YOUY01 commented on code in PR #16675: URL: https://github.com/apache/datafusion/pull/16675#discussion_r2184636583 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -2032,7 +2034,10 @@ impl SortMergeJoinStream { let record_batch = concat_b

Re: [PR] Refactor StreamJoinMetrics to reuse BaselineMetrics [datafusion]

2025-07-04 Thread via GitHub
2010YOUY01 commented on code in PR #16674: URL: https://github.com/apache/datafusion/pull/16674#discussion_r2184635238 ## datafusion/physical-plan/src/joins/symmetric_hash_join.rs: ## @@ -1375,7 +1375,7 @@ impl SymmetricHashJoinStream { } Some((
