Re: [PR] feat: rand expression support [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199#discussion_r2164559552 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2806,6 +2806,26 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3001813655 I'll add that :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-24 Thread via GitHub
kosiew commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-3003420880 hi @adriangb > could you take a look at https://github.com/apache/datafusion/commit/32725dd621ec5e96caf1970433f3549dca977a80? 👍👍👍 The new tests in `PhysicalExpr

Re: [I] Improve performance of `datafusion-cli` when reading from remote storage [datafusion]

2025-06-24 Thread via GitHub
swaingotnochill commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3003394648 I started profiling and I can definitely see reading of parquet footers and metadata taking a chunk of time initially. Just want to post an update that I will cont

Re: [PR] fix: Add overflow check for SumDecimalGroupsAccumulator::evaluate [datafusion-comet]

2025-06-24 Thread via GitHub
leung-ming commented on code in PR #1922: URL: https://github.com/apache/datafusion-comet/pull/1922#discussion_r2165657752 ## native/spark-expr/src/agg_funcs/sum_decimal.rs: ## @@ -375,11 +375,17 @@ impl GroupsAccumulator for SumDecimalGroupsAccumulator { // are nu

Re: [I] Support standard syntax for filtered aggregations [datafusion]

2025-06-24 Thread via GitHub
chenkovsky commented on issue #16516: URL: https://github.com/apache/datafusion/issues/16516#issuecomment-3003360170 take

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2165756822 ## datafusion-cli/tests/sql/encrypted_parquet.sql: ## @@ -0,0 +1,75 @@ +/* +Test parquet encryption and decryption in DataFusion SQL. +See datafusion/com

[I] Add DOM-guarded CSS/JS injection to DataFrameHtmlFormatter to prevent duplicate style/script inserts [datafusion-python]

2025-06-24 Thread via GitHub
kosiew opened a new issue, #1171: URL: https://github.com/apache/datafusion-python/issues/1171 ## Description: Currently, DataFrameHtmlFormatter tracks a _styles_loaded class flag in Python to avoid re-injecting styles and scripts across notebook cells. However, this approach:

Re: [PR] Add compression_level support to ParquetWriterOptions and enhance write_parquet to accept full options object [datafusion-python]

2025-06-24 Thread via GitHub
kosiew commented on code in PR #1169: URL: https://github.com/apache/datafusion-python/pull/1169#discussion_r2165665300 ## python/datafusion/dataframe.py: ## @@ -873,7 +877,7 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_par

Re: [PR] feat: dataframe string formatter [datafusion-python]

2025-06-24 Thread via GitHub
kosiew commented on code in PR #1170: URL: https://github.com/apache/datafusion-python/pull/1170#discussion_r2165554718 ## python/datafusion/dataframe.py: ## @@ -1112,3 +,17 @@ def fill_null(self, value: Any, subset: list[str] | None = None) -> DataFrame: - Fo

Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-24 Thread via GitHub
rishvin commented on issue #1820: URL: https://github.com/apache/datafusion-comet/issues/1820#issuecomment-3002947426 > > [@andygrove](https://github.com/andygrove) can I backport [SHA2-fix](https://github.com/apache/datafusion/pull/16350) to branch-48 of datafusion ? I tried updating with

[PR] Fix WindowFrame::new with order_by [datafusion]

2025-06-24 Thread via GitHub
findepi opened a new pull request, #16537: URL: https://github.com/apache/datafusion/pull/16537 ## Which issue does this PR close? - Closes #. ## Rationale for this change Before the change, the frame constructed with `WindowFrame::new(Some(true))` would not be

[PR] fix: The inconsistency between scalar and array on the cast of timestamp [datafusion]

2025-06-24 Thread via GitHub
chenkovsky opened a new pull request, #16539: URL: https://github.com/apache/datafusion/pull/16539 ## Which issue does this PR close? - Closes #16531. ## Rationale for this change In arrow (array), a decimal is treated as milliseconds, but in datafusion (scalar), decimal w

Re: [PR] Reuse `BaselineMetrics` in `UnnestMetrics` [datafusion]

2025-06-24 Thread via GitHub
alamb commented on PR #16497: URL: https://github.com/apache/datafusion/pull/16497#issuecomment-3001654371 Thanks again @hendrikmakait and @2010YOUY01

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-24 Thread via GitHub
huaxingao commented on PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#issuecomment-3002056893 cc @andygrove @parthchandra @hsiang-c Could you please review this PR? Thanks a lot!

Re: [PR] fix: The inconsistency between scalar and array on the cast of timestamp [datafusion]

2025-06-24 Thread via GitHub
chenkovsky commented on code in PR #16539: URL: https://github.com/apache/datafusion/pull/16539#discussion_r2165199807 ## datafusion/common/src/scalar/mod.rs: ## @@ -3069,7 +3069,7 @@ impl ScalarValue { ScalarValue::Decimal128(Some(decimal_value), _, scale),

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2165246656 ## docs/source/user-guide/configs.md: ## @@ -81,6 +81,8 @@ Environment variables are read during `SessionConfig` initialisation so they mus | datafusion.execut

[I] Move pruning logic into its own crate [datafusion]

2025-06-24 Thread via GitHub
alamb opened a new issue, #16542: URL: https://github.com/apache/datafusion/issues/16542 I made this `pub` as I think it could be useful for other data sources. I do think we should move this + `PruningPredicate` stuff into a `datafusion-pruning` crate or something. _Originall

Re: [I] Incorrect memory accounting in `array_agg` function [datafusion]

2025-06-24 Thread via GitHub
sfluor commented on issue #16517: URL: https://github.com/apache/datafusion/issues/16517#issuecomment-2999245472 take

[PR] Add note for planning release in Upgrade Guides [datafusion]

2025-06-24 Thread via GitHub
xudong963 opened a new pull request, #16534: URL: https://github.com/apache/datafusion/pull/16534 ## Which issue does this PR close? - Closes #. ## Rationale for this change Reduce confusion for users https://github.com/user-attachments/assets/6f4430db-

Re: [PR] feat: support table sample [datafusion]

2025-06-24 Thread via GitHub
chenkovsky commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-2999488349 > I suggest to first open an issue to describe full syntax and semantics of this table sample feature, and also include the reference system (like postgres). After we have reached

Re: [PR] Optimize allocation rate for `int64` array in `hex` function [datafusion]

2025-06-24 Thread via GitHub
Fly-Style commented on PR #16483: URL: https://github.com/apache/datafusion/pull/16483#issuecomment-2999735349 Benchmark results: ``` Benchmarking compute_hex_new: Warming up for 3. s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.

Re: [PR] Fix array_has to return false for empty arrays instead of null [datafusion]

2025-06-24 Thread via GitHub
comphead merged PR #16529: URL: https://github.com/apache/datafusion/pull/16529

Re: [PR] chore(deps): bump prost-build from 0.13.5 to 0.14.1 in the proto group [datafusion]

2025-06-24 Thread via GitHub
alamb commented on PR #16439: URL: https://github.com/apache/datafusion/pull/16439#issuecomment-3001700833 This needs to wait for the next release of arrow I think to keep all prost versions in sync

Re: [PR] Add note for planning release in Upgrade Guides [datafusion]

2025-06-24 Thread via GitHub
xudong963 merged PR #16534: URL: https://github.com/apache/datafusion/pull/16534

Re: [PR] Simplify predicates in `PushDownFilter` optimizer rule [datafusion]

2025-06-24 Thread via GitHub
xudong963 merged PR #16362: URL: https://github.com/apache/datafusion/pull/16362

Re: [PR] chore: Upload hprof files on failure [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove closed pull request #1791: chore: Upload hprof files on failure URL: https://github.com/apache/datafusion-comet/pull/1791

[I] Different result of decimal to timestamp cast when source value is constant [datafusion]

2025-06-24 Thread via GitHub
findepi opened a new issue, #16531: URL: https://github.com/apache/datafusion/issues/16531 ### Describe the bug decimal value V cast to timestamp type produces different values T1 or T2, depending on whether V is a constant ### To Reproduce ``` > SELECT CAST(C

Re: [I] Field Naming Collisions [datafusion]

2025-06-24 Thread via GitHub
kosiew commented on issue #16478: URL: https://github.com/apache/datafusion/issues/16478#issuecomment-2999357989 hi @hknlof I could not reproduce this with main commit ` fb01049d7` all.json ``` {"TaskId":1,"Title":"Design homepage","AssignedTo":"Alice"} {"TaskId":2,"Ti

Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]

2025-06-24 Thread via GitHub
alamb commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-3001108899 > I’d really appreciate it if DataFusion provided a "knob" that allowed users to express this kind of preference. Yeah this makes sense to me too. I was thinking sometimes w

Re: [PR] feat: optimize and unparse grouping [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16161: URL: https://github.com/apache/datafusion/pull/16161#discussion_r2164876290 ## datafusion/sql/Cargo.toml: ## @@ -48,6 +48,7 @@ arrow = { workspace = true } bigdecimal = { workspace = true } datafusion-common = { workspace = true, default-f

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-24 Thread via GitHub
drexler-sky commented on PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#issuecomment-3002328767 > ...1 more thing please add tests with empty array. I tested array_distinct with an empty array. ``` SELECT array_distinct(array()) FROM t1; == Optimized

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-24 Thread via GitHub
drexler-sky commented on code in PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#discussion_r2165266799 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -232,24 +232,42 @@ class CometArrayExpressionSuite extends CometTestBase w

Re: [PR] datafusion-cli: Use correct S3 region if it is not specified [datafusion]

2025-06-24 Thread via GitHub
alamb commented on PR #16502: URL: https://github.com/apache/datafusion/pull/16502#issuecomment-3002266142 Thanks @liamzwbao -- I'll put this one on my review queue for tomorrow

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2165219803 ## datafusion/core/src/dataframe/parquet.rs: ## @@ -246,4 +246,72 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn roundtrip_parquet_with_

Re: [PR] Support array_position [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove closed pull request #1363: Support array_position URL: https://github.com/apache/datafusion-comet/pull/1363

Re: [PR] datafusion-cli: Use correct S3 region if it is not specified [datafusion]

2025-06-24 Thread via GitHub
liamzwbao commented on PR #16502: URL: https://github.com/apache/datafusion/pull/16502#issuecomment-3002260982 Hi @alamb, this PR is ready for review! One concern here is whether it is acceptable to test against a bucket outside of datafusion, but I couldn’t find a good way to test r

Re: [PR] feat: Support hadoop s3a config in native_iceberg_compat [datafusion-comet]

2025-06-24 Thread via GitHub
parthchandra merged PR #1925: URL: https://github.com/apache/datafusion-comet/pull/1925

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2165202964 ## datafusion/core/src/dataframe/parquet.rs: ## @@ -246,4 +246,72 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn roundtrip_parquet_w

Re: [PR] fix: The inconsistency between scalar and array on the cast of timestamp [datafusion]

2025-06-24 Thread via GitHub
chenkovsky commented on PR #16539: URL: https://github.com/apache/datafusion/pull/16539#issuecomment-3002194281 should I also change https://github.com/apache/datafusion/blob/334d449ff778af590a7d8421544ac60222e5f69b/datafusion/functions/src/datetime/to_timestamp.rs#L339 ?

Re: [PR] fix: The inconsistency between scalar and array on the cast of timestamp [datafusion]

2025-06-24 Thread via GitHub
chenkovsky commented on code in PR #16539: URL: https://github.com/apache/datafusion/pull/16539#discussion_r2165155973 ## datafusion/common/src/scalar/mod.rs: ## @@ -3069,7 +3069,7 @@ impl ScalarValue { ScalarValue::Decimal128(Some(decimal_value), _, scale),

Re: [PR] Consolidate DataFrame Docs: Merge HTML Rendering Section as Subpage [datafusion-python]

2025-06-24 Thread via GitHub
timsaucer merged PR #1161: URL: https://github.com/apache/datafusion-python/pull/1161

Re: [I] Merge dataframe documentation prior to release of DF48 [datafusion-python]

2025-06-24 Thread via GitHub
timsaucer closed issue #1158: Merge dataframe documentation prior to release of DF48 URL: https://github.com/apache/datafusion-python/issues/1158

Re: [PR] Consolidate DataFrame Docs: Merge HTML Rendering Section as Subpage [datafusion-python]

2025-06-24 Thread via GitHub
timsaucer commented on PR #1161: URL: https://github.com/apache/datafusion-python/pull/1161#issuecomment-3002159630 I pushed a change to just move a few things over. We can open another PR if you feel strongly about splitting it out.

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-24 Thread via GitHub
Dandandan commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3001863494 > I was originally planning on keeping this PR smaller but it's been growing so I might as well add the Arc :) Feel free to PR it however you like ;)

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-24 Thread via GitHub
huaxingao commented on PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#issuecomment-3002050043 I have a draft iceberg [PR](https://github.com/apache/iceberg/pull/13378)

Re: [PR] fix: Add continue after append_null when casting float to decimal [datafusion-comet]

2025-06-24 Thread via GitHub
leung-ming commented on code in PR #1914: URL: https://github.com/apache/datafusion-comet/pull/1914#discussion_r2164577686 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -1298,6 +1298,7 @@ where }); } else {

[PR] Feat/dataframe str formatter [datafusion-python]

2025-06-24 Thread via GitHub
timsaucer opened a new pull request, #1170: URL: https://github.com/apache/datafusion-python/pull/1170 # Which issue does this PR close? None # Rationale for this change We recently added an HTML formatter, which is useful for generating HTML tables as a customer. This a

Re: [PR] fix: Make cast from float/double to decimal compatible with Spark [datafusion-comet]

2025-06-24 Thread via GitHub
leung-ming commented on PR #1915: URL: https://github.com/apache/datafusion-comet/pull/1915#issuecomment-3001405315 @parthchandra @andygrove I checked [ryu](https://crates.io/crates/ryu), [schubfach](https://crates.io/crates/schubfach) and [dragonbox](https://crates.io/crates/dragonbox

Re: [PR] Add note for planning release in Upgrade Guides [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16534: URL: https://github.com/apache/datafusion/pull/16534#discussion_r2164482272 ## docs/source/library-user-guide/upgrading.md: ## @@ -21,6 +21,8 @@ ## DataFusion `49.0.0` +**Note:** DataFusion 49.0.0 has not been released yet. The informat

Re: [PR] chore: Enable Spark SQL tests for auto scan mode [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1885: URL: https://github.com/apache/datafusion-comet/pull/1885#discussion_r2165064746 ## .github/workflows/spark_sql_test_native_auto.yml: ## @@ -35,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -

[PR] Auto file prefix [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove opened a new pull request, #1930: URL: https://github.com/apache/datafusion-comet/pull/1930 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1908 ## Rationale for this change ## What changes are included

[I] `array_contains` falls back to Spark in case of empty Array [datafusion-comet]

2025-06-24 Thread via GitHub
comphead opened a new issue, #1929: URL: https://github.com/apache/datafusion-comet/issues/1929 ### Describe the bug After reviewing https://github.com/apache/datafusion/pull/16529 I added empty array test to Comet ``` test("array_contains") { withSQLConf(CometC

Re: [PR] Support `array_union` scalar expr [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on PR #1362: URL: https://github.com/apache/datafusion-comet/pull/1362#issuecomment-3001253623 I'm closing this issue because it has been stale for a while. @dharanad feel free to reopen this if you resume work on this feature.

Re: [PR] chore: refactor `BuildProbeJoinMetrics` to use `BaselineMetrics` [datafusion]

2025-06-24 Thread via GitHub
Samyak2 commented on code in PR #16500: URL: https://github.com/apache/datafusion/pull/16500#discussion_r2164580513 ## datafusion/physical-plan/src/joins/cross_join.rs: ## @@ -632,7 +632,7 @@ impl CrossJoinStream { } self.join_metrics.

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-06-24 Thread via GitHub
coderfender commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-3002064952 As per @EmilyMatt's comment on the related PR, this seems to be no longer a bug but a performance boost. @andygrove we might have to remove `bug` label and remark it

Re: [PR] Add some comments about adding new dependencies in datafusion-sql [datafusion]

2025-06-24 Thread via GitHub
comphead merged PR #16543: URL: https://github.com/apache/datafusion/pull/16543

Re: [PR] Minor: Add documentation to `AggregateWindowExpr::get_result_column` [datafusion]

2025-06-24 Thread via GitHub
comphead merged PR #16479: URL: https://github.com/apache/datafusion/pull/16479

Re: [PR] `TableProvider` to skip files in the folder which non relevant to selected reader [datafusion]

2025-06-24 Thread via GitHub
comphead merged PR #16487: URL: https://github.com/apache/datafusion/pull/16487

Re: [I] array_has function returns null for an empty list ([]) instead of false [datafusion]

2025-06-24 Thread via GitHub
comphead closed issue #16474: array_has function returns null for an empty list ([]) instead of false URL: https://github.com/apache/datafusion/issues/16474

Re: [PR] chore: Enable Spark SQL tests for auto scan mode [datafusion-comet]

2025-06-24 Thread via GitHub
kazuyukitanimura commented on code in PR #1885: URL: https://github.com/apache/datafusion-comet/pull/1885#discussion_r2165034490 ## .github/workflows/spark_sql_test_native_auto.yml: ## @@ -35,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11]

Re: [PR] Add compression_level support to ParquetWriterOptions and enhance write_parquet to accept full options object [datafusion-python]

2025-06-24 Thread via GitHub
timsaucer commented on code in PR #1169: URL: https://github.com/apache/datafusion-python/pull/1169#discussion_r2165031025 ## python/datafusion/dataframe.py: ## @@ -873,7 +877,7 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2163855317 ## datafusion/datasource/src/file_format.rs: ## @@ -94,7 +93,6 @@ pub trait FileFormat: Send + Sync + fmt::Debug { &self, state: &dyn Session,

Re: [PR] [WIP] Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR config #1736 - v_vadlamani [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove closed pull request #1827: [WIP] Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR config #1736 - v_vadlamani URL: https://github.com/apache/datafusion-comet/pull/1827

Re: [I] [EPIC] expression pushdown and file level expression handling [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on issue #16528: URL: https://github.com/apache/datafusion/issues/16528#issuecomment-3000748063 I added an example of how https://github.com/apache/datafusion/pull/16461 will help solve https://github.com/apache/datafusion/issues/14993 in https://github.com/apache/datafus

Re: [PR] Reuse `BaselineMetrics` in `UnnestMetrics` [datafusion]

2025-06-24 Thread via GitHub
2010YOUY01 commented on code in PR #16497: URL: https://github.com/apache/datafusion/pull/16497#discussion_r2163184021 ## datafusion/physical-plan/src/unnest.rs: ## @@ -284,7 +279,9 @@ impl UnnestStream { loop { return Poll::Ready(match ready!(self.input.po

Re: [PR] fix: SortMergeJoin for timestamp keys [datafusion-comet]

2025-06-24 Thread via GitHub
SKY-ALIN commented on code in PR #1901: URL: https://github.com/apache/datafusion-comet/pull/1901#discussion_r2164969731 ## spark/src/test/scala/org/apache/comet/exec/CometJoinSuite.scala: ## @@ -54,25 +54,6 @@ class CometJoinSuite extends CometTestBase { .toSeq) }

[PR] Fix join precedence for non-snowflake queries [datafusion-sqlparser-rs]

2025-06-24 Thread via GitHub
Dimchikkk opened a new pull request, #1905: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1905 Fixes https://github.com/apache/datafusion-sqlparser-rs/issues/1904. The bug was introduced in https://github.com/apache/datafusion-sqlparser-rs/pull/1799.

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-06-24 Thread via GitHub
coderfender commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-324166 Working on this

Re: [PR] Add PhysicalExpr optimizer and cast unwrapping [datafusion]

2025-06-24 Thread via GitHub
alamb commented on PR #16530: URL: https://github.com/apache/datafusion/pull/16530#issuecomment-3001900336 I will try and find time to review this tomorrow

[PR] Consolidate configuration sections in docs [datafusion]

2025-06-24 Thread via GitHub
alamb opened a new pull request, #16544: URL: https://github.com/apache/datafusion/pull/16544 ## Which issue does this PR close? - Part of #7013 ## Rationale for this change The main navbar on the left of the page is already pretty big. I noticed that there are

[PR] Add some comments about adding new dependencies in datafusion-sql [datafusion]

2025-06-24 Thread via GitHub
alamb opened a new pull request, #16543: URL: https://github.com/apache/datafusion/pull/16543 ## Which issue does this PR close? - Closes #. ## Rationale for this change - While reviewing https://github.com/apache/datafusion/pull/16161 from @chenkovsky it seems

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2164809665 ## datafusion/core/src/dataframe/parquet.rs: ## @@ -246,4 +246,72 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn roundtrip_parquet_with_

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-24 Thread via GitHub
Dandandan commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3001717244 To share some experience, we recently added some similar pushdown for HashJoinExec (at Coralogix) using sharing of `Arc` / comparing column hashes and it seems so far very effec

Re: [PR] fix: The inconsistency between scalar and array on the cast of timestamp [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16539: URL: https://github.com/apache/datafusion/pull/16539#discussion_r2164790029 ## datafusion/common/src/scalar/mod.rs: ## @@ -3069,7 +3069,7 @@ impl ScalarValue { ScalarValue::Decimal128(Some(decimal_value), _, scale),

Re: [PR] chore(deps): bump prost-build from 0.13.5 to 0.14.1 in the proto group [datafusion]

2025-06-24 Thread via GitHub
dependabot[bot] commented on PR #16439: URL: https://github.com/apache/datafusion/pull/16439#issuecomment-3001700920 This pull request was built based on a group rule. Closing it will not ignore any of these versions in future pull requests. To ignore these dependencies, configure [ig

Re: [PR] Allow unparser to override the alias name for the specific dialect [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16540: URL: https://github.com/apache/datafusion/pull/16540#discussion_r2164780443 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -923,6 +923,35 @@ fn roundtrip_statement_with_dialect_45() -> Result<(), DataFusionError> { Ok(()) } +#[

Re: [PR] chore: Enable `native_iceberg_compat` Spark SQL tests (for real, this time) [datafusion-comet]

2025-06-24 Thread via GitHub
parthchandra commented on code in PR #1910: URL: https://github.com/apache/datafusion-comet/pull/1910#discussion_r2164779883 ## dev/diffs/3.5.6.diff: ## @@ -1938,7 +1938,17 @@ index 8e88049f51e..d3c0737d52e 100644 import testImplicits._ // keep() should take effect o

Re: [PR] fix: Add overflow check for SumDecimalGroupsAccumulator::evaluate [datafusion-comet]

2025-06-24 Thread via GitHub
parthchandra commented on code in PR #1922: URL: https://github.com/apache/datafusion-comet/pull/1922#discussion_r2164776753 ## native/spark-expr/src/agg_funcs/sum_decimal.rs: ## @@ -375,11 +375,17 @@ impl GroupsAccumulator for SumDecimalGroupsAccumulator { // are

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-24 Thread via GitHub
alamb commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-3001676393 In my opinion this article does a pretty good job explaining the issues with cancellation, but it doesn't talk about `async` destructors which I agree are probably best left out of scop

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-24 Thread via GitHub
alamb commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-3001672876 > I was wondering if it makes sense to add challenges why async is challenging to cancel on low level but it probably would be noisy. But just in case this article shed the light on can

Re: [I] array_has function returns null for an empty list ([]) instead of false [datafusion]

2025-06-24 Thread via GitHub
alamb commented on issue #16474: URL: https://github.com/apache/datafusion/issues/16474#issuecomment-3001656082 @kosiew has a proposed PR to fix this: - https://github.com/apache/datafusion/pull/16529

Re: [I] Refactor `UnnestMetrics` to reuse `BaselineMetrics` [datafusion]

2025-06-24 Thread via GitHub
alamb closed issue #16494: Refactor `UnnestMetrics` to reuse `BaselineMetrics` URL: https://github.com/apache/datafusion/issues/16494

Re: [PR] Reuse `BaselineMetrics` in `UnnestMetrics` [datafusion]

2025-06-24 Thread via GitHub
alamb merged PR #16497: URL: https://github.com/apache/datafusion/pull/16497

Re: [PR] Skip re-pruning based on partition values and file level stats if there are no dynamic filters [datafusion]

2025-06-24 Thread via GitHub
alamb commented on PR #16424: URL: https://github.com/apache/datafusion/pull/16424#issuecomment-3001652418 Also thank you @xudong963

Re: [PR] Skip re-pruning based on partition values and file level stats if there are no dynamic filters [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16424: URL: https://github.com/apache/datafusion/pull/16424#discussion_r2164757786 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -524,6 +512,91 @@ fn should_enable_page_index( .unwrap_or(false) } +/// Prune based on partitio

Re: [I] Investigate performance tradeoff in compressing spill files [datafusion]

2025-06-24 Thread via GitHub
ding-young commented on issue #16367: URL: https://github.com/apache/datafusion/issues/16367#issuecomment-3000103132 ### Need for a Custom Batch Writer? 1. `concat_batches` before writing? I tried a quick local test where, instead of writing one batch at a time using the curre

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-06-24 Thread via GitHub
Omega359 commented on PR #13527: URL: https://github.com/apache/datafusion/pull/13527#issuecomment-3001594979 The async udf approach using an ExecutionPlan is interesting. I think it's an approach that could work for scalar udfs as well but it would take some time to impl

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2164716153 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -666,10 +679,25 @@ impl DisplayAs for HashJoinExec { .map(|(c1, c2)| format!("({c1}

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2164714496 ## datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs: ## @@ -433,6 +433,117 @@ async fn test_topk_dynamic_filter_pushdown() { ); } +#[tokio:

Re: [PR] Simplify AsyncScalarUdfImpl so it extends ScalarUdfImpl [datafusion]

2025-06-24 Thread via GitHub
goldmedal commented on code in PR #16523: URL: https://github.com/apache/datafusion/pull/16523#discussion_r2164074452 ## datafusion-examples/examples/async_udf.rs: ## @@ -243,9 +252,20 @@ impl AsyncScalarUDFImpl for AsyncEqual { Ok(DataType::Boolean) } +fn in

Re: [I] Field Naming Collisions [datafusion]

2025-06-24 Thread via GitHub
hknlof commented on issue #16478: URL: https://github.com/apache/datafusion/issues/16478#issuecomment-2999702499 Hi @kosiew, thanks for your response. Updating from `47.0.0` to `48.0.0` resolves the issue. Your example had issues with `47.0.0`, as well. Apologies, if this is a duplicate of

Re: [PR] feat: rand expression support [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199#discussion_r2164642169 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2806,6 +2806,26 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-24 Thread via GitHub
comphead commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-3001443602 I was wondering if it makes sense to add challenges why async is challenging to cancel on low level but it probably would be noisy. But just in case this article shed the light on ca

Re: [PR] fix: Make cast from float/double to decimal compatible with Spark [datafusion-comet]

2025-06-24 Thread via GitHub
leung-ming commented on PR #1915: URL: https://github.com/apache/datafusion-comet/pull/1915#issuecomment-3001437169 @parthchandra @andygrove I am going to do the following things over the next few days: 1. check the issue in the dragonbox repo. 2. compare it with the original C++ implementation and

Re: [I] [Epic] A collection of Substrait conversion issues [datafusion]

2025-06-24 Thread via GitHub
gabotechs commented on issue #16248: URL: https://github.com/apache/datafusion/issues/16248#issuecomment-3001436651 I imagine that in the same way you were able to encode the extra information about the types in https://github.com/apache/datafusion/pull/16503 as different variations of the

[I] Postgres NOT VALID and VALIDATE CONSTRAINT not parsed for ALTER TABLE [datafusion-sqlparser-rs]

2025-06-24 Thread via GitHub
achristmascarl opened a new issue, #1907: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1907 Doesn't recognize statements creating an unvalidated constraint (`ALTER TABLE "xyz" ADD "constraint_name" NOT VALID`) And also doesn't recognize statements validating constrain

Re: [I] [substrait] [sqllogictest] Unsupported cast type: Duration [datafusion]

2025-06-24 Thread via GitHub
gabotechs commented on issue #16285: URL: https://github.com/apache/datafusion/issues/16285#issuecomment-3001400544 I think using a type variation reference as you did in https://github.com/apache/datafusion/pull/16503 is a good solution to this. I see that this is a similar approach taken

Re: [PR] Add support for Arrow Duration type in Substrait [datafusion]

2025-06-24 Thread via GitHub
gabotechs commented on code in PR #16503: URL: https://github.com/apache/datafusion/pull/16503#discussion_r2164581001 ## datafusion/substrait/src/variation_const.rs: ## @@ -55,6 +55,8 @@ pub const LARGE_CONTAINER_TYPE_VARIATION_REF: u32 = 1; pub const VIEW_CONTAINER_TYPE_VARIAT
