Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
mbutrovich commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-3000137303 > I am sorry I haven't had a chance to review this yet. It would be great if @mbutrovich could also take a look. I have this on my list to review but I haven't been able to find t

Re: [PR] Use `IndexColumn` in all index definitions [datafusion-sqlparser-rs]

2025-06-24 Thread via GitHub
iffyio commented on PR #1900: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1900#issuecomment-2998964400 Ah yeah that's not ideal when consuming the AST, but I think we could leave the foreign key representation as `Ident` since it maps to what's expected at that level - it w

[I] Wrong join precedence parsing for non-Snowflake dialects (nested joins parsed incorrectly) [datafusion-sqlparser-rs]

2025-06-24 Thread via GitHub
Dimchikkk opened a new issue, #1904: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1904 ```toml [package] name = "sqlparsertest" version = "0.1.0" edition = "2024" [dependencies] sqlparser055 = { package = "sqlparser", version = "0.55.0" } sqlparser056

Re: [I] EPIC: use cp_solver framework to develop a more sophisticated predicate simplification [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on issue #16511: URL: https://github.com/apache/datafusion/issues/16511#issuecomment-3001322864 > Make it a new physical optimizer rule? Or add them to the current physical filter push down Do we need a physical optimizer rule? My thought is that these optimizations

Re: [PR] fix: Add continue after append_null when casting float to decimal [datafusion-comet]

2025-06-24 Thread via GitHub
leung-ming commented on code in PR #1914: URL: https://github.com/apache/datafusion-comet/pull/1914#discussion_r2164577355 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -1298,6 +1298,7 @@ where }); } else {
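The PR title describes a loop that appends a null for a value it cannot represent and then needs `continue`. Without it, execution presumably falls through and appends a second entry for the same row, which misaligns the output. Below is a minimal std-only sketch of that pattern. It assumes a `Vec<Option<i128>>` in place of an Arrow decimal builder and uses a hypothetical helper name, `cast_f64_to_decimal`. It is not Comet's actual code.

```rust
/// Hypothetical stand-in for a float-to-decimal cast kernel: scales each f64
/// by 10^scale, rounds, and emits None (null) when the result is non-finite or
/// needs more than `precision` digits. Assumes precision <= 38.
fn cast_f64_to_decimal(input: &[Option<f64>], precision: u32, scale: u32) -> Vec<Option<i128>> {
    let mul = 10f64.powi(scale as i32);
    // Largest absolute unscaled value representable with `precision` digits.
    let max = 10i128.pow(precision) - 1;
    let mut out = Vec::with_capacity(input.len());
    for &v in input {
        let v = match v {
            Some(x) => x,
            None => {
                out.push(None);
                continue;
            }
        };
        let scaled = (v * mul).round();
        if !scaled.is_finite() || scaled.abs() > max as f64 {
            // Append exactly one null for this row, then move on. Dropping this
            // `continue` would also push a value below, yielding two entries per row.
            out.push(None);
            continue;
        }
        out.push(Some(scaled as i128));
    }
    out
}
```

The invariant worth testing is that the output length always equals the input length.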

Re: [PR] fix: Make cast from float/double to decimal compatible with Spark [datafusion-comet]

2025-06-24 Thread via GitHub
parthchandra commented on PR #1915: URL: https://github.com/apache/datafusion-comet/pull/1915#issuecomment-3001310202 FWIW there is a [Rust Schubfach crate](https://docs.rs/schubfach/latest/schubfach/) -- This is an automated message from the Apache Git Service. To respond to the me

Re: [PR] chore: refactor `BuildProbeJoinMetrics` to use `BaselineMetrics` [datafusion]

2025-06-24 Thread via GitHub
Samyak2 commented on code in PR #16500: URL: https://github.com/apache/datafusion/pull/16500#discussion_r2164583753 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1212,12 +1213,19 @@ pub(crate) struct BuildProbeJoinMetrics { pub(crate) input_rows: metrics::Count,

Re: [PR] fix: Make cast from float/double to decimal compatible with Spark [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on PR #1915: URL: https://github.com/apache/datafusion-comet/pull/1915#issuecomment-3001390224 Thanks for finding the dragonbox crate @leung-ming. Is there a reason we must add the code to Comet rather than use it as a dependency?

Re: [PR] Add support for Arrow Duration type in Substrait [datafusion]

2025-06-24 Thread via GitHub
gabotechs commented on code in PR #16503: URL: https://github.com/apache/datafusion/pull/16503#discussion_r2164581001 ## datafusion/substrait/src/variation_const.rs: ## @@ -55,6 +55,8 @@ pub const LARGE_CONTAINER_TYPE_VARIATION_REF: u32 = 1; pub const VIEW_CONTAINER_TYPE_VARIAT

Re: [I] [substrait] [sqllogictest] Unsupported cast type: Duration [datafusion]

2025-06-24 Thread via GitHub
gabotechs commented on issue #16285: URL: https://github.com/apache/datafusion/issues/16285#issuecomment-3001400544 I think using a type variation reference as you did in https://github.com/apache/datafusion/pull/16503 is a good solution to this. I see that this is a similar approach taken

[I] Postgres NOT VALID and VALIDATE CONSTRAINT not parsed for ALTER TABLE [datafusion-sqlparser-rs]

2025-06-24 Thread via GitHub
achristmascarl opened a new issue, #1907: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1907 Doesn't recognize statements creating an unvalidated constraint (`ALTER TABLE "xyz" ADD "constraint_name" NOT VALID`), and also doesn't recognize statements validating constrain

Re: [PR] fix: Make cast from float/double to decimal compatible with Spark [datafusion-comet]

2025-06-24 Thread via GitHub
leung-ming commented on PR #1915: URL: https://github.com/apache/datafusion-comet/pull/1915#issuecomment-3001437169 @parthchandra @andygrove I am going to do the following over the next few days: 1. check the issue in the dragonbox repo. 2. compare it with the original C++ implementation and

Re: [I] [Epic] A collection of Substrait conversion issues [datafusion]

2025-06-24 Thread via GitHub
gabotechs commented on issue #16248: URL: https://github.com/apache/datafusion/issues/16248#issuecomment-3001436651 I imagine that in the same way you were able to encode the extra information about the types in https://github.com/apache/datafusion/pull/16503 as different variations of the

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-24 Thread via GitHub
comphead commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-3001443602 I was wondering if it makes sense to describe why async is challenging to cancel at a low level, but it would probably be noisy. But just in case, this article sheds light on ca

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-24 Thread via GitHub
Dandandan commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3001717244 To share some experience, we recently added a similar pushdown for HashJoinExec (at Coralogix) by sharing an `Arc` / comparing column hashes, and so far it seems very effec
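The idea behind "dynamic filter (bounds) pushdown" can be sketched without DataFusion. Once the build side of a hash join is collected, the min/max of its join keys is known. For an inner equi-join, any probe row whose key falls outside that range (or is null) can never match, so it can be dropped before hashing. This std-only sketch assumes a single non-nullable-typed `i64` key. `KeyBounds`, `build_bounds`, and `prefilter_probe` are hypothetical names, not HashJoinExec's API.

```rust
/// Hypothetical min/max bounds of the build-side join key.
#[derive(Debug, Clone, Copy, PartialEq)]
struct KeyBounds {
    min: i64,
    max: i64,
}

/// Computes bounds over the non-null build-side keys; None if there are none.
fn build_bounds(build_keys: &[Option<i64>]) -> Option<KeyBounds> {
    let mut iter = build_keys.iter().flatten();
    let first = *iter.next()?;
    let mut b = KeyBounds { min: first, max: first };
    for &k in iter {
        b.min = b.min.min(k);
        b.max = b.max.max(k);
    }
    Some(b)
}

/// Returns the indices of probe rows that could still match (inner join only):
/// non-null keys inside [min, max]. With no bounds (empty build side) nothing can match.
fn prefilter_probe(probe_keys: &[Option<i64>], bounds: Option<KeyBounds>) -> Vec<usize> {
    probe_keys
        .iter()
        .enumerate()
        .filter_map(|(i, k)| match (k, bounds) {
            (Some(k), Some(b)) if *k >= b.min && *k <= b.max => Some(i),
            _ => None,
        })
        .collect()
}
```

This range check is only a coarse filter; surviving rows still go through the hash-table lookup. Outer, semi, and anti joins would need different handling.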

[PR] Add some comments about adding new dependencies in datafusion-sql [datafusion]

2025-06-24 Thread via GitHub
alamb opened a new pull request, #16543: URL: https://github.com/apache/datafusion/pull/16543 ## Which issue does this PR close? - Closes #. ## Rationale for this change - While reviewing https://github.com/apache/datafusion/pull/16161 from @chenkovsky it seems

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-06-24 Thread via GitHub
coderfender commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-324166 Working on this

Re: [PR] Reuse `BaselineMetrics` in `UnnestMetrics` [datafusion]

2025-06-24 Thread via GitHub
2010YOUY01 commented on code in PR #16497: URL: https://github.com/apache/datafusion/pull/16497#discussion_r2163184021 ## datafusion/physical-plan/src/unnest.rs: ## @@ -284,7 +279,9 @@ impl UnnestStream { loop { return Poll::Ready(match ready!(self.input.po

Re: [PR] fix: SortMergeJoin for timestamp keys [datafusion-comet]

2025-06-24 Thread via GitHub
SKY-ALIN commented on code in PR #1901: URL: https://github.com/apache/datafusion-comet/pull/1901#discussion_r2164969731 ## spark/src/test/scala/org/apache/comet/exec/CometJoinSuite.scala: ## @@ -54,25 +54,6 @@ class CometJoinSuite extends CometTestBase { .toSeq) }

[PR] Fix join precedence for non-snowflake queries [datafusion-sqlparser-rs]

2025-06-24 Thread via GitHub
Dimchikkk opened a new pull request, #1905: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1905 Fixes https://github.com/apache/datafusion-sqlparser-rs/issues/1904. The bug was introduced in https://github.com/apache/datafusion-sqlparser-rs/pull/1799.

Re: [I] [EPIC] expression pushdown and file level expression handling [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on issue #16528: URL: https://github.com/apache/datafusion/issues/16528#issuecomment-3000748063 I added an example of how https://github.com/apache/datafusion/pull/16461 will help solve https://github.com/apache/datafusion/issues/14993 in https://github.com/apache/datafus

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2163855317 ## datafusion/datasource/src/file_format.rs: ## @@ -94,7 +93,6 @@ pub trait FileFormat: Send + Sync + fmt::Debug { &self, state: &dyn Session,

Re: [PR] [WIP] Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR config #1736 - v_vadlamani [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove closed pull request #1827: [WIP] Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR config #1736 - v_vadlamani URL: https://github.com/apache/datafusion-comet/pull/1827

Re: [I] Field Naming Collisions [datafusion]

2025-06-24 Thread via GitHub
hknlof closed issue #16478: Field Naming Collisions URL: https://github.com/apache/datafusion/issues/16478

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2163543672 ## datafusion/datasource/src/file_format.rs: ## @@ -94,7 +93,6 @@ pub trait FileFormat: Send + Sync + fmt::Debug { &self, state: &dyn Session,

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-06-24 Thread via GitHub
xudong963 commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2163862690 ## datafusion/datasource/src/file_format.rs: ## @@ -94,7 +93,6 @@ pub trait FileFormat: Send + Sync + fmt::Debug { &self, state: &dyn Session,

Re: [PR] fix: Add continue after append_null when casting float to decimal [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1914: URL: https://github.com/apache/datafusion-comet/pull/1914#discussion_r2164429663 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -1298,6 +1298,7 @@ where }); } else {

Re: [PR] fix: The inconsistency between scalar and array on the cast of timestamp [datafusion]

2025-06-24 Thread via GitHub
findepi commented on code in PR #16539: URL: https://github.com/apache/datafusion/pull/16539#discussion_r2164433469 ## datafusion/common/src/scalar/mod.rs: ## @@ -3069,7 +3069,7 @@ impl ScalarValue { ScalarValue::Decimal128(Some(decimal_value), _, scale),

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-06-24 Thread via GitHub
Omega359 commented on PR #13527: URL: https://github.com/apache/datafusion/pull/13527#issuecomment-3001594979 The async udf approach using an ExecutionPlan is interesting. I think it's an approach that could work for scalar udfs as well but it would take some time to impl

Re: [I] Investigate performance tradeoff in compressing spill files [datafusion]

2025-06-24 Thread via GitHub
ding-young commented on issue #16367: URL: https://github.com/apache/datafusion/issues/16367#issuecomment-3000103132 ### Need for a Custom Batch Writer? 1. `concat_batches` before writing? I tried a quick local test where, instead of writing one batch at a time using the curre

Re: [PR] fix: The inconsistency between scalar and array on the cast of timestamp [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16539: URL: https://github.com/apache/datafusion/pull/16539#discussion_r2164790029 ## datafusion/common/src/scalar/mod.rs: ## @@ -3069,7 +3069,7 @@ impl ScalarValue { ScalarValue::Decimal128(Some(decimal_value), _, scale),

Re: [PR] chore(deps): bump prost-build from 0.13.5 to 0.14.1 in the proto group [datafusion]

2025-06-24 Thread via GitHub
dependabot[bot] commented on PR #16439: URL: https://github.com/apache/datafusion/pull/16439#issuecomment-3001700920 This pull request was built based on a group rule. Closing it will not ignore any of these versions in future pull requests. To ignore these dependencies, configure [ig

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2164809665 ## datafusion/core/src/dataframe/parquet.rs: ## @@ -246,4 +246,72 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn roundtrip_parquet_with_

[PR] Consolidate configuration sectionds in docs [datafusion]

2025-06-24 Thread via GitHub
alamb opened a new pull request, #16544: URL: https://github.com/apache/datafusion/pull/16544 ## Which issue does this PR close? - Part of #7013 ## Rationale for this change The main navbar on the left of the page is already pretty big. I noticed that there are

Re: [PR] Add PhysicalExpr optimizer and cast unwrapping [datafusion]

2025-06-24 Thread via GitHub
alamb commented on PR #16530: URL: https://github.com/apache/datafusion/pull/16530#issuecomment-3001900336 I will try and find time to review this tomorrow

Re: [PR] Add compression_level support to ParquetWriterOptions and enhance write_parquet to accept full options object [datafusion-python]

2025-06-24 Thread via GitHub
timsaucer commented on code in PR #1169: URL: https://github.com/apache/datafusion-python/pull/1169#discussion_r2165031025 ## python/datafusion/dataframe.py: ## @@ -873,7 +877,7 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_

Re: [PR] chore: Enable Spark SQL tests for auto scan mode [datafusion-comet]

2025-06-24 Thread via GitHub
kazuyukitanimura commented on code in PR #1885: URL: https://github.com/apache/datafusion-comet/pull/1885#discussion_r2165034490 ## .github/workflows/spark_sql_test_native_auto.yml: ## @@ -35,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11]

Re: [I] array_has function returns null for an empty list ([]) instead of false [datafusion]

2025-06-24 Thread via GitHub
comphead closed issue #16474: array_has function returns null for an empty list ([]) instead of false URL: https://github.com/apache/datafusion/issues/16474
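The expected behavior in the issue title can be stated as a tiny scalar function. A NULL list yields NULL, while an empty list simply contains nothing and yields `false`. This sketch is hypothetical (`array_has` here is not DataFusion's implementation). It models a list as `Option<&[Option<i32>]>` and deliberately leaves aside how NULL elements inside the list should affect a "not found" result.

```rust
/// Hypothetical scalar sketch of `array_has(list, needle)` semantics:
/// NULL list -> NULL; empty list -> false; otherwise whether the needle occurs.
fn array_has(list: Option<&[Option<i32>]>, needle: i32) -> Option<bool> {
    // Propagate NULL for a NULL list.
    let list = list?;
    // `any` over an empty slice is false, which is the case the issue is about.
    Some(list.iter().any(|e| *e == Some(needle)))
}
```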

Re: [PR] `TableProvider` to skip files in the folder which non relevant to selected reader [datafusion]

2025-06-24 Thread via GitHub
comphead merged PR #16487: URL: https://github.com/apache/datafusion/pull/16487

Re: [PR] Minor: Add documentation to `AggregateWindowExpr::get_result_column` [datafusion]

2025-06-24 Thread via GitHub
comphead merged PR #16479: URL: https://github.com/apache/datafusion/pull/16479

Re: [PR] Add some comments about adding new dependencies in datafusion-sql [datafusion]

2025-06-24 Thread via GitHub
comphead merged PR #16543: URL: https://github.com/apache/datafusion/pull/16543

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-06-24 Thread via GitHub
coderfender commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-3002064952 As per @EmilyMatt's comment on the related PR, this seems to be no longer a bug but a performance boost. @andygrove we might have to remove the `bug` label and re-mark it

Re: [PR] Add microbenchmark for spilling with compression [datafusion]

2025-06-24 Thread via GitHub
ding-young commented on PR #16512: URL: https://github.com/apache/datafusion/pull/16512#issuecomment-378588 As expected, although `lz4_frame` has a lower compression ratio than `zstd`, it runs faster, making it a reasonable tradeoff. However, since it's roughly 2x slower than the uncomp

Re: [I] [EPIC] More Async User Defined Function work [datafusion]

2025-06-24 Thread via GitHub
goldmedal commented on issue #16520: URL: https://github.com/apache/datafusion/issues/16520#issuecomment-3000487680 I only implemented the function expression in `Projection` and `Filter`. Maybe we can implement it for other plans, like the condition of `Join`. Then, we currently only support

Re: [I] CLI shows highlight syntax [datafusion]

2025-06-24 Thread via GitHub
l1t1 commented on issue #16536: URL: https://github.com/apache/datafusion/issues/16536#issuecomment-3000339251 ![Image](https://github.com/user-attachments/assets/8d112bb3-676e-4972-9210-b9efe37a007e) ![Image](https://github.com/user-attachments/assets/b7a5b8b2-19aa-430d-a72a-f570c09

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2164714496 ## datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs: ## @@ -433,6 +433,117 @@ async fn test_topk_dynamic_filter_pushdown() { ); } +#[tokio:

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2164716153 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -666,10 +679,25 @@ impl DisplayAs for HashJoinExec { .map(|(c1, c2)| format!("({c1}

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-3000698443 @kosiew could you take a look at 32725dd? > Complex handling for deeply nested types. I do think this is a concern, I'm not sure how hard it would be to actually implem

Re: [I] Make `datafusion` read parquet folders if non parquet files exists [datafusion]

2025-06-24 Thread via GitHub
comphead closed issue #16460: Make `datafusion` read parquet folders if non parquet files exists URL: https://github.com/apache/datafusion/issues/16460

Re: [PR] feat: rand expression support [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199#discussion_r2164642169 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2806,6 +2806,26 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [I] Field Naming Collisions [datafusion]

2025-06-24 Thread via GitHub
hknlof commented on issue #16478: URL: https://github.com/apache/datafusion/issues/16478#issuecomment-2999702499 Hi @kosiew, thanks for your response. Updating from `47.0.0` to `48.0.0` resolves the issue. Your example had issues with `47.0.0`, as well. Apologies if this is a duplicate of

Re: [PR] Simplify AsyncScalarUdfImpl so it extends ScalarUdfImpl [datafusion]

2025-06-24 Thread via GitHub
goldmedal commented on code in PR #16523: URL: https://github.com/apache/datafusion/pull/16523#discussion_r2164074452 ## datafusion-examples/examples/async_udf.rs: ## @@ -243,9 +252,20 @@ impl AsyncScalarUDFImpl for AsyncEqual { Ok(DataType::Boolean) } +fn in

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-06-24 Thread via GitHub
xudong963 commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2163561507 ## datafusion/datasource/src/file_format.rs: ## @@ -94,7 +93,6 @@ pub trait FileFormat: Send + Sync + fmt::Debug { &self, state: &dyn Session,

Re: [I] Implement a script to detect breaking changes automatically [datafusion]

2025-06-24 Thread via GitHub
lucqui commented on issue #16532: URL: https://github.com/apache/datafusion/issues/16532#issuecomment-3000638331 take

Re: [PR] feat: Support hadoop s3a config in native_iceberg_compat [datafusion-comet]

2025-06-24 Thread via GitHub
parthchandra commented on PR #1925: URL: https://github.com/apache/datafusion-comet/pull/1925#issuecomment-3001134902 > I have tested it with AWS S3 and it worked fine Thank you! I managed to test it with S3 as well.

Re: [PR] feat: Implement ToPrettyString [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove merged PR #1921: URL: https://github.com/apache/datafusion-comet/pull/1921

Re: [PR] feat: Implement ToPrettyString [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on PR #1921: URL: https://github.com/apache/datafusion-comet/pull/1921#issuecomment-3001191520 Thanks for the review @comphead

Re: [PR] feat: support array_distinct [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on PR #1306: URL: https://github.com/apache/datafusion-comet/pull/1306#issuecomment-3001199064 Thanks for the contribution @NoeB, but this PR has become stale and there is now https://github.com/apache/datafusion-comet/pull/1923 so I will close this one.

[I] `array_contains` falls back to Spark in case of empty Array [datafusion-comet]

2025-06-24 Thread via GitHub
comphead opened a new issue, #1929: URL: https://github.com/apache/datafusion-comet/issues/1929 ### Describe the bug After reviewing https://github.com/apache/datafusion/pull/16529 I added empty array test to Comet ``` test("array_contains") { withSQLConf(CometC

[PR] Auto file prefix [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove opened a new pull request, #1930: URL: https://github.com/apache/datafusion-comet/pull/1930 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1908 ## Rationale for this change ## What changes are included

Re: [PR] chore: refactor `BuildProbeJoinMetrics` to use `BaselineMetrics` [datafusion]

2025-06-24 Thread via GitHub
Samyak2 commented on code in PR #16500: URL: https://github.com/apache/datafusion/pull/16500#discussion_r2164580513 ## datafusion/physical-plan/src/joins/cross_join.rs: ## @@ -632,7 +632,7 @@ impl CrossJoinStream { } self.join_metrics.

Re: [PR] Support `array_union` scalar expr [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on PR #1362: URL: https://github.com/apache/datafusion-comet/pull/1362#issuecomment-3001253623 I'm closing this issue because it has been stale for a while. @dharanad feel free to reopen this if you resume work on this feature.

Re: [PR] chore: Enable Spark SQL tests for auto scan mode [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1885: URL: https://github.com/apache/datafusion-comet/pull/1885#discussion_r2165064746 ## .github/workflows/spark_sql_test_native_auto.yml: ## @@ -35,7 +45,7 @@ jobs: matrix: os: [ubuntu-24.04] java-version: [11] -

Re: [PR] Add note for planning release in Upgrade Guides [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16534: URL: https://github.com/apache/datafusion/pull/16534#discussion_r2164482272 ## docs/source/library-user-guide/upgrading.md: ## @@ -21,6 +21,8 @@ ## DataFusion `49.0.0` +**Note:** DataFusion 49.0.0 has not been released yet. The informat

Re: [PR] fix: Make cast from float/double to decimal compatible with Spark [datafusion-comet]

2025-06-24 Thread via GitHub
leung-ming commented on PR #1915: URL: https://github.com/apache/datafusion-comet/pull/1915#issuecomment-3001405315 @parthchandra @andygrove I checked [ryu](https://crates.io/crates/ryu), [schubfach](https://crates.io/crates/schubfach) and [dragonbox](https://crates.io/crates/dragonbox
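This sketch shows why shortest-representation algorithms (Ryu, Schubfach, Dragonbox) come up here. It assumes, as the thread implies, that Spark derives the decimal from the double's shortest decimal string rather than from its exact binary value. In std-only Rust, `format!("{}", v)` gives such a string: `Display` for `f64` prints the shortest digits that round-trip and never uses exponent notation. Whether those digits always match Java's `Double.toString` is exactly what the crates above are being evaluated for, so treat that equivalence as an assumption. `f64_to_decimal` and its HALF_UP rounding are hypothetical, not Comet's implementation.

```rust
/// Hypothetical sketch: convert an f64 to an unscaled decimal with `scale`
/// fractional digits, starting from Rust's shortest round-trip string and
/// rounding HALF_UP (ties away from zero). Returns None for NaN/infinity, for
/// values too large for i128, or when more than `precision` digits are needed.
fn f64_to_decimal(v: f64, precision: u32, scale: u32) -> Option<i128> {
    if !v.is_finite() {
        return None;
    }
    // e.g. 0.1 -> "0.1", 1.005 -> "1.005" (no exponent notation).
    let s = format!("{}", v.abs());
    let (int_part, frac_part) = match s.split_once('.') {
        Some((i, f)) => (i, f),
        None => (s.as_str(), ""),
    };
    let scale = scale as usize;
    // Keep exactly `scale` fractional digits, padding with zeros if needed.
    let mut digits = String::from(int_part);
    let kept: String = frac_part.chars().chain(std::iter::repeat('0')).take(scale).collect();
    digits.push_str(&kept);
    let mut unscaled: i128 = digits.parse().ok()?;
    // HALF_UP only needs the first dropped digit: >= 5 means round up.
    if let Some(d) = frac_part.chars().nth(scale) {
        if d >= '5' {
            unscaled = unscaled.checked_add(1)?;
        }
    }
    // Precision check; for precision > 38, 10^precision exceeds any i128.
    match 10i128.checked_pow(precision) {
        Some(limit) if unscaled >= limit => return None,
        _ => {}
    }
    Some(if v.is_sign_negative() { -unscaled } else { unscaled })
}
```

For example, `f64_to_decimal(1.005, 10, 2)` returns `Some(101)` here. Scaling by 100 in binary floating point and calling `round()` gives 100 instead, because `1.005 * 100.0` evaluates to slightly less than 100.5.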

[PR] Feat/dataframe str formatter [datafusion-python]

2025-06-24 Thread via GitHub
timsaucer opened a new pull request, #1170: URL: https://github.com/apache/datafusion-python/pull/1170 # Which issue does this PR close? None # Rationale for this change We recently added an HTML formatter, which is useful for generating HTML tables as a customer. This a

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#discussion_r2164099829 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -232,24 +232,42 @@ class CometArrayExpressionSuite extends CometTestBase wit

[PR] Snowflake future grants [datafusion-sqlparser-rs]

2025-06-24 Thread via GitHub
yoavcloud opened a new pull request, #1906: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1906 Added support for parsing FUTURE grants in Snowflake

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-24 Thread via GitHub
kosiew commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-3000612368 @adriangb, sorry, it was not my intention to presume the conclusions. I do look forward to a solution that handles schema adaptation in one pass.

Re: [PR] feat: Implement ToPrettyString [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1921: URL: https://github.com/apache/datafusion-comet/pull/1921#discussion_r2164131324 ## native/core/src/execution/planner.rs: ## @@ -746,6 +746,22 @@ impl PhysicalPlanner { let child = self.create_expr(expr.child.as_ref().unw

[I] Add unit tests for Parquet methods used by Apache Iceberg [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove opened a new issue, #1928: URL: https://github.com/apache/datafusion-comet/issues/1928 ### What is the problem the feature request solves? PR https://github.com/apache/datafusion-comet/pull/1920 adds/updates methods intended to be used by Apache Iceberg, but did not add any

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#discussion_r2164143550 ## common/src/main/java/org/apache/comet/parquet/ColumnReader.java: ## @@ -126,6 +126,13 @@ public void setPageReader(PageReader pageReader) throws IOExcept

Re: [PR] feat: support array_distinct [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove closed pull request #1306: feat: support array_distinct URL: https://github.com/apache/datafusion-comet/pull/1306

Re: [PR] fix: SortMergeJoin for timestamp keys [datafusion-comet]

2025-06-24 Thread via GitHub
parthchandra commented on code in PR #1901: URL: https://github.com/apache/datafusion-comet/pull/1901#discussion_r2164519556 ## spark/src/test/scala/org/apache/comet/exec/CometJoinSuite.scala: ## @@ -54,25 +54,6 @@ class CometJoinSuite extends CometTestBase { .toSeq)

Re: [PR] Support array_position [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on PR #1363: URL: https://github.com/apache/datafusion-comet/pull/1363#issuecomment-3001257247 I'm closing this PR since it has been stale for a while. @dharanad feel free to reopen this if you resume work on this feature.

Re: [PR] [wip] Add scripts for running benchmarks on EC2 [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove closed pull request #1654: [wip] Add scripts for running benchmarks on EC2 URL: https://github.com/apache/datafusion-comet/pull/1654

Re: [PR] [WIP] Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR config #1736 - v_vadlamani [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on PR #1827: URL: https://github.com/apache/datafusion-comet/pull/1827#issuecomment-3001268907 @coderfender I think this PR is no longer needed, so I will close for now. Feel free to reopen if you still need it.

Re: [PR] Minor: Add unit tests for `ceil`/`floor` functions [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on PR #1728: URL: https://github.com/apache/datafusion-comet/pull/1728#issuecomment-3001276431 Thanks for the PR @tlm365. Could I suggest that you mark the tests with `ignore` and add a link to the issue https://github.com/apache/datafusion-comet/issues/1729 and then we

Re: [PR] feat: rand expression support [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199#issuecomment-3001227976 I upmerged this PR and re-triggered the workflows. Sorry for the delay @akupchinskiy

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-3000255690 @kosiew I'm not sure I agree with the conclusions there. Why can't we use expressions to do the schema adapting during the scan? It's very possible as @alamb pointed out in https:/

Re: [PR] Fix signature of `__arrow_c_stream__` [datafusion-python]

2025-06-24 Thread via GitHub
timsaucer merged PR #1168: URL: https://github.com/apache/datafusion-python/pull/1168

Re: [PR] feat: Support hadoop s3a config in native_iceberg_compat [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1925: URL: https://github.com/apache/datafusion-comet/pull/1925#discussion_r2164542993 ## native/core/src/parquet/mod.rs: ## @@ -644,6 +647,26 @@ fn get_file_groups_single_file( vec![groups] } +pub fn get_object_store_options( +env:

Re: [I] Different result of decimal to timestamp cast when source value is constant [datafusion]

2025-06-24 Thread via GitHub
chenkovsky commented on issue #16531: URL: https://github.com/apache/datafusion/issues/16531#issuecomment-3000803343 take

[PR] fix: reject within_group for non ordered aggregate function [datafusion]

2025-06-24 Thread via GitHub
chenkovsky opened a new pull request, #16538: URL: https://github.com/apache/datafusion/pull/16538 ## Which issue does this PR close? - Closes #16515. ## Rationale for this change WITHIN GROUP clause gets ignored for non ordered aggregate function ## What changes a

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-24 Thread via GitHub
alamb commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-3000831078 I pushed the images to this post, updated the publish date to June 30 (next Monday), and started doing some wordsmithing

Re: [PR] feat: Finalize support for `RightMark` join + `Mark` join swap [datafusion]

2025-06-24 Thread via GitHub
comphead commented on PR #16488: URL: https://github.com/apache/datafusion/pull/16488#issuecomment-3000862878 > If you have the time, are you able to take a look? Should be a straightforward review, thanks! @comphead @Dandandan Thanks @jonathanc-n its on my list!

Re: [PR] Reuse `BaselineMetrics` in `UnnestMetrics` [datafusion]

2025-06-24 Thread via GitHub
hendrikmakait commented on code in PR #16497: URL: https://github.com/apache/datafusion/pull/16497#discussion_r2164362053 ## datafusion/physical-plan/src/unnest.rs: ## @@ -299,7 +296,7 @@ impl UnnestStream { continue; };

Re: [PR] fix: Add continue after append_null when casting float to decimal [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1914: URL: https://github.com/apache/datafusion-comet/pull/1914#discussion_r2164336592 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -1298,6 +1298,7 @@ where }); } else {

Re: [I] Refactor `UnnestMetrics` to reuse `BaselineMetrics` [datafusion]

2025-06-24 Thread via GitHub
alamb closed issue #16494: Refactor `UnnestMetrics` to reuse `BaselineMetrics` URL: https://github.com/apache/datafusion/issues/16494

Re: [PR] Reuse `BaselineMetrics` in `UnnestMetrics` [datafusion]

2025-06-24 Thread via GitHub
alamb merged PR #16497: URL: https://github.com/apache/datafusion/pull/16497

Re: [PR] Skip re-pruning based on partition values and file level stats if there are no dynamic filters [datafusion]

2025-06-24 Thread via GitHub
alamb commented on PR #16424: URL: https://github.com/apache/datafusion/pull/16424#issuecomment-3001652418 Also thank you @xudong963

Re: [PR] Skip re-pruning based on partition values and file level stats if there are no dynamic filters [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16424: URL: https://github.com/apache/datafusion/pull/16424#discussion_r2164757786 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -524,6 +512,91 @@ fn should_enable_page_index( .unwrap_or(false) } +/// Prune based on partitio

Re: [I] array_has function returns null for an empty list ([]) instead of false [datafusion]

2025-06-24 Thread via GitHub
alamb commented on issue #16474: URL: https://github.com/apache/datafusion/issues/16474#issuecomment-3001656082 @kosiew has a proposed PR to fix this: - https://github.com/apache/datafusion/pull/16529

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-24 Thread via GitHub
alamb commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-3001672876 > I was wondering if it makes sense to add challenges why async is challenging to cancel on low level but it probably would be noisy. But just in case this article shed the light on can

Re: [PR] fix: Add overflow check for SumDecimalGroupsAccumulator::evaluate [datafusion-comet]

2025-06-24 Thread via GitHub
parthchandra commented on code in PR #1922: URL: https://github.com/apache/datafusion-comet/pull/1922#discussion_r2164776753 ## native/spark-expr/src/agg_funcs/sum_decimal.rs: ## @@ -375,11 +375,17 @@ impl GroupsAccumulator for SumDecimalGroupsAccumulator { // are

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-24 Thread via GitHub
alamb commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-3001676393 In my opinion this article does a pretty good job explaining the issues with cancellation, but it doesn't talk about `async` destructors which I agree are probably best left out of scop

Re: [PR] chore: Enable `native_iceberg_compat` Spark SQL tests (for real, this time) [datafusion-comet]

2025-06-24 Thread via GitHub
parthchandra commented on code in PR #1910: URL: https://github.com/apache/datafusion-comet/pull/1910#discussion_r2164779883 ## dev/diffs/3.5.6.diff: ## @@ -1938,7 +1938,17 @@ index 8e88049f51e..d3c0737d52e 100644 import testImplicits._ // keep() should take effect o

Re: [PR] Allow unparser to override the alias name for the specific dialect [datafusion]

2025-06-24 Thread via GitHub
alamb commented on code in PR #16540: URL: https://github.com/apache/datafusion/pull/16540#discussion_r2164780443 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -923,6 +923,35 @@ fn roundtrip_statement_with_dialect_45() -> Result<(), DataFusionError> { Ok(()) } +#[

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-24 Thread via GitHub
adriangb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3001813655 I'll add that :)

Re: [PR] feat: rand expression support [datafusion-comet]

2025-06-24 Thread via GitHub
andygrove commented on code in PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199#discussion_r2164559552 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2806,6 +2806,26 @@ class CometExpressionSuite extends CometTestBase with Adaptiv
