Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-25 Thread via GitHub
pepijnve commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-3003611703 > I was wondering if it makes sense to add challenges why async is challenging to cancel on low level but it probably would be noisy. But just in case this article shed the light on

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-25 Thread via GitHub
pepijnve commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-3003623249 > In my opinion this article does a pretty good job explaining the issues with cancellation, but it doesn't talk about `async` destructors which I agree are probably best left out of

Re: [PR] fix: Incorrect memory accounting in `array_agg` function [datafusion]

2025-06-25 Thread via GitHub
gabotechs commented on code in PR #16519: URL: https://github.com/apache/datafusion/pull/16519#discussion_r2165946809 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -341,12 +341,20 @@ impl Accumulator for ArrayAggAccumulator { Some(values) => {

Re: [PR] feat: support table sample [datafusion]

2025-06-25 Thread via GitHub
theirix commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3003701483 @2010YOUY01 thank you for pointing this out. @chenkovsky, it looks like both our PRs solve the same sampling problem from different approaches. The direction of my PR is to con

Re: [PR] fix: Incorrect memory accounting in `array_agg` function [datafusion]

2025-06-25 Thread via GitHub
gabotechs commented on code in PR #16519: URL: https://github.com/apache/datafusion/pull/16519#discussion_r2165952424 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -994,6 +1002,34 @@ mod tests { Ok(()) } +#[test] +fn does_not_over_account_memo

[I] Physical plan pushdown for volatile predicates [datafusion]

2025-06-25 Thread via GitHub
theirix opened a new issue, #16545: URL: https://github.com/apache/datafusion/issues/16545 ### Describe the bug This is a follow-up to a discussion in https://github.com/apache/datafusion/pull/16325#issuecomment-2985522134, which is not directly related to table sampling but could af

Re: [PR] Support data source sampling with TABLESAMPLE [datafusion]

2025-06-25 Thread via GitHub
theirix commented on PR #16325: URL: https://github.com/apache/datafusion/pull/16325#issuecomment-3003725980 > I'd like to double-check if a volatile filter pushdown to a Parquet executor is expected. In the mentioned PR, I disabled optimisation in a logical plan optimiser to push down vola

[PR] chore(deps): bump mimalloc from 0.1.46 to 0.1.47 [datafusion]

2025-06-25 Thread via GitHub
dependabot[bot] opened a new pull request, #16547: URL: https://github.com/apache/datafusion/pull/16547 Bumps [mimalloc](https://github.com/purpleprotocol/mimalloc_rust) from 0.1.46 to 0.1.47. Release notes Sourced from https://github.com/purpleprotocol/mimalloc_rust/releases mim

[PR] chore(deps): bump prost-build from 0.13.5 to 0.14.1 in the proto group [datafusion]

2025-06-25 Thread via GitHub
dependabot[bot] opened a new pull request, #16546: URL: https://github.com/apache/datafusion/pull/16546 Bumps the proto group with 1 update: [prost-build](https://github.com/tokio-rs/prost). Updates `prost-build` from 0.13.5 to 0.14.1 Changelog Sourced from https://github.co

[PR] chore(deps): bump object_store from 0.12.1 to 0.12.2 [datafusion]

2025-06-25 Thread via GitHub
dependabot[bot] opened a new pull request, #16548: URL: https://github.com/apache/datafusion/pull/16548 Bumps [object_store](https://github.com/apache/arrow-rs-object-store) from 0.12.1 to 0.12.2. Changelog Sourced from https://github.com/apache/arrow-rs-object-store/blob/main/CHAN

Re: [PR] feat: support table sample [datafusion]

2025-06-25 Thread via GitHub
chenkovsky commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3003924958 > @2010YOUY01 thank you for pointing this out. > > @chenkovsky, it looks like both our PRs solve the same sampling problem from different approaches. The direction of my PR

Re: [PR] chore: use DF scalar functions for StartsWith, EndsWith, Contains, DF LikeExpr [datafusion-comet]

2025-06-25 Thread via GitHub
mbutrovich commented on PR #1887: URL: https://github.com/apache/datafusion-comet/pull/1887#issuecomment-3004032495 Converting back to draft for now since I likely won't sort out the dictionary unpacking for a couple of weeks. -- This is an automated message from the Apache Git Service.

[PR] Support timestamp and date arguments for `range` and `generate_series` table functions [datafusion]

2025-06-25 Thread via GitHub
simonvandel opened a new pull request, #16552: URL: https://github.com/apache/datafusion/pull/16552 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/14209 ## Rationale for this change The table functions `range` and `

Re: [PR] Fix `impl Ord for Ident` [datafusion-sqlparser-rs]

2025-06-25 Thread via GitHub
iffyio merged PR #1893: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1893

Re: [PR] fix: Add overflow check for SumDecimalGroupsAccumulator::evaluate [datafusion-comet]

2025-06-25 Thread via GitHub
leung-ming commented on code in PR #1922: URL: https://github.com/apache/datafusion-comet/pull/1922#discussion_r2165657752 ## native/spark-expr/src/agg_funcs/sum_decimal.rs: ## @@ -375,11 +375,17 @@ impl GroupsAccumulator for SumDecimalGroupsAccumulator { // are nu

Re: [PR] chore: Introduce `exprHandlers` map in QueryPlanSerde [datafusion-comet]

2025-06-25 Thread via GitHub
mbutrovich merged PR #1903: URL: https://github.com/apache/datafusion-comet/pull/1903

Re: [PR] Snowflake: support multiple column options in `CREATE VIEW` [datafusion-sqlparser-rs]

2025-06-25 Thread via GitHub
iffyio commented on code in PR #1891: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1891#discussion_r2166403214 ## tests/sqlparser_snowflake.rs: ## @@ -4165,3 +4167,10 @@ fn test_snowflake_fetch_clause_syntax() { canonical, ); } + +#[test] +fn test_

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
corwinjoy commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-3003280082 > Thank you @corwinjoy and @adamreeve -- this PR was a joy to read and review. The code is clear, well commented, and well tested ❤️ 🏆 > > I think we should follow up with:

Re: [PR] chore: Enable Spark SQL tests for auto scan mode [datafusion-comet]

2025-06-25 Thread via GitHub
andygrove merged PR #1885: URL: https://github.com/apache/datafusion-comet/pull/1885

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-25 Thread via GitHub
andygrove commented on PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#issuecomment-3004790090 > > ...1 more thing please add tests with empty array. > > I tested array_distinct with an empty array. > > ``` > SELECT array_distinct(array()) FROM t1; >

Re: [PR] fix: Incorrect memory accounting in `array_agg` function [datafusion]

2025-06-25 Thread via GitHub
gabotechs commented on code in PR #16519: URL: https://github.com/apache/datafusion/pull/16519#discussion_r2166863158 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -341,12 +341,20 @@ impl Accumulator for ArrayAggAccumulator { Some(values) => {

[PR] Move Pruning Logic to a Dedicated datafusion-pruning Crate for Improved Modularity [datafusion]

2025-06-25 Thread via GitHub
kosiew opened a new pull request, #16549: URL: https://github.com/apache/datafusion/pull/16549 ## Which issue does this PR close? Closes #16542 ## Rationale for this change This change modularizes the pruning functionality currently embedded within the core DataFusion co

Re: [PR] feat: dataframe string formatter [datafusion-python]

2025-06-25 Thread via GitHub
timsaucer commented on code in PR #1170: URL: https://github.com/apache/datafusion-python/pull/1170#discussion_r2166481205 ## python/datafusion/dataframe_formatter.py: ## @@ -0,0 +1,739 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor lice

[PR] Avoid clones when calling find_window_exprs [datafusion]

2025-06-25 Thread via GitHub
findepi opened a new pull request, #16551: URL: https://github.com/apache/datafusion/pull/16551 Update `find_window_exprs` signature to be as flexible as `find_aggregate_exprs`, letting the caller avoid Expr clones.

Re: [PR] feat: support table sample [datafusion]

2025-06-25 Thread via GitHub
chenkovsky commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3004038741 some comments were added in cargo file today. https://github.com/apache/datafusion/blob/20a723b7b6d91da57fe6abea8ecac08ea5267a89/datafusion/sql/Cargo.toml#L49 . it makes s

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-25 Thread via GitHub
adriangb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3004662324 @Dandandan any chance you'd be willing to contribute your implementation of sharing `Arc` so we use something we know is working / I don't have to re-invent the wheel? I think you c

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-25 Thread via GitHub
andygrove commented on PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#issuecomment-3004865649 I am testing this locally now. There is still one API call that references a Parquet class, causing Iceberg to fail to compile: ``` /home/andy/git/apache/iceberg/par

Re: [PR] Snowflake: support multiple column options in `CREATE VIEW` [datafusion-sqlparser-rs]

2025-06-25 Thread via GitHub
iffyio merged PR #1891: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1891

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-25 Thread via GitHub
andygrove commented on PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#issuecomment-3004952533 I added the following method to `FileReader` locally: ```scala /** Sets the projected columns to be read later via {@link #readNextRowGroup()} */ public void s

[I] Aggregations on partition columns may fail if `wrap_partition_value_in_dict` is used. [datafusion]

2025-06-25 Thread via GitHub
debajyoti-truefoundry opened a new issue, #16550: URL: https://github.com/apache/datafusion/issues/16550 ### Describe the bug https://github.com/delta-io/delta-rs/blob/2cc8081caf89ec4aa162d8c9a903d3b9890fcac9/crates/core/src/delta_datafusion/mod.rs#L679 https://github.com/apache/d

Re: [PR] feat: dataframe string formatter [datafusion-python]

2025-06-25 Thread via GitHub
timsaucer merged PR #1170: URL: https://github.com/apache/datafusion-python/pull/1170

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-25 Thread via GitHub
adriangb commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-3004580959 > Every invocation in the evaluator will loop over rows to build child arrays, then pack them into a StructArray As far as I know a PhysicalExpr can operate at the array level

Re: [I] Support standard syntax for filtered aggregations [datafusion]

2025-06-25 Thread via GitHub
chenkovsky commented on issue #16516: URL: https://github.com/apache/datafusion/issues/16516#issuecomment-3004614625 it's solved in https://github.com/apache/datafusion-sqlparser-rs/commit/3ec80e187d163c4f90c5bfc7c04ef71a2705a631

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-25 Thread via GitHub
adriangb commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-3004630417 Thank you very much for the feedback @kosiew 🙏🏻! I don't mean to disregard it, you make great points, but I think they are surmountable! Let's move forward with this and keep iterat

Re: [PR] Add support for Arrow Duration type in Substrait [datafusion]

2025-06-25 Thread via GitHub
alamb commented on PR #16503: URL: https://github.com/apache/datafusion/pull/16503#issuecomment-3004828625 > I don't 100% know how the substrait plans are used, but I am slightly worried about one thing with this approach. Would it be possible to construct a substrait plan that converts the

Re: [PR] Fix `impl Ord for Ident` [datafusion-sqlparser-rs]

2025-06-25 Thread via GitHub
alamb commented on PR #1893: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1893#issuecomment-3004814250 🚀

Re: [I] Add tests for yielding / cancelling in SpillManager [datafusion]

2025-06-25 Thread via GitHub
ding-young commented on issue #16482: URL: https://github.com/apache/datafusion/issues/16482#issuecomment-3004872124 Hi, I just came across this issue and wanted to leave a comment after working around writing some test codes. I wrote a test in my local branch where I tried to simula

Re: [I] Unify write_parquet signatures [datafusion-python]

2025-06-25 Thread via GitHub
timsaucer closed issue #1162: Unify write_parquet signatures URL: https://github.com/apache/datafusion-python/issues/1162

Re: [PR] Snowflake: support multiple column options in `CREATE VIEW` [datafusion-sqlparser-rs]

2025-06-25 Thread via GitHub
eliaperantoni commented on code in PR #1891: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1891#discussion_r2166676894 ## tests/sqlparser_common.rs: ## @@ -7964,7 +7964,7 @@ fn parse_create_view_with_columns() { let sql = "CREATE VIEW v (has, cols) AS SELECT

Re: [PR] Snowflake: support multiple column options in `CREATE VIEW` [datafusion-sqlparser-rs]

2025-06-25 Thread via GitHub
eliaperantoni commented on PR #1891: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1891#issuecomment-3004736081 Updated @iffyio :) Thanks for your feedback!

Re: [I] Postgres NOT VALID and VALIDATE CONSTRAINT not parsed for ALTER TABLE [datafusion-sqlparser-rs]

2025-06-25 Thread via GitHub
git-hulk commented on issue #1907: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1907#issuecomment-3005046028 @achristmascarl I believe it should be supported since it's the valid syntax. Would you like to support them? If not, I could submit PRs to resolve them when I get

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2168131023 ## datafusion-cli/tests/sql/encrypted_parquet.sql: ## @@ -0,0 +1,75 @@ +/* +Test parquet encryption and decryption in DataFusion SQL. +See datafusion/com

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-25 Thread via GitHub
drexler-sky commented on PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#issuecomment-3007078399 I stepped into the code. The reason Comet falls back to Spark for the literal [] is that it goes to https://github.com/apache/datafusion-comet/blob/main/spark/src/main/scala

[I] Refactor statement execution logic in `datafusion-cli` [datafusion]

2025-06-25 Thread via GitHub
liamzwbao opened a new issue, #16559: URL: https://github.com/apache/datafusion/issues/16559 Follow-up of #16502 to refactor the logic of statement execution in `datafusion-cli`

Re: [I] Refactor statement execution logic in `datafusion-cli` [datafusion]

2025-06-25 Thread via GitHub
liamzwbao commented on issue #16559: URL: https://github.com/apache/datafusion/issues/16559#issuecomment-3006746589 Take

Re: [PR] Add microbenchmark for spilling with compression [datafusion]

2025-06-25 Thread via GitHub
2010YOUY01 commented on code in PR #16512: URL: https://github.com/apache/datafusion/pull/16512#discussion_r216805 ## datafusion/physical-plan/benches/spill_io.rs: ## @@ -119,5 +126,414 @@ fn bench_spill_io(c: &mut Criterion) { group.finish(); } -criterion_group!(ben

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2168078769 ## datafusion/proto-common/src/from_proto/mod.rs: ## @@ -1066,6 +1066,7 @@ impl TryFrom<&protobuf::TableParquetOptions> for TableParquetOptions {

Re: [PR] fix: Add overflow check to evaluate of sum decimal accumulator [datafusion-comet]

2025-06-25 Thread via GitHub
parthchandra commented on code in PR #1922: URL: https://github.com/apache/datafusion-comet/pull/1922#discussion_r2167796799 ## native/spark-expr/src/agg_funcs/sum_decimal.rs: ## @@ -375,11 +375,17 @@ impl GroupsAccumulator for SumDecimalGroupsAccumulator { // are

Re: [PR] Allow unparser to override the alias name for the specific dialect [datafusion]

2025-06-25 Thread via GitHub
goldmedal commented on PR #16540: URL: https://github.com/apache/datafusion/pull/16540#issuecomment-3006609720 Thanks @alamb 👍

Re: [PR] feat: `auto` scan mode should check for supported file location [datafusion-comet]

2025-06-25 Thread via GitHub
andygrove commented on code in PR #1930: URL: https://github.com/apache/datafusion-comet/pull/1930#discussion_r2167849940 ## dev/diffs/3.4.3.diff: ## @@ -2868,6 +2868,28 @@ index 52abd248f3a..7a199931a08 100644 case h: HiveTableScanExec => h.partitionPruningPred.collect

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-25 Thread via GitHub
hsiang-c commented on code in PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#discussion_r2167786141 ## common/src/main/java/org/apache/comet/parquet/FileReader.java: ## @@ -209,6 +257,55 @@ public void setRequestedSchema(List projection) { } } +

Re: [I] [substrait] [sqllogictest] Unsupported cast type: Time64 [datafusion]

2025-06-25 Thread via GitHub
jkosh44 commented on issue #16275: URL: https://github.com/apache/datafusion/issues/16275#issuecomment-3006505351 take

Re: [PR] Skip re-pruning based on partition values and file level stats if there are no dynamic filters [datafusion]

2025-06-25 Thread via GitHub
adriangb commented on PR #16424: URL: https://github.com/apache/datafusion/pull/16424#issuecomment-3006625465 @xudong963 @alamb I've re-organized this to incorporate https://github.com/apache/datafusion/pull/16549. Sadly I did not catch in that PR that we were putting everything in `l

Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-25 Thread via GitHub
andygrove commented on issue #1820: URL: https://github.com/apache/datafusion-comet/issues/1820#issuecomment-3006624446 > [@andygrove](https://github.com/andygrove) can I backport [SHA2-fix](https://github.com/apache/datafusion/pull/16350) to branch-48 of datafusion ? I tried updating with

Re: [PR] feat: `auto` scan mode should check for supported file location [datafusion-comet]

2025-06-25 Thread via GitHub
parthchandra commented on code in PR #1930: URL: https://github.com/apache/datafusion-comet/pull/1930#discussion_r2167875652 ## dev/diffs/3.4.3.diff: ## @@ -2868,6 +2868,28 @@ index 52abd248f3a..7a199931a08 100644 case h: HiveTableScanExec => h.partitionPruningPred.colle

Re: [PR] [PoC] Add API for tracking distinct buffers in `MemoryPool` by reference count [datafusion]

2025-06-25 Thread via GitHub
joroKr21 commented on code in PR #16359: URL: https://github.com/apache/datafusion/pull/16359#discussion_r2168219555 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -131,14 +133,58 @@ pub trait MemoryPool: Send + Sync + std::fmt::Debug { /// This must always succeed

Re: [I] Add tests for yielding / cancelling in SpillManager [datafusion]

2025-06-25 Thread via GitHub
ding-young commented on issue #16482: URL: https://github.com/apache/datafusion/issues/16482#issuecomment-3007233676 @pepijnve Thanks for the clarification and simplifying it :) I’ll go ahead and open a PR based on that.

Re: [I] Add support of rand() expression [datafusion-comet]

2025-06-25 Thread via GitHub
andygrove closed issue #1198: Add support of rand() expression URL: https://github.com/apache/datafusion-comet/issues/1198

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-25 Thread via GitHub
adriangb commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-3006801288 Great I'll address https://github.com/apache/datafusion/pull/16461#discussion_r2159713329 and then I think this will be ready to merge!

Re: [PR] dynamic filter refactor [datafusion]

2025-06-25 Thread via GitHub
github-actions[bot] closed pull request #15685: dynamic filter refactor URL: https://github.com/apache/datafusion/pull/15685

Re: [PR] feat: `auto` scan mode should check for supported file location [datafusion-comet]

2025-06-25 Thread via GitHub
parthchandra commented on code in PR #1930: URL: https://github.com/apache/datafusion-comet/pull/1930#discussion_r2167792716 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -258,11 +258,15 @@ case class CometScanRule(session: SparkSession) extends Rule

Re: [PR] Feat: support bit_get function [datafusion-comet]

2025-06-25 Thread via GitHub
comphead commented on code in PR #1713: URL: https://github.com/apache/datafusion-comet/pull/1713#discussion_r2167886851 ## spark/src/main/scala/org/apache/comet/serde/bitwise.scala: ## @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] fix: SortMergeJoin for timestamp keys [datafusion-comet]

2025-06-25 Thread via GitHub
parthchandra commented on PR #1901: URL: https://github.com/apache/datafusion-comet/pull/1901#issuecomment-3006474262 I don't think this PR is making the correct change. With this PR the removed test fails to execute the query (let alone pass the assertion) ``` test("SortMergeJo

Re: [PR] fix: Fix `EquivalenceClass` calculation for Union queries [datafusion]

2025-06-25 Thread via GitHub
chenkovsky commented on PR #16185: URL: https://github.com/apache/datafusion/pull/16185#issuecomment-3006620230 > * This PR might also be superseded by [[MAJOR] Equivalence System Overhaul #16217](https://github.com/apache/datafusion/pull/16217) @alamb it seems that #16217 didn't fix

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
adamreeve commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2167900882 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -930,12 +959,14 @@ pub async fn fetch_parquet_metadata( store: &dyn ObjectStore, meta: &Obje

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-25 Thread via GitHub
kosiew commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-3006767151 @adriangb > @kosiew any objections to merging this? Nope. I am excited to see the solution of the puzzle.

Re: [PR] datafusion-cli: Use correct S3 region if it is not specified [datafusion]

2025-06-25 Thread via GitHub
liamzwbao commented on code in PR #16502: URL: https://github.com/apache/datafusion/pull/16502#discussion_r2167939780 ## datafusion-cli/src/exec.rs: ## @@ -231,10 +231,24 @@ pub(super) async fn exec_and_print( let adjusted = AdjustedPrintOptions::new(print

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2168085502 ## docs/source/user-guide/configs.md: ## @@ -81,6 +81,8 @@ Environment variables are read during `SessionConfig` initialisation so they mus | datafusion.execut

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2168097037 ## datafusion/common/src/config.rs: ## @@ -2017,6 +2056,305 @@ config_namespace_with_hashmap! { } } +#[derive(Clone, Debug, Default, PartialEq)] +pub str

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-25 Thread via GitHub
drexler-sky commented on PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#issuecomment-3007061357 > "spark.sql.optimizer.excludedRules" -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding" I tried this, but it didn't work for me.

Re: [PR] fix: support within_group [datafusion]

2025-06-25 Thread via GitHub
chenkovsky commented on code in PR #16538: URL: https://github.com/apache/datafusion/pull/16538#discussion_r2167848401 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -7040,3 +7040,7 @@ VALUES ) GROUP BY 1 ORDER BY 1; x 1 + +statement error DataFusion error:

[PR] feat: support `map_entries` builtin function [datafusion]

2025-06-25 Thread via GitHub
comphead opened a new pull request, #16557: URL: https://github.com/apache/datafusion/pull/16557 ## Which issue does this PR close? - Closes #16553 ## Rationale for this change Support a builtin `map_entries` function ## What changes are included in thi

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-25 Thread via GitHub
adriangb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3006600650 @alamb I'd be interested to see what benchmarks say if you don't mind kicking them off?

Re: [PR] fix: enable one more filter pushdown test for iceberg_compat [datafusion-comet]

2025-06-25 Thread via GitHub
parthchandra commented on PR #1931: URL: https://github.com/apache/datafusion-comet/pull/1931#issuecomment-3006653030 Doesn't look like this is right. Closing it and will reopen after I look at it again.

Re: [PR] fix: enable one more filter pushdown test for iceberg_compat [datafusion-comet]

2025-06-25 Thread via GitHub
parthchandra closed pull request #1931: fix: enable one more filter pushdown test for iceberg_compat URL: https://github.com/apache/datafusion-comet/pull/1931

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-25 Thread via GitHub
comphead commented on PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#issuecomment-3006672046 @drexler-sky what if ``` "spark.sql.optimizer.excludedRules" -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding", ``` ?

Re: [I] Change back SmallVec to Vec for JoinHashMap [datafusion]

2025-06-25 Thread via GitHub
comphead closed issue #5940: Change back SmallVec to Vec for JoinHashMap URL: https://github.com/apache/datafusion/issues/5940

Re: [PR] fix: support within_group [datafusion]

2025-06-25 Thread via GitHub
chenkovsky commented on code in PR #16538: URL: https://github.com/apache/datafusion/pull/16538#discussion_r2167849726 ## datafusion/sql/src/expr/function.rs: ## @@ -375,6 +375,10 @@ impl SqlToRel<'_, S> { return plan_err!("WITHIN GROUP clause is required wh

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-25 Thread via GitHub
adriangb commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-3006603280 @kosiew any objections to merging this?

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-25 Thread via GitHub
hsiang-c commented on PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#issuecomment-3006454989 > In my local copy of Iceberg, I updated SparkBatchQueryScan to implement SupportsComet. @andygrove You can apply the diff to Iceberg 1.8.1 for Comet support. I'll

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-25 Thread via GitHub
hsiang-c commented on code in PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#discussion_r2167782872 ## common/src/main/java/org/apache/comet/parquet/FileReader.java: ## @@ -128,6 +134,48 @@ public FileReader(InputFile file, ParquetReadOptions options, ReadO

Re: [I] [substrait] [sqllogictest] Unsupported cast type: Time32(Second) [datafusion]

2025-06-25 Thread via GitHub
jkosh44 commented on issue #16296: URL: https://github.com/apache/datafusion/issues/16296#issuecomment-3006505233 take

Re: [PR] feat: rand expression support [datafusion-comet]

2025-06-25 Thread via GitHub
andygrove merged PR #1199: URL: https://github.com/apache/datafusion-comet/pull/1199

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-25 Thread via GitHub
hsiang-c commented on code in PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#discussion_r2167786141 ## common/src/main/java/org/apache/comet/parquet/FileReader.java: ## @@ -209,6 +257,55 @@ public void setRequestedSchema(List projection) { } } +

Re: [PR] [PoC] Add API for tracking distinct buffers in `MemoryPool` by reference count [datafusion]

2025-06-25 Thread via GitHub
joroKr21 commented on code in PR #16359: URL: https://github.com/apache/datafusion/pull/16359#discussion_r2168281061 ## datafusion/execution/src/memory_pool/pool.rs: ## @@ -112,6 +113,144 @@ impl MemoryPool for GreedyMemoryPool { } } +// A [`MemoryPool`] that implements

Re: [PR] [PoC] Add API for tracking distinct buffers in `MemoryPool` by reference count [datafusion]

2025-06-25 Thread via GitHub
joroKr21 commented on PR #16359: URL: https://github.com/apache/datafusion/pull/16359#issuecomment-3007338886 Perhaps the best solution would be another interface that wraps the memory pool so it can't be misused. The problem with having two atomic counters is that we need to check both in

Re: [PR] Feat: support bit_get function [datafusion-comet]

2025-06-25 Thread via GitHub
andygrove commented on PR #1713: URL: https://github.com/apache/datafusion-comet/pull/1713#issuecomment-3006591464 @kazantsev-maksim I took the liberty of upmerging this PR. @parthchandra @comphead could you take another look when you have a chance?

[PR] Add support for Arrow Time types in Substrait [datafusion]

2025-06-25 Thread via GitHub
jkosh44 opened a new pull request, #16558: URL: https://github.com/apache/datafusion/pull/16558 ## Which issue does this PR close? Resolves #16296 Resolves #16275 ## Rationale for this change This commit adds support for Arrow Time types Time32 and Time64 in Substrait

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-25 Thread via GitHub
huaxingao commented on code in PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#discussion_r2167873990 ## common/src/main/java/org/apache/comet/parquet/FileReader.java: ## @@ -128,6 +134,48 @@ public FileReader(InputFile file, ParquetReadOptions options, Read

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-25 Thread via GitHub
adamreeve commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2167883986 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -930,12 +959,14 @@ pub async fn fetch_parquet_metadata( store: &dyn ObjectStore, meta: &Obje

[PR] fix: Fix `EquivalenceClass` calculation for Union queries [datafusion]

2025-06-25 Thread via GitHub
chenkovsky opened a new pull request, #16185: URL: https://github.com/apache/datafusion/pull/16185 ## Which issue does this PR close? - Closes #16171. ## Rationale for this change equivalence is not set. ## What changes are included in this PR? compute i

Re: [PR] feat: `auto` scan mode should check for supported file location [datafusion-comet]

2025-06-25 Thread via GitHub
codecov-commenter commented on PR #1930: URL: https://github.com/apache/datafusion-comet/pull/1930#issuecomment-3005204578 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1930?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Add a case for reading map_entries in the `read basic complex types` test [datafusion-comet]

2025-06-25 Thread via GitHub
comphead commented on issue #1916: URL: https://github.com/apache/datafusion-comet/issues/1916#issuecomment-3005170665 `map_entries` is not supported in DataFusion. Filed https://github.com/apache/datafusion/issues/16553

Re: [PR] feat: collect once during display() in jupyter notebooks [datafusion-python]

2025-06-25 Thread via GitHub
timsaucer merged PR #1167: URL: https://github.com/apache/datafusion-python/pull/1167

Re: [PR] Skip re-pruning based on partition values and file level stats if there are no dynamic filters [datafusion]

2025-06-25 Thread via GitHub
adriangb commented on PR #16424: URL: https://github.com/apache/datafusion/pull/16424#issuecomment-3005207099 @alamb I added test assertions to confirm the stats are working correctly which addresses https://github.com/apache/datafusion/issues/16402

Re: [PR] chore(deps): bump object_store from 0.12.1 to 0.12.2 [datafusion]

2025-06-25 Thread via GitHub
comphead merged PR #16548: URL: https://github.com/apache/datafusion/pull/16548

Re: [I] Add tests for yielding / cancelling in SpillManager [datafusion]

2025-06-25 Thread via GitHub
pepijnve commented on issue #16482: URL: https://github.com/apache/datafusion/issues/16482#issuecomment-3005212491 @ding-young I think the gist of it is correct indeed. I tinkered a bit locally with `spill_reader_yield` and condensed it down to this which is doing the same thing but is a bi
