Re: [PR] [Test] Upgrade to edition 2024 [datafusion]

2025-04-22 Thread via GitHub
Dandandan commented on PR #15805: URL: https://github.com/apache/datafusion/pull/15805#issuecomment-2820649623 Nice experiment, closing. No compilation improvement just from `cargo check` that I observed -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] doc: Adding Feldera as known user [datafusion]

2025-04-22 Thread via GitHub
xudong963 merged PR #15799: URL: https://github.com/apache/datafusion/pull/15799

Re: [PR] add support for XMLTABLE(...) [datafusion-sqlparser-rs]

2025-04-22 Thread via GitHub
lovasoa commented on PR #1817: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1817#issuecomment-2820406341 Hi @alamb ! I'd love to get your feedback on this :)

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-04-22 Thread via GitHub
EmilyMatt commented on PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#issuecomment-2820434223 > @EmilyMatt I see that plan stability tests are failing. Do you plan to update the golden files? I was unable to, I tried following the guide, as well as tried various

[I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-04-22 Thread via GitHub
Rachelint opened a new issue, #15803: URL: https://github.com/apache/datafusion/issues/15803 ### Is your feature request related to a problem or challenge? I found that some functions are called unnecessarily because `unwrap_or` is used rather than `unwrap_or_else` when the argument is a function call (some

[I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-04-22 Thread via GitHub
Rachelint opened a new issue, #15802: URL: https://github.com/apache/datafusion/issues/15802 ### Is your feature request related to a problem or challenge? I found that some functions are called unnecessarily because `unwrap_or` is used rather than `unwrap_or_else` when the argument is a function call (some
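
The difference the issue describes can be sketched as follows. This is a minimal illustration, not code from the DataFusion repository; `expensive_default`, `eager`, and `lazy` are hypothetical names, and the `Cell` counter exists only to make the extra call observable.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a costly default value.
/// It bumps `calls` every time it runs so the evaluation is visible.
fn expensive_default(calls: &Cell<u32>) -> String {
    calls.set(calls.get() + 1);
    String::new()
}

/// Eager form: the argument of `unwrap_or` is evaluated before the call,
/// so `expensive_default` runs even when `value` is `Some`.
fn eager(value: Option<String>, calls: &Cell<u32>) -> String {
    value.unwrap_or(expensive_default(calls))
}

/// Lazy form: the closure passed to `unwrap_or_else` runs only
/// when `value` is `None`.
fn lazy(value: Option<String>, calls: &Cell<u32>) -> String {
    value.unwrap_or_else(|| expensive_default(calls))
}
```

Calling `eager(Some(..), &calls)` still increments the counter, while `lazy(Some(..), &calls)` leaves it untouched; both return the wrapped value.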

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-04-22 Thread via GitHub
EmilyMatt closed pull request #1390: fix: AQE creating a non-supported Final HashAggregate post-shuffle URL: https://github.com/apache/datafusion-comet/pull/1390

Re: [I] Deprecate ExprSchemable functions [datafusion]

2025-04-22 Thread via GitHub
ajita-asthana commented on issue #15798: URL: https://github.com/apache/datafusion/issues/15798#issuecomment-2821104895 I want to work on this issue.

[PR] Fix: fetch is missing in `replace_order_preserving_variants` method during `EnforceDistribution` optimizer [datafusion]

2025-04-22 Thread via GitHub
xudong963 opened a new pull request, #15808: URL: https://github.com/apache/datafusion/pull/15808 ## Which issue does this PR close? - Closes #. ## Rationale for this change If SPM contains `fetch`, then fetch will be missing in the `replace_order_preserv

[I] Update to 2024 edition [datafusion]

2025-04-22 Thread via GitHub
Dandandan opened a new issue, #15804: URL: https://github.com/apache/datafusion/issues/15804 ### Is your feature request related to a problem or challenge? At some point we would like to migrate to the 2024 edition, which contains some improvements to the language / compiler syntax and librari

Re: [I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-04-22 Thread via GitHub
Rachelint closed issue #15803: Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` URL: https://github.com/apache/datafusion/issues/15803

[PR] [Test] Upgrade to edition 2024 [datafusion]

2025-04-22 Thread via GitHub
Dandandan opened a new pull request, #15805: URL: https://github.com/apache/datafusion/pull/15805 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-04-22 Thread via GitHub
Rachelint commented on issue #15803: URL: https://github.com/apache/datafusion/issues/15803#issuecomment-2820522360 Sorry, due to a network problem this issue was created twice; closing this one.

Re: [PR] chore(deps): bump half from 2.5.0 to 2.6.0 [datafusion]

2025-04-22 Thread via GitHub
xudong963 merged PR #15806: URL: https://github.com/apache/datafusion/pull/15806

Re: [PR] minor: `executor_shutdown_while_running` test has race condition [datafusion-ballista]

2025-04-22 Thread via GitHub
milenkovicm merged PR #1248: URL: https://github.com/apache/datafusion-ballista/pull/1248

Re: [I] Suspicious slow test in Ballista [datafusion-ballista]

2025-04-22 Thread via GitHub
milenkovicm closed issue #235: Suspicious slow test in Ballista URL: https://github.com/apache/datafusion-ballista/issues/235

[I] Merging Statistics is slow [datafusion]

2025-04-22 Thread via GitHub
robert3005 opened a new issue, #15809: URL: https://github.com/apache/datafusion/issues/15809 ### Is your feature request related to a problem or challenge? If you have a datasource that reports a sum statistic then merging statistics for a file group is considerably slower when compa

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-22 Thread via GitHub
houseme commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2820310701 > Or fully support utf8? Looking forward to full UTF-8 support [chuckle]

[PR] refactor filter pushdown apis [datafusion]

2025-04-22 Thread via GitHub
adriangb opened a new pull request, #15801: URL: https://github.com/apache/datafusion/pull/15801 (no comment)

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2053476865 ## datafusion/physical-optimizer/src/push_down_filter.rs: ## @@ -382,7 +383,7 @@ impl PhysicalOptimizerRule for PushdownFilter { context .

Re: [PR] [Test] Upgrade to edition 2024 [datafusion]

2025-04-22 Thread via GitHub
Dandandan closed pull request #15805: [Test] Upgrade to edition 2024 URL: https://github.com/apache/datafusion/pull/15805

Re: [I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-04-22 Thread via GitHub
Rachelint commented on issue #15802: URL: https://github.com/apache/datafusion/issues/15802#issuecomment-2820528686 I recommend adding it crate by crate (one PR per crate), as was done in https://github.com/apache/datafusion/issues/11143

[PR] chore(deps): bump half from 2.5.0 to 2.6.0 [datafusion]

2025-04-22 Thread via GitHub
dependabot[bot] opened a new pull request, #15806: URL: https://github.com/apache/datafusion/pull/15806 Bumps [half](https://github.com/VoidStarKat/half-rs) from 2.5.0 to 2.6.0. Release notes Sourced from [half's releases](https://github.com/VoidStarKat/half-rs/releases). 2.6

Re: [PR] Feature/benchmark config from env [datafusion]

2025-04-22 Thread via GitHub
ctsk commented on PR #15782: URL: https://github.com/apache/datafusion/pull/15782#issuecomment-2820537298 I can't see an easy way to check what environment variables were actually picked up. I've opted to add some logging to `ConfigOptions::from_env`
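
One way to see which variables would be picked up is sketched below. This is not the logging added in the PR. It assumes that the relevant variables carry a `DATAFUSION_` prefix, and `picked_up_vars` is a hypothetical helper that takes the environment as an iterator so it can be exercised without touching the real process environment.

```rust
/// Hypothetical helper: keeps only the variables whose name starts with the
/// assumed `DATAFUSION_` prefix and returns them sorted by name.
fn picked_up_vars<I>(vars: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut found: Vec<(String, String)> = vars
        .into_iter()
        .filter(|(name, _)| name.starts_with("DATAFUSION_"))
        .collect();
    found.sort();
    found
}
```

In a real program the input would be `std::env::vars()`, and each returned pair could be written to a log before the configuration is built.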

[PR] Add `or_fun_call` and `unnecessary_lazy_evaluations` lints on `core` [datafusion]

2025-04-22 Thread via GitHub
Rachelint opened a new pull request, #15807: URL: https://github.com/apache/datafusion/pull/15807 ## Which issue does this PR close? Part of #15802 - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are
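
The two Clippy lints named in the PR title pull in opposite directions, which the sketch below tries to show. It is not taken from the PR: the function names are hypothetical, and attaching the lints per function (rather than crate-wide) is only an illustrative choice.

```rust
// `or_fun_call` targets function calls passed eagerly to `*_or` methods.
#[warn(clippy::or_fun_call)]
fn name_or_default(name: Option<String>) -> String {
    // Writing `name.unwrap_or(String::from("unknown"))` would build the
    // default even when `name` is `Some`; the closure defers that work.
    name.unwrap_or_else(|| String::from("unknown"))
}

// `unnecessary_lazy_evaluations` targets the reverse: a closure that only
// wraps a cheap value.
#[warn(clippy::unnecessary_lazy_evaluations)]
fn limit_or_default(limit: Option<usize>) -> usize {
    // Writing `limit.unwrap_or_else(|| 100)` adds a closure for a constant;
    // the eager form is simpler here.
    limit.unwrap_or(100)
}
```

Enabling both together keeps function calls lazy without pushing plain constants behind closures.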

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-22 Thread via GitHub
jayzhan211 commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2053975043 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -5006,7 +5006,7 @@ SELECT column5, avg(column1) FROM d GROUP BY column5; query I?? SELECT column5

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-04-22 Thread via GitHub
berkaysynnada commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2821209325 What's the status of this PR?

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-04-22 Thread via GitHub
qstommyshu commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2054082743 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4673,16 +4675,17 @@ fn test_infer_types_from_predicate() { } #[test] -fn test_infer_types_from_between_pr

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub
andygrove commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054367997 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -37,8 +39,7 @@ trait DataTypeSupport { private def isGloballySupported(dt: DataTyp

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-22 Thread via GitHub
timsaucer commented on PR #15646: URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2821605182 > Hi @timsaucer. I'd like to review this before getting merged. If it's not blocking anything, would it be okay to wait until weekend? I would greatly appreciate the review,

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub
comphead commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054307233 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -37,8 +39,7 @@ trait DataTypeSupport { private def isGloballySupported(dt: DataType

[I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub
etseidl opened a new issue, #15812: URL: https://github.com/apache/datafusion/issues/15812 ### Describe the bug This was mentioned in https://github.com/apache/datafusion/issues/15742#issuecomment-2815595171 and discussed in detail in https://github.com/apache/parquet-format/pull/221

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-04-22 Thread via GitHub
goldmedal commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2821640679 > What's the status of this PR? It's ready to review. I'm still waiting for someone to help review it.

Re: [PR] chore: match Maven plugin versions with Spark 3.5 [datafusion-comet]

2025-04-22 Thread via GitHub
andygrove commented on PR #1668: URL: https://github.com/apache/datafusion-comet/pull/1668#issuecomment-2821464372 CI builds are failing with errors such as: ``` [error] /__w/datafusion-comet/datafusion-comet/apache-spark/core/target/scala-2.13/src_managed/main/org/apache/spark/st

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub
comphead commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054555242 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -37,8 +39,7 @@ trait DataTypeSupport { private def isGloballySupported(dt: DataType

Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub
etseidl commented on issue #15812: URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2822027266 > Where is `ColumnOrder` available? In theory we have access to the full metadata of the parquet file when building the pruning predicate batches / predicate. It's an arr

Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on issue #15812: URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2822044138 Long term I think it will only happen in `ParquetOpener::open` https://github.com/apache/datafusion/pull/15769 precisely for reasons like this 😄

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub
hsiang-c commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054576057 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -161,10 +161,28 @@ class CometSparkSessionExtensions }

[I] Inline table scan drops projection [datafusion]

2025-04-22 Thread via GitHub
vadimpiven opened a new issue, #15810: URL: https://github.com/apache/datafusion/issues/15810 ### Describe the bug PR https://github.com/apache/datafusion/pull/15201 introduced a bug here https://github.com/apache/datafusion/blob/9730404028a91a7fe875ea3f88bafdbcb305ae6c/datafusion/exp

[PR] chore: Make Aggregate transformation more compact [datafusion-comet]

2025-04-22 Thread via GitHub
EmilyMatt opened a new pull request, #1670: URL: https://github.com/apache/datafusion-comet/pull/1670 ## Which issue does this PR close? Does not close anything, but is probably a step towards #1669. ## Rationale for this change There is needless nesting a

Re: [PR] Add `or_fun_call` and `unnecessary_lazy_evaluations` lints on `core` [datafusion]

2025-04-22 Thread via GitHub
Rachelint merged PR #15807: URL: https://github.com/apache/datafusion/pull/15807

Re: [PR] Add `or_fun_call` and `unnecessary_lazy_evaluations` lints on `core` [datafusion]

2025-04-22 Thread via GitHub
Rachelint commented on PR #15807: URL: https://github.com/apache/datafusion/pull/15807#issuecomment-2821213910 @jayzhan211 @2010YOUY01 Thanks for reviewing!

Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-04-22 Thread via GitHub
eliaperantoni commented on issue #15276: URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2821421644 > [@eliaperantoni](https://github.com/eliaperantoni) I had a follow up question to this regarding the `diagnose` trait function. We want to call the diagnose function duri

Re: [I] Inline table scan drops projection [datafusion]

2025-04-22 Thread via GitHub
vadimpiven commented on issue #15810: URL: https://github.com/apache/datafusion/issues/15810#issuecomment-2821420869 @qstommyshu I have pushed the fix https://github.com/apache/datafusion/pull/15811

Re: [PR] feat: transfer Apache Spark runtime conf to native engine [datafusion-comet]

2025-04-22 Thread via GitHub
andygrove commented on code in PR #1649: URL: https://github.com/apache/datafusion-comet/pull/1649#discussion_r2054186718 ## native/core/src/execution/jni_api.rs: ## @@ -198,6 +199,7 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_createPlan( task_attempt_id:

Re: [PR] feat: Add option to adjust writer buffer size for query output [datafusion]

2025-04-22 Thread via GitHub
m09526 commented on code in PR #15747: URL: https://github.com/apache/datafusion/pull/15747#discussion_r2054291448 ## datafusion/datasource/src/write/mod.rs: ## @@ -88,6 +91,21 @@ pub async fn create_writer( file_compression_type.convert_async_writer(buf_writer) } +/// R

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-22 Thread via GitHub
paleolimbot commented on code in PR #15646: URL: https://github.com/apache/datafusion/pull/15646#discussion_r2054292174 ## datafusion/core/tests/user_defined/user_defined_scalar_functions.rs: ## @@ -1367,3 +1372,342 @@ async fn register_alltypes_parquet(ctx: &SessionContext) ->

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub
comphead commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054294073 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -161,10 +161,28 @@ class CometSparkSessionExtensions }

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub
comphead commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054295647 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -161,10 +161,28 @@ class CometSparkSessionExtensions }

Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on issue #15812: URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2821897604 I'm not immediately sure. Is the point that the result of `max(2.0, NaN)` depends on how you define the ordering of floating point numbers wrt NaN, which has two variations?
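
The dependence on the chosen ordering can be illustrated with the Rust standard library alone. This is a sketch of the general floating-point behavior, not of the Parquet or DataFusion code paths; `max_ignoring_nan` and `max_total_order` are hypothetical names.

```rust
/// `f64::max` returns the other operand when one side is NaN,
/// so NaN never surfaces as the maximum.
fn max_ignoring_nan(a: f64, b: f64) -> f64 {
    a.max(b)
}

/// Under the IEEE 754 total order (`f64::total_cmp`), a NaN with a positive
/// sign sorts above every number and a NaN with a negative sign sorts below
/// every number, so the sign of the NaN decides the result.
fn max_total_order(a: f64, b: f64) -> f64 {
    if a.total_cmp(&b).is_ge() {
        a
    } else {
        b
    }
}
```

With a positive-sign NaN, `max_ignoring_nan(2.0, nan)` is `2.0` while `max_total_order(2.0, nan)` is NaN, so a statistic computed under one convention cannot safely be read under the other.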

Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on issue #15812: URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2821901778 If so the simplest short term solution would be to not write stats for containers that have NaN. At least results would then be correct. How do we handle this with nulls
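
The short-term idea of omitting statistics when a NaN is present might look roughly like this. It is a sketch under stated assumptions rather than the writer logic of any Parquet implementation: `min_max_without_nan` is a hypothetical function, and returning `None` stands for "write no min/max statistics for this container".

```rust
/// Hypothetical helper: returns `(min, max)` for a slice of values, or `None`
/// when the slice is empty or contains any NaN, so that no potentially
/// misleading statistics are produced.
fn min_max_without_nan(values: &[f64]) -> Option<(f64, f64)> {
    if values.is_empty() || values.iter().any(|v| v.is_nan()) {
        return None;
    }
    // No NaN is present at this point, so `f64::min` / `f64::max`
    // behave as an ordinary numeric comparison.
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    Some((min, max))
}
```

A reader that finds no statistics cannot prune the container, which costs performance but keeps query results correct.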

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-04-22 Thread via GitHub
qstommyshu commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2054496588 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4673,16 +4675,17 @@ fn test_infer_types_from_predicate() { } #[test] -fn test_infer_types_from_between_pr

Re: [I] Inline table scan drops projection [datafusion]

2025-04-22 Thread via GitHub
qstommyshu commented on issue #15810: URL: https://github.com/apache/datafusion/issues/15810#issuecomment-2821345483 I can take a look

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-22 Thread via GitHub
xudong963 commented on code in PR #15811: URL: https://github.com/apache/datafusion/pull/15811#discussion_r2054259450 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -498,7 +498,7 @@ impl LogicalPlanBuilder { TableScan::try_new(table_name, table_source, projec

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-22 Thread via GitHub
berkaysynnada commented on PR #15646: URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2821563176 Hi @timsaucer. I'd like to review this before getting merged. If it's not blocking anything, would it be okay to wait until weekend?

Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-22 Thread via GitHub
mbutrovich commented on issue #1648: URL: https://github.com/apache/datafusion-comet/issues/1648#issuecomment-2821858616 I've also only ever used jemalloc in the past. I'm not sure what the discussion was at the time to pursue mimalloc for Comet. @mdcallag has some recent writing on the to

Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-22 Thread via GitHub
mbutrovich commented on issue #1648: URL: https://github.com/apache/datafusion-comet/issues/1648#issuecomment-2821860775 > However, I'm uncertain whether jemalloc can still be used with LD_PRELOAD when Comet is compiled with mimalloc. The last time I attempted to dynamically override mallo

Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-22 Thread via GitHub
Kontinuation commented on issue #1648: URL: https://github.com/apache/datafusion-comet/issues/1648#issuecomment-2821878876 > Does LD_PRELOAD also change the allocator for the JVM? Memory allocated by `Unsafe_AllocateMemory0` (Arrow native memory, Spark off-heap memory) uses the alloc

[I] Refactor CometSparkSessionExtensions.scala [datafusion-comet]

2025-04-22 Thread via GitHub
EmilyMatt opened a new issue, #1669: URL: https://github.com/apache/datafusion-comet/issues/1669 ### What is the problem the feature request solves? This is a huge file containing multiple rules; I think it would benefit from being split into smaller, more manageable files. It can

Re: [PR] chore: match Maven plugin versions with Spark 3.5 [datafusion-comet]

2025-04-22 Thread via GitHub
codecov-commenter commented on PR #1668: URL: https://github.com/apache/datafusion-comet/pull/1668#issuecomment-2821568973 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1668?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add union_tag scalar function [datafusion]

2025-04-22 Thread via GitHub
Omega359 commented on PR #14687: URL: https://github.com/apache/datafusion/pull/14687#issuecomment-2821576525 @alamb, thoughts on this?

Re: [PR] feat: Emit warning with Diagnostic when doing = Null [datafusion]

2025-04-22 Thread via GitHub
eliaperantoni commented on PR #15696: URL: https://github.com/apache/datafusion/pull/15696#issuecomment-2821404985 > The warning detection is integrated during `BinaryExpr` processing, which should naturally limit it to predicate contexts. Statements like `UPDATE users SET password = NULL`

[PR] Preserve projection for inline scan [datafusion]

2025-04-22 Thread via GitHub
vadimpiven opened a new pull request, #15811: URL: https://github.com/apache/datafusion/pull/15811 ## Which issue does this PR close? - Closes #15810 ## Rationale for this change PR https://github.com/apache/datafusion/pull/15201 introduced a bug here https://github.com/

Re: [I] Investigate unstable benchmark results on macOS [datafusion-comet]

2025-04-22 Thread via GitHub
Dandandan commented on issue #1648: URL: https://github.com/apache/datafusion-comet/issues/1648#issuecomment-2821590325 FYI: We found mimalloc to be unstable for long-running workloads (it keeps memory allocated and doesn't seem to release it over time, so we ran into OOMs much more easily). Switchi

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub
andygrove commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054660077 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -161,10 +161,28 @@ class CometSparkSessionExtensions }

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub
codecov-commenter commented on PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#issuecomment-2822167010 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1667?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Refactor Memory Pools [datafusion-comet]

2025-04-22 Thread via GitHub
andygrove merged PR #1662: URL: https://github.com/apache/datafusion-comet/pull/1662

[PR] Fix ILIKE expression support in SQL unparser [datafusion]

2025-04-22 Thread via GitHub
ewgenius opened a new pull request, #15820: URL: https://github.com/apache/datafusion/pull/15820 ## Which issue does this PR close? SQL Unparser incorrectly handles `ILIKE` expressions, handling all as `LIKE`, ignoring `case_insensitive` flag. ## Rationale for this change
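
The bug class described here, an unparser that ignores a case-insensitivity flag, can be sketched with local stand-in types. `LikePattern` and `unparse_like` are hypothetical and are not the DataFusion unparser API; only the idea of a `case_insensitive` flag comes from the PR description.

```rust
/// Hypothetical, simplified stand-in for a LIKE expression node.
struct LikePattern {
    expr: String,
    pattern: String,
    case_insensitive: bool,
}

/// Emits `ILIKE` when the flag is set and `LIKE` otherwise.
/// Always emitting `LIKE` would silently change the query's semantics.
fn unparse_like(like: &LikePattern) -> String {
    let op = if like.case_insensitive { "ILIKE" } else { "LIKE" };
    format!("{} {} {}", like.expr, op, like.pattern)
}
```

Round-tripping an `ILIKE` predicate through such a function preserves the operator instead of downgrading it to `LIKE`.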

Re: [PR] pipe column orderings into pruning predicate creation [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on code in PR #15821: URL: https://github.com/apache/datafusion/pull/15821#discussion_r2055221571 ## datafusion-examples/examples/advanced_parquet_index.rs: ## @@ -300,8 +300,11 @@ impl IndexTableProvider { // In this example, we use the PruningPredic

Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on issue #15812: URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2823016671 @etseidl I opened https://github.com/apache/datafusion/pull/15821 to prove we can pipe the info in. Would you like to continue that PR since you seem to know what the right thi

Re: [PR] Improve documentation for `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on PR #15766: URL: https://github.com/apache/datafusion/pull/15766#issuecomment-2823040352 I didn't have time to do the initialization diagram but here is execution: ``` ┌─┐ │

Re: [PR] Improve documentation for `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on code in PR #15766: URL: https://github.com/apache/datafusion/pull/15766#discussion_r2055236859 ## datafusion/datasource/src/file.rs: ## @@ -37,9 +37,14 @@ use datafusion_physical_plan::DisplayFormatType; use object_store::ObjectStore; -/// Common file

Re: [I] Pruning of floating point Parquet columns is incorrect when `NaN` is present [datafusion]

2025-04-22 Thread via GitHub
etseidl commented on issue #15812: URL: https://github.com/apache/datafusion/issues/15812#issuecomment-2823047999 Wow, thanks @adriangb! I'll start on it tomorrow!

Re: [I] Refactor CometSparkSessionExtensions.scala [datafusion-comet]

2025-04-22 Thread via GitHub
andygrove commented on issue #1669: URL: https://github.com/apache/datafusion-comet/issues/1669#issuecomment-2821372388 This sounds good to me.

Re: [I] Memory limited nest loop join [datafusion]

2025-04-22 Thread via GitHub
UBarney commented on issue #15760: URL: https://github.com/apache/datafusion/issues/15760#issuecomment-2821374117 > If the right side is a table backed by a local file, the operator can scan it again without buffering or spilling. I previously thought that 'backed by a local file' an

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-04-22 Thread via GitHub
brayanjuls commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2054252874 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4673,16 +4675,17 @@ fn test_infer_types_from_predicate() { } #[test] -fn test_infer_types_from_between_pr

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub
comphead commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054587262 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -161,10 +161,28 @@ class CometSparkSessionExtensions }

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on PR #15769: URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2822076779 https://github.com/apache/datafusion/issues/15812 surfaced another reason why building the predicates from the files schemas is necessary -- This is an automated message from the

Re: [PR] refactor filter pushdown apis [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on code in PR #15801: URL: https://github.com/apache/datafusion/pull/15801#discussion_r2054603061 ## datafusion/physical-plan/src/filter.rs: ## @@ -438,55 +440,112 @@ impl ExecutionPlan for FilterExec { try_embed_projection(projection, self) }

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-22 Thread via GitHub
andygrove commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2054613906 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -161,10 +161,28 @@ class CometSparkSessionExtensions }

[I] Type coercion does not handle `Float16` correctly [datafusion]

2025-04-22 Thread via GitHub
etseidl opened a new issue, #15815: URL: https://github.com/apache/datafusion/issues/15815 ### Describe the bug Performing a query where a `Float16` column is compared to an integer literal results in the `Float16` being cast to `Int64`. ```sql > explain format indent select *

Re: [PR] refactor filter pushdown apis [datafusion]

2025-04-22 Thread via GitHub
adriangb commented on PR #15801: URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2822461601 cc @berkaysynnada -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] fix: validate range_fn when align not found [datafusion-sqlparser-rs]

2025-04-22 Thread via GitHub
killme2008 closed pull request #1820: fix: validate range_fn when align not found URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1820 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] fix: validate range_fn when align not found [datafusion-sqlparser-rs]

2025-04-22 Thread via GitHub
killme2008 opened a new pull request, #1820: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1820 Try to fix https://github.com/GreptimeTeam/greptimedb/issues/5957 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[PR] fix: Add coercion rules for Float16 types [datafusion]

2025-04-22 Thread via GitHub
etseidl opened a new pull request, #15816: URL: https://github.com/apache/datafusion/pull/15816 ## Which issue does this PR close? - Closes #15815. ## Rationale for this change Queries of `Float16` columns using integer literals can lead to a loss of precision be

Re: [PR] fix: Add coercion rules for Float16 types [datafusion]

2025-04-22 Thread via GitHub
etseidl commented on code in PR #15816: URL: https://github.com/apache/datafusion/pull/15816#discussion_r2054873962 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -931,6 +931,7 @@ fn coerce_numeric_type_to_decimal(numeric_type: &DataType) -> Option { Int3

[PR] perf: Experimental fix to avoid join strategy regression [datafusion-comet]

2025-04-22 Thread via GitHub
andygrove opened a new pull request, #1674: URL: https://github.com/apache/datafusion-comet/pull/1674 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] doc: Document local HDFS setup [datafusion-comet]

2025-04-22 Thread via GitHub
codecov-commenter commented on PR #1673: URL: https://github.com/apache/datafusion-comet/pull/1673#issuecomment-2822552934 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1673?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Use HDFS file system based on some parameter (schema or `spark.defaultFS`) [datafusion-comet]

2025-04-22 Thread via GitHub
comphead closed issue #1360: Use HDFS file system based on some parameter (schema or `spark.defaultFS`) URL: https://github.com/apache/datafusion-comet/issues/1360 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [I] [Experimental] Integrate Comet native reader with remote HDFS [datafusion-comet]

2025-04-22 Thread via GitHub
comphead commented on issue #1336: URL: https://github.com/apache/datafusion-comet/issues/1336#issuecomment-2822709486 Closing this, as everything completed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov
