Re: [PR] fix: Check acquired memory when CometMemoryPool grows [datafusion-comet]

2025-05-12 Thread via GitHub
codecov-commenter commented on PR #1732: URL: https://github.com/apache/datafusion-comet/pull/1732#issuecomment-2871305060 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1732?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] chore(deps): bump sysinfo from 0.34.2 to 0.35.1 [datafusion]

2025-05-12 Thread via GitHub
dependabot[bot] opened a new pull request, #16027: URL: https://github.com/apache/datafusion/pull/16027 Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.34.2 to 0.35.1. Changelog Sourced from https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md sysi

[I] Check acquired memory when CometMemoryPool grows [datafusion-comet]

2025-05-12 Thread via GitHub
wForget opened a new issue, #1733: URL: https://github.com/apache/datafusion-comet/issues/1733 ### Describe the bug The memory acquired by `CometMemoryPool.grow` may be less than the actual request, so `CometMemoryPool.shrink` may release more memory than was acquired. htt
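
A minimal Rust sketch of the bookkeeping the issue asks for: count only the bytes the backend actually granted, and refuse a partial grant instead of recording it as the full request, so a later shrink can never release more than was acquired. This is not Comet's `CometMemoryPool`. `MemoryBackend`, `Budget`, and `TrackingPool` are hypothetical stand-ins (the real pool calls into the JVM-side `CometTaskMemoryManager`), and handing a partial grant back is one possible policy, assumed here for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the JVM-side memory manager. `acquire` may
/// grant fewer bytes than requested, which is the situation the issue describes.
pub trait MemoryBackend {
    fn acquire(&self, requested: usize) -> usize;
    fn release(&self, bytes: usize);
}

/// Toy backend with a fixed byte budget, used only to exercise the pool.
pub struct Budget {
    remaining: Mutex<usize>,
}

impl Budget {
    pub fn new(total: usize) -> Self {
        Self { remaining: Mutex::new(total) }
    }

    pub fn remaining(&self) -> usize {
        *self.remaining.lock().unwrap()
    }
}

impl MemoryBackend for Budget {
    fn acquire(&self, requested: usize) -> usize {
        let mut left = self.remaining.lock().unwrap();
        let granted = requested.min(*left); // may be a partial grant
        *left -= granted;
        granted
    }

    fn release(&self, bytes: usize) {
        *self.remaining.lock().unwrap() += bytes;
    }
}

/// Hypothetical pool that only counts bytes the backend actually granted.
pub struct TrackingPool<B: MemoryBackend> {
    backend: B,
    used: AtomicUsize,
}

impl<B: MemoryBackend> TrackingPool<B> {
    pub fn new(backend: B) -> Self {
        Self { backend, used: AtomicUsize::new(0) }
    }

    pub fn backend(&self) -> &B {
        &self.backend
    }

    /// Grows by `additional` bytes. A partial grant is handed back and
    /// reported as an error, so `used` never includes memory that is not held.
    pub fn try_grow(&self, additional: usize) -> Result<(), String> {
        let acquired = self.backend.acquire(additional);
        if acquired < additional {
            self.backend.release(acquired);
            return Err(format!(
                "requested {additional} bytes but only {acquired} were acquired"
            ));
        }
        self.used.fetch_add(acquired, Ordering::SeqCst);
        Ok(())
    }

    /// Releases `bytes`; panics instead of letting the counter underflow.
    pub fn shrink(&self, bytes: usize) {
        self.used
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |cur| cur.checked_sub(bytes))
            .expect("attempted to release more memory than was acquired");
        self.backend.release(bytes);
    }

    pub fn reserved(&self) -> usize {
        self.used.load(Ordering::SeqCst)
    }
}
```

Returning the partial grant keeps the backend and the pool's counter in agreement; an equally valid policy would be to keep the partial grant and record exactly that amount.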

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-12 Thread via GitHub
berkaysynnada commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2083998118 ## datafusion/datasource/src/file_stream.rs: ## @@ -367,7 +368,7 @@ impl Default for OnError { pub trait FileOpener: Unpin + Send + Sync { /// Asynchro

Re: [PR] [DNM] fix: Avoid releasing unacquired memory [datafusion-comet]

2025-05-12 Thread via GitHub
codecov-commenter commented on PR #1731: URL: https://github.com/apache/datafusion-comet/pull/1731#issuecomment-2871211326 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1731?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] refactor: move `should_enable_page_index` from `mod.rs` to `opener.rs` [datafusion]

2025-05-12 Thread via GitHub
miroim opened a new pull request, #16026: URL: https://github.com/apache/datafusion/pull/16026 ## Which issue does this PR close? - Closes #16008. ## Rationale for this change This pull request refactors the `should_enable_page_index` function by relocating it from `

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-05-12 Thread via GitHub
berkaysynnada commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2871247518 > 1. The async overhead (e.g. what it takes to make await vs a normal function) could be noticeable, but maybe not that big a deal If the awaited task doesn't include any

Re: [PR] fix: Fix data race in memory profiling [datafusion-comet]

2025-05-12 Thread via GitHub
wForget commented on code in PR #1727: URL: https://github.com/apache/datafusion-comet/pull/1727#discussion_r2083780711 ## spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java: ## @@ -30,36 +34,41 @@ * memory manager. This assumes Spark's off-heap memory mode is en

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-05-12 Thread via GitHub
berkaysynnada commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2871250901 > Found discord is banned in my current mac (work mac belonging to company), I plan to switch to work on my personal mac and start to communicate on it today later.

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-12 Thread via GitHub
kosiew commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2871799892 hi @TheBuilderJR , I pushed a simpler test_datafusion_schema_evolution- https://github.com/apache/datafusion/pull/15295/files#diff-f73022e8850396e8f5595b58e1b24a7fd08499dc6ac

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-05-12 Thread via GitHub
Rachelint commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2873408010 Already submitted a PR https://github.com/synnada-ai/datafusion-upstream/pull/71 Switching work mac for more convenient communication now. -- This is an automated message

Re: [PR] Add h2o window benchmark [datafusion]

2025-05-12 Thread via GitHub
alamb commented on code in PR #16003: URL: https://github.com/apache/datafusion/pull/16003#discussion_r2085183657 ## benchmarks/README.md: ## @@ -591,49 +599,46 @@ For example, to run query 1 with the small data generated above: cargo run --release --bin dfbench -- h2o --path

[PR] Improve docs for Exprs and scalar functions [datafusion]

2025-05-12 Thread via GitHub
alamb opened a new pull request, #16036: URL: https://github.com/apache/datafusion/pull/16036 ## Which issue does this PR close? - Closes #. ## Rationale for this change While preparing to review https://github.com/apache/datafusion/pull/15911 from @timsaucer I

Re: [I] Move datasource-parquet `should_enable_page_index` from `mod.rs` to `opener.rs` [datafusion]

2025-05-12 Thread via GitHub
alamb closed issue #16008: Move datasource-parquet `should_enable_page_index` from `mod.rs` to `opener.rs` URL: https://github.com/apache/datafusion/issues/16008

Re: [PR] refactor: move `should_enable_page_index` from `mod.rs` to `opener.rs` [datafusion]

2025-05-12 Thread via GitHub
alamb merged PR #16026: URL: https://github.com/apache/datafusion/pull/16026

Re: [PR] refactor: move `should_enable_page_index` from `mod.rs` to `opener.rs` [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #16026: URL: https://github.com/apache/datafusion/pull/16026#issuecomment-2873450880 Thanks @miroim and @xudong963

Re: [PR] support simple/cross lateral joins [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #16015: URL: https://github.com/apache/datafusion/pull/16015#issuecomment-2873457094 FYI @Lordworms I am not sure if this is related to some of your other work

Re: [I] IDEA: Use one of the examples from Datafusion Blog 45 to complete custom logical plans/execution plans page [datafusion]

2025-05-12 Thread via GitHub
alamb closed issue #15422: IDEA: Use one of the examples from Datafusion Blog 45 to complete custom logical plans/execution plans page URL: https://github.com/apache/datafusion/issues/15422

Re: [PR] feat: add macros for DataFusionError variants [datafusion]

2025-05-12 Thread via GitHub
comphead commented on code in PR #15946: URL: https://github.com/apache/datafusion/pull/15946#discussion_r2085166189 ## datafusion/common/src/error.rs: ## @@ -655,6 +671,20 @@ impl DataFusionError { queue.push_back(self); ErrorIterator { queue } } + +/
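
Error-constructing macros of this kind typically wrap `format!` so call sites stay short. The sketch below only illustrates that general `macro_rules!` pattern: `SketchError`, `make_error!`, `plan_error!`, and `exec_error!` are hypothetical names, not the macros added in PR #15946, and the display prefixes are assumptions.

```rust
use std::fmt;

/// Hypothetical error enum standing in for `DataFusionError`.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Plan(String),
    Execution(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Plan(msg) => write!(f, "Error during planning: {msg}"),
            SketchError::Execution(msg) => write!(f, "Execution error: {msg}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Builds `SketchError::<variant>` from `format!`-style arguments.
macro_rules! make_error {
    ($variant:ident, $($arg:tt)*) => {
        SketchError::$variant(format!($($arg)*))
    };
}

/// One thin macro per variant keeps call sites short.
macro_rules! plan_error {
    ($($arg:tt)*) => { make_error!(Plan, $($arg)*) };
}

macro_rules! exec_error {
    ($($arg:tt)*) => { make_error!(Execution, $($arg)*) };
}
```

Returning the bare error value (rather than `Err(...)`) lets the same macro be used with `map_err` or `ok_or_else`.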

Re: [PR] fix: add an "expr_planners" method to SessionState [datafusion]

2025-05-12 Thread via GitHub
alamb merged PR #15119: URL: https://github.com/apache/datafusion/pull/15119

Re: [I] Regression: count(*) does not work unless using default expression planners [datafusion]

2025-05-12 Thread via GitHub
alamb closed issue #15114: Regression: count(*) does not work unless using default expression planners URL: https://github.com/apache/datafusion/issues/15114

Re: [PR] Updated extending operators documentation [datafusion]

2025-05-12 Thread via GitHub
alamb merged PR #15612: URL: https://github.com/apache/datafusion/pull/15612

Re: [PR] Updated extending operators documentation [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #15612: URL: https://github.com/apache/datafusion/pull/15612#issuecomment-2873452897 Thanks again!

Re: [PR] feat(proto): udf decoding fallback [datafusion]

2025-05-12 Thread via GitHub
alamb merged PR #15997: URL: https://github.com/apache/datafusion/pull/15997

Re: [I] More flexible udf decoding in datafusion-proto [datafusion]

2025-05-12 Thread via GitHub
alamb closed issue #15996: More flexible udf decoding in datafusion-proto URL: https://github.com/apache/datafusion/issues/15996

Re: [PR] feat(proto): udf decoding fallback [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #15997: URL: https://github.com/apache/datafusion/pull/15997#issuecomment-2873454438 🚀

Re: [PR] DRAFT: Eliminate Self Joins [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #16023: URL: https://github.com/apache/datafusion/pull/16023#issuecomment-2873459281 FYI @irenjj

Re: [PR] [WIP] refactor: framework for subquery decorrelation [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #16016: URL: https://github.com/apache/datafusion/pull/16016#issuecomment-2873460039 FYI @irenjj

Re: [PR] Include data types in logical plans of inferred prepare statements [datafusion]

2025-05-12 Thread via GitHub
alamb commented on code in PR #16019: URL: https://github.com/apache/datafusion/pull/16019#discussion_r2085171533 ## datafusion/sqllogictest/test_files/prepare.slt: ## @@ -92,7 +92,7 @@ DEALLOCATE my_plan statement ok PREPARE my_plan AS SELECT * FROM person WHERE id < $1; -s

Re: [PR] chore: Replace MSRV link on main page with Github badge [datafusion]

2025-05-12 Thread via GitHub
comphead merged PR #16020: URL: https://github.com/apache/datafusion/pull/16020

Re: [PR] implement pretty-printing with `{:#}` [datafusion-sqlparser-rs]

2025-05-12 Thread via GitHub
lovasoa commented on code in PR #1847: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1847#discussion_r2085360498 ## src/display_utils.rs: ## @@ -0,0 +1,133 @@ +//! Utilities for formatting SQL AST nodes with pretty printing support. +//! +//! The module provides f
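
In Rust, `{:#}` sets the formatter's alternate flag, which a `Display` impl can read with `Formatter::alternate()`. A minimal sketch of switching between compact and pretty output on that flag; `Select` is a hypothetical two-field node rather than sqlparser's AST, and the indentation scheme is an assumption, not what `display_utils.rs` does.

```rust
use std::fmt;

/// Hypothetical, heavily simplified AST node used only to illustrate `{:#}`.
pub struct Select {
    pub projection: Vec<String>,
    pub from: String,
}

impl fmt::Display for Select {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if f.alternate() {
            // `{:#}`: one clause keyword per line, items indented by two spaces.
            writeln!(f, "SELECT")?;
            for (i, item) in self.projection.iter().enumerate() {
                let sep = if i + 1 < self.projection.len() { "," } else { "" };
                writeln!(f, "  {item}{sep}")?;
            }
            write!(f, "FROM\n  {}", self.from)
        } else {
            // `{}`: everything on a single line.
            write!(f, "SELECT {} FROM {}", self.projection.join(", "), self.from)
        }
    }
}
```

When a node prints its children, forwarding with `fmt::Display::fmt(&child, f)` keeps the `#` flag, whereas `write!(f, "{}", child)` formats the child with default flags and loses it.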

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-12 Thread via GitHub
adriangb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2085377284 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -995,6 +996,184 @@ fn build_statistics_record_batch( }) } +/// Prune a set of containers represente
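
The general idea behind pruning a file from file-level statistics is min/max pruning: a file whose column bounds cannot satisfy the predicate is skipped, and a file with missing statistics is conservatively kept. This is a generic sketch of that technique, not the `pruning.rs` code under review; `ColumnStats`, `Predicate`, `may_match`, and `prune_files` are hypothetical, and only single-column integer predicates are handled.

```rust
/// Hypothetical per-file statistics for one integer column.
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: Option<i64>,
    pub max: Option<i64>,
}

/// Simplified predicate shapes of the form `col <op> literal`.
pub enum Predicate {
    Gt(i64),
    Lt(i64),
    Eq(i64),
}

/// Returns true if a file with these stats *may* contain matching rows.
/// Unknown bounds keep the file, because pruning must never drop real matches.
pub fn may_match(stats: &ColumnStats, pred: &Predicate) -> bool {
    match (stats.min, stats.max, pred) {
        (_, Some(max), Predicate::Gt(v)) => max > *v,
        (Some(min), _, Predicate::Lt(v)) => min < *v,
        (Some(min), Some(max), Predicate::Eq(v)) => min <= *v && *v <= max,
        _ => true,
    }
}

/// Returns the indices of files that cannot be ruled out.
pub fn prune_files(files: &[ColumnStats], pred: &Predicate) -> Vec<usize> {
    files
        .iter()
        .enumerate()
        .filter(|(_, s)| may_match(s, pred))
        .map(|(i, _)| i)
        .collect()
}
```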

Re: [PR] fix: Fix data race in memory profiling [datafusion-comet]

2025-05-12 Thread via GitHub
comphead commented on code in PR #1727: URL: https://github.com/apache/datafusion-comet/pull/1727#discussion_r2085153843 ## spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java: ## @@ -30,36 +34,41 @@ * memory manager. This assumes Spark's off-heap memory mode is e

Re: [PR] fix: Check acquired memory when CometMemoryPool grows [datafusion-comet]

2025-05-12 Thread via GitHub
comphead commented on code in PR #1732: URL: https://github.com/apache/datafusion-comet/pull/1732#discussion_r2085153040 ## spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java: ## @@ -48,6 +48,10 @@ public CometTaskMemoryManager(long id) { // Returns the actual a

Re: [PR] fix: Fix data race in memory profiling [datafusion-comet]

2025-05-12 Thread via GitHub
comphead commented on code in PR #1727: URL: https://github.com/apache/datafusion-comet/pull/1727#discussion_r2085157238 ## spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java: ## @@ -30,36 +34,41 @@ * memory manager. This assumes Spark's off-heap memory mode is e

[PR] Use qualified names on DELETE selections [datafusion]

2025-05-12 Thread via GitHub
nuno-faria opened a new pull request, #16033: URL: https://github.com/apache/datafusion/pull/16033 ## Which issue does this PR close? - N/A. ## Rationale for this change The logical plan for Deletions was not using qualified column names for the selections, which

Re: [PR] refactor: remove deprecated `JsonExec` [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #16005: URL: https://github.com/apache/datafusion/pull/16005#issuecomment-2873593998 > Sounds good @berkaysynnada @miroim please update also the `upgrading.md` file Since I filed https://github.com/apache/datafusion/issues/15950 I will make a PR to update upgrad

Re: [PR] refactor: remove deprecated `JsonExec` [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #16005: URL: https://github.com/apache/datafusion/pull/16005#issuecomment-2873627863 > > Sounds good @berkaysynnada @miroim please update also the `upgrading.md` file > > Since I filed #15950 I will make a PR to update upgrading.md - Filed https://github.c

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-12 Thread via GitHub
adriangb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2085375843 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -995,6 +996,184 @@ fn build_statistics_record_batch( }) } +/// Prune a set of containers represente

[I] Allow filtering specific sqllogictests [datafusion]

2025-05-12 Thread via GitHub
gabotechs opened a new issue, #16028: URL: https://github.com/apache/datafusion/issues/16028 ### Is your feature request related to a problem or challenge? While executing sqllogictests, we can only filter for specific files, but we cannot choose to execute specific tests within a sing

Re: [PR] fix: stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-12 Thread via GitHub
LiaCastaneda commented on code in PR #16031: URL: https://github.com/apache/datafusion/pull/16031#discussion_r2084629270 ## datafusion/substrait/src/logical_plan/consumer/expr/scalar_function.rs: ## @@ -124,6 +109,31 @@ pub fn name_to_op(name: &str) -> Option { } } +///

Re: [PR] fix: stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-12 Thread via GitHub
LiaCastaneda commented on code in PR #16031: URL: https://github.com/apache/datafusion/pull/16031#discussion_r2084631098 ## datafusion/substrait/src/logical_plan/consumer/expr/scalar_function.rs: ## @@ -124,6 +109,31 @@ pub fn name_to_op(name: &str) -> Option { } } +///

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-05-12 Thread via GitHub
oznur-synnada commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2872516534 Hi all, we've registered @Rachelint as the co-mentor on the GSoC portal since they've already agreed to the program rules. However @XiangpengHao could be the unofficial 3r

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-05-12 Thread via GitHub
ozankabak commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2872558077 Can we extend this sort of an approach to UDAFs? Having two entirely separate mechanisms would not be great.

Re: [PR] fix: Check acquired memory when CometMemoryPool grows [datafusion-comet]

2025-05-12 Thread via GitHub
andygrove commented on code in PR #1732: URL: https://github.com/apache/datafusion-comet/pull/1732#discussion_r2084723948 ## native/core/src/execution/memory_pools/unified_pool.rs: ## @@ -89,10 +89,8 @@ unsafe impl Send for CometMemoryPool {} unsafe impl Sync for CometMemoryPoo

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2872678278 > Can we extend this sort of an approach to UDAFs? Having two entirely separate mechanisms would not be great. The data for an `async` user defined aggregate function must come f

Re: [PR] fix: Fix data race in memory profiling [datafusion-comet]

2025-05-12 Thread via GitHub
andygrove commented on code in PR #1727: URL: https://github.com/apache/datafusion-comet/pull/1727#discussion_r2084735443 ## spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java: ## @@ -30,36 +34,41 @@ * memory manager. This assumes Spark's off-heap memory mode is

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-12 Thread via GitHub
kczimm commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2084740956 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4607,82 +4607,58 @@ fn test_prepare_statement_to_plan_params_as_constants() { } #[test] -fn test_infer_types

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-05-12 Thread via GitHub
ozankabak commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2872740234 > Maybe the use case is "send many rows of data to a remote LLM service" for example, That's a use case, but there are others too. Maybe one runs a forecast model, which is a

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-12 Thread via GitHub
alamb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2084838154 ## datafusion/datasource/src/file_stream.rs: ## @@ -367,7 +368,7 @@ impl Default for OnError { pub trait FileOpener: Unpin + Send + Sync { /// Asynchronously o

Re: [PR] Improve docs for Exprs and scalar functions [datafusion]

2025-05-12 Thread via GitHub
ig-raiderler commented on PR #16036: URL: https://github.com/apache/datafusion/pull/16036#issuecomment-2874015203 The performance optimizations here are quite clever. Nice engineering work!

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-12 Thread via GitHub
aharpervc commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2874016995 > But actually I think if we can come up with something that doesn't touch that core parser loop we might have something mergeable. I'll reopen this PR, feel free to it

Re: [PR] fix: Check acquired memory when CometMemoryPool grows [datafusion-comet]

2025-05-12 Thread via GitHub
kazuyukitanimura commented on code in PR #1732: URL: https://github.com/apache/datafusion-comet/pull/1732#discussion_r2085522012 ## native/core/src/execution/memory_pools/unified_pool.rs: ## @@ -89,10 +89,8 @@ unsafe impl Send for CometMemoryPool {} unsafe impl Sync for CometMe

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-12 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2085595037 ## .github/workflows/iceberg_spark_test.yml: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [datafusion-comet]

2025-05-12 Thread via GitHub
parthchandra commented on PR #1724: URL: https://github.com/apache/datafusion-comet/pull/1724#issuecomment-2874346842 > I didn't see anything that could be uniquely identified as the row index column Maybe `ShimFileFormat.findRowIndexColumnIndexInSchema` can locate it? I don't know

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-12 Thread via GitHub
alamb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2085480884 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -995,6 +996,184 @@ fn build_statistics_record_batch( }) } +/// Prune a set of containers represented b

Re: [PR] Improve docs for Exprs and scalar functions [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #16036: URL: https://github.com/apache/datafusion/pull/16036#issuecomment-2874045933 > The performance optimizations here are quite clever. Nice engineering work! I am not sure what performance optimizations you are referring to. I think this PR is just docs.

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-12 Thread via GitHub
kazuyukitanimura commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2085534875 ## .github/workflows/iceberg_spark_test.yml: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

Re: [PR] fix: Check acquired memory when CometMemoryPool grows [datafusion-comet]

2025-05-12 Thread via GitHub
andygrove commented on code in PR #1732: URL: https://github.com/apache/datafusion-comet/pull/1732#discussion_r2085594915 ## native/core/src/execution/memory_pools/unified_pool.rs: ## @@ -89,10 +89,8 @@ unsafe impl Send for CometMemoryPool {} unsafe impl Sync for CometMemoryPoo

Re: [PR] fix: Check acquired memory when CometMemoryPool grows [datafusion-comet]

2025-05-12 Thread via GitHub
andygrove merged PR #1732: URL: https://github.com/apache/datafusion-comet/pull/1732

Re: [I] CometMemoryPool may release more memory than acquired memory [datafusion-comet]

2025-05-12 Thread via GitHub
andygrove closed issue #1733: CometMemoryPool may release more memory than acquired memory URL: https://github.com/apache/datafusion-comet/issues/1733

Re: [PR] fix: Check acquired memory when CometMemoryPool grows [datafusion-comet]

2025-05-12 Thread via GitHub
andygrove commented on code in PR #1732: URL: https://github.com/apache/datafusion-comet/pull/1732#discussion_r2085641094 ## spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java: ## @@ -48,6 +48,10 @@ public CometTaskMemoryManager(long id) { // Returns the actual

Re: [I] CometMemoryPool sometimes goes negative [datafusion-comet]

2025-05-12 Thread via GitHub
andygrove closed issue #1726: CometMemoryPool sometimes goes negative URL: https://github.com/apache/datafusion-comet/issues/1726

Re: [PR] fix: Fix data race in memory profiling [datafusion-comet]

2025-05-12 Thread via GitHub
andygrove merged PR #1727: URL: https://github.com/apache/datafusion-comet/pull/1727

Re: [PR] feat: Improve performance tracing feature [datafusion-comet]

2025-05-12 Thread via GitHub
andygrove commented on code in PR #1730: URL: https://github.com/apache/datafusion-comet/pull/1730#discussion_r2085748917 ## native/core/src/execution/tracing.rs: ## @@ -95,30 +102,41 @@ pub(crate) fn trace_end(name: &str) { RECORDER.end_task(name); } -pub(crate) fn log_

Re: [PR] Include data types in logical plans of inferred prepare statements [datafusion]

2025-05-12 Thread via GitHub
qstommyshu commented on code in PR #16019: URL: https://github.com/apache/datafusion/pull/16019#discussion_r2085769121 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,17 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

Re: [PR] feat: Improve performance tracing feature [datafusion-comet]

2025-05-12 Thread via GitHub
comphead commented on code in PR #1730: URL: https://github.com/apache/datafusion-comet/pull/1730#discussion_r2085773798 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -650,96 +653,97 @@ impl ShufflePartitioner for MultiPartitionShuffleRepartitioner { /// Th

Re: [PR] feat: Improve performance tracing feature [datafusion-comet]

2025-05-12 Thread via GitHub
comphead commented on code in PR #1730: URL: https://github.com/apache/datafusion-comet/pull/1730#discussion_r2085775857 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -625,22 +626,24 @@ impl MultiPartitionShuffleRepartitioner { return Ok(());

Re: [PR] [DNM] fix: Avoid releasing unacquired memory [datafusion-comet]

2025-05-12 Thread via GitHub
wForget closed pull request #1731: [DNM] fix: Avoid releasing unacquired memory URL: https://github.com/apache/datafusion-comet/pull/1731

Re: [PR] fix: Support Schema Evolution in iceberg [datafusion-comet]

2025-05-12 Thread via GitHub
huaxingao closed pull request #1723: fix: Support Schema Evolution in iceberg URL: https://github.com/apache/datafusion-comet/pull/1723

Re: [PR] Include data types in logical plans of inferred prepare statements [datafusion]

2025-05-12 Thread via GitHub
alamb commented on code in PR #16019: URL: https://github.com/apache/datafusion/pull/16019#discussion_r2085180878 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,15 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

Re: [PR] implement pretty-printing with `{:#}` [datafusion-sqlparser-rs]

2025-05-12 Thread via GitHub
alamb commented on code in PR #1847: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1847#discussion_r2085270580 ## src/ast/mod.rs: ## @@ -628,7 +634,12 @@ pub struct CaseWhen { impl fmt::Display for CaseWhen { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::R

Re: [I] Allow filtering specific sqllogictests [datafusion]

2025-05-12 Thread via GitHub
2010YOUY01 commented on issue #16028: URL: https://github.com/apache/datafusion/issues/16028#issuecomment-2875212201 This feature is useful to me when debugging. +1 🙌🏼 We might also extend this syntax to execute line ranges like ```sh # run aggregate.slt from line 100 to line 20

Re: [I] Stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-12 Thread via GitHub
fmonjalet commented on issue #16030: URL: https://github.com/apache/datafusion/issues/16030#issuecomment-2872256155 take

[I] Stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-12 Thread via GitHub
fmonjalet opened a new issue, #16030: URL: https://github.com/apache/datafusion/issues/16030 ### Describe the bug When [translating a substrait scalar function call](https://github.com/fmonjalet/datafusion/blob/b7060c143c3eede8f49f8b51d92e184043e3a081/datafusion/substrait/src/logical_

[PR] Support filtering specific sqllogictests identified by line number [datafusion]

2025-05-12 Thread via GitHub
gabotechs opened a new pull request, #16029: URL: https://github.com/apache/datafusion/pull/16029 ## Which issue does this PR close? - Closes #16028. ## Rationale for this change Being able to execute only certain sqllogictest cases identified by line num
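
A minimal sketch of how a `file:line` filter could select individual records, assuming a hypothetical `Filter` type and a `path:LINE` syntax (the PR's actual command-line syntax is not shown in this thread): a record matches when its path contains the file part and, if a line number is given, the record's line range includes that line.

```rust
use std::ops::RangeInclusive;

/// Hypothetical filter parsed from `file.slt` or `file.slt:LINE`.
#[derive(Debug, PartialEq)]
pub struct Filter {
    pub file_substring: String,
    pub line: Option<u32>,
}

/// Splits on the last `:`; if the suffix is not a number, the whole
/// string is treated as a plain file filter.
pub fn parse_filter(s: &str) -> Filter {
    match s.rsplit_once(':') {
        Some((file, line)) => match line.parse::<u32>() {
            Ok(n) => Filter { file_substring: file.to_string(), line: Some(n) },
            Err(_) => Filter { file_substring: s.to_string(), line: None },
        },
        None => Filter { file_substring: s.to_string(), line: None },
    }
}

impl Filter {
    /// A record matches if its path contains the file part and, when a line
    /// is given, the record's (inclusive) line range covers that line.
    pub fn matches(&self, path: &str, record_lines: RangeInclusive<u32>) -> bool {
        path.contains(self.file_substring.as_str())
            && self.line.map_or(true, |l| record_lines.contains(&l))
    }
}
```

Matching on the record's whole line range, rather than only its first line, lets a user point at any line inside a multi-line query.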

Re: [I] Allow filtering specific sqllogictests [datafusion]

2025-05-12 Thread via GitHub
gabotechs commented on issue #16028: URL: https://github.com/apache/datafusion/issues/16028#issuecomment-2872066892 Made this draft PR https://github.com/apache/datafusion/pull/16029, but I'll be curious to know what people think about this before committing to anything

Re: [PR] fix: stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-12 Thread via GitHub
jayzhan211 commented on PR #16031: URL: https://github.com/apache/datafusion/pull/16031#issuecomment-2872337218 Is it possible to add a test that generates equivalent JSON?

Re: [PR] fix: stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-12 Thread via GitHub
fmonjalet commented on PR #16031: URL: https://github.com/apache/datafusion/pull/16031#issuecomment-2872319033 According to the [contributor guide](https://datafusion.apache.org/contributor-guide/index.html#pull-request-overview), mentioning @alamb to trigger the CI because you were the mos

[PR] fix: stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-12 Thread via GitHub
fmonjalet opened a new pull request, #16031: URL: https://github.com/apache/datafusion/pull/16031 ## Which issue does this PR close? - Closes #16030 ## Rationale for this change Some substrait plans can cause DataFusion to trigger stack overflow, crashing the hosting pro
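
The stack overflow comes from recursion depth: folding N arguments into nested binary operators left to right produces a tree N levels deep, and every recursive walk over it (planning, display, drop) needs that many stack frames. A generic way to avoid this is to build a balanced tree, whose depth grows with log2(N). This sketch is not the code in PR #16031; `Expr`, `left_deep`, `balanced`, and `depth` are hypothetical, and regrouping is only meaning-preserving for associative operators such as `AND`/`OR`.

```rust
/// Hypothetical minimal expression type.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    And(Box<Expr>, Box<Expr>),
}

/// Left-deep fold: tree depth grows linearly with the number of arguments.
pub fn left_deep(args: Vec<Expr>) -> Option<Expr> {
    args.into_iter()
        .reduce(|l, r| Expr::And(Box::new(l), Box::new(r)))
}

/// Balanced fold: split the arguments in half and recurse, so both the
/// construction recursion and the resulting tree depth are O(log n).
pub fn balanced(mut args: Vec<Expr>) -> Option<Expr> {
    match args.len() {
        0 => None,
        1 => args.pop(),
        n => {
            let right = args.split_off(n / 2);
            let l = balanced(args)?;
            let r = balanced(right)?;
            Some(Expr::And(Box::new(l), Box::new(r)))
        }
    }
}

/// Number of levels in the tree (a single column has depth 1).
pub fn depth(e: &Expr) -> usize {
    match e {
        Expr::Column(_) => 1,
        Expr::And(l, r) => 1 + depth(l).max(depth(r)),
    }
}
```

With 1024 arguments the left-deep tree is 1024 levels deep while the balanced one is 11, which is the difference between overflowing a thread's stack and not for very large argument lists.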

Re: [PR] fix: stack overflow for substrait functions with large argument lists that translate to DataFusion binary operators [datafusion]

2025-05-12 Thread via GitHub
gabotechs commented on code in PR #16031: URL: https://github.com/apache/datafusion/pull/16031#discussion_r2084556175 ## datafusion/substrait/src/logical_plan/consumer/expr/scalar_function.rs: ## @@ -124,6 +109,31 @@ pub fn name_to_op(name: &str) -> Option { } } +/// Bui

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-05-12 Thread via GitHub
Rachelint commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2872388738 > > Found discord is banned in my current mac (work mac belonging to company), I plan to switch to work on my personal mac and start to communicate on it today later. >

Re: [I] Support Min/Max accumulator for type List [datafusion]

2025-05-12 Thread via GitHub
LiaCastaneda closed issue #15477: Support Min/Max accumulator for type List URL: https://github.com/apache/datafusion/issues/15477

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-05-12 Thread via GitHub
alamb commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2872828969 > @alamb since you're already the co-mentor of another project we feel focusing solely on that would be more manageable. We're being cautious as we're not sure how much effort and

Re: [PR] Min max over lists [datafusion]

2025-05-12 Thread via GitHub
alamb commented on code in PR #16025: URL: https://github.com/apache/datafusion/pull/16025#discussion_r2084919951 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -616,7 +616,8 @@ fn min_batch(values: &ArrayRef) -> Result { min_binary_view )

Re: [PR] fix: overcounting of memory in first/last. [datafusion]

2025-05-12 Thread via GitHub
alamb commented on PR #15924: URL: https://github.com/apache/datafusion/pull/15924#issuecomment-2873022128 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] fix: overcounting of memory in first/last. [datafusion]

2025-05-12 Thread via GitHub
ashdnazg commented on PR #15924: URL: https://github.com/apache/datafusion/pull/15924#issuecomment-2873056330 @alamb fixed.

Re: [PR] refactor: remove deprecated `JsonExec` [datafusion]

2025-05-12 Thread via GitHub
comphead commented on PR #16005: URL: https://github.com/apache/datafusion/pull/16005#issuecomment-2873058448 Sounds good @berkaysynnada @miroim please also update the `upgrading.md` file

Re: [I] Table function supports non-literal args [datafusion]

2025-05-12 Thread via GitHub
alamb commented on issue #14958: URL: https://github.com/apache/datafusion/issues/14958#issuecomment-2873061319 > waiting for [#16015](https://github.com/apache/datafusion/pull/16015) to be merged Maybe @irenjj has some thoughts as he started reviewing correlated subqueries

Re: [PR] refactor: remove deprecated `ArrowExec` [datafusion]

2025-05-12 Thread via GitHub
comphead commented on PR #16006: URL: https://github.com/apache/datafusion/pull/16006#issuecomment-2873059704 Sounds good @berkaysynnada @miroim please also update the `upgrading.md` file

Re: [PR] refactor: remove deprecated `MemoryExec` [datafusion]

2025-05-12 Thread via GitHub
comphead commented on PR #16007: URL: https://github.com/apache/datafusion/pull/16007#issuecomment-2873060482 Sounds good @berkaysynnada @miroim please also update the `upgrading.md` file

Re: [PR] Support `MIN` and `MAX` for `DataType::List` [datafusion]

2025-05-12 Thread via GitHub
gabotechs commented on code in PR #16025: URL: https://github.com/apache/datafusion/pull/16025#discussion_r2084928534 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -625,7 +626,7 @@ fn min_batch(values: &ArrayRef) -> Result { }) } -fn min_max_batch_struct(array:

Re: [PR] fix: overcounting of memory in first/last. [datafusion]

2025-05-12 Thread via GitHub
alamb commented on code in PR #15924: URL: https://github.com/apache/datafusion/pull/15924#discussion_r2084924225 ## datafusion/common/src/scalar/mod.rs: ## @@ -3435,49 +3435,80 @@ impl ScalarValue { .sum::() } -/// Performs a deep clone of the Scalar

Re: [PR] Support `MIN` and `MAX` for `DataType::List` [datafusion]

2025-05-12 Thread via GitHub
gabotechs commented on code in PR #16025: URL: https://github.com/apache/datafusion/pull/16025#discussion_r2084937353 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -616,7 +616,8 @@ fn min_batch(values: &ArrayRef) -> Result { min_binary_view

[I] Add support for Min/Max over LargeList and FixedSizeList [datafusion]

2025-05-12 Thread via GitHub
gabotechs opened a new issue, #16032: URL: https://github.com/apache/datafusion/issues/16032 ### Is your feature request related to a problem or challenge? Support for Min/Max over lists was shipped on the following PR: - https://github.com/apache/datafusion/pull/16025 It wou

Re: [PR] fix: overcounting of memory in first/last. [datafusion]

2025-05-12 Thread via GitHub
eshed-flarion commented on code in PR #15924: URL: https://github.com/apache/datafusion/pull/15924#discussion_r2084949448 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -1772,4 +1790,60 @@ mod tests { Ok(()) } + +#[test] +fn test_first_list_ac

Re: [PR] fix: allow arbitrary operators with ANY and ALL on Postgres [datafusion-sqlparser-rs]

2025-05-12 Thread via GitHub
lovasoa commented on PR #1842: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1842#issuecomment-2873043513 Maybe we can remove the check entirely? In my opinion, this should be a logical error. And when not sure, it's always better to accept parsing and let library user

Re: [PR] fix: overcounting of memory in first/last. [datafusion]

2025-05-12 Thread via GitHub
eshed-flarion commented on PR #15924: URL: https://github.com/apache/datafusion/pull/15924#issuecomment-2873053955 @alamb fixed.

Re: [PR] support simple/cross lateral joins [datafusion]

2025-05-12 Thread via GitHub
comphead commented on code in PR #16015: URL: https://github.com/apache/datafusion/pull/16015#discussion_r2084982971 ## datafusion/optimizer/src/decorrelate.rs: ## @@ -313,6 +317,11 @@ impl TreeNodeRewriter for PullUpCorrelatedExpr { missing_exprs.push(u

Re: [PR] fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [datafusion-comet]

2025-05-12 Thread via GitHub
mbutrovich commented on PR #1724: URL: https://github.com/apache/datafusion-comet/pull/1724#issuecomment-2873162728 > I believe the schema we get at planning time will have the temporary name to indicate that this is the column where the row index should be populated. When I looked i
