Re: [PR] Add `FormatOptions` to Config [datafusion]

2025-05-02 Thread via GitHub
blaginin commented on PR #15793: URL: https://github.com/apache/datafusion/pull/15793#issuecomment-2847785699 Thanks for the reviews!!! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Add `FormatOptions` to Config [datafusion]

2025-05-02 Thread via GitHub
blaginin merged PR #15793: URL: https://github.com/apache/datafusion/pull/15793

Re: [I] Add an option to display column types in the table [datafusion]

2025-05-02 Thread via GitHub
blaginin closed issue #15442: Add an option to display column types in the table URL: https://github.com/apache/datafusion/issues/15442

Re: [PR] perf: Add memory profiling [datafusion-comet]

2025-05-02 Thread via GitHub
andygrove commented on PR #1702: URL: https://github.com/apache/datafusion-comet/pull/1702#issuecomment-2847894878 I'm not sure how to reconcile the different resident numbers: ``` JVM_MEMORY: { heapUsed: 1290, heapCommitted: 3184, nonHeapUsed: 111, nonHeapCommitted: 120 } NATI

Re: [PR] Implement intermediate result blocked approach sketch [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2847864538 Starting to check this out

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2847867039 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2847867775 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] perf: Add memory profiling [datafusion-comet]

2025-05-02 Thread via GitHub
andygrove commented on PR #1702: URL: https://github.com/apache/datafusion-comet/pull/1702#issuecomment-2847882035 Moving to draft while I experiment with jemalloc-specific logging

Re: [PR] perf: Add memory profiling [datafusion-comet]

2025-05-02 Thread via GitHub
andygrove commented on PR #1702: URL: https://github.com/apache/datafusion-comet/pull/1702#issuecomment-2847888664 I added some `jemalloc` specific logging: ``` NATIVE_MEMORY_JEMALLOC: { allocated=561605760, resident=3749654528 } NATIVE_MEMORY_JEMALLOC: { allocated=360603312, re

Re: [PR] Implement intermediate result blocked approach sketch [datafusion]

2025-05-02 Thread via GitHub
alamb commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2072019575 ## datafusion/common/src/config.rs: ## @@ -405,6 +405,18 @@ config_namespace! { /// in joins can reduce memory usage when joining large /// tables

Re: [I] Wrong query results for filters that involve partition columns and data file columns [datafusion]

2025-05-02 Thread via GitHub
alamb commented on issue #15912: URL: https://github.com/apache/datafusion/issues/15912#issuecomment-2847949124 sounds serious

Re: [I] Support metadata columns (`location`, `size`, `last_modified`) in `ListingTableProvider` [datafusion]

2025-05-02 Thread via GitHub
alamb commented on issue #15173: URL: https://github.com/apache/datafusion/issues/15173#issuecomment-2847954914 Here is another example of complexity related to this type of operation: - https://github.com/apache/datafusion/issues/15912 (and all the more reason not to make `Listing

Re: [I] Wrong query results for filters that involve partition columns and data file columns and `pushdown_filters` is enabled [datafusion]

2025-05-02 Thread via GitHub
alamb commented on issue #15912: URL: https://github.com/apache/datafusion/issues/15912#issuecomment-2847952914 This is all the more reason I think to avoid adding more complexity to ListingTable as we are discussing in - https://github.com/apache/datafusion/issues/15173

Re: [I] Support integration with Parquet modular encryption [datafusion]

2025-05-02 Thread via GitHub
alamb commented on issue #15216: URL: https://github.com/apache/datafusion/issues/15216#issuecomment-2847108736 Here is how spark does encryption configuration https://spark.apache.org/docs/latest/sql-data-sources-parquet.html

Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-05-02 Thread via GitHub
aharpervc commented on code in PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810#discussion_r2071734824 ## src/dialect/mssql.rs: ## @@ -215,6 +225,59 @@ impl MsSqlDialect { })) } +/// Parse `CREATE TRIGGER` for [MsSql] +/// +

[PR] fix: overcounting of memory in first/last. [datafusion]

2025-05-02 Thread via GitHub
ashdnazg opened a new pull request, #15924: URL: https://github.com/apache/datafusion/pull/15924 ## Which issue does this PR close? - Closes #15923. ## Rationale for this change When aggregating first/last list over a column of lists, the first/last accumulators hold the n

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-05-02 Thread via GitHub
tomershaniii commented on PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#issuecomment-2847146382 @iffyio Note that I am unable to resolve the issues on the PR (as I am not the owner of the original PR), please consider all issues resolved.

Re: [PR] fix: overcounting of memory in first/last. [datafusion]

2025-05-02 Thread via GitHub
ashdnazg commented on code in PR #15924: URL: https://github.com/apache/datafusion/pull/15924#discussion_r2071709763 ## datafusion/common/src/scalar/mod.rs: ## @@ -3415,6 +3415,100 @@ impl ScalarValue { .map(|sv| sv.size() - size_of_val(sv)) .su

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-05-02 Thread via GitHub
NevroHelios commented on code in PR #15841: URL: https://github.com/apache/datafusion/pull/15841#discussion_r2071709066 ## benchmarks/src/util/options.rs: ## @@ -72,16 +72,11 @@ impl CommonOpt { /// Modify the existing config appropriately pub fn update_config(&self, m

Re: [PR] Resolved bug in `parse_function_arg` [datafusion-sqlparser-rs]

2025-05-02 Thread via GitHub
LucaCappelletti94 commented on code in PR #1826: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1826#discussion_r2071710061 ## src/parser/mod.rs: ## @@ -5199,13 +5199,20 @@ impl<'a> Parser<'a> { // parse: [ argname ] argtype let mut name = None;

Re: [I] `DROP DOMAIN` is not supported [datafusion-sqlparser-rs]

2025-05-02 Thread via GitHub
LucaCappelletti94 closed issue #1827: `DROP DOMAIN` is not supported URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1827

Re: [PR] perf: Add performance tracing capability [datafusion-comet]

2025-05-02 Thread via GitHub
andygrove commented on PR #1706: URL: https://github.com/apache/datafusion-comet/pull/1706#issuecomment-2847545879 > UPD: it works with binaries profiling, but I'm not sure how to use it in hybrid env like Comet through I was looking for a solution where we can choose which sections

[I] Deterministic Dictionary testing in CometFuzzTestSuite [datafusion]

2025-05-02 Thread via GitHub
mbutrovich opened a new issue, #15925: URL: https://github.com/apache/datafusion/issues/15925 ### Is your feature request related to a problem or challenge? #1697 exposed an issue in the random data generation: depending on RNG you can get Dictionary encoded values when written to Par

Re: [PR] Added SQL Example for `Aggregate Functions` [datafusion]

2025-05-02 Thread via GitHub
Adez017 commented on PR #15778: URL: https://github.com/apache/datafusion/pull/15778#issuecomment-2846563520 could you trigger the CI again @xudong963 @alamb

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-02 Thread via GitHub
zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2847094436 Thank you @alamb @Dandandan , i can checkout the script too. > 🤖: Benchmark completed > > Details The first result of the sort-tpch small data set,

[I] Inconsistent column name case handling when round tripping names from arrow metadata [datafusion]

2025-05-02 Thread via GitHub
abey79 opened a new issue, #15922: URL: https://github.com/apache/datafusion/issues/15922 ### Describe the bug Given a table in a `SessionContext` and the `RecordBatch` that backs it (e.g. through `ctx.register_batch()`), I want to refer to the table's columns using the field names f

Re: [PR] Minor: cleanup datafusion-spark scalar functions [datafusion]

2025-05-02 Thread via GitHub
Copilot commented on code in PR #15921: URL: https://github.com/apache/datafusion/pull/15921#discussion_r2071745291 ## datafusion/spark/src/function/utils.rs: ## @@ -28,7 +28,7 @@ pub mod test { let expected: datafusion_common::Result> = $EXPECTED; let

Re: [PR] feat: create helpers to set the max_temp_directory_size [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15919: URL: https://github.com/apache/datafusion/pull/15919#issuecomment-2847420251 FYI @2010YOUY01 who I think was thinking about something similar

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-05-02 Thread via GitHub
mbutrovich commented on code in PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#discussion_r2071746681 ## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ## @@ -188,6 +188,22 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSpar

Re: [I] Add memory profiling / logging [datafusion-comet]

2025-05-02 Thread via GitHub
alamb commented on issue #1701: URL: https://github.com/apache/datafusion-comet/issues/1701#issuecomment-2847426725 Related discussion in DataFusion: - https://github.com/apache/datafusion/issues/14510

[I] Experimental native Parquet readers unpack dictionaries by default [datafusion-comet]

2025-05-02 Thread via GitHub
mbutrovich opened a new issue, #1707: URL: https://github.com/apache/datafusion-comet/issues/1707 ### What is the problem the feature request solves? When using `native_datafusion` or `native_iceberg_compat` Parquet readers based on DataFusion's DataSourceExec, the schemas that Comet

Re: [PR] POC: Parse to Merge Logical Plan [datafusion]

2025-05-02 Thread via GitHub
jonathanc-n commented on code in PR #15862: URL: https://github.com/apache/datafusion/pull/15862#discussion_r2071905742 ## datafusion/expr/src/logical_plan/dml.rs: ## @@ -241,3 +241,10 @@ fn make_count_schema() -> DFSchemaRef { .unwrap(), ) } + Review Comment

Re: [I] ClickBench extended queries are not working - `WITHIN GROUP clause is required when calling ordered set aggregate function(approx_percentile_cont)` [datafusion]

2025-05-02 Thread via GitHub
alamb commented on issue #15927: URL: https://github.com/apache/datafusion/issues/15927#issuecomment-2847981986 I believe the correct syntax is ```sql > SELECT "ClientIP", "WatchID", COUNT(*) c, MIN("ResponseStartTiming") tmin, APPROX_PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY "Re

[PR] Implementation for regex_instr [datafusion]

2025-05-02 Thread via GitHub
nirnayroy opened a new pull request, #15928: URL: https://github.com/apache/datafusion/pull/15928 ## Which issue does this PR close? - Closes #13009 ## Rationale for this change Implements a regex SQL standard function in datafusion ## What changes are inc

[PR] fix: sqllogictest on Windows [datafusion]

2025-05-02 Thread via GitHub
nuno-faria opened a new pull request, #15932: URL: https://github.com/apache/datafusion/pull/15932 ## Which issue does this PR close? - N/A ## Rationale for this change There were some sqllogictests writing to `/tmp/...`, which on Windows works but writes

Re: [PR] fix: sqllogictest on Windows [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15932: URL: https://github.com/apache/datafusion/pull/15932#issuecomment-2848025719 FYI @xudong963

Re: [I] Wrong query results for filters that involve partition columns and data file columns and `pushdown_filters` is enabled [datafusion]

2025-05-02 Thread via GitHub
alamb commented on issue #15912: URL: https://github.com/apache/datafusion/issues/15912#issuecomment-2848050170 > Also although yes this is serious I suspect is pretty rare to have a filter that depends on both a partition column and data column. It hasn't been reported for years...

Re: [PR] feat: create helpers to set the max_temp_directory_size [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15919: URL: https://github.com/apache/datafusion/pull/15919#issuecomment-2848052439 > CI is failing but that seems not related 🤔 I think we just need to merge up from main -- I will do so (and thanks @xudong963 for enabling the button on github UI!)

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848062289 🤖: Benchmark completed Details ``` Comparing HEAD and intermeidate-result-blocked-approach Benchmark clickbench_1.json

Re: [I] Wrong query results for filters that involve partition columns and data file columns and `pushdown_filters` is enabled [datafusion]

2025-05-02 Thread via GitHub
adriangb commented on issue #15912: URL: https://github.com/apache/datafusion/issues/15912#issuecomment-2848061667 Good point

Re: [I] Wrong query results for filters that involve partition columns and data file columns and `pushdown_filters` is enabled [datafusion]

2025-05-02 Thread via GitHub
adriangb commented on issue #15912: URL: https://github.com/apache/datafusion/issues/15912#issuecomment-2847987958 Also although yes this is serious I suspect is pretty rare to have a filter that depends on both a partition column and data column. It hasn't been reported for years...

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2847988587 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [I] ClickBench extended queries are not working - `WITHIN GROUP clause is required when calling ordered set aggregate function(approx_percentile_cont)` [datafusion]

2025-05-02 Thread via GitHub
alamb commented on issue #15927: URL: https://github.com/apache/datafusion/issues/15927#issuecomment-2847974750 I believe this was changed in this PR: - https://github.com/apache/datafusion/pull/13511

Re: [I] Wrong query results for filters that involve partition columns and data file columns and `pushdown_filters` is enabled [datafusion]

2025-05-02 Thread via GitHub
adriangb commented on issue #15912: URL: https://github.com/apache/datafusion/issues/15912#issuecomment-2847975562 I think the fix is relatively simple though: any filters that reference both partition columns and data columns need to be marked as Inexact. I'm traveling so don't know that I
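
A minimal sketch of the rule described in the comment above, assuming a filter can be summarized by the names of the columns it references. The `Pushdown` enum and `classify` function are hypothetical names used only for illustration and are not DataFusion's API; treating every non-mixed filter as `Exact` is also a simplifying assumption.

```rust
use std::collections::HashSet;

/// Hypothetical pushdown verdict for a single filter (illustration only).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Pushdown {
    /// The scan applies the filter completely; no re-check is needed.
    Exact,
    /// The scan may apply the filter only partially; the engine re-applies it.
    Inexact,
}

/// Classify a filter by the column names it references.
/// A filter touching both partition columns and data-file columns is Inexact.
pub fn classify(filter_columns: &[&str], partition_columns: &HashSet<&str>) -> Pushdown {
    let uses_partition = filter_columns
        .iter()
        .any(|c| partition_columns.contains(c));
    let uses_data = filter_columns
        .iter()
        .any(|c| !partition_columns.contains(c));

    if uses_partition && uses_data {
        Pushdown::Inexact
    } else {
        // Simplifying assumption: all other filters are handled exactly.
        Pushdown::Exact
    }
}
```

For example, with `year` as the only partition column, a filter over `year` and `value` would be classified as `Inexact`, while a filter over `year` alone would stay `Exact`.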

[I] ClickBench extended queries are not working [datafusion]

2025-05-02 Thread via GitHub
alamb opened a new issue, #15927: URL: https://github.com/apache/datafusion/issues/15927 ### Describe the bug https://github.com/apache/datafusion/tree/main/benchmarks/queries/clickbench#extended-queries > The "extended" queries are not part of the official ClickBench benchmark

Re: [PR] Minor: cleanup datafusion-spark scalar functions [datafusion]

2025-05-02 Thread via GitHub
alamb commented on code in PR #15921: URL: https://github.com/apache/datafusion/pull/15921#discussion_r2072077763 ## datafusion/spark/src/function/utils.rs: ## @@ -67,7 +69,12 @@ pub mod test { let return_field = return_field.unwrap(); a

Re: [PR] Minor: cleanup datafusion-spark scalar functions [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15921: URL: https://github.com/apache/datafusion/pull/15921#issuecomment-2847983724 Thank you for the review @blaginin

Re: [PR] Minor: cleanup datafusion-spark scalar functions [datafusion]

2025-05-02 Thread via GitHub
alamb merged PR #15921: URL: https://github.com/apache/datafusion/pull/15921

Re: [PR] Fix ClickBench extended queries after update to APPROX_PERCENTILE_CONT [datafusion]

2025-05-02 Thread via GitHub
alamb commented on code in PR #15929: URL: https://github.com/apache/datafusion/pull/15929#discussion_r2072093281 ## benchmarks/queries/clickbench/extended.sql: ## @@ -3,5 +3,5 @@ SELECT COUNT(DISTINCT "HitColor"), COUNT(DISTINCT "BrowserCountry"), COUNT(DISTI SELECT "BrowserC

[PR] Fix ClickBench extended queries after update to APPROX_PERCENTILE_CONT [datafusion]

2025-05-02 Thread via GitHub
alamb opened a new pull request, #15929: URL: https://github.com/apache/datafusion/pull/15929 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/15927 ## Rationale for this change See https://github.com/apache/datafusion/issues/15927

[I] Speedup character_length [datafusion]

2025-05-02 Thread via GitHub
Dandandan opened a new issue, #15930: URL: https://github.com/apache/datafusion/issues/15930 ### Is your feature request related to a problem or challenge? This is a hot function in some benchmarks - let's optimize it a bit further ### Describe the solution you'd like * F

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2848017108 One of the extended clickbench queries also needed to be updated to the new syntax. I made a PR to do this here: - https://github.com/apache/datafusion/pull/15929

[PR] Speedup `character_length` [datafusion]

2025-05-02 Thread via GitHub
Dandandan opened a new pull request, #15931: URL: https://github.com/apache/datafusion/pull/15931 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? Closes: #15930 ## A

[PR] docs: Label `bloom_filter_on_read` as a reading config [datafusion]

2025-05-02 Thread via GitHub
nuno-faria opened a new pull request, #15933: URL: https://github.com/apache/datafusion/pull/15933 ## Which issue does this PR close? - N/A ## Rationale for this change The `bloom_filter_on_read` config was incorrectly labeled as a writing config.

Re: [PR] add benchmark code for `Reuse rows in row cursor stream` [datafusion]

2025-05-02 Thread via GitHub
alamb commented on code in PR #15913: URL: https://github.com/apache/datafusion/pull/15913#discussion_r2072106466 ## datafusion/physical-plan/benches/sort_preserving.rs: ## @@ -0,0 +1,138 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-02 Thread via GitHub
andygrove opened a new pull request, #1710: URL: https://github.com/apache/datafusion-comet/pull/1710 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Speedup `character_length` [datafusion]

2025-05-02 Thread via GitHub
Dandandan commented on PR #15931: URL: https://github.com/apache/datafusion/pull/15931#issuecomment-2848098062 The main speedup I am expecting in e2e benchmarks is query 27 of clickbench, which has some mixed ascii / utf8 data and uses a `LENGTH` function. Local runs don't show a very lar
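
To illustrate why mixed ASCII / UTF-8 data matters for a `LENGTH`-style function, here is a small sketch of a character count with an ASCII fast path. This is an assumption about one common optimization, not a description of what PR #15931 actually changes; `character_length` here is a hypothetical standalone helper, not the DataFusion function.

```rust
/// Count characters (Unicode scalar values) in a string, not bytes.
/// Hypothetical helper for illustration only.
pub fn character_length(s: &str) -> usize {
    if s.is_ascii() {
        // Every ASCII character occupies exactly one byte,
        // so the byte length equals the character count.
        s.len()
    } else {
        // Multi-byte UTF-8 input: decode and count scalar values.
        s.chars().count()
    }
}
```

The ASCII branch avoids decoding, which is where a speedup would come from on mostly-ASCII columns; strings containing multi-byte characters still take the slower counting path.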

[I] Release sqlparser-rs version `0.57.0` around 2024-06-15 [datafusion-sqlparser-rs]

2025-05-02 Thread via GitHub
alamb opened a new issue, #1837: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1837 Follow on to - https://github.com/apache/datafusion-sqlparser-rs/issues/1756 This ticket tracks creating the next sqlparser release (mostly so others can follow along) **Targ

Re: [PR] feat: create helpers to set the max_temp_directory_size [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15919: URL: https://github.com/apache/datafusion/pull/15919#issuecomment-2848109385 🤔 but now CI is not running; Closing / reopening this PR to retrigger

Re: [PR] feat: create helpers to set the max_temp_directory_size [datafusion]

2025-05-02 Thread via GitHub
alamb closed pull request #15919: feat: create helpers to set the max_temp_directory_size URL: https://github.com/apache/datafusion/pull/15919

Re: [I] Release sqlparser-rs version `0.56.0` around 2024-04-20 [datafusion-sqlparser-rs]

2025-05-02 Thread via GitHub
alamb commented on issue #1756: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1756#issuecomment-2848112796 The release is published to crates.io: https://crates.io/crates/sqlparser/0.56.0 - Next release tracked here: https://github.com/apache/datafusion-sqlparser-rs/

Re: [I] Release sqlparser-rs version `0.56.0` around 2024-04-20 [datafusion-sqlparser-rs]

2025-05-02 Thread via GitHub
alamb closed issue #1756: Release sqlparser-rs version `0.56.0` around 2024-04-20 URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1756

Re: [PR] feat: create helpers to set the max_temp_directory_size [datafusion]

2025-05-02 Thread via GitHub
alamb commented on code in PR #15919: URL: https://github.com/apache/datafusion/pull/15919#discussion_r2072147661 ## datafusion-cli/src/main.rs: ## @@ -177,13 +177,14 @@ async fn main_inner() -> Result<()> { // set disk limit if let Some(disk_limit) = args.disk_limit

Re: [PR] perf: Add memory profiling [datafusion-comet]

2025-05-02 Thread via GitHub
comphead commented on code in PR #1702: URL: https://github.com/apache/datafusion-comet/pull/1702#discussion_r2072182045 ## native/core/src/execution/jni_api.rs: ## @@ -359,6 +367,41 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( // Retrieve

[I] Parquet predicate filters fail with "Invalid comparison operation: Utf8View <= Utf8" [datafusion]

2025-05-02 Thread via GitHub
ctsk opened a new issue, #15920: URL: https://github.com/apache/datafusion/issues/15920 ### Describe the bug When running the tpch benchmark with debug logging, some errors are logged - they look like this: > DEBUG datafusion_datasource_parquet::row_group_filter] Error evaluati

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2846952043 > To me this looks like a good improvement. @alamb can we rerun the benchmarks on this to see if we don't get regressions? Will do Just FYI I would like to find some way t

Re: [I] [datafusion-spark] Example of using Spark compatible function library [datafusion]

2025-05-02 Thread via GitHub
alamb commented on issue #15915: URL: https://github.com/apache/datafusion/issues/15915#issuecomment-2846943883 Thanks @Adez017

Re: [I] Add support for event tracing for visualizing where time is spent during execution [datafusion-comet]

2025-05-02 Thread via GitHub
alamb commented on issue #1705: URL: https://github.com/apache/datafusion-comet/issues/1705#issuecomment-2846947377 @geoffreyclaude added some hooks recently in DataFusion to support various tracing libraries. See the really nice example here: - https://github.com/apache/datafusion/blob

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-05-02 Thread via GitHub
alamb commented on code in PR #15841: URL: https://github.com/apache/datafusion/pull/15841#discussion_r2071453759 ## benchmarks/src/util/options.rs: ## @@ -72,16 +72,11 @@ impl CommonOpt { /// Modify the existing config appropriately pub fn update_config(&self, mut con

Re: [PR] refactor filter pushdown apis [datafusion]

2025-05-02 Thread via GitHub
berkaysynnada commented on PR #15801: URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2846962860 Hi again @adriangb. I've sent the commit, but unfortunately I can't say I fully achieved what I had in mind. Most of the work are idiomatic changes, comments and naming -- I'd

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2846963551 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Fix CI in main [datafusion]

2025-05-02 Thread via GitHub
alamb merged PR #15917: URL: https://github.com/apache/datafusion/pull/15917

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15168: URL: https://github.com/apache/datafusion/pull/15168#issuecomment-2846971785 > Fyi, the main CI has failed since the PR @blaginin has fixed it -- it appears to have been a logical conflict

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-05-02 Thread via GitHub
berkaysynnada commented on code in PR #15906: URL: https://github.com/apache/datafusion/pull/15906#discussion_r2071468319 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1382,6 +1386,73 @@ fn contain(e: &Expr, check_map: &HashMap) -> bool { is_contain } +/// Inf

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2846984954 🤖: Benchmark completed Details ``` Comparing HEAD and concat_batches_for_sort Benchmark sort_tpch.json ┏━━━

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-02 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2846985013 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

[PR] Minor: cleanup datafusion-spark scalar functions [datafusion]

2025-05-02 Thread via GitHub
alamb opened a new pull request, #15921: URL: https://github.com/apache/datafusion/pull/15921 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/15917 from @blaginin ## Rationale for this change I was reviewing the macro

Re: [PR] Minor: cleanup datafusion-spark scalar functions [datafusion]

2025-05-02 Thread via GitHub
alamb commented on code in PR #15921: URL: https://github.com/apache/datafusion/pull/15921#discussion_r2071475600 ## datafusion/spark/src/function/utils.rs: ## @@ -67,7 +69,12 @@ pub mod test { let return_field = return_field.unwrap(); a

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-02 Thread via GitHub
Dandandan commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2846999317 > > To me this looks like a good improvement. @alamb can we rerun the benchmarks on this to see if we don't get regressions? > > Will do > > Just FYI I

Re: [I] Support metadata columns (`location`, `size`, `last_modified`) in `ListingTableProvider` [datafusion]

2025-05-02 Thread via GitHub
alamb commented on issue #15173: URL: https://github.com/apache/datafusion/issues/15173#issuecomment-2847003930 > I think a good first step would be figuring out how to abstract out the partition columns so that the existing ListingTable can work without any special code in core. I a

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-05-02 Thread via GitHub
blaginin merged PR #15893: URL: https://github.com/apache/datafusion/pull/15893

Re: [PR] support OR operator in binary `evaluate_bounds` [datafusion]

2025-05-02 Thread via GitHub
berkaysynnada commented on code in PR #15716: URL: https://github.com/apache/datafusion/pull/15716#discussion_r2071501942 ## datafusion/physical-expr/src/intervals/cp_solver.rs: ## @@ -645,6 +645,17 @@ impl ExprIntervalGraph { .map(|child| self.graph[*child].int

[I] Add imdb 10 rows slt test [datafusion]

2025-05-02 Thread via GitHub
jayzhan211 opened a new issue, #15934: URL: https://github.com/apache/datafusion/issues/15934 ### Is your feature request related to a problem or challenge? Follow-up issue from https://github.com/apache/datafusion/issues/12311 We have clickbench.slt that runs the queries against

Re: [PR] perf: Add memory profiling [datafusion-comet]

2025-05-02 Thread via GitHub
comphead commented on code in PR #1702: URL: https://github.com/apache/datafusion-comet/pull/1702#discussion_r2072183808 ## native/core/src/execution/jni_api.rs: ## @@ -359,6 +367,41 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( // Retrieve

Re: [PR] Fix ClickBench extended queries after update to APPROX_PERCENTILE_CONT [datafusion]

2025-05-02 Thread via GitHub
jayzhan211 merged PR #15929: URL: https://github.com/apache/datafusion/pull/15929 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] Fix ClickBench extended queries after update to APPROX_PERCENTILE_CONT [datafusion]

2025-05-02 Thread via GitHub
jayzhan211 commented on PR #15929: URL: https://github.com/apache/datafusion/pull/15929#issuecomment-2848333212 Thanks @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [I] ClickBench extended queries are not working - `WITHIN GROUP clause is required when calling ordered set aggregate function(approx_percentile_cont)` [datafusion]

2025-05-02 Thread via GitHub
jayzhan211 closed issue #15927: ClickBench extended queries are not working - `WITHIN GROUP clause is required when calling ordered set aggregate function(approx_percentile_cont)` URL: https://github.com/apache/datafusion/issues/15927 -- This is an automated message from the Apache Git Servi

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-02 Thread via GitHub
jayzhan211 commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2848409762 how is the benchmark triggered and can we run clickbench extended too? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-05-02 Thread via GitHub
NevroHelios commented on PR #15841: URL: https://github.com/apache/datafusion/pull/15841#issuecomment-2848405766 Hi. I pushed the updates. Could you please rerun the CI? @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[PR] [wip] feat: Add support for `expm1` expression from `datafusion-spark` crate [datafusion-comet]

2025-05-02 Thread via GitHub
andygrove opened a new pull request, #1711: URL: https://github.com/apache/datafusion-comet/pull/1711 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1704 ## Rationale for this change Demonstrate using an expres

[PR] chore: fix CI job name [datafusion-comet]

2025-05-02 Thread via GitHub
hsiang-c opened a new pull request, #1712: URL: https://github.com/apache/datafusion-comet/pull/1712 ## Which issue does this PR close? Closes #. N/A ## Rationale for this change Make sure CI job name matches the actual scan implementation in test #

Re: [PR] feat: use edition 2024 [datafusion-sqlparser-rs]

2025-05-02 Thread via GitHub
github-actions[bot] closed pull request #1736: feat: use edition 2024 URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1736 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [I] `select count(distinct ..)` query doesn't go to the specialized distinct accumulator [datafusion]

2025-05-02 Thread via GitHub
jayzhan211 commented on issue #15850: URL: https://github.com/apache/datafusion/issues/15850#issuecomment-2848380600 Another idea is that what if we alter this plan to count distinct specific ``` 05)AggregateExec: mode=FinalPartitioned, gby=[alias1@0 as alias1], aggr=[]

Re: [I] Deterministic Dictionary testing in CometFuzzTestSuite [datafusion]

2025-05-02 Thread via GitHub
mbutrovich closed issue #15925: Deterministic Dictionary testing in CometFuzzTestSuite URL: https://github.com/apache/datafusion/issues/15925 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [I] Deterministic Dictionary testing in CometFuzzTestSuite [datafusion]

2025-05-02 Thread via GitHub
mbutrovich commented on issue #15925: URL: https://github.com/apache/datafusion/issues/15925#issuecomment-2847625240 Wrong repo, my bad :( -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] Allow stored procedures to be defined without `BEGIN`/`END` [datafusion-sqlparser-rs]

2025-05-02 Thread via GitHub
aharpervc commented on PR #1834: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1834#issuecomment-2847567460 I have temporarily rebased this branch on https://github.com/apache/datafusion-sqlparser-rs/pull/1810 to pick up the new helper function and similarly use the enum patt

[PR] Add Support for Dynamic SQL Macros for Flexible Column Selection [datafusion]

2025-05-02 Thread via GitHub
kumarlokesh opened a new pull request, #15926: URL: https://github.com/apache/datafusion/pull/15926 ## Which issue does this PR close? - Closes #14512. ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [I] [DISCUSSION] JOIN "task force" / project team [datafusion]

2025-05-02 Thread via GitHub
2010YOUY01 commented on issue #15885: URL: https://github.com/apache/datafusion/issues/15885#issuecomment-2847562551 > > not sure if it will help direction, cost nothing to share :) [Debunking the Myth of Join Ordering: Toward Robust SQL Analytics](https://arxiv.org/abs/2502.15181) >

Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-05-02 Thread via GitHub
aharpervc commented on code in PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810#discussion_r2071819600 ## src/dialect/mssql.rs: ## @@ -215,6 +225,59 @@ impl MsSqlDialect { })) } +/// Parse `CREATE TRIGGER` for [MsSql] +/// +
