Re: [I] Register schema table, failed to resolve schema [datafusion]

2025-05-05 Thread via GitHub
shencangsheng closed issue #15897: Register schema table, failed to resolve schema URL: https://github.com/apache/datafusion/issues/15897 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-05 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2853367261 @jayzhan211 @alamb I think I nearly understand why it is related to `target_partitions`: we maintain the `intermediate results` separately in each `partition`. So wh

Re: [I] Memory leak in `datafusion-cli` [datafusion]

2025-05-05 Thread via GitHub
2010YOUY01 commented on issue #15939: URL: https://github.com/apache/datafusion/issues/15939#issuecomment-2853267673 > This might be due to the system allocator you are using (e.g. `jemalloc` for example). Often as a performance optimization such allocators will not return memory to the und

Re: [PR] PERF : modify SMJ shuffle file reader to skip validation [datafusion]

2025-05-05 Thread via GitHub
2010YOUY01 commented on PR #15948: URL: https://github.com/apache/datafusion/pull/15948#issuecomment-2853206011 > > Benchmark test is not included because it was difficult to limit them to shuffle read scope. > > Did you test it locally? Do you have any performance numbers you can sha

Re: [I] Support full UTF-8 in CSV files [datafusion]

2025-05-05 Thread via GitHub
guojidan closed issue #15756: Support full UTF-8 in CSV files URL: https://github.com/apache/datafusion/issues/15756 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscr

Re: [I] Support full UTF-8 in CSV files [datafusion]

2025-05-05 Thread via GitHub
guojidan commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2853172710 > btw, why do you need this kind of delimiter support? Is converting them to `,` an option for your use-case? This is a good approach 👍 -- This is an automated messag

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-05 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2853164420 I suspect it is due to the row-level random access of `VecDeque`. I have replaced `VecDeque` with `Vec`, and I am trying to run the benchmark in more environments rather t

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-05-05 Thread via GitHub
discord9 commented on code in PR #15854: URL: https://github.com/apache/datafusion/pull/15854#discussion_r2074660012 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1590,6 +1590,21 @@ pub fn from_cast( schema: &DFSchemaRef, ) -> Result { let Cast { expr,

[I] adding statrs statistical functions [datafusion]

2025-05-05 Thread via GitHub
drtconway opened a new issue, #15953: URL: https://github.com/apache/datafusion/issues/15953 ### Is your feature request related to a problem or challenge? Hi Datafusion, Great software - thanks! For some of my bioinformatics applications I needed to call some statistical

Re: [PR] feat: Set/cancel with job tag and make max broadcast table size configurable [datafusion-comet]

2025-05-05 Thread via GitHub
wForget commented on code in PR #1693: URL: https://github.com/apache/datafusion-comet/pull/1693#discussion_r2074622846 ## spark/src/main/spark-3.4/org/apache/comet/shims/ShimCometBroadcastExchangeExec.scala: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [WIP] chore: Add detailed error for sum::coerce_type [datafusion]

2025-05-05 Thread via GitHub
github-actions[bot] commented on PR #14710: URL: https://github.com/apache/datafusion/pull/14710#issuecomment-2853067603 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Improve SQL syntax error messages in parser (#14437) [datafusion]

2025-05-05 Thread via GitHub
github-actions[bot] commented on PR #14986: URL: https://github.com/apache/datafusion/pull/14986#issuecomment-2853067490 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] [WIP] Attempt piping through field metadata in as many places as possible [datafusion]

2025-05-05 Thread via GitHub
github-actions[bot] commented on PR #15036: URL: https://github.com/apache/datafusion/pull/15036#issuecomment-2853067357 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Always install correct version of rust in CI [datafusion]

2025-05-05 Thread via GitHub
github-actions[bot] commented on PR #14992: URL: https://github.com/apache/datafusion/pull/14992#issuecomment-2853067390 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-05 Thread via GitHub
qstommyshu commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2074561403 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,25 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-05 Thread via GitHub
qstommyshu commented on PR #15743: URL: https://github.com/apache/datafusion/pull/15743#issuecomment-2852990157 Hi @brayanjuls, Thanks for the clever fix! For this issue, I suggest we keep our changes limited to the test file. Modifying [datafusion/sql/src/statement.rs](https://githu

Re: [I] [Experimental scans] schema adapter does not apply required schema for structs within lists [datafusion-comet]

2025-05-05 Thread via GitHub
comphead commented on issue #1681: URL: https://github.com/apache/datafusion-comet/issues/1681#issuecomment-2852855311 I went down the call stack and found the reader gives batches wrong in `OpStruct::NativeScan(scan)` ``` [core/src/execution/planner.rs:1169:17] result = [

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-05 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2852844732 > > It is really surprising that it shows slower? > > This is my only concern too. Yes, I think we should find out the reason, it is really surprising and it actually show

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-05 Thread via GitHub
jayzhan211 commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2852827157 > It is really surprising that it shows slower? This is my only concern too. -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] WIP: scalar UDFs with metadata [datafusion-python]

2025-05-05 Thread via GitHub
kylebarron commented on code in PR #1110: URL: https://github.com/apache/datafusion-python/pull/1110#discussion_r2074464751 ## python/datafusion/user_defined.py: ## @@ -77,6 +77,15 @@ def __str__(self) -> str: return self.name.lower() +class ScalarUDFExportable(Pro

Re: [I] Support integration with Parquet modular encryption [datafusion]

2025-05-05 Thread via GitHub
adamreeve commented on issue #15216: URL: https://github.com/apache/datafusion/issues/15216#issuecomment-2852780352 > I don't really understand the reason for using `Any` Actually I think I remember now that this would let us include structs in config types in `datafusion::common::con

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-05 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2852744312 > 🤖: Benchmark completed > Details It is really surprising that it shows slower? -- This is an automated message from the Apache Git Service. To respond to the message, ple

[I] TPC-DS q67 causes OOM after repeated runs [datafusion-comet]

2025-05-05 Thread via GitHub
andygrove opened a new issue, #1716: URL: https://github.com/apache/datafusion-comet/issues/1716 ### Describe the bug I am running with the following config: - SF 1000 (1TB) dataset - 32 executors - Each executor has 16 cores and 32 GB memory + 32 GB off-heap memory - D

Re: [I] Support integration with Parquet modular encryption [datafusion]

2025-05-05 Thread via GitHub
adamreeve commented on issue #15216: URL: https://github.com/apache/datafusion/issues/15216#issuecomment-2852529965 > Here is how spark does encryption configuration My understanding of how this works in Spark from reading this and looking at some of the code: * Spark requires spec

[PR] feat: Support Parquet writer options [datafusion-python]

2025-05-05 Thread via GitHub
nuno-faria opened a new pull request, #1123: URL: https://github.com/apache/datafusion-python/pull/1123 # Which issue does this PR close? N/A. # Rationale for this change Supporting all Parquet writer options allows us more flexibility when creating data dir

Re: [PR] refactor filter pushdown apis [datafusion]

2025-05-05 Thread via GitHub
berkaysynnada commented on PR #15801: URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2852390838 @adriangb I forgot to add some note: we have converted the design here, but there wasn't any test change or addition. I think we should add some tests to protect these behavior

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2852381080 🤖: Benchmark completed Details ``` Comparing HEAD and intermeidate-result-blocked-approach Benchmark clickbench_extended.json -

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-05 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2074221404 ## pom.xml: ## @@ -986,6 +986,7 @@ under the License. **/build/** **/target/** **/apache-spark/** +**/apach

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-05 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2074216845 ## datafusion/physical-optimizer/src/optimizer.rs: ## @@ -95,6 +95,10 @@ impl PhysicalOptimizer { // as that rule may inject other operations in betw

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-05 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2074213248 ## datafusion/datasource-parquet/src/source.rs: ## @@ -589,4 +559,49 @@ impl FileSource for ParquetSource { } } } + +fn try_pushdow

[I] Clean up APIs around `FileScanConfigBuilder`, `FileScanConfig` and `FileSource` [datafusion]

2025-05-05 Thread via GitHub
adriangb opened a new issue, #15952: URL: https://github.com/apache/datafusion/issues/15952 In particular, passing around of `schema` is a bit wonky, see https://github.com/apache/datafusion/pull/15769/files#r2052245527. If we force passing `file_schema` into `ParquetFileSource::new`

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-05 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2852350646 I think some tweaks will be needed based on https://github.com/apache/datafusion/pull/15769/files#r2074207291 -- This is an automated message from the Apache Git Service. To respo

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-05 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2074207291 ## datafusion/physical-optimizer/src/optimizer.rs: ## @@ -95,6 +95,10 @@ impl PhysicalOptimizer { // as that rule may inject other operations in betw

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-05 Thread via GitHub
alamb commented on code in PR #15591: URL: https://github.com/apache/datafusion/pull/15591#discussion_r2074192086 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/context_generator.rs: ## @@ -146,11 +147,14 @@ impl SessionContextGenerator { (provider, fals

Re: [I] Add Extended Clickbench benchmark for avg(duration) [datafusion]

2025-05-05 Thread via GitHub
alamb commented on issue #15949: URL: https://github.com/apache/datafusion/issues/15949#issuecomment-2852308152 BTW on my machine, the query above runs on main in ``` Elapsed 0.478 seconds. Elapsed 0.473 seconds. ``` And using https://github.com/apache/datafusion/pull/15748

Re: [I] Investigate impact of breaking changes in DataFusion since 47.0.0 release [datafusion-comet]

2025-05-05 Thread via GitHub
andygrove commented on issue #1709: URL: https://github.com/apache/datafusion-comet/issues/1709#issuecomment-2852322439 This does not seem to be an issue. I made the necessary updates in https://github.com/apache/datafusion-comet/pull/1710 -- This is an automated message from the Apache

Re: [I] Investigate impact of breaking changes in DataFusion since 47.0.0 release [datafusion-comet]

2025-05-05 Thread via GitHub
andygrove closed issue #1709: Investigate impact of breaking changes in DataFusion since 47.0.0 release URL: https://github.com/apache/datafusion-comet/issues/1709 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-05 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2074193501 ## datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt: ## @@ -221,5 +226,23 @@ physical_plan DataSourceExec: file_groups={2 groups: [[WORKSPACE_ROOT/

Re: [I] [DISCUSSION] Should `CREATE EXTERNAL TABLE's create a new file if it doesn't exist? [datafusion]

2025-05-05 Thread via GitHub
alamb commented on issue #15944: URL: https://github.com/apache/datafusion/issues/15944#issuecomment-2852313796 I think there is a usecase for DataFusion remaining "read only" and so some users will prefer to get errors if the table doesn't exist Perhaps we can create a mode / config

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-285229 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

[I] Duplicated `cargo test --doc` in CI [datafusion]

2025-05-05 Thread via GitHub
alamb opened a new issue, #15951: URL: https://github.com/apache/datafusion/issues/15951 ### Describe the bug - While reviewing https://github.com/apache/datafusion/pull/15769 from @adriangb I noticed there are two CI tests that each take 15 minutes that are doing the same thing:

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15769: URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2852276025 It looks like there are a few doc errors and doc example errors -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-05 Thread via GitHub
alamb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2074120924 ## datafusion/core/tests/sql/path_partition.rs: ## @@ -57,55 +53,6 @@ use object_store::{ use object_store::{Attributes, MultipartUpload, PutMultipartOpts, PutPaylo

[I] Remove Deprecated `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` early (before deprecation deadline) [datafusion]

2025-05-05 Thread via GitHub
alamb opened a new issue, #15950: URL: https://github.com/apache/datafusion/issues/15950 ### Is your feature request related to a problem or challenge? As explained in the [UpgradeGuide](https://datafusion.apache.org/library-user-guide/upgrading.html#parquetexec-avroexec-csvexec-jsone

Re: [I] Parquet predicate filters fail with "Invalid comparison operation: Utf8View <= Utf8" [datafusion]

2025-05-05 Thread via GitHub
alamb commented on issue #15920: URL: https://github.com/apache/datafusion/issues/15920#issuecomment-2852241948 - I think https://github.com/apache/datafusion/issues/15780 is ready to go now I left some suggestions on how to start: https://github.com/apache/datafusion/issues/15780#is

Re: [I] Unnecessary casting in stats & filter evaluation [datafusion]

2025-05-05 Thread via GitHub
alamb commented on issue #15780: URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2852240671 I think the first thing to do would be to try and write some tests that show the error happening Perhaps we could use the existing statistics: `predicate_evaluation_errors`

Re: [I] Memory leak in `datafusion-cli` [datafusion]

2025-05-05 Thread via GitHub
alamb commented on issue #15939: URL: https://github.com/apache/datafusion/issues/15939#issuecomment-2852228859 This might be due to the system allocator you are using (e.g. `jemalloc` for example). Often as a performance optimization such allocators will not return memory to the underlying

Re: [I] Implement `hf://` / "hugging face" integration in datafusion-cli [datafusion]

2025-05-05 Thread via GitHub
alamb closed issue #10720: Implement `hf://` / "hugging face" integration in datafusion-cli URL: https://github.com/apache/datafusion/issues/10720 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2852209904 > > I wrote up a ticket about how to make a benchmark: > > > > * [Add Extended Clickbench benchmark for avg(duration) #15949](https://github.com/apache/datafusion/issues/15949)

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-05 Thread via GitHub
alamb commented on code in PR #15947: URL: https://github.com/apache/datafusion/pull/15947#discussion_r2074113601 ## datafusion/spark/src/function/math/hex.rs: ## @@ -0,0 +1,404 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-05 Thread via GitHub
alamb commented on code in PR #15947: URL: https://github.com/apache/datafusion/pull/15947#discussion_r2074114006 ## datafusion/spark/src/function/math/hex.rs: ## @@ -0,0 +1,404 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [I] Support metadata columns (`location`, `size`, `last_modified`) in `ListingTableProvider` [datafusion]

2025-05-05 Thread via GitHub
alamb commented on issue #15173: URL: https://github.com/apache/datafusion/issues/15173#issuecomment-2852204806 > That would allow for injection of metadata columns in my custom implementation and for parsing out of the partition column values from the location in the ListingTableProvider (

Re: [PR] PERF : modify SMJ shuffle file reader to skip validation [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15948: URL: https://github.com/apache/datafusion/pull/15948#issuecomment-2852201364 > Benchmark test is not included because it was difficult to limit them to shuffle read scope. Did you test it locally? Do you have any performance numbers you can share?

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15906: URL: https://github.com/apache/datafusion/pull/15906#issuecomment-2852200333 🤖: Benchmark completed Details ``` Comparing HEAD and infer_filter Benchmark clickbench_extended.json ┏

Re: [I] [datafusion-spark] Implement `ceil` function [datafusion]

2025-05-05 Thread via GitHub
alamb commented on issue #15916: URL: https://github.com/apache/datafusion/issues/15916#issuecomment-2852195168 Sounds good -- thanks @shehabgamin Looks to me like @irenjj has offered to take a look at this one, so let's see what they come up with Note that @andygrove create

Re: [PR] Add xxhash algorithms in SQL and expression api [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2852189748 We recently added the datafusion-spark crate for spark compatible functions. @Spaarsh , would you be willing to port this function over to that crate and update it to be spark compatib

Re: [I] Spark-compatible CAST operation [datafusion]

2025-05-05 Thread via GitHub
alamb commented on issue #11201: URL: https://github.com/apache/datafusion/issues/11201#issuecomment-2852188264 Now that we have the datafusion-spark crate, maybe we have a home for this functionality: - https://github.com/apache/datafusion/issues/15914 -- This is an automated message

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-05 Thread via GitHub
shehabgamin commented on code in PR #15947: URL: https://github.com/apache/datafusion/pull/15947#discussion_r2074093408 ## datafusion/spark/src/function/math/hex.rs: ## @@ -0,0 +1,404 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [I] Add Extended Clickbench benchmark for [datafusion]

2025-05-05 Thread via GitHub
shruti2522 commented on issue #15949: URL: https://github.com/apache/datafusion/issues/15949#issuecomment-2852160509 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-05-05 Thread via GitHub
shruti2522 commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2852159307 > I wrote up a ticket about how to make a benchmark: > - https://github.com/apache/datafusion/issues/15949 Thanks, @alamb. I've been a bit caught up with semester exams la

Re: [PR] Consolidate feature flags into configuration guide [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #14657: URL: https://github.com/apache/datafusion/pull/14657#issuecomment-2852158533 Yeah it just needs a committer to review I think -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [I] [datafusion-spark] Implement `ceil` function [datafusion]

2025-05-05 Thread via GitHub
shehabgamin commented on issue #15916: URL: https://github.com/apache/datafusion/issues/15916#issuecomment-2852130041 > [@shehabgamin](https://github.com/shehabgamin) and [@andygrove](https://github.com/andygrove) -- here is a ticket for another spark function. I am hoping we can do one or

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2852121347 I wrote up a ticket about how to make a benchmark: - https://github.com/apache/datafusion/issues/15949 -- This is an automated message from the Apache Git Service. To respond to th

[I] Add Extended Clickbench benchmark for [datafusion]

2025-05-05 Thread via GitHub
alamb opened a new issue, #15949: URL: https://github.com/apache/datafusion/issues/15949 ### Is your feature request related to a problem or challenge? - Part of https://github.com/apache/datafusion/issues/15458 In https://github.com/apache/datafusion/pull/15748, @shruti2522 i

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2852092540 🤖: Benchmark completed Details ``` Comparing HEAD and concat_batches_for_sort Benchmark clickbench_extended.json --

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15906: URL: https://github.com/apache/datafusion/pull/15906#issuecomment-2852092666 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [I] Eliminate the function call in `xxx_or (e.g. unwrap_or("".to_string())` [datafusion]

2025-05-05 Thread via GitHub
alamb commented on issue #15802: URL: https://github.com/apache/datafusion/issues/15802#issuecomment-2852074753 @NevroHelios has a PR to fix this and add the lint: - https://github.com/apache/datafusion/pull/15841#top -- This is an automated message from the Apache Git Service. To respo

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15841: URL: https://github.com/apache/datafusion/pull/15841#issuecomment-2852073927 - FYI @Rachelint as you filed https://github.com/apache/datafusion/issues/15802 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-05-05 Thread via GitHub
alamb commented on code in PR #15841: URL: https://github.com/apache/datafusion/pull/15841#discussion_r2074030018 ## datafusion/physical-optimizer/src/enforce_distribution.rs: ## @@ -950,11 +949,7 @@ fn add_spm_on_top(input: DistributionContext) -> DistributionContext {

Re: [D] outlier, time compare or frequency analysis operators in datafusion? [datafusion]

2025-05-05 Thread via GitHub
GitHub user alamb added a comment to the discussion: outlier, time compare or frequency analysis operators in datafusion? The window functions might also be relevant: https://datafusion.apache.org/user-guide/sql/window_functions.html GitHub link: https://github.com/apache/datafusion/discussi

Re: [PR] Update extending-operators.md [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2852026880 > hey @xudong963, I think there might be something that I am missing. I had done the imports but it keeps failing again and again, could you please help out? Here is a writeup that

Re: [PR] refactor filter pushdown apis [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15801: URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2852018421 Woohoo -- it is moving! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Introduce selection vector repartitioning [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15423: URL: https://github.com/apache/datafusion/pull/15423#issuecomment-2852012982 BTW I have not had a chance to look at this PR yet, but the high level idea of pushdown / late materialization is also being explored under the name "dynamic filtering" -- it isn't qui

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15854: URL: https://github.com/apache/datafusion/pull/15854#issuecomment-2852002030 Thanks again @discord9 @vbarua and @gabotechs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2852001513 🤖: Benchmark completed Details ``` Comparing HEAD and concat_batches_for_sort Benchmark sort_tpch.json ┏━━━

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2852001621 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-05-05 Thread via GitHub
alamb merged PR #15854: URL: https://github.com/apache/datafusion/pull/15854 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15854: URL: https://github.com/apache/datafusion/pull/15854#issuecomment-2851983386 I restarted a failed CI run that looked to be due to some network error -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-05-05 Thread via GitHub
alamb commented on PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#issuecomment-2851975306 This is very nice -- thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2851972944 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2851959454 > It seems latest benchmark not triggered. I think it was due to - https://github.com/apache/datafusion/pull/15929 I will try again -- This is an automated message f

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-05 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2851956703 Weird -- looking into it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-05 Thread via GitHub
alamb commented on code in PR #15947: URL: https://github.com/apache/datafusion/pull/15947#discussion_r2073954485 ## datafusion/spark/src/function/math/hex.rs: ## @@ -0,0 +1,404 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] Feat: support bit_get function [datafusion-comet]

2025-05-05 Thread via GitHub
kazantsev-maksim commented on PR #1713: URL: https://github.com/apache/datafusion-comet/pull/1713#issuecomment-2851931381 @comphead I'd like to merge https://github.com/apache/datafusion-comet/pull/1602 first, and then update this PR again, but the reverse order is also fine. -- This is

Re: [PR] Feat: support bit_get function [datafusion-comet]

2025-05-05 Thread via GitHub
codecov-commenter commented on PR #1713: URL: https://github.com/apache/datafusion-comet/pull/1713#issuecomment-2851879847 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1713?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-05 Thread via GitHub
codecov-commenter commented on PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#issuecomment-2851882281 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1715?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] experiment: Selectively remove CoalesceBatchesExec [datafusion]

2025-05-05 Thread via GitHub
ctsk closed pull request #15479: experiment: Selectively remove CoalesceBatchesExec URL: https://github.com/apache/datafusion/pull/15479 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-05 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2851724974 > @adriangb I'll complete reviewing this after merging other open PR's. Thanks for all of the reviews @berkaysynnada. This one is now ready again. -- This is an automated me

Re: [PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-05-05 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835#discussion_r2073850382 ## src/tokenizer.rs: ## @@ -1281,20 +1262,91 @@ impl<'a> Tokenizer<'a> { return Ok(Some(Token::make_word(s.as_

Re: [PR] Add support for `DENY` statements [datafusion-sqlparser-rs]

2025-05-05 Thread via GitHub
aharpervc commented on code in PR #1836: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1836#discussion_r2073809648 ## src/parser/mod.rs: ## @@ -12987,23 +12991,34 @@ impl<'a> Parser<'a> { /// Parse a GRANT statement. pub fn parse_grant(&mut self) -> Re

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-05-05 Thread via GitHub
adriangb commented on PR #15769: URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2851630545 Okay folks this is rebased and ready for another round of review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] chore: generate changelog for 46.0.0 [datafusion-ballista]

2025-05-05 Thread via GitHub
andygrove merged PR #1259: URL: https://github.com/apache/datafusion-ballista/pull/1259 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr.

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-05-05 Thread via GitHub
NevroHelios commented on PR #15841: URL: https://github.com/apache/datafusion/pull/15841#issuecomment-2851616818 Hi @alamb I updated and pushed the code, avoiding the use of `.clone()`. Could you please rerun the CI? -- This is an automated message from the Apache Git Service. To resp

[PR] PERF : modify SMJ shuffle file reader to skip validation [datafusion]

2025-05-05 Thread via GitHub
getChan opened a new pull request, #15948: URL: https://github.com/apache/datafusion/pull/15948 ## Which issue does this PR close? ## Rationale for this change #14078 and #15454 show that skipping validation is effective when reading shuffle files. ## What change

Re: [PR] [wip] Add scripts for running benchmarks on EC2 [datafusion-comet]

2025-05-05 Thread via GitHub
andygrove commented on code in PR #1654: URL: https://github.com/apache/datafusion-comet/pull/1654#discussion_r2073781405 ## dev/benchmarks/setup.sh: ## @@ -0,0 +1,44 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] [wip] Add scripts for running benchmarks on EC2 [datafusion-comet]

2025-05-05 Thread via GitHub
andygrove commented on code in PR #1654: URL: https://github.com/apache/datafusion-comet/pull/1654#discussion_r2073780545 ## dev/benchmarks/setup.sh: ## @@ -0,0 +1,44 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] [wip] Add scripts for running benchmarks on EC2 [datafusion-comet]

2025-05-05 Thread via GitHub
andygrove commented on code in PR #1654: URL: https://github.com/apache/datafusion-comet/pull/1654#discussion_r2073775557 ## dev/benchmarks/setup.sh: ## @@ -0,0 +1,44 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-05-05 Thread via GitHub
andygrove merged PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [PR] perf: Add memory profiling [datafusion-comet]

2025-05-05 Thread via GitHub
andygrove commented on PR #1702: URL: https://github.com/apache/datafusion-comet/pull/1702#issuecomment-2851473789 Thanks for the reviews @comphead @kazuyukitanimura and @mbutrovich (offline review). I'm going to go ahead and merge this and then iterate on improving it once I have used it

Re: [I] Improve release verification instructions [datafusion]

2025-05-05 Thread via GitHub
andygrove closed issue #9848: Improve release verification instructions URL: https://github.com/apache/datafusion/issues/9848 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To
