Re: [PR] Minor: remove unused `AutoFinishBzEncoder` [datafusion]

2025-02-12 Thread via GitHub
jonahgao merged PR #14630: URL: https://github.com/apache/datafusion/pull/14630 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] test: Add experimental native scans to CometReadBenchmark [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich commented on PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150#issuecomment-2655093733 New numbers: [CometReadBenchmark-jdk11-results.txt](https://github.com/user-attachments/files/18775342/CometReadBenchmark-jdk11-results.txt)

Re: [PR] [substrait] Add support for ExtensionTable [datafusion]

2025-02-12 Thread via GitHub
ccciudatu closed pull request #13772: [substrait] Add support for ExtensionTable URL: https://github.com/apache/datafusion/pull/13772

Re: [PR] [substrait] Add support for ExtensionTable [datafusion]

2025-02-12 Thread via GitHub
ccciudatu commented on PR #13772: URL: https://github.com/apache/datafusion/pull/13772#issuecomment-2655092431 Closing this PR as the new Consumer/Producer APIs make the Substrait conversion fully customizable and the explicit support for `ExtensionTable`s can wait at least until they're fu

Re: [PR] fix: Passthrough condition in StaticInvoke case block [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove merged PR #1392: URL: https://github.com/apache/datafusion-comet/pull/1392

Re: [I] StaticInvoke class checked with elided types [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove closed issue #1391: StaticInvoke class checked with elided types URL: https://github.com/apache/datafusion-comet/issues/1391

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on PR #14532: URL: https://github.com/apache/datafusion/pull/14532#issuecomment-2655163076 > > It is because of this, I think we now only coerce to list if the flag is set > > Are you saying that the function should look something like this? > > ```rust

Re: [PR] DataFusion Ray rewrite to connect stages with Arrow Flight Streaming [datafusion-ray]

2025-02-12 Thread via GitHub
milenkovicm commented on PR #60: URL: https://github.com/apache/datafusion-ray/pull/60#issuecomment-2654229801 please do @andygrove, my comment is minor, it should not be considered as blocker in any sense.

Re: [I] Parser error with GROUP BY with multiple filters on DataFusion 45 [datafusion]

2025-02-12 Thread via GitHub
alamb commented on issue #14633: URL: https://github.com/apache/datafusion/issues/14633#issuecomment-2654631472 > > So as a workaround we could use any dialect that supports it (e.g. postgresql), gotcha. > > That sounds like it should work. From some googling it looks like the `FILTE

Re: [PR] fix: Remove cast.rs logic from parquet_support.rs for experimental native readers [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich commented on PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#issuecomment-2654633256 For context: when we were working in the experimental branch and added the `SchemaAdapter`, we were calling into cast.rs for type conversions. However, we started making chan

Re: [I] Return the "position" of rows in parquet files after performing a query. [datafusion]

2025-02-12 Thread via GitHub
alamb commented on issue #13261: URL: https://github.com/apache/datafusion/issues/13261#issuecomment-2654632787 BTW I think this can be achieved when we merge the metadata columns PR - https://github.com/apache/datafusion/pull/14057

[PR] doc: update memory tuning guide [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura opened a new pull request, #1394: URL: https://github.com/apache/datafusion-comet/pull/1394 ## Which issue does this PR close? Closes #1388 ## Rationale for this change Following up on #1369 and #1386 ## What changes are included in this PR?

Re: [PR] doc: update memory tuning guide [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura commented on PR #1394: URL: https://github.com/apache/datafusion-comet/pull/1394#issuecomment-2654647275 cc @parthchandra @andygrove

Re: [PR] fix: Reduce cast.rs logic from parquet_support.rs for experimental native readers [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura commented on code in PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#discussion_r1953416190 ## native/core/src/parquet/parquet_support.rs: ## @@ -596,7 +595,10 @@ fn cast_array( parquet_options: &SparkParquetOptions, ) -> DataFusionResu

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-12 Thread via GitHub
comphead commented on code in PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#discussion_r1953415659 ## .github/actions/setup-builder/action.yaml: ## @@ -34,6 +34,7 @@ runs: run: | apt-get update apt-get install -y protobuf-compiler +

[PR] Early exit on column normalisation [datafusion]

2025-02-12 Thread via GitHub
blaginin opened a new pull request, #14636: URL: https://github.com/apache/datafusion/pull/14636 ## Which issue does this PR close? Related to https://github.com/apache/datafusion/issues/14563 (probably more prs to come) ## Rationale for this change Now, when normali
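The rationale above is cut off, so the exact change is not visible here. As a rough illustration of what an early exit in column normalisation can look like, here is a stdlib-only Rust sketch under stated assumptions: `Column`, `Schema`, and `normalize` are hypothetical, simplified stand-ins (not DataFusion's real types or signatures), and a column that already carries a table qualifier is returned before any schema is scanned.

```rust
/// Hypothetical column reference: optional table qualifier plus a field name.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub relation: Option<String>,
    pub name: String,
}

/// Hypothetical schema: a list of (qualifier, field name) pairs.
pub struct Schema {
    pub fields: Vec<(String, String)>,
}

/// Resolves an unqualified column against `schemas`.
/// `lookups` counts inspected fields so the skipped work is observable.
pub fn normalize(col: Column, schemas: &[Schema], lookups: &mut usize) -> Result<Column, String> {
    // Early exit: an already-qualified column needs no schema search at all.
    if col.relation.is_some() {
        return Ok(col);
    }
    // Otherwise scan every field of every schema for a matching name.
    let mut matches: Vec<String> = Vec::new();
    for schema in schemas {
        for (qualifier, name) in &schema.fields {
            *lookups += 1;
            if *name == col.name {
                matches.push(qualifier.clone());
            }
        }
    }
    match matches.len() {
        1 => Ok(Column { relation: matches.pop(), name: col.name }),
        0 => Err(format!("column '{}' not found", col.name)),
        _ => Err(format!("column '{}' is ambiguous", col.name)),
    }
}
```

The `lookups` counter exists only to make the skipped scan visible; if this is indeed the kind of change the PR makes, the savings would grow with the number of already-qualified columns being normalised.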

Re: [PR] perf: Use DataFusion FilterExec for experimental native scans [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich commented on code in PR #1395: URL: https://github.com/apache/datafusion-comet/pull/1395#discussion_r1953445991 ## native/core/src/execution/planner.rs: ## @@ -992,14 +994,25 @@ impl PhysicalPlanner { let predicate = self.create_e

Re: [PR] Early exit on column normalisation [datafusion]

2025-02-12 Thread via GitHub
blaginin commented on PR #14636: URL: https://github.com/apache/datafusion/pull/14636#issuecomment-2654919627 Got +38% increase in `dataframe` benchmark ``` before after with_column_10 769.06 µs 673.32 µs with_column_100 1.4952 s 978.43 m

[I] Parser error with GROUP BY with multiple filters on DataFusion 45 [datafusion]

2025-02-12 Thread via GitHub
mildbyte opened a new issue, #14633: URL: https://github.com/apache/datafusion/issues/14633 ### Describe the bug Running this query: ``` SELECT c1, SUM(c2) FILTER (WHERE c2 >= 20) AS sum_c2, AVG(c3) FILTER (WHERE c3 <= 70) AS avg_c3 FROM test_table GROUP BY
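The `FILTER (WHERE ...)` clause in the query above restricts which rows each aggregate sees, while `GROUP BY` still forms groups from all rows. As an illustration of those semantics only (it says nothing about the parser error itself), here is a stdlib-only Rust sketch assuming `c1` is the grouping column, since the `GROUP BY` list is cut off in the snippet; `Row` and `filtered_aggregates` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// One row of a made-up `test_table(c1, c2, c3)`; NULL inputs are not modelled.
pub struct Row {
    pub c1: &'static str,
    pub c2: i64,
    pub c3: f64,
}

/// For each `c1` group, returns
/// (SUM(c2) FILTER (WHERE c2 >= 20), AVG(c3) FILTER (WHERE c3 <= 70)).
/// `None` plays the role of SQL NULL when no row in the group passes a filter.
pub fn filtered_aggregates(rows: &[Row]) -> BTreeMap<&'static str, (Option<i64>, Option<f64>)> {
    // Per group: (sum of c2, rows in sum, total of c3, rows in avg).
    let mut acc: BTreeMap<&'static str, (i64, usize, f64, usize)> = BTreeMap::new();
    for r in rows {
        // Every row creates its group, even if it fails both filters.
        let e = acc.entry(r.c1).or_insert((0, 0, 0.0, 0));
        // Each aggregate evaluates its own FILTER predicate independently.
        if r.c2 >= 20 {
            e.0 += r.c2;
            e.1 += 1;
        }
        if r.c3 <= 70.0 {
            e.2 += r.c3;
            e.3 += 1;
        }
    }
    acc.into_iter()
        .map(|(group, (sum, n_sum, total, n_avg))| {
            let sum = if n_sum > 0 { Some(sum) } else { None };
            let avg = if n_avg > 0 { Some(total / n_avg as f64) } else { None };
            (group, (sum, avg))
        })
        .collect()
}
```

Note that a group whose rows all fail both filters still appears in the output, with NULL (`None`) aggregates, which matches standard SQL behaviour for filtered aggregates.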

Re: [I] Sub-field names are not handled correctly when combining named_structs and NULL structs [datafusion]

2025-02-12 Thread via GitHub
Blizzara commented on issue #14632: URL: https://github.com/apache/datafusion/issues/14632#issuecomment-2654253000 Actually, this is related to an interplay between the substrait stuff and the analyzer. Seems like what happens is: - substrait is kind of handled "correctly", in that the pr

Re: [PR] Ballista Release Blog Announcement [datafusion-site]

2025-02-12 Thread via GitHub
andygrove merged PR #53: URL: https://github.com/apache/datafusion-site/pull/53

Re: [I] Simple Functions [datafusion]

2025-02-12 Thread via GitHub
comphead commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2654675641 Thanks @findepi and everyone, this work is epic, literally. In DataFusion it was always needed to unify the builtin functions as they were implemented by different developers in dif

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-12 Thread via GitHub
Dandandan commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2654601057 Specifically, I think we can try [this approach](https://github.com/apache/datafusion/pull/13707) together with on-demand repartition 🤔

Re: [PR] Minor: Add docs and examples for `DataFusionErrorBuilder` [datafusion]

2025-02-12 Thread via GitHub
alamb merged PR #14551: URL: https://github.com/apache/datafusion/pull/14551

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-12 Thread via GitHub
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1953482293 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,238 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-12 Thread via GitHub
slyons commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1953478390 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,238 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] perf: Use DataFusion FilterExec for experimental native scans [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich commented on PR #1395: URL: https://github.com/apache/datafusion-comet/pull/1395#issuecomment-2654977328 This is the current performance difference when running the filter benchmark. DataFusion FilterExec corresponds to `arrow_filter_record_batch` while Comet FilterExec uses `co
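Both operators compared above evaluate a predicate to a boolean mask and keep only the rows where it is true; the snippet is cut off before explaining how Comet's copy differs. A minimal stdlib-only Rust sketch of that core selection step, assuming column-major `Vec<Option<T>>` columns as a stand-in for Arrow arrays (this is not Arrow's filter kernel, and it does not model Comet's modification):

```rust
/// Keeps the values whose mask entry is `Some(true)`. A NULL (`None`) predicate
/// result drops the row, matching SQL WHERE semantics.
pub fn filter_column<T: Clone>(values: &[Option<T>], mask: &[Option<bool>]) -> Vec<Option<T>> {
    assert_eq!(values.len(), mask.len(), "mask length must match column length");
    values
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep == Some(true))
        .map(|(v, _)| v.clone())
        .collect()
}

/// Applies the same mask to every column of a column-major batch,
/// so all output columns keep the same row count.
pub fn filter_batch<T: Clone>(columns: &[Vec<Option<T>>], mask: &[Option<bool>]) -> Vec<Vec<Option<T>>> {
    columns.iter().map(|c| filter_column(c, mask)).collect()
}
```

Evaluating the mask once and reusing it for every column is what keeps the columns of the output batch aligned row-for-row.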

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-12 Thread via GitHub
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1953490590 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,238 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] Use `take_function_args` in more places [datafusion]

2025-02-12 Thread via GitHub
alamb merged PR #14525: URL: https://github.com/apache/datafusion/pull/14525

Re: [PR] Use `take_function_args` in more places [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14525: URL: https://github.com/apache/datafusion/pull/14525#issuecomment-2654980522 Thanks again @lgingerich

Re: [I] Apply `take_function_args` to functions validating argument count [datafusion]

2025-02-12 Thread via GitHub
alamb closed issue #14516: Apply `take_function_args` to functions validating argument count URL: https://github.com/apache/datafusion/issues/14516
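`take_function_args` is the DataFusion helper this issue applies to functions that validate their argument count. The pattern can be sketched without DataFusion; `take_args` below is a hypothetical, simplified stand-in that returns a `String` error rather than a DataFusion error, and relies on the standard library's `TryFrom<Vec<T>>` impl for fixed-size arrays.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts `args` into an array of exactly `N` elements,
/// or returns an error naming the function and the expected argument count.
pub fn take_args<T, const N: usize>(function_name: &str, args: Vec<T>) -> Result<[T; N], String> {
    let got = args.len();
    // `TryFrom<Vec<T>> for [T; N]` fails (returning the Vec) when the length is not N.
    <[T; N]>::try_from(args).map_err(|_| {
        format!("{} function requires {} argument(s), got {}", function_name, N, got)
    })
}
```

Returning `[T; N]` lets a caller check the count and bind the arguments by name in one step, for example `let [base, exponent]: [i32; 2] = take_args("pow", args)?;` inside a function that returns `Result<_, String>` (the names here are illustrative).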

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-12 Thread via GitHub
comphead commented on PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#issuecomment-2654568051 @kazuyukitanimura @parthchandra @andygrove appreciate if you can have another look?

Re: [PR] feat: add table source to DML proto to eliminate need for table lookup after deserialisation [datafusion]

2025-02-12 Thread via GitHub
alamb commented on code in PR #14631: URL: https://github.com/apache/datafusion/pull/14631#discussion_r1953445303 ## datafusion/core/src/datasource/memory.rs: ## @@ -648,9 +649,14 @@ mod tests { // Create a table scan logical plan to read from the source table

Re: [I] Remove `Wildcard` from `Expr` [datafusion]

2025-02-12 Thread via GitHub
rkrishn7 commented on issue #7765: URL: https://github.com/apache/datafusion/issues/7765#issuecomment-2654922651 From a user perspective, my opinion is that representing wildcard with `None` is too implicit. As @alamb mentioned, having an `expr_fn` helper makes this a bit better, but still

Re: [PR] Little changes "cache control" [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14611: URL: https://github.com/apache/datafusion/pull/14611#issuecomment-2654889040 > #14611 A cache control header is missing or empty . A cache control header is mising or empty meta[name=theme-color]' is not supported by Firefox. Button type attribute has not been

[PR] perf: Use DataFusion FilterExec for experimental native scans [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich opened a new pull request, #1395: URL: https://github.com/apache/datafusion-comet/pull/1395 ## Which issue does this PR close? Closes #. ## Rationale for this change Currently Comet has a copy of DataFusion's FilterExec that is modified to do a

Re: [PR] Early exit on column normalisation [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14636: URL: https://github.com/apache/datafusion/pull/14636#issuecomment-2654921622 FYI @Omega359

Re: [PR] fix: Reduce cast.rs logic from parquet_support.rs for experimental native readers [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich commented on code in PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#discussion_r1953466766 ## native/core/src/parquet/parquet_support.rs: ## @@ -596,7 +595,10 @@ fn cast_array( parquet_options: &SparkParquetOptions, ) -> DataFusionResult {

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-12 Thread via GitHub
alamb commented on code in PR #14595: URL: https://github.com/apache/datafusion/pull/14595#discussion_r1953460041 ## datafusion/optimizer/src/decorrelate_lateral_join.rs: ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [I] Move `ExpandWildcardRule` into Logical Plan construction [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on issue #14634: URL: https://github.com/apache/datafusion/issues/14634#issuecomment-2655172437 I don't think removing Expr::Wildcard is that trivial given there is WildcardOptions and it doesn't block moving `ExpandWildcardRule` out of analyzer

Re: [I] Unable to query file on Kubernetes on AWS EKS, for remote-sql.rs example [datafusion-ballista]

2025-02-12 Thread via GitHub
Noah-FetchRewards commented on issue #1180: URL: https://github.com/apache/datafusion-ballista/issues/1180#issuecomment-2655171404 What do you mean by the client needs to list the files? I'm running the rust code locally, so I'm assuming that's the client in question. Does the file n

Re: [PR] doc: update memory tuning guide [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura commented on PR #1394: URL: https://github.com/apache/datafusion-comet/pull/1394#issuecomment-2654913900 cc @wForget

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-12 Thread via GitHub
anlinc commented on code in PR #14553: URL: https://github.com/apache/datafusion/pull/14553#discussion_r1953535318 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -300,6 +300,17 @@ async fn aggregate_grouping_rollup() -> Result<()> { ).await } +#[tok

Re: [PR] fix: disable checking for uint_8 and uint_16 if complex type readers are enabled [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on code in PR #1376: URL: https://github.com/apache/datafusion-comet/pull/1376#discussion_r1953535325 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -39,12 +39,14 @@ class CometArrayExpressionSuite extends CometTestBase with

Re: [PR] Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-12 Thread via GitHub
wiedld commented on PR #14637: URL: https://github.com/apache/datafusion/pull/14637#issuecomment-2655071801 Asking for advice from @alamb, @mustafasrepo , or anyone else on the expected behavior. 🙏🏼

Re: [PR] chore: Remove redundant processing from exprToProtoInternal [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on PR #1351: URL: https://github.com/apache/datafusion-comet/pull/1351#issuecomment-2655018982 > Question @andygrove @parthchandra @comphead Can any of the child node be decimal calculations? Calling `exprToProtoInternal` will skip `DecimalPrecision.promote()`. I am not

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on code in PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#discussion_r1953545835 ## NOTICE.txt: ## @@ -8,3 +8,6 @@ This product includes software developed at Apache Gluten (https://github.com/apache/incubator-gluten/) Specifically: -

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on code in PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#discussion_r1953546281 ## NOTICE.txt: ## @@ -8,3 +8,6 @@ This product includes software developed at Apache Gluten (https://github.com/apache/incubator-gluten/) Specifically: -

Re: [PR] feat: override executor overhead memory only when comet unified memory manager is disabled [datafusion-comet]

2025-02-12 Thread via GitHub
wForget commented on code in PR #1379: URL: https://github.com/apache/datafusion-comet/pull/1379#discussion_r1953672696 ## spark/src/main/scala/org/apache/spark/Plugins.scala: ## @@ -62,7 +62,13 @@ class CometDriverPlugin extends DriverPlugin with Logging with ShimCometDriverPl
