Re: [PR] refactor: Move various parts of datasource out of core [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14616: URL: https://github.com/apache/datafusion/pull/14616#issuecomment-2653424551 FWIW I think we can merge this PR and then keep moving code around as follow-on PRs -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] Fix ci test [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14625: URL: https://github.com/apache/datafusion/pull/14625#issuecomment-2653416218 Thank you @xudong963 and @jayzhan211

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-12 Thread via GitHub
Blizzara commented on PR #14553: URL: https://github.com/apache/datafusion/pull/14553#issuecomment-2654013073 Thanks, seems like a clear enough bug, appreciate both the report and the PR to fix it!

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-12 Thread via GitHub
jkosh44 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1952912279 ## datafusion/expr-common/src/signature.rs: ## @@ -286,6 +258,72 @@ impl Display for ArrayFunctionSignature { } } +/// A wrapper around a vec of [`ArrayFun

Re: [PR] Feat: support array_except function [datafusion-comet]

2025-02-12 Thread via GitHub
kazantsev-maksim commented on PR #1343: URL: https://github.com/apache/datafusion-comet/pull/1343#issuecomment-2654088772 Thanks @kazuyukitanimura. I've fixed the formatting in the `native` module

Re: [PR] fix: Passthrough condition in StaticInvoke case block [datafusion-comet]

2025-02-12 Thread via GitHub
codecov-commenter commented on PR #1392: URL: https://github.com/apache/datafusion-comet/pull/1392#issuecomment-2654124657 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1392?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Refactor aggregate expression serde [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on PR #1380: URL: https://github.com/apache/datafusion-comet/pull/1380#issuecomment-2654127513 > It's hard to spot the two functional changes made in this PR because of the large amount of code moved. Can you tag the places where the changes were made? Sure, func

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-12 Thread via GitHub
jkosh44 commented on PR #14532: URL: https://github.com/apache/datafusion/pull/14532#issuecomment-2654129469 > It is because of this, I think we now only coerce to list if the flag is set Are you saying that the function should look something like this? ```Rust fn array(

Re: [PR] chore: Refactor aggregate expression serde [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on code in PR #1380: URL: https://github.com/apache/datafusion-comet/pull/1380#discussion_r1952928795 ## spark/src/main/scala/org/apache/comet/serde/aggregates.scala: ## @@ -0,0 +1,734 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

[I] Sub-field names are not handled correctly when combining named_structs and NULL structs [datafusion]

2025-02-12 Thread via GitHub
Blizzara opened a new issue, #14632: URL: https://github.com/apache/datafusion/issues/14632 ### Describe the bug When a Substrait plan contains e.g. a CASE WHEN statement, where one arm returns a named struct, and another arm returns a non-named struct, the names are handled incorre

[I] Documentation regarding running/regenerating stability test plans [datafusion-comet]

2025-02-12 Thread via GitHub
EmilyMatt opened a new issue, #1393: URL: https://github.com/apache/datafusion-comet/issues/1393 ### What is the problem the feature request solves? I think the documentation is not clear enough on this. Using a clean main branch, I tried running `./mvnw -pl spark -Dsuites="org.

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-12 Thread via GitHub
Dandandan commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2654136409 > > > I wonder why tpch_mem_sf10 is slower for some queries? Might it be possible the created memtable is not created evenly because of the new round robin (that might be fixable e

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-12 Thread via GitHub
jkosh44 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1952948300 ## datafusion/functions-nested/src/array_has.rs: ## @@ -94,7 +94,7 @@ impl Default for ArrayHas { impl ArrayHas { pub fn new() -> Self { Self { -

Re: [PR] DataFusion Ray rewrite to connect stages with Arrow Flight Streaming [datafusion-ray]

2025-02-12 Thread via GitHub
andygrove commented on PR #60: URL: https://github.com/apache/datafusion-ray/pull/60#issuecomment-2654222146 Thanks @robtandy. There has been a lot of progress and I agree that it would be good to merge this. The Kubernetes CI tests are failing, which is not surprising. Do we want to

Re: [PR] Add union_extract scalar function [datafusion]

2025-02-12 Thread via GitHub
Omega359 commented on PR #12116: URL: https://github.com/apache/datafusion/pull/12116#issuecomment-2655362249 @alamb should be, yes.

Re: [PR] Early exit on column normalisation [datafusion]

2025-02-12 Thread via GitHub
Omega359 commented on PR #14636: URL: https://github.com/apache/datafusion/pull/14636#issuecomment-2655363307 I'll check this out tomorrow. We've been chatting about our approaches on #14563

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-12 Thread via GitHub
skyzh commented on PR #14595: URL: https://github.com/apache/datafusion/pull/14595#issuecomment-2655639889 Okay, I just realized that the plan is not correct. I'll need to do more transformation to make it work :( So back to the semantics of the lateral join, ``` SELECT *

Re: [I] Move `ExpandWildcardRule` into Logical Plan construction [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on issue #14634: URL: https://github.com/apache/datafusion/issues/14634#issuecomment-2655380153 Even though `Expr::Wildcard` has been removed, we still need an equivalent concept and a way to "expand wildcard"

Re: [PR] fix: disable checking for uint_8 and uint_16 if complex type readers are enabled [datafusion-comet]

2025-02-12 Thread via GitHub
parthchandra commented on code in PR #1376: URL: https://github.com/apache/datafusion-comet/pull/1376#discussion_r1953768242 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -145,88 +145,134 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPla

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-12 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1953772428 ## datafusion/common/src/dfschema.rs: ## @@ -1028,21 +1028,41 @@ impl SchemaExt for Schema { }) } -fn logically_equivalent_names_and_ty

Re: [I] Unable to query file on Kubernetes on AWS EKS, for remote-sql.rs example [datafusion-ballista]

2025-02-12 Thread via GitHub
milenkovicm commented on issue #1180: URL: https://github.com/apache/datafusion-ballista/issues/1180#issuecomment-2655674033 The client needs access to the files during the logical planning phase in order to set up the appropriate table scan

Re: [PR] perf: Drop RowConverter from GroupOrderingPartial [datafusion]

2025-02-12 Thread via GitHub
2010YOUY01 commented on PR #14566: URL: https://github.com/apache/datafusion/pull/14566#issuecomment-2655677717 Thank you, this looks good to me. Let's get the CI fixed. https://github.com/apache/datafusion/blob/main/datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs should have good

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-12 Thread via GitHub
jkosh44 commented on PR #14532: URL: https://github.com/apache/datafusion/pull/14532#issuecomment-2655302737 > It might be because the `array_coercion` you set for the function is incorrect 🤔 The query that fails is `array.slt:1221` which is ``` query IT select array_element(arro

Re: [I] Move `ExpandWildcardRule` into Logical Plan construction [datafusion]

2025-02-12 Thread via GitHub
rkrishn7 commented on issue #14634: URL: https://github.com/apache/datafusion/issues/14634#issuecomment-2655366067 @jayzhan211 Definitely, I don't think it's blocking either. I guess I was just thinking that if `Expr::Wildcard` gets removed, then it removes the need for this work. But sound

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-12 Thread via GitHub
djanderson commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2655439836 I checked out this PR and tried to add a test to reproduce the issue. Including here although it's definitely too dependency heavy for a unit test: #[tokio::test

Re: [PR] fix: Reduce cast.rs logic from parquet_support.rs for experimental native readers [datafusion-comet]

2025-02-12 Thread via GitHub
parthchandra commented on PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#issuecomment-2655443473 I had the impression that this PR had originally reduced the number of failures in native_iceberg_compat as well but that is no longer true after the cleanup. Is that corre

[I] Feature: support cast `date` to `timestamp` with tz [datafusion]

2025-02-12 Thread via GitHub
xudong963 opened a new issue, #14638: URL: https://github.com/apache/datafusion/issues/14638 ### Is your feature request related to a problem or challenge? ``` > select to_char(arrow_cast('2023-09-04'::date, 'Timestamp(Second, Some("UTC"))'), '%Y-%m-%dT%H:%M:%S%.3f'); This featu

Re: [PR] fix: Reduce cast.rs logic from parquet_support.rs for experimental native readers [datafusion-comet]

2025-02-12 Thread via GitHub
parthchandra commented on code in PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#discussion_r1953778478 ## native/core/src/parquet/parquet_support.rs: ## @@ -618,174 +143,20 @@ fn cast_array( Dictionary(_, _) => Arc::new(casted_dictionary.cl

Re: [PR] Add test for nullable doesn't work when create memory table [datafusion]

2025-02-12 Thread via GitHub
xudong963 commented on PR #14624: URL: https://github.com/apache/datafusion/pull/14624#issuecomment-2655349713 Thanks all

Re: [I] Nullable doesn't work when create memory table [datafusion]

2025-02-12 Thread via GitHub
xudong963 closed issue #14522: Nullable doesn't work when create memory table URL: https://github.com/apache/datafusion/issues/14522

Re: [PR] Add test for nullable doesn't work when create memory table [datafusion]

2025-02-12 Thread via GitHub
xudong963 merged PR #14624: URL: https://github.com/apache/datafusion/pull/14624

Re: [PR] feat: override executor overhead memory only when comet unified memory manager is disabled [datafusion-comet]

2025-02-12 Thread via GitHub
wForget commented on code in PR #1379: URL: https://github.com/apache/datafusion-comet/pull/1379#discussion_r1953731417 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -1354,9 +1354,14 @@ object CometSparkSessionExtensions extends Logging {

[PR] Add documentation for prepare statements. [datafusion]

2025-02-12 Thread via GitHub
dhegberg opened a new pull request, #14639: URL: https://github.com/apache/datafusion/pull/14639 ## Which issue does this PR close? - Partially addresses #13570 . ## Rationale for this change Add basic documentation for SQL PREPARE and the handling in Data

[PR] DNM: test dpp support [datafusion-comet]

2025-02-12 Thread via GitHub
wForget opened a new pull request, #1396: URL: https://github.com/apache/datafusion-comet/pull/1396 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [I] External sorting not working for (maybe only for string columns??) [datafusion]

2025-02-12 Thread via GitHub
zhuqi-lucas commented on issue #12136: URL: https://github.com/apache/datafusion/issues/12136#issuecomment-2655549725 Thanks @xuchen-plus, unfortunately after changing sort_spill_reservation_bytes to 32MB it still failed; I am not sure what I am missing.

Re: [PR] Little changes "cache control" [datafusion]

2025-02-12 Thread via GitHub
Ramjee194 commented on PR #14611: URL: https://github.com/apache/datafusion/pull/14611#issuecomment-2654294377 #14611 A cache control header is missing or empty. 'meta[name=theme-color]' is not supported by Firefox. Button type attribute has

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#issuecomment-2654307818 Thanks @EmilyMatt. Would it be possible to add a test to reproduce the issue?

Re: [I] Unable to query file on Kubernetes on AWS EKS, for remote-sql.rs example [datafusion-ballista]

2025-02-12 Thread via GitHub
milenkovicm commented on issue #1180: URL: https://github.com/apache/datafusion-ballista/issues/1180#issuecomment-2654359041 > I can confirm the scheduler has the file loaded onto it because I can "sh" into the cluster and view the file with "ls". client needs to list the files and e

Re: [PR] feat: Add unbounded memory pool [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on code in PR #1386: URL: https://github.com/apache/datafusion-comet/pull/1386#discussion_r1953178365 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -504,8 +504,8 @@ object CometConf extends ShimCometConf { .doc( "The type of mem

Re: [PR] feat: Add unbounded memory pool [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on code in PR #1386: URL: https://github.com/apache/datafusion-comet/pull/1386#discussion_r1953180065 ## native/core/src/execution/jni_api.rs: ## @@ -319,6 +320,7 @@ fn parse_memory_pool_config( "greedy_global" => MemoryPoolConfig::new(MemoryPoo

Re: [PR] feat: Add unbounded memory pool [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura commented on PR #1386: URL: https://github.com/apache/datafusion-comet/pull/1386#issuecomment-2654507066 Merged, thanks @parthchandra @andygrove

Re: [PR] feat: Add unbounded memory pool [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura merged PR #1386: URL: https://github.com/apache/datafusion-comet/pull/1386

Re: [PR] feat: Add unbounded memory pool [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura commented on code in PR #1386: URL: https://github.com/apache/datafusion-comet/pull/1386#discussion_r1953181357 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -504,8 +504,8 @@ object CometConf extends ShimCometConf { .doc( "The type

Re: [I] Parser error with GROUP BY with multiple filters on DataFusion 45 [datafusion]

2025-02-12 Thread via GitHub
mildbyte commented on issue #14633: URL: https://github.com/apache/datafusion/issues/14633#issuecomment-2654526480 https://github.com/search?q=repo%3Aapache%2Fdatafusion-sqlparser-rs%20supports_filter_during_aggregation&type=code So as a workaround we could use any dialect that suppor

Re: [I] Integrate `Analyzer` within LogicalPlan building stage [datafusion]

2025-02-12 Thread via GitHub
rkrishn7 commented on issue #14618: URL: https://github.com/apache/datafusion/issues/14618#issuecomment-2654832263 +1 I'd be happy to help here. I can start by taking on moving `ExpandWildcardRule`

[I] Move `ExpandWildcardRule` into Logical Plan construction [datafusion]

2025-02-12 Thread via GitHub
rkrishn7 opened a new issue, #14634: URL: https://github.com/apache/datafusion/issues/14634 ### Is your feature request related to a problem or challenge? Related to #14618. The first step in integrating analyzer rules in the builder stage is moving `ExpandWildcardRule` out of

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura commented on code in PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#discussion_r1953403442 ## .github/actions/setup-builder/action.yaml: ## @@ -34,6 +34,7 @@ runs: run: | apt-get update apt-get install -y protobuf-co

[I] Remove `Wildcard` from `Expr` [datafusion]

2025-02-12 Thread via GitHub
findepi opened a new issue, #14635: URL: https://github.com/apache/datafusion/issues/14635 `Wildcard` is not an expression. It should not be part of the `Expr` enum; it doesn't even have a type https://github.com/apache/datafusion/blob/68e372f2952bae884590f0589f23e28fb8bc3eaf/datafusion/ex

Re: [I] Integrate `Analyzer` within LogicalPlan building stage [datafusion]

2025-02-12 Thread via GitHub
findepi commented on issue #14618: URL: https://github.com/apache/datafusion/issues/14618#issuecomment-2654864803 > The role of the Analyzer is unclear to me. Having two types of "optimization" after the plan is completed doesn’t seem necessary. Instead, we should have one optimization step

Re: [I] Remove `Wildcard` from `Expr` [datafusion]

2025-02-12 Thread via GitHub
findepi closed issue #14635: Remove `Wildcard` from `Expr` URL: https://github.com/apache/datafusion/issues/14635

Re: [PR] DataFusion Ray rewrite to connect stages with Arrow Flight Streaming [datafusion-ray]

2025-02-12 Thread via GitHub
robtandy commented on PR #60: URL: https://github.com/apache/datafusion-ray/pull/60#issuecomment-2654267535 I don't know much yet about setting up CI; if you/we can disable it, please do. My preference would be to land this and sort out the housekeeping-like tasks we need to do in order to get

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-12 Thread via GitHub
blaginin commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2654717456 I really like that idea, Bruce! I tried to break your branch, but everything seems to work 🙂 I think the issue was that on every rename, we tried to recursively normalize _ever

Re: [PR] fix: Reduce cast.rs logic from parquet_support.rs for experimental native readers [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich commented on PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#issuecomment-2654730899 Updated the test failures in the original description. `native_iceberg_compat` really does not like this change.

Re: [PR] feat: override executor overhead memory only when comet unified memory manager is disabled [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura commented on code in PR #1379: URL: https://github.com/apache/datafusion-comet/pull/1379#discussion_r1953331376 ## spark/src/main/scala/org/apache/spark/Plugins.scala: ## @@ -62,7 +62,13 @@ class CometDriverPlugin extends DriverPlugin with Logging with ShimCome

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-12 Thread via GitHub
codecov-commenter commented on PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#issuecomment-2654449945 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1390?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Parser error with GROUP BY with multiple filters on DataFusion 45 [datafusion]

2025-02-12 Thread via GitHub
jkosh44 commented on issue #14633: URL: https://github.com/apache/datafusion/issues/14633#issuecomment-2654537260 > So as a workaround we could use any dialect that supports it (e.g. postgresql), gotcha. That sounds like it should work. From some googling it looks like the `FILTER` c

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-12 Thread via GitHub
Omega359 commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2654798363 I'll be honest - I'm pretty out of my element with these changes. I don't know what is 'correct behaviour' and what isn't here. My thinking for the changes in my current branch

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-12 Thread via GitHub
djanderson commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2654795222 Is there extra work I'd need to do to get this to work with the `ParquetSink`? This isn't a full reproducer but just a quick copy of the relevant section of the `do_put_statement_

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
benrsatori commented on code in PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#discussion_r1953366365 ## tests/sqlparser_postgres.rs: ## @@ -2057,13 +2057,13 @@ fn parse_pg_custom_binary_ops() { // Here, we test the ones used by common extension

Re: [PR] feat: override executor overhead memory only when comet unified memory manager is disabled [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura commented on code in PR #1379: URL: https://github.com/apache/datafusion-comet/pull/1379#discussion_r1953334086 ## spark/src/main/scala/org/apache/spark/Plugins.scala: ## @@ -62,7 +62,13 @@ class CometDriverPlugin extends DriverPlugin with Logging with ShimCome

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
benrsatori commented on code in PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#discussion_r1953345423 ## src/ast/operator.rs: ## @@ -121,11 +136,11 @@ pub enum BinaryOperator { MyIntegerDivide, /// Support for custom operators (such as Post

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
benrsatori commented on code in PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#discussion_r1953345106 ## src/ast/operator.rs: ## @@ -53,6 +53,16 @@ pub enum UnaryOperator { PGAbs, /// Unary logical not operator: e.g. `! false` (Hive-specifi

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-12 Thread via GitHub
anlinc commented on code in PR #14553: URL: https://github.com/apache/datafusion/pull/14553#discussion_r1953345644 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1090,11 +1090,31 @@ impl LogicalPlanBuilder { group_expr: impl IntoIterator>, aggr_expr: im

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
benrsatori commented on code in PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#discussion_r1953344303 ## src/ast/operator.rs: ## @@ -77,13 +92,13 @@ impl fmt::Display for UnaryOperator { #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]

Re: [PR] feat: override executor overhead memory only when comet unified memory manager is disabled [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura commented on code in PR #1379: URL: https://github.com/apache/datafusion-comet/pull/1379#discussion_r195448 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -1354,9 +1354,14 @@ object CometSparkSessionExtensions extends Loggi

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
benrsatori commented on code in PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#discussion_r1953347665 ## src/ast/operator.rs: ## @@ -253,6 +268,40 @@ pub enum BinaryOperator { /// Specifies a test for an overlap between two datetime periods:

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
benrsatori commented on code in PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#discussion_r1953348633 ## src/parser/mod.rs: ## @@ -1411,6 +1420,33 @@ impl<'a> Parser<'a> { ), }) } +tok @

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
benrsatori commented on code in PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#discussion_r1953349274 ## src/test_utils.rs: ## @@ -284,6 +284,17 @@ where dialects } +pub fn only_psql_redshift() -> TestedDialects { +TestedDialects { +

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
benrsatori commented on code in PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#discussion_r1953348166 ## src/parser/mod.rs: ## @@ -1350,7 +1351,15 @@ impl<'a> Parser<'a> { Ok(Some(expr)) => Ok(expr), // No

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-12 Thread via GitHub
anlinc commented on code in PR #14553: URL: https://github.com/apache/datafusion/pull/14553#discussion_r1953347407 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1090,11 +1090,31 @@ impl LogicalPlanBuilder { group_expr: impl IntoIterator>, aggr_expr: im

Re: [PR] Add support for PostgreSQL/Redshift geometric operators [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
benrsatori commented on code in PR #1723: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1723#discussion_r1953345692 ## src/ast/operator.rs: ## @@ -253,6 +268,40 @@ pub enum BinaryOperator { /// Specifies a test for an overlap between two datetime periods:

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2654986803 > Is there extra work I'd need to do to get this to work with the `ParquetSink`? This isn't a full reproducer but just a quick copy of the relevant section of the `do_put_statement_ing

Re: [I] Implement nested join optimization [datafusion]

2025-02-12 Thread via GitHub
alamb commented on issue #3843: URL: https://github.com/apache/datafusion/issues/3843#issuecomment-2654991464 Thanks @clflushopt I don't have a great handle in my head on the current state of Boundary and Selectivity analysis. Maybe your first PRs could focus on adding some docs and

Re: [I] Move `ExpandWildcardRule` into Logical Plan construction [datafusion]

2025-02-12 Thread via GitHub
rkrishn7 commented on issue #14634: URL: https://github.com/apache/datafusion/issues/14634#issuecomment-2654992676 Waiting until we decide on a path forward in https://github.com/apache/datafusion/issues/7765

Re: [PR] chore: Remove redundant processing from exprToProtoInternal [datafusion-comet]

2025-02-12 Thread via GitHub
kazuyukitanimura commented on PR #1351: URL: https://github.com/apache/datafusion-comet/pull/1351#issuecomment-2654992219 Question @andygrove @parthchandra @comphead Can any of the child nodes be decimal calculations? Calling `exprToProtoInternal` will skip `DecimalPrecision.promote()`.

Re: [PR] [comet-parquet-exec] Add Native Scan to CometReadBenchmark [datafusion-comet]

2025-02-12 Thread via GitHub
mbutrovich commented on PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150#issuecomment-2654998780 I'll see about reviving it tomorrow.

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-12 Thread via GitHub
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1953482293 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,238 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: Support `Utf8View` for `get_wider_type` + `binary_to_string_coercion` functions [datafusion]

2025-02-12 Thread via GitHub
alamb closed pull request #13370: feat: Support `Utf8View` for `get_wider_type` + `binary_to_string_coercion` functions URL: https://github.com/apache/datafusion/pull/13370

Re: [I] Logically repartition files by row splits [datafusion]

2025-02-12 Thread via GitHub
alamb commented on issue #14607: URL: https://github.com/apache/datafusion/issues/14607#issuecomment-2655004320 > Seems like the best way would be to configure FileGroupPartitioner through FileSource. The other option would be to make FileRange an enum, but that would still mean we (and any

Re: [PR] Add union_extract scalar function [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #12116: URL: https://github.com/apache/datafusion/pull/12116#issuecomment-2655006559 What is the status of this PR? Is it ready for a look?

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #14567: URL: https://github.com/apache/datafusion/pull/14567#issuecomment-2655007520 This is on my list of PRs to review tomorrow

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-12 Thread via GitHub
anlinc commented on code in PR #14553: URL: https://github.com/apache/datafusion/pull/14553#discussion_r1953514409 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1090,11 +1090,31 @@ impl LogicalPlanBuilder { group_expr: impl IntoIterator<Item = impl Into<Expr>>, aggr_expr: im

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-12 Thread via GitHub
anlinc commented on code in PR #14553: URL: https://github.com/apache/datafusion/pull/14553#discussion_r1953514141 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1090,11 +1090,31 @@ impl LogicalPlanBuilder { group_expr: impl IntoIterator<Item = impl Into<Expr>>, aggr_expr: im

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-12 Thread via GitHub
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1953490590 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,238 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[PR] chore(deps): bump clap from 4.5.28 to 4.5.29 [datafusion]

2025-02-12 Thread via GitHub
dependabot[bot] opened a new pull request, #14619: URL: https://github.com/apache/datafusion/pull/14619 Bumps [clap](https://github.com/clap-rs/clap) from 4.5.28 to 4.5.29. Release notes Sourced from [clap's releases](https://github.com/clap-rs/clap/releases). v4.5.29 [4.5.

[PR] chore(deps): bump prost from 0.13.4 to 0.13.5 [datafusion]

2025-02-12 Thread via GitHub
dependabot[bot] opened a new pull request, #14621: URL: https://github.com/apache/datafusion/pull/14621 Bumps [prost](https://github.com/tokio-rs/prost) from 0.13.4 to 0.13.5. Changelog Sourced from [prost's changelog](https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md).

[PR] chore(deps): bump bzip2 from 0.5.0 to 0.5.1 [datafusion]

2025-02-12 Thread via GitHub
dependabot[bot] opened a new pull request, #14620: URL: https://github.com/apache/datafusion/pull/14620 Bumps [bzip2](https://github.com/trifectatechfoundation/bzip2-rs) from 0.5.0 to 0.5.1. Commits https://github.com/trifectatechfoundation/bzip2-rs/commit/dbbc3b46809ed15d6dc40

[PR] chore(deps): bump prost-build from 0.13.4 to 0.13.5 [datafusion]

2025-02-12 Thread via GitHub
dependabot[bot] opened a new pull request, #14623: URL: https://github.com/apache/datafusion/pull/14623 Bumps [prost-build](https://github.com/tokio-rs/prost) from 0.13.4 to 0.13.5. Changelog Sourced from https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md prost-build's ch

Re: [PR] feat: Support `Utf8View` for `get_wider_type` + `binary_to_string_coercion` functions [datafusion]

2025-02-12 Thread via GitHub
alamb commented on PR #13370: URL: https://github.com/apache/datafusion/pull/13370#issuecomment-2655005676 I think we have merged all relevant parts of this PR so closing this one

[PR] Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-12 Thread via GitHub
wiedld opened a new pull request, #14637: URL: https://github.com/apache/datafusion/pull/14637 ## Which issue does this PR close? I'll make a new issue, once we confirm this reproducer is a bug. ## Rationale for this change We have physical plans which are failing in the

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-12 Thread via GitHub
djanderson commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2655024463 > What is the panic error message? Panic error message is just > error=status: DataLoss, message: "data write failed: IO error: Error joining spawned task: task 93

Re: [PR] perf: Use DataFusion FilterExec for experimental native scans [datafusion-comet]

2025-02-12 Thread via GitHub
codecov-commenter commented on PR #1395: URL: https://github.com/apache/datafusion-comet/pull/1395#issuecomment-2655116606 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1395?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add support for MS Varbinary(MAX) (#1714) [datafusion-sqlparser-rs]

2025-02-12 Thread via GitHub
TylerBrinks commented on PR #1715: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1715#issuecomment-2655131625 @alamb Anything else I need to address? I'm learning the standard Rust way of doing things, hoping I got it right based on the PR checks in place.

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-12 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1953648058 ## datafusion/expr-common/src/signature.rs: ## @@ -365,7 +365,12 @@ impl TypeSignature { } } -/// get all possible types for the given `Type

Re: [PR] fix: disable checking for uint_8 and uint_16 if complex type readers are enabled [datafusion-comet]

2025-02-12 Thread via GitHub
parthchandra commented on code in PR #1376: URL: https://github.com/apache/datafusion-comet/pull/1376#discussion_r1953648631 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -39,12 +39,14 @@ class CometArrayExpressionSuite extends CometTestBase wit

Re: [PR] Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-12 Thread via GitHub
alamb commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1953591399 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -2280,3 +2285,62 @@ async fn test_not_replaced_with_partial_sort_for_unbounded_input() -> Resu

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-12 Thread via GitHub
slyons commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1953478390 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,238 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] perf: Use DataFusion FilterExec for experimental native scans [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on code in PR #1395: URL: https://github.com/apache/datafusion-comet/pull/1395#discussion_r1953532505 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2722,7 +2721,16 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [PR] fix: disable checking for uint_8 and uint_16 if complex type readers are enabled [datafusion-comet]

2025-02-12 Thread via GitHub
andygrove commented on code in PR #1376: URL: https://github.com/apache/datafusion-comet/pull/1376#discussion_r1953540948 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -145,88 +145,134 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHe
