Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-04-30 Thread via GitHub
NevroHelios commented on code in PR #15841: URL: https://github.com/apache/datafusion/pull/15841#discussion_r2069897470 ## benchmarks/src/util/options.rs: ## @@ -70,14 +71,12 @@ impl CommonOpt { } /// Modify the existing config appropriately -pub fn update_config
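For readers skimming the thread: the refactor relies on the fact that `Option::unwrap_or` evaluates its argument eagerly, while `Option::unwrap_or_else` calls its closure only when the value is `None`. Below is a minimal Rust sketch of the difference. `expensive_default` is a hypothetical stand-in for a costly default and is not code from `benchmarks/src/util/options.rs`.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a costly default value.
/// It counts how many times it has been called.
fn expensive_default(calls: &Cell<u32>) -> String {
    calls.set(calls.get() + 1);
    "default".to_string()
}

/// `unwrap_or` evaluates its argument before the call,
/// so the default is built even when `opt` is `Some`.
fn eager(opt: Option<String>, calls: &Cell<u32>) -> String {
    opt.unwrap_or(expensive_default(calls))
}

/// `unwrap_or_else` takes a closure that runs only on `None`.
fn lazy(opt: Option<String>, calls: &Cell<u32>) -> String {
    opt.unwrap_or_else(|| expensive_default(calls))
}
```

The lazy form only pays off when building the default has a cost or side effects. For a cheap constant, `unwrap_or` is fine.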

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15906: URL: https://github.com/apache/datafusion/pull/15906#discussion_r2069866919 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1382,6 +1386,73 @@ fn contain(e: &Expr, check_map: &HashMap) -> bool { is_contain } +/// Infers

Re: [PR] chore(deps): bump tokio from 1.44.1 to 1.44.2 [datafusion]

2025-04-30 Thread via GitHub
xudong963 merged PR #15900: URL: https://github.com/apache/datafusion/pull/15900 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Improve push down limit (logical optimizer rule) [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15744: URL: https://github.com/apache/datafusion/pull/15744#discussion_r2069865253 ## datafusion/core/tests/user_defined/user_defined_plan.rs: ## @@ -102,362 +96,10 @@ use datafusion_physical_plan::execution_plan::{Boundedness, EmissionType};

Re: [PR] Improve push down limit (logical optimizer rule) [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15744: URL: https://github.com/apache/datafusion/pull/15744#discussion_r2069864919 ## datafusion/core/tests/user_defined/user_defined_plan.rs: ## @@ -102,362 +96,10 @@ use datafusion_physical_plan::execution_plan::{Boundedness, EmissionType};

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-04-30 Thread via GitHub
clflushopt commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2844075397 I agree with @alamb on this one, regarding the separation of creation & storing the files on disk explicitly. One suggestion I would propose is that I would add a scalar func

Re: [PR] fix: Add coercion rules for Float16 types [datafusion]

2025-04-30 Thread via GitHub
etseidl commented on PR #15816: URL: https://github.com/apache/datafusion/pull/15816#issuecomment-2843937271 > Would you be willing to add sqllogic tests for float16? Willing? Yes. Able to? We'll see 😅. I'll be back in office early next week and will give it a try. Thanks for the exa

Re: [PR] Reduce size of `Expr` struct [datafusion]

2025-04-30 Thread via GitHub
github-actions[bot] closed pull request #14366: Reduce size of `Expr` struct URL: https://github.com/apache/datafusion/pull/14366

Re: [PR] feat: ORDER BY ALL [datafusion]

2025-04-30 Thread via GitHub
alamb commented on code in PR #15772: URL: https://github.com/apache/datafusion/pull/15772#discussion_r2069775315 ## datafusion/expr/src/expr.rs: ## @@ -701,6 +701,24 @@ impl TryCast { } } +/// OrderBy Expressions +pub enum OrderByExprs { +OrderByExprVec(Vec), +A
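As background for the `OrderByExprs` discussion, here is a minimal Rust sketch of how an `ORDER BY ALL` clause might be resolved. It assumes DuckDB-style semantics, in which the clause sorts by every select-list column from left to right. `SortSpec`, `AllColumns` and `resolve_sort_keys` are hypothetical names. They are not the enum proposed in the PR, whose definition is truncated above.

```rust
/// Hypothetical sort specification; NOT the PR's `OrderByExprs` enum.
#[derive(Debug, Clone, PartialEq)]
enum SortSpec {
    /// Explicit `ORDER BY e1, e2, ...`
    Exprs(Vec<String>),
    /// `ORDER BY ALL`
    AllColumns,
}

/// Resolve a sort spec against the query's select list.
/// Assumes DuckDB-style semantics: `ORDER BY ALL` sorts by every
/// select-list column, left to right.
fn resolve_sort_keys(spec: &SortSpec, select_list: &[&str]) -> Vec<String> {
    match spec {
        SortSpec::Exprs(exprs) => exprs.clone(),
        SortSpec::AllColumns => select_list.iter().map(|c| c.to_string()).collect(),
    }
}
```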

Re: [PR] add config parse_hex_as_fixed_size_binary [datafusion]

2025-04-30 Thread via GitHub
alamb closed pull request #15687: add config parse_hex_as_fixed_size_binary URL: https://github.com/apache/datafusion/pull/15687

Re: [PR] add config parse_hex_as_fixed_size_binary [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15687: URL: https://github.com/apache/datafusion/pull/15687#issuecomment-2843932354 I believe we went with an alternate approach in - https://github.com/apache/datafusion/pull/15726

Re: [PR] fix: Add coercion rules for Float16 types [datafusion]

2025-04-30 Thread via GitHub
alamb commented on code in PR #15816: URL: https://github.com/apache/datafusion/pull/15816#discussion_r2069770608 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -931,6 +931,7 @@ fn coerce_numeric_type_to_decimal(numeric_type: &DataType) -> Option { Int32

Re: [PR] hash join: add build-side join keys to memory accounting [datafusion]

2025-04-30 Thread via GitHub
github-actions[bot] closed pull request #14222: hash join: add build-side join keys to memory accounting URL: https://github.com/apache/datafusion/pull/14222

Re: [PR] start refactoring process by setting up base + init [datafusion]

2025-04-30 Thread via GitHub
github-actions[bot] closed pull request #14306: start refactoring process by setting up base + init URL: https://github.com/apache/datafusion/pull/14306

Re: [PR] Support marking columns as system columns via Field's metadata [datafusion]

2025-04-30 Thread via GitHub
github-actions[bot] closed pull request #14362: Support marking columns as system columns via Field's metadata URL: https://github.com/apache/datafusion/pull/14362

Re: [PR] Add xxhash algorithms in SQL and expression api [datafusion]

2025-04-30 Thread via GitHub
github-actions[bot] commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2843925031 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] disable coercison for unmatched struct type [datafusion]

2025-04-30 Thread via GitHub
github-actions[bot] commented on PR #14409: URL: https://github.com/apache/datafusion/pull/14409#issuecomment-2843925012 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] feat: decode() expression when using 'utf-8' encoding [datafusion-comet]

2025-04-30 Thread via GitHub
codecov-commenter commented on PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#issuecomment-2843924454 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1697?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Improve push down limit (logical optimizer rule) [datafusion]

2025-04-30 Thread via GitHub
alamb commented on code in PR #15744: URL: https://github.com/apache/datafusion/pull/15744#discussion_r2069768717 ## datafusion/core/tests/user_defined/user_defined_plan.rs: ## @@ -102,362 +96,10 @@ use datafusion_physical_plan::execution_plan::{Boundedness, EmissionType}; use

Re: [PR] Improve sqllogictest error reporting [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15905: URL: https://github.com/apache/datafusion/pull/15905#issuecomment-2843921122 > > I would prefer it to be limited to say the first 10. Otherwise this looks good. > > 👍 no strong opinion here, I imagine that if more than 10 tests fail, it means that someth

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
alamb commented on code in PR #15906: URL: https://github.com/apache/datafusion/pull/15906#discussion_r2069766950 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1382,6 +1386,73 @@ fn contain(e: &Expr, check_map: &HashMap) -> bool { is_contain } +/// Infers new

Re: [PR] Add `union_tag` scalar function [datafusion]

2025-04-30 Thread via GitHub
alamb merged PR #14687: URL: https://github.com/apache/datafusion/pull/14687

Re: [I] Add `union_tag` function [datafusion]

2025-04-30 Thread via GitHub
alamb closed issue #11080: Add `union_tag` function URL: https://github.com/apache/datafusion/issues/11080

Re: [PR] Add `union_tag` scalar function [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #14687: URL: https://github.com/apache/datafusion/pull/14687#issuecomment-2843915892 Thank you @Omega359 and @gstvg for your patience

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
qstommyshu commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2069761180 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -2001,24 +2055,16 @@ mod tests { .filter(col("sum(test.c)").gt(lit(10i64)))? .b

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
qstommyshu commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2069761320 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -3039,21 +2974,14 @@ Projection: a, b .filter(and(col("b").gt(lit(10i64)), col("d").gt(lit(1

Re: [PR] fix: correctly specify the nullability of `map_values` return type [datafusion]

2025-04-30 Thread via GitHub
alamb merged PR #15901: URL: https://github.com/apache/datafusion/pull/15901

Re: [I] [Experimental scans] schema adapter does not apply required schema for structs within lists [datafusion-comet]

2025-04-30 Thread via GitHub
comphead commented on issue #1681: URL: https://github.com/apache/datafusion-comet/issues/1681#issuecomment-2843894989 Narrowing it down, the correctness issue comes from `ListExtract`

Re: [PR] fix: correctly specify the nullability of `map_values` return type [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15901: URL: https://github.com/apache/datafusion/pull/15901#issuecomment-2843888911 Thanks again @rluvaton

Re: [I] Sorting is not maintained after using a window function [datafusion]

2025-04-30 Thread via GitHub
akurmustafa commented on issue #15833: URL: https://github.com/apache/datafusion/issues/15833#issuecomment-2843844600 I see, one alternative workaround would be adding `ORDER BY userPrimaryKey ` to your window function `row_number`. I am not sure how to do it with `row_number` but window fu

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15794: URL: https://github.com/apache/datafusion/pull/15794#issuecomment-2843825721 I merged up and assuming the CI passes I'll merge this PR in

Re: [PR] Fix allow_update_branch [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15904: URL: https://github.com/apache/datafusion/pull/15904#issuecomment-2843824982 [screenshot: https://github.com/user-attachments/assets/4245ba25-0c4d-4a98-a20d-0dfe9d843388] It is now available!

Re: [PR] Fix allow_update_branch [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15904: URL: https://github.com/apache/datafusion/pull/15904#issuecomment-2843820293 Thanks for following up @xudong963

Re: [PR] Fix allow_update_branch [datafusion]

2025-04-30 Thread via GitHub
alamb merged PR #15904: URL: https://github.com/apache/datafusion/pull/15904

Re: [PR] chore: Improve reporting of fallback reasons for CollectLimit [datafusion-comet]

2025-04-30 Thread via GitHub
kazuyukitanimura commented on code in PR #1694: URL: https://github.com/apache/datafusion-comet/pull/1694#discussion_r2069667304 ## spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala: ## @@ -196,18 +198,34 @@ case class CometExecRule(session: SparkSession) extends

[I] Add memory profiling / logging [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove opened a new issue, #1701: URL: https://github.com/apache/datafusion-comet/issues/1701 ### What is the problem the feature request solves? I would like to add a config that enables memory profiling so that we can monitor JVM and native memory usage throughout the lifetime of

Re: [I] Add memory profiling / logging [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on issue #1701: URL: https://github.com/apache/datafusion-comet/issues/1701#issuecomment-2843656743 I plan on working on this

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on PR #1602: URL: https://github.com/apache/datafusion-comet/pull/1602#issuecomment-2843603389 > DataFusion's bit_count has same behavior with Spark's bit_count function Spark If this is the case, can we delegate to a `ScalarFunc` expression instead of creating
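For reference, here is a minimal Rust sketch of the semantics under discussion. It assumes, as we understand Spark's `BitwiseCount`, that integral inputs are sign-extended to 64 bits before counting, so `bit_count(-1)` is 64 for every integer width. The function names are hypothetical. This is not the code in Comet's `bitwise_count.rs`, and it does not settle whether DataFusion's function matches.

```rust
/// Count set bits after sign-extending to i64 (assumed Spark behaviour).
fn bit_count_i8(v: i8) -> u32 {
    // `as i64` sign-extends, so -1i8 becomes 64 one-bits.
    (v as i64).count_ones()
}

fn bit_count_i32(v: i32) -> u32 {
    (v as i64).count_ones()
}

fn bit_count_i64(v: i64) -> u32 {
    v.count_ones()
}

/// SQL NULL in, NULL out.
fn bit_count_nullable(v: Option<i64>) -> Option<u32> {
    v.map(bit_count_i64)
}
```

Under this assumption, a naive per-width count such as `(v as u8).count_ones()` would disagree for negative inputs (8 instead of 64 for `-1i8`). That is the kind of edge case worth checking before delegating to another implementation.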

[PR] Fix: parsing ident starting with underscore in certain dialects [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
MohamedAbdeen21 opened a new pull request, #1835: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1835 The dialects that support underscore as a separator in numeric literals used to parse `._123` as a number, which is fine. But that means that something like `._abc` would be pa

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-04-30 Thread via GitHub
alamb commented on code in PR #15841: URL: https://github.com/apache/datafusion/pull/15841#discussion_r2069576418 ## benchmarks/src/util/options.rs: ## @@ -70,14 +71,12 @@ impl CommonOpt { } /// Modify the existing config appropriately -pub fn update_config(&self

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-04-30 Thread via GitHub
NevroHelios commented on PR #15841: URL: https://github.com/apache/datafusion/pull/15841#issuecomment-2843457081 Hi @alamb, apologies for the delay, I had exams recently. I've run the tests locally and everything looks good now. When you get a chance, could you please re-trigger the CI? Tha

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-04-30 Thread via GitHub
kazuyukitanimura commented on code in PR #1602: URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2069467729 ## native/spark-expr/src/bitwise_funcs/bitwise_count.rs: ## @@ -0,0 +1,177 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

Re: [PR] Allow stored procedures to be defined without `BEGIN`/`END` [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
aharpervc commented on code in PR #1834: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1834#discussion_r2069458988 ## src/parser/mod.rs: ## @@ -15095,14 +15099,28 @@ impl<'a> Parser<'a> { let name = self.parse_object_name(false)?; let params = se

Re: [PR] Support some of pipe operators [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
simonvandel commented on PR #1759: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1759#issuecomment-2843224034 > @simonvandel could you take a look at the CI issues when you get the time? I think CI should be green with https://github.com/apache/datafusion-sqlparser-rs/p

Re: [PR] feat: decode() expression when using 'utf-8' encoding [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#issuecomment-2843202728 I need to do a shim for `StringDecode` in Spark 4.0.

Re: [PR] Add hooks to `SchemaAdapter` to add custom column generators [datafusion]

2025-04-30 Thread via GitHub
adriangb commented on PR #15261: URL: https://github.com/apache/datafusion/pull/15261#issuecomment-2843120846 Looking at how filter pushdown interacts with partition columns I think this will be a huge improvement for that. Currently the partition values get bound when the `FileStream` is

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove merged PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699

Re: [I] bug: regexp_match not working? [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on issue #15872: URL: https://github.com/apache/datafusion/issues/15872#issuecomment-2843029954 @juju4 specifically for your use case I think changing the function to regexp_like will work: ```sql > select * from test where regexp_like(test.b, '(.){4,}'); +---+---

Re: [I] bug: regexp_match not working? [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on issue #15872: URL: https://github.com/apache/datafusion/issues/15872#issuecomment-2843022789 Note that this syntax is supported in DuckDB: ```sql D create table test (a int, b varchar); D insert into test values (1, 'one'); D insert into test values (2,

Re: [I] bug: regexp_match not working? [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on issue #15872: URL: https://github.com/apache/datafusion/issues/15872#issuecomment-2843008738 Can someone update the title of this issue to reflect the true nature of the enhancement? I don't think this has anything specifically to do with regexp_match except that was h

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-04-30 Thread via GitHub
codecov-commenter commented on PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#issuecomment-2842878963 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1700?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Sorting is not maintained after using a window function [datafusion]

2025-04-30 Thread via GitHub
daphnenhuch-at commented on issue #15833: URL: https://github.com/apache/datafusion/issues/15833#issuecomment-2842967139 That still doesn't solve my problem here. I added the extra sort on both columns as suggested. Now it seems the userPrimaryKey is sorted and the fileRowNumbers start at 1

[I] RFC: Add features to reduce dependencies on core crate [datafusion]

2025-04-30 Thread via GitHub
timsaucer opened a new issue, #15907: URL: https://github.com/apache/datafusion/issues/15907 ### Is your feature request related to a problem or challenge? Currently when you add the `datafusion` crate, it pulls in many dependencies that are not needed for all use cases. We have two s

Re: [I] Avro reader fails when query columns are reordered in SELECT statement [datafusion]

2025-04-30 Thread via GitHub
comphead closed issue #15839: Avro reader fails when query columns are reordered in SELECT statement URL: https://github.com/apache/datafusion/issues/15839

Re: [I] Support file row index / row id for each file in a `ListingTableProvider` [datafusion]

2025-04-30 Thread via GitHub
daphnenhuch-at commented on issue #15892: URL: https://github.com/apache/datafusion/issues/15892#issuecomment-2842930676 By the way, this is the exact bug I was referencing here: https://github.com/apache/datafusion/issues/15833 I don't actually need to maintain the row number for eac

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2842927926 My list includes: - #13527 (finally finish this) - #14837 (adopt my async UDF's to use this and validate) - #15394 - #8282 (Specifically allowing changing default tz

Re: [PR] feat: Set/cancel with job tag and make max broadcast table size configurable [datafusion-comet]

2025-04-30 Thread via GitHub
parthchandra commented on code in PR #1693: URL: https://github.com/apache/datafusion-comet/pull/1693#discussion_r2069227147 ## spark/src/main/spark-3.4/org/apache/comet/shims/ShimCometBroadcastExchangeExec.scala: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on PR #15906: URL: https://github.com/apache/datafusion/pull/15906#issuecomment-2842890086 Perhaps running ClickBench or equivalent (assuming ClickBench wouldn't trigger this optimization) to showcase the difference would be good?

Re: [PR] fix(avro): Respect projection order in Avro reader [datafusion]

2025-04-30 Thread via GitHub
comphead merged PR #15840: URL: https://github.com/apache/datafusion/pull/15840

Re: [PR] fix: correctly specify the nullability of `map_values` return type [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15901: URL: https://github.com/apache/datafusion/pull/15901#issuecomment-284283 > Thanks, there is another failure in CI which is unrelated I re-started the failed CI check

Re: [PR] [wip] Add scripts for running benchmarks on EC2 [datafusion-comet]

2025-04-30 Thread via GitHub
anuragmantri commented on code in PR #1654: URL: https://github.com/apache/datafusion-comet/pull/1654#discussion_r2069180747 ## dev/benchmarks/setup.sh: ## @@ -0,0 +1,44 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor licen

Re: [PR] [wip] Add scripts for running benchmarks on EC2 [datafusion-comet]

2025-04-30 Thread via GitHub
anuragmantri commented on code in PR #1654: URL: https://github.com/apache/datafusion-comet/pull/1654#discussion_r2069177697 ## dev/benchmarks/setup.sh: ## @@ -0,0 +1,44 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor licen

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
qstommyshu commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2069132932 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -3039,21 +2974,14 @@ Projection: a, b .filter(and(col("b").gt(lit(10i64)), col("d").gt(lit(1

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#issuecomment-2842678170 @parthchandra @huaxingao Hopefully CI will pass this time - could I get a review?

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-04-30 Thread via GitHub
vbarua commented on code in PR #15854: URL: https://github.com/apache/datafusion/pull/15854#discussion_r2069115514 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1590,6 +1590,21 @@ pub fn from_cast( schema: &DFSchemaRef, ) -> Result { let Cast { expr, da

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#issuecomment-2842651369 I should probably add some better tests, since now that we're checking the flag that means we won't get coverage in the Spark SQL tests -- we'll just fall back.

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-30 Thread via GitHub
vbarua commented on PR #15794: URL: https://github.com/apache/datafusion/pull/15794#issuecomment-2842642259 @alamb this PR should be ready for review I've checked that this PR consists purely of code moves with no functional changes, at this point in time, aside from `from_substrait_t

Re: [PR] Improve sqllogictest error reporting [datafusion]

2025-04-30 Thread via GitHub
gabotechs commented on PR #15905: URL: https://github.com/apache/datafusion/pull/15905#issuecomment-2842631495 > I would prefer it to be limited to say the first 10. Otherwise this looks good. 👍 no strong opinion here, I imagine that if more than 10 tests fail, it means that somethin

[I] Can't parse valid Snowflake compound expression [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
ramnes opened a new issue, #1833: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1833 Hey there, thanks for the great project! I'm encountering an issue with the following query, which is valid in Snowflake: ```sql SELECT v.$2 FROM (VALUES (1, 'value1'), (2, '

Re: [PR] Improve sqllogictest error reporting [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on PR #15905: URL: https://github.com/apache/datafusion/pull/15905#issuecomment-2842522088 ``` Runs all the sqllogictests in a single file even if some of them fail, reporting all failures, instead of just the first one. ``` I would prefer it to be limited to say
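Here is a minimal Rust sketch of the capped report suggested here. Every failure is still collected, but only the first `limit` messages are printed, followed by a count of the rest. `summarize_failures` is a hypothetical helper, not the sqllogictest runner's API.

```rust
/// Hypothetical helper: join the first `limit` failure messages and,
/// if any were dropped, append a line saying how many.
fn summarize_failures(failures: &[String], limit: usize) -> String {
    let mut out: Vec<String> = failures.iter().take(limit).cloned().collect();
    if failures.len() > limit {
        out.push(format!(
            "... and {} more failure(s) omitted",
            failures.len() - limit
        ));
    }
    out.join("\n")
}
```

Keeping the full list and truncating only at print time means the total failure count stays accurate even when the output is capped.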

Re: [PR] Consolidate feature flags into configuration guide [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on PR #14657: URL: https://github.com/apache/datafusion/pull/14657#issuecomment-2842538485 This is a good ticket and shouldn't be autoclosed.

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#discussion_r2068997181 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,27 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#discussion_r2068997865 ## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ## @@ -188,6 +188,20 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSpark

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#discussion_r2068995618 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,27 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] chore: Move Comet rules into their own files [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove merged PR #1695: URL: https://github.com/apache/datafusion-comet/pull/1695

Re: [I] Refactor CometSparkSessionExtensions.scala [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove closed issue #1669: Refactor CometSparkSessionExtensions.scala URL: https://github.com/apache/datafusion-comet/issues/1669

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2068991765 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -33,20 +37,25 @@ trait DataTypeSupport { * @return * true if the datatype is

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
blaginin commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2068983359 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -3039,21 +2974,14 @@ Projection: a, b .filter(and(col("b").gt(lit(10i64)), col("d").gt(lit(10i

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#discussion_r2068974642 ## .github/workflows/benchmark.yml: ## @@ -70,6 +70,7 @@ jobs: with: path: ./tpcds-sf-1 key: tpcds-${{ hashFiles('.github/work

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
blaginin commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2068974200 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -2001,24 +2055,16 @@ mod tests { .filter(col("sum(test.c)").gt(lit(10i64)))? .bui

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
blaginin commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2068973338 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1840,22 +1889,15 @@ mod tests { .filter(col("a").eq(lit(1i64)))? .build()?; -

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15906: URL: https://github.com/apache/datafusion/pull/15906#discussion_r2068952607 ## datafusion/expr/src/expr_rewriter/mod.rs: ## @@ -131,13 +131,25 @@ pub fn normalize_sorts( } /// Recursively replace all [`Column`] expressions in a given

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15906: URL: https://github.com/apache/datafusion/pull/15906#discussion_r2068940718 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1382,6 +1386,73 @@ fn contain(e: &Expr, check_map: &HashMap) -> bool { is_contain } +/// Infers

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#issuecomment-2842407052 hmm that did not work ``` /usr/bin/docker exec f44b663cc7dcb9c368890c652d8e41fd738b91080e1f7e561236b850254ac78b sh -c "cat /etc/*release | grep ^ID" Error: Faile

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15906: URL: https://github.com/apache/datafusion/pull/15906#discussion_r2068938611 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -259,3 +259,35 @@ logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) =

[PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
xudong963 opened a new pull request, #15906: URL: https://github.com/apache/datafusion/pull/15906 ## Which issue does this PR close? - Closes #. ## Rationale for this change We can infer new predicates from existing predicates to push down to reduce IO an

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#issuecomment-2842396751 > @andygrove CI failed. Could you please take a look? I deleted caches and am re-running the failed jobs -- This is an automated message from the Apache Git Service. T

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
tomershaniii commented on PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#issuecomment-2842369273 @iffyio See last commit, we should be good to go :-)

[PR] Regexp replace [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich opened a new pull request, #1700: URL: https://github.com/apache/datafusion-comet/pull/1700 ## Which issue does this PR close? Closes #. ## Rationale for this change Preliminary support for Spark's `regexp_replace`. Spark optionally allows for

Re: [I] Tracking: speed up the logical optimizer [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on issue #15775: URL: https://github.com/apache/datafusion/issues/15775#issuecomment-2842255768 After https://github.com/apache/datafusion/pull/15744, I'll close the tracking issue. Let's continue if we find a specific bottleneck.

Re: [PR] feat: make execution_graph.stages() public [datafusion-ballista]

2025-04-30 Thread via GitHub
milenkovicm merged PR #1256: URL: https://github.com/apache/datafusion-ballista/pull/1256

Re: [PR] doc: Update known users docs [datafusion]

2025-04-30 Thread via GitHub
comphead commented on PR #15895: URL: https://github.com/apache/datafusion/pull/15895#issuecomment-284619 Thanks @alamb for the review

Re: [PR] doc: Update known users docs [datafusion]

2025-04-30 Thread via GitHub
comphead merged PR #15895: URL: https://github.com/apache/datafusion/pull/15895

Re: [PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
aharpervc commented on code in PR #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831#discussion_r2068787347 ## src/test_utils.rs: ## @@ -166,6 +168,30 @@ impl TestedDialects { only_statement } +/// The same as [`one_statement_parses_to`]

Re: [PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
aharpervc commented on code in PR #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831#discussion_r2068774017 ## src/test_utils.rs: ## @@ -166,6 +168,30 @@ impl TestedDialects { only_statement } +/// The same as [`one_statement_parses_to`]

Re: [PR] decode(col, 'UTF-8') support using cast [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on code in PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#discussion_r2068759130 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,22 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] decode(col, 'UTF-8') support using cast [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on code in PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#discussion_r2068763268 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,22 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
huaxingao commented on PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#issuecomment-2842133847 @andygrove CI failed. Could you please take a look?

Re: [PR] decode(col, 'UTF-8') support using cast [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#discussion_r2068758199 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,22 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] decode(col, 'UTF-8') support using cast [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on code in PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#discussion_r2068751397 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,22 @@ object QueryPlanSerde extends Logging with CometExprShim {
