Re: [PR] doc: Update known users docs [datafusion]

2025-04-30 Thread via GitHub
comphead commented on PR #15895: URL: https://github.com/apache/datafusion/pull/15895#issuecomment-284619 Thanks @alamb for the review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] doc: Update known users docs [datafusion]

2025-04-30 Thread via GitHub
comphead merged PR #15895: URL: https://github.com/apache/datafusion/pull/15895

Re: [PR] feat: make execution_graph.stages() public [datafusion-ballista]

2025-04-30 Thread via GitHub
milenkovicm merged PR #1256: URL: https://github.com/apache/datafusion-ballista/pull/1256

Re: [I] Tracking: speed up the logical optimizer [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on issue #15775: URL: https://github.com/apache/datafusion/issues/15775#issuecomment-2842255768 After https://github.com/apache/datafusion/pull/15744, I'll close the tracking. Let's continue if we find the specific bottleneck

[PR] Regexp replace [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich opened a new pull request, #1700: URL: https://github.com/apache/datafusion-comet/pull/1700 ## Which issue does this PR close? Closes #. ## Rationale for this change Preliminary support for Spark's `regexp_replace`. Spark optionally allows for

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
tomershaniii commented on PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#issuecomment-2842369273 @iffyio See last commit, we should be good to go :-)

Re: [PR] feat: Set/cancel with job tag and make max broadcast table size configurable [datafusion-comet]

2025-04-30 Thread via GitHub
codecov-commenter commented on PR #1693: URL: https://github.com/apache/datafusion-comet/pull/1693#issuecomment-2841370240 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1693?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Resolved bug in `parse_function_arg` [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
LucaCappelletti94 commented on code in PR #1826: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1826#discussion_r2068068131 ## src/parser/mod.rs: ## @@ -5199,12 +5199,22 @@ impl<'a> Parser<'a> { // parse: [ argname ] argtype let mut name = None;

[I] Add metadata support for Aggregate and Window Functions [datafusion]

2025-04-30 Thread via GitHub
timsaucer opened a new issue, #15902: URL: https://github.com/apache/datafusion/issues/15902 ### Is your feature request related to a problem or challenge? This is a follow on to https://github.com/apache/datafusion/pull/15646 Now that we have metadata handling for scalar UDFs w

Re: [PR] Fix allow_update_branch [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on PR #15904: URL: https://github.com/apache/datafusion/pull/15904#issuecomment-2842017045 After the PR is merged, I hope I can see the "update branch" button in the PR: https://github.com/apache/datafusion/pull/15900

Re: [PR] Fix allow_update_branch [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15904: URL: https://github.com/apache/datafusion/pull/15904#discussion_r2068687642 ## .asf.yaml: ## @@ -50,7 +50,7 @@ github: main: required_pull_request_reviews: required_approving_review_count: 1 - pull_request: + pull_

Re: [PR] fix: correctly specify the nullability of `map_values` return type [datafusion]

2025-04-30 Thread via GitHub
rluvaton commented on PR #15901: URL: https://github.com/apache/datafusion/pull/15901#issuecomment-2842037364 Thanks, there is another failure in CI which is unrelated

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-04-30 Thread via GitHub
discord9 commented on PR #15854: URL: https://github.com/apache/datafusion/pull/15854#issuecomment-2842039530 Realized `try_cast` has a similar problem; will try to fix it in the next PR. The root issue is that a `ScalarValue::Null` can't be properly translated to substrait's `Null(DataType)

Re: [PR] fix(avro): Respect projection order in Avro reader [datafusion]

2025-04-30 Thread via GitHub
nantunes commented on PR #15840: URL: https://github.com/apache/datafusion/pull/15840#issuecomment-2842045510 Integrated review suggestions and rebased. Also confirmed that the test added to `avro.slt` fails before the fix, i.e. in the current main branch.

Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
aharpervc commented on code in PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810#discussion_r2068726100 ## src/dialect/mssql.rs: ## @@ -215,6 +218,78 @@ impl MsSqlDialect { })) } +/// Parse a SQL CREATE statement +fn parse_create

Re: [PR] decode(col, 'UTF-8') support using cast [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#discussion_r2068740883 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,22 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
aharpervc commented on code in PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810#discussion_r2068747275 ## src/dialect/mssql.rs: ## @@ -215,6 +218,78 @@ impl MsSqlDialect { })) } +/// Parse a SQL CREATE statement +fn parse_create

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
qstommyshu commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2068682558 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -3039,21 +2974,14 @@ Projection: a, b .filter(and(col("b").gt(lit(10i64)), col("d").gt(lit(1

Re: [PR] Improve push down limit (logical optimizer rule) [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15744: URL: https://github.com/apache/datafusion/pull/15744#discussion_r2068747308 ## datafusion/core/tests/user_defined/user_defined_plan.rs: ## @@ -102,362 +96,10 @@ use datafusion_physical_plan::execution_plan::{Boundedness, EmissionType};

[PR] Improve sqllogictest error reporting [datafusion]

2025-04-30 Thread via GitHub
gabotechs opened a new pull request, #15905: URL: https://github.com/apache/datafusion/pull/15905 ## Which issue does this PR close? - Closes #. ## Rationale for this change Improve the way errors get shown to the developer while running sqllogictests.

Re: [PR] Improve push down limit (logical optimizer rule) [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15744: URL: https://github.com/apache/datafusion/pull/15744#discussion_r2068750473 ## datafusion/core/tests/user_defined/user_defined_plan.rs: ## @@ -102,362 +96,10 @@ use datafusion_physical_plan::execution_plan::{Boundedness, EmissionType};

Re: [PR] decode(col, 'UTF-8') support using cast [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on code in PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#discussion_r2068751397 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,22 @@ object QueryPlanSerde extends Logging with CometExprShim {

[PR] Fix allow_update_branch [datafusion]

2025-04-30 Thread via GitHub
xudong963 opened a new pull request, #15904: URL: https://github.com/apache/datafusion/pull/15904 ## Which issue does this PR close? - Follow up: https://github.com/apache/datafusion/pull/15894 ## Rationale for this change After https://github.com/apache/d

Re: [PR] Shell script to collect benchmarks for multiple versions [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15144: URL: https://github.com/apache/datafusion/pull/15144#issuecomment-2841914728 Thank you for this work @logan-keede -- I am sorry for the very long delay. It seems @saraghds is also working on this script so I am going to try and help it along. I merged t

Re: [PR] support OR operator in binary `evaluate_bounds` [datafusion]

2025-04-30 Thread via GitHub
davidhewitt commented on PR #15716: URL: https://github.com/apache/datafusion/pull/15716#issuecomment-2841940052 Thanks!

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2068660826 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -3039,21 +2974,14 @@ Projection: a, b .filter(and(col("b").gt(lit(10i64)), col("d").gt(lit(10

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-04-30 Thread via GitHub
Blizzara commented on code in PR #15854: URL: https://github.com/apache/datafusion/pull/15854#discussion_r2068664824 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1590,6 +1590,21 @@ pub fn from_cast( schema: &DFSchemaRef, ) -> Result { let Cast { expr,

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
tomershaniii commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2068075297 ## src/parser/mod.rs: ## @@ -7082,18 +7030,252 @@ impl<'a> Parser<'a> { if let Token::Word(word) = self.peek_token().token {

Re: [PR] Added support for `DROP DOMAIN` [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
LucaCappelletti94 commented on code in PR #1828: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1828#discussion_r2068085168 ## src/ast/mod.rs: ## @@ -3319,6 +3319,18 @@ pub enum Statement { drop_behavior: Option, }, /// ```sql +/// DROP DOMAI

Re: [PR] Resolved bug in `parse_function_arg` [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
LucaCappelletti94 commented on code in PR #1826: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1826#discussion_r2068070230 ## src/parser/mod.rs: ## @@ -5199,12 +5199,22 @@ impl<'a> Parser<'a> { // parse: [ argname ] argtype let mut name = None;

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-04-30 Thread via GitHub
discord9 commented on PR #15854: URL: https://github.com/apache/datafusion/pull/15854#issuecomment-2841594671 @vbarua might want to take a look again whether this fix is right

Re: [PR] fix: correctly specify the nullability of `map_values` return type [datafusion]

2025-04-30 Thread via GitHub
rluvaton commented on PR #15901: URL: https://github.com/apache/datafusion/pull/15901#issuecomment-2841611976 The broken tests are failing on main as well

Re: [PR] decode(col, 'UTF-8') support using cast [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#discussion_r2068758199 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,22 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
huaxingao commented on PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#issuecomment-2842133847 @andygrove CI failed. Could you please take a look?

Re: [PR] decode(col, 'UTF-8') support using cast [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on code in PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#discussion_r2068763268 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,22 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] decode(col, 'UTF-8') support using cast [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on code in PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#discussion_r2068759130 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,22 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
aharpervc commented on code in PR #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831#discussion_r2068774017 ## src/test_utils.rs: ## @@ -166,6 +168,30 @@ impl TestedDialects { only_statement } +/// The same as [`one_statement_parses_to`]

Re: [PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
aharpervc commented on code in PR #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831#discussion_r2068787347 ## src/test_utils.rs: ## @@ -166,6 +168,30 @@ impl TestedDialects { only_statement } +/// The same as [`one_statement_parses_to`]

Re: [PR] Shell script to collect benchmarks for multiple versions [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15144: URL: https://github.com/apache/datafusion/pull/15144#issuecomment-2841919621 I want this script to be useable by more people so I think it is important to document it a bit more I am also going to investigate potentially overriding the list of git commit

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#issuecomment-2842396751 > @andygrove CI failed. Could you please take a look? I deleted caches and am re-running the failed jobs

[PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
xudong963 opened a new pull request, #15906: URL: https://github.com/apache/datafusion/pull/15906 ## Which issue does this PR close? - Closes #. ## Rationale for this change We can infer new predicates from existing predicates to push down to reduce IO an

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#issuecomment-2842407052 hmm that did not work ``` /usr/bin/docker exec f44b663cc7dcb9c368890c652d8e41fd738b91080e1f7e561236b850254ac78b sh -c "cat /etc/*release | grep ^ID" Error: Faile

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15906: URL: https://github.com/apache/datafusion/pull/15906#discussion_r2068938611 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -259,3 +259,35 @@ logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) =

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15906: URL: https://github.com/apache/datafusion/pull/15906#discussion_r2068940718 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1382,6 +1386,73 @@ fn contain(e: &Expr, check_map: &HashMap) -> bool { is_contain } +/// Infers

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#discussion_r2068974642 ## .github/workflows/benchmark.yml: ## @@ -70,6 +70,7 @@ jobs: with: path: ./tpcds-sf-1 key: tpcds-${{ hashFiles('.github/work

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
blaginin commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2068974200 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -2001,24 +2055,16 @@ mod tests { .filter(col("sum(test.c)").gt(lit(10i64)))? .bui

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
xudong963 commented on code in PR #15906: URL: https://github.com/apache/datafusion/pull/15906#discussion_r2068952607 ## datafusion/expr/src/expr_rewriter/mod.rs: ## @@ -131,13 +131,25 @@ pub fn normalize_sorts( } /// Recursively replace all [`Column`] expressions in a given

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
blaginin commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2068973338 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1840,22 +1889,15 @@ mod tests { .filter(col("a").eq(lit(1i64)))? .build()?; -

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
blaginin commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2068983359 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -3039,21 +2974,14 @@ Projection: a, b .filter(and(col("b").gt(lit(10i64)), col("d").gt(lit(10i

Re: [I] Refactor CometSparkSessionExtensions.scala [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove closed issue #1669: Refactor CometSparkSessionExtensions.scala URL: https://github.com/apache/datafusion-comet/issues/1669

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2068991765 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -33,20 +37,25 @@ trait DataTypeSupport { * @return * true if the datatype is

Re: [PR] chore: Move Comet rules into their own files [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove merged PR #1695: URL: https://github.com/apache/datafusion-comet/pull/1695

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#discussion_r2068995618 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,27 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#discussion_r2068997865 ## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ## @@ -188,6 +188,20 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSpark

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
iffyio commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2067982458 ## src/ast/helpers/stmt_create_table.rs: ## @@ -76,27 +78,20 @@ pub struct CreateTableBuilder { pub constraints: Vec, pub hive_distribution: H

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-30 Thread via GitHub
hsiang-c commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2067930692 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -33,20 +37,25 @@ trait DataTypeSupport { * @return * true if the datatype is s

Re: [PR] Consolidate feature flags into configuration guide [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on PR #14657: URL: https://github.com/apache/datafusion/pull/14657#issuecomment-2842538485 This is a good ticket and shouldn't be autoclosed.

Re: [PR] Improve sqllogictest error reporting [datafusion]

2025-04-30 Thread via GitHub
gabotechs commented on PR #15905: URL: https://github.com/apache/datafusion/pull/15905#issuecomment-2842631495 > I would prefer it to be limited to say the first 10. Otherwise this looks good. 👍 no strong opinion here, I imagine that if more than 10 tests fail, it means that somethin

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on PR #15906: URL: https://github.com/apache/datafusion/pull/15906#issuecomment-2842890086 Perhaps running clickbench or equivalent (assuming clickbench wouldn't trigger this optimization) to showcase the difference would be good?

Re: [PR] feat: Set/cancel with job tag and make max broadcast table size configurable [datafusion-comet]

2025-04-30 Thread via GitHub
parthchandra commented on code in PR #1693: URL: https://github.com/apache/datafusion-comet/pull/1693#discussion_r2069227147 ## spark/src/main/spark-3.4/org/apache/comet/shims/ShimCometBroadcastExchangeExec.scala: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Add hooks to `SchemaAdapter` to add custom column generators [datafusion]

2025-04-30 Thread via GitHub
adriangb commented on PR #15261: URL: https://github.com/apache/datafusion/pull/15261#issuecomment-2843120846 Looking at how filter pushdown interacts with partition columns I think this will be a huge improvement for that. Currently the partition values get bound when the `FileStream` is

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-30 Thread via GitHub
vbarua commented on PR #15794: URL: https://github.com/apache/datafusion/pull/15794#issuecomment-2842642259 @alamb this PR should be ready for review I've checked that this PR consists purely of code moves with no functional changes, at this point in time, aside from `from_substrait_t

Re: [PR] [wip] Add scripts for running benchmarks on EC2 [datafusion-comet]

2025-04-30 Thread via GitHub
anuragmantri commented on code in PR #1654: URL: https://github.com/apache/datafusion-comet/pull/1654#discussion_r2069177697 ## dev/benchmarks/setup.sh: ## @@ -0,0 +1,44 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor licen

Re: [PR] [wip] Add scripts for running benchmarks on EC2 [datafusion-comet]

2025-04-30 Thread via GitHub
anuragmantri commented on code in PR #1654: URL: https://github.com/apache/datafusion-comet/pull/1654#discussion_r2069180747 ## dev/benchmarks/setup.sh: ## @@ -0,0 +1,44 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor licen

Re: [PR] fix: correctly specify the nullability of `map_values` return type [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15901: URL: https://github.com/apache/datafusion/pull/15901#issuecomment-284283 > Thanks, there is another failure in CI which is unrelated I re-started the failed CI check

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove merged PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on code in PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#discussion_r2068997181 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1430,6 +1430,27 @@ object QueryPlanSerde extends Logging with CometExprShim {

[I] Can't parse valid Snowflake compound expression [datafusion-sqlparser-rs]

2025-04-30 Thread via GitHub
ramnes opened a new issue, #1833: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1833 Hey there, thanks for the great project! I'm encountering an issue with the following query, which is valid in Snowflake: ```sql SELECT v.$2 FROM (VALUES (1, 'value1'), (2, '

Re: [PR] chore: Prepare 0.8.1 release [branch-0.8] [datafusion-comet]

2025-04-30 Thread via GitHub
andygrove commented on PR #1699: URL: https://github.com/apache/datafusion-comet/pull/1699#issuecomment-2842678170 @parthchandra @huaxingao Hopefully CI will pass this time - could I get a review?

Re: [PR] fix(avro): Respect projection order in Avro reader [datafusion]

2025-04-30 Thread via GitHub
comphead merged PR #15840: URL: https://github.com/apache/datafusion/pull/15840

Re: [I] bug: regexp_match not working? [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on issue #15872: URL: https://github.com/apache/datafusion/issues/15872#issuecomment-2843008738 Can someone update the title of this issue to reflect the true nature of the enhancement? I don't think this has anything specifically to do with regexp_match except that was h

Re: [I] bug: regexp_match not working? [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on issue #15872: URL: https://github.com/apache/datafusion/issues/15872#issuecomment-2843022789 Note that this syntax is supported in DuckDB: ```sql D create table test (a int, b varchar); D insert into test values (1, 'one'); D insert into test values (2,

Re: [I] bug: regexp_match not working? [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on issue #15872: URL: https://github.com/apache/datafusion/issues/15872#issuecomment-2843029954 @juju4 specifically for your use case I think changing the function to regexp_like will work: ```sql > select * from test where regexp_like(test.b, '(.){4,}'); +---+---

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
qstommyshu commented on code in PR #15893: URL: https://github.com/apache/datafusion/pull/15893#discussion_r2069132932 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -3039,21 +2974,14 @@ Projection: a, b .filter(and(col("b").gt(lit(10i64)), col("d").gt(lit(1

Re: [PR] Improve sqllogictest error reporting [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on PR #15905: URL: https://github.com/apache/datafusion/pull/15905#issuecomment-2842522088 ``` Runs all the sqllogictests in a single file even if some of them fail, reporting all failures, instead of just the first one. ``` I would prefer it to be limited to say

Re: [PR] fix: fold cast null to substrait typed null [datafusion]

2025-04-30 Thread via GitHub
vbarua commented on code in PR #15854: URL: https://github.com/apache/datafusion/pull/15854#discussion_r2069115514 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1590,6 +1590,21 @@ pub fn from_cast( schema: &DFSchemaRef, ) -> Result { let Cast { expr, da

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#issuecomment-2842651369 I should probably add some better tests, since now that we're checking the flag that means we won't get coverage in the Spark SQL tests -- we'll just fall back.

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-30 Thread via GitHub
Omega359 commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2842927926 My list includes: - #13527 (finally finish this) - #14837 (adopt my async UDF's to use this and validate) - #15394 - #8282 (Specifically allowing changing default tz

Re: [I] Support file row index / row id for each file in a `ListingTableProvider` [datafusion]

2025-04-30 Thread via GitHub
daphnenhuch-at commented on issue #15892: URL: https://github.com/apache/datafusion/issues/15892#issuecomment-2842930676 By the way, this is the exact bug I was referencing here: https://github.com/apache/datafusion/issues/15833 I don't actually need to maintain the row number for eac

Re: [I] Avro reader fails when query columns are reordered in SELECT statement [datafusion]

2025-04-30 Thread via GitHub
comphead closed issue #15839: Avro reader fails when query columns are reordered in SELECT statement URL: https://github.com/apache/datafusion/issues/15839

[I] RFC: Add features to reduce dependencies on core crate [datafusion]

2025-04-30 Thread via GitHub
timsaucer opened a new issue, #15907: URL: https://github.com/apache/datafusion/issues/15907 ### Is your feature request related to a problem or challenge? Currently when you add the `datafusion` crate, it pulls in many dependencies that are not needed for all use cases. We have two s

Re: [I] Sorting is not maintained after using a window function [datafusion]

2025-04-30 Thread via GitHub
daphnenhuch-at commented on issue #15833: URL: https://github.com/apache/datafusion/issues/15833#issuecomment-2842967139 That still doesn't solve my problem here. I added the extra sort on both columns as suggested. Now it seems the userPrimaryKey is sorted and the fileRowNumbers start at 1

Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

2025-04-30 Thread via GitHub
codecov-commenter commented on PR #1700: URL: https://github.com/apache/datafusion-comet/pull/1700#issuecomment-2842878963 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1700?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: decode() expression when using 'utf-8' encoding [datafusion-comet]

2025-04-30 Thread via GitHub
mbutrovich commented on PR #1697: URL: https://github.com/apache/datafusion-comet/pull/1697#issuecomment-2843202728 I need to do a shim for `StringDecode` in Spark 4.0.

Re: [PR] Handle FixedSizeList in flatten [datafusion]

2025-04-30 Thread via GitHub
berkaysynnada commented on PR #15899: URL: https://github.com/apache/datafusion/pull/15899#issuecomment-2841217502 Hi @joroKr21. Thanks for the fix, did you see https://github.com/apache/datafusion/pull/15898?

Re: [PR] Simple Functions Preview [datafusion]

2025-04-30 Thread via GitHub
findepi commented on PR #14668: URL: https://github.com/apache/datafusion/pull/14668#issuecomment-2841237225 hey stale bot, you're right. I owe a status update. TL;DR: no update, as we refocused away from execution in recent months. I still find this very valuable for compute, but can't pr

Re: [D] How does 'sort' interact with record batches? [datafusion]

2025-04-30 Thread via GitHub
GitHub user alamb added a comment to the discussion: How does 'sort' interact with record batches? That makes sense -- the `fileRowNumber` serves as provenance information (aka where that user came from in the file). Thank you for the very clear description GitHub link: https://github.com/ap

Re: [I] Interval arithmetic `apply_operator` does not support OR [datafusion]

2025-04-30 Thread via GitHub
alamb closed issue #15715: Interval arithmetic `apply_operator` does not support OR URL: https://github.com/apache/datafusion/issues/15715

Re: [PR] support OR operator in binary `evaluate_bounds` [datafusion]

2025-04-30 Thread via GitHub
alamb commented on code in PR #15716: URL: https://github.com/apache/datafusion/pull/15716#discussion_r2068480882 ## datafusion/physical-expr/src/intervals/cp_solver.rs: ## @@ -645,6 +645,17 @@ impl ExprIntervalGraph { .map(|child| self.graph[*child].interval())

Re: [PR] support OR operator in binary `evaluate_bounds` [datafusion]

2025-04-30 Thread via GitHub
alamb merged PR #15716: URL: https://github.com/apache/datafusion/pull/15716

Re: [PR] support OR operator in binary `evaluate_bounds` [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15716: URL: https://github.com/apache/datafusion/pull/15716#issuecomment-2841692463 Thanks @davidhewitt and @berkaysynnada

Re: [I] Implement `partition_statistics` API for more operators [datafusion]

2025-04-30 Thread via GitHub
UBarney commented on issue #15873: URL: https://github.com/apache/datafusion/issues/15873#issuecomment-2841707774 I'm interested in implementing `partition_statistics` for `AggregateExec`. However, I'll need some time to familiarize myself with the codebase

Re: [PR] Migrate Optimizer tests to insta, part3 [datafusion]

2025-04-30 Thread via GitHub
alamb commented on PR #15893: URL: https://github.com/apache/datafusion/pull/15893#issuecomment-2841714075 Is this PR ready to review, @qstommyshu? It is marked as draft, so I wanted to double check before doing so. cc @blaginin

Re: [PR] Saner handling of nulls inside arrays [datafusion]

2025-04-30 Thread via GitHub
joroKr21 commented on PR #15149: URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2841023652 #15899

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-30 Thread via GitHub
hsiang-c commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2067930692 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -33,20 +37,25 @@ trait DataTypeSupport { * @return * true if the datatype is s

[PR] Coerce FixedSizeList to List recursively [datafusion]

2025-04-30 Thread via GitHub
joroKr21 opened a new pull request, #15899: URL: https://github.com/apache/datafusion/pull/15899 ## Which issue does this PR close? - Followup to #15149 and #15160 which had a semantic merge conflict ## Rationale for this change Since the type signature of `flatte

Re: [PR] Handle FixedSizeList in flatten [datafusion]

2025-04-30 Thread via GitHub
joroKr21 commented on PR #15899: URL: https://github.com/apache/datafusion/pull/15899#issuecomment-2841256901 Whoops, I did not - I saw the comment on my merged PR and I went straight to fixing 😄 - should've checked first

Re: [I] Main is broken [datafusion]

2025-04-30 Thread via GitHub
alamb closed issue #15896: Main is broken URL: https://github.com/apache/datafusion/issues/15896
