[PR] benchmark information as Lineformat [datafusion]

2025-02-14 Thread via GitHub
logan-keede opened a new pull request, #14662: URL: https://github.com/apache/datafusion/pull/14662 ## Which issue does this PR close? - Closes #6107 ## Rationale for this change a step towards https://github.com/apache/datafusion/issues/5504 ## What change

Re: [PR] feat: make random seek configurable in fuzz-testing [datafusion-comet]

2025-02-14 Thread via GitHub
codecov-commenter commented on PR #1401: URL: https://github.com/apache/datafusion-comet/pull/1401#issuecomment-2658617707 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1401?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] script to export benchmark information as Lineformat [datafusion]

2025-02-14 Thread via GitHub
logan-keede commented on PR #14662: URL: https://github.com/apache/datafusion/pull/14662#issuecomment-2658628564 @alamb Check this out. Do we have some documentation on the JSON format? That would be helpful if we want to show more arguments in `tag set`. -- This is an automated messag
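For readers unfamiliar with the `tag set` mentioned above: in InfluxDB line protocol, one point is written as `measurement,tag_set field_set timestamp`. The following is a minimal sketch of that layout, not the script from the PR. The `BenchResult` struct, its field names, and the `elapsed_ms` field key are hypothetical and chosen only for illustration.

```rust
use std::collections::BTreeMap;

/// Backslash-escape the characters listed in `special`.
fn escape(s: &str, special: &[char]) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if special.contains(&c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Hypothetical record for a single benchmark run (not taken from the PR).
struct BenchResult {
    measurement: String,
    /// Tag set; a BTreeMap keeps the keys sorted when iterating.
    tags: BTreeMap<String, String>,
    elapsed_ms: f64,
    timestamp_ns: i64,
}

/// Render one result as `measurement,tag=value,... field=value timestamp`.
fn to_line(r: &BenchResult) -> String {
    // Measurement names escape commas and spaces.
    let mut line = escape(&r.measurement, &[',', ' ']);
    // Tag keys and values escape commas, equals signs and spaces.
    for (k, v) in &r.tags {
        line.push(',');
        line.push_str(&escape(k, &[',', '=', ' ']));
        line.push('=');
        line.push_str(&escape(v, &[',', '=', ' ']));
    }
    // A single float field followed by a nanosecond timestamp.
    line.push_str(&format!(" elapsed_ms={} {}", r.elapsed_ms, r.timestamp_ns));
    line
}
```

Showing more arguments in the tag set would then amount to inserting more key/value pairs into `tags`; which benchmark arguments are available depends on the JSON output, which this sketch does not model.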

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
Kontinuation commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1955829354 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -408,50 +395,114 @@ impl ExternalSorter { debug!("Spilling sort data of ExternalSorter to disk

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-02-14 Thread via GitHub
viirya commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2660573325 Because this? ``` // When Comet shuffle is disabled, we don't want to transform the HashAggregate // to CometHashAggregate. Otherwise, we probably get partial Co

Re: [PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #1405: URL: https://github.com/apache/datafusion-comet/pull/1405#issuecomment-2660613861 The title says `[branch-0.6]` but this PR is against `main`?

Re: [PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on PR #1405: URL: https://github.com/apache/datafusion-comet/pull/1405#issuecomment-2660614850 > The title says `[branch-0.6]` but this PR is against `main`? Thanks. Updated.

[I] Remove hard-coded Comet version numbers from GitHub actions [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove opened a new issue, #1406: URL: https://github.com/apache/datafusion-comet/issues/1406 ### What is the problem the feature request solves? We hard-code the current snapshot version in some GitHub actions. We should get the version number from the pom.xml instead to remove th

Re: [PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on PR #1405: URL: https://github.com/apache/datafusion-comet/pull/1405#issuecomment-2660616259 > What about `.github/actions/setup-spark-builder/action.yaml`? There is no need to override the default value since it gets replaced dynamically, but I filed an issue to

Re: [PR] feat: add resolved `target` to `DmlStatement` (to eliminate need for table lookup after deserialization) [datafusion]

2025-02-14 Thread via GitHub
milenkovicm commented on PR #14631: URL: https://github.com/apache/datafusion/pull/14631#issuecomment-2660174463 @alamb this PR looks fine I verified it with `datafusion.execution.parquet.pushdown_filters=true` but still same problem like https://github.com/apache/datafusion/pull/14631#iss

Re: [PR] feat: add resolved `target` to `DmlStatement` (to eliminate need for table lookup after deserialization) [datafusion]

2025-02-14 Thread via GitHub
alamb commented on PR #14631: URL: https://github.com/apache/datafusion/pull/14631#issuecomment-2660160341 Thanks @milenkovicm

Re: [I] feat: add `TableSource` to `DML` `proto` to eliminate need for table lookup after deserialisation [datafusion]

2025-02-14 Thread via GitHub
alamb closed issue #14654: feat: add `TableSource` to `DML` `proto` to eliminate need for table lookup after deserialisation URL: https://github.com/apache/datafusion/issues/14654

Re: [PR] feat: add resolved `target` to `DmlStatement` (to eliminate need for table lookup after deserialization) [datafusion]

2025-02-14 Thread via GitHub
alamb merged PR #14631: URL: https://github.com/apache/datafusion/pull/14631

Re: [I] Tracking: date_time related features [datafusion]

2025-02-14 Thread via GitHub
Omega359 commented on issue #14661: URL: https://github.com/apache/datafusion/issues/14661#issuecomment-2660211300 https://github.com/apache/datafusion/issues/8282

Re: [PR] Minor: remove confusing `update_plan_from_children` call from `EnforceSorting` [datafusion]

2025-02-14 Thread via GitHub
wiedld commented on code in PR #14650: URL: https://github.com/apache/datafusion/pull/14650#discussion_r1956722551 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -119,7 +119,7 @@ fn update_sort_ctx_children( } node.data = data; -node.update_pl

[PR] Update EnforceSorting docs. [datafusion]

2025-02-14 Thread via GitHub
wiedld opened a new pull request, #14673: URL: https://github.com/apache/datafusion/pull/14673 ## Which issue does this PR close? Helps with the docs effort https://github.com/apache/datafusion/issues/7013. ## Rationale for this change Noticed while reviewing https://gith

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-14 Thread via GitHub
comphead commented on code in PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#discussion_r1956765242 ## native/core/Cargo.toml: ## @@ -77,6 +77,7 @@ datafusion-comet-proto = { workspace = true } object_store = { workspace = true } url = { workspace = true }

Re: [PR] fix: Reduce cast.rs and utils.rs logic from parquet_support.rs for experimental native scans [datafusion-comet]

2025-02-14 Thread via GitHub
parthchandra commented on PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#issuecomment-2660476596 @kazuyukitanimura please go ahead and merge this.

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on code in PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#discussion_r1956865360 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -216,6 +216,17 @@ object CometConf extends ShimCometConf { val COMET_EXEC_INITCAP_

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on code in PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#discussion_r1956799716 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -216,6 +216,17 @@ object CometConf extends ShimCometConf { val COMET_EXEC_INITCAP_

[PR] chore: adding Linkedin follow page [datafusion]

2025-02-14 Thread via GitHub
comphead opened a new pull request, #14676: URL: https://github.com/apache/datafusion/pull/14676 ## Which issue does this PR close? Related to #14389 - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are th

Re: [PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-14 Thread via GitHub
simonvandel commented on PR #14675: URL: https://github.com/apache/datafusion/pull/14675#issuecomment-2660484055 Oops, need to generate valid uuidv4

Re: [PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-14 Thread via GitHub
comphead commented on code in PR #14675: URL: https://github.com/apache/datafusion/pull/14675#discussion_r1956877727 ## datafusion/functions/src/string/uuid.rs: ## @@ -87,7 +88,13 @@ impl ScalarUDFImpl for UuidFunc { if !args.is_empty() { return internal_er

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-14 Thread via GitHub
parthchandra commented on code in PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#discussion_r1956879942 ## native/core/src/execution/planner.rs: ## @@ -1155,12 +1154,9 @@ impl PhysicalPlanner { )) }); -

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove merged PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
parthchandra commented on code in PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398#discussion_r1956903894 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -37,7 +37,7 @@ trait DataTypeSupport { private def isGloballySupported(dt: Data

Re: [PR] test: Add experimental native scans to CometReadBenchmark [datafusion-comet]

2025-02-14 Thread via GitHub
parthchandra commented on PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150#issuecomment-2660530828 > > @mbutrovich Do you plan to overwrite spark/benchmarks/CometReadBenchmark-jdk11-results.txt ? > > > Thanks for running this benchmark @mbutrovich. Slowness in nati

Re: [PR] test: Add experimental native scans to CometReadBenchmark [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150#issuecomment-2660569970 Merged thanks @mbutrovich @parthchandra @comphead @andygrove

Re: [PR] test: Add experimental native scans to CometReadBenchmark [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura merged PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150

Re: [I] Attach `Diagnostic` to "more than one column in subquery" error [datafusion]

2025-02-14 Thread via GitHub
irenjj commented on issue #14438: URL: https://github.com/apache/datafusion/issues/14438#issuecomment-2660573715 > Hey [@irenjj](https://github.com/irenjj) how is it going with this ticket :) Can I help with anything? Hi, @eliaperantoni, Sorry for not updating my status for a long tim

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2660587673 Thanks @viirya @EmilyMatt Did you mean that we should be able to run Comet aggregation even Comet shuffle is disabled by > I believe I've seen a few

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2660597523 Not directly related to the point of this PR but regarding ` I had a hard time making DataFusion Comet work on cloud instances with 4GB memory per CPU core, partially b

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on code in PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#discussion_r1956946498 ## dev/diffs/3.4.3.diff: ## @@ -7,7 +7,7 @@ index d3544881af1..26ab186c65d 100644 2.5.1 2.0.8 +3.4 -+0.5.0-SNAPSHOT Review Comme

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2660567872 cc @viirya I forgot why we did this in #991 https://github.com/apache/datafusion-comet/blob/f099e6e40aa18441c7882e5bffd9d6dfb10c6c19/spark/src/main/scala/or

Re: [PR] fix: Reduce cast.rs and utils.rs logic from parquet_support.rs for experimental native scans [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura merged PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#issuecomment-2660574425 oops, @comphead would you mind merging the latest main into this PR branch in order to resolve the conflict?

[PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove opened a new pull request, #1405: URL: https://github.com/apache/datafusion-comet/pull/1405 ## Which issue does this PR close? Fix incorrect Comet version in Spark diffs. ## Rationale for this change Fixing this just in case anyone wants to run S

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#discussion_r1956971872 ## dev/diffs/3.4.3.diff: ## @@ -7,7 +7,7 @@ index d3544881af1..26ab186c65d 100644 2.5.1 2.0.8 +3.4 -+0.5.0-SNAPSHOT Review Comment:

Re: [PR] Refactor signatures for lpad, rpad, left, and right [datafusion]

2025-02-14 Thread via GitHub
github-actions[bot] commented on PR #13420: URL: https://github.com/apache/datafusion/pull/13420#issuecomment-2660627372 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] PoC Adaptive round robin repartitioning [datafusion]

2025-02-14 Thread via GitHub
github-actions[bot] closed pull request #13699: PoC Adaptive round robin repartitioning URL: https://github.com/apache/datafusion/pull/13699

Re: [PR] Reorganize the Parser module [datafusion-sqlparser-rs]

2025-02-14 Thread via GitHub
github-actions[bot] closed pull request #1581: Reorganize the Parser module URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1581

Re: [PR] Fix CI doctests on main [datafusion]

2025-02-14 Thread via GitHub
findepi merged PR #14667: URL: https://github.com/apache/datafusion/pull/14667

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-14 Thread via GitHub
Dandandan commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2659343979 I ran some tests yesterday and I can confirm the runtime improvements. I do get some high memory usage however especially with some queries (TPC-H Query 18 I believe) than with t

Re: [PR] Early exit on column normalisation [datafusion]

2025-02-14 Thread via GitHub
timsaucer commented on code in PR #14636: URL: https://github.com/apache/datafusion/pull/14636#discussion_r1956153542 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -834,10 +834,16 @@ impl LogicalPlanBuilder { plan: &LogicalPlan, column: impl Into,

Re: [PR] feat: add resolved `target` to `DmlStatement` (to eliminate need for table lookup after deserialization) [datafusion]

2025-02-14 Thread via GitHub
milenkovicm commented on code in PR #14631: URL: https://github.com/apache/datafusion/pull/14631#discussion_r1956209104 ## datafusion/expr/src/logical_plan/dml.rs: ## @@ -91,31 +91,64 @@ impl Hash for CopyTo { /// The operator that modifies the content of a database (adapted

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398#issuecomment-2659469471 @kazuyukitanimura @comphead could I get a committer approval?

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398#discussion_r1956227422 ## docs/templates/compatibility-template.md: ## @@ -17,12 +17,43 @@ under the License. --> + + # Compatibility Guide Comet aims to provide consiste

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398#issuecomment-2659468163 Thanks for the review @mbutrovich and @parthchandra

Re: [I] Datafusion binary size has been getting bigger [datafusion]

2025-02-14 Thread via GitHub
alamb commented on issue #13816: URL: https://github.com/apache/datafusion/issues/13816#issuecomment-2659471604 At a high level, I think this ticket has 2 parts: 1. Figure out what is contributing to code size increase 2. Then perhaps figure out how to make it better I think the

Re: [PR] hack out parqeut feature from datafusion-cli [datafusion]

2025-02-14 Thread via GitHub
alamb closed pull request #14666: hack out parqeut feature from datafusion-cli URL: https://github.com/apache/datafusion/pull/14666

Re: [PR] Minor: remove confusing `update_plan_from_children` call from `EnforceSorting` [datafusion]

2025-02-14 Thread via GitHub
alamb commented on PR #14650: URL: https://github.com/apache/datafusion/pull/14650#issuecomment-2659552185 @wiedld can you please review this PR?

Re: [PR] feat: add resolved `target` to `DmlStatement` (to eliminate need for table lookup after deserialization) [datafusion]

2025-02-14 Thread via GitHub
milenkovicm commented on code in PR #14631: URL: https://github.com/apache/datafusion/pull/14631#discussion_r1956211885 ## datafusion/sql/src/statement.rs: ## @@ -1709,18 +1709,22 @@ impl SqlToRel<'_, S> { // Do a table lookup to verify the table exists let tab

Re: [I] Test Rust 2024 Edition [datafusion]

2025-02-14 Thread via GitHub
Omega359 commented on issue #13631: URL: https://github.com/apache/datafusion/issues/13631#issuecomment-2659449432 I did a quick test using cargo fix --edition and there are a lot of concerns with drop ordering and if let scoping: ``` warning: `if let` assigns a shorter lif

[PR] Fix CI tests on main [datafusion]

2025-02-14 Thread via GitHub
alamb opened a new pull request, #14667: URL: https://github.com/apache/datafusion/pull/14667 ## Which issue does this PR close? A small logical conflict came in from - https://github.com/apache/datafusion/pull/12116 ## Rationale for this change CI is broken o

Re: [PR] Add union_extract scalar function [datafusion]

2025-02-14 Thread via GitHub
alamb commented on PR #12116: URL: https://github.com/apache/datafusion/pull/12116#issuecomment-2659540056 This PR caused a CI failure on main (due to a logical conflict). Small PR to fix: - https://github.com/apache/datafusion/pull/14667

[PR] Improve SQL Planner docs [datafusion]

2025-02-14 Thread via GitHub
alamb opened a new pull request, #14669: URL: https://github.com/apache/datafusion/pull/14669 ## Which issue does this PR close? - Part of #7013 ## Rationale for this change I have been listening to CMU's [A Journey Through Database Query Optimization](https://15799

Re: [PR] Fix CI doctests on main [datafusion]

2025-02-14 Thread via GitHub
alamb commented on PR #14667: URL: https://github.com/apache/datafusion/pull/14667#issuecomment-2659701690 Thanks @findepi

Re: [PR] WIP : create `datafusion-datasource-avro` crate [datafusion]

2025-02-14 Thread via GitHub
getChan commented on PR #14651: URL: https://github.com/apache/datafusion/pull/14651#issuecomment-2659691325 I'll reopen when the datafusion-datasource crate is arranged.

Re: [PR] Improve SQL Planner docs [datafusion]

2025-02-14 Thread via GitHub
alamb commented on code in PR #14669: URL: https://github.com/apache/datafusion/pull/14669#discussion_r1956364643 ## datafusion/sql/src/planner.rs: ## @@ -224,7 +224,24 @@ impl PlannerContext { } } -/// SQL query planner +/// SQL query planner and binder Review Comment:

Re: [PR] Improve SQL Planner docs [datafusion]

2025-02-14 Thread via GitHub
alamb commented on code in PR #14669: URL: https://github.com/apache/datafusion/pull/14669#discussion_r1956363618 ## datafusion/core/src/lib.rs: ## @@ -229,9 +229,9 @@ //! 1. The query string is parsed to an Abstract Syntax Tree (AST) //![`Statement`] using [sqlparser]. /

Re: [PR] WIP : create `datafusion-datasource-avro` crate [datafusion]

2025-02-14 Thread via GitHub
getChan closed pull request #14651: WIP : create `datafusion-datasource-avro` crate URL: https://github.com/apache/datafusion/pull/14651

Re: [PR] feat: Add aggregate expression fuzz testing in CI [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1374: URL: https://github.com/apache/datafusion-comet/pull/1374#discussion_r1956385448 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -116,12 +116,49 @@ abstract class CometTestBase require(absTol > 0 && absTol <=

Re: [I] function: `array_prepend` sometimes doesn't work with nested lists [datafusion]

2025-02-14 Thread via GitHub
jkosh44 commented on issue #14613: URL: https://github.com/apache/datafusion/issues/14613#issuecomment-2659528279 I just noticed a pattern about the errors. In the `array_prepend` function, `FixedSizeList` is coerced to a `List`. So all of the queries that succeed are coerced to the same in

Re: [PR] feat: add resolved `target` to `DmlStatement` (to eliminate need for table lookup after deserialization) [datafusion]

2025-02-14 Thread via GitHub
milenkovicm commented on PR #14631: URL: https://github.com/apache/datafusion/pull/14631#issuecomment-2659498586 I will push those small changes later, and will have a look if tests can be improved. Issue I have is that we can have one table ref but different table source if I'm not mist

Re: [PR] Dataframe with_column and with_column_renamed performance improvements [datafusion]

2025-02-14 Thread via GitHub
timsaucer commented on PR #14653: URL: https://github.com/apache/datafusion/pull/14653#issuecomment-2659395911 I suspect you're right about that assumption not being correct. I've dug through a bit, but I'd probably need to write up a unit test to verify.

[PR] Improve docs `TableSource` and `DefaultTableSource` [datafusion]

2025-02-14 Thread via GitHub
alamb opened a new pull request, #14665: URL: https://github.com/apache/datafusion/pull/14665 ## Which issue does this PR close? - Part of #7013 ## Rationale for this change While reviewing https://github.com/apache/datafusion/pull/14631 from @milenkovicm I not

Re: [PR] Add union_extract scalar function [datafusion]

2025-02-14 Thread via GitHub
alamb merged PR #12116: URL: https://github.com/apache/datafusion/pull/12116

Re: [PR] perf: Use DataFusion FilterExec for experimental native scans [datafusion-comet]

2025-02-14 Thread via GitHub
alamb commented on PR #1395: URL: https://github.com/apache/datafusion-comet/pull/1395#issuecomment-2660157127 Reusing DataFusion operators for the win!

Re: [PR] fix: Reduce cast.rs and utils.rs logic from parquet_support.rs for experimental native scans [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#issuecomment-2660195143 @parthchandra any other comments? otherwise I can merge this

[PR] bug: fix offset type mismatch when prepending lists [datafusion]

2025-02-14 Thread via GitHub
friendlymatthew opened a new pull request, #14672: URL: https://github.com/apache/datafusion/pull/14672 Closes #14613 `array_prepend` would error when attempting to concatenate certain `List` data types due to an incorrect offset type assumption. The error occurs because the impleme

Re: [PR] Update EnforceSorting docs. [datafusion]

2025-02-14 Thread via GitHub
wiedld commented on code in PR #14673: URL: https://github.com/apache/datafusion/pull/14673#discussion_r1956718287 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -84,42 +84,56 @@ impl EnforceSorting { } } -/// This object is used within the [`EnforceS

Re: [PR] Simple Functions Preview [datafusion]

2025-02-14 Thread via GitHub
findepi commented on PR #14668: URL: https://github.com/apache/datafusion/pull/14668#issuecomment-2660287208 i now also have support for various numeric ```rust #[excalibur_function] fn add(a: i32, b: u32) -> i64 { a as i64 + b as i64 } ``` nullable fu

Re: [I] [DISCUSSION] 2025 Q1-Q2 Roadmap [datafusion]

2025-02-14 Thread via GitHub
comphead commented on issue #14580: URL: https://github.com/apache/datafusion/issues/14580#issuecomment-2660433774 I'll try to chase https://github.com/apache/datafusion/issues/13816 https://github.com/apache/datafusion/issues/14389

[PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-14 Thread via GitHub
simonvandel opened a new pull request, #14675: URL: https://github.com/apache/datafusion/pull/14675 ## Which issue does this PR close? N/A ## Rationale for this change It seems to be faster to generate random u128's in bulk, and then converting them to Uuids.
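The idea in the rationale above (draw random `u128` values in bulk, then turn each into a UUID) can be sketched as follows. This is not the code from the PR: the `XorShift64` generator is a small stand-in for a real random number generator so the example has no external dependencies, and all function names here are hypothetical. The bit masks set the version (4) and variant fields, which a "valid uuidv4" needs.

```rust
/// Tiny xorshift64* generator. Assumption: a stand-in for a real RNG,
/// used only to keep this sketch dependency-free; not for production use.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    fn next_u128(&mut self) -> u128 {
        ((self.next_u64() as u128) << 64) | self.next_u64() as u128
    }
}

/// Overwrite the version nibble (bits 76..=79) with 4 and the
/// variant bits (62..=63) with binary 10.
fn to_uuid_v4(raw: u128) -> u128 {
    let cleared = raw & !(0xF_u128 << 76) & !(0x3_u128 << 62);
    cleared | (0x4_u128 << 76) | (0x2_u128 << 62)
}

/// Format as the usual 8-4-4-4-12 lowercase hex string.
fn format_uuid(u: u128) -> String {
    let h = format!("{:032x}", u);
    format!(
        "{}-{}-{}-{}-{}",
        &h[0..8],
        &h[8..12],
        &h[12..16],
        &h[16..20],
        &h[20..32]
    )
}

/// Generate the raw values in one pass, then convert them in a second pass.
fn generate_uuids(n: usize, seed: u64) -> Vec<String> {
    // The xorshift state must be non-zero.
    let mut rng = XorShift64(seed.max(1));
    let raw: Vec<u128> = (0..n).map(|_| rng.next_u128()).collect();
    raw.into_iter().map(|r| format_uuid(to_uuid_v4(r))).collect()
}
```

Splitting generation from conversion keeps the random-number loop tight; whether that accounts for the speedup reported in the PR title is not something this sketch measures.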

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on code in PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398#discussion_r1956696728 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -37,7 +37,7 @@ trait DataTypeSupport { private def isGloballySupported(dt:

[PR] Update GitHub CI run image [datafusion]

2025-02-14 Thread via GitHub
findepi opened a new pull request, #14674: URL: https://github.com/apache/datafusion/pull/14674 GitHub runs include this warning The Ubuntu-20.04 brownout takes place from 2025-02-01. For more details, see https://github.com/actions/runner-images/issues/11101 Let's t

Re: [PR] test: Add experimental native scans to CometReadBenchmark [datafusion-comet]

2025-02-14 Thread via GitHub
mbutrovich commented on PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150#issuecomment-2660305494 > Sorry, one more question @mbutrovich do we need to add the DataFusion/IcebergCompat scans to `readerBenchmark` as well? That benchmark is more of a microbenchmark tha

Re: [PR] docs: Add changelog for 0.6.0 release [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove merged PR #1402: URL: https://github.com/apache/datafusion-comet/pull/1402

Re: [PR] feat: Add support for distinct aggregates [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove closed pull request #1261: feat: Add support for distinct aggregates URL: https://github.com/apache/datafusion-comet/pull/1261

[PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove opened a new pull request, #1404: URL: https://github.com/apache/datafusion-comet/pull/1404 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#discussion_r1956753958 ## dev/diffs/3.4.3.diff: ## @@ -7,7 +7,7 @@ index d3544881af1..26ab186c65d 100644 2.5.1 2.0.8 +3.4 -+0.5.0-SNAPSHOT Review Comment:

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#discussion_r1956760901 ## dev/diffs/3.4.3.diff: ## @@ -7,7 +7,7 @@ index d3544881af1..26ab186c65d 100644 2.5.1 2.0.8 +3.4 -+0.5.0-SNAPSHOT Review Comment:

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398#discussion_r1956762065 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -37,7 +37,7 @@ trait DataTypeSupport { private def isGloballySupported(dt: DataTyp

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#discussion_r1956760033 ## dev/diffs/3.4.3.diff: ## @@ -7,7 +7,7 @@ index d3544881af1..26ab186c65d 100644 2.5.1 2.0.8 +3.4 -+0.5.0-SNAPSHOT Review Comment:

Re: [PR] Minor: remove confusing `update_plan_from_children` call from `EnforceSorting` [datafusion]

2025-02-14 Thread via GitHub
alamb commented on code in PR #14650: URL: https://github.com/apache/datafusion/pull/14650#discussion_r1956785686 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -45,7 +45,7 @@ use itertools::izip; pub type OrderPreservation

Re: [PR] Minor: remove confusing `update_plan_from_children` call from `EnforceSorting` [datafusion]

2025-02-14 Thread via GitHub
alamb commented on PR #14650: URL: https://github.com/apache/datafusion/pull/14650#issuecomment-2660389460 FYI @ozankabak and @berkaysynnada

Re: [PR] Improve EnforceSorting docs. [datafusion]

2025-02-14 Thread via GitHub
alamb commented on code in PR #14673: URL: https://github.com/apache/datafusion/pull/14673#discussion_r1956781210 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -151,10 +165,51 @@ fn update_coalesce_ctx_children( }; } -/// The boolean flag `repartitio

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
codecov-commenter commented on PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#issuecomment-2660420263 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1404?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#issuecomment-2660546336 #1389 mentioned https://github.com/apache/datafusion-comet/blob/f099e6e40aa18441c7882e5bffd9d6dfb10c6c19/spark/src/main/scala/org/apache/comet/CometSparkSessionExtens

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-14 Thread via GitHub
comphead commented on code in PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#discussion_r1956937048 ## native/core/src/execution/planner.rs: ## @@ -1155,12 +1154,9 @@ impl PhysicalPlanner { )) }); -let

Re: [PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-14 Thread via GitHub
comphead commented on PR #14675: URL: https://github.com/apache/datafusion/pull/14675#issuecomment-2660554270 @simonvandel I'd like to ask you to create an slt test for UUID(). I know the output is not deterministic, but I suppose we can check that it is in valid v4 format.
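
A format check of the kind suggested in this comment could look roughly like the following Python sketch. It is not an slt test and is not taken from the PR; the names `UUID_V4_RE` and `is_uuid_v4` are hypothetical. It only assumes the canonical textual layout of a version-4 UUID: the version nibble is `4` and the variant nibble is one of `8`, `9`, `a`, `b`.

```python
import re
import uuid

# Canonical 8-4-4-4-12 hex layout with the v4 version and variant nibbles fixed.
UUID_V4_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}"
)


def is_uuid_v4(text: str) -> bool:
    """Return True if text is a canonically formatted version-4 UUID."""
    # fullmatch rejects any leading or trailing characters around the UUID.
    return UUID_V4_RE.fullmatch(text.lower()) is not None
```

Because the values are random, a test can assert only the shape of each result, not a specific value.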

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2660771854 > > Thank you @kazuyukitanimura for the PR. I applied the PR to try to fix the test, but the test above still fails for me; I am not sure if I am missing something. >

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-14 Thread via GitHub
Weijun-H commented on code in PR #14411: URL: https://github.com/apache/datafusion/pull/14411#discussion_r1957051847 ## datafusion/physical-plan/src/repartition/on_demand_repartition.rs: ## @@ -0,0 +1,1589 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957051337 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -302,31 +299,16 @@ impl ExternalSorter { } self.reserve_memory_for_merge()?; -

[I] Feature request: hermetic build [datafusion]

2025-02-14 Thread via GitHub
dentiny opened a new issue, #14678: URL: https://github.com/apache/datafusion/issues/14678 ### Is your feature request related to a problem or challenge? When I was building datafusion for the first time (with `cargo test`), I met an error: ``` error: failed to run custom build
