Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
aectaan commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2735534960 That's interesting, same request performed via datafusion-cli returns without error that test have -- This is an automated message from the Apache Git Service. To respond to t

[PR] perf: Reuse row converter during sort [datafusion]

2025-03-18 Thread via GitHub
2010YOUY01 opened a new pull request, #15302: URL: https://github.com/apache/datafusion/pull/15302 ## Which issue does this PR close? This is a refactor towards https://github.com/apache/datafusion/issues/14748 and https://github.com/apache/datafusion/issues/7053 ##

[PR] fix: check if handle has been initialized before closing [datafusion-comet]

2025-03-18 Thread via GitHub
wForget opened a new pull request, #1554: URL: https://github.com/apache/datafusion-comet/pull/1554 ## Which issue does this PR close? Part of #1553 ## Rationale for this change Check that the handle is initialized before closing to avoid native thread panic.

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002526236 ## datafusion/expr-common/src/statistics.rs: ## @@ -203,6 +203,121 @@ impl Distribution { }; Ok(dt) } + +/// Merges two distributions

Re: [PR] docs: various improvements to tuning guide [datafusion-comet]

2025-03-18 Thread via GitHub
kazuyukitanimura commented on code in PR #1525: URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2002490500 ## docs/source/user-guide/tuning.md: ## @@ -17,18 +17,96 @@ specific language governing permissions and limitations under the License. --> -# Tuni

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002503571 ## datafusion/expr-common/src/statistics.rs: ## @@ -203,6 +203,121 @@ impl Distribution { }; Ok(dt) } + +/// Merges two distributions

Re: [PR] docs: various improvements to tuning guide [datafusion-comet]

2025-03-18 Thread via GitHub
kazuyukitanimura commented on code in PR #1525: URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2002493113 ## docs/source/user-guide/tuning.md: ## @@ -17,18 +17,96 @@ specific language governing permissions and limitations under the License. --> -# Tuni

Re: [PR] docs: various improvements to tuning guide [datafusion-comet]

2025-03-18 Thread via GitHub
kazuyukitanimura commented on code in PR #1525: URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2002475656 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -274,11 +272,9 @@ object CometConf extends ShimCometConf { .createWithDefault(

Re: [I] Enable hdfs test(s) in ci [datafusion-comet]

2025-03-18 Thread via GitHub
wForget commented on issue #1515: URL: https://github.com/apache/datafusion-comet/issues/1515#issuecomment-2735439696 > but it seems that some native threads are hung, causing jvm process to be unable to exit. After removing `try_spawn_blocking` feature, it works fine. ![Imag

Re: [I] Parse MySQL `ALGORITHM` and `LOCK` options to `ALTER TABLE` [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
iffyio closed issue #1665: Parse MySQL `ALGORITHM` and `LOCK` options to `ALTER TABLE` URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1665

Re: [PR] minor: fix `data/sqlite` link [datafusion]

2025-03-18 Thread via GitHub
Weijun-H commented on code in PR #15286: URL: https://github.com/apache/datafusion/pull/15286#discussion_r2000123298 ## datafusion/sqllogictest/README.md: ## @@ -28,7 +28,7 @@ This crate is a submodule of DataFusion that contains an implementation of [sqll ## Overview This

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-18 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2735413403 cc @alamb

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-18 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2735408031 Tomorrow I plan on doing some tracer bullet testing to see if this approach works at all.

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-18 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2735401711 Inspired by discussion in https://github.com/apache/datafusion/pull/13054 I went with adding this to `ExecutionPlan`.

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
kosiew commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002421418 ## datafusion/expr-common/src/statistics.rs: ## @@ -203,6 +203,121 @@ impl Distribution { }; Ok(dt) } + +/// Merges two distributions int

[PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-18 Thread via GitHub
adriangb opened a new pull request, #15301: URL: https://github.com/apache/datafusion/pull/15301 Closes #15037

[I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
aectaan opened a new issue, #15291: URL: https://github.com/apache/datafusion/issues/15291 ### Describe the bug The DataFusion optimizer behaves differently depending on the types of the arguments in the request. The behaviour also depends on the positions of the arguments. ### To Reproduce

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-18 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2735316127 Made a bit of progress on this... I think the general idea of sharing the state is there. The nice thing is that this mechanism can be used to push down other dynamic filters (joins

Re: [PR] Support logic optimize rule to pass the case that Utf8view datatype combined with Utf8 datatype [datafusion]

2025-03-18 Thread via GitHub
zhuqi-lucas commented on PR #15239: URL: https://github.com/apache/datafusion/pull/15239#issuecomment-2735274216 Thank you @alamb and @xudong963 for review!

Re: [I] Support default values for columns in SchemaAdapter [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15220: URL: https://github.com/apache/datafusion/issues/15220#issuecomment-2729446009 > From conversation with Andrew a couple days ago he mentioned this was an open feature request however I could not find an issue. @alamb do you remember who else was asking for t

[I] Unsupported OS/arch [datafusion-comet]

2025-03-18 Thread via GitHub
jinwenjie123 opened a new issue, #1552: URL: https://github.com/apache/datafusion-comet/issues/1552 ### What is the problem the feature request solves? I am running a benchmark to evaluate the performance of Spark 3.4 with Comet in an AWS environment. I built the Comet JAR on my local

Re: [I] Support `merge` for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15290: URL: https://github.com/apache/datafusion/issues/15290#issuecomment-2732107381 > I'm working on the ticket: https://github.com/apache/datafusion/issues/10316. > Create a function that combines their statistical properties into a new distribution. The

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
2010YOUY01 commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2002365042 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,419 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAcc

Re: [I] fixed size list type is not retained when writing to parquet [datafusion-python]

2025-03-18 Thread via GitHub
kosiew commented on issue #957: URL: https://github.com/apache/datafusion-python/issues/957#issuecomment-2735264159 I tested this with the current `main` branch and roundtrip schema is returning `array: fixed_size_list[2]` too. ``` >>> import datafusion as df >>> import pyarrow

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-03-18 Thread via GitHub
shehabgamin commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2735261314 I feel like this may be important enough to try to get into the release. Does anyone else have thoughts? https://github.com/apache/datafusion/issues/15174

Re: [PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-18 Thread via GitHub
codecov-commenter commented on PR #1550: URL: https://github.com/apache/datafusion-comet/pull/1550#issuecomment-2735173935 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1550?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Improve representation of `LIKE ALL` and variants [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
mvzink commented on issue #1770: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1770#issuecomment-2734987324 Also note that `LIKE ANY` etc. are handled with a special `any` flag on `Expr::Like`. It would also make things more uniform to do the same with `all`, but one of the

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub
kazuyukitanimura commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r2001465890 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -61,13 +61,15 @@ object QueryPlanSerde extends Logging with ShimQueryPlanS

Re: [PR] Use `any` instead of `for_each` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on PR #15289: URL: https://github.com/apache/datafusion/pull/15289#issuecomment-2732041851 > Instead of either "try for all" or "skip at all", isn't better to only go over the columns which has statistics.is_some() ? Do you mean the following? ```rust // Inst

[PR] wip: Update benchmark results for 0.7.0 release [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove opened a new pull request, #1548: URL: https://github.com/apache/datafusion-comet/pull/1548 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
UBarney commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2002297890 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,424 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccumu

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on PR #15296: URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2735210417 > I think eventually it would be nice to add some tests for this code Yes, as the ticket description said: I'll do it after we are consistent.

Re: [I] [DISCUSS] Release DataFusion `46.0.1` Patch or `46.1.0` minor release (March 2025) [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on issue #15151: URL: https://github.com/apache/datafusion/issues/15151#issuecomment-2735217322 Just a reminder: we can do a final release today. It seems to require a PMC member to do the last steps. cc @alamb.

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002299377 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distr

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002335941 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distr

Re: [I] Investigate TPC-H q4 hanging when not enough memory is allocated [datafusion-comet]

2025-03-18 Thread via GitHub
Kontinuation commented on issue #1523: URL: https://github.com/apache/datafusion-comet/issues/1523#issuecomment-2735209828 The query blocked because we don't have enough blocking threads configured for the tokio runtime. In the merge phase, each spill file will be wrapped by a

Re: [I] [EPIC] Attach `Diagnostic` to more errors [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on issue #14429: URL: https://github.com/apache/datafusion/issues/14429#issuecomment-2735199806 Hi, is this still a potential GSoC project? It looks like many of the tickets in this epic have an open pull request and are close to completion. If you know of any other areas t

Re: [I] [EPIC] A collection of tickets for improved WASM support in DataFusion [datafusion]

2025-03-18 Thread via GitHub
matthewmturner commented on issue #13815: URL: https://github.com/apache/datafusion/issues/13815#issuecomment-2735185316 For the WASM UDFs they just need some more real world testing / benchmarking. To be honest, the other points @alamb mentioned would probably better benefit the DataFusio

Re: [PR] CI Red: Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jonahgao merged PR #15300: URL: https://github.com/apache/datafusion/pull/15300

Re: [PR] CI Red: Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jonahgao commented on code in PR #15300: URL: https://github.com/apache/datafusion/pull/15300#discussion_r2002313134 ## datafusion/sqllogictest/test_files/union.slt: ## @@ -907,11 +907,56 @@ SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y; 20 40 +quer

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002309255 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distr

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
Omega359 commented on code in PR #59: URL: https://github.com/apache/datafusion-site/pull/59#discussion_r2002097220 ## content/blog/2025-03-11-ordering-analysis.md: ## @@ -291,6 +291,53 @@ Following third and fourth constraints for the simplified table, the succinct va `[time_

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002296888 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distr

[PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-18 Thread via GitHub
comphead opened a new pull request, #1550: URL: https://github.com/apache/datafusion-comet/pull/1550 ## Which issue does this PR close? Part of #1454 . ## Rationale for this change Enable Iceberg compat type tests, add more tests for complex types ## What

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-18 Thread via GitHub
UBarney commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2002243574 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,424 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccumu

Re: [PR] Support binary temporal arithmetic with integers [datafusion]

2025-03-18 Thread via GitHub
github-actions[bot] commented on PR #13741: URL: https://github.com/apache/datafusion/pull/13741#issuecomment-2735135388 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Always add round robin repartitioning to leaves (data sources), benefitting unbalanced / small datasets [datafusion]

2025-03-18 Thread via GitHub
github-actions[bot] commented on PR #13707: URL: https://github.com/apache/datafusion/pull/13707#issuecomment-2735135422 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-18 Thread via GitHub
qazxcdswe123 commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2735134447 @alamb On head commit. ```csv col_int32,col_int64,col_uint32,col_utf8 1,1,0,a 2,2,1,b 3,3,2,c 4,4,3,d 5,5,4,e 6,6,5,f 7,7,6,g 8,8,7,h 9,9

Re: [PR] Improve collection during repr and repr_html [datafusion-python]

2025-03-18 Thread via GitHub
kosiew commented on code in PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036#discussion_r2002243419 ## python/tests/test_dataframe.py: ## @@ -1191,13 +1192,17 @@ def add_with_parameter(df_internal, value: Any) -> DataFrame: def test_dataframe_repr_html(df)

Re: [PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on code in PR #1550: URL: https://github.com/apache/datafusion-comet/pull/1550#discussion_r2002258552 ## spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala: ## @@ -63,4 +61,67 @@ class CometNativeReaderSuite extends CometTestBase with Ada

Re: [I] feat: fix schema issues for `native reader - read STRUCT of ARRAY fields` [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on issue #1551: URL: https://github.com/apache/datafusion-comet/issues/1551#issuecomment-2735107639 Both datafusion and iceberg compat fails on ``` native reader - read STRUCT of ARRAY fields - native_datafusion *** FAILED *** (191 milliseconds) org.apache.sp

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2025-03-18 Thread via GitHub
parthchandra commented on PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#issuecomment-2735075699 > @parthchandra Consulting a question: In the current compilation script `dev/release/build-release-comet.sh`, the final invocation of the compilation command is `core-amd64-l

Re: [PR] docs: various improvements to tuning guide [datafusion-comet]

2025-03-18 Thread via GitHub
parthchandra commented on code in PR #1525: URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2002219014 ## docs/source/user-guide/tuning.md: ## @@ -141,30 +191,22 @@ It must be set before the Spark context is created. You can enable or disable Co at runtim

Re: [PR] feat: simplify regex wildcard pattern [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on code in PR #15299: URL: https://github.com/apache/datafusion/pull/15299#discussion_r2002218628 ## datafusion/optimizer/src/simplify_expressions/regex.rs: ## @@ -43,6 +45,23 @@ pub fn simplify_regex_expr( let mode = OperatorMode::new(&op); if l

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
akurmustafa commented on PR #59: URL: https://github.com/apache/datafusion-site/pull/59#issuecomment-2734727273 > Thanks @akurmustafa > > I noticed some small format issues when reviewing this so I pushed [e10c17f](https://github.com/apache/datafusion-site/commit/e10c17f0f99b83d8e15cd

Re: [PR] Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on code in PR #15300: URL: https://github.com/apache/datafusion/pull/15300#discussion_r2002213580 ## datafusion/sqllogictest/test_files/union.slt: ## @@ -907,11 +907,56 @@ SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y; 20 40 +qu

Re: [PR] Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on code in PR #15300: URL: https://github.com/apache/datafusion/pull/15300#discussion_r2002215012 ## datafusion/sqllogictest/test_files/union.slt: ## @@ -907,11 +907,56 @@ SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y; 20 40 +qu

Re: [I] Doc: Add an example how to test Comet in K8s against user define query [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on issue #1546: URL: https://github.com/apache/datafusion-comet/issues/1546#issuecomment-2734719437 Apache Spark also relies on the SparkPI example in its documentation: https://spark.apache.org/docs/3.5.4/running-on-kubernetes.html

Re: [PR] Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on code in PR #15300: URL: https://github.com/apache/datafusion/pull/15300#discussion_r2002212215 ## datafusion/sqllogictest/test_files/union.slt: ## @@ -907,11 +907,56 @@ SELECT * FROM (SELECT y FROM u1 UNION ALL SELECT y FROM u2) ORDER BY y; 20 40 +qu

[PR] Fix union in view table test [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 opened a new pull request, #15300: URL: https://github.com/apache/datafusion/pull/15300 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [PR] Remove inline table scan analyzer rule [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 merged PR #15201: URL: https://github.com/apache/datafusion/pull/15201

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2002037023 ## datafusion/expr-common/src/statistics.rs: ## @@ -857,6 +857,143 @@ pub fn compute_variance( ScalarValue::try_from(target_type) } +/// Merges two distribut

Re: [I] Migrate datasource tests to `insta` [datafusion]

2025-03-18 Thread via GitHub
sreshu commented on issue #15246: URL: https://github.com/apache/datafusion/issues/15246#issuecomment-2734911692 take

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
Omega359 commented on code in PR #59: URL: https://github.com/apache/datafusion-site/pull/59#discussion_r2002096546 ## content/blog/2025-03-11-ordering-analysis.md: ## @@ -291,6 +291,53 @@ Following third and fourth constraints for the simplified table, the succinct va `[time_

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001378037 ## datafusion/core/tests/parquet/schema.rs: ## @@ -82,7 +69,18 @@ async fn schema_merge_ignores_metadata_by_default() { .unwrap(); let actual = df.col

[PR] feat: simplify regex wildcard pattern [datafusion]

2025-03-18 Thread via GitHub
waynexia opened a new pull request, #15299: URL: https://github.com/apache/datafusion/pull/15299 ## Which issue does this PR close? - Closes #. ## Rationale for this change Simplify dump regex cases like `~ '.*'` or `!~ '.*'`. ## What changes are in

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001384839 ## datafusion/core/tests/sql/path_partition.rs: ## @@ -275,18 +267,7 @@ async fn csv_filter_with_file_col() -> Result<()> { .collect() .await?;

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001386223 ## datafusion/core/tests/sql/path_partition.rs: ## @@ -390,15 +349,7 @@ async fn csv_grouping_by_partition() -> Result<()> { .collect() .await?;

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001376856 ## datafusion/core/tests/parquet/custom_reader.rs: ## @@ -96,17 +97,15 @@ async fn route_data_access_ops_to_parquet_file_reader_factory() { let task_ctx = ses

Re: [PR] Add GLOBAL context/modifier to SET statements [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1767: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1767#discussion_r2001991786 ## src/ast/mod.rs: ## @@ -7919,11 +7921,28 @@ impl fmt::Display for ContextModifier { write!(f, "") }

[I] Improve representation of `LIKE ALL` and variants [datafusion-sqlparser-rs]

2025-03-18 Thread via GitHub
mvzink opened a new issue, #1770: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1770 Currently, expressions such as `... LIKE ALL(...)` are parsed as an `Expr::Like` with the `pattern` being an `Expr::Function` with a `name` of `"ALL"`. It seems preferable to parse them as a

Re: [I] [EPIC] Redesign DataFusion main page [datafusion]

2025-03-18 Thread via GitHub
sreshu commented on issue #14389: URL: https://github.com/apache/datafusion/issues/14389#issuecomment-2734913948 Are you thinking of more modifications, @alamb?

Re: [PR] Remove inline table scan analyzer rule [datafusion]

2025-03-18 Thread via GitHub
jayzhan211 commented on PR #15201: URL: https://github.com/apache/datafusion/pull/15201#issuecomment-2734924833 Thanks @alamb

Re: [I] Add Memory Profiling Functionality [datafusion]

2025-03-18 Thread via GitHub
comphead commented on issue #14510: URL: https://github.com/apache/datafusion/issues/14510#issuecomment-2734639666 The PR in arrow-rs to avoid memory overcounting for shared buffers: https://github.com/apache/arrow-rs/pull/7303

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
alamb commented on PR #59: URL: https://github.com/apache/datafusion-site/pull/59#issuecomment-2734731781 > Thanks @alamb, I was working on adding the example you gave ("DataFusion can find / use orderings based on query intermediates"). Should we add this to the document? What do you think?

Re: [I] Blog for DataFusion 46.0.0 [datafusion]

2025-03-18 Thread via GitHub
alamb commented on issue #15053: URL: https://github.com/apache/datafusion/issues/15053#issuecomment-2734564305 Thanks @berkaysynnada In general I suggest emphasizing things that many users of the crate will see / appreciate and mentioning, but not too deeply, things that developers

Re: [PR] docs: Use a shallow clone for Spark SQL test instructions [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove merged PR #1547: URL: https://github.com/apache/datafusion-comet/pull/1547

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r2002089283 ## native/core/src/execution/planner.rs: ## @@ -3004,4 +3006,130 @@ mod tests { type_info: None, } } + +#[test] +fn test_c

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub
comphead merged PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#issuecomment-2734862629 Thanks everyone

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
akurmustafa commented on PR #59: URL: https://github.com/apache/datafusion-site/pull/59#issuecomment-2734859868 With the [commit](https://github.com/apache/datafusion-site/pull/59/commits/85eea6a572f95972a155ee9926319112e7149ce8), I have added @alamb's suggestion to the post.

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
Omega359 commented on code in PR #59: URL: https://github.com/apache/datafusion-site/pull/59#discussion_r2002099018 ## content/images/ordering_analysis/query_window_plan.png: ## Review Comment: At the output of the window function the table has the ordering:

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub
andygrove commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r2002090347 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -763,7 +766,8 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wi

Re: [I] Doc: Add an example how to test Comet in K8s against user defined query [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on issue #1546: URL: https://github.com/apache/datafusion-comet/issues/1546#issuecomment-2734792143 The Spark-PI example is based on an in-memory RDD, so Comet Scan cannot be tested; we need to prepare our own test case which reads data from a local source and check the logs conta

Re: [PR] Simplify display format of `AggregateFunctionExpr`, add `Expr::sql_name` [datafusion]

2025-03-18 Thread via GitHub
alamb commented on code in PR #15253: URL: https://github.com/apache/datafusion/pull/15253#discussion_r2001842580 ## datafusion/expr/src/expr.rs: ## @@ -64,6 +64,15 @@ use sqlparser::ast::{ /// /// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt /// +/// # Printing Expre

Re: [I] Build failure in flight_sql.rs [datafusion-ballista]

2025-03-18 Thread via GitHub
ahmedriza commented on issue #895: URL: https://github.com/apache/datafusion-ballista/issues/895#issuecomment-2734822308 > It looks like this issue has been fixed, is it ok to close this issue? Sure, thanks

Re: [PR] Blog post on Parquet pruning in datafusion [datafusion-site]

2025-03-18 Thread via GitHub
comphead commented on code in PR #60: URL: https://github.com/apache/datafusion-site/pull/60#discussion_r2002075851 ## content/blog/2025-03-18-parquet-pruning.md: ## @@ -0,0 +1,111 @@ +--- +layout: post +title: Parquet pruning in DataFusion: Read Only What Matters +date: 2025-03

Re: [PR] Blog post on Parquet pruning in datafusion [datafusion-site]

2025-03-18 Thread via GitHub
comphead commented on code in PR #60: URL: https://github.com/apache/datafusion-site/pull/60#discussion_r2002069929 ## content/blog/2025-03-18-parquet-pruning.md: ## @@ -0,0 +1,111 @@ +--- +layout: post +title: Parquet pruning in DataFusion: Read Only What Matters +date: 2025-03

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-03-18 Thread via GitHub
shehabgamin commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2002067067 ## datafusion/spark/src/function/math/expm1.rs: ## @@ -0,0 +1,169 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[PR] doc: Renew `kubernetes.md` [datafusion-comet]

2025-03-18 Thread via GitHub
comphead opened a new pull request, #1549: URL: https://github.com/apache/datafusion-comet/pull/1549 ## Which issue does this PR close? Related to #1546. ## Rationale for this change ## What changes are included in this PR? ## How are these

Re: [PR] Add WITH ORDER example to blog post [datafusion-site]

2025-03-18 Thread via GitHub
akurmustafa commented on PR #59: URL: https://github.com/apache/datafusion-site/pull/59#issuecomment-2734756045 > > Thanks @alamb, I was working on adding the example you gave ("DataFusion can find / use orderings based on query intermediates"). Should we add this to the document? What do yo

Re: [PR] feat: add read array support [datafusion-comet]

2025-03-18 Thread via GitHub
comphead commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r2001739415 ## native/core/Cargo.toml: ## @@ -77,6 +77,7 @@ jni = { version = "0.21", features = ["invocation"] } lazy_static = "1.4" assertables = "7" hex = "0.4.3" +

Re: [PR] fix: `core_expressions` feature flag broken, move `overlay` into `core` functions [datafusion]

2025-03-18 Thread via GitHub
alamb commented on PR #15217: URL: https://github.com/apache/datafusion/pull/15217#issuecomment-2734530255 > hey @alamb, I have already added a re-export at the end of `datafusion/functions/src/string/overlay.rs` like this Thanks @shruti2522 - that looks good to me I double ch

Re: [PR] Migrate tests to insta [datafusion]

2025-03-18 Thread via GitHub
jsai28 commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2001388172 ## datafusion/core/tests/sql/select.rs: ## @@ -114,33 +72,7 @@ async fn test_prepare_statement() -> Result<()> { let dataframe = dataframe.with_param_values(pa
