Re: [PR] [SPARK-49927][SS][PYTHON][TESTS][FOLLOW-UP] Fixes `q.lastProgress.batchId` to `q.lastProgress.progress.batchId` [spark]

2024-10-10 Thread via GitHub
HyukjinKwon closed pull request #48419: [SPARK-49927][SS][PYTHON][TESTS][FOLLOW-UP] Fixes `q.lastProgress.batchId` to `q.lastProgress.progress.batchId` URL: https://github.com/apache/spark/pull/48419 -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [PR] [SPARK-49927][SS][PYTHON][TESTS][FOLLOW-UP] Fixes `q.lastProgress.batchId` to `q.lastProgress.progress.batchId` [spark]

2024-10-10 Thread via GitHub
HyukjinKwon commented on PR #48419: URL: https://github.com/apache/spark/pull/48419#issuecomment-2406663394 I am going to merge this to fix up the build. Merged to master.

[PR] [SPARK-49927][SS][PYTHON][TESTS][FOLLOW-UP] Fixes `q.lastProgress.batchId` to `q.lastProgress.progress.batchId` [spark]

2024-10-10 Thread via GitHub
HyukjinKwon opened a new pull request, #48419: URL: https://github.com/apache/spark/pull/48419 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/48414 that fixes `q.lastProgress.batchId` -> `q.lastProgress.progress.batch

Re: [PR] [SPARK-48155][FOLLOWUP][SQL] AQEPropagateEmptyRelation for left anti join should check if remain child is just BroadcastQueryStageExec [spark]

2024-10-10 Thread via GitHub
zml1206 commented on code in PR #48300: URL: https://github.com/apache/spark/pull/48300#discussion_r1796485180 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelation.scala: ## @@ -111,7 +111,8 @@ abstract class PropagateEmptyRelationBase ex

Re: [PR] [SPARK-48155][FOLLOWUP][SQL] AQEPropagateEmptyRelation for left anti join should check if remain child is just BroadcastQueryStageExec [spark]

2024-10-10 Thread via GitHub
zml1206 commented on code in PR #48300: URL: https://github.com/apache/spark/pull/48300#discussion_r1796483876 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -2829,6 +2829,38 @@ class AdaptiveQueryExecSuite assert(fi

Re: [PR] [SPARK-48155][FOLLOWUP][SQL] AQEPropagateEmptyRelation for left anti join should check if remain child is just BroadcastQueryStageExec [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48300: URL: https://github.com/apache/spark/pull/48300#discussion_r1796469594 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -2829,6 +2829,38 @@ class AdaptiveQueryExecSuite assert(

Re: [PR] [SPARK-48155][FOLLOWUP][SQL] AQEPropagateEmptyRelation for left anti join should check if remain child is just BroadcastQueryStageExec [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48300: URL: https://github.com/apache/spark/pull/48300#discussion_r1796468910 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelation.scala: ## @@ -111,7 +111,8 @@ abstract class PropagateEmptyRelationBase

Re: [PR] [SPARK-49569][BUILD][FOLLOWUP] Exclude `spark-connect-shims` from `sql/core` module [spark]

2024-10-10 Thread via GitHub
LuciferYang commented on PR #48403: URL: https://github.com/apache/spark/pull/48403#issuecomment-2406607712 > FWIW, I think it's gonna fix https://github.com/apache/spark/actions/runs/11259624487/job/31309026637 too https://github.com/user-attachments/assets/1c6dcc25-ad23-43e9-b90b-7c

Re: [PR] [SPARK-48922][SQL] Optimize nested data type insertion performance [spark]

2024-10-10 Thread via GitHub
pan3793 commented on PR #47381: URL: https://github.com/apache/spark/pull/47381#issuecomment-2406604186 Is the fix proposed here partially fixed by SPARK-49352? also cc @viirya

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
yaooqinn commented on code in PR #48395: URL: https://github.com/apache/spark/pull/48395#discussion_r1796446276 ## sql/core/src/test/resources/sql-tests/analyzer-results/null-handling.sql.out: ## @@ -69,6 +69,24 @@ Project [a#x, (b#x + c#x) AS (b + c)#x] +- Relation spark_ca

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
yaooqinn commented on code in PR #48395: URL: https://github.com/apache/spark/pull/48395#discussion_r1796444211 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReorderAssociativeOperatorSuite.scala: ## @@ -74,4 +74,33 @@ class ReorderAssociativeOperatorSui

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
yaooqinn commented on code in PR #48395: URL: https://github.com/apache/spark/pull/48395#discussion_r1796443103 ## sql/core/src/test/resources/sql-tests/analyzer-results/null-handling.sql.out: ## @@ -69,6 +69,24 @@ Project [a#x, (b#x + c#x) AS (b + c)#x] +- Relation spark_ca

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48395: URL: https://github.com/apache/spark/pull/48395#discussion_r1796429306 ## sql/core/src/test/resources/sql-tests/analyzer-results/null-handling.sql.out: ## @@ -69,6 +69,24 @@ Project [a#x, (b#x + c#x) AS (b + c)#x] +- Relation spark_c

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
yaooqinn commented on code in PR #48395: URL: https://github.com/apache/spark/pull/48395#discussion_r1796428537 ## sql/core/src/test/resources/sql-tests/analyzer-results/null-handling.sql.out: ## @@ -69,6 +69,24 @@ Project [a#x, (b#x + c#x) AS (b + c)#x] +- Relation spark_ca

Re: [PR] [SPARK-49615] Bugfix: Make ML column schema validation conforms with spark config `spark.sql.caseSensitive`. [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48398: URL: https://github.com/apache/spark/pull/48398#discussion_r1796426992 ## mllib/src/main/scala/org/apache/spark/ml/util/SchemaUtils.scala: ## @@ -213,11 +216,17 @@ private[spark] object SchemaUtils { */ def getSchemaField(schema:
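The behaviour SPARK-49615 asks for, resolving an ML column name according to `spark.sql.caseSensitive`, can be sketched as below. This is a hypothetical Python helper (`get_schema_field` and its `case_sensitive` flag are assumptions for illustration), not the `SchemaUtils.getSchemaField` change under review.

```python
def get_schema_field(field_names, name, case_sensitive):
    """Return the schema field name that matches `name`.

    Hypothetical sketch: exact match when case_sensitive is True,
    otherwise a case-insensitive match that rejects ambiguous names.
    """
    if case_sensitive:
        matches = [f for f in field_names if f == name]
    else:
        matches = [f for f in field_names if f.lower() == name.lower()]
    if not matches:
        raise KeyError(f"Field {name!r} does not exist")
    if len(matches) > 1:
        # e.g. columns "a" and "A" both match "a" when case-insensitive
        raise KeyError(f"Field {name!r} is ambiguous: {matches}")
    return matches[0]


print(get_schema_field(["features", "label"], "Label", case_sensitive=False))  # label
```

Raising on ambiguity, rather than silently picking the first match, mirrors how a case-insensitive resolver generally has to treat names such as `a` and `A` in the same schema.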

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48395: URL: https://github.com/apache/spark/pull/48395#discussion_r1796426171 ## sql/core/src/test/resources/sql-tests/analyzer-results/null-handling.sql.out: ## @@ -69,6 +69,24 @@ Project [a#x, (b#x + c#x) AS (b + c)#x] +- Relation spark_c

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48395: URL: https://github.com/apache/spark/pull/48395#discussion_r1796425812 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReorderAssociativeOperatorSuite.scala: ## @@ -74,4 +74,33 @@ class ReorderAssociativeOperatorSu

Re: [PR] [SPARK-49829][SS] Revise the optimization on adding input to state store in stream-stream join (correctness fix) [spark]

2024-10-10 Thread via GitHub
neilramaswamy commented on code in PR #48297: URL: https://github.com/apache/spark/pull/48297#discussion_r1796403387 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala: ## @@ -668,13 +668,38 @@ case class StreamingSymmetricHa

Re: [PR] [SPARK-49930] Ensure that socket updates are flushed on exception from the python worker [spark]

2024-10-10 Thread via GitHub
anishshri-db commented on PR #48418: URL: https://github.com/apache/spark/pull/48418#issuecomment-2406548634 cc - @HeartSaVioR @WweiL - PTAL, thx !

[PR] [SPARK-49930] Ensure that socket updates are flushed on exception from the python worker [spark]

2024-10-10 Thread via GitHub
anishshri-db opened a new pull request, #48418: URL: https://github.com/apache/spark/pull/48418 ### What changes were proposed in this pull request? Ensure that socket updates are flushed on exception from the python worker ### Why are the changes needed? Without this, update
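The general pattern behind "flush socket updates even when the worker raises" is a `try`/`finally` around the work, sketched below. `run_worker` is a hypothetical name, and an `io.BufferedWriter` over `io.BytesIO` stands in for the worker's buffered socket stream; this is not the code from PR #48418.

```python
import io


def run_worker(outfile, handler):
    """Run handler(outfile), flushing outfile even if handler raises.

    Hypothetical sketch: `outfile` stands in for the Python worker's
    buffered socket stream.
    """
    try:
        handler(outfile)
    finally:
        # Without this flush, bytes written before the exception could stay
        # in the buffer and never reach the reader on the other end.
        outfile.flush()


raw = io.BytesIO()
out = io.BufferedWriter(raw)


def failing_handler(f):
    f.write(b"state-update")
    raise RuntimeError("boom")


try:
    run_worker(out, failing_handler)
except RuntimeError:
    pass

print(raw.getvalue())  # b'state-update'
```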

Re: [PR] [SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches [spark]

2024-10-10 Thread via GitHub
xinrong-meng commented on PR #48180: URL: https://github.com/apache/spark/pull/48180#issuecomment-2406544218 ``` [info] - interrupt all - background queries, foreground interrupt *** FAILED *** (20 seconds, 50 milliseconds) [info] The code passed to eventually never returned normally

Re: [PR] [SPARK-49928][PYTHON][TESTS] Refactor plot-related unit tests [spark]

2024-10-10 Thread via GitHub
xinrong-meng commented on PR #48415: URL: https://github.com/apache/spark/pull/48415#issuecomment-2406542424 We may later port those expected_fig_data dictionaries to a separate JSON file for easier auditing if the number of tests increases
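Keeping expected figure data in JSON and checking only the key attributes of each plot could look roughly like this. `check_fig_data`, the JSON layout, and the attribute names are assumptions for illustration, not the helpers in `test_frame_plot_plotly.py`.

```python
import json


def check_fig_data(fig_data, expected):
    """Check that every key in `expected` matches the same key in fig_data.

    Hypothetical sketch: only the attributes listed in `expected` are
    compared, so each plot type can declare its own key attributes.
    """
    for key, value in expected.items():
        actual = fig_data.get(key)
        assert actual == value, f"{key}: expected {value!r}, got {actual!r}"


# The expected data could live in a separate JSON file; an inline string stands in here.
EXPECTED_JSON = '{"bar": {"type": "bar", "orientation": "v"}}'
expected_fig_data = json.loads(EXPECTED_JSON)

fig_data = {"type": "bar", "orientation": "v", "name": "sales"}
check_fig_data(fig_data, expected_fig_data["bar"])  # passes; "name" is not checked
```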

Re: [PR] [SPARK-49928][PYTHON][TESTS] Refactor plot-related unit tests [spark]

2024-10-10 Thread via GitHub
xinrong-meng commented on PR #48415: URL: https://github.com/apache/spark/pull/48415#issuecomment-2406540695 cc @zhengruifeng @HyukjinKwon would you please review thanks!

Re: [PR] [SPARK-49928][PYTHON][TESTS] Refactor plot-related unit tests [spark]

2024-10-10 Thread via GitHub
xinrong-meng commented on code in PR #48415: URL: https://github.com/apache/spark/pull/48415#discussion_r1796413066 ## python/pyspark/sql/tests/plot/test_frame_plot_plotly.py: ## @@ -48,79 +48,174 @@ def sdf3(self): columns = ["sales", "signups", "visits", "date"]

Re: [PR] [SPARK-49928][PYTHON][TESTS] Refactor plot-related unit tests [spark]

2024-10-10 Thread via GitHub
xinrong-meng commented on PR #48415: URL: https://github.com/apache/spark/pull/48415#issuecomment-2406535987 Irrelevant tests failed, retriggering: ``` ERROR [3.661s]: test_listener_events (pyspark.sql.tests.streaming.test_streaming_listener.StreamingListenerTests.test_listener_events)

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
yaooqinn commented on code in PR #48395: URL: https://github.com/apache/spark/pull/48395#discussion_r1796387845 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -260,19 +260,32 @@ object ReorderAssociativeOperator extends Rule[Logi
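The general idea of folding literals and dropping identity elements (`0` for `+`, `1` for `*`) in a flattened associative expression can be shown with the toy Python model below. The representation (ints as literals, strings as columns) and `reorder` are assumptions; this is not the Catalyst rule in `expressions.scala`, and the excerpt does not show how the PR treats zeros. The sketch deliberately keeps a `0` in a product, because in SQL `NULL * 0` is `NULL`, so a zero literal cannot simply absorb a nullable operand.

```python
import operator
from functools import reduce

IDENTITY = {"+": 0, "*": 1}
OPS = {"+": operator.add, "*": operator.mul}


def reorder(op, operands):
    """Fold integer literals in a flattened associative expression.

    Hypothetical sketch: `operands` mixes ints (literals) and strings
    (column names). Literals are folded into one constant, which is
    dropped when it equals the operator's identity element.
    """
    literals = [x for x in operands if isinstance(x, int)]
    others = [x for x in operands if not isinstance(x, int)]
    if not literals:
        return others
    folded = reduce(OPS[op], literals)
    if folded == IDENTITY[op] and others:
        return others  # e.g. a + 1 + b + (-1)  ->  a + b
    return others + [folded]


print(reorder("+", ["a", 1, "b", -1]))  # ['a', 'b']
print(reorder("*", ["a", 0]))           # ['a', 0]  (zero is kept, see above)
```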

Re: [PR] [SPARK-49569][BUILD][FOLLOWUP] Exclude `spark-connect-shims` from `sql/core` module [spark]

2024-10-10 Thread via GitHub
HyukjinKwon commented on PR #48403: URL: https://github.com/apache/spark/pull/48403#issuecomment-2406490863 Yeah, I think it's gonna fix https://github.com/apache/spark/actions/runs/11259624487/job/31309026637 too

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
yaooqinn commented on code in PR #48395: URL: https://github.com/apache/spark/pull/48395#discussion_r1796381306 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -260,19 +260,32 @@ object ReorderAssociativeOperator extends Rule[Logi

Re: [PR] [SPARK-48567][PYTHON][TESTS][FOLLOW-UP] Make the query scope higher so finally can access to it [spark]

2024-10-10 Thread via GitHub
HyukjinKwon closed pull request #48417: [SPARK-48567][PYTHON][TESTS][FOLLOW-UP] Make the query scope higher so finally can access to it URL: https://github.com/apache/spark/pull/48417

Re: [PR] [SPARK-48567][PYTHON][TESTS][FOLLOW-UP] Make the query scope higher so finally can access to it [spark]

2024-10-10 Thread via GitHub
HyukjinKwon commented on PR #48417: URL: https://github.com/apache/spark/pull/48417#issuecomment-2406489014 Merged to master.

Re: [PR] [SPARK-48567][PYTHON][TESTS][FOLLOW-UP] Make the query scope higher so finally can access to it [spark]

2024-10-10 Thread via GitHub
HyukjinKwon commented on PR #48417: URL: https://github.com/apache/spark/pull/48417#issuecomment-2406488944 Will merge this to fix up the build. Otherwise, I will revert this and https://github.com/apache/spark/commit/2af653688c20dde87eebaa6bd4dc21123fab74cc if it still fails.

Re: [PR] [SPARK-49748][CORE][FOLLOWUP] Add `getCondition` and deprecate `getErrorClass` in `QueryCompilationErrorsSuite` [spark]

2024-10-10 Thread via GitHub
panbingkun commented on PR #48416: URL: https://github.com/apache/spark/pull/48416#issuecomment-2406488740 cc @MaxGekk

Re: [PR] [SPARK-49748][CORE][FOLLOWUP] Add `getCondition` and deprecate `getErrorClass` in `QueryCompilationErrorsSuite` [spark]

2024-10-10 Thread via GitHub
panbingkun commented on PR #48416: URL: https://github.com/apache/spark/pull/48416#issuecomment-2406488645 I'm not sure if there will be similar uses in the future, because in our code, it's not mandatory not to use `getErrorClass`.

Re: [PR] [SPARK-49748][CORE][FOLLOWUP] Add `getCondition` and deprecate `getErrorClass` in `QueryCompilationErrorsSuite` [spark]

2024-10-10 Thread via GitHub
panbingkun commented on code in PR #48416: URL: https://github.com/apache/spark/pull/48416#discussion_r1796379400 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala: ## @@ -1003,7 +1003,7 @@ class QueryCompilationErrorsSuite val exc

Re: [PR] [SPARK-49569][BUILD][FOLLOWUP] Exclude `spark-connect-shims` from `sql/core` module [spark]

2024-10-10 Thread via GitHub
LuciferYang commented on PR #48403: URL: https://github.com/apache/spark/pull/48403#issuecomment-2406473170 After merging this PR, Maven daily test has been restored: - Java 17: https://github.com/apache/spark/actions/runs/11274643792 https://github.com/user-attachments/assets/5

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48395: URL: https://github.com/apache/spark/pull/48395#discussion_r1796369501 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -260,19 +260,32 @@ object ReorderAssociativeOperator extends Rule[Log

[PR] [SPARK-49748][CORE][FOLLOWUP] Add `getCondition` and deprecate `getErrorClass` [spark]

2024-10-10 Thread via GitHub
panbingkun opened a new pull request, #48416: URL: https://github.com/apache/spark/pull/48416 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was t
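Adding a new accessor and keeping the old one as a deprecated alias usually follows the pattern below. This is a Python analogue with a hypothetical class (`SparkErrorLike` is an assumption); the PR itself touches Scala code, whose exact deprecation mechanics are not shown in the excerpt.

```python
import warnings


class SparkErrorLike(Exception):
    """Hypothetical error type exposing getCondition plus a deprecated alias."""

    def __init__(self, condition):
        super().__init__(condition)
        self._condition = condition

    def getCondition(self):
        return self._condition

    def getErrorClass(self):
        # Deprecated alias kept for compatibility; delegates to getCondition.
        warnings.warn(
            "getErrorClass is deprecated; use getCondition instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.getCondition()


err = SparkErrorLike("DIVIDE_BY_ZERO")
print(err.getCondition())  # DIVIDE_BY_ZERO
```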

Re: [PR] [SPARK-48881][SQL] Some dynamic partitions can be compensated to specific partition values [spark]

2024-10-10 Thread via GitHub
fusheng-rd commented on PR #47418: URL: https://github.com/apache/spark/pull/47418#issuecomment-2406448334 Please help review it when you have free time, thanks! @ulysses-you cc @cloud-fan

Re: [PR] [SPARK-49915][SQL] Handle zeros and ones in ReorderAssociativeOperator [spark]

2024-10-10 Thread via GitHub
yaooqinn commented on PR #48395: URL: https://github.com/apache/spark/pull/48395#issuecomment-2406409291 cc @cloud-fan @dongjoon-hyun thanks

Re: [PR] [SPARK-48155][FOLLOWUP][SQL] AQEPropagateEmptyRelation for left anti join should check if remain child is just BroadcastQueryStageExec [spark]

2024-10-10 Thread via GitHub
zml1206 commented on PR #48300: URL: https://github.com/apache/spark/pull/48300#issuecomment-2406402629 cc @cloud-fan Can you help take a look, thanks.

Re: [PR] [SPARK-49558][SQL] Add SQL pipe syntax for LIMIT/OFFSET and ORDER/SORT/CLUSTER/DISTRIBUTE BY [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48413: URL: https://github.com/apache/spark/pull/48413#discussion_r1796325607 ## sql/core/src/test/resources/sql-tests/results/pipe-operators.sql.out: ## @@ -1673,6 +1691,279 @@ org.apache.spark.sql.catalyst.ExtendedAnalysisException } +--

Re: [PR] [SPARK-49558][SQL] Add SQL pipe syntax for LIMIT/OFFSET and ORDER/SORT/CLUSTER/DISTRIBUTE BY [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48413: URL: https://github.com/apache/spark/pull/48413#discussion_r1796324933 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -1010,21 +1018,40 @@ class AstBuilder extends DataTypeAstBuilder //

Re: [PR] [SPARK-49558][SQL] Add SQL pipe syntax for LIMIT/OFFSET and ORDER/SORT/CLUSTER/DISTRIBUTE BY [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48413: URL: https://github.com/apache/spark/pull/48413#discussion_r1796324638 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -1010,21 +1018,40 @@ class AstBuilder extends DataTypeAstBuilder //

Re: [PR] [SPARK-49558][SQL] Add SQL pipe syntax for LIMIT/OFFSET and ORDER/SORT/CLUSTER/DISTRIBUTE BY [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48413: URL: https://github.com/apache/spark/pull/48413#discussion_r1796323209 ## sql/core/src/test/resources/sql-tests/results/pipe-operators.sql.out: ## @@ -1673,6 +1691,279 @@ org.apache.spark.sql.catalyst.ExtendedAnalysisException } +--

Re: [PR] [SPARK-49558][SQL] Add SQL pipe syntax for LIMIT/OFFSET and ORDER/SORT/CLUSTER/DISTRIBUTE BY [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48413: URL: https://github.com/apache/spark/pull/48413#discussion_r1796322574 ## sql/core/src/test/resources/sql-tests/inputs/pipe-operators.sql: ## @@ -571,6 +583,95 @@ table t table t |> union all table st; +-- Sorting and repartitioning

[PR] [SPARK-49928][PYTHON][TESTS] Refactor plot-related unit tests [spark]

2024-10-10 Thread via GitHub
xinrong-meng opened a new pull request, #48415: URL: https://github.com/apache/spark/pull/48415 ### What changes were proposed in this pull request? Refactor plot-related unit tests. ### Why are the changes needed? Different plots have different key attributes of the resulting fi

Re: [PR] [SPARK-49927][SS] pyspark.sql.tests.streaming.test_streaming_listener to wait longer [spark]

2024-10-10 Thread via GitHub
HyukjinKwon closed pull request #48414: [SPARK-49927][SS] pyspark.sql.tests.streaming.test_streaming_listener to wait longer URL: https://github.com/apache/spark/pull/48414

Re: [PR] [SPARK-49927][SS] pyspark.sql.tests.streaming.test_streaming_listener to wait longer [spark]

2024-10-10 Thread via GitHub
HyukjinKwon commented on PR #48414: URL: https://github.com/apache/spark/pull/48414#issuecomment-2406380975 Merged to master.

Re: [PR] [SPARK-49927][SS] pyspark.sql.tests.streaming.test_streaming_listener to wait longer [spark]

2024-10-10 Thread via GitHub
HyukjinKwon commented on code in PR #48414: URL: https://github.com/apache/spark/pull/48414#discussion_r1796294336 ## python/pyspark/sql/tests/streaming/test_streaming_listener.py: ## @@ -381,7 +381,8 @@ def verify(test_listener): .start() )

Re: [PR] [SPARK-49925][SQL] Add tests for order by with collated strings [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on PR #48412: URL: https://github.com/apache/spark/pull/48412#issuecomment-2406319893 Can you re-trigger Github Action jobs?

Re: [PR] [SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches [spark]

2024-10-10 Thread via GitHub
xinrong-meng commented on PR #48180: URL: https://github.com/apache/spark/pull/48180#issuecomment-2406309866 Retriggered irrelevant tests

Re: [PR] [SPARK-49924][SQL] Keep `containsNull` after `ArrayCompact` replacement [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48410: URL: https://github.com/apache/spark/pull/48410#discussion_r1796273894 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerSuite.scala: ## @@ -313,4 +313,23 @@ class OptimizerSuite extends PlanTest { as

Re: [PR] [SPARK-49766][SQL] Codegen Support for `json_array_length` (by `Invoke` & `RuntimeReplaceable`) [spark]

2024-10-10 Thread via GitHub
panbingkun commented on code in PR #48224: URL: https://github.com/apache/spark/pull/48224#discussion_r1796272621 ## sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/json/JsonExpressionUtils.java: ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] [SPARK-49808][SQL] Fix a deadlock in subquery execution due to lazy vals [spark]

2024-10-10 Thread via GitHub
zhengruifeng commented on code in PR #48391: URL: https://github.com/apache/spark/pull/48391#discussion_r1796265683 ## core/src/main/scala/org/apache/spark/util/Lazy.scala: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contrib
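In Scala 2, initializing a `lazy val` synchronizes on the enclosing instance, so two objects whose lazy vals depend on each other can deadlock when initialized from different threads. One common way around this is to give each lazily computed value its own lock. The `Lazy.scala` added by the PR is not shown here, so the Python class below is only an analogue built on that assumption; `Lazy` and `get` are illustrative names.

```python
import threading


class Lazy:
    """Compute a value at most once, guarded by a lock private to this value.

    Hypothetical sketch: because the lock belongs to the Lazy instance rather
    than the enclosing object, initializing one lazy value does not block
    threads that touch other members of the same object.
    """

    _UNSET = object()

    def __init__(self, initializer):
        self._initializer = initializer
        self._lock = threading.Lock()
        self._value = Lazy._UNSET

    def get(self):
        if self._value is Lazy._UNSET:
            with self._lock:
                # Re-check under the lock: another thread may have won the race.
                if self._value is Lazy._UNSET:
                    self._value = self._initializer()
                    self._initializer = None  # drop references held by the closure
        return self._value


calls = []
lazy = Lazy(lambda: calls.append(1) or 42)  # records each call, returns 42
threads = [threading.Thread(target=lazy.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(lazy.get(), len(calls))  # 42 1
```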

Re: [PR] [SPARK-48694][CORE]Manage memory used by external cache [spark]

2024-10-10 Thread via GitHub
github-actions[bot] closed pull request #47067: [SPARK-48694][CORE]Manage memory used by external cache URL: https://github.com/apache/spark/pull/47067

Re: [PR] [DO-NOT-MERGE][DRAFT] Extract shared DataFrameSuite for Spark Connect [spark]

2024-10-10 Thread via GitHub
github-actions[bot] commented on PR #47174: URL: https://github.com/apache/spark/pull/47174#issuecomment-2406280309 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [DO-NOT-MERGE] Structured logging style [spark]

2024-10-10 Thread via GitHub
github-actions[bot] commented on PR #47182: URL: https://github.com/apache/spark/pull/47182#issuecomment-2406280284 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48667][PYTHON] Arrow python UDFS didn't support UDT as outputType [spark]

2024-10-10 Thread via GitHub
github-actions[bot] closed pull request #47036: [SPARK-48667][PYTHON] Arrow python UDFS didn't support UDT as outputType URL: https://github.com/apache/spark/pull/47036

Re: [PR] [SPARK-48669][K8S] K8s resource name prefix follows `DNS Subdomain Names` rule [spark]

2024-10-10 Thread via GitHub
github-actions[bot] closed pull request #47039: [SPARK-48669][K8S] K8s resource name prefix follows `DNS Subdomain Names` rule URL: https://github.com/apache/spark/pull/47039

Re: [PR] [SPARK-48696][SQL][CONNECT] Also truncate the schema row for show function [spark]

2024-10-10 Thread via GitHub
github-actions[bot] closed pull request #47078: [SPARK-48696][SQL][CONNECT] Also truncate the schema row for show function URL: https://github.com/apache/spark/pull/47078

Re: [PR] [SPARK-48739][SQL] Disable writing collated data to file formats that don't support them in non managed tables [spark]

2024-10-10 Thread via GitHub
github-actions[bot] closed pull request #47127: [SPARK-48739][SQL] Disable writing collated data to file formats that don't support them in non managed tables URL: https://github.com/apache/spark/pull/47127

Re: [PR] [SPARK-48750][SQL] AQEPropagateEmptyRelation convert broadcast query stage plan to empty relation causing error [spark]

2024-10-10 Thread via GitHub
github-actions[bot] commented on PR #47158: URL: https://github.com/apache/spark/pull/47158#issuecomment-2406280340 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[PR] [SPARK-49927][SS] pyspark.sql.tests.streaming.test_streaming_listener to wait longer [spark]

2024-10-10 Thread via GitHub
siying opened a new pull request, #48414: URL: https://github.com/apache/spark/pull/48414 ### What changes were proposed in this pull request? In test pyspark.sql.tests.streaming.test_streaming_listener, instead of waiting for fixed 10 seconds, we wait for progress made.
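Waiting for progress instead of sleeping a fixed 10 seconds is usually done with a polling helper like the one below. `wait_until` is a hypothetical name, not the code in `test_streaming_listener.py`; in a streaming test the condition would be something like "the query has reported at least one progress update".

```python
import time


def wait_until(condition, timeout=60.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns True on success and False on timeout. Hypothetical helper,
    not the implementation from the PR.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check so a condition that became true right at the deadline counts.
    return bool(condition())
```

Compared with a fixed sleep, this returns as soon as the condition holds on a fast machine, while a generous `timeout` still tolerates slow CI runners.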

Re: [PR] [SPARK-49808][SQL] Fix a deadlock in subquery execution due to lazy vals [spark]

2024-10-10 Thread via GitHub
JoshRosen commented on code in PR #48391: URL: https://github.com/apache/spark/pull/48391#discussion_r1796194610 ## core/src/main/scala/org/apache/spark/util/Lazy.scala: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributo

Re: [PR] [SPARK-49808][SQL] Fix a deadlock in subquery execution due to lazy vals [spark]

2024-10-10 Thread via GitHub
JoshRosen commented on code in PR #48391: URL: https://github.com/apache/spark/pull/48391#discussion_r1796190664 ## core/src/main/scala/org/apache/spark/util/Lazy.scala: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributo

Re: [PR] [SPARK-19846][SQL] Add a flag to disable constraint propagation [spark]

2024-10-10 Thread via GitHub
ahshahid commented on PR #17186: URL: https://github.com/apache/spark/pull/17186#issuecomment-2406168502 @xyxiaoyou : for your reference: take a look at [https://issues.apache.org/jira/browse/SPARK-33152](https://issues.apache.org/jira/browse/SPARK-33152) and corresponding PR ( though it

Re: [PR] [SPARK-49902][SQL] Catch underlying runtime errors in RegExpReplace [spark]

2024-10-10 Thread via GitHub
harshmotw-db commented on code in PR #48379: URL: https://github.com/apache/spark/pull/48379#discussion_r1796173788 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -700,7 +701,14 @@ case class RegExpReplace(subject: Express

[PR] Add dataframe PrintSchema and schema.TreeString [spark-connect-go]

2024-10-10 Thread via GitHub
magpierre opened a new pull request, #80: URL: https://github.com/apache/spark-connect-go/pull/80 Added the skeleton for dataframe.PrintSchema() and schema.TreeString() that can be extended with functionality pertaining to nested dataTypes once they become available. The feature has
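Spark's `printSchema` output starts with a `root` line followed by one ` |-- name: type (nullable = ...)` line per top-level field. A flat-schema version of such a tree string can be sketched as follows; it is written in Python rather than Go, `tree_string` and the tuple-based field representation are assumptions, and nested types are left out, as in the skeleton the PR describes.

```python
def tree_string(fields):
    """Render (name, type, nullable) tuples in a printSchema-style layout.

    Hypothetical sketch for flat schemas only; nested types are not handled.
    """
    lines = ["root"]
    for name, data_type, nullable in fields:
        lines.append(f" |-- {name}: {data_type} (nullable = {str(nullable).lower()})")
    return "\n".join(lines) + "\n"


print(tree_string([("id", "long", False), ("name", "string", True)]), end="")
# root
#  |-- id: long (nullable = false)
#  |-- name: string (nullable = true)
```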

Re: [PR] [SPARK-49723][SQL] Add Variant metrics to the JSON File Scan node [spark]

2024-10-10 Thread via GitHub
gene-db commented on code in PR #48172: URL: https://github.com/apache/spark/pull/48172#discussion_r1796151027 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java: ## @@ -53,17 +53,21 @@ public VariantBuilder(boolean allowDuplicateKeys) { public

Re: [PR] [SPARK-49723][SQL] Add Variant metrics to the JSON File Scan node [spark]

2024-10-10 Thread via GitHub
harshmotw-db commented on code in PR #48172: URL: https://github.com/apache/spark/pull/48172#discussion_r1796071250 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala: ## @@ -72,9 +74,30 @@ case class PartitionedFile( } } +/** + * Class

Re: [PR] [SPARK-49547][SQL][PYTHON] Support returning iterator of RecordBatches in applyInArrow [spark]

2024-10-10 Thread via GitHub
Kimahriman commented on PR #48038: URL: https://github.com/apache/spark/pull/48038#issuecomment-2405924462 Gentle ping @zhengruifeng @HyukjinKwon

Re: [PR] [SPARK-49411][SS] Communicate State Store Checkpoint ID between driver and stateful operators [spark]

2024-10-10 Thread via GitHub
09306677806 commented on PR #47895: URL: https://github.com/apache/spark/pull/47895#issuecomment-2405884753 bc1qqeysv50ayq0az93s5frfvtf2fe6rt5tfkdfx2y #*Shahrzadmahro# On Thursday, October 10, 2024, 22:04, Burak Yavuz ***@***.***> wrote: > ***@***. requested changes on

Re: [PR] [SPARK-49411][SS] Communicate State Store Checkpoint ID between driver and stateful operators [spark]

2024-10-10 Thread via GitHub
brkyvz commented on code in PR #47895: URL: https://github.com/apache/spark/pull/47895#discussion_r1795913662 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala: ## @@ -126,6 +129,7 @@ class StateStoreRDD[T: ClassTag, U: ClassTag](

Re: [PR] [SPARK-49558][SQL] Add SQL pipe syntax for LIMIT/OFFSET and ORDER/SORT/CLUSTER/DISTRIBUTE BY [spark]

2024-10-10 Thread via GitHub
dtenedor commented on PR #48413: URL: https://github.com/apache/spark/pull/48413#issuecomment-2405602311 cc @cloud-fan @gengliangwang this is the PR to support LIMIT/OFFSET + sorting. There are a few more changes in the `AstBuilder` for this one but still contained only in the parser.

[PR] [SPARK-49558][SQL] Add SQL pipe syntax for LIMIT/OFFSET and ORDER/SORT/CLUSTER/DISTRIBUTE BY [spark]

2024-10-10 Thread via GitHub
dtenedor opened a new pull request, #48413: URL: https://github.com/apache/spark/pull/48413 ### What changes were proposed in this pull request? This PR adds SQL pipe syntax support for LIMIT/OFFSET and ORDER/SORT/CLUSTER/DISTRIBUTE BY. For example: ``` CREATE TABLE t

Re: [PR] [SPARK-49249][SPARK-49122] Artifact isolation in Spark Classic [spark]

2024-10-10 Thread via GitHub
xupefei commented on code in PR #48120: URL: https://github.com/apache/spark/pull/48120#discussion_r1775548583 ## sql/core/src/main/scala/org/apache/spark/sql/artifact/ArtifactManager.scala: ## @@ -67,12 +67,18 @@ class ArtifactManager(session: SparkSession) extends Logging {

Re: [PR] [SPARK-49569][BUILD][FOLLOWUP] Exclude `spark-connect-shims` from `sql/core` module [spark]

2024-10-10 Thread via GitHub
LuciferYang commented on PR #48403: URL: https://github.com/apache/spark/pull/48403#issuecomment-2405449605 Merged into master. Thanks @hvanhovell and @HyukjinKwon

Re: [PR] [SPARK-49569][BUILD][FOLLOWUP] Exclude `spark-connect-shims` from `sql/core` module [spark]

2024-10-10 Thread via GitHub
LuciferYang closed pull request #48403: [SPARK-49569][BUILD][FOLLOWUP] Exclude `spark-connect-shims` from `sql/core` module URL: https://github.com/apache/spark/pull/48403

Re: [PR] [SPARK-49925][SQL] Add tests for order by with collated strings [spark]

2024-10-10 Thread via GitHub
ilicmarkodb commented on code in PR #48412: URL: https://github.com/apache/spark/pull/48412#discussion_r1795643488 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1101,6 +1101,259 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSpar

Re: [PR] [SPARK-49925][SQL] Add tests for order by with collated strings [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48412: URL: https://github.com/apache/spark/pull/48412#discussion_r1795476158 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1101,6 +1101,259 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkP

Re: [PR] [SPARK-49925][SQL] Add tests for order by with collated strings [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48412: URL: https://github.com/apache/spark/pull/48412#discussion_r1795468153 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1101,6 +1101,259 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkP

Re: [PR] [SPARK-49925][SQL] Add tests for order by with collated strings [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on code in PR #48412: URL: https://github.com/apache/spark/pull/48412#discussion_r1795466130 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1101,6 +1101,259 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkP

Re: [PR] [SPARK-49916][SQL] Throw appropriate Exception for type mismatch between ColumnType and data type in some rows [spark]

2024-10-10 Thread via GitHub
MaxGekk commented on code in PR #48397: URL: https://github.com/apache/spark/pull/48397#discussion_r1795415992 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -606,6 +606,12 @@ ], "sqlState" : "42711" }, + "COLUMN_ARRAY_ELEMENT_TYPE_MISMATCH"

Re: [PR] [SPARK-49711][SQL] Remove ExperimentalMethods [spark]

2024-10-10 Thread via GitHub
hvanhovell commented on code in PR #48390: URL: https://github.com/apache/spark/pull/48390#discussion_r1795392918 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -204,17 +204,6 @@ class SparkSession private( */ def listenerManager: ExecutionListe

Re: [PR] [SPARK-49920][INFRA] Install `R` for `ubuntu 24.04` when GA run `k8s-integration-tests` [spark]

2024-10-10 Thread via GitHub
LuciferYang commented on PR #48406: URL: https://github.com/apache/spark/pull/48406#issuecomment-2405015303 Merged into master. Thanks @panbingkun @HyukjinKwon

Re: [PR] [SPARK-49920][INFRA] Install `R` for `ubuntu 24.04` when GA run `k8s-integration-tests` [spark]

2024-10-10 Thread via GitHub
LuciferYang closed pull request #48406: [SPARK-49920][INFRA] Install `R` for `ubuntu 24.04` when GA run `k8s-integration-tests` URL: https://github.com/apache/spark/pull/48406

Re: [PR] [SPARK-49920][INFRA] Install `R` for `ubuntu 24.04` when GA run `k8s-integration-tests` [spark]

2024-10-10 Thread via GitHub
LuciferYang commented on PR #48406: URL: https://github.com/apache/spark/pull/48406#issuecomment-2405012974 https://github.com/user-attachments/assets/c0f6ac74-abd3-49a0-aab0-f50c92574a80 all tests passed

[PR] Doc: Fix required java versions for spark 3.5 [spark]

2024-10-10 Thread via GitHub
dvorst opened a new pull request, #48411: URL: https://github.com/apache/spark/pull/48411 The original description "PySpark requires Java 8 or later" is incorrect, since 3.5 no longer supports Java 8 and the latest supported version is 17; the download page, however, does correctly s

Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]

2024-10-10 Thread via GitHub
chris-twiner commented on PR #34558: URL: https://github.com/apache/spark/pull/34558#issuecomment-2404981056 > @Kimahriman just out of curiosity, how much did the performance improve? I just wanted to add to the above response that I've implemented a compilation scheme [here](https:/

Re: [PR] [SPARK-49924][SQL] Fix `containsNull` of `ArrayCompact` [spark]

2024-10-10 Thread via GitHub
zhengruifeng commented on code in PR #48410: URL: https://github.com/apache/spark/pull/48410#discussion_r1795330652 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerSuite.scala: ## @@ -313,4 +313,23 @@ class OptimizerSuite extends PlanTest {

Re: [PR] [SPARK-49924][SQL] Fix `containsNull` of `ArrayCompact` [spark]

2024-10-10 Thread via GitHub
zhengruifeng commented on code in PR #48410: URL: https://github.com/apache/spark/pull/48410#discussion_r1795330189 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerSuite.scala: ## @@ -313,4 +313,23 @@ class OptimizerSuite extends PlanTest {

[PR] [SPARK-49924][SQL] Fix `containsNull` of `ArrayCompact` [spark]

2024-10-10 Thread via GitHub
zhengruifeng opened a new pull request, #48410: URL: https://github.com/apache/spark/pull/48410 ### What changes were proposed in this pull request? Fix `containsNull` of `ArrayCompact`, by adding a new expression `KnownNotContainsNull` ### Why are the changes needed? ht

Re: [PR] [SPARK-49756][SQL] Postgres dialect supports pushdown datetime functions. [spark]

2024-10-10 Thread via GitHub
cloud-fan closed pull request #48210: [SPARK-49756][SQL] Postgres dialect supports pushdown datetime functions. URL: https://github.com/apache/spark/pull/48210

Re: [PR] [SPARK-49756][SQL] Postgres dialect supports pushdown datetime functions. [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on PR #48210: URL: https://github.com/apache/spark/pull/48210#issuecomment-2404890325 thanks, merging to master!

Re: [PR] [SPARK-49816][SQL] Should only update out-going-ref-count for referenced outer CTE relation [spark]

2024-10-10 Thread via GitHub
cloud-fan commented on PR #48284: URL: https://github.com/apache/spark/pull/48284#issuecomment-2404886221 Yes please, we can discuss more on your PR later.

[PR] [SPARK-49922][BUILD] Upgrade `sbt-assembly` to `2.3.0` [spark]

2024-10-10 Thread via GitHub
panbingkun opened a new pull request, #48409: URL: https://github.com/apache/spark/pull/48409 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How
