Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-02-28 Thread via GitHub
beliefer commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1975469190 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,84 @@ class MySQLIntegrationSuite exte

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-02-28 Thread via GitHub
sunxiaoguang commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1975473641 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,84 @@ class MySQLIntegrationSuite

Re: [PR] [SPARK-51341][CORE] Cancel time task with suitable way. [spark]

2025-02-28 Thread via GitHub
srowen commented on code in PR #50107: URL: https://github.com/apache/spark/pull/50107#discussion_r1975478508 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -169,6 +171,7 @@ private[spark] class TaskSchedulerImpl( protected val executorIdToHo

[PR] [SPARK-51357][SQL] Preserve plan change logging level for views [spark]

2025-02-28 Thread via GitHub
vladimirg-db opened a new pull request, #50118: URL: https://github.com/apache/spark/pull/50118 ### What changes were proposed in this pull request? Preserve plan change logging level for views: - `spark.sql.planChangeLog.level` - `spark.sql.expressionTreeChangeLog.level`

Re: [PR] [SPARK-51354][K8S][TEST][DOC] Fix sbt K8s integration test arg javaImageTag [spark]

2025-02-28 Thread via GitHub
dongjoon-hyun commented on code in PR #50113: URL: https://github.com/apache/spark/pull/50113#discussion_r1975674932 ## project/SparkBuild.scala: ## @@ -1002,10 +1002,9 @@ object KubernetesIntegrationTests { if (excludeTags.exists(_.equalsIgnoreCase("r"))) {

Re: [PR] [SPARK-51347] Enable Ingress and Service Support for Spark Driver [spark-kubernetes-operator]

2025-02-28 Thread via GitHub
jiangzho commented on PR #159: URL: https://github.com/apache/spark-kubernetes-operator/pull/159#issuecomment-2691300205 Thanks for the reminder! This is a result of local squashing, I'll fix this for following commits - will still use `jiang...@umich.edu` -- This is an automated messag

Re: [PR] [SPARK-50639][SQL] Improve warning logging in CacheManager [spark]

2025-02-28 Thread via GitHub
vrozov commented on PR #49276: URL: https://github.com/apache/spark/pull/49276#issuecomment-2691136056 @gengliangwang ?

Re: [PR] [SPARK-50714][SQL][SS] Enable schema evolution for TransformWithState when Avro encoding is used [spark]

2025-02-28 Thread via GitHub
HyukjinKwon commented on code in PR #49277: URL: https://github.com/apache/spark/pull/49277#discussion_r1975249747 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -1294,6 +1310,167 @@ def test_transform_with_state_with_timers_single_partition(self)

[PR] [MINOR][DOCS] Fix code comments for Executor [spark]

2025-02-28 Thread via GitHub
tomscut opened a new pull request, #50116: URL: https://github.com/apache/spark/pull/50116 ### Why are the changes needed? After SPARK-23429, the method Executor#startDriverHeartbeat has been removed. We should update the code comment. ### Does this PR introduce any user-fa

[PR] [SPARK-51355][DOCS] Add a brief explanation for Spark Connect at PySpark Overview page [spark]

2025-02-28 Thread via GitHub
HyukjinKwon opened a new pull request, #50117: URL: https://github.com/apache/spark/pull/50117 ### What changes were proposed in this pull request? This PR adds a brief explanation of Spark Connect to the PySpark Overview page ![Screenshot 2025-02-28 at 7 35 07  PM](https://githu

Re: [PR] [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common [spark]

2025-02-28 Thread via GitHub
vicennial commented on code in PR #49971: URL: https://github.com/apache/spark/pull/49971#discussion_r1975270283 ## pom.xml: ## @@ -860,6 +861,56 @@ ${protobuf.version} provided + Review Comment: That makes a lot of sense, thank you for the ex

[PR] [WIP][SPARK-49488][SQL][FOLLOWUP] Use correct mysql datetime fields when pushing down EXTRACT [spark]

2025-02-28 Thread via GitHub
beliefer opened a new pull request, #50112: URL: https://github.com/apache/spark/pull/50112 ### What changes were proposed in this pull request? ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? Yes, query result is co

Re: [PR] [SPARK-51354][K8S][TEST][DOC] Fix sbt K8s integration test arg javaImageTag [spark]

2025-02-28 Thread via GitHub
pan3793 commented on PR #50113: URL: https://github.com/apache/spark/pull/50113#issuecomment-2690138050 cc @dongjoon-hyun @yaooqinn

[PR] [BUILD] Test slf4j 2.0.17 [spark]

2025-02-28 Thread via GitHub
LuciferYang opened a new pull request, #50115: URL: https://github.com/apache/spark/pull/50115 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51316][PYTHON][FOLLOW-UP] Revert unrelated changes and mark mapInPandas/mapInArrow batched in byte size [spark]

2025-02-28 Thread via GitHub
HyukjinKwon commented on PR #50111: URL: https://github.com/apache/spark/pull/50111#issuecomment-2690398551 Merged to master and branch-4.0.

Re: [PR] [SPARK-51316][PYTHON][FOLLOW-UP] Revert unrelated changes and mark mapInPandas/mapInArrow batched in byte size [spark]

2025-02-28 Thread via GitHub
HyukjinKwon closed pull request #50111: [SPARK-51316][PYTHON][FOLLOW-UP] Revert unrelated changes and mark mapInPandas/mapInArrow batched in byte size URL: https://github.com/apache/spark/pull/50111

Re: [PR] [SPARK-51342][SQL] Add `TimeType` [spark]

2025-02-28 Thread via GitHub
MaxGekk commented on PR #50103: URL: https://github.com/apache/spark/pull/50103#issuecomment-2690413145 @LuciferYang @yaooqinn @HeartSaVioR Could you review the PR, please.

Re: [PR] [SPARK-51289][SQL] Throw a proper error message for not fully implemented `SQLTableFunction` [spark]

2025-02-28 Thread via GitHub
wayneguow commented on code in PR #50073: URL: https://github.com/apache/spark/pull/50073#discussion_r1975385058 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSQLFunctionNode.scala: ## @@ -34,8 +36,8 @@ object EliminateSQLFunctionNode extends Ru

Re: [PR] [SPARK-49756][SQL][FOLLOWUP] Use correct pgsql datetime fields when pushing down EXTRACT [spark]

2025-02-28 Thread via GitHub
cloud-fan commented on PR #50101: URL: https://github.com/apache/spark/pull/50101#issuecomment-2689792141 The streaming test failure is unrelated. Thanks for the review; merging to master/4.0!

Re: [PR] [SPARK-51351][SS] Do not materialize the output in Python worker for TWS [spark]

2025-02-28 Thread via GitHub
HeartSaVioR commented on PR #50110: URL: https://github.com/apache/spark/pull/50110#issuecomment-2690564346 https://github.com/HeartSaVioR/spark/actions/runs/13583292499/job/37988558904 One module failed due to heap space during compilation. I'm rerunning it.

Re: [PR] [SPARK-51341][CORE] Cancel time task with suitable way. [spark]

2025-02-28 Thread via GitHub
beliefer commented on code in PR #50107: URL: https://github.com/apache/spark/pull/50107#discussion_r1975351020 ## core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala: ## @@ -121,5 +121,8 @@ private[spark] class ConsoleProgressBar(sc: SparkContext) extends Logging

Re: [PR] [SPARK-51341][CORE] Cancel time task with suitable way. [spark]

2025-02-28 Thread via GitHub
beliefer commented on PR #50107: URL: https://github.com/apache/spark/pull/50107#issuecomment-2690560779 ping @mridulm @LuciferYang @srowen cc @jjayadeep06

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-28 Thread via GitHub
chris-twiner commented on PR #50023: URL: https://github.com/apache/spark/pull/50023#issuecomment-2691506401 > @chris-twiner can you fix the style issue? Yeah, done. I'll keep an eye on the progress over the next hour or so in case any restarts are needed. Odd though, no files mentione

Re: [PR] [SPARK-51341][CORE] Cancel time task with suitable way. [spark]

2025-02-28 Thread via GitHub
beliefer commented on code in PR #50107: URL: https://github.com/apache/spark/pull/50107#discussion_r1976289910 ## core/src/main/scala/org/apache/spark/BarrierTaskContext.scala: ## @@ -300,11 +300,7 @@ object BarrierTaskContext { @Since("2.4.0") def get(): BarrierTaskConte

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-28 Thread via GitHub
aokolnychyi commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1976098170 ## sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala: ## @@ -328,7 +336,7 @@ trait AlterTableTests extends SharedSparkSession with Query

Re: [PR] [SPARK-49507][SQL] Fix Expected only partition pruning predicates exception [spark]

2025-02-28 Thread via GitHub
wangyum commented on code in PR #47998: URL: https://github.com/apache/spark/pull/47998#discussion_r1976298867 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/TestHiveSuite.scala: ## @@ -47,4 +47,19 @@ class TestHiveSuite extends TestHiveSingleton with SQLTestUtils { te

Re: [PR] [SPARK-51354][K8S][TEST][DOC] Fix sbt K8s integration test arg javaImageTag [spark]

2025-02-28 Thread via GitHub
pan3793 commented on code in PR #50113: URL: https://github.com/apache/spark/pull/50113#discussion_r1976354240 ## project/SparkBuild.scala: ## @@ -1002,10 +1002,9 @@ object KubernetesIntegrationTests { if (excludeTags.exists(_.equalsIgnoreCase("r"))) { rDocke

[PR] Fix the teardown function of test_connect_function.py [spark]

2025-02-28 Thread via GitHub
sfc-gh-dyadav opened a new pull request, #50120: URL: https://github.com/apache/spark/pull/50120 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-28 Thread via GitHub
aokolnychyi closed pull request #50044: [SPARK-51290][SQL] Enable filling default values in DSv2 writes URL: https://github.com/apache/spark/pull/50044

Re: [PR] [SPARK-42856][GRAPHX] Break tie with highest vertex id in label propagation [spark]

2025-02-28 Thread via GitHub
github-actions[bot] commented on PR #48871: URL: https://github.com/apache/spark/pull/48871#issuecomment-2691768545 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-49644][SQL] Support drop multi-level partition V2 table with partial partition spec [spark]

2025-02-28 Thread via GitHub
github-actions[bot] commented on PR #48108: URL: https://github.com/apache/spark/pull/48108#issuecomment-2691768570 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[PR] [WIP][SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-02-28 Thread via GitHub
attilapiros opened a new pull request, #50122: URL: https://github.com/apache/spark/pull/50122 Thanks to @yorksity, who reported this error and even provided a PR for it. This solution is very different from https://github.com/apache/spark/pull/40883 as `BlockManagerMasterEndpoint#getLocati

[PR] [SPARK-50615][FOLLOWUP][SQL] Avoid dropping metadata in the push rule. [spark]

2025-02-28 Thread via GitHub
chenhao-db opened a new pull request, #50121: URL: https://github.com/apache/spark/pull/50121 ### What changes were proposed in this pull request? There is a bug in the initial optimizer rule that the `output` of the relation will be rebuilt based on the schema of the `HadoopFsRelatio

Re: [PR] [SPARK-51357][SQL] Preserve plan change logging level for views [spark]

2025-02-28 Thread via GitHub
vladimirg-db commented on PR #50118: URL: https://github.com/apache/spark/pull/50118#issuecomment-2691659971 @cloud-fan Wenchen, can you please take a look?

Re: [PR] [SPARK-51334][CONNECT] Add java/scala version in analyze spark_version response [spark]

2025-02-28 Thread via GitHub
garlandz-db commented on PR #50102: URL: https://github.com/apache/spark/pull/50102#issuecomment-2690060116 @the-sakthi do you know where I can write them? I'm not seeing a relevant test suite that already includes tests for the analyze plan handler.

Re: [PR] [SPARK-51351][SS] Do not materialize the output in Python worker for TWS [spark]

2025-02-28 Thread via GitHub
HeartSaVioR commented on PR #50110: URL: https://github.com/apache/spark/pull/50110#issuecomment-2691623600 Thanks! Merging to master/4.0.

Re: [PR] [SPARK-51351][SS] Do not materialize the output in Python worker for TWS [spark]

2025-02-28 Thread via GitHub
HeartSaVioR closed pull request #50110: [SPARK-51351][SS] Do not materialize the output in Python worker for TWS URL: https://github.com/apache/spark/pull/50110

Re: [PR] [SPARK-51349] Change precedence of null and "null" in sorting in QueryTest [spark]

2025-02-28 Thread via GitHub
harshmotw-db commented on code in PR #50108: URL: https://github.com/apache/spark/pull/50108#discussion_r1975885008 ## sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala: ## @@ -326,7 +326,13 @@ object QueryTest extends Assertions { // For binary arrays, we conver

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-28 Thread via GitHub
aokolnychyi commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1976098471 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3534,7 +3534,8 @@ class Analyzer(override val catalogManager: CatalogM

[PR] [WIP] Removing extra copy for the changelog replay [spark]

2025-02-28 Thread via GitHub
ericm-db opened a new pull request, #50119: URL: https://github.com/apache/spark/pull/50119 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

[PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-02-28 Thread via GitHub
zecookiez opened a new pull request, #50123: URL: https://github.com/apache/spark/pull/50123 ### What changes were proposed in this pull request? SPARK-51358 This PR adds detection logic + logging to detect delays in snapshot uploads across all state store instances

Re: [PR] [SPARK-51016][SQL] Fix for incorrect results on retry for Left Outer Join with indeterministic join keys [spark]

2025-02-28 Thread via GitHub
ahshahid commented on PR #50029: URL: https://github.com/apache/spark/pull/50029#issuecomment-2691810968 @mridulm @squito , I am unsure as to what you mean by marking the RDD inDeterministic, without modifying the RDD code 1) There is no concrete field in the RDD which marks it in

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-28 Thread via GitHub
hvanhovell commented on PR #50023: URL: https://github.com/apache/spark/pull/50023#issuecomment-2691797769 Merging to master/4.0. Thanks!

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-28 Thread via GitHub
hvanhovell commented on PR #50023: URL: https://github.com/apache/spark/pull/50023#issuecomment-2691389433 @chris-twiner can you fix the style issue?

Re: [PR] [SPARK-49507][SQL] Fix Expected only partition pruning predicates exception [spark]

2025-02-28 Thread via GitHub
wangyum commented on code in PR #47998: URL: https://github.com/apache/spark/pull/47998#discussion_r1976206065 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala: ## @@ -415,8 +415,11 @@ private[client] class Shim_v2_0 extends Shim with Logging { t

Re: [PR] [SPARK-44856][PYTHON] Improve Python UDTF arrow serializer performance [spark]

2025-02-28 Thread via GitHub
ueshin commented on code in PR #50099: URL: https://github.com/apache/spark/pull/50099#discussion_r1976207186 ## python/pyspark/sql/pandas/serializers.py: ## @@ -175,6 +178,16 @@ def wrap_and_init_stream(): return super(ArrowStreamUDFSerializer, self).dump_stream(wrap_

Re: [PR] [SPARK-49507][SQL] Fix Expected only partition pruning predicates exception [spark]

2025-02-28 Thread via GitHub
wangyum commented on code in PR #47998: URL: https://github.com/apache/spark/pull/47998#discussion_r1976217724 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/TestHiveSuite.scala: ## @@ -47,4 +47,19 @@ class TestHiveSuite extends TestHiveSingleton with SQLTestUtils { te

[PR] [SPARK-51354][K8S][TEST][DOC] Fix sbt K8s integration test arg javaImageTag [spark]

2025-02-28 Thread via GitHub
pan3793 opened a new pull request, #50113: URL: https://github.com/apache/spark/pull/50113 ### What changes were proposed in this pull request? As title. ### Why are the changes needed? When I follow the dev docs to run K8s IT using sbt, and set `spark.kubernetes

[PR] [SPARK-51353][INFRA][BUILD]Retry dyn/closer.lua for mvn before falling back to archive.a.o [spark]

2025-02-28 Thread via GitHub
yaooqinn opened a new pull request, #50114: URL: https://github.com/apache/spark/pull/50114 ### What changes were proposed in this pull request? This PR enables retry for dyn/closer.lua for mvn before falling back to archive.a.o. Before this PR, we used `curl` w/o retry to down