Re: [PR] [SPARK-51258][SQL][FOLLOWUP] Remove unnecessary inheritance from SQLConfHelper [spark]

2025-02-23 Thread via GitHub
beliefer commented on PR #50046: URL: https://github.com/apache/spark/pull/50046#issuecomment-2676739969 Merged into branch-4.0/master @LuciferYang Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-51292][SQL] Remove unnecessary inheritance from PlanTestBase, ExpressionEvalHelper and PlanTest [spark]

2025-02-23 Thread via GitHub
beliefer closed pull request #50047: [SPARK-51292][SQL] Remove unnecessary inheritance from PlanTestBase, ExpressionEvalHelper and PlanTest URL: https://github.com/apache/spark/pull/50047

Re: [PR] [SPARK-51293][CORE][SQL][SS][MLLIB][TESTS] Cleanup unused private functions from test suites [spark]

2025-02-23 Thread via GitHub
LuciferYang closed pull request #50049: [SPARK-51293][CORE][SQL][SS][MLLIB][TESTS] Cleanup unused private functions from test suites URL: https://github.com/apache/spark/pull/50049

Re: [PR] [SPARK-51293][CORE][SQL][SS][MLLIB][TESTS] Cleanup unused private functions from test suites [spark]

2025-02-23 Thread via GitHub
LuciferYang commented on PR #50049: URL: https://github.com/apache/spark/pull/50049#issuecomment-2676746744 Merged into master. Thanks @dongjoon-hyun and @beliefer

Re: [PR] [SPARK-51292][SQL] Remove unnecessary inheritance from PlanTestBase, ExpressionEvalHelper and PlanTest [spark]

2025-02-23 Thread via GitHub
beliefer commented on PR #50047: URL: https://github.com/apache/spark/pull/50047#issuecomment-2676747067 Merged into branch-4.0/master @LuciferYang Thank you!

[PR] [SPARK-51298][WIP] Support variant in CSV scan [spark]

2025-02-23 Thread via GitHub
chenhao-db opened a new pull request, #50052: URL: https://github.com/apache/spark/pull/50052 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch te

[PR] [DRAFT] Resolve default string exprs [spark]

2025-02-23 Thread via GitHub
stefankandic opened a new pull request, #50053: URL: https://github.com/apache/spark/pull/50053 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### Ho

Re: [PR] [SPARK-49912] Refactor simple CASE statement to evaluate the case variable only once [spark]

2025-02-23 Thread via GitHub
dusantism-db commented on code in PR #50027: URL: https://github.com/apache/spark/pull/50027#discussion_r1966931924 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -599,6 +599,116 @@ class CaseStatementExec( } } +/** + * Exe

Re: [PR] [SPARK-49912] Refactor simple CASE statement to evaluate the case variable only once [spark]

2025-02-23 Thread via GitHub
dusantism-db commented on code in PR #50027: URL: https://github.com/apache/spark/pull/50027#discussion_r1966944493 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -599,6 +599,116 @@ class CaseStatementExec( } } +/** + * Exe

[PR] [MINOR][DOCS] Clarify spark.remote and spark.master in pyspark-connect and pyspark-client at install.rst [spark]

2025-02-23 Thread via GitHub
HyukjinKwon opened a new pull request, #50054: URL: https://github.com/apache/spark/pull/50054 ### What changes were proposed in this pull request? This PR fixes the installation page for PySpark to clarify spark.remote and spark.master in pyspark-connect and pyspark-client at install

Re: [PR] [DRAFT] Resolve default string exprs [spark]

2025-02-23 Thread via GitHub
HyukjinKwon commented on code in PR #50053: URL: https://github.com/apache/spark/pull/50053#discussion_r1966967731 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -28,6 +29,10 @@ import org.apache.spark.sql.types.{D

Re: [PR] [SPARK-51294][SQL][CONNECT][TESTS] Improve the readability by split the variable of jars and configs for SparkConnectServerUtils. [spark]

2025-02-23 Thread via GitHub
HyukjinKwon commented on PR #50050: URL: https://github.com/apache/spark/pull/50050#issuecomment-2677269920 Merged to master and branch-4.0.

Re: [PR] [SPARK-51256][SQL] Increase parallelism if joining with small bucket table [spark]

2025-02-23 Thread via GitHub
wangyum commented on PR #50004: URL: https://github.com/apache/spark/pull/50004#issuecomment-2677280562 ```scala spark.sql("set spark.sql.autoBroadcastJoinThreshold=-1") spark.range(1000).selectExpr("id", "id + 1 as new_id").write.saveAsTable("t1") spark.range(10).selectExpr("id

Re: [PR] [SPARK-51294][SQL][CONNECT][TESTS] Improve the readability by split the variable of jars and configs for SparkConnectServerUtils. [spark]

2025-02-23 Thread via GitHub
HyukjinKwon closed pull request #50050: [SPARK-51294][SQL][CONNECT][TESTS] Improve the readability by split the variable of jars and configs for SparkConnectServerUtils. URL: https://github.com/apache/spark/pull/50050

Re: [PR] [MINOR][DOCS] Fixed the scope of the query option in sql-data-sources-jdbc.md [spark]

2025-02-23 Thread via GitHub
llphxd commented on PR #50048: URL: https://github.com/apache/spark/pull/50048#issuecomment-2676709230 > > I would love to, but I failed to register jira and have to wait 24 hours. So I'll create the issue later. > > Please tell me if the issue is ready. issue is ready. [ht

Re: [PR] [SPARK-51258][SQL][FOLLOWUP] Remove unnecessary inheritance from SQLConfHelper [spark]

2025-02-23 Thread via GitHub
beliefer closed pull request #50046: [SPARK-51258][SQL][FOLLOWUP] Remove unnecessary inheritance from SQLConfHelper URL: https://github.com/apache/spark/pull/50046

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-02-23 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1966860945 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3217,103 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

Re: [PR] [SPARK-49912] Refactor simple CASE statement to evaluate the case variable only once [spark]

2025-02-23 Thread via GitHub
dusantism-db commented on code in PR #50027: URL: https://github.com/apache/spark/pull/50027#discussion_r1966944963 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -599,6 +599,116 @@ class CaseStatementExec( } } +/** + * Exe

Re: [PR] [DRAFT] Two string types [spark]

2025-02-23 Thread via GitHub
github-actions[bot] commented on PR #48861: URL: https://github.com/apache/spark/pull/48861#issuecomment-2677214294 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-50292] Add MapStatus RowCount optimize skewed job [spark]

2025-02-23 Thread via GitHub
github-actions[bot] closed pull request #48825: [SPARK-50292] Add MapStatus RowCount optimize skewed job URL: https://github.com/apache/spark/pull/48825

Re: [PR] [SPARK-50319] Reorder ResolveIdentifierClause and BindParameter rules [spark]

2025-02-23 Thread via GitHub
github-actions[bot] commented on PR #48849: URL: https://github.com/apache/spark/pull/48849#issuecomment-2677214302 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [MINOR][DOCS] Clarify spark.remote and spark.master in pyspark-connect and pyspark-client at install.rst [spark]

2025-02-23 Thread via GitHub
HyukjinKwon closed pull request #50054: [MINOR][DOCS] Clarify spark.remote and spark.master in pyspark-connect and pyspark-client at install.rst URL: https://github.com/apache/spark/pull/50054

Re: [PR] [MINOR][DOCS] Clarify spark.remote and spark.master in pyspark-connect and pyspark-client at install.rst [spark]

2025-02-23 Thread via GitHub
HyukjinKwon commented on PR #50054: URL: https://github.com/apache/spark/pull/50054#issuecomment-2677426348 Merged to master and branch-4.0.

Re: [PR] [SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value [spark]

2025-02-23 Thread via GitHub
jiwen624 commented on PR #50055: URL: https://github.com/apache/spark/pull/50055#issuecomment-2677443309 @dongjoon-hyun @cloud-fan could you let me know your thoughts about this fix? Thanks.

Re: [PR] [SPARK-51282][ML][PYTHON][CONNECT] Optimize OneVsRestModel transform by eliminating the JVM-Python data exchange [spark]

2025-02-23 Thread via GitHub
zhengruifeng commented on PR #50041: URL: https://github.com/apache/spark/pull/50041#issuecomment-2677458005 Let me convert it to draft for now, to make it easy to debug another issue https://issues.apache.org/jira/browse/SPARK-51118

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-23 Thread via GitHub
beliefer commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1967061151 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5545,6 +5545,15 @@ object SQLConf { .booleanConf .createWithDefault(false)

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-23 Thread via GitHub
beliefer commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1967072225 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5545,6 +5545,15 @@ object SQLConf { .booleanConf .createWithDefault(false)

[PR] [SPARK-51300][PS][DOCS] Fix broken link for `ps.sql` [spark]

2025-02-23 Thread via GitHub
itholic opened a new pull request, #50056: URL: https://github.com/apache/spark/pull/50056 ### What changes were proposed in this pull request? This PR proposes to fix the broken link for `ps.sql` ### Why are the changes needed? There is a broken link in the official documentation

Re: [PR] [SPARK-51206][PYTHON][CONNECT] Move Arrow conversion helpers out of Spark Connect [spark]

2025-02-23 Thread via GitHub
zhengruifeng commented on code in PR #49941: URL: https://github.com/apache/spark/pull/49941#discussion_r1967091437 ## python/pyspark/sql/conversion.py: ## @@ -0,0 +1,543 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreement

[PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-23 Thread via GitHub
pan3793 opened a new pull request, #50057: URL: https://github.com/apache/spark/pull/50057 ### What changes were proposed in this pull request? Bump zstd-jni to the latest version. ### Why are the changes needed? https://github.com/facebook/zstd/releases/tag/v1.5.

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-23 Thread via GitHub
pan3793 commented on PR #50057: URL: https://github.com/apache/spark/pull/50057#issuecomment-2677646689 cc @dongjoon-hyun @LuciferYang

Re: [PR] [SPARK-49912] Refactor simple CASE statement to evaluate the case variable only once [spark]

2025-02-23 Thread via GitHub
davidm-db commented on code in PR #50027: URL: https://github.com/apache/spark/pull/50027#discussion_r1967153704 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -599,6 +599,116 @@ class CaseStatementExec( } } +/** + * Execut

Re: [PR] [SPARK-50785][SQL] Refactor FOR statement to utilize local variables properly. [spark]

2025-02-23 Thread via GitHub
cloud-fan commented on code in PR #50026: URL: https://github.com/apache/spark/pull/50026#discussion_r1967012051 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -206,6 +207,15 @@ class TriggerToExceptionHandlerMap( def getNotFo

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-23 Thread via GitHub
wangyum commented on PR #50022: URL: https://github.com/apache/spark/pull/50022#issuecomment-2677370449 @shrprasa Does it work after applying the patch?

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-23 Thread via GitHub
dongjoon-hyun commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1967023059 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala: ## @@ -839,4 +839,26 @@ class DataFrameWriterV2Suite extends QueryTest with SharedS

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-23 Thread via GitHub
cloud-fan commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1966991532 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5545,6 +5545,15 @@ object SQLConf { .booleanConf .createWithDefault(false

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-23 Thread via GitHub
pan3793 commented on code in PR #50022: URL: https://github.com/apache/spark/pull/50022#discussion_r1967007721 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1407,13 +1409,62 @@ private[hive] object HiveClientImpl extends Logging {

Re: [PR] [SPARK-51297][DOCS] Fixed the scope of the query option in sql-data-sources-jdbc.md [spark]

2025-02-23 Thread via GitHub
beliefer commented on PR #50048: URL: https://github.com/apache/spark/pull/50048#issuecomment-2677392603 Merged into branch-4.0/master @llphxd @HyukjinKwon @yaooqinn Thank you all!

[PR] [WIP][SPARK-51299][SQL][Web UI] Filter out invalid metric values with initValue [spark]

2025-02-23 Thread via GitHub
jiwen624 opened a new pull request, #50055: URL: https://github.com/apache/spark/pull/50055 ### What changes were proposed in this pull request? This PR proposes to use `initValue` of a metric in `org.apache.spark.util.MetricUtils.stringValue` instead of a hardcoded init value to filter

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-23 Thread via GitHub
pan3793 commented on code in PR #50022: URL: https://github.com/apache/spark/pull/50022#discussion_r1967004950 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -81,14 +82,18 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoop

Re: [PR] [SPARK-51294][SQL][CONNECT][TESTS] Improve the readability by split the variable of jars and configs for SparkConnectServerUtils. [spark]

2025-02-23 Thread via GitHub
beliefer commented on PR #50050: URL: https://github.com/apache/spark/pull/50050#issuecomment-2677381793 @HyukjinKwon Thank you very much!

Re: [PR] [SPARK-51297][DOCS] Fixed the scope of the query option in sql-data-sources-jdbc.md [spark]

2025-02-23 Thread via GitHub
beliefer closed pull request #50048: [SPARK-51297][DOCS] Fixed the scope of the query option in sql-data-sources-jdbc.md URL: https://github.com/apache/spark/pull/50048

Re: [PR] [SPARK-51278][PYTHON] Use appropriate structure of JSON format for `PySparkLogger` [spark]

2025-02-23 Thread via GitHub
itholic commented on code in PR #50038: URL: https://github.com/apache/spark/pull/50038#discussion_r1967030513 ## python/pyspark/logger/logger.py: ## @@ -66,10 +67,21 @@ def format(self, record: logging.LogRecord) -> str: } if record.exc_info: exc_

Re: [PR] [MINOR][DOCS] Clarify spark.remote and spark.master in pyspark-connect and pyspark-client at install.rst [spark]

2025-02-23 Thread via GitHub
HyukjinKwon commented on code in PR #50054: URL: https://github.com/apache/spark/pull/50054#discussion_r1967033629 ## python/docs/source/getting_started/install.rst: ## @@ -96,9 +96,7 @@ If you want to make Spark Connect default, you can install and additional librar It will a

Re: [PR] [MINOR][DOCS] Clarify spark.remote and spark.master in pyspark-connect and pyspark-client at install.rst [spark]

2025-02-23 Thread via GitHub
HyukjinKwon commented on code in PR #50054: URL: https://github.com/apache/spark/pull/50054#discussion_r1967033498 ## python/docs/source/getting_started/install.rst: ## @@ -96,9 +96,7 @@ If you want to make Spark Connect default, you can install and additional librar It will a

Re: [PR] [MINOR][DOCS] Clarify spark.remote and spark.master in pyspark-connect and pyspark-client at install.rst [spark]

2025-02-23 Thread via GitHub
dongjoon-hyun commented on code in PR #50054: URL: https://github.com/apache/spark/pull/50054#discussion_r1967034041 ## python/docs/source/getting_started/install.rst: ## @@ -96,9 +96,7 @@ If you want to make Spark Connect default, you can install and additional librar It will

Re: [PR] [MINOR][DOCS] Clarify spark.remote and spark.master in pyspark-connect and pyspark-client at install.rst [spark]

2025-02-23 Thread via GitHub
dongjoon-hyun commented on code in PR #50054: URL: https://github.com/apache/spark/pull/50054#discussion_r1967030929 ## python/docs/source/getting_started/install.rst: ## @@ -96,9 +96,7 @@ If you want to make Spark Connect default, you can install and additional librar It will