Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1982862272 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1675,6 +1676,91 @@ class SessionCatalog( } } + /** + *

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1982858588 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1675,6 +1676,91 @@ class SessionCatalog( } } + /** + *

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1982859590 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1675,6 +1676,91 @@ class SessionCatalog( } } + /** + *

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on PR #50137: URL: https://github.com/apache/spark/pull/50137#issuecomment-2703005083 From a user's point of view, I feel the new API is a bit cumbersome as I need to implement my own `TableBuilder`, and then implement `buildTable` to return it. `TableBuilder` has quite

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1982850033 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2636,6 +2637,93 @@ class Analyzer(override val catalogManager: CatalogMa

Re: [PR] [WIP][SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-05 Thread via GitHub
peter-toth commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1982016441 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] [SPARK-51380][SQL] Make sure JDBC dialect could check the supported functions by Introduce functionToSQL and aggregateFunctionToSQL [spark]

2025-03-05 Thread via GitHub
beliefer commented on code in PR #50143: URL: https://github.com/apache/spark/pull/50143#discussion_r1982842931 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -413,7 +413,7 @@ abstract class JdbcDialect extends Serializable with Logging {

Re: [PR] [SPARK-51366][SQL] Add a new visitCaseWhen method to V2ExpressionSQLBuilder [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on PR #50129: URL: https://github.com/apache/spark/pull/50129#issuecomment-2703018784 This looks like not valuable to me but complicates the API. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-51366][SQL] Add a new visitCaseWhen method to V2ExpressionSQLBuilder [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on code in PR #50129: URL: https://github.com/apache/spark/pull/50129#discussion_r1982816913 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java: ## @@ -219,6 +218,10 @@ protected String inputToSQL(Expression input) {

Re: [PR] [SPARK-51396][SQL] RuntimeConfig.getOption shouldn't use exceptions for control flow [spark]

2025-03-05 Thread via GitHub
HyukjinKwon closed pull request #50167: [SPARK-51396][SQL] RuntimeConfig.getOption shouldn't use exceptions for control flow URL: https://github.com/apache/spark/pull/50167

Re: [PR] [SPARK-51396][SQL] RuntimeConfig.getOption shouldn't use exceptions for control flow [spark]

2025-03-05 Thread via GitHub
HyukjinKwon commented on PR #50167: URL: https://github.com/apache/spark/pull/50167#issuecomment-2702945371 Merged to master.

Re: [PR] [SPARK-51365][TESTS] Test maven + macos [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on code in PR #50178: URL: https://github.com/apache/spark/pull/50178#discussion_r1982774552 ## sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala: ## @@ -79,6 +79,8 @@ trait SharedSparkSessionBase StaticSQLConf.WAREHOUSE_PA

Re: [PR] [SPARK-51380][SQL] Make sure JDBC dialect could check the supported functions by Introduce functionToSQL and aggregateFunctionToSQL [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on code in PR #50143: URL: https://github.com/apache/spark/pull/50143#discussion_r1982782679 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -413,7 +413,7 @@ abstract class JdbcDialect extends Serializable with Logging {

Re: [PR] [SPARK-51365][TESTS] Test maven + macos [spark]

2025-03-05 Thread via GitHub
LuciferYang commented on code in PR #50178: URL: https://github.com/apache/spark/pull/50178#discussion_r1982784603 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala: ## @@ -44,33 +44,48 @@ import org.apache.spark.sql.execution.{CommandExecutionMode, Query

Re: [PR] [SPARK-51097] [SS] Split apart SparkPlan metrics and instance metrics [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on code in PR #50157: URL: https://github.com/apache/spark/pull/50157#discussion_r1982783776 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala: ## @@ -434,12 +447,12 @@ trait StateStoreWriter } protected def s

[PR] [SPARK-51281][SQL][3.5] DataFrameWriterV2 should respect the path option [spark]

2025-03-05 Thread via GitHub
cloud-fan opened a new pull request, #50179: URL: https://github.com/apache/spark/pull/50179 backport https://github.com/apache/spark/pull/50040 to 3.5 ### What changes were proposed in this pull request? Unlike `DataFrameWriter.saveAsTable` where we explicitly get the "

Re: [PR] [SPARK-49488][SQL][FOLLOWUP] Use correct MySQL datetime functions when pushing down EXTRACT [spark]

2025-03-05 Thread via GitHub
cloud-fan closed pull request #50112: [SPARK-49488][SQL][FOLLOWUP] Use correct MySQL datetime functions when pushing down EXTRACT URL: https://github.com/apache/spark/pull/50112

Re: [PR] [SPARK-49488][SQL][FOLLOWUP] Use correct MySQL datetime functions when pushing down EXTRACT [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on PR #50112: URL: https://github.com/apache/spark/pull/50112#issuecomment-2702922948 thanks, merging to master/4.0!

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
cnauroth commented on code in PR #50173: URL: https://github.com/apache/spark/pull/50173#discussion_r1982738051 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/AmIpFilterSuite.scala: ## @@ -162,8 +162,10 @@ class AmIpFilterSuite extends SparkFunSuite {

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
cnauroth commented on code in PR #50173: URL: https://github.com/apache/spark/pull/50173#discussion_r1982737452 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/AmIpFilterSuite.scala: ## @@ -162,8 +162,10 @@ class AmIpFilterSuite extends SparkFunSuite {

Re: [PR] [SPARK-51393][PYTHON] Fallback to regular Python UDF when Arrow is not found but Arrow-optimized Python UDFs enabled [spark]

2025-03-05 Thread via GitHub
HyukjinKwon commented on PR #50160: URL: https://github.com/apache/spark/pull/50160#issuecomment-2702462302 Merged to master and branch-4.0.

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-05 Thread via GitHub
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1982709575 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -53,8 +54,9 @@ private[spark] class BarrierCoordinator( // TODO SPARK-25030 Create a Ti

Re: [PR] [SPARK-51384][SQL] Support `java.time.LocalTime` as the external type of `TimeType` [spark]

2025-03-05 Thread via GitHub
MaxGekk closed pull request #50153: [SPARK-51384][SQL] Support `java.time.LocalTime` as the external type of `TimeType` URL: https://github.com/apache/spark/pull/50153

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50040: URL: https://github.com/apache/spark/pull/50040#issuecomment-2702759012 For the record, `branch-3.5` is recovered. https://github.com/user-attachments/assets/d88ab2ae-d817-4552-b7c4-edbf5bf41740

[PR] [SPARK-51365][TESTS] Test maven + macos [spark]

2025-03-05 Thread via GitHub
LuciferYang opened a new pull request, #50178: URL: https://github.com/apache/spark/pull/50178 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51412][K8S] Upgrade Gradle to 8.13 [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun closed pull request #165: [SPARK-51412][K8S] Upgrade Gradle to 8.13 URL: https://github.com/apache/spark-kubernetes-operator/pull/165

Re: [PR] [SPARK-51412][K8S] Upgrade Gradle to 8.13 [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #165: URL: https://github.com/apache/spark-kubernetes-operator/pull/165#issuecomment-2702749617 Thank you so much! Merged to main.

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
cnauroth commented on code in PR #50173: URL: https://github.com/apache/spark/pull/50173#discussion_r1982653029 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/AmIpFilterSuite.scala: ## @@ -162,8 +162,10 @@ class AmIpFilterSuite extends SparkFunSuite {

Re: [PR] [SPARK-51364][SQL][TESTS] Improve the integration tests for external data source by check filter pushed down [spark]

2025-03-05 Thread via GitHub
beliefer commented on PR #50126: URL: https://github.com/apache/spark/pull/50126#issuecomment-2702720959 cc @dongjoon-hyun

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
pan3793 commented on code in PR #50173: URL: https://github.com/apache/spark/pull/50173#discussion_r1982638623 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/AmIpFilterSuite.scala: ## @@ -162,8 +162,10 @@ class AmIpFilterSuite extends SparkFunSuite {

Re: [PR] [SPARK-51380][SQL] Make sure JDBC dialect could check the supported functions by Introduce functionToSQL and aggregateFunctionToSQL [spark]

2025-03-05 Thread via GitHub
beliefer commented on PR #50143: URL: https://github.com/apache/spark/pull/50143#issuecomment-2702710011 ping @cloud-fan @MaxGekk

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-05 Thread via GitHub
beliefer commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1982637434 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -53,8 +54,9 @@ private[spark] class BarrierCoordinator( // TODO SPARK-25030 Create a Timer

Re: [PR] [SPARK-51097] [SS] Split apart SparkPlan metrics and instance metrics [spark]

2025-03-05 Thread via GitHub
zecookiez commented on code in PR #50157: URL: https://github.com/apache/spark/pull/50157#discussion_r1982636161 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala: ## @@ -434,12 +447,12 @@ trait StateStoreWriter } protected def s

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
cnauroth commented on code in PR #50173: URL: https://github.com/apache/spark/pull/50173#discussion_r1982630520 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/AmIpFilterSuite.scala: ## @@ -162,8 +162,10 @@ class AmIpFilterSuite extends SparkFunSuite {

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
pan3793 commented on code in PR #50173: URL: https://github.com/apache/spark/pull/50173#discussion_r1982607868 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/AmIpFilterSuite.scala: ## @@ -162,8 +162,10 @@ class AmIpFilterSuite extends SparkFunSuite {

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
pan3793 commented on code in PR #50173: URL: https://github.com/apache/spark/pull/50173#discussion_r1982635542 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/AmIpFilterSuite.scala: ## @@ -162,8 +162,10 @@ class AmIpFilterSuite extends SparkFunSuite {

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-03-05 Thread via GitHub
aokolnychyi commented on PR #50044: URL: https://github.com/apache/spark/pull/50044#issuecomment-2702697048 Thanks for reviewing, @cloud-fan @gengliangwang @dongjoon-hyun @viirya!

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-05 Thread via GitHub
sririshindra commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1979882434 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3198,101 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLoca

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
cnauroth commented on code in PR #50173: URL: https://github.com/apache/spark/pull/50173#discussion_r1982628947 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/AmIpFilterSuite.scala: ## @@ -162,8 +162,10 @@ class AmIpFilterSuite extends SparkFunSuite {

Re: [PR] [SPARK-51307][SQL][3.5] locationUri in CatalogStorageFormat shall be decoded for display [spark]

2025-03-05 Thread via GitHub
yaooqinn commented on PR #50164: URL: https://github.com/apache/spark/pull/50164#issuecomment-2702675826 Thank you @dongjoon-hyun

Re: [PR] [SPARK-51097] [SS] Split apart SparkPlan metrics and instance metrics [spark]

2025-03-05 Thread via GitHub
anishshri-db commented on code in PR #50157: URL: https://github.com/apache/spark/pull/50157#discussion_r1982612576 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala: ## @@ -434,12 +447,12 @@ trait StateStoreWriter } protected de

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-05 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1982597944 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3198,101 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

[PR] [SPARK-51412][K8S] Upgrade Gradle to 8.13 [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #165: URL: https://github.com/apache/spark-kubernetes-operator/pull/165 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[PR] [SPARK-51411] Add documentation for the transformWithState operator [spark]

2025-03-05 Thread via GitHub
anishshri-db opened a new pull request, #50177: URL: https://github.com/apache/spark/pull/50177 ### What changes were proposed in this pull request? Add documentation for the transformWithState operator ### Why are the changes needed? We need to add documentation for the new

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
pan3793 commented on code in PR #50173: URL: https://github.com/apache/spark/pull/50173#discussion_r1982606487 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/AmIpFilterSuite.scala: ## @@ -162,8 +162,10 @@ class AmIpFilterSuite extends SparkFunSuite {

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-05 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1982590926 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3197,101 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

Re: [PR] [SPARK-51097] [SS] Revert RocksDB instance metrics changes [spark]

2025-03-05 Thread via GitHub
HeartSaVioR closed pull request #50161: [SPARK-51097] [SS] Revert RocksDB instance metrics changes URL: https://github.com/apache/spark/pull/50161

Re: [PR] [SPARK-51097] [SS] Revert RocksDB instance metrics changes [spark]

2025-03-05 Thread via GitHub
HeartSaVioR commented on PR #50161: URL: https://github.com/apache/spark/pull/50161#issuecomment-2702645786 Thanks! Merging to master. (For 4.0 I'll handle this in backport PR)

Re: [PR] [SPARK-51097] [SS] [4.0] Revert RocksDB instance metrics changes [spark]

2025-03-05 Thread via GitHub
HeartSaVioR commented on PR #50165: URL: https://github.com/apache/spark/pull/50165#issuecomment-2702646339 Thanks! Merging to 4.0.

Re: [PR] [SPARK-51307][SQL][3.5] locationUri in CatalogStorageFormat shall be decoded for display [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50164: URL: https://github.com/apache/spark/pull/50164#issuecomment-2702645633 Sorry but could you rebase once more because Docker and SparkR CI also failed before, @yaooqinn ? https://github.com/user-attachments/assets/5e092a64-baea-4e93-9725-b490cbebe2

Re: [PR] [SPARK-51407][CONNECT][DOCS][3.5] Document missed `Spark Connect` configurations [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50172: URL: https://github.com/apache/spark/pull/50172#issuecomment-2702643279 Thank you all!

Re: [PR] [SPARK-51407][CONNECT][DOCS][3.5] Document missed `Spark Connect` configurations [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun closed pull request #50172: [SPARK-51407][CONNECT][DOCS][3.5] Document missed `Spark Connect` configurations URL: https://github.com/apache/spark/pull/50172

Re: [PR] [SPARK-51407][CONNECT][DOCS][3.5] Document missed `Spark Connect` configurations [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50172: URL: https://github.com/apache/spark/pull/50172#issuecomment-2702643060 Merged to branch-3.5.

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50040: URL: https://github.com/apache/spark/pull/50040#issuecomment-2702640013 Now, this is reverted from branch-3.5 via https://github.com/apache/spark/commit/c8c0f1feb0f2680ac7437eae926c17724af0d94e due to the CI failure. Please make a backporting PR

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50040: URL: https://github.com/apache/spark/pull/50040#issuecomment-2702637256 I verified that the reverting recovers `unidoc`. ``` $ git log --oneline -n1 c8c0f1feb0f (HEAD -> branch-3.5) Revert "[SPARK-51281][SQL] DataFrameWriterV2 should respect the

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-05 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1982595027 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3198,101 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50040: URL: https://github.com/apache/spark/pull/50040#issuecomment-2702546080 @cloud-fan . It's a `unidoc` failure and it happens on all commit builder and PR builder (on branch-3.5) after this PR. ``` [info] Main Scala API documentation to /__w/spark/s

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-03-05 Thread via GitHub
cloud-fan commented on PR #50044: URL: https://github.com/apache/spark/pull/50044#issuecomment-2702534394 thanks, merging to master!

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-03-05 Thread via GitHub
cloud-fan closed pull request #50044: [SPARK-51290][SQL] Enable filling default values in DSv2 writes URL: https://github.com/apache/spark/pull/50044

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-05 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1982447456 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3198,101 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-05 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1982447026 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3197,101 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

Re: [PR] [SPARK-51410][K8S] Add `Spark Connect Plugin` example [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #164: URL: https://github.com/apache/spark-kubernetes-operator/pull/164#issuecomment-2702523558 Thank you, @viirya . Merged to main.

Re: [PR] [SPARK-51410][K8S] Add `Spark Connect Plugin` example [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun closed pull request #164: [SPARK-51410][K8S] Add `Spark Connect Plugin` example URL: https://github.com/apache/spark-kubernetes-operator/pull/164

Re: [PR] [SPARK-51409][SS] Add error classification in the changelog writer creation path [spark]

2025-03-05 Thread via GitHub
anishshri-db commented on PR #50176: URL: https://github.com/apache/spark/pull/50176#issuecomment-2702523029 @HeartSaVioR - PTAL, thanks !

[PR] [SPARK-51409] Add error classification in the changelog writer creation path [spark]

2025-03-05 Thread via GitHub
anishshri-db opened a new pull request, #50176: URL: https://github.com/apache/spark/pull/50176 ### What changes were proposed in this pull request? Add error classification in the changelog writer creation path ### Why are the changes needed? We have seen some transient erro

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-05 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1982444723 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3198,101 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

Re: [PR] [SPARK-51393][PYTHON] Fallback to regular Python UDF when Arrow is not found but Arrow-optimized Python UDFs enabled [spark]

2025-03-05 Thread via GitHub
HyukjinKwon closed pull request #50160: [SPARK-51393][PYTHON] Fallback to regular Python UDF when Arrow is not found but Arrow-optimized Python UDFs enabled URL: https://github.com/apache/spark/pull/50160

Re: [PR] [SPARK-51340][ML][CONNECT] Model size estimation for linear classification & regression models [spark]

2025-03-05 Thread via GitHub
zhengruifeng commented on code in PR #50106: URL: https://github.com/apache/spark/pull/50106#discussion_r1982392558 ## mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala: ## @@ -1248,6 +1266,19 @@ class LogisticRegressionModel private[spark] (

Re: [PR] [SPARK-51340][ML][CONNECT] Model size estimation for linear classification & regression models [spark]

2025-03-05 Thread via GitHub
zhengruifeng commented on code in PR #50106: URL: https://github.com/apache/spark/pull/50106#discussion_r1982392072 ## mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala: ## @@ -1041,6 +1041,24 @@ class LogisticRegression @Since("1.2.0") ( (sol

Re: [PR] [SPARK-51407][DOCS][FOLLOWUP] Fix `spark.connect.ml.backend.classes` description [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun closed pull request #50175: [SPARK-51407][DOCS][FOLLOWUP] Fix `spark.connect.ml.backend.classes` description URL: https://github.com/apache/spark/pull/50175

Re: [PR] [SPARK-51407][DOCS][FOLLOWUP] Fix `spark.connect.ml.backend.classes` description [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50175: URL: https://github.com/apache/spark/pull/50175#issuecomment-2702432659 Thank you! Merged to master/4.0.

[PR] [SPARK-51410][K8S] Add `Spark Connect Plugin` example [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #164: URL: https://github.com/apache/spark-kubernetes-operator/pull/164 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [MINOR][DOCS] Link to security reporting page [spark]

2025-03-05 Thread via GitHub
github-actions[bot] closed pull request #48933: [MINOR][DOCS] Link to security reporting page URL: https://github.com/apache/spark/pull/48933

Re: [PR] [SPARK-50413][SQL] Add a flag to insert commands to know if they are part of CTAS [spark]

2025-03-05 Thread via GitHub
github-actions[bot] commented on PR #48956: URL: https://github.com/apache/spark/pull/48956#issuecomment-2702388808 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50159: URL: https://github.com/apache/spark/pull/50159#issuecomment-2702329675 Could you review this `Spark Connect` PR, @huaxingao ?

[PR] [SPARK-51407][DOCS][FOLLOWUP] Fix `spark.connect.ml.backend.classes` description [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #50175: URL: https://github.com/apache/spark/pull/50175 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51387][BUILD][4.0] Upgrade Netty to 4.1.119.Final [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50174: URL: https://github.com/apache/spark/pull/50174#issuecomment-2702280429 This is a retry of the previous commit. Currently, this PR is in draft status to investigate the CI Python failures.

Re: [PR] [SPARK-51393][PYTHON] Fallback to regular Python UDF when Arrow is not found but Arrow-optimized Python UDFs enabled [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50160: URL: https://github.com/apache/spark/pull/50160#issuecomment-2702303740 Just for the record, PyPy 3.10 CIs have been broken in both `master` and `branch-4.0` after the reverting commit, https://github.com/apache/spark/commit/eb4855315049d2e8ab0105efbbd1

[PR] [SPARK-51387][BUILD][4.0] Upgrade Netty to 4.1.119.Final [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #50174: URL: https://github.com/apache/spark/pull/50174 ### What changes were proposed in this pull request? This PR aims to upgrade Netty to 4.1.119.Final. ### Why are the changes needed? - https://github.com/netty/netty/milestone

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
cnauroth commented on PR #50173: URL: https://github.com/apache/spark/pull/50173#issuecomment-2702251438 @pan3793 and @LuciferYang, this relates to code you added/reviewed in #46611. Would you please review this test fix? Thank you.

Re: [PR] [SPARK-51407][CONNECT][DOCS] Document missed `Spark Connect` configurations [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50171: URL: https://github.com/apache/spark/pull/50171#issuecomment-2702246906 Thank you, @allisonwang-db.

Re: [PR] [SPARK-51407][CONNECT][DOCS][3.5] Document missed `Spark Connect` configurations [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50172: URL: https://github.com/apache/spark/pull/50172#issuecomment-2702249848 Thank you, @allisonwang-db.

Re: [PR] Log exception on Python runner termination [spark]

2025-03-05 Thread via GitHub
allisonwang-db commented on PR #49890: URL: https://github.com/apache/spark/pull/49890#issuecomment-2702249616 @antban Thanks for contributing. Could you create a JIRA ticket and add it to the PR title?

Re: [PR] [SPARK-51407][CONNECT][DOCS] Document missed `Spark Connect` configurations [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #50171: URL: https://github.com/apache/spark/pull/50171#issuecomment-2702249226 Merged to master/4.0 for Apache Spark 4.0.0.

Re: [PR] [SPARK-51407][CONNECT][DOCS] Document missed `Spark Connect` configurations [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun closed pull request #50171: [SPARK-51407][CONNECT][DOCS] Document missed `Spark Connect` configurations URL: https://github.com/apache/spark/pull/50171

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-05 Thread via GitHub
allisonwang-db commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1982287527 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala: ## @@ -1664,3 +1664,22 @@ case class Call( override protected def

[PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-05 Thread via GitHub
cnauroth opened a new pull request, #50173: URL: https://github.com/apache/spark/pull/50173 ### What changes were proposed in this pull request? While verifying Spark 4.0.0 RC2, I consistently saw YARN test `AmIpFilterSuite#testProxyUpdate` failing in my environment. The test is writ

Re: [PR] [SPARK-51393][PYTHON] Fallback to regular Python UDF when Arrow is not found but Arrow-optimized Python UDFs enabled [spark]

2025-03-05 Thread via GitHub
allisonwang-db commented on code in PR #50160: URL: https://github.com/apache/spark/pull/50160#discussion_r1982284472 ## python/pyspark/sql/udf.py: ## @@ -128,6 +130,18 @@ def _create_py_udf( else: is_arrow_enabled = useArrow +if is_arrow_enabled: +tr

Re: [PR] [SPARK-51407][CONNECT][DOCS] Document missed `Spark Connect` configurations [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on code in PR #50171: URL: https://github.com/apache/spark/pull/50171#discussion_r1982270730 ## docs/configuration.md: ## @@ -3459,6 +3475,70 @@ Expression types in proto. Command types in proto. 3.4.0 + + spark.connect.ml.backend.classes + +

[PR] [SPARK-51407][CONNECT][DOCS][3.5] Document missed `Spark Connect` configurations [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #50172: URL: https://github.com/apache/spark/pull/50172 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[PR] [SPARK-51407][CONNECT][DOCS] Document missed `spark.connect.*` configurations [spark]

2025-03-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #50171: URL: https://github.com/apache/spark/pull/50171 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51097] [SS] Split apart SparkPlan metrics and instance metrics [spark]

2025-03-05 Thread via GitHub
ericm-db commented on code in PR #50157: URL: https://github.com/apache/spark/pull/50157#discussion_r1982183086 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala: ## @@ -434,12 +447,12 @@ trait StateStoreWriter } protected def se

Re: [PR] [SPARK-51406][K8S] Remove no-op `spark.log.structuredLogging.enabled=false` [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #163: URL: https://github.com/apache/spark-kubernetes-operator/pull/163#issuecomment-2702072098 Merged to main.

Re: [PR] [SPARK-51406][K8S] Remove no-op `spark.log.structuredLogging.enabled=false` [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun commented on PR #163: URL: https://github.com/apache/spark-kubernetes-operator/pull/163#issuecomment-2702059506 Thank you again, @viirya!

Re: [PR] [SPARK-51406][K8S] Remove no-op `spark.log.structuredLogging.enabled=false` [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun closed pull request #163: [SPARK-51406][K8S] Remove no-op `spark.log.structuredLogging.enabled=false` URL: https://github.com/apache/spark-kubernetes-operator/pull/163

Re: [PR] [SPARK-51097] [SS] Split apart SparkPlan metrics and instance metrics [spark]

2025-03-05 Thread via GitHub
zecookiez commented on code in PR #50157: URL: https://github.com/apache/spark/pull/50157#discussion_r1982177340 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala: ## @@ -140,11 +140,17 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Loggin

[PR] [SPARK-51406][K8S] Remove no-op `spark.log.structuredLogging.enabled=false` [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #163: URL: https://github.com/apache/spark-kubernetes-operator/pull/163 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing cha

Re: [PR] [SPARK-51405][K8S] Upgrade `build-tools` to use Ubuntu 24.04 LTS instead of 22.04 LTS docker image [spark-kubernetes-operator]

2025-03-05 Thread via GitHub
dongjoon-hyun closed pull request #162: [SPARK-51405][K8S] Upgrade `build-tools` to use Ubuntu 24.04 LTS instead of 22.04 LTS docker image URL: https://github.com/apache/spark-kubernetes-operator/pull/162
