Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
pan3793 commented on code in PR #50022: URL: https://github.com/apache/spark/pull/50022#discussion_r1969188058 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1407,13 +1410,83 @@ private[hive] object HiveClientImpl extends Logging {

Re: [PR] [SPARK-51187][SQL][SS] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 [spark]

2025-02-24 Thread via GitHub
HeartSaVioR commented on PR #49983: URL: https://github.com/apache/spark/pull/49983#issuecomment-2680925857 @cloud-fan > have we merged this graceful deprecation in branch 3.5? Yes, that is merged. It's still a blocker for Spark 4.0.0 though. @dongjoon-hyun > If

Re: [PR] [SPARK-51265][SQL][SS] Throw proper error for eagerlyExecuteCommands containing streaming source marker [spark]

2025-02-24 Thread via GitHub
HeartSaVioR closed pull request #50015: [SPARK-51265][SQL][SS] Throw proper error for eagerlyExecuteCommands containing streaming source marker URL: https://github.com/apache/spark/pull/50015 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] [SPARK-51265][SQL][SS] Throw proper error for eagerlyExecuteCommands containing streaming source marker [spark]

2025-02-24 Thread via GitHub
HeartSaVioR commented on PR #50015: URL: https://github.com/apache/spark/pull/50015#issuecomment-2680891589 Closing via #50037 - a much simpler change; both PRs do not address the original report, which @cloud-fan will address later. -- This is an automated message from the Apache Git Se

Re: [PR] [SPARK-51289][SQL] Throw a proper error message for not fully implemented `SQLTableFunction` [spark]

2025-02-24 Thread via GitHub
LuciferYang commented on PR #50073: URL: https://github.com/apache/spark/pull/50073#issuecomment-2680871903 also cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
micheal-o commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968977014 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -419,6 +432,10 @@ private[sql] class HDFSBackedSta

Re: [PR] [SPARK-51309][BUILD] Upgrade rocksdbjni to 9.10.0 [spark]

2025-02-24 Thread via GitHub
wayneguow commented on PR #50076: URL: https://github.com/apache/spark/pull/50076#issuecomment-2680812582 Related benchmark results: - jdk17: https://github.com/wayneguow/spark/actions/runs/13513028574 - jdk21: https://github.com/wayneguow/spark/actions/runs/13513032754 -- This i

[PR] [SPARK-51309][BUILD] Upgrade rocksdbjni to 9.10.0 [spark]

2025-02-24 Thread via GitHub
wayneguow opened a new pull request, #50076: URL: https://github.com/apache/spark/pull/50076 ### What changes were proposed in this pull request? The pr aims to upgrade `rocksdbjni` from 9.8.4 to 9.10.0. ### Why are the changes needed? There are some bug fixes and

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-24 Thread via GitHub
szehon-ho commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1969046534 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala: ## @@ -839,4 +839,30 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSpark

[PR] [SPARK-51308][CONNECT][BUILD] Update the relocation rules for the `connect` module in `SparkBuild.scala` to ensure that both Maven and SBT produce the assembly JAR according to the same rules [sp

2025-02-24 Thread via GitHub
wayneguow opened a new pull request, #50075: URL: https://github.com/apache/spark/pull/50075 ### What changes were proposed in this pull request? This PR aims to update the relocation rules for the `connect` module in `SparkBuild.scala`. ### Why are the changes needed?
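
For readers unfamiliar with how the SBT side of such relocation is expressed, below is a minimal sketch in the style of sbt-assembly's `ShadeRule` API; the package names are placeholders, not necessarily the rules this PR changes. The point is that the SBT renames need to mirror the Maven shade-plugin `<relocation>` entries so both builds produce the `connect` assembly JAR identically.
```
// Illustrative sbt-assembly relocation rules (placeholder packages, not the PR's actual rules).
import sbtassembly.ShadeRule

lazy val connectShadeRules = Seq(
  // Rename shaded dependencies into a Spark-private namespace, as the Maven build does.
  ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
  ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.grpc.@1").inAll
)

// Wired into the assembly task, e.g.:
// assembly / assemblyShadeRules := connectShadeRules
```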

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1969044495 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5545,6 +5545,15 @@ object SQLConf { .booleanConf .createWithDefault(f

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1968959548 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala: ## @@ -841,20 +841,24 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSpar

Re: [PR] [SPARK-51305][SQL][CONNECT] Improve `SparkConnectPlanExecution.createObservedMetricsResponse` [spark]

2025-02-24 Thread via GitHub
beliefer commented on PR #50066: URL: https://github.com/apache/spark/pull/50066#issuecomment-2680591378 @dongjoon-hyun Thank you ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [SPARK-51289][SQL] Throw a proper error message for not fully implemented `SQLTableFunction` [spark]

2025-02-24 Thread via GitHub
wayneguow commented on PR #50073: URL: https://github.com/apache/spark/pull/50073#issuecomment-2680420915 cc @MaxGekk @allisonwang-db -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] [SPARK-50692][SQL][FOLLOWUP] Add the LPAD and RPAD pushdown support for H2 [spark]

2025-02-24 Thread via GitHub
beliefer commented on PR #50068: URL: https://github.com/apache/spark/pull/50068#issuecomment-2680314901 > Oh, did you aim to use this as a follow-up, @beliefer ? Uh, I forgot it. I want it to be a follow-up. -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1968788349 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1034,6 +1038,49 @@ class SparkConnectPlanner( .log

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1968784789 ## sql/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -1031,6 +1031,26 @@ message GroupMap { // (Optional) The schema for the grouped sta

Re: [PR] [SPARK-50692][SQL][FOLLOWUP] Add the LPAD and RPAD pushdown support for H2 [spark]

2025-02-24 Thread via GitHub
beliefer commented on PR #50068: URL: https://github.com/apache/spark/pull/50068#issuecomment-2680315941 @dongjoon-hyun Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1968783156 ## sql/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -1031,6 +1031,26 @@ message GroupMap { // (Optional) The schema for the grouped sta

Re: [PR] [SPARK-51302][CONNECT] Spark Connect supports JDBC should use the DataFrameReader API [spark]

2025-02-24 Thread via GitHub
beliefer commented on PR #50059: URL: https://github.com/apache/spark/pull/50059#issuecomment-2680328835 > Do you think you can add some test cases, @beliefer , to be clear what was the problem and to prevent a future regression? Spark Connect already has the test cases. This improve

[PR] [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display [spark]

2025-02-24 Thread via GitHub
yaooqinn opened a new pull request, #50074: URL: https://github.com/apache/spark/pull/50074 ### What changes were proposed in this pull request? This PR uses CatalogUtils.URIToString instead of URI.toString to decode the location URI. ### Why are the changes needed?
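
For context, the snippet below uses plain `java.net.URI` (not Spark internals) to illustrate the difference between the raw, percent-encoded form of a location URI and its decoded, human-readable form, which is what the change wants displayed; `CatalogUtils.URIToString` is the Spark helper named above.
```
import java.net.URI

object LocationUriDecodeDemo {
  def main(args: Array[String]): Unit = {
    // A table location whose path contains a space, stored percent-encoded.
    val uri = new URI("file:///warehouse/db.db/test%20table")

    // URI.toString keeps the percent-encoding: file:///warehouse/db.db/test%20table
    println(uri.toString)

    // getPath decodes it to the human-readable form: /warehouse/db.db/test table
    println(uri.getPath)
  }
}
```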

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968738761 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -780,6 +780,11 @@ "Cannot retrieve from the ML cache. It is probably because the e

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968734181 ## mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala: ## @@ -1248,6 +1263,11 @@ class LogisticRegressionModel private[spark] ( }

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968733126 ## mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala: ## @@ -235,6 +236,13 @@ class FMClassifier @Since("3.0.0") ( model.setSummary(Som

[PR] [SPARK-51289][SQL] Throw a proper error message for not fully implemented `SQLTableFunction` [spark]

2025-02-24 Thread via GitHub
wayneguow opened a new pull request, #50073: URL: https://github.com/apache/spark/pull/50073 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How w

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968717393 ## mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala: ## @@ -504,6 +506,10 @@ object Vectors { /** Max number of nonzero entries used in compu

Re: [PR] [SPARK-50795][SQL][FOLLOWUP] Set isParsing to false for the timestamp formatter in DESCRIBE AS JSON [spark]

2025-02-24 Thread via GitHub
yaooqinn commented on PR #50065: URL: https://github.com/apache/spark/pull/50065#issuecomment-2680188025 Merged to master/4.0, thank you @dongjoon-hyun @asl3 @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] [SPARK-51306][TESTS] Fix test errors caused by improper DROP TABLE/VIEW in describe.sql [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on PR #50061: URL: https://github.com/apache/spark/pull/50061#issuecomment-2680208684 Thank you for adding JIRA issue ID, @yaooqinn . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] [SPARK-50795][SQL][FOLLOWUP] Set isParsing to false for the timestamp formatter in DESCRIBE AS JSON [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on PR #50065: URL: https://github.com/apache/spark/pull/50065#issuecomment-2680207829 Thank you, @yaooqinn and all. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968711609 ## mllib/src/main/scala/org/apache/spark/ml/util/Summary.scala: ## @@ -18,11 +18,21 @@ package org.apache.spark.ml.util import org.apache.spark.annotation.Since

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968708896 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLCache.scala: ## @@ -21,23 +21,52 @@ import java.util.concurrent.{ConcurrentMap, TimeUnit} i

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968705260 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLException.scala: ## @@ -36,3 +36,17 @@ private[spark] case class MLCacheInvalidException(obje

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968704565 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLHandler.scala: ## @@ -125,6 +127,15 @@ private[connect] object MLHandler extends Logging {

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968703858 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLHandler.scala: ## @@ -125,6 +127,15 @@ private[connect] object MLHandler extends Logging {

Re: [PR] [SPARK-51306][TESTS] Fix test errors caused by improper DROP TABLE/VIEW in describe.sql [spark]

2025-02-24 Thread via GitHub
yaooqinn commented on PR #50061: URL: https://github.com/apache/spark/pull/50061#issuecomment-2680181236 Thank you @dongjoon-hyun @LuciferYang, SPARK-51306 is attached. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968701252 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -313,4 +313,49 @@ object Connect { .internal() .booleanCo

Re: [PR] [SPARK-50795][SQL][FOLLOWUP] Set isParsing to false for the timestamp formatter in DESCRIBE AS JSON [spark]

2025-02-24 Thread via GitHub
yaooqinn closed pull request #50065: [SPARK-50795][SQL][FOLLOWUP] Set isParsing to false for the timestamp formatter in DESCRIBE AS JSON URL: https://github.com/apache/spark/pull/50065 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [PR] [SPARK-51306][TESTS] Fix test errors caused by improper DROP TABLE/VIEW in describe.sql [spark]

2025-02-24 Thread via GitHub
yaooqinn closed pull request #50061: [SPARK-51306][TESTS] Fix test errors caused by improper DROP TABLE/VIEW in describe.sql URL: https://github.com/apache/spark/pull/50061 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-51304][DOCS][PYTHON] Use `getCondition` instead of `getErrorClass` in contribution guide [spark]

2025-02-24 Thread via GitHub
itholic commented on PR #50062: URL: https://github.com/apache/spark/pull/50062#issuecomment-2680085229 Thanks all for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] [SPARK-50319] Reorder ResolveIdentifierClause and BindParameter rules [spark]

2025-02-24 Thread via GitHub
github-actions[bot] closed pull request #48849: [SPARK-50319] Reorder ResolveIdentifierClause and BindParameter rules URL: https://github.com/apache/spark/pull/48849 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [DRAFT] Two string types [spark]

2025-02-24 Thread via GitHub
github-actions[bot] closed pull request #48861: [DRAFT] Two string types URL: https://github.com/apache/spark/pull/48861 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

Re: [PR] [SPARK-50914][PYTHON][CONNECT] Match GRPC dependencies for Python-only master scheduled job [spark]

2025-02-24 Thread via GitHub
HyukjinKwon closed pull request #50058: [SPARK-50914][PYTHON][CONNECT] Match GRPC dependencies for Python-only master scheduled job URL: https://github.com/apache/spark/pull/50058 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] [SPARK-50914][PYTHON][CONNECT] Match GRPC dependencies for Python-only master scheduled job [spark]

2025-02-24 Thread via GitHub
HyukjinKwon commented on PR #50058: URL: https://github.com/apache/spark/pull/50058#issuecomment-2680012404 Merged to master and branch-4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] [SPARK-50914][PYTHON][CONNECT] Match GRPC dependencies for Python-only master scheduled job [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on PR #50058: URL: https://github.com/apache/spark/pull/50058#issuecomment-268537 Thank you, @HyukjinKwon ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] [SPARK-50914][PYTHON][CONNECT] Match GRPC dependencies for Python-only master scheduled job [spark]

2025-02-24 Thread via GitHub
HyukjinKwon commented on PR #50058: URL: https://github.com/apache/spark/pull/50058#issuecomment-2679997693 cc @dongjoon-hyun This should fix the scheduled build and make it green 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] [SPARK-51099][PYTHON][FOLLOWUP][4.0] Avoid logging when selector.select returns 0 without waiting the configured timeout [spark]

2025-02-24 Thread via GitHub
HyukjinKwon closed pull request #50072: [SPARK-51099][PYTHON][FOLLOWUP][4.0] Avoid logging when selector.select returns 0 without waiting the configured timeout URL: https://github.com/apache/spark/pull/50072 -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] [SPARK-51099][PYTHON][FOLLOWUP] Avoid logging when selector.select returns 0 without waiting the configured timeout [spark]

2025-02-24 Thread via GitHub
HyukjinKwon closed pull request #50071: [SPARK-51099][PYTHON][FOLLOWUP] Avoid logging when selector.select returns 0 without waiting the configured timeout URL: https://github.com/apache/spark/pull/50071 -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
liviazhu-db commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968396281 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -219,7 +219,18 @@ private[sql] class HDFSBackedS

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
zecookiez commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968381328 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -419,6 +433,10 @@ private[sql] class HDFSBackedSta

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
zecookiez commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968378429 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -219,7 +219,18 @@ private[sql] class HDFSBackedSta

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
liviazhu-db commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968332195 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -219,7 +219,18 @@ private[sql] class HDFSBackedS

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
liviazhu-db commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968325595 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -419,6 +433,10 @@ private[sql] class HDFSBackedS

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
Madhukar525722 commented on PR #50022: URL: https://github.com/apache/spark/pull/50022#issuecomment-2679438548 Hi @pan3793. The flow was able to reach the `case msc` branch, but there I added the debug log - ``` msc.getTTransport match { case t: TEndpointTransport =>
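
To make the quoted fragment concrete, here is a minimal, self-contained sketch of that debug check, assuming libthrift 0.14+ where `TTransport` exposes `getConfiguration()` and `TConfiguration` exposes `getMaxMessageSize()`; it illustrates the idea discussed in the thread, not the code in the PR.
```
import org.apache.thrift.transport.{TEndpointTransport, TTransport}

object ThriftTransportDebug {
  // Logs the effective Thrift max message size for a client connection.
  // Intended to be called with the transport from the metastore client,
  // e.g. logMaxMessageSize(msc.getTTransport).
  def logMaxMessageSize(transport: TTransport): Unit = transport match {
    case t: TEndpointTransport =>
      // The limit the Thrift layer enforces when reading messages on this connection;
      // it should reflect hive.thrift.client.maxmessage.size once the HMS client honors it.
      println(s"thrift max message size = ${t.getConfiguration.getMaxMessageSize}")
    case other =>
      println(s"not an endpoint transport: ${other.getClass.getName}")
  }
}
```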

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-24 Thread via GitHub
jingz-db commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1968245505 ## sql/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -1031,6 +1031,26 @@ message GroupMap { // (Optional) The schema for the grouped state

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-24 Thread via GitHub
viirya commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1968152464 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -718,6 +724,11 @@ private class BufferedRowsReader( schema: S

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-24 Thread via GitHub
viirya commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1968132329 ## sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala: ## @@ -328,7 +336,7 @@ trait AlterTableTests extends SharedSparkSession with QueryError

Re: [PR] [SPARK-50692][SQL][FOLLOWUP] Add the LPAD and RPAD pushdown support for H2 [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on PR #50068: URL: https://github.com/apache/spark/pull/50068#issuecomment-2679188787 Oh, did you aim to use this as a follow-up, @beliefer ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-50692][SQL][FOLLOWUP] Add the LPAD and RPAD pushdown support for H2 [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on PR #50068: URL: https://github.com/apache/spark/pull/50068#issuecomment-2679192384 Please don't forget `[FOLLOWUP]` in the PR title next time, @beliefer . Or, please use a new JIRA ID. -- This is an automated message from the Apache Git Service. To respond to th

Re: [PR] [SPARK-50692][SQL] Add the LPAD and RPAD pushdown support for H2 [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun closed pull request #50068: [SPARK-50692][SQL] Add the LPAD and RPAD pushdown support for H2 URL: https://github.com/apache/spark/pull/50068 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] [SPARK-51305][SQL][CONNECT] Improve `SparkConnectPlanExecution.createObservedMetricsResponse` [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on PR #50066: URL: https://github.com/apache/spark/pull/50066#issuecomment-2679154851 Merged to master/4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [SPARK-51305][SQL][CONNECT] Improve `SparkConnectPlanExecution.createObservedMetricsResponse` [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun closed pull request #50066: [SPARK-51305][SQL][CONNECT] Improve `SparkConnectPlanExecution.createObservedMetricsResponse` URL: https://github.com/apache/spark/pull/50066 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] [WIP][SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-02-24 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1968084349 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +717,177 @@ case class UnionExec(children: Seq[SparkPlan]) extends

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-24 Thread via GitHub
viirya commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1968084126 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -122,7 +122,7 @@ class BasicInMemoryTableCatalog extends TableCat

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-24 Thread via GitHub
viirya commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1968081855 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -122,7 +122,7 @@ class BasicInMemoryTableCatalog extends TableCat

Re: [PR] [WIP][SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-02-24 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1968080690 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -4421,6 +4421,12 @@ ], "sqlState" : "38000" }, + "RECURSION_LEVEL_LIMIT_EXCEEDED" :

Re: [PR] [WIP][SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-02-24 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1968080315 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -848,6 +848,15 @@ object LimitPushDown extends Rule[LogicalPlan] { c

Re: [PR] [SPARK-50785][SQL] Refactor FOR statement to utilize local variables properly. [spark]

2025-02-24 Thread via GitHub
dusantism-db commented on code in PR #50026: URL: https://github.com/apache/spark/pull/50026#discussion_r1968081531 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -206,6 +207,15 @@ class TriggerToExceptionHandlerMap( def getNo

Re: [PR] [WIP][SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-02-24 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1968079676 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/cteOperators.scala: ## @@ -34,7 +34,9 @@ import org.apache.spark.sql.internal.SQLConf * @p

Re: [PR] [WIP][SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-02-24 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1968079360 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +717,177 @@ case class UnionExec(children: Seq[SparkPlan]) extends

Re: [PR] [SPARK-51156][CONNECT][FOLLOWUP] Remove unused `private val AUTH_TOKEN_ON_INSECURE_CONN_ERROR_MSG` from `SparkConnectClient` [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun closed pull request #50070: [SPARK-51156][CONNECT][FOLLOWUP] Remove unused `private val AUTH_TOKEN_ON_INSECURE_CONN_ERROR_MSG` from `SparkConnectClient` URL: https://github.com/apache/spark/pull/50070 -- This is an automated message from the Apache Git Service. To respond to th

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-24 Thread via GitHub
viirya commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1968042471 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3534,7 +3534,8 @@ class Analyzer(override val catalogManager: CatalogManage

Re: [PR] [SPARK-50639][SQL] Improve warning logging in CacheManager [spark]

2025-02-24 Thread via GitHub
vrozov commented on PR #49276: URL: https://github.com/apache/spark/pull/49276#issuecomment-2679043977 @gengliangwang Please see my [response](https://github.com/apache/spark/pull/49276#discussion_r1956993969) to your [comment](https://github.com/apache/spark/pull/49276#discussion_r1956913

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-24 Thread via GitHub
viirya commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1968038807 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3534,7 +3534,8 @@ class Analyzer(override val catalogManager: CatalogManage

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-24 Thread via GitHub
viirya commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1968032448 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -80,7 +80,6 @@ object TableOutputResolver extends SQLConfHelper wi

Re: [PR] [SPARK-51221][CONNECT][TESTS] Use unresolvable host name in SparkConnectClientSuite [spark]

2025-02-24 Thread via GitHub
vrozov commented on PR #49960: URL: https://github.com/apache/spark/pull/49960#issuecomment-2679056088 @HyukjinKwon Please review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-24 Thread via GitHub
viirya commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1968032448 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -80,7 +80,6 @@ object TableOutputResolver extends SQLConfHelper wi

Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]

2025-02-24 Thread via GitHub
vrozov commented on PR #49928: URL: https://github.com/apache/spark/pull/49928#issuecomment-2679052693 @cloud-fan Please review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] [SPARK-51149][CORE] Log classpath in SparkSubmit on ClassNotFoundException [spark]

2025-02-24 Thread via GitHub
vrozov commented on PR #49870: URL: https://github.com/apache/spark/pull/49870#issuecomment-2679050636 @dongjoon-hyun Please review, or advise on who may review the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage [spark]

2025-02-24 Thread via GitHub
mihailoale-db commented on PR #50069: URL: https://github.com/apache/spark/pull/50069#issuecomment-2679037294 @MaxGekk Could you PTAL when you have time? Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[PR] [SPARK-51156][CONNECT][FOLLOWUP] Remove unused `private val AUTH_TOKEN_ON_INSECURE_CONN_ERROR_MSG` from `SparkConnectClient` [spark]

2025-02-24 Thread via GitHub
LuciferYang opened a new pull request, #50070: URL: https://github.com/apache/spark/pull/50070 ### What changes were proposed in this pull request? This pr aims to remove unused `private val AUTH_TOKEN_ON_INSECURE_CONN_ERROR_MSG` from `SparkConnectClient` because it becomes a useless `p

Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]

2025-02-24 Thread via GitHub
attilapiros closed pull request #49814: [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver URL: https://github.com/apache/spark/pull/49814 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-51156][CONNECT][FOLLOWUP] Remove unused `private val AUTH_TOKEN_ON_INSECURE_CONN_ERROR_MSG` from `SparkConnectClient` [spark]

2025-02-24 Thread via GitHub
LuciferYang commented on PR #50070: URL: https://github.com/apache/spark/pull/50070#issuecomment-2678986308 https://github.com/LuciferYang/spark/actions/runs/13502647699/job/37724729211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage [spark]

2025-02-24 Thread via GitHub
mihailoale-db commented on code in PR #50069: URL: https://github.com/apache/spark/pull/50069#discussion_r1967916766 ## sql/core/src/test/resources/sql-tests/inputs/order-by.sql: ## @@ -0,0 +1,24 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUE

Re: [PR] [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage [spark]

2025-02-24 Thread via GitHub
mihailoale-db commented on code in PR #50069: URL: https://github.com/apache/spark/pull/50069#discussion_r1967915773 ## sql/core/src/test/resources/sql-tests/inputs/order-by.sql: ## @@ -0,0 +1,24 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUE

Re: [PR] [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage [spark]

2025-02-24 Thread via GitHub
vladimirg-db commented on code in PR #50069: URL: https://github.com/apache/spark/pull/50069#discussion_r1967880226 ## sql/core/src/test/resources/sql-tests/inputs/order-by.sql: ## @@ -0,0 +1,24 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES

Re: [PR] [SPARK-49912] Refactor simple CASE statement to evaluate the case variable only once [spark]

2025-02-24 Thread via GitHub
cloud-fan closed pull request #50027: [SPARK-49912] Refactor simple CASE statement to evaluate the case variable only once URL: https://github.com/apache/spark/pull/50027 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] [SPARK-49912] Refactor simple CASE statement to evaluate the case variable only once [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on PR #50027: URL: https://github.com/apache/spark/pull/50027#issuecomment-2678690747 thanks, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on code in PR #50031: URL: https://github.com/apache/spark/pull/50031#discussion_r1967751946 ## sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/ProcedureSuite.scala: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] [SPARK-51256][SQL] Increase parallelism if joining with small bucket table [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on code in PR #50004: URL: https://github.com/apache/spark/pull/50004#discussion_r1967730371 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala: ## @@ -150,10 +150,15 @@ case class EnsureRequirements( // A:

Re: [PR] [SPARK-50785][SQL] Refactor FOR statement to utilize local variables properly. [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on code in PR #50026: URL: https://github.com/apache/spark/pull/50026#discussion_r1967684965 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -206,6 +207,15 @@ class TriggerToExceptionHandlerMap( def getNotFo

Re: [PR] [SPARK-51256][SQL] Increase parallelism if joining with small bucket table [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on code in PR #50004: URL: https://github.com/apache/spark/pull/50004#discussion_r1967710205 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala: ## @@ -150,10 +150,15 @@ case class EnsureRequirements( // A:

Re: [PR] [SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on PR #50055: URL: https://github.com/apache/spark/pull/50055#issuecomment-2678515900 I'm not very convinced by the "Why" section. What's the end-to-end problem you hit? BTW https://github.com/apache/spark/pull/47721 has some more context about the SQLMetric initi

Re: [PR] [SPARK-50994][CORE] Perform RDD conversion under tracked execution [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on code in PR #49678: URL: https://github.com/apache/spark/pull/49678#discussion_r1967698435 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -2721,6 +2721,25 @@ class DataFrameSuite extends QueryTest parameters = Map("name"

Re: [PR] [SPARK-51187][SQL][SS] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on PR #49983: URL: https://github.com/apache/spark/pull/49983#issuecomment-2678491074 have we merged this graceful deprecation in branch 3.5? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-24 Thread via GitHub
beliefer commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1967675305 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala: ## @@ -841,20 +841,24 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSpark

Re: [PR] [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on code in PR #50031: URL: https://github.com/apache/spark/pull/50031#discussion_r1967642091 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryProcedureCatalog.scala: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-51078][SPARK-50963][ML][PYTHON][CONNECT][TESTS][FOLLOW-UP] Add back tests for default value [spark]

2025-02-24 Thread via GitHub
zhengruifeng closed pull request #50067: [SPARK-51078][SPARK-50963][ML][PYTHON][CONNECT][TESTS][FOLLOW-UP] Add back tests for default value URL: https://github.com/apache/spark/pull/50067 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on code in PR #50031: URL: https://github.com/apache/spark/pull/50031#discussion_r1967645049 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryProcedureCatalog.scala: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-51078][SPARK-50963][ML][PYTHON][CONNECT][TESTS][FOLLOW-UP] Add back tests for default value [spark]

2025-02-24 Thread via GitHub
zhengruifeng commented on PR #50067: URL: https://github.com/apache/spark/pull/50067#issuecomment-2678422359 thanks @LuciferYang merged to master/4.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to
