Re: [PR] [SPARK-51305][SQL][CONNECT] Improve `SparkConnectPlanExecution.createObservedMetricsResponse` [spark]

2025-02-24 Thread via GitHub
beliefer commented on PR #50066: URL: https://github.com/apache/spark/pull/50066#issuecomment-2680591378 @dongjoon-hyun Thank you ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1968959548 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala: ## @@ -841,20 +841,24 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSpar

Re: [PR] [SPARK-51265][SQL][SS] Throw proper error for eagerlyExecuteCommands containing streaming source marker [spark]

2025-02-24 Thread via GitHub
HeartSaVioR closed pull request #50015: [SPARK-51265][SQL][SS] Throw proper error for eagerlyExecuteCommands containing streaming source marker URL: https://github.com/apache/spark/pull/50015 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] [SPARK-51265][SQL][SS] Throw proper error for eagerlyExecuteCommands containing streaming source marker [spark]

2025-02-24 Thread via GitHub
HeartSaVioR commented on PR #50015: URL: https://github.com/apache/spark/pull/50015#issuecomment-2680891589 Closing via #50037 - much simpler change. Neither PR addresses the original report, which @cloud-fan will address later. -- This is an automated message from the Apache Git Se

Re: [PR] [SPARK-51187][SQL][SS] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 [spark]

2025-02-24 Thread via GitHub
HeartSaVioR commented on PR #49983: URL: https://github.com/apache/spark/pull/49983#issuecomment-2680925857 @cloud-fan > have we merged this graceful deprecation in branch 3.5? Yes, that is merged. It's still a blocker for Spark 4.0.0 though. @dongjoon-hyun > If

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
pan3793 commented on code in PR #50022: URL: https://github.com/apache/spark/pull/50022#discussion_r1969188058 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1407,13 +1410,83 @@ private[hive] object HiveClientImpl extends Logging {

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1968788349 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1034,6 +1038,49 @@ class SparkConnectPlanner( .log

Re: [PR] [SPARK-51289][SQL] Throw a proper error message for not fully implemented `SQLTableFunction` [spark]

2025-02-24 Thread via GitHub
wayneguow commented on PR #50073: URL: https://github.com/apache/spark/pull/50073#issuecomment-2680420915 cc @MaxGekk @allisonwang-db -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] [SPARK-50692][SQL][FOLLOWUP] Add the LPAD and RPAD pushdown support for H2 [spark]

2025-02-24 Thread via GitHub
beliefer commented on PR #50068: URL: https://github.com/apache/spark/pull/50068#issuecomment-2680314901 > Oh, did you aim to use this as a follow-up, @beliefer ? Uh, I forgot it. I want it to be a follow-up. -- This is an automated message from the Apache Git Service. To respond to

[PR] [SPARK-51308][CONNECT][BUILD] Update the relocation rules for the `connect` module in `SparkBuild.scala` to ensure that both Maven and SBT produce the assembly JAR according to the same rules [sp

2025-02-24 Thread via GitHub
wayneguow opened a new pull request, #50075: URL: https://github.com/apache/spark/pull/50075 ### What changes were proposed in this pull request? This PR aims to update the relocation rules for the `connect` module in `SparkBuild.scala`. ### Why are the changes needed?

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1969044495 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5545,6 +5545,15 @@ object SQLConf { .booleanConf .createWithDefault(f

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-24 Thread via GitHub
szehon-ho commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1969046534 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala: ## @@ -839,4 +839,30 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSpark

Re: [PR] [SPARK-51309][BUILD] Upgrade rocksdbjni to 9.10.0 [spark]

2025-02-24 Thread via GitHub
wayneguow commented on PR #50076: URL: https://github.com/apache/spark/pull/50076#issuecomment-2680812582 Related benchmark results: - jdk17: https://github.com/wayneguow/spark/actions/runs/13513028574 - jdk21: https://github.com/wayneguow/spark/actions/runs/13513032754 -- This i

[PR] [SPARK-51309][BUILD] Upgrade rocksdbjni to 9.10.0 [spark]

2025-02-24 Thread via GitHub
wayneguow opened a new pull request, #50076: URL: https://github.com/apache/spark/pull/50076 ### What changes were proposed in this pull request? This PR aims to upgrade `rocksdbjni` from 9.8.4 to 9.10.0. ### Why are the changes needed? There are some bug fixes and

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
micheal-o commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968977014 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -419,6 +432,10 @@ private[sql] class HDFSBackedSta

Re: [PR] [SPARK-51289][SQL] Throw a proper error message for not fully implemented `SQLTableFunction` [spark]

2025-02-24 Thread via GitHub
LuciferYang commented on PR #50073: URL: https://github.com/apache/spark/pull/50073#issuecomment-2680871903 also cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
liviazhu-db commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968325595 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -419,6 +433,10 @@ private[sql] class HDFSBackedS

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-24 Thread via GitHub
viirya commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1968152464 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -718,6 +724,11 @@ private class BufferedRowsReader( schema: S

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-24 Thread via GitHub
jingz-db commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1968245505 ## sql/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -1031,6 +1031,26 @@ message GroupMap { // (Optional) The schema for the grouped state

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
Madhukar525722 commented on PR #50022: URL: https://github.com/apache/spark/pull/50022#issuecomment-2679438548 HI @pan3793 . The flow was able to reach the case msc: But there I added the debug log - ``` msc.getTTransport match { case t: TEndpointTransport =>

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
liviazhu-db commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968396281 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -219,7 +219,18 @@ private[sql] class HDFSBackedS

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
liviazhu-db commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968332195 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -219,7 +219,18 @@ private[sql] class HDFSBackedS

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
zecookiez commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968378429 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -219,7 +219,18 @@ private[sql] class HDFSBackedSta

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-24 Thread via GitHub
zecookiez commented on code in PR #50030: URL: https://github.com/apache/spark/pull/50030#discussion_r1968381328 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -419,6 +433,10 @@ private[sql] class HDFSBackedSta

Re: [PR] [SPARK-50914][PYTHON][CONNECT] Match GRPC dependencies for Python-only master scheduled job [spark]

2025-02-24 Thread via GitHub
HyukjinKwon commented on PR #50058: URL: https://github.com/apache/spark/pull/50058#issuecomment-2679997693 cc @dongjoon-hyun This should fix the scheduled build and make it green 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] [SPARK-50914][PYTHON][CONNECT] Match GRPC dependencies for Python-only master scheduled job [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on PR #50058: URL: https://github.com/apache/spark/pull/50058#issuecomment-268537 Thank you, @HyukjinKwon ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[PR] [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display [spark]

2025-02-24 Thread via GitHub
yaooqinn opened a new pull request, #50074: URL: https://github.com/apache/spark/pull/50074 ### What changes were proposed in this pull request? This PR uses CatalogUtils.URIToString instead of URI.toString to decode the location URI. ### Why are the changes needed?

Re: [PR] [SPARK-51304][DOCS][PYTHON] Use `getCondition` instead of `getErrorClass` in contribution guide [spark]

2025-02-24 Thread via GitHub
itholic commented on PR #50062: URL: https://github.com/apache/spark/pull/50062#issuecomment-2680085229 Thanks all for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

[PR] [SPARK-51289][SQL] Throw a proper error message for not fully implemented `SQLTableFunction` [spark]

2025-02-24 Thread via GitHub
wayneguow opened a new pull request, #50073: URL: https://github.com/apache/spark/pull/50073 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How w

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968733126 ## mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala: ## @@ -235,6 +236,13 @@ class FMClassifier @Since("3.0.0") ( model.setSummary(Som

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968734181 ## mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala: ## @@ -1248,6 +1263,11 @@ class LogisticRegressionModel private[spark] ( }

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968738761 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -780,6 +780,11 @@ "Cannot retrieve from the ML cache. It is probably because the e

Re: [PR] [DRAFT] Two string types [spark]

2025-02-24 Thread via GitHub
github-actions[bot] closed pull request #48861: [DRAFT] Two string types URL: https://github.com/apache/spark/pull/48861 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968708896 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLCache.scala: ## @@ -21,23 +21,52 @@ import java.util.concurrent.{ConcurrentMap, TimeUnit} i

Re: [PR] [SPARK-50795][SQL][FOLLOWUP] Set isParsing to false for the timestamp formatter in DESCRIBE AS JSON [spark]

2025-02-24 Thread via GitHub
yaooqinn commented on PR #50065: URL: https://github.com/apache/spark/pull/50065#issuecomment-2680188025 Merged to master/4.0, thank you @dongjoon-hyun @asl3 @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968704565 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLHandler.scala: ## @@ -125,6 +127,15 @@ private[connect] object MLHandler extends Logging {

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968705260 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLException.scala: ## @@ -36,3 +36,17 @@ private[spark] case class MLCacheInvalidException(obje

Re: [PR] [SPARK-50795][SQL][FOLLOWUP] Set isParsing to false for the timestamp formatter in DESCRIBE AS JSON [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on PR #50065: URL: https://github.com/apache/spark/pull/50065#issuecomment-2680207829 Thank you, @yaooqinn and all. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] [SPARK-51306][TESTS] Fix test errors caused by improper DROP TABLE/VIEW in describe.sql [spark]

2025-02-24 Thread via GitHub
dongjoon-hyun commented on PR #50061: URL: https://github.com/apache/spark/pull/50061#issuecomment-2680208684 Thank you for adding JIRA issue ID, @yaooqinn . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968711609 ## mllib/src/main/scala/org/apache/spark/ml/util/Summary.scala: ## @@ -18,11 +18,21 @@ package org.apache.spark.ml.util import org.apache.spark.annotation.Since

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968717393 ## mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala: ## @@ -504,6 +506,10 @@ object Vectors { /** Max number of nonzero entries used in compu

Re: [PR] [SPARK-51099][PYTHON][FOLLOWUP] Avoid logging when selector.select returns 0 without waiting the configured timeout [spark]

2025-02-24 Thread via GitHub
HyukjinKwon closed pull request #50071: [SPARK-51099][PYTHON][FOLLOWUP] Avoid logging when selector.select returns 0 without waiting the configured timeout URL: https://github.com/apache/spark/pull/50071 -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [PR] [SPARK-51099][PYTHON][FOLLOWUP][4.0] Avoid logging when selector.select returns 0 without waiting the configured timeout [spark]

2025-02-24 Thread via GitHub
HyukjinKwon closed pull request #50072: [SPARK-51099][PYTHON][FOLLOWUP][4.0] Avoid logging when selector.select returns 0 without waiting the configured timeout URL: https://github.com/apache/spark/pull/50072 -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] [SPARK-50914][PYTHON][CONNECT] Match GRPC dependencies for Python-only master scheduled job [spark]

2025-02-24 Thread via GitHub
HyukjinKwon closed pull request #50058: [SPARK-50914][PYTHON][CONNECT] Match GRPC dependencies for Python-only master scheduled job URL: https://github.com/apache/spark/pull/50058 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] [SPARK-50914][PYTHON][CONNECT] Match GRPC dependencies for Python-only master scheduled job [spark]

2025-02-24 Thread via GitHub
HyukjinKwon commented on PR #50058: URL: https://github.com/apache/spark/pull/50058#issuecomment-2680012404 Merged to master and branch-4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] [SPARK-50319] Reorder ResolveIdentifierClause and BindParameter rules [spark]

2025-02-24 Thread via GitHub
github-actions[bot] closed pull request #48849: [SPARK-50319] Reorder ResolveIdentifierClause and BindParameter rules URL: https://github.com/apache/spark/pull/48849 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-51306][TESTS] Fix test errors caused by improper DROP TABLE/VIEW in describe.sql [spark]

2025-02-24 Thread via GitHub
yaooqinn closed pull request #50061: [SPARK-51306][TESTS] Fix test errors caused by improper DROP TABLE/VIEW in describe.sql URL: https://github.com/apache/spark/pull/50061 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968701252 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -313,4 +313,49 @@ object Connect { .internal() .booleanCo

Re: [PR] [SPARK-50795][SQL][FOLLOWUP] Set isParsing to false for the timestamp formatter in DESCRIBE AS JSON [spark]

2025-02-24 Thread via GitHub
yaooqinn closed pull request #50065: [SPARK-50795][SQL][FOLLOWUP] Set isParsing to false for the timestamp formatter in DESCRIBE AS JSON URL: https://github.com/apache/spark/pull/50065 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #50013: URL: https://github.com/apache/spark/pull/50013#discussion_r1968703858 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLHandler.scala: ## @@ -125,6 +127,15 @@ private[connect] object MLHandler extends Logging {

Re: [PR] [SPARK-51306][TESTS] Fix test errors caused by improper DROP TABLE/VIEW in describe.sql [spark]

2025-02-24 Thread via GitHub
yaooqinn commented on PR #50061: URL: https://github.com/apache/spark/pull/50061#issuecomment-2680181236 Thank you @dongjoon-hyun @LuciferYang, SPARK-51306 is attached. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] [SPARK-50692][SQL][FOLLOWUP] Add the LPAD and RPAD pushdown support for H2 [spark]

2025-02-24 Thread via GitHub
beliefer commented on PR #50068: URL: https://github.com/apache/spark/pull/50068#issuecomment-2680315941 @dongjoon-hyun Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] [SPARK-51302][CONNECT] Spark Connect supports JDBC should use the DataFrameReader API [spark]

2025-02-24 Thread via GitHub
beliefer commented on PR #50059: URL: https://github.com/apache/spark/pull/50059#issuecomment-2680328835 > Do you think you can add some test cases, @beliefer , to be clear what was the problem and to prevent a future regression? Spark Connect already has the test cases. This improve

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1968783156 ## sql/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -1031,6 +1031,26 @@ message GroupMap { // (Optional) The schema for the grouped sta

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-24 Thread via GitHub
hvanhovell commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1968784789 ## sql/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -1031,6 +1031,26 @@ message GroupMap { // (Optional) The schema for the grouped sta

Re: [PR] [SPARK-51300][PS][DOCS] Fix broken link for `ps.sql` [spark]

2025-02-24 Thread via GitHub
HyukjinKwon commented on PR #50056: URL: https://github.com/apache/spark/pull/50056#issuecomment-2677664899 Merged to master and branch-4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] [SPARK-51300][PS][DOCS] Fix broken link for `ps.sql` [spark]

2025-02-24 Thread via GitHub
HyukjinKwon closed pull request #50056: [SPARK-51300][PS][DOCS] Fix broken link for `ps.sql` URL: https://github.com/apache/spark/pull/50056 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

[PR] [WIP][SPARK-51302][CONNECT] Spark Connect supports JDBC should use the DataFrameReader API [spark]

2025-02-24 Thread via GitHub
beliefer opened a new pull request, #50059: URL: https://github.com/apache/spark/pull/50059 ### What changes were proposed in this pull request? This PR proposes to unify the calls to the DataFrameReader API in Spark Connect where the jdbc API is supported. ### Why are the change

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-24 Thread via GitHub
LuciferYang commented on code in PR #50057: URL: https://github.com/apache/spark/pull/50057#discussion_r1967181996 ## core/benchmarks/ZStandardBenchmark-results.txt: ## @@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec =

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-24 Thread via GitHub
yaooqinn commented on code in PR #50057: URL: https://github.com/apache/spark/pull/50057#discussion_r1967210507 ## core/benchmarks/ZStandardBenchmark-results.txt: ## @@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-24 Thread via GitHub
LuciferYang commented on code in PR #50057: URL: https://github.com/apache/spark/pull/50057#discussion_r1967217142 ## core/benchmarks/ZStandardBenchmark-results.txt: ## @@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec =

[PR] Add rpad and lpad support for PostgreDialect and MsSQLServerDialect [spark]

2025-02-24 Thread via GitHub
milosstojanovic opened a new pull request, #50060: URL: https://github.com/apache/spark/pull/50060 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-24 Thread via GitHub
haiyangsun-db commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1967297789 ## sql/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -1031,6 +1031,26 @@ message GroupMap { // (Optional) The schema for the grouped

[PR] [MINOR][TESTS] Fix test errors caused by improper DROP TABLE/VIEW in describe.sql [spark]

2025-02-24 Thread via GitHub
yaooqinn opened a new pull request, #50061: URL: https://github.com/apache/spark/pull/50061 ### What changes were proposed in this pull request? This PR fixes test errors caused by improper DROP TABLE/VIEW in describe.sql - Table Not Found Error when dropping views after

Re: [PR] Add rpad and lpad support for PostgreDialect and MsSQLServerDialect [spark]

2025-02-24 Thread via GitHub
milosstojanovic closed pull request #50060: Add rpad and lpad support for PostgreDialect and MsSQLServerDialect URL: https://github.com/apache/spark/pull/50060 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-51278][PYTHON] Use appropriate structure of JSON format for `PySparkLogger` [spark]

2025-02-24 Thread via GitHub
itholic closed pull request #50038: [SPARK-51278][PYTHON] Use appropriate structure of JSON format for `PySparkLogger` URL: https://github.com/apache/spark/pull/50038 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] [SPARK-51278][PYTHON] Use appropriate structure of JSON format for `PySparkLogger` [spark]

2025-02-24 Thread via GitHub
itholic commented on PR #50038: URL: https://github.com/apache/spark/pull/50038#issuecomment-2677920582 Merged to master and branch-4.0. Thanks @ueshin @HyukjinKwon for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-24 Thread via GitHub
beliefer commented on code in PR #50057: URL: https://github.com/apache/spark/pull/50057#discussion_r1967317270 ## core/benchmarks/ZStandardBenchmark-results.txt: ## @@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec

[PR] [SPARK-51304][DOCS][PYTHON] Use `getCondition` instead of `getErrorClass` in contribution guide [spark]

2025-02-24 Thread via GitHub
itholic opened a new pull request, #50062: URL: https://github.com/apache/spark/pull/50062 ### What changes were proposed in this pull request? This PR proposes to use `getCondition` instead of `getErrorClass` in contribution guide ### Why are the changes needed?

Re: [PR] [DRAFT] Resolve default string producing expressions [spark]

2025-02-24 Thread via GitHub
stefankandic commented on code in PR #50053: URL: https://github.com/apache/spark/pull/50053#discussion_r1967363581 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -28,6 +29,10 @@ import org.apache.spark.sql.types.{

[PR] [SPARK-50098][PYTHON][FOLLOW-UP] Update _minimum_googleapis_common_protos_version in setup.py for pyspark-client [spark]

2025-02-24 Thread via GitHub
HyukjinKwon opened a new pull request, #50063: URL: https://github.com/apache/spark/pull/50063 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/48643 that updates _minimum_googleapis_common_protos_version in setup.py fo

[PR] [SPARK-50015][PYTHON][FOLLOW-UP] Update _minimum_grpc_version in setup.py for pyspark-client [spark]

2025-02-24 Thread via GitHub
HyukjinKwon opened a new pull request, #50064: URL: https://github.com/apache/spark/pull/50064 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/48524 that updates _minimum_grpc_version in setup.py for pyspark-client

Re: [PR] [SPARK-50098][PYTHON][FOLLOW-UP] Update _minimum_googleapis_common_protos_version in setup.py for pyspark-client [spark]

2025-02-24 Thread via GitHub
HyukjinKwon commented on PR #50063: URL: https://github.com/apache/spark/pull/50063#issuecomment-2678115467 Merged to master and branch-4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] [SPARK-50098][PYTHON][FOLLOW-UP] Update _minimum_googleapis_common_protos_version in setup.py for pyspark-client [spark]

2025-02-24 Thread via GitHub
HyukjinKwon closed pull request #50063: [SPARK-50098][PYTHON][FOLLOW-UP] Update _minimum_googleapis_common_protos_version in setup.py for pyspark-client URL: https://github.com/apache/spark/pull/50063 -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [PR] [SPARK-50015][PYTHON][FOLLOW-UP] Update _minimum_grpc_version in setup.py for pyspark-client [spark]

2025-02-24 Thread via GitHub
HyukjinKwon commented on PR #50064: URL: https://github.com/apache/spark/pull/50064#issuecomment-2678117982 Merged to master and branch-4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] [SPARK-51302][CONNECT] Spark Connect supports JDBC should use the DataFrameReader API [spark]

2025-02-24 Thread via GitHub
beliefer commented on PR #50059: URL: https://github.com/apache/spark/pull/50059#issuecomment-2678117446 ping @HyukjinKwon @zhengruifeng @LuciferYang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[PR] [SPARK-50795][SQL][FOLLOWUP] Set isParsing to false for the timestamp formatter in DESCRIBE AS JSON [spark]

2025-02-24 Thread via GitHub
yaooqinn opened a new pull request, #50065: URL: https://github.com/apache/spark/pull/50065 ### What changes were proposed in this pull request? This PR sets isParsing to false for the timestamp formatter in DESCRIBE AS JSON, because the formatter is not used for parsing datetime strin

Re: [PR] [SPARK-50015][PYTHON][FOLLOW-UP] Update _minimum_grpc_version in setup.py for pyspark-client [spark]

2025-02-24 Thread via GitHub
HyukjinKwon closed pull request #50064: [SPARK-50015][PYTHON][FOLLOW-UP] Update _minimum_grpc_version in setup.py for pyspark-client URL: https://github.com/apache/spark/pull/50064 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-50785][SQL] Refactor FOR statement to utilize local variables properly. [spark]

2025-02-24 Thread via GitHub
dusantism-db commented on code in PR #50026: URL: https://github.com/apache/spark/pull/50026#discussion_r1967448405 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -206,6 +207,15 @@ class TriggerToExceptionHandlerMap( def getNo

Re: [PR] [SPARK-50785][SQL] Refactor FOR statement to utilize local variables properly. [spark]

2025-02-24 Thread via GitHub
dusantism-db commented on code in PR #50026: URL: https://github.com/apache/spark/pull/50026#discussion_r1967452308 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -206,6 +207,15 @@ class TriggerToExceptionHandlerMap( def getNo

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-24 Thread via GitHub
yaooqinn commented on code in PR #50057: URL: https://github.com/apache/spark/pull/50057#discussion_r1967455912 ## core/benchmarks/ZStandardBenchmark-results.txt: ## @@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-24 Thread via GitHub
pan3793 commented on code in PR #50057: URL: https://github.com/apache/spark/pull/50057#discussion_r1967470537 ## core/benchmarks/ZStandardBenchmark-results.txt: ## @@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec =

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
Madhukar525722 commented on PR #50022: URL: https://github.com/apache/spark/pull/50022#issuecomment-2678157954 Hi @pan3793, while testing we are facing a warning `[spark3-client]$ spark-sql --master yarn --deploy-mode client --driver-memory 4g --executor-memory 4g --conf spark.hadoo

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
pan3793 commented on PR #50022: URL: https://github.com/apache/spark/pull/50022#issuecomment-2678172344 @Madhukar525722 so it works? it's a warning message, not an error, and it seems reasonable to me. > 07:45:49 WARN HiveConf: HiveConf of name hive.thrift.client.max.message.

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
Madhukar525722 commented on PR #50022: URL: https://github.com/apache/spark/pull/50022#issuecomment-2678181886 It didn't work @pan3793, I suspect that the config setup didn't happen, which is why it resulted in the same old behaviour. Apart from that other error logs are still same

[PR] [SPARK-51305][CONNECT] Improve the code for createObservedMetricsResponse [spark]

2025-02-24 Thread via GitHub
beliefer opened a new pull request, #50066: URL: https://github.com/apache/spark/pull/50066 ### What changes were proposed in this pull request? This PR proposes to improve the code for `createObservedMetricsResponse`. ### Why are the changes needed? There exists a duplicate

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
pan3793 commented on code in PR #50022: URL: https://github.com/apache/spark/pull/50022#discussion_r1967508152 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1407,13 +1408,74 @@ private[hive] object HiveClientImpl extends Logging {

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
pan3793 commented on code in PR #50022: URL: https://github.com/apache/spark/pull/50022#discussion_r1967508738 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1407,13 +1408,74 @@ private[hive] object HiveClientImpl extends Logging {

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-24 Thread via GitHub
beliefer commented on code in PR #50057: URL: https://github.com/apache/spark/pull/50057#discussion_r1967516710 ## core/benchmarks/ZStandardBenchmark-jdk21-results.txt: ## @@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec ==

Re: [PR] [WIP][SQL] Add rpad and lpad support for PostgresDialect and MsSQLServerDialect expression pushdown [spark]

2025-02-24 Thread via GitHub
beliefer commented on PR #50060: URL: https://github.com/apache/spark/pull/50060#issuecomment-2678235174 Please create an issue to track this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] [SPARK-51078][SPARK-50963][ML][PYTHON][CONNECT][TESTS][FOLLOW-UP] Add back tests for default value [spark]

2025-02-24 Thread via GitHub
zhengruifeng opened a new pull request, #50067: URL: https://github.com/apache/spark/pull/50067 ### What changes were proposed in this pull request? add back tests deleted in https://github.com/apache/spark/commit/e0a7db2d2a7d295f933f9fc2d5605c5e59c58aa7#diff-50e109673576cc6d4f872

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
Madhukar525722 commented on code in PR #50022: URL: https://github.com/apache/spark/pull/50022#discussion_r1967534275 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1407,13 +1408,74 @@ private[hive] object HiveClientImpl extends Logging

Re: [PR] [SPARK-49489][SQL][HIVE] HMS client respects `hive.thrift.client.maxmessage.size` [spark]

2025-02-24 Thread via GitHub
pan3793 commented on code in PR #50022: URL: https://github.com/apache/spark/pull/50022#discussion_r1967538536 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1407,13 +1408,74 @@ private[hive] object HiveClientImpl extends Logging {

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-24 Thread via GitHub
pan3793 commented on code in PR #50057: URL: https://github.com/apache/spark/pull/50057#discussion_r1967556026 ## core/benchmarks/ZStandardBenchmark-jdk21-results.txt: ## @@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec ===

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-24 Thread via GitHub
beliefer commented on code in PR #50057: URL: https://github.com/apache/spark/pull/50057#discussion_r1967569627 ## core/benchmarks/ZStandardBenchmark-jdk21-results.txt: ## @@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec ==

Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]

2025-02-24 Thread via GitHub
pan3793 commented on code in PR #50057: URL: https://github.com/apache/spark/pull/50057#discussion_r1967578779 ## core/benchmarks/ZStandardBenchmark-jdk21-results.txt: ## @@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec ===

Re: [PR] [SPARK-49912] Refactor simple CASE statement to evaluate the case variable only once [spark]

2025-02-24 Thread via GitHub
dusantism-db commented on code in PR #50027: URL: https://github.com/apache/spark/pull/50027#discussion_r1967595134 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -599,6 +599,116 @@ class CaseStatementExec( } } +/** + * Exe

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-24 Thread via GitHub
cloud-fan commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1967618587 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3534,7 +3534,8 @@ class Analyzer(override val catalogManager: CatalogMan
