Re: [PR] [SPARK-50903][CONNECT] Cache logical plans after analysis [spark]

2025-01-30 Thread via GitHub
changgyoopark-db commented on code in PR #49584: URL: https://github.com/apache/spark/pull/49584#discussion_r1936816392 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -440,46 +443,64 @@ case class SessionHolder(userId: String

Re: [PR] [SPARK-50903][CONNECT] Cache logical plans after analysis [spark]

2025-01-30 Thread via GitHub
changgyoopark-db commented on code in PR #49584: URL: https://github.com/apache/spark/pull/49584#discussion_r1936815612 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -440,46 +443,64 @@ case class SessionHolder(userId: String

[PR] [SPARK-51049][CORE] Increase S3A Vector IO threshold for range merge [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun opened a new pull request, #49748: URL: https://github.com/apache/spark/pull/49748 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-51045][TESTS][4.0] Regenerate benchmark results after upgrading to Scala 2.13.16 [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49741: URL: https://github.com/apache/spark/pull/49741#issuecomment-2626487557 There are a few places to spot, but in general, when we cross-check both Java 17/21, there seems to be no regression in Scala 2.13.16. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-51045][TESTS][4.0] Regenerate benchmark results after upgrading to Scala 2.13.16 [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on code in PR #49741: URL: https://github.com/apache/spark/pull/49741#discussion_r1936792460 ## sql/core/benchmarks/ConstantColumnVectorBenchmark-results.txt: ## @@ -1,276 +1,276 @@ -OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Linux 6.8.0-1017-azure +OpenJ

Re: [PR] [SPARK-51045][TESTS][4.0] Regenerate benchmark results after upgrading to Scala 2.13.16 [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on code in PR #49741: URL: https://github.com/apache/spark/pull/49741#discussion_r1936791506 ## sql/core/benchmarks/ConstantColumnVectorBenchmark-jdk21-results.txt: ## @@ -1,280 +1,280 @@ -OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-azure +

Re: [PR] SPARK-51048 | Support stop java spark context with exit code [spark]

2025-01-30 Thread via GitHub
prathit06 commented on PR #49746: URL: https://github.com/apache/spark/pull/49746#issuecomment-2626473930 Hi @dongjoon-hyun could you please take a look, thank you !

Re: [PR] [SPARK-51045][TESTS][4.0] Regenerate benchmark results after upgrading to Scala 2.13.16 [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on code in PR #49741: URL: https://github.com/apache/spark/pull/49741#discussion_r1936783192 ## sql/core/benchmarks/DataSourceReadBenchmark-jdk21-results.txt: ## @@ -2,437 +2,437 @@ SQL Single Numeric Column Scan ===

Re: [PR] [SPARK-50967][SS] Add option to skip emitting initial state keys within the FMGWS operator [spark]

2025-01-30 Thread via GitHub
anishshri-db commented on code in PR #49632: URL: https://github.com/apache/spark/pull/49632#discussion_r1936759341 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -828,7 +830,9 @@ abstract class SparkStrategies extends QueryPlanner[SparkP

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-01-30 Thread via GitHub
beliefer commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1936756308 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,84 @@ class MySQLIntegrationSuite exte

Re: [PR] [SPARK-51047][SS] Add tests to verify scan ordering for non-zero start ordinals as well as non-ascending ordinals [spark]

2025-01-30 Thread via GitHub
HeartSaVioR commented on code in PR #49747: URL: https://github.com/apache/spark/pull/49747#discussion_r1936722556 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala: ## @@ -343,6 +343,63 @@ class RocksDBStateStoreSuite extends

Re: [PR] [SPARK-51047][SS] Add tests to verify scan ordering for non-zero start ordinals as well as non-ascending ordinals [spark]

2025-01-30 Thread via GitHub
anishshri-db commented on PR #49747: URL: https://github.com/apache/spark/pull/49747#issuecomment-2626353644 cc - @HeartSaVioR - PTAL, thx !

Re: [PR] [SPARK-51045][TESTS][4.0] Regenerate benchmark results after upgrading to Scala 2.13.16 [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on code in PR #49741: URL: https://github.com/apache/spark/pull/49741#discussion_r1936702343 ## sql/catalyst/benchmarks/HexBenchmark-jdk21-results.txt: ## @@ -2,13 +2,13 @@ UnHex Comparison ==

Re: [PR] [SPARK-51044][SS] Add ordering related tests for list state [spark]

2025-01-30 Thread via GitHub
HeartSaVioR commented on code in PR #49742: URL: https://github.com/apache/spark/pull/49742#discussion_r1936705916 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/ListStateSuite.scala: ## @@ -99,6 +99,153 @@ class ListStateSuite extends StateVariableSui

Re: [PR] [SPARK-50967][SS] Add option to skip emitting initial state keys within the FMGWS operator [spark]

2025-01-30 Thread via GitHub
HeartSaVioR commented on code in PR #49632: URL: https://github.com/apache/spark/pull/49632#discussion_r1936697121 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala: ## @@ -312,10 +315,10 @@ trait FlatMapGroupsWithStateExecBase

Re: [PR] [SPARK-51045][TESTS][4.0] Regenerate benchmark results after upgrading to Scala 2.13.16 [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on code in PR #49741: URL: https://github.com/apache/spark/pull/49741#discussion_r1936701638 ## sql/catalyst/benchmarks/HexBenchmark-results.txt: ## @@ -2,13 +2,13 @@ UnHex Comparison

Re: [PR] [SPARK-51045][TESTS][4.0] Regenerate benchmark results after upgrading to Scala 2.13.16 [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on code in PR #49741: URL: https://github.com/apache/spark/pull/49741#discussion_r1936699566 ## sql/catalyst/benchmarks/EscapePathBenchmark-jdk21-results.txt: ## @@ -2,23 +2,23 @@ Escape =

Re: [PR] [SPARK-50974][ML][PYTHON][CONNECT] Add support foldCol for CrossValidator on connect [spark]

2025-01-30 Thread via GitHub
wbo4958 commented on PR #49743: URL: https://github.com/apache/spark/pull/49743#issuecomment-2626332717 Hey @zhengruifeng, could you help review this PR, thx very much.

[PR] SPARK-51048 | Support stop java spark context with exit code [spark]

2025-01-30 Thread via GitHub
prathit06 opened a new pull request, #49746: URL: https://github.com/apache/spark/pull/49746 ### What changes were proposed in this pull request? Considering there is existing support to stop the Spark context with a required exit code, this PR aims to use the same to add it to the Java Spark context as
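As a hedged sketch of the delegation the PR description hints at: an exit-code-aware `stop(int)` overload on a Java-facing wrapper that forwards to the underlying context. The class and method names below are illustrative stand-ins, not Spark's actual API:

```java
public class JavaContextWrapper {
    // Minimal stand-in for the Scala-side context that already
    // supports stopping with an exit code.
    static class CoreContext {
        int lastExitCode = -1;
        void stop(int exitCode) { lastExitCode = exitCode; }
    }

    private final CoreContext core;

    public JavaContextWrapper(CoreContext core) { this.core = core; }

    public void stop() { core.stop(0); }                    // existing no-arg behavior
    public void stop(int exitCode) { core.stop(exitCode); } // exit-code overload
}
```

The point of the pattern is that the Java-facing class adds no logic of its own; it only forwards to the existing exit-code-aware implementation.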

Re: [PR] [SPARK-51015][ML][PYTHON][CONNECT] Support RFormulaModel.toString on Connect [spark]

2025-01-30 Thread via GitHub
wbo4958 commented on PR #49745: URL: https://github.com/apache/spark/pull/49745#issuecomment-2626331932 Hi @zhengruifeng , please take a look at this PR. Thx

Re: [PR] [SPARK-51045][TESTS][4.0] Regenerate benchmark results after upgrading to Scala 2.13.16 [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on code in PR #49741: URL: https://github.com/apache/spark/pull/49741#discussion_r1936695648 ## sql/catalyst/benchmarks/EnumTypeSetBenchmark-jdk21-results.txt: ## @@ -1,105 +1,105 @@ -OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-azure +OpenJ

Re: [PR] [SPARK-50767][SQL] Remove codegen of `from_json` [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49411: URL: https://github.com/apache/spark/pull/49411#issuecomment-2626323863 Got it. Thank you for the info, @LuciferYang ~

Re: [PR] [SPARK-50767][SQL] Remove codegen of `from_json` [spark]

2025-01-30 Thread via GitHub
LuciferYang commented on PR #49411: URL: https://github.com/apache/spark/pull/49411#issuecomment-2626294131 > * Based on that the root cause for the performance regression of `SubExprEliminationBenchmark` is not due to the codegen implementation of `from_json`(the root cause is that the ope

Re: [PR] [SPARK-50985][SS] Classify Kafka Timestamp Offsets mismatch error instead of assert and throw error for missing server in KafkaTokenProvider [spark]

2025-01-30 Thread via GitHub
HeartSaVioR commented on code in PR #49662: URL: https://github.com/apache/spark/pull/49662#discussion_r1936652586 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaOffsetReaderSuite.scala: ## @@ -203,6 +205,41 @@ class KafkaOffsetReaderSuite extends

[PR] [SPARK-51015][ML][PYTHON][CONNECT] Support RFormulaModel.toString on Connect [spark]

2025-01-30 Thread via GitHub
wbo4958 opened a new pull request, #49745: URL: https://github.com/apache/spark/pull/49745 ### What changes were proposed in this pull request? This PR adds support toString for RFormulaModel on ml Connect. ### Why are the changes needed? Feature parity ###

Re: [PR] [SPARK-49428][SQL] Move Connect Scala Client from Connector to SQL [spark]

2025-01-30 Thread via GitHub
hvanhovell commented on PR #49695: URL: https://github.com/apache/spark/pull/49695#issuecomment-2626206535 Merging to master/4.0.

Re: [PR] [SPARK-49428][SQL] Move Connect Scala Client from Connector to SQL [spark]

2025-01-30 Thread via GitHub
asfgit closed pull request #49695: [SPARK-49428][SQL] Move Connect Scala Client from Connector to SQL URL: https://github.com/apache/spark/pull/49695

Re: [PR] [SPARK-51045][TESTS] Regenerate benchmark results after upgrading to Scala 2.13.16 [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49744: URL: https://github.com/apache/spark/pull/49744#issuecomment-2626196778 Currently, two benchmark failures are detected and I'm looking at them.

[PR] [SPARK-50974][ML][PYTHON][CONNECT] Add support foldCol for CrossValidator on connect [spark]

2025-01-30 Thread via GitHub
wbo4958 opened a new pull request, #49743: URL: https://github.com/apache/spark/pull/49743 ### What changes were proposed in this pull request? This PR adds support foldCol for CrossValidator on connect ### Why are the changes needed? feature parity ### Doe

[PR] Add Java 17 result [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun opened a new pull request, #49744: URL: https://github.com/apache/spark/pull/49744 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-51044][SS] Add ordering related tests for list state [spark]

2025-01-30 Thread via GitHub
anishshri-db commented on PR #49742: URL: https://github.com/apache/spark/pull/49742#issuecomment-2626174325 cc - @HeartSaVioR - PTAL, thx !

Re: [PR] [SPARK-50767][SQL] Remove codegen of `from_json` [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49411: URL: https://github.com/apache/spark/pull/49411#issuecomment-2626174459 Hi, @panbingkun , @cloud-fan , @LuciferYang . I ran the benchmark again as a part of regression check Today and I hit this still. - branch-4.0, Java 17: https://github.com/d

[PR] [SPARK-51044] Add ordering related tests for list state [spark]

2025-01-30 Thread via GitHub
anishshri-db opened a new pull request, #49742: URL: https://github.com/apache/spark/pull/49742 ### What changes were proposed in this pull request? Add ordering related tests for list state ### Why are the changes needed? Improve test coverage around relative ordering of ite

[PR] [SPARK-51045][TESTS] Regenerate benchmark results after upgrading to Scala 2.13.16 [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun opened a new pull request, #49741: URL: https://github.com/apache/spark/pull/49741 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [25-01-31] [task] Spark Optimization Guide - Improving Regular Expres… [spark]

2025-01-30 Thread via GitHub
xiaoming12306 commented on PR #49740: URL: https://github.com/apache/spark/pull/49740#issuecomment-2626103201 test

Re: [PR] [25-01-31] [task] Spark Optimization Guide - Improving Regular Expres… [spark]

2025-01-30 Thread via GitHub
xiaoming12306 closed pull request #49740: [25-01-31] [task] Spark Optimization Guide - Improving Regular Expres… URL: https://github.com/apache/spark/pull/49740

[PR] [25-01-31] [task] Spark Optimization Guide - Improving Regular Expres… [spark]

2025-01-30 Thread via GitHub
xiaoming12306 opened a new pull request, #49740: URL: https://github.com/apache/spark/pull/49740 …sion Performance ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing cha

Re: [PR] [SPARK-50045][K8S] Spark supports executor HostNetwork on k8s mode [spark]

2025-01-30 Thread via GitHub
github-actions[bot] commented on PR #48568: URL: https://github.com/apache/spark/pull/48568#issuecomment-2626008964 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-50043][BUILD] Build should be stopped when Antrun script fails [spark]

2025-01-30 Thread via GitHub
github-actions[bot] commented on PR #48566: URL: https://github.com/apache/spark/pull/48566#issuecomment-2626008987 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [WIP][DO-NOT-MERGE][SPARK-49917][SQL] Enable hash join rewrite rule for collated strings [spark]

2025-01-30 Thread via GitHub
github-actions[bot] closed pull request #48400: [WIP][DO-NOT-MERGE][SPARK-49917][SQL] Enable hash join rewrite rule for collated strings URL: https://github.com/apache/spark/pull/48400

Re: [PR] [SPARK-49774][BUILD] Upgrade guava to 33.3.1-jre [spark]

2025-01-30 Thread via GitHub
github-actions[bot] commented on PR #48233: URL: https://github.com/apache/spark/pull/48233#issuecomment-2626009017 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [25-01-31] [task] Spark Optimization Guide - Improving Regular Expres… [spark]

2025-01-30 Thread via GitHub
xiaoming12306 closed pull request #49739: [25-01-31] [task] Spark Optimization Guide - Improving Regular Expres… URL: https://github.com/apache/spark/pull/49739

Re: [PR] [25-01-31] [task] Spark Optimization Guide - Improving Regular Expres… [spark]

2025-01-30 Thread via GitHub
xiaoming12306 commented on PR #49739: URL: https://github.com/apache/spark/pull/49739#issuecomment-2625946486 test

[PR] [25-01-31] [task] Spark Optimization Guide - Improving Regular Expres… [spark]

2025-01-30 Thread via GitHub
xiaoming12306 opened a new pull request, #49739: URL: https://github.com/apache/spark/pull/49739 …sion Performance ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing cha

[PR] [SPARK-51043][SS][CONNECT] Fine grained user logging for Spark Connect foreachBatch [spark]

2025-01-30 Thread via GitHub
WweiL opened a new pull request, #49738: URL: https://github.com/apache/spark/pull/49738 ### What changes were proposed in this pull request? When multiple users are under the same session / multiple sessions are shared on the same server. The current logging makes it very har

Re: [PR] [SPARK-50133][PYTHON][CONNECT] Support DataFrame conversion to table argument in Spark Connect Python Client [spark]

2025-01-30 Thread via GitHub
xinrong-meng commented on code in PR #49424: URL: https://github.com/apache/spark/pull/49424#discussion_r1936345930 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -3835,6 +3835,24 @@ class SparkConnectPlanner(

Re: [PR] [SPARK-50982][SQL] Support more SQL/DataFrame read path functionality in single-pass Analyzer [spark]

2025-01-30 Thread via GitHub
vladimirg-db commented on PR #49658: URL: https://github.com/apache/spark/pull/49658#issuecomment-2625652732 @MaxGekk thanks! resolved.

Re: [PR] [SPARK-50133][PYTHON][CONNECT] Support DataFrame conversion to table argument in Spark Connect Python Client [spark]

2025-01-30 Thread via GitHub
ueshin commented on code in PR #49424: URL: https://github.com/apache/spark/pull/49424#discussion_r1936198473 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -3835,6 +3835,24 @@ class SparkConnectPlanner( Unreso

Re: [PR] [SPARK-49230][Connect][SQL] Do not return UnboundRowEncoder when not needed [spark]

2025-01-30 Thread via GitHub
hvanhovell commented on code in PR #49339: URL: https://github.com/apache/spark/pull/49339#discussion_r1936181291 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala: ## @@ -49,7 +49,7 @@ import org.apache.spark.util.Utils object Expres

[PR] [SPARK-51042][SQL] Read and write the month and days fields of intervals with one call in Unsafe* classes [spark]

2025-01-30 Thread via GitHub
jonathan-albrecht-ibm opened a new pull request, #49737: URL: https://github.com/apache/spark/pull/49737 ### What changes were proposed in this pull request? Write the month and days fields of intervals with one call to Platform.put/getLong() instead of two calls to Platform.p
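The optimization this PR describes, accessing an interval's months and days fields with one 64-bit call instead of two 32-bit ones, can be sketched in isolation. A minimal, hypothetical illustration of the packing arithmetic (not the PR's actual `Platform.put/getLong` code; which field lands in the low half depends on byte order in the real change, and the sketch fixes an arbitrary layout):

```java
public class IntervalPack {
    // Pack two adjacent 32-bit fields (months, days) into one long so a
    // single 64-bit store/load can replace two 32-bit ones.
    static long pack(int months, int days) {
        return (months & 0xFFFFFFFFL) | ((long) days << 32);
    }

    // Unpack: low 32 bits hold months, high 32 bits hold days.
    static int months(long packed) { return (int) packed; }
    static int days(long packed)   { return (int) (packed >>> 32); }
}
```

Masking `months` with `0xFFFFFFFFL` before the OR prevents sign extension from corrupting the high half when months is negative.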

Re: [PR] [SPARK-49230][Connect][SQL] Do not return UnboundRowEncoder when not needed [spark]

2025-01-30 Thread via GitHub
hvanhovell commented on code in PR #49339: URL: https://github.com/apache/spark/pull/49339#discussion_r193616 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala: ## @@ -49,7 +49,7 @@ import org.apache.spark.util.Utils object Expres

Re: [PR] [SPARK-50133][PYTHON][CONNECT] Support DataFrame conversion to table argument in Spark Connect Python Client [spark]

2025-01-30 Thread via GitHub
xinrong-meng commented on code in PR #49424: URL: https://github.com/apache/spark/pull/49424#discussion_r1936150923 ## python/pyspark/sql/connect/expressions.py: ## @@ -1268,7 +1278,37 @@ def to_plan(self, session: "SparkConnectClient") -> proto.Expression: expr.su

Re: [PR] [SPARK-50903][CONNECT] Cache logical plans after analysis [spark]

2025-01-30 Thread via GitHub
hvanhovell commented on code in PR #49584: URL: https://github.com/apache/spark/pull/49584#discussion_r1936138008 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -440,46 +443,64 @@ case class SessionHolder(userId: String, sess

Re: [PR] [SPARK-50133][PYTHON][CONNECT] Support DataFrame conversion to table argument in Spark Connect Python Client [spark]

2025-01-30 Thread via GitHub
xinrong-meng commented on code in PR #49424: URL: https://github.com/apache/spark/pull/49424#discussion_r1936129646 ## python/pyspark/sql/tests/test_udtf.py: ## @@ -1144,33 +1144,33 @@ def eval(self, row: Row): ) with self.assertRaisesRegex( -Ille

Re: [PR] [SPARK-50133][PYTHON][CONNECT] Support DataFrame conversion to table argument in Spark Connect Python Client [spark]

2025-01-30 Thread via GitHub
xinrong-meng commented on code in PR #49424: URL: https://github.com/apache/spark/pull/49424#discussion_r1936127710 ## python/pyspark/sql/connect/table_arg.py: ## @@ -0,0 +1,101 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license ag

Re: [PR] [SPARK-50903][CONNECT] Cache logical plans after analysis [spark]

2025-01-30 Thread via GitHub
hvanhovell commented on code in PR #49584: URL: https://github.com/apache/spark/pull/49584#discussion_r1936121699 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -440,46 +443,64 @@ case class SessionHolder(userId: String, sess

Re: [PR] [SPARK-50903][CONNECT] Cache logical plans after analysis [spark]

2025-01-30 Thread via GitHub
hvanhovell commented on code in PR #49584: URL: https://github.com/apache/spark/pull/49584#discussion_r1936120463 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -440,46 +443,64 @@ case class SessionHolder(userId: String, sess

Re: [PR] [SPARK-51029][BUILD] Remove `hive-llap-common` compile dependency [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49725: URL: https://github.com/apache/spark/pull/49725#issuecomment-2625171917 Thank you. We can discuss more during the QA and RC period in order to get the final decision~

Re: [PR] [SPARK-51029][BUILD] Remove `hive-llap-common` compile dependency [spark]

2025-01-30 Thread via GitHub
LuciferYang commented on PR #49725: URL: https://github.com/apache/spark/pull/49725#issuecomment-2625110311 Understood, fine to me. Thank you @dongjoon-hyun

Re: [PR] [SPARK-51041][BUILD] Add `hive-llap-client` and `hive-llap-common` as test dependency of `hive-thriftserver` [spark]

2025-01-30 Thread via GitHub
LuciferYang commented on PR #49736: URL: https://github.com/apache/spark/pull/49736#issuecomment-2625085284 Thanks @dongjoon-hyun

Re: [PR] [SPARK-51029][BUILD] Remove `hive-llap-common` compile dependency [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49725: URL: https://github.com/apache/spark/pull/49725#issuecomment-2625072277 For example, this kind of risk issue. - https://github.com/apache/spark/security/dependabot/112

Re: [PR] [SPARK-51041][BUILD] Add `hive-llap-client` and `hive-llap-common` as test dependency of `hive-thriftserver` [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun closed pull request #49736: [SPARK-51041][BUILD] Add `hive-llap-client` and `hive-llap-common` as test dependency of `hive-thriftserver` URL: https://github.com/apache/spark/pull/49736

Re: [PR] [SPARK-51021] Add log throttler [spark]

2025-01-30 Thread via GitHub
asfgit closed pull request #49712: [SPARK-51021] Add log throttler URL: https://github.com/apache/spark/pull/49712

Re: [PR] [SPARK-51041][BUILD] Add `hive-llap-client` and `hive-llap-common` as test dependency of `hive-thriftserver` [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49736: URL: https://github.com/apache/spark/pull/49736#issuecomment-2625077011 I verified this manually.
```
$ build/mvn -Phive-thriftserver install -DskipTests
$ build/mvn -pl sql/hive-thriftserver -Phive-thriftserver install -fae
...
HiveThriftB
```

Re: [PR] [SPARK-51029][BUILD] Remove `hive-llap-common` compile dependency [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49725: URL: https://github.com/apache/spark/pull/49725#issuecomment-2625068666 The purpose of this PR is to eliminate the risk from Apache Spark side and to give a full freedom to users to take it or deploy with the patched `hive-llap-common`. For examp

Re: [PR] [SPARK-51041][BUILD] Add `hive-llap-client` and `hive-llap-common` as test dependency of `hive-thriftserver` [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49736: URL: https://github.com/apache/spark/pull/49736#issuecomment-2625060385 I'll reply on the original PR, @LuciferYang ~

Re: [PR] [SPARK-51029][BUILD] Remove `hive-llap-common` compile dependency [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49725: URL: https://github.com/apache/spark/pull/49725#issuecomment-2625054321 Yes, as I wrote in the PR description, UDF parts are affected, @LuciferYang .

Re: [PR] [SPARK-51029][BUILD] Remove `hive-llap-common` compile dependency [spark]

2025-01-30 Thread via GitHub
LuciferYang commented on PR #49725: URL: https://github.com/apache/spark/pull/49725#issuecomment-2625049205 Based on my analysis at https://github.com/apache/spark/pull/49736#issuecomment-2625039106, it seems that the compile-scope dependency on llap-common cannot be removed.

Re: [PR] [SPARK-51041][BUILD] Add `hive-llap-client` and `hive-llap-common` as test dependency of `hive-thriftserver` [spark]

2025-01-30 Thread via GitHub
LuciferYang commented on PR #49736: URL: https://github.com/apache/spark/pull/49736#issuecomment-2625039106 1. `GenericUDTFGetSplits` imports `org.apache.hadoop.hive.llap.security.LlapSigner`, which comes from llap-common. https://github.com/apache/hive/blob/5160d3af392248255f68e41

Re: [PR] [SPARK-51041][BUILD] Add `hive-llap-client` and `hive-llap-common` as test dependency of `hive-thriftserver` [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49736: URL: https://github.com/apache/spark/pull/49736#issuecomment-2624973098 Thank you so much, @LuciferYang. I'm also looking at those failures.

Re: [PR] [SPARK-51029][BUILD] Remove `hive-llap-common` compile dependency [spark]

2025-01-30 Thread via GitHub
LuciferYang commented on PR #49725: URL: https://github.com/apache/spark/pull/49725#issuecomment-2624949334 Due to the lack of the test dependencies hive-llap-client and hive-llap-common, testing hive-thriftserver using Maven will hang at
```
Discovery starting.
2025-01-29
```

[PR] [SPARK-51041][BUILD] Add `hive-llap-client` and `hive-llap-common` as test dependency of `hive-thriftserver` [spark]

2025-01-30 Thread via GitHub
LuciferYang opened a new pull request, #49736: URL: https://github.com/apache/spark/pull/49736 ### What changes were proposed in this pull request? This pr aims to add `hive-llap-client` and `hive-llap-common` as test dependency of `hive-thriftserver` ### Why are the changes needed

Re: [PR] [SPARK-49872][CORE] allow unlimited json size again [spark]

2025-01-30 Thread via GitHub
steven-aerts commented on PR #49163: URL: https://github.com/apache/spark/pull/49163#issuecomment-2624869021
> According to the CI results, this PR seems to introduce a binary compatibility issue.
> ```
> [info] spark-examples: mimaPreviousArtifacts not set, not analyzing binary
> ```

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-01-30 Thread via GitHub
sunxiaoguang commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1935844132 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,61 @@ class MySQLIntegrationSuite

Re: [PR] [SPARK-51030][CORE][TESTS] Add a check before `Utils.deleteRecursively(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` [spark]

2025-01-30 Thread via GitHub
LuciferYang commented on PR #49723: URL: https://github.com/apache/spark/pull/49723#issuecomment-2624847560 Thank you @dongjoon-hyun

Re: [PR] [SPARK-51030][CORE][TESTS] Add a check before `Utils.deleteRecursively(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49723: URL: https://github.com/apache/spark/pull/49723#issuecomment-2624845091 Merged to master/4.0. Thank you, @LuciferYang .

Re: [PR] [SPARK-51030][CORE][TESTS] Add a check before `Utils.deleteRecursively(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun closed pull request #49723: [SPARK-51030][CORE][TESTS] Add a check before `Utils.deleteRecursively(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` URL: https://github.com/apache/spark/pull/49723

[PR] [SPARK-51040] Enforce determinism when assigning implicit aliases to collation types [spark]

2025-01-30 Thread via GitHub
mihailotim-db opened a new pull request, #49735: URL: https://github.com/apache/spark/pull/49735 ### What changes were proposed in this pull request? This PR proposes a change that enforces determinism when assigning implicit aliases to collation types. ### Why are the

Re: [PR] [SPARK-51038][INFRA] Add `branch-4.0` to daily `Publish Snapshot` GitHub Action job [spark]

2025-01-30 Thread via GitHub
dongjoon-hyun commented on PR #49732: URL: https://github.com/apache/spark/pull/49732#issuecomment-2624735072 Thank you for checking!

Re: [PR] [SPARK-51021] Add log throttler [spark]

2025-01-30 Thread via GitHub
hvanhovell commented on code in PR #49712: URL: https://github.com/apache/spark/pull/49712#discussion_r1935673955 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -531,3 +533,156 @@ private[spark] object Logging { override def isStopped: Boolean

Re: [PR] [SPARK-51021] Add log throttler [spark]

2025-01-30 Thread via GitHub
hvanhovell commented on code in PR #49712: URL: https://github.com/apache/spark/pull/49712#discussion_r1935670931 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -531,3 +533,156 @@ private[spark] object Logging { override def isStopped: Boolean
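For readers following the review of the log throttler in `Logging.scala`: the general technique is to emit at most one message per interval and report how many were suppressed in between. A minimal sketch under assumed names (this is not the PR's API):

```scala
import java.util.concurrent.atomic.{AtomicLong, LongAdder}

// Illustrative interval-based log throttler: at most one emission per
// intervalNanos; suppressed calls are counted and their total is handed to
// the caller on the next successful emission.
final class ThrottleSketch(intervalNanos: Long) {
  // Initialized so that the very first call is allowed to emit.
  private val lastEmit = new AtomicLong(System.nanoTime() - intervalNanos)
  private val suppressed = new LongAdder

  /** Runs `emit` with the suppressed-call count, or records the call as suppressed. */
  def apply(emit: Long => Unit): Unit = {
    val now = System.nanoTime()
    val last = lastEmit.get()
    if (now - last >= intervalNanos && lastEmit.compareAndSet(last, now)) {
      emit(suppressed.sumThenReset())
    } else {
      suppressed.increment()
    }
  }
}
```

A logger would call something like `throttler(n => logWarning(s"... ($n similar messages suppressed)"))`; the compare-and-set keeps the throttler safe under concurrent callers without locking.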

[PR] [SPARK-50881] Use cached schema instead of deep copying to retrieve names of columns [spark]

2025-01-30 Thread via GitHub
garlandz-db opened a new pull request, #49734: URL: https://github.com/apache/spark/pull/49734 ### What changes were proposed in this pull request? * instead of deep copying the schema for every call to `columns` we just deep copy the names ### Why are the changes neede
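The optimization in the PR title — answering `columns` without deep-copying the whole schema — can be sketched like this (illustrative names and structure, not the PR's code, which operates on Spark's `StructType`):

```scala
// Illustrative sketch: compute the schema once, lazily, and answer `columns`
// from the cached value. Only the immutable names are produced per call;
// the schema itself is never deep-copied again.
final class CachedSchema(load: () => Vector[(String, String)]) {
  // (name, dataType) pairs; `lazy val` ensures load() runs at most once.
  private lazy val fields: Vector[(String, String)] = load()

  def columns: Vector[String] = fields.map(_._1)
}
```

The key property is that repeated `columns` calls reuse the cached schema, so the (potentially expensive) load or copy happens once rather than per call.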

Re: [PR] [DRAFT][SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-01-30 Thread via GitHub
davidm-db commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1935518962 ## sql/catalyst/src/main/scala/org/apache/spark/sql/exceptions/SqlScriptingRuntimeException.scala: ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] [DRAFT][SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-01-30 Thread via GitHub
davidm-db commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1935515323 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecution.scala: ## @@ -66,22 +88,20 @@ class SqlScriptingExecution( None } - /** -
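For context on what the SIGNAL statement does: in SQL/PSM-style scripting (the syntax below follows the SQL standard as also implemented by MySQL; the grammar Spark ultimately adopts in this PR may differ in detail), SIGNAL raises a user-defined error with an SQLSTATE and message from inside a script:

```scala
// Illustrative SQL script held in a Scala string; the SIGNAL syntax shown is
// the standard SQL/PSM form, not necessarily Spark's final grammar.
val script =
  """BEGIN
    |  IF 1 = 1 THEN
    |    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Custom error raised from a script';
    |  END IF;
    |END""".stripMargin
```

The `SqlScriptingRuntimeException` discussed in the review would be the server-side carrier for such a user-raised condition.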

Re: [PR] [SPARK-50982][SQL] Support more SQL/DataFrame read path functionality in single-pass Analyzer [spark]

2025-01-30 Thread via GitHub
MaxGekk commented on code in PR #49658: URL: https://github.com/apache/spark/pull/49658#discussion_r1935368749 ## sql/core/src/test/scala/org/apache/spark/sql/analysis/resolver/ViewResolverSuite.scala: ## @@ -0,0 +1,190 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-49230][Connect][SQL] Do not return UnboundRowEncoder when not needed [spark]

2025-01-30 Thread via GitHub
xupefei commented on code in PR #49339: URL: https://github.com/apache/spark/pull/49339#discussion_r1935348809 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala: ## @@ -49,7 +49,7 @@ import org.apache.spark.util.Utils object Expressio

Re: [PR] [SPARK-51038][INFRA] Add `branch-4.0` to daily `Publish Snapshot` GitHub Action job [spark]

2025-01-30 Thread via GitHub
LuciferYang commented on PR #49732: URL: https://github.com/apache/spark/pull/49732#issuecomment-2623896355 ![image](https://github.com/user-attachments/assets/8d07ed1c-e6cc-47f1-b417-3232aa128454) ![image](https://github.com/user-attachments/assets/4e64bcbf-b505-47bb-b904-ac80584164