Re: [PR] [SPARK-51179][SQL] Refactor SupportsOrderingWithinGroup so that centralized check [spark]

2025-02-13 Thread via GitHub


beliefer commented on PR #49908:
URL: https://github.com/apache/spark/pull/49908#issuecomment-2655798525

   ping @cloud-fan cc @mikhailnik-db 





Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the code for SupportsOrderingWithinGroup and Mode [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49907:
URL: https://github.com/apache/spark/pull/49907#discussion_r1954008498


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##
@@ -390,10 +390,6 @@ case class PercentileCont(left: Expression, right: Expression, reverse: Boolean
   override protected def withNewChildrenInternal(
   newLeft: Expression, newRight: Expression): PercentileCont =
 this.copy(left = newLeft, right = newRight)
-
-  override def orderingFilled: Boolean = left != UnresolvedWithinGroup

Review Comment:
   Why do we make such a change?






Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49772:
URL: https://github.com/apache/spark/pull/49772#discussion_r1954019289


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
##
@@ -155,22 +123,22 @@ object ResolveDefaultStringTypes extends Rule[LogicalPlan] {
     dataType.existsRecursively(isDefaultStringType)
 
   private def isDefaultStringType(dataType: DataType): Boolean = {
+    // STRING (without explicit collation) is considered default string type.
+    // STRING COLLATE <collation_name> (with explicit collation) is not considered
+    // default string type even when explicit collation is UTF8_BINARY (default collation).
     dataType match {
-      case st: StringType =>
-        // should only return true for StringType object and not StringType("UTF8_BINARY")
-        st.eq(StringType) || st.isInstanceOf[TemporaryStringType]
+      // should only return true for StringType object and not for StringType("UTF8_BINARY")
+      case st: StringType => st.eq(StringType)
       case _ => false
     }
   }
 
   private def replaceDefaultStringType(dataType: DataType, newType: StringType): DataType = {
+    // Should replace STRING with the new type.
+    // Should not replace STRING COLLATE UTF8_BINARY, as that is explicit collation.
     dataType.transformRecursively {
       case currentType: StringType if isDefaultStringType(currentType) =>
-        if (currentType == newType) {
-          TemporaryStringType()
-        } else {
-          newType
-        }
+        newType

Review Comment:
   or can we move this rule to `sql/core` as an extended resolution rule in 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala#L212
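
   For reference, a minimal self-contained sketch (not from this PR) of what an "extended resolution rule" looks like when injected through the public `SparkSessionExtensions#injectResolutionRule` API; the `BaseSessionStateBuilder` hook linked above is the internal counterpart used for built-in rules. The rule name and no-op body below are made up for illustration.

   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
   import org.apache.spark.sql.catalyst.rules.Rule

   // Hypothetical placeholder rule; a real rule would rewrite string types here.
   object MyNoOpResolutionRule extends Rule[LogicalPlan] {
     override def apply(plan: LogicalPlan): LogicalPlan = plan
   }

   object ExtendedResolutionRuleDemo {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[1]")
         .withExtensions(_.injectResolutionRule(_ => MyNoOpResolutionRule))
         .getOrCreate()
       spark.range(1).show()  // the injected rule runs as part of analysis
       spark.stop()
     }
   }
   ```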






Re: [PR] [WIP][SPARK-7101][SQL] support java.sql.Time [spark]

2025-02-13 Thread via GitHub


MaxGekk commented on PR #28858:
URL: https://github.com/apache/spark/pull/28858#issuecomment-2655900699

   I sent the SPIP https://issues.apache.org/jira/browse/SPARK-51162 to dev 
list for discussion 
([link](https://lists.apache.org/thread/892vkskktqrx1czk9wm6l8vchpydrny2)). 
@younggyuchun @dongjoon-hyun @HyukjinKwon   Please, leave comments in the 
thread if you are still interested in the feature.





Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49445:
URL: https://github.com/apache/spark/pull/49445#discussion_r1954078549


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala:
##
@@ -350,3 +357,12 @@ class SqlScriptingLabelContext {
 }
   }
 }
+
+object SqlScriptingLabelContext {
+  private val forbiddenLabelNames: immutable.Set[Regex] =
+immutable.Set("builtin".r, "session".r, "sys.*".r)

Review Comment:
   will it match `*builtin*`?
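
   A quick standalone check (plain Scala, not the PR's code) of how these patterns behave: Scala 2.13's `Regex#matches` is anchored to the whole string, while `findFirstIn` does a substring search, so whether `*builtin*` is matched depends on which method the label check uses. The object and set below are copied from the diff for illustration only.

   ```scala
   import scala.util.matching.Regex

   object ForbiddenLabelRegexCheck {
     // Copy of the forbidden-name set from the diff above, for illustration only.
     private val forbiddenLabelNames: Set[Regex] = Set("builtin".r, "session".r, "sys.*".r)

     def main(args: Array[String]): Unit = {
       // Regex#matches (Scala 2.13) requires the whole string to match the pattern.
       println(forbiddenLabelNames.exists(_.matches("my_builtin_label"))) // false
       println(forbiddenLabelNames.exists(_.matches("sysvar")))           // true, via "sys.*"

       // findFirstIn is a substring search and would also flag "*builtin*" names.
       println(forbiddenLabelNames.exists(_.findFirstIn("my_builtin_label").isDefined)) // true
     }
   }
   ```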






Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun closed pull request #49944: [SPARK-51210][CORE] Add 
`--enable-native-access=ALL-UNNAMED` to Java options for Java 24+
URL: https://github.com/apache/spark/pull/49944





Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49944:
URL: https://github.com/apache/spark/pull/49944#issuecomment-2658003644

   Thank you for review and approval, @viirya !





Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49726:
URL: https://github.com/apache/spark/pull/49726#discussion_r1955489143


##
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala:
##
@@ -1020,3 +1024,107 @@ class ExceptionHandlerExec(
 
   override def reset(): Unit = body.reset()
 }
+
+/**
+ * Executable node for Signal Statement.
+ * @param errorCondition Name of the error condition/SQL State for error that will be thrown.
+ * @param sqlState SQL State of the error that will be thrown.
+ * @param message Error message (either string or variable name).
+ * @param isBuiltinError Whether the error condition is a builtin condition.
+ * @param msgArguments Error message parameters for builtin conditions.
+ * @param session Spark session that SQL script is executed within.
+ * @param origin Origin descriptor for the statement.
+ */
+class SignalStatementExec(
+val errorCondition: String,
+val sqlState: String,
+val message: Either[String, UnresolvedAttribute],

Review Comment:
   I checked the error condition and handler ones. They don't have expressions 
to resolve. Can you provide other examples to justify your approach?






Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the code for SupportsOrderingWithinGroup and Mode [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49907:
URL: https://github.com/apache/spark/pull/49907#discussion_r1955422516


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##
@@ -390,10 +390,6 @@ case class PercentileCont(left: Expression, right: Expression, reverse: Boolean
   override protected def withNewChildrenInternal(
   newLeft: Expression, newRight: Expression): PercentileCont =
 this.copy(left = newLeft, right = newRight)
-
-  override def orderingFilled: Boolean = left != UnresolvedWithinGroup

Review Comment:
   Why is it not necessary? `orderingFilled` should be true once `UnresolvedWithinGroup` has been resolved and replaced, shouldn't it?






Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]

2025-02-13 Thread via GitHub


LuciferYang commented on PR #49944:
URL: https://github.com/apache/spark/pull/49944#issuecomment-2658349981

   late LGTM





[PR] [SPARK-51215][ML][PYTHON][CONNECT] Add a helper function to invoke helper model attr [spark]

2025-02-13 Thread via GitHub


zhengruifeng opened a new pull request, #49951:
URL: https://github.com/apache/spark/pull/49951

   
   
   ### What changes were proposed in this pull request?
   Add a helper function to invoke helper model attr 
   
   ### Why are the changes needed?
   deduplicate code 
   
   
   ### Does this PR introduce _any_ user-facing change?
   no, minor refactor
   
   
   ### How was this patch tested?
   existing tests
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no





Re: [PR] [SPARK-51213][SQL] Keep Expression class info when resolving hint parameters [spark]

2025-02-13 Thread via GitHub


yaooqinn commented on PR #49950:
URL: https://github.com/apache/spark/pull/49950#issuecomment-2658350458

   Thank you @dongjoon-hyun, the previously failing test now passes for me locally. Let's run another round in GA.





Re: [PR] [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on PR #49947:
URL: https://github.com/apache/spark/pull/49947#issuecomment-2658350592

   thanks, merging to master/4.0!





Re: [PR] [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution [spark]

2025-02-13 Thread via GitHub


cloud-fan closed pull request #49947: [SPARK-51119][SQL][FOLLOW-UP] 
ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution
URL: https://github.com/apache/spark/pull/49947





Re: [PR] [SPARK-51204][BUILD] Upgrade `sbt-assembly` to 2.3.1 [spark]

2025-02-13 Thread via GitHub


pan3793 commented on PR #49939:
URL: https://github.com/apache/spark/pull/49939#issuecomment-2658374563

   According to the release notes, it would be better to include this in 4.0 too.
   
   > - Fixes assemblyOutputPath
   > - Fixes assemblyExcludedJars
   
   Spark uses these features, but I am not sure whether it is affected; cc @LuciferYang, you may have more context.





Re: [PR] [SPARK-51209][CORE][FOLLOWUP] Use `user.name` system property first as a fallback [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun closed pull request #49949: [SPARK-51209][CORE][FOLLOWUP] Use 
`user.name` system property first as a fallback
URL: https://github.com/apache/spark/pull/49949





Re: [PR] [SPARK-51209][CORE][FOLLOWUP] Use `user.name` system property first as a fallback [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49949:
URL: https://github.com/apache/spark/pull/49949#issuecomment-2658378854

   `Document generations` passed at the last commit.
   Merged to master.





Re: [PR] [SPARK-51208][SQL] `ColumnDefinition.toV1Column` should preserve `EXISTS_DEFAULT` resolution [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49942:
URL: https://github.com/apache/spark/pull/49942#discussion_r1955382758


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ColumnDefinition.scala:
##
@@ -75,7 +75,7 @@ case class ColumnDefinition(
   // For v1 CREATE TABLE command, we will resolve and execute the default value expression later
   // in the rule `DataSourceAnalysis`. We just need to put the default value SQL string here.
   metadataBuilder.putString(CURRENT_DEFAULT_COLUMN_METADATA_KEY, default.originalSQL)
-  metadataBuilder.putString(EXISTS_DEFAULT_COLUMN_METADATA_KEY, default.originalSQL)
+  metadataBuilder.putString(EXISTS_DEFAULT_COLUMN_METADATA_KEY, default.child.sql)

Review Comment:
   Note that `Expression#sql` is for display only and is not guaranteed to parse back. I think we should only use `child.sql` here if `child` is a literal, which means this `ColumnDefinition` has been resolved and constant-folded.
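
   A small illustration of the suggestion (assuming `spark-catalyst` on the classpath; the helper and its names are hypothetical, only `originalSQL`, `child`, and `Literal` come from the diff above): once the default is folded to a `Literal`, its `sql` is a plain value that round-trips safely, and the fallback keeps the user-written SQL text for every other case.

   ```scala
   import org.apache.spark.sql.catalyst.expressions.{Concat, Expression, Literal}

   object ExistsDefaultSqlSketch {
     // Hypothetical helper mirroring the suggested guard; `originalSQL` stands in
     // for the user-written default value SQL string kept by ColumnDefinition.
     def existsDefaultSql(child: Expression, originalSQL: String): String = child match {
       case lit: Literal => lit.sql  // resolved and constant-folded: safe to store
       case _ => originalSQL         // display-only SQL may not parse back; keep the original text
     }

     def main(args: Array[String]): Unit = {
       println(existsDefaultSql(Literal(42), "40 + 2"))                                 // 42
       println(existsDefaultSql(Concat(Seq(Literal("a"), Literal("b"))), "'a' || 'b'")) // 'a' || 'b'
     }
   }
   ```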






[PR] [SPARK-51212][PYTHON] Add a separated PySpark package for Spark Connect by default [spark]

2025-02-13 Thread via GitHub


ueshin opened a new pull request, #49946:
URL: https://github.com/apache/spark/pull/49946

   ### What changes were proposed in this pull request?
   
   Adds a separated PySpark package for Spark Connect by default.
   
   - Rename `pyspark-connect` to `pyspark-client`
   - Add a new `pyspark-connect` that depends on `pyspark` but uses Spark Connect by default
   
   ### Why are the changes needed?
   
   As discussed in the dev list, we will have an additional package for Spark 
Connect by default.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, there will be three packages for PySpark.
   
   - `pyspark`: contains the whole package to run PySpark, Spark Classic by 
default
   - `pyspark-connect`: depends on `pyspark`, Spark Connect by default
   - `pyspark-client`: only contains Python files to work with Spark Connect
   
   ### How was this patch tested?
   
   Manually.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   





Re: [PR] [SPARK-51208][SQL] `ColumnDefinition.toV1Column` should preserve `EXISTS_DEFAULT` resolution [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49942:
URL: https://github.com/apache/spark/pull/49942#discussion_r1955421786


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ColumnDefinition.scala:
##
@@ -75,7 +75,7 @@ case class ColumnDefinition(
   // For v1 CREATE TABLE command, we will resolve and execute the default value expression later
   // in the rule `DataSourceAnalysis`. We just need to put the default value SQL string here.
   metadataBuilder.putString(CURRENT_DEFAULT_COLUMN_METADATA_KEY, default.originalSQL)
-  metadataBuilder.putString(EXISTS_DEFAULT_COLUMN_METADATA_KEY, default.originalSQL)
+  metadataBuilder.putString(EXISTS_DEFAULT_COLUMN_METADATA_KEY, default.child.sql)

Review Comment:
   When we convert a v2 CREATE TABLE command to v1, `ColumnDefinition` is only resolved but not constant-folded (we are still in the analyzer phase).






Re: [PR] [SPARK-51185][Core] Revert simplifications to PartitionedFileUtil API to reduce memory requirements [spark]

2025-02-13 Thread via GitHub


LukasRupprecht commented on PR #49915:
URL: https://github.com/apache/spark/pull/49915#issuecomment-2658110441

   Thanks @cloud-fan for merging this! Will prepare a separate PR for 3.5.





Re: [PR] [SPARK-51177][PYTHON][CONNECT] Add `InvalidCommandInput` to Spark Connect Python client [spark]

2025-02-13 Thread via GitHub


itholic closed pull request #49916: [SPARK-51177][PYTHON][CONNECT] Add 
`InvalidCommandInput` to Spark Connect Python client
URL: https://github.com/apache/spark/pull/49916





Re: [PR] [SPARK-51177][PYTHON][CONNECT] Add `InvalidCommandInput` to Spark Connect Python client [spark]

2025-02-13 Thread via GitHub


itholic commented on PR #49916:
URL: https://github.com/apache/spark/pull/49916#issuecomment-2658076422

   Merged to master and branch-4.0
   Thanks @HyukjinKwon @zhengruifeng for the review.





Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49726:
URL: https://github.com/apache/spark/pull/49726#discussion_r1955489998


##
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala:
##
@@ -1020,3 +1024,107 @@ class ExceptionHandlerExec(
 
   override def reset(): Unit = body.reset()
 }
+
+/**
+ * Executable node for Signal Statement.
+ * @param errorCondition Name of the error condition/SQL State for error that will be thrown.
+ * @param sqlState SQL State of the error that will be thrown.
+ * @param message Error message (either string or variable name).
+ * @param isBuiltinError Whether the error condition is a builtin condition.
+ * @param msgArguments Error message parameters for builtin conditions.
+ * @param session Spark session that SQL script is executed within.
+ * @param origin Origin descriptor for the statement.
+ */
+class SignalStatementExec(
+val errorCondition: String,
+val sqlState: String,
+val message: Either[String, UnresolvedAttribute],
+val msgArguments: Option[UnresolvedAttribute],
+val isBuiltinError: Boolean,
+val session: SparkSession,
+override val origin: Origin)
+  extends LeafStatementExec
+  with ColumnResolutionHelper with WithOrigin {

Review Comment:
   The execution node of SIGNAL just gets the string value by calling `.eval()` on the message expression, which can be either a literal or a variable.
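
   A tiny standalone illustration (assuming `spark-catalyst` on the classpath; the literal below is a made-up stand-in for the resolved message): `eval()` on a string-typed catalyst expression yields a `UTF8String`, so one code path can cover both literals and resolved variable references.

   ```scala
   import org.apache.spark.sql.catalyst.expressions.Literal
   import org.apache.spark.unsafe.types.UTF8String

   object SignalMessageEvalSketch {
     def main(args: Array[String]): Unit = {
       val message = Literal("custom error message")  // a resolved variable reference is also an Expression
       val text = message.eval() match {
         case s: UTF8String => s.toString
         case other => String.valueOf(other)
       }
       println(text)  // custom error message
     }
   }
   ```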






Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]

2025-02-13 Thread via GitHub


pan3793 commented on code in PR #49814:
URL: https://github.com/apache/spark/pull/49814#discussion_r1955489715


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##
@@ -171,6 +172,11 @@ private[hive] class HiveClientImpl(
   private def newState(): SessionState = {
 val hiveConf = newHiveConf(sparkConf, hadoopConf, extraConfig, 
Some(initClassLoader))
 val state = new SessionState(hiveConf)
+// When SessionState is initialized, the caller context is overridden by hive
+// so we need to reset it back to the DRIVER

Review Comment:
   @sririshindra Thanks for your effort in testing it! For the Iceberg cases you can test with Spark 3.5; AFAIK, there were no changes related to CallerContext during the Spark 4.0 period.






Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]

2025-02-13 Thread via GitHub


sririshindra commented on code in PR #49814:
URL: https://github.com/apache/spark/pull/49814#discussion_r1955487024


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
##
@@ -171,6 +172,11 @@ private[hive] class HiveClientImpl(
   private def newState(): SessionState = {
 val hiveConf = newHiveConf(sparkConf, hadoopConf, extraConfig, 
Some(initClassLoader))
 val state = new SessionState(hiveConf)
+// When SessionState is initialized, the caller context is overridden by hive
+// so we need to reset it back to the DRIVER

Review Comment:
   @pan3793, @cnauroth I was finally able to properly test this branch in a Docker-based cluster. It looks like the change in HiveClientImpl is not needed in Spark 4. I checked whether the caller context is being set during the SessionState initialization in HiveClientImpl, and it looks like it is not. So once the CallerContext is set inside the SparkContext class, it is not overridden by anything else in the driver process.
   
   ```
   2025-02-14 02:26:23,249 INFO FSNamesystem.audit: allowed=true  ugi=root (auth:SIMPLE)  ip=/192.168.97.4  cmd=getfileinfo  src=/warehouse/sample  dst=null  perm=null  proto=rpc  callerContext=SPARK_DRIVER_application_1739496632907_0005
   2025-02-14 02:26:23,265 INFO FSNamesystem.audit: allowed=true  ugi=root (auth:SIMPLE)  ip=/192.168.97.4  cmd=listStatus  src=/warehouse/sample  dst=null  perm=null  proto=rpc  callerContext=SPARK_DRIVER_application_1739496632907_0005
   2025-02-14 02:26:25,519 INFO FSNamesystem.audit: allowed=true  ugi=root (auth:SIMPLE)  ip=/192.168.97.5  cmd=open  src=/warehouse/sample/part-0-dd473344-76b1-4179-91ae-d15a8da4a888-c000  dst=null  perm=null  proto=rpc  callerContext=SPARK_TASK_application_1739496632907_0005_JId_0_SId_0_0_TId_0_0
   2025-02-14 02:26:26,345 INFO FSNamesystem.audit: allowed=true  ugi=root (auth:SIMPLE)  ip=/192.168.97.5  cmd=open  src=/warehouse/sample/part-0-dd473344-76b1-4179-91ae-d15a8da4a888-c000  dst=null  perm=null  proto=rpc  callerContext=SPARK_TASK_application_1739496632907_0005_JId_1_SId_1_0_TId_1_0
   ```
   I wasn't able to test with Iceberg, though, because Iceberg doesn't support Spark 4 yet. Once an Iceberg release with Spark 4 support is out, I will retest and make any needed changes in a separate PR. For now, I removed the change that was in HiveClientImpl.scala.
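
   For anyone following along, a minimal sketch (not part of this PR; assumes `hadoop-common` on the classpath) of the Hadoop API that produces the `callerContext=` field shown in the audit log above:

   ```scala
   import org.apache.hadoop.ipc.CallerContext

   object CallerContextSketch {
     def main(args: Array[String]): Unit = {
       // The context string is per-thread; subsequent HDFS RPCs from this thread carry it
       // into the NameNode audit log when hadoop.caller.context.enabled=true on the cluster.
       val ctx = new CallerContext.Builder("SPARK_DRIVER_application_1739496632907_0005").build()
       CallerContext.setCurrent(ctx)
       println(CallerContext.getCurrent.getContext)
     }
   }
   ```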






Re: [PR] [SPARK-50655][SS] Move virtual col family related mapping into db layer instead of encoder [spark]

2025-02-13 Thread via GitHub


anishshri-db commented on code in PR #49304:
URL: https://github.com/apache/spark/pull/49304#discussion_r1955675678


##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:
##
@@ -931,7 +941,8 @@ case class RocksDBCheckpointMetadata(
 sstFiles: Seq[RocksDBSstFile],
 logFiles: Seq[RocksDBLogFile],
 numKeys: Long,
-columnFamilyMapping: Option[Map[String, Short]] = None,
+numInternalKeys: Long,
+columnFamilyMapping: Option[Map[String, ColumnFamilyInfo]] = None,

Review Comment:
   Done - added format validation tests for serde






Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49944:
URL: https://github.com/apache/spark/pull/49944#issuecomment-2657968814

   Could you review this `Java option` PR when you have some time, @viirya ?





Re: [PR] [SPARK-51208][SQL] `ColumnDefinition.toV1Column` should preserve `EXISTS_DEFAULT` resolution [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49942:
URL: https://github.com/apache/spark/pull/49942#discussion_r1955383049


##
sql/catalyst/src/test/scala/org/apache/spark/sql/types/StructTypeSuite.scala:
##
@@ -798,4 +800,35 @@ class StructTypeSuite extends SparkFunSuite with SQLHelper {
  assert(
    mapper.readTree(mapWithNestedArray.json) == mapper.readTree(expectedJson))
   }
+
+  test("SPARK-51208: ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution") {
+
+def validateConvertedDefaults(
+  colName: String,

Review Comment:
   super nit: 4 spaces indentation






Re: [PR] [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+ [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun closed pull request #49943: [SPARK-51209][CORE] Improve 
`getCurrentUserName` to handle Java 24+
URL: https://github.com/apache/spark/pull/49943





Re: [PR] [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+ [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49943:
URL: https://github.com/apache/spark/pull/49943#issuecomment-2657967822

   Merged to master for Apache Spark 4.1.0.





Re: [PR] [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+ [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49943:
URL: https://github.com/apache/spark/pull/49943#issuecomment-2657966885

   Thank you, @huaxingao !





Re: [PR] [SPARK-51204][BUILD] Upgrade `sbt-assembly` to 2.3.1 [spark]

2025-02-13 Thread via GitHub


wayneguow commented on PR #49939:
URL: https://github.com/apache/spark/pull/49939#issuecomment-2658092268

   > Could you re-trigger the failed K8s test, @wayneguow ? Although it looks 
irrelevant, let's make it sure.
   
   After re-running, it was successful. However, this page does not yet show that the check has passed; there may be some delay. https://github.com/wayneguow/spark/actions/runs/13312413853/job/37202892468
   





[PR] [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution [spark]

2025-02-13 Thread via GitHub


szehon-ho opened a new pull request, #49947:
URL: https://github.com/apache/spark/pull/49947

   Small follow up for: https://github.com/apache/spark/pull/49942
   
   
   ### What changes were proposed in this pull request?
   Restrict the logic of https://github.com/apache/spark/pull/49942 to defaults that are resolved and constant-folded.
   
   ### Why are the changes needed?
   The previous PR may have unexpected behavior if the expression is not resolved/folded.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Existing unit test
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   





Re: [PR] [SPARK-51216][BUILD] Remove the useless `bigtop-dist` profile and the related outdated files [spark]

2025-02-13 Thread via GitHub


LuciferYang commented on PR #49952:
URL: https://github.com/apache/spark/pull/49952#issuecomment-2658399233

   cc @pan3793 @dongjoon-hyun 





[PR] [BUILD] Remove the useless `bigtop-dist` profile [spark]

2025-02-13 Thread via GitHub


LuciferYang opened a new pull request, #49952:
URL: https://github.com/apache/spark/pull/49952

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





Re: [PR] [SPARK-51186][PYTHON] Add `StreamingPythonRunnerInitializationException` to PySpark base exception [spark]

2025-02-13 Thread via GitHub


itholic closed pull request #49917: [SPARK-51186][PYTHON] Add 
`StreamingPythonRunnerInitializationException` to PySpark base exception
URL: https://github.com/apache/spark/pull/49917





Re: [PR] [SPARK-51186][PYTHON] Add `StreamingPythonRunnerInitializationException` to PySpark base exception [spark]

2025-02-13 Thread via GitHub


itholic commented on PR #49917:
URL: https://github.com/apache/spark/pull/49917#issuecomment-2658414570

   Merged to master and branch-4.0





Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]

2025-02-13 Thread via GitHub


beliefer commented on code in PR #47084:
URL: https://github.com/apache/spark/pull/47084#discussion_r1955643680


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala:
##
@@ -306,11 +296,9 @@ class SimpleFunctionRegistry
 extends SimpleFunctionRegistryBase[Expression]
 with FunctionRegistry {
 
-  override def clone(): SimpleFunctionRegistry = synchronized {
+  override def clone(): SimpleFunctionRegistry = {

Review Comment:
   Yes, I know.
   `ConcurrentHashMap` provides weakly consistent iterators, so iterating will not cause a concurrent modification exception.
   I think the clone of `SimpleFunctionRegistry` can tolerate concurrent modification.
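
   A standalone illustration (plain Scala, not the Spark code) of the weakly consistent behavior being relied on here: copying a `ConcurrentHashMap` while another thread keeps registering entries never throws `ConcurrentModificationException`; the copy simply may or may not include the concurrent additions.

   ```scala
   import java.util.concurrent.ConcurrentHashMap
   import scala.jdk.CollectionConverters._

   object WeaklyConsistentCloneDemo {
     def main(args: Array[String]): Unit = {
       val registry = new ConcurrentHashMap[String, String]()
       (1 to 1000).foreach(i => registry.put(s"fn$i", s"builder$i"))

       // Another thread keeps registering functions while we "clone".
       val writer = new Thread(() => (1001 to 2000).foreach(i => registry.put(s"fn$i", s"builder$i")))
       writer.start()

       val copy = new ConcurrentHashMap[String, String]()
       registry.entrySet().asScala.foreach(e => copy.put(e.getKey, e.getValue))  // no CME here

       writer.join()
       println(s"original=${registry.size()}, copy=${copy.size()}")  // copy size is between 1000 and 2000
     }
   }
   ```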






Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49772:
URL: https://github.com/apache/spark/pull/49772#discussion_r1954022933


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
##
@@ -18,80 +18,48 @@
 package org.apache.spark.sql.catalyst.analysis
 
 import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, Literal}
-import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, AlterColumnSpec, AlterViewAs, ColumnDefinition, CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan}
-import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
+import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, AlterColumnSpec, AlterTableCommand, AlterViewAs, ColumnDefinition, CreateTable, CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.connector.catalog.TableCatalog
 import org.apache.spark.sql.types.{DataType, StringType}
 
 /**
- * Resolves default string types in queries and commands. For queries, the default string type is
- * determined by the session's default string type. For DDL, the default string type is the
- * default type of the object (table -> schema -> catalog). However, this is not implemented yet.
- * So, we will just use UTF8_BINARY for now.
+ * Resolves string types in DDL commands, where the string type inherits the
+ * collation from the corresponding object (table/view -> schema -> catalog).
  */
-object ResolveDefaultStringTypes extends Rule[LogicalPlan] {
+object ResolveDDLCommandStringTypes extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-val newPlan = apply0(plan)
-if (plan.ne(newPlan)) {
-  // Due to how tree transformations work and StringType object being equal to
-  // StringType("UTF8_BINARY"), we need to transform the plan twice
-  // to ensure the correct results for occurrences of default string type.
-  val finalPlan = apply0(newPlan)
-  RuleExecutor.forceAdditionalIteration(finalPlan)
-  finalPlan
-} else {
-  newPlan
-}
-  }
-
-  private def apply0(plan: LogicalPlan): LogicalPlan = {
 if (isDDLCommand(plan)) {
   transformDDL(plan)
 } else {
-  transformPlan(plan, sessionDefaultStringType)
+  // For non-DDL commands no need to do any further resolution of string types
+  plan
 }
   }
 
-  /**
-   * Returns whether any of the given `plan` needs to have its
-   * default string type resolved.
-   */
-  def needsResolution(plan: LogicalPlan): Boolean = {
-if (!isDDLCommand(plan) && isDefaultSessionCollationUsed) {
-  return false
+  /** Default collation used, if object level collation is not provided */
+  private def defaultCollation: String = "UTF8_BINARY"
+
+  /** Returns the string type that should be used in a given DDL command */
+  private def stringTypeForDDLCommand(table: LogicalPlan): StringType = {
+table match {
+  case createTable: CreateTable if createTable.tableSpec.collation.isDefined =>
+StringType(createTable.tableSpec.collation.get)
+  case createView: CreateView if createView.collation.isDefined =>
+StringType(createView.collation.get)
+  case alterTable: AlterTableCommand if alterTable.table.resolved =>
+val collation = Option(alterTable
+  .table.asInstanceOf[ResolvedTable]
+  .table.properties.get(TableCatalog.PROP_COLLATION))
+if (collation.isDefined) {
+  StringType(collation.get)
+} else {
+  StringType(defaultCollation)
+}
+  case _ => StringType(defaultCollation)
 }
-
-plan.exists(node => needsResolution(node.expressions))
-  }
-
-  /**
-   * Returns whether any of the given `expressions` needs to have its
-   * default string type resolved.
-   */
-  def needsResolution(expressions: Seq[Expression]): Boolean = {
-expressions.exists(needsResolution)
   }
 
-  /**
-   * Returns whether the given `expression` needs to have its
-   * default string type resolved.
-   */
-  def needsResolution(expression: Expression): Boolean = {
-expression.exists(e => transformExpression.isDefinedAt(e))
-  }
-
-  private def isDefaultSessionCollationUsed: Boolean = conf.defaultStringType == StringType
-
-  /**
-   * Returns the default string type that should be used in a given DDL command (for now always
-   * UTF8_BINARY).
-   */
-  private def stringTypeForDDLCommand(table: LogicalPlan): StringType =
-StringType("UTF8_BINARY")
-
-  /** Returns the session default string type */
-  private def sessionDefaultStringType: StringType =
-StringType(conf.defaultStringType.collationId)
-
   private def isDDLCommand(plan: LogicalPlan): Boolean = plan exists {
 case _: AddColumns | _: ReplaceColumns | _: AlterColumns => true
 case _ => isCreateOrAlterPlan(pl

Re: [PR] [SPARK-51146][SQL][FOLLOWUP] Respect system env `SPARK_CONNECT_MODE` in places that access the api mode config [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on PR #49930:
URL: https://github.com/apache/spark/pull/49930#issuecomment-2655862681

   @dongjoon-hyun I'd like to cut rc1 on time to kick off the release cycle. We can fail rc1 if there is a lot of pending work, which will likely happen.





Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49445:
URL: https://github.com/apache/spark/pull/49445#discussion_r1954045545


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala:
##
@@ -266,22 +275,41 @@ trait ColumnResolutionHelper extends Logging with 
DataTypeErrorsBase {
   }
 }
 
-if (maybeTempVariableName(nameParts)) {
-  val variableName = if (conf.caseSensitiveAnalysis) {
-nameParts.last
-  } else {
-nameParts.last.toLowerCase(Locale.ROOT)
-  }
-  catalogManager.tempVariableManager.get(variableName).map { varDef =>
+val namePartsCaseAdjusted = if (conf.caseSensitiveAnalysis) {
+  nameParts
+} else {
+  nameParts.map(_.toLowerCase(Locale.ROOT))
+}
+
+SqlScriptingLocalVariableManager.get()
+  // If we are in EXECUTE IMMEDIATE lookup only session variables.
+  .filterNot(_ => AnalysisContext.get.isExecuteImmediate)
+  // If variable name is qualified with session. treat it as a 
session variable.
+  .filter(_ =>
+nameParts.length <= 2 && 
nameParts.init.map(_.toLowerCase(Locale.ROOT)) != Seq("session"))

Review Comment:
   can we reuse the function that validates the label name? We can do something like
   ```
   .filterNot(nameParts => nameParts.length > 2 || (nameParts.length == 2 && !isValidLabelName(nameParts.head)))
   ```






Re: [PR] [SPARK-28973][SQL] Add `TimeType` and support `java.time.LocalTime` as its external type. [spark]

2025-02-13 Thread via GitHub


MaxGekk commented on PR #25678:
URL: https://github.com/apache/spark/pull/25678#issuecomment-2655891624

   I sent the SPIP https://issues.apache.org/jira/browse/SPARK-51162 to dev 
list for discussion 
([link](https://lists.apache.org/thread/892vkskktqrx1czk9wm6l8vchpydrny2)). 
@zeddit @Fokko @tundraraj @redblackcoder @melin @hurelhuyag @younggyuchun 
Please, leave comments in the thread if you are still interested in the feature.





Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2025-02-13 Thread via GitHub


WebdeveloperIsaac commented on PR #42352:
URL: https://github.com/apache/spark/pull/42352#issuecomment-2655892185

   Can this be reopened? +1. This feature would be of great help; please don't keep it parked.





Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-02-13 Thread via GitHub


dusantism-db commented on code in PR #49445:
URL: https://github.com/apache/spark/pull/49445#discussion_r1954447510


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala:
##
@@ -266,22 +275,41 @@ trait ColumnResolutionHelper extends Logging with 
DataTypeErrorsBase {
   }
 }
 
-if (maybeTempVariableName(nameParts)) {
-  val variableName = if (conf.caseSensitiveAnalysis) {
-nameParts.last
-  } else {
-nameParts.last.toLowerCase(Locale.ROOT)
-  }
-  catalogManager.tempVariableManager.get(variableName).map { varDef =>
+val namePartsCaseAdjusted = if (conf.caseSensitiveAnalysis) {
+  nameParts
+} else {
+  nameParts.map(_.toLowerCase(Locale.ROOT))
+}
+
+SqlScriptingLocalVariableManager.get()
+  // If we are in EXECUTE IMMEDIATE lookup only session variables.
+  .filterNot(_ => AnalysisContext.get.isExecuteImmediate)
+  // If variable name is qualified with session. treat it as a 
session variable.
+  .filter(_ =>
+nameParts.length <= 2 && 
nameParts.init.map(_.toLowerCase(Locale.ROOT)) != Seq("session"))

Review Comment:
   The `validSessionVariableName` function will return different results 
depending on nameParts length. In your case we only call it when length is 2, 
which is correct behavior, but a little confusing in my opinion. The way it is 
now is easier to understand IMO.






Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the code for SupportsOrderingWithinGroup and Mode [spark]

2025-02-13 Thread via GitHub


beliefer commented on code in PR #49907:
URL: https://github.com/apache/spark/pull/49907#discussion_r1954137856


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
##
@@ -390,10 +390,6 @@ case class PercentileCont(left: Expression, right: Expression, reverse: Boolean
   override protected def withNewChildrenInternal(
   newLeft: Expression, newRight: Expression): PercentileCont =
 this.copy(left = newLeft, right = newRight)
-
-  override def orderingFilled: Boolean = left != UnresolvedWithinGroup

Review Comment:
   Before PR https://github.com/apache/spark/pull/48748, the default value was false.
   That change is not necessary!






[PR] [SPARK-51202][ML][PYTHON] Pass the session in meta algorithm python writers [spark]

2025-02-13 Thread via GitHub


zhengruifeng opened a new pull request, #49932:
URL: https://github.com/apache/spark/pull/49932

   ### What changes were proposed in this pull request?
   Pass the session in meta algorithm python writers
   
   
   ### Why are the changes needed?
   try the best to avoid recreating sessions
   
   
   ### Does this PR introduce _any_ user-facing change?
   no, internal change
   
   
   ### How was this patch tested?
   Existing tests should cover this
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub


dejankrak-db commented on code in PR #49772:
URL: https://github.com/apache/spark/pull/49772#discussion_r1954127700


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
##
@@ -18,80 +18,48 @@
 package org.apache.spark.sql.catalyst.analysis
 
 import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, Literal}
-import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, 
AlterColumnSpec, AlterViewAs, ColumnDefinition, CreateView, LogicalPlan, 
QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan}
-import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
+import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, 
AlterColumnSpec, AlterTableCommand, AlterViewAs, ColumnDefinition, CreateTable, 
CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan, 
V2CreateTablePlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.connector.catalog.TableCatalog
 import org.apache.spark.sql.types.{DataType, StringType}
 
 /**
- * Resolves default string types in queries and commands. For queries, the 
default string type is
- * determined by the session's default string type. For DDL, the default 
string type is the
- * default type of the object (table -> schema -> catalog). However, this is 
not implemented yet.
- * So, we will just use UTF8_BINARY for now.
+ * Resolves string types in DDL commands, where the string type inherits the
+ * collation from the corresponding object (table/view -> schema -> catalog).
  */
-object ResolveDefaultStringTypes extends Rule[LogicalPlan] {
+object ResolveDDLCommandStringTypes extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-val newPlan = apply0(plan)
-if (plan.ne(newPlan)) {
-  // Due to how tree transformations work and StringType object being 
equal to
-  // StringType("UTF8_BINARY"), we need to transform the plan twice
-  // to ensure the correct results for occurrences of default string type.
-  val finalPlan = apply0(newPlan)
-  RuleExecutor.forceAdditionalIteration(finalPlan)
-  finalPlan
-} else {
-  newPlan
-}
-  }
-
-  private def apply0(plan: LogicalPlan): LogicalPlan = {
 if (isDDLCommand(plan)) {
   transformDDL(plan)
 } else {
-  transformPlan(plan, sessionDefaultStringType)
+  // For non-DDL commands no need to do any further resolution of string 
types
+  plan
 }
   }
 
-  /**
-   * Returns whether any of the given `plan` needs to have its
-   * default string type resolved.
-   */
-  def needsResolution(plan: LogicalPlan): Boolean = {
-if (!isDDLCommand(plan) && isDefaultSessionCollationUsed) {
-  return false
+  /** Default collation used, if object level collation is not provided */
+  private def defaultCollation: String = "UTF8_BINARY"
+
+  /** Returns the string type that should be used in a given DDL command */
+  private def stringTypeForDDLCommand(table: LogicalPlan): StringType = {
+table match {
+  case createTable: CreateTable if 
createTable.tableSpec.collation.isDefined =>
+StringType(createTable.tableSpec.collation.get)
+  case createView: CreateView if createView.collation.isDefined =>
+StringType(createView.collation.get)
+  case alterTable: AlterTableCommand if alterTable.table.resolved =>
+val collation = Option(alterTable
+  .table.asInstanceOf[ResolvedTable]
+  .table.properties.get(TableCatalog.PROP_COLLATION))
+if (collation.isDefined) {
+  StringType(collation.get)
+} else {
+  StringType(defaultCollation)
+}
+  case _ => StringType(defaultCollation)
 }
-
-plan.exists(node => needsResolution(node.expressions))
-  }
-
-  /**
-   * Returns whether any of the given `expressions` needs to have its
-   * default string type resolved.
-   */
-  def needsResolution(expressions: Seq[Expression]): Boolean = {
-expressions.exists(needsResolution)
   }
 
-  /**
-   * Returns whether the given `expression` needs to have its
-   * default string type resolved.
-   */
-  def needsResolution(expression: Expression): Boolean = {
-expression.exists(e => transformExpression.isDefinedAt(e))
-  }
-
-  private def isDefaultSessionCollationUsed: Boolean = conf.defaultStringType 
== StringType
-
-  /**
-   * Returns the default string type that should be used in a given DDL 
command (for now always
-   * UTF8_BINARY).
-   */
-  private def stringTypeForDDLCommand(table: LogicalPlan): StringType =
-StringType("UTF8_BINARY")
-
-  /** Returns the session default string type */
-  private def sessionDefaultStringType: StringType =
-StringType(conf.defaultStringType.collationId)
-
   private def isDDLCommand(plan: LogicalPlan): Boolean = plan exists {
 case _: AddColumns | _: ReplaceColumns | _: AlterColumns => true
 case _ => isCreateOrAlterPlan

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub


dejankrak-db commented on code in PR #49772:
URL: https://github.com/apache/spark/pull/49772#discussion_r1954134354


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
##
@@ -18,80 +18,48 @@
 package org.apache.spark.sql.catalyst.analysis
 
 import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, Literal}
-import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, 
AlterColumnSpec, AlterViewAs, ColumnDefinition, CreateView, LogicalPlan, 
QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan}
-import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
+import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, 
AlterColumnSpec, AlterTableCommand, AlterViewAs, ColumnDefinition, CreateTable, 
CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan, 
V2CreateTablePlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.connector.catalog.TableCatalog
 import org.apache.spark.sql.types.{DataType, StringType}
 
 /**
- * Resolves default string types in queries and commands. For queries, the 
default string type is
- * determined by the session's default string type. For DDL, the default 
string type is the
- * default type of the object (table -> schema -> catalog). However, this is 
not implemented yet.
- * So, we will just use UTF8_BINARY for now.
+ * Resolves string types in DDL commands, where the string type inherits the
+ * collation from the corresponding object (table/view -> schema -> catalog).
  */
-object ResolveDefaultStringTypes extends Rule[LogicalPlan] {
+object ResolveDDLCommandStringTypes extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-val newPlan = apply0(plan)
-if (plan.ne(newPlan)) {
-  // Due to how tree transformations work and StringType object being 
equal to
-  // StringType("UTF8_BINARY"), we need to transform the plan twice
-  // to ensure the correct results for occurrences of default string type.
-  val finalPlan = apply0(newPlan)
-  RuleExecutor.forceAdditionalIteration(finalPlan)
-  finalPlan
-} else {
-  newPlan
-}
-  }
-
-  private def apply0(plan: LogicalPlan): LogicalPlan = {
 if (isDDLCommand(plan)) {
   transformDDL(plan)
 } else {
-  transformPlan(plan, sessionDefaultStringType)
+  // For non-DDL commands no need to do any further resolution of string 
types
+  plan
 }
   }
 
-  /**
-   * Returns whether any of the given `plan` needs to have its
-   * default string type resolved.
-   */
-  def needsResolution(plan: LogicalPlan): Boolean = {
-if (!isDDLCommand(plan) && isDefaultSessionCollationUsed) {
-  return false
+  /** Default collation used, if object level collation is not provided */
+  private def defaultCollation: String = "UTF8_BINARY"
+
+  /** Returns the string type that should be used in a given DDL command */
+  private def stringTypeForDDLCommand(table: LogicalPlan): StringType = {
+table match {
+  case createTable: CreateTable if 
createTable.tableSpec.collation.isDefined =>
+StringType(createTable.tableSpec.collation.get)
+  case createView: CreateView if createView.collation.isDefined =>
+StringType(createView.collation.get)
+  case alterTable: AlterTableCommand if alterTable.table.resolved =>
+val collation = Option(alterTable
+  .table.asInstanceOf[ResolvedTable]
+  .table.properties.get(TableCatalog.PROP_COLLATION))
+if (collation.isDefined) {
+  StringType(collation.get)
+} else {
+  StringType(defaultCollation)
+}
+  case _ => StringType(defaultCollation)

Review Comment:
   Removed the V1CreateTablePlan trait altogether; this can be handled separately 
in a follow-up PR as discussed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub


dejankrak-db commented on code in PR #49772:
URL: https://github.com/apache/spark/pull/49772#discussion_r1954132769


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
##
@@ -155,22 +123,22 @@ object ResolveDefaultStringTypes extends 
Rule[LogicalPlan] {
 dataType.existsRecursively(isDefaultStringType)
 
   private def isDefaultStringType(dataType: DataType): Boolean = {
+// STRING (without explicit collation) is considered default string type.
+// STRING COLLATE  (with explicit collation) is not 
considered
+// default string type even when explicit collation is UTF8_BINARY 
(default collation).
 dataType match {
-  case st: StringType =>
-// should only return true for StringType object and not 
StringType("UTF8_BINARY")
-st.eq(StringType) || st.isInstanceOf[TemporaryStringType]
+  // should only return true for StringType object and not for 
StringType("UTF8_BINARY")
+  case st: StringType => st.eq(StringType)
   case _ => false
 }
   }
 
   private def replaceDefaultStringType(dataType: DataType, newType: 
StringType): DataType = {
+// Should replace STRING with the new type.
+// Should not replace STRING COLLATE UTF8_BINARY, as that is explicit 
collation.
 dataType.transformRecursively {
   case currentType: StringType if isDefaultStringType(currentType) =>
-if (currentType == newType) {
-  TemporaryStringType()
-} else {
-  newType
-}
+newType

Review Comment:
   As discussed, I have removed the V1CreateTablePlan trait for now, as supporting 
the v2 CREATE TABLE command is more critical, and we can handle the v1 CREATE TABLE 
command separately in a follow-up change down the road, following the proposed 
approach.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51202][ML][PYTHON] Pass the session in meta algorithm python writers [spark]

2025-02-13 Thread via GitHub


zhengruifeng commented on PR #49932:
URL: https://github.com/apache/spark/pull/49932#issuecomment-2656017987

   This PR is for master-only (4.1)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-02-13 Thread via GitHub


bcheena opened a new pull request, #49934:
URL: https://github.com/apache/spark/pull/49934

   ### What changes were proposed in this pull request?
   Cancelling the `Timer` non-daemon thread on stopping the `BarrierCoordinator`
   
   
   ### Why are the changes needed?
   In Barrier Execution Mode, Spark driver JVM could hang around after calling 
spark.stop(). Although the Spark Context was shutdown, the JVM was still 
running. The reason was that there is a non-daemon timer thread named 
`BarrierCoordinator barrier epoch increment timer`, which prevented the driver 
JVM from stopping.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Run the following scripts locally using ./bin/spark-submit. Without this 
change, the JVM would hang there and not exit. With this change, it exits 
successfully.
   1. 
[barrier_example.py](https://gist.github.com/jshmchenxi/5d7e7c61e1eedd03cd6d676699059e9b#file-barrier_example-py)
 
   2. 
[xgboost-test.py](https://gist.github.com/bcheena/510230e19120eb9ae631dcafa804409f).
 
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description [spark]

2025-02-13 Thread via GitHub


yaooqinn closed pull request #49929: [SPARK-51198][CORE][DOCS] Revise 
`defaultMinPartitions` function description
URL: https://github.com/apache/spark/pull/49929


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help [spark]

2025-02-13 Thread via GitHub


yaooqinn commented on PR #49931:
URL: https://github.com/apache/spark/pull/49931#issuecomment-2656165484

   Merged to master and branch-4.0, thank you @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]

2025-02-13 Thread via GitHub


mihailoale-db commented on PR #49935:
URL: https://github.com/apache/spark/pull/49935#issuecomment-2656322301

   @MaxGekk @cloud-fan ptal when you have time. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-51203] Enhance force optimize skewed join [spark]

2025-02-13 Thread via GitHub


wForget opened a new pull request, #49936:
URL: https://github.com/apache/spark/pull/49936

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   ForceOptimizeSkewedJoin allows optimizing a skewed join even if it introduces an 
extra shuffle, but currently it only works for an aggregation after the join, not 
for aggregations in the children of the join. For example:
   
   ```
   HashAggregate
        |
     Exchange 
        |
   HashAggregate     Exchange (skewed side)
        |               |
      Sort            Sort
        \               /
    SortMergeJoin(isSkewJoin = true) 
   ```
   
   I want to introduce an extra shuffle for the join child so as to optimize the 
skewed join when forceOptimizeSkewedJoin is enabled. For example:
   
   ```
     HashAggregate
          |
       Exchange 
          |
     HashAggregate     
          |
       Exchange
   (force extra shuffle)       Exchange (skewed side)
          |                       |
        Sort                     Sort
          \                      /
         SortMergeJoin(isSkewJoin = true) 
   ```
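   
   A hypothetical repro sketch (table names and data are assumptions) of a query 
shape that puts an aggregation below one side of the join while the skew-join 
optimization is forced:
   
   ```scala
   // Enable AQE skew-join handling and force it even at the cost of extra shuffles.
   spark.conf.set("spark.sql.adaptive.enabled", "true")
   spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
   spark.conf.set("spark.sql.adaptive.forceOptimizeSkewedJoin", "true")
   
   val aggregated = spark.table("t1").groupBy("key").count()  // aggregation in a join child
   val skewed = spark.table("t2_skewed")                      // skewed side
   aggregated.join(skewed, "key")
     .groupBy("key").count()                                  // aggregation after the join
     .collect()
   ```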
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   added unit test
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]

2025-02-13 Thread via GitHub


mihailotim-db commented on code in PR #49935:
URL: https://github.com/apache/spark/pull/49935#discussion_r1954499398


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala:
##
@@ -55,9 +57,9 @@ object PullOutNondeterministic extends Rule[LogicalPlan] {
   val nondeterToAttr =
 
NondeterministicExpressionCollection.getNondeterministicToAttributes(p.expressions)
   val newPlan = p.transformExpressions { case e =>
-nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
+Option(nondeterToAttr.get(e)).map(_.toAttribute).getOrElse(e)
   }
-  val newChild = Project(p.child.output ++ nondeterToAttr.values, p.child)
+  val newChild = Project(p.child.output ++ 
nondeterToAttr.values.asScala.toSeq, p.child)

Review Comment:
   Let's not use `asScala` because it copies the entire map



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51183][SQL] Link to Parquet spec in Variant docs [spark]

2025-02-13 Thread via GitHub


cashmand commented on PR #49910:
URL: https://github.com/apache/spark/pull/49910#issuecomment-2656729838

   Hi @dongjoon-hyun, I updated the text to clarify the current status of the 
project. Let me know what you think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51146][SQL][FOLLOWUP] Respect system env `SPARK_CONNECT_MODE` in places that access the api mode config [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49930:
URL: https://github.com/apache/spark/pull/49930#issuecomment-2656819482

   Ya, I agree with you, @cloud-fan .
   > I'd like the cut rc1 on time to kick off the release cycle. We can fail 
rc1 if there are many pending work, which will likely happen.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49931:
URL: https://github.com/apache/spark/pull/49931#issuecomment-2656806168

   Thank you, @yaooqinn !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49929:
URL: https://github.com/apache/spark/pull/49929#issuecomment-2656805060

   Thank you, @yaooqinn !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51201][SQL] Make Partitioning Hints support byte and short values [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on code in PR #49933:
URL: https://github.com/apache/spark/pull/49933#discussion_r1954654468


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala:
##
@@ -174,9 +174,23 @@ object ResolveHints {
* COALESCE Hint accepts names "COALESCE", "REPARTITION", and 
"REPARTITION_BY_RANGE".
*/
   object ResolveCoalesceHints extends Rule[LogicalPlan] {
-
-val COALESCE_HINT_NAMES: Set[String] =
-  Set("COALESCE", "REPARTITION", "REPARTITION_BY_RANGE", "REBALANCE")
+private def getNumOfPartitions(hint: UnresolvedHint): (Option[Int], 
Seq[Any]) = {
+  hint.parameters match {
+case Seq(ByteLiteral(numPartitions), _*) =>
+  (Some(numPartitions.toInt), hint.parameters.tail)
+case Seq(ShortLiteral(numPartitions), _*) =>
+  (Some(numPartitions.toInt), hint.parameters.tail)
+case Seq(IntegerLiteral(numPartitions), _*) => (Some(numPartitions), 
hint.parameters.tail)

Review Comment:
   nit, This could be two lines like the above two cases.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51201][SQL] Make Partitioning Hints support byte and short values [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on code in PR #49933:
URL: https://github.com/apache/spark/pull/49933#discussion_r1954656932


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala:
##
@@ -174,9 +174,23 @@ object ResolveHints {
* COALESCE Hint accepts names "COALESCE", "REPARTITION", and 
"REPARTITION_BY_RANGE".
*/
   object ResolveCoalesceHints extends Rule[LogicalPlan] {
-
-val COALESCE_HINT_NAMES: Set[String] =
-  Set("COALESCE", "REPARTITION", "REPARTITION_BY_RANGE", "REBALANCE")
+private def getNumOfPartitions(hint: UnresolvedHint): (Option[Int], 
Seq[Any]) = {
+  hint.parameters match {
+case Seq(ByteLiteral(numPartitions), _*) =>
+  (Some(numPartitions.toInt), hint.parameters.tail)
+case Seq(ShortLiteral(numPartitions), _*) =>
+  (Some(numPartitions.toInt), hint.parameters.tail)
+case Seq(IntegerLiteral(numPartitions), _*) => (Some(numPartitions), 
hint.parameters.tail)
+case _ => (None, hint.parameters)
+  }
+}

Review Comment:
   nit. New empty line between two methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on PR #49935:
URL: https://github.com/apache/spark/pull/49935#issuecomment-2656875411

   thanks, merging to master/4.0!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]

2025-02-13 Thread via GitHub


cloud-fan closed pull request #49935: [SPARK-51181] [SQL] Enforce determinism 
when pulling out non deterministic expressions from logical plan
URL: https://github.com/apache/spark/pull/49935


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on PR #49772:
URL: https://github.com/apache/spark/pull/49772#issuecomment-2656883176

   thanks, merging to master/4.0!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub


cloud-fan closed pull request #49772: [SPARK-51067][SQL] Revert session level 
collation for DML queries and apply object level collation for DDL queries
URL: https://github.com/apache/spark/pull/49772


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]

2025-02-13 Thread via GitHub


vladimirg-db opened a new pull request, #49937:
URL: https://github.com/apache/spark/pull/49937

   ### What changes were proposed in this pull request?
   
   Fix correctness with UNION/EXCEPT/INTERSECT inside a view or `EXECUTE 
IMMEDIATE`.
   
   In the following examples the SQL Parser considers UNION/EXCEPT/INTERSECT 
keywords as aliases and drops the rest of the query:
   
   ```
   spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 
UNION SELECT 3 UNION SELECT 4")
   spark.sql("SELECT * FROM v1").show()
   spark.sql("SELECT * FROM v1").queryExecution.analyzed
   
   spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 
EXCEPT SELECT 1 EXCEPT SELECT 2")
   spark.sql("SELECT * FROM v1").show()
   spark.sql("SELECT * FROM v1").queryExecution.analyzed
   
   spark.sql("CREATE OR REPLACE VIEW t1 AS SELECT 1 AS col1 INTERSECT SELECT 1 
INTERSECT SELECT 2 INTERSECT SELECT 2")
   spark.sql("SELECT * FROM v1").show()
   spark.sql("SELECT * FROM v1").queryExecution.analyzed
   
   spark.sql("DECLARE v INT")
   spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO 
v")
   spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO 
v").queryExecution.analyzed
   spark.sql("SELECT v").show()
   ```
   
   
![image](https://github.com/user-attachments/assets/ef726178-2375-4ebc-a7e3-88f1991d1016)
   
![image](https://github.com/user-attachments/assets/50b4b7ba-bc7d-4fc1-a921-f4cbfcab79a3)
   
![image](https://github.com/user-attachments/assets/85b65325-5dd9-4d74-b46d-8ea203ce1039)
   
![image](https://github.com/user-attachments/assets/c53c5e02-18c6-4e30-b834-94af619190c5)
   
   There's no correctness issue associated with regular queries (without the 
`VIEW` or `EXECUTE IMMEDIATE`). Apparently that's because we use 
`ParserInterface.parsePlan` (`singleStatement` term in Spark SQL grammar) for 
[regular 
queries](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/core/src/main/scala/org/apache/spark/sql/classic/SparkSession.scala#L490)
 and `ParserInterface.parseQuery` (`query` term in Spark SQL grammar) for [view 
bodies](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L986)
 and [EXECUTE IMMEDIATE with 
INTO](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/executeImmediate.scala#L167).
 The difference is that `singleStatement` [ends in EOF](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L144).
   
   Not sure what the actual root cause is, because I don't know much about the 
SQL Parser.
   
   ### Why are the changes needed?
   
   Correctness issue fix.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the results of queries on top of the aforementioned views are going to be 
correct.
   
   ### How was this patch tested?
   
   New `view-correctness` suite.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-51201][SQL] Make Partitioning Hints support byte and short values [spark]

2025-02-13 Thread via GitHub


yaooqinn opened a new pull request, #49933:
URL: https://github.com/apache/spark/pull/49933

   ### What changes were proposed in this pull request?
   
   The `Dataset.hint` method takes `Any*` as parameters for both the partition 
number and other columns, and the `Partitioning Hints` resolution process 
treats anything that CAN NOT be resolved to a partition number as a column 
candidate. For example, cases like the one below report an ambiguous error message 
to users.
   
   ```scala
   val n: Short = 123
   spark.range(1).hint("rebalance", n).show
   ```
   
   ```
   REBALANCE Hint parameter should include columns, but 123 found.
   ```
   
   This PR makes Partitioning Hints resolution accept byte and short values and 
improves the debugging error message.
   
   ### Why are the changes needed?
   
   - A byte or short value can exist in a user's Spark pipeline and be taken as a 
partition number; with this PR, such runtime errors can be reduced
   - Improve the error message for debugging
   - For developers that build Spark Connect clients in other languages, there 
are some cases in which they cannot handle or distinguish integral types. It's good 
for the server side to handle these valid cases.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, `Dataset.hint("coalesce", 3.toByte)` and `/*+ COALESCE(3S) */` are now 
valid
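   
   A small usage sketch (illustrative only, assuming this change is applied):
   
   ```scala
   // Byte and short partition numbers now resolve the same way an Int does.
   val n: Short = 123
   spark.range(10).hint("rebalance", n).show()
   
   // 3S is a SHORT literal in Spark SQL, so the hint accepts it as a partition number.
   spark.sql("SELECT /*+ COALESCE(3S) */ * FROM range(10)").show()
   ```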
   
   ### How was this patch tested?
   
   new tests
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-02-13 Thread via GitHub


dusantism-db commented on code in PR #49445:
URL: https://github.com/apache/spark/pull/49445#discussion_r1954282034


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala:
##
@@ -350,3 +357,12 @@ class SqlScriptingLabelContext {
 }
   }
 }
+
+object SqlScriptingLabelContext {
+  private val forbiddenLabelNames: immutable.Set[Regex] =
+immutable.Set("builtin".r, "session".r, "sys.*".r)

Review Comment:
   No, it will only match the string "builtin". Isn't that the correct way?
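   
   A quick illustration of the whole-string matching being discussed (assuming the 
check does a full match, e.g. via `Regex.matches` or the regex extractor, rather 
than a substring search):
   
   ```scala
   val builtin = "builtin".r
   builtin.matches("builtin")        // true  -- the whole string matches
   builtin.matches("my_builtin_x")   // false -- substring occurrences do not count
   
   val sys = "sys.*".r
   sys.matches("sysvar")             // true  -- ".*" covers the rest of the label
   ```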



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49726:
URL: https://github.com/apache/spark/pull/49726#discussion_r1954098174


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -329,6 +376,32 @@ class AstBuilder extends DataTypeAstBuilder
   .duplicateConditionInScope(CurrentOrigin.get, 
condition.conditionName)
   }
   conditions += condition.conditionName -> condition.sqlState
+case signalStatement: SignalStatement if 
signalStatement.sqlState.isEmpty =>

Review Comment:
   SGTM



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49726:
URL: https://github.com/apache/spark/pull/49726#discussion_r1954099260


##
sql/catalyst/src/main/scala/org/apache/spark/sql/exceptions/SqlScriptingRuntimeException.scala:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.exceptions
+
+import scala.jdk.CollectionConverters._
+
+import org.apache.spark.{SparkThrowable, SparkThrowableHelper}
+import org.apache.spark.sql.catalyst.trees.Origin
+import 
org.apache.spark.sql.exceptions.SqlScriptingRuntimeException.errorMessageWithLineNumber
+
+class SqlScriptingRuntimeException (
+condition: String,
+sqlState: String,
+message: String,
+cause: Throwable,
+val origin: Origin,
+messageParameters: Map[String, String] = Map.empty,
+isBuiltinError: Boolean = false)
+  extends Exception(
+errorMessageWithLineNumber(
+  origin,
+  condition,
+  sqlState,
+  message,
+  messageParameters,
+  isBuiltinError),
+cause)
+  with SparkThrowable {

Review Comment:
   then we shouldn't call it `SqlScriptingRuntimeException`. We should remove 
`Runtime` from the name




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49726:
URL: https://github.com/apache/spark/pull/49726#discussion_r1954110539


##
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala:
##
@@ -1020,3 +1024,107 @@ class ExceptionHandlerExec(
 
   override def reset(): Unit = body.reset()
 }
+
+/**
+ * Executable node for Signal Statement.
+ * @param errorCondition Name of the error condition/SQL State for error that 
will be thrown.
+ * @param sqlState SQL State of the error that will be thrown.
+ * @param message Error message (either string or variable name).
+ * @param isBuiltinError Whether the error condition is a builtin condition.
+ * @param msgArguments Error message parameters for builtin conditions.
+ * @param session Spark session that SQL script is executed within.
+ * @param origin Origin descriptor for the statement.
+ */
+class SignalStatementExec(
+val errorCondition: String,
+val sqlState: String,
+val message: Either[String, UnresolvedAttribute],

Review Comment:
   oh, we use `Either` so that `UnresolvedAttribute` is not resolved by the 
analyzer but in `SignalStatementExec`. Do we have a requirement for doing so?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub


dejankrak-db commented on code in PR #49772:
URL: https://github.com/apache/spark/pull/49772#discussion_r1954124114


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
##
@@ -155,22 +123,22 @@ object ResolveDefaultStringTypes extends 
Rule[LogicalPlan] {
 dataType.existsRecursively(isDefaultStringType)
 
   private def isDefaultStringType(dataType: DataType): Boolean = {
+// STRING (without explicit collation) is considered default string type.
+// STRING COLLATE  (with explicit collation) is not 
considered
+// default string type even when explicit collation is UTF8_BINARY 
(default collation).
 dataType match {
-  case st: StringType =>
-// should only return true for StringType object and not 
StringType("UTF8_BINARY")
-st.eq(StringType) || st.isInstanceOf[TemporaryStringType]
+  // should only return true for StringType object and not for 
StringType("UTF8_BINARY")
+  case st: StringType => st.eq(StringType)

Review Comment:
   Yes, this is quite an interesting finding; I was surprised too. Good 
discussion!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub


miland-db commented on code in PR #49726:
URL: https://github.com/apache/spark/pull/49726#discussion_r1954155098


##
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala:
##
@@ -1020,3 +1024,107 @@ class ExceptionHandlerExec(
 
   override def reset(): Unit = body.reset()
 }
+
+/**
+ * Executable node for Signal Statement.
+ * @param errorCondition Name of the error condition/SQL State for error that 
will be thrown.
+ * @param sqlState SQL State of the error that will be thrown.
+ * @param message Error message (either string or variable name).
+ * @param isBuiltinError Whether the error condition is a builtin condition.
+ * @param msgArguments Error message parameters for builtin conditions.
+ * @param session Spark session that SQL script is executed within.
+ * @param origin Origin descriptor for the statement.
+ */
+class SignalStatementExec(
+val errorCondition: String,
+val sqlState: String,
+val message: Either[String, UnresolvedAttribute],

Review Comment:
   There is no requirement, but for scripting-specific statements we usually 
don't go through the Analyzer because we are able to "resolve and execute" them 
inside the scripting engine. In this case, `SignalStatementExec` just keeps 
information about the exception that needs to be thrown. When we encounter it 
during script execution, we just read the necessary information and throw an 
exception.
   
   The example we used was the code for `EXECUTE IMMEDIATE` and how `query_string` 
is resolved there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]

2025-02-13 Thread via GitHub


mihailoale-db opened a new pull request, #49935:
URL: https://github.com/apache/spark/pull/49935

   ### What changes were proposed in this pull request?
   Enforce determinism when pulling out non deterministic expressions from 
logical plan.
   
   ### Why are the changes needed?
   This is needed to avoid a plan normalization problem when comparing 
single-pass and fixed-point results.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Existing tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help [spark]

2025-02-13 Thread via GitHub


yaooqinn closed pull request #49931: [SPARK-51200][BUILD] Add SparkR 
deprecation info to `README.md` and `make-distribution.sh` help
URL: https://github.com/apache/spark/pull/49931


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]

2025-02-13 Thread via GitHub


HyukjinKwon commented on PR #49880:
URL: https://github.com/apache/spark/pull/49880#issuecomment-2656206417

   I think we will likely miss RC1 - I will have to be away from keyboard for about 
3 days. Since technically the CVE isn't filed yet, and this is an optional 
distribution, I think we can go ahead with RC 1. I will try to target RC 2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]

2025-02-13 Thread via GitHub


HyukjinKwon commented on PR #49880:
URL: https://github.com/apache/spark/pull/49880#issuecomment-2656204302

   I think I can't just enable SSL by default. We should expose the certificate 
or use an insecure connection. The access token API cannot be used with SSL, it 
seems, so I can't reuse this existing token either.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description [spark]

2025-02-13 Thread via GitHub


yaooqinn commented on PR #49929:
URL: https://github.com/apache/spark/pull/49929#issuecomment-2656160458

   Merged to master, thank you @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]

2025-02-13 Thread via GitHub


HyukjinKwon commented on code in PR #49880:
URL: https://github.com/apache/spark/pull/49880#discussion_r1954272416


##
sql/connect/common/src/main/scala/org/apache/spark/sql/connect/common/config/ConnectCommon.scala:
##
@@ -16,9 +16,23 @@
  */
 package org.apache.spark.sql.connect.common.config
 
+import io.grpc.Metadata
+
 private[sql] object ConnectCommon {
   val CONNECT_GRPC_BINDING_PORT: Int = 15002
   val CONNECT_GRPC_PORT_MAX_RETRIES: Int = 0
   val CONNECT_GRPC_MAX_MESSAGE_SIZE: Int = 128 * 1024 * 1024
   val CONNECT_GRPC_MARSHALLER_RECURSION_LIMIT: Int = 1024
+  val AUTH_TOKEN_META_DATA_KEY: Metadata.Key[String] =
+Metadata.Key.of("Authentication", Metadata.ASCII_STRING_MARSHALLER)
+
+  val CONNECT_LOCAL_AUTH_TOKEN_ENV_NAME = "SPARK_CONNECT_LOCAL_AUTH_TOKEN"
+  private var CONNECT_LOCAL_AUTH_TOKEN: Option[String] = Option(
+System.getenv(CONNECT_LOCAL_AUTH_TOKEN_ENV_NAME))
+  def setLocalAuthToken(token: String): Unit = CONNECT_LOCAL_AUTH_TOKEN = 
Option(token)
+  def getLocalAuthToken: Option[String] = CONNECT_LOCAL_AUTH_TOKEN
+  def assertLocalAuthToken(token: Option[String]): Unit = token.foreach { t =>
+assert(CONNECT_LOCAL_AUTH_TOKEN.isDefined)

Review Comment:
   For other places, I will fix it later. But here the assert is correct because if 
`token` is set, `CONNECT_LOCAL_AUTH_TOKEN` must be set too for local usage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]

2025-02-13 Thread via GitHub


HyukjinKwon commented on code in PR #49880:
URL: https://github.com/apache/spark/pull/49880#discussion_r1954273145


##
sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/LocalAuthInterceptor.scala:
##
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect.service
+
+import io.grpc.{Metadata, ServerCall, ServerCallHandler, ServerInterceptor}
+
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+
+/**
+ * A gRPC interceptor to check if the header contains token for authentication.
+ */
+class LocalAuthInterceptor extends ServerInterceptor {

Review Comment:
   I will scope it down to local usage for now. This is all internal at the 
moment, and we don't need to generalize it yet.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]

2025-02-13 Thread via GitHub


LuciferYang commented on code in PR #47084:
URL: https://github.com/apache/spark/pull/47084#discussion_r1954366720


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala:
##
@@ -306,11 +296,9 @@ class SimpleFunctionRegistry
 extends SimpleFunctionRegistryBase[Expression]
 with FunctionRegistry {
 
-  override def clone(): SimpleFunctionRegistry = synchronized {
+  override def clone(): SimpleFunctionRegistry = {

Review Comment:
   Originally, this entire function was atomic. If there are additions to or 
deletions from `functionBuilders` during the cloning process, the result would 
deviate from the previous behavior.
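
   A hedged sketch of the concern, using a simplified stand-in for the registry 
(the class and field names below are illustrative, not the actual 
`SimpleFunctionRegistryBase` code):

   ```
   import java.util.concurrent.ConcurrentHashMap

   // With a ConcurrentHashMap, each put/remove is thread-safe on its own, but a
   // clone that is not synchronized is no longer an atomic snapshot: entries
   // added or removed while the forEach below runs may or may not show up in the
   // copy, which is the behavior change being pointed out.
   class RegistrySketch {
     private val functionBuilders = new ConcurrentHashMap[String, String]()

     def register(name: String, builder: String): Unit = {
       functionBuilders.put(name, builder)
     }

     def cloneRegistry(): RegistrySketch = {
       val copy = new RegistrySketch
       functionBuilders.forEach((k, v) => copy.register(k, v))
       copy
     }
   }
   ```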



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]

2025-02-13 Thread via GitHub


wayneguow commented on PR #49938:
URL: https://github.com/apache/spark/pull/49938#issuecomment-2657115551

   Benchmark jdk17: https://github.com/wayneguow/spark/actions/runs/13310975547
   Benchmark jdk21: https://github.com/wayneguow/spark/actions/runs/13310979208


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]

2025-02-13 Thread via GitHub


wayneguow opened a new pull request, #49938:
URL: https://github.com/apache/spark/pull/49938

   
   
   ### What changes were proposed in this pull request?
   
   This PR aims to reduce `numCols` in `withFilter()` to prevent 
`SubExprEliminationBenchmark` from failing due to a Codegen error.
   
   I did some debugging and found that in the current master branch the generated 
code has one huge `processNext` method, whereas in branch-3.5 it is split into 
many small methods. Because the codegen logic is different, the previous 
benchmark can no longer run normally.
   
   master branch:
   ```
   protected void processNext() throws java.io.IOException {
   
while ( inputadapter_input_0.hasNext()) {
...
   }
   }
   ```
   3.5 branch:
   ```
   public boolean eval(InternalRow i) {
boolean value_2997 = Or_498(i);
return !globalIsNull_498 && value_2997;
}
   private boolean Or_498(InternalRow i) {
   ...
   }
   ...
   private boolean Or_(InternalRow i) {
  ...
   }
   ```
   
   ### Why are the changes needed?
   
   If we run `SubExprEliminationBenchmark`:
   ```
   build/sbt "sql/Test/runMain 
org.apache.spark.sql.execution.SubExprEliminationBenchmark"
   ```
   It fails at `CodeGenerator`, details:
   > [info] Running benchmark: from_json as subExpr in Filter
   [info]   Running case: subExprElimination false, codegen: true
   [info] 00:08:31.509 ERROR 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: Failed to 
compile the generated Java code.
   [info] org.codehaus.commons.compiler.InternalCompilerException: Compiling 
"GeneratedClass" in File 'generated.java', Line 1, Column 1: File 
'generated.java', Line 24, Column 16: Compiling "processNext()"
   
   The root cause is:
   > [error] Caused by: 
org.codehaus.commons.compiler.InternalCompilerException: Code grows beyond 64 KB
   [error] at 
org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:699)
   [error] at 
org.codehaus.janino.CodeContext.write(CodeContext.java:558)
   [error] at 
org.codehaus.janino.UnitCompiler.write(UnitCompiler.java:13079)
   [error] at 
org.codehaus.janino.UnitCompiler.store(UnitCompiler.java:12752)
   [error] at 
org.codehaus.janino.UnitCompiler.store(UnitCompiler.java:12730)
   [error] at 
org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2742)
   [error] ... 164 more
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, just fix a test benchmark failure.
   
   ### How was this patch tested?
   
   Run benchmark tests manually.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49726:
URL: https://github.com/apache/spark/pull/49726#discussion_r1954812502


##
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala:
##
@@ -1020,3 +1024,107 @@ class ExceptionHandlerExec(
 
   override def reset(): Unit = body.reset()
 }
+
+/**
+ * Executable node for Signal Statement.
+ * @param errorCondition Name of the error condition/SQL State for error that will be thrown.
+ * @param sqlState SQL State of the error that will be thrown.
+ * @param message Error message (either string or variable name).
+ * @param isBuiltinError Whether the error condition is a builtin condition.
+ * @param msgArguments Error message parameters for builtin conditions.
+ * @param session Spark session that SQL script is executed within.
+ * @param origin Origin descriptor for the statement.
+ */
+class SignalStatementExec(
+val errorCondition: String,
+val sqlState: String,
+val message: Either[String, UnresolvedAttribute],

Review Comment:
   Can you give some examples of other scripting plan nodes that do not use 
analyzer to resolve its expressions?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49726:
URL: https://github.com/apache/spark/pull/49726#discussion_r1954815751


##
sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala:
##
@@ -69,6 +70,222 @@ class SqlScriptingExecutionSuite extends QueryTest with SharedSparkSession {
 }
   }
 
+  // Signal tests
+  test("signal statement - condition") {
+val sqlScript =
+  """
+|BEGIN
+|  DECLARE TEST_CONDITION CONDITION FOR SQLSTATE '12345';
+|  SIGNAL TEST_CONDITION;
+|END
+|""".stripMargin
+val exception = intercept[SqlScriptingRuntimeException] {
+  verifySqlScriptResult(sqlScript, Seq.empty)
+}
+checkError(

Review Comment:
   Can you also run this test manually and check the user-facing error message? 
To make sure that the error context (line number) is properly displayed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49937:
URL: https://github.com/apache/spark/pull/49937#issuecomment-2657424493

   Nice catch!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]

2025-02-13 Thread via GitHub


vladimirg-db commented on PR #49937:
URL: https://github.com/apache/spark/pull/49937#issuecomment-2657422364

   @dongjoon-hyun I figured out the problem with the `sql-udf` test. What 
happened is that in the new golden file tests I created a view _and_ a temporary 
view with the same name `v1`, and afterwards I dropped it only once. I actually 
didn't know that a regular view may shadow a temporary view with the same name!
   
![image](https://github.com/user-attachments/assets/2f853e87-6715-493c-88e1-461fc40ba7e7)
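
   For reference, a hedged reproduction sketch of the setup described above (this 
is not the golden-file test itself; exactly which of the two `v1` objects gets 
dropped and which one resolves afterwards depends on Spark's name-resolution 
order for temporary vs. persistent views):

   ```
   import org.apache.spark.sql.SparkSession

   object ViewShadowingSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().master("local[1]").getOrCreate()

       // Two different objects can share the unqualified name v1.
       spark.sql("CREATE TEMPORARY VIEW v1 AS SELECT 1 AS c")
       spark.sql("CREATE VIEW v1 AS SELECT 2 AS c")

       // A single DROP removes only one of them...
       spark.sql("DROP VIEW v1")
       // ...so v1 still resolves afterwards, to whichever object remains.
       spark.sql("SELECT * FROM v1").show()

       spark.stop()
     }
   }
   ```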
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51149][CORE] Log classpath in SparkSubmit on ClassNotFoundException [spark]

2025-02-13 Thread via GitHub


vrozov commented on PR #49870:
URL: https://github.com/apache/spark/pull/49870#issuecomment-2657458078

   @dongjoon-hyun Please review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun closed pull request #49937: [SPARK-51113][SQL] Fix correctness 
with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
URL: https://github.com/apache/spark/pull/49937


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-51206][PYTHON][CONNECT] Move Arrow conversion helpers out of Spark Connect [spark]

2025-02-13 Thread via GitHub


wengh opened a new pull request, #49941:
URL: https://github.com/apache/spark/pull/49941

   
   
   ### What changes were proposed in this pull request?
   
   
   Refactor `pyspark.sql.connect.conversion` to move 
`LocalDataToArrowConversion` and `ArrowTableToRowsConversion` into 
`pyspark.sql.conversion`.
   
   The reason is that `pyspark.sql.connect.conversion` checks for Spark Connect 
dependencies such as `grpcio` and `pandas`, but `LocalDataToArrowConversion` 
and `ArrowTableToRowsConversion` don't need these dependencies.
   
   `pyspark.sql.connect.conversion` still re-exports the two classes for 
backward compatibility.
   
   ### Why are the changes needed?
   
   
   Python Data Sources should work without Spark Connect dependencies but 
currently it imports `LocalDataToArrowConversion` and 
`ArrowTableToRowsConversion` from `pyspark.sql.connect.conversion` making it 
require unnecessary dependencies. This change moves these two classes to 
`pyspark.sql.conversion` so that Python Data Sources runs without Spark Connect 
dependencies.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Existing tests should make sure that the changes don't break anything.
   
   TODO: Add new test to ensure that Python Data Sources can be used with 
minimal dependencies.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49940:
URL: https://github.com/apache/spark/pull/49940#issuecomment-2657732901

   Could you review this test-dependency PR when you have some time, @huaxingao 
?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49940:
URL: https://github.com/apache/spark/pull/49940#issuecomment-2657736954

   Thank you, @huaxingao . Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun closed pull request #49940: [SPARK-51205][BUILD][TESTS] Upgrade 
`bytebuddy` to 1.17.0 to support Java 25
URL: https://github.com/apache/spark/pull/49940


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on PR #49938:
URL: https://github.com/apache/spark/pull/49938#issuecomment-2657165478

   cc @panbingkun , @cloud-fan , @LuciferYang 
   
   It seems that we need to make a decision. Are we good with this codegen perf 
regression?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]

2025-02-13 Thread via GitHub


dongjoon-hyun commented on code in PR #49938:
URL: https://github.com/apache/spark/pull/49938#discussion_r1954851640


##
sql/core/benchmarks/SubExprEliminationBenchmark-results.txt:
##
@@ -3,23 +3,23 @@ Benchmark for performance of subexpression elimination
 

 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
+OpenJDK 64-Bit Server VM 17.0.14+7-LTS on Linux 6.8.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 from_json as subExpr in Project:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------
-subExprElimination false, codegen: true            6438           6551          98          0.0    64378783.5       1.0X
-subExprElimination false, codegen: false           6216           6320         175          0.0    62161826.1       1.0X
-subExprElimination true, codegen: true             1480           1518          39          0.0    14799890.8       4.3X
-subExprElimination true, codegen: false            1321           1429          94          0.0    13212919.6       4.9X
+subExprElimination false, codegen: true            6398           6601         177          0.0    63980070.0       1.0X
+subExprElimination false, codegen: false           6018           6137         123          0.0    60178371.6       1.1X
+subExprElimination true, codegen: true             1288           1352          78          0.0    12883220.0       5.0X
+subExprElimination true, codegen: false            1299           1340          66          0.0    12989399.4       4.9X
 
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
+OpenJDK 64-Bit Server VM 17.0.14+7-LTS on Linux 6.8.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 from_json as subExpr in Filter:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------
-subExprElimination false, codegen: true            7107           7310         207          0.0    71066752.8       1.0X
-subExprElimination false, codegen: false           6738           6781          41          0.0    67375897.0       1.1X
-subExprElimination true, codegen: true             2052           2110          51          0.0    20519152.3       3.5X
-subExprElimination true, codegen: false            2053           2079          33          0.0    20526629.8       3.5X
+subExprElimination false, codegen: true            2474           2512          44          0.0    24744107.3       1.0X
+subExprElimination false, codegen: false           2231           2246          20          0.0    22306061.2       1.1X
+subExprElimination true, codegen: true             2408           2509         100          0.0    24084091.2       1.0X

Review Comment:
   Yes, it's true. The performance regression was the actual reason why we 
didn't make a decision earlier. Here it is, FYI:
   
   - https://github.com/apache/spark/pull/49411



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub


cloud-fan commented on code in PR #49726:
URL: https://github.com/apache/spark/pull/49726#discussion_r1954813899


##
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala:
##
@@ -1020,3 +1024,107 @@ class ExceptionHandlerExec(
 
   override def reset(): Unit = body.reset()
 }
+
+/**
+ * Executable node for Signal Statement.
+ * @param errorCondition Name of the error condition/SQL State for error that will be thrown.
+ * @param sqlState SQL State of the error that will be thrown.
+ * @param message Error message (either string or variable name).
+ * @param isBuiltinError Whether the error condition is a builtin condition.
+ * @param msgArguments Error message parameters for builtin conditions.
+ * @param session Spark session that SQL script is executed within.
+ * @param origin Origin descriptor for the statement.
+ */
+class SignalStatementExec(
+val errorCondition: String,
+val sqlState: String,
+val message: Either[String, UnresolvedAttribute],
+val msgArguments: Option[UnresolvedAttribute],
+val isBuiltinError: Boolean,
+val session: SparkSession,
+override val origin: Origin)
+  extends LeafStatementExec
+  with ColumnResolutionHelper with WithOrigin {

Review Comment:
   we don't need to implement `ColumnResolutionHelper` if we let analyzer to 
resolve the variables.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


