Re: [PR] [SPARK-51179][SQL] Refactor SupportsOrderingWithinGroup so that centralized check [spark]
beliefer commented on PR #49908: URL: https://github.com/apache/spark/pull/49908#issuecomment-2655798525

ping @cloud-fan cc @mikhailnik-db
Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the code for SupportsOrderingWithinGroup and Mode [spark]
cloud-fan commented on code in PR #49907: URL: https://github.com/apache/spark/pull/49907#discussion_r1954008498

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
@@ -390,10 +390,6 @@ case class PercentileCont(left: Expression, right: Expression, reverse: Boolean
   override protected def withNewChildrenInternal(
       newLeft: Expression, newRight: Expression): PercentileCont =
     this.copy(left = newLeft, right = newRight)
-
-  override def orderingFilled: Boolean = left != UnresolvedWithinGroup

Review Comment:
Why do we make such a change?
Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954019289

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
@@ -155,22 +123,22 @@ object ResolveDefaultStringTypes extends Rule[LogicalPlan] {
     dataType.existsRecursively(isDefaultStringType)

   private def isDefaultStringType(dataType: DataType): Boolean = {
+    // STRING (without explicit collation) is considered default string type.
+    // STRING COLLATE (with explicit collation) is not considered
+    // default string type even when explicit collation is UTF8_BINARY (default collation).
     dataType match {
-      case st: StringType =>
-        // should only return true for StringType object and not StringType("UTF8_BINARY")
-        st.eq(StringType) || st.isInstanceOf[TemporaryStringType]
+      // should only return true for StringType object and not for StringType("UTF8_BINARY")
+      case st: StringType => st.eq(StringType)
       case _ => false
     }
   }

   private def replaceDefaultStringType(dataType: DataType, newType: StringType): DataType = {
+    // Should replace STRING with the new type.
+    // Should not replace STRING COLLATE UTF8_BINARY, as that is explicit collation.
     dataType.transformRecursively {
       case currentType: StringType if isDefaultStringType(currentType) =>
-        if (currentType == newType) {
-          TemporaryStringType()
-        } else {
-          newType
-        }
+        newType

Review Comment:
or can we move this rule to `sql/core` as an extended resolution rule in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala#L212
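For context on the `eq` check in the hunk above, a standalone illustration (using the `StringType(collationName)` constructor that appears in the diff): a plain STRING and an explicit STRING COLLATE UTF8_BINARY compare equal by value, so only reference equality against the `StringType` singleton identifies the "default" string type.

```scala
import org.apache.spark.sql.types.StringType

val plainString = StringType                    // STRING, no explicit collation
val explicitBinary = StringType("UTF8_BINARY")  // STRING COLLATE UTF8_BINARY

// Value equality cannot tell the two apart...
explicitBinary == plainString    // true
// ...but reference equality against the singleton can.
explicitBinary.eq(plainString)   // false
plainString.eq(StringType)       // true
```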
Re: [PR] [WIP][SPARK-7101][SQL] support java.sql.Time [spark]
MaxGekk commented on PR #28858: URL: https://github.com/apache/spark/pull/28858#issuecomment-2655900699

I sent the SPIP https://issues.apache.org/jira/browse/SPARK-51162 to the dev list for discussion ([link](https://lists.apache.org/thread/892vkskktqrx1czk9wm6l8vchpydrny2)). @younggyuchun @dongjoon-hyun @HyukjinKwon Please leave comments in the thread if you are still interested in the feature.
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1954078549

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala:
@@ -350,3 +357,12 @@ class SqlScriptingLabelContext {
     }
   }
 }
+
+object SqlScriptingLabelContext {
+  private val forbiddenLabelNames: immutable.Set[Regex] =
+    immutable.Set("builtin".r, "session".r, "sys.*".r)

Review Comment:
will it match `*builtin*`?
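The answer depends on how the set is consulted. A minimal standalone sketch (the `exists`-based check below is an assumption for illustration, not the PR code) showing the difference between anchored matching and substring search in Scala 2.13:

```scala
import scala.util.matching.Regex

val forbiddenLabelNames: Set[Regex] = Set("builtin".r, "session".r, "sys.*".r)

// Regex.matches (Scala 2.13+) anchors the pattern to the whole input,
// so a label that merely contains "builtin" is not rejected...
forbiddenLabelNames.exists(_.matches("my_builtin_label"))  // false
forbiddenLabelNames.exists(_.matches("builtin"))           // true
forbiddenLabelNames.exists(_.matches("system"))            // true, via "sys.*"

// ...whereas findFirstIn performs a substring search and would flag it.
forbiddenLabelNames.exists(_.findFirstIn("my_builtin_label").isDefined)  // true
```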
Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]
dongjoon-hyun closed pull request #49944: [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+
URL: https://github.com/apache/spark/pull/49944
Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]
dongjoon-hyun commented on PR #49944: URL: https://github.com/apache/spark/pull/49944#issuecomment-2658003644

Thank you for the review and approval, @viirya!
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1955489143

## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala:
@@ -1020,3 +1024,107 @@ class ExceptionHandlerExec(
   override def reset(): Unit = body.reset()
 }
+
+/**
+ * Executable node for Signal Statement.
+ * @param errorCondition Name of the error condition/SQL State for error that will be thrown.
+ * @param sqlState SQL State of the error that will be thrown.
+ * @param message Error message (either string or variable name).
+ * @param isBuiltinError Whether the error condition is a builtin condition.
+ * @param msgArguments Error message parameters for builtin conditions.
+ * @param session Spark session that SQL script is executed within.
+ * @param origin Origin descriptor for the statement.
+ */
+class SignalStatementExec(
+    val errorCondition: String,
+    val sqlState: String,
+    val message: Either[String, UnresolvedAttribute],

Review Comment:
I checked the error condition and handler ones. They don't have expressions to resolve. Can you provide other examples to justify your approach?
Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the code for SupportsOrderingWithinGroup and Mode [spark]
cloud-fan commented on code in PR #49907: URL: https://github.com/apache/spark/pull/49907#discussion_r1955422516

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
@@ -390,10 +390,6 @@ case class PercentileCont(left: Expression, right: Expression, reverse: Boolean
   override protected def withNewChildrenInternal(
       newLeft: Expression, newRight: Expression): PercentileCont =
     this.copy(left = newLeft, right = newRight)
-
-  override def orderingFilled: Boolean = left != UnresolvedWithinGroup

Review Comment:
Why is it not necessary? `orderingFilled` should be true once `UnresolvedWithinGroup` is resolved and replaced, right?
Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]
LuciferYang commented on PR #49944: URL: https://github.com/apache/spark/pull/49944#issuecomment-2658349981

late LGTM
[PR] [SPARK-51215][ML][PYTHON][CONNECT] Add a helper function to invoke helper model attr [spark]
zhengruifeng opened a new pull request, #49951: URL: https://github.com/apache/spark/pull/49951

### What changes were proposed in this pull request?
Add a helper function to invoke helper model attr

### Why are the changes needed?
deduplicate code

### Does this PR introduce _any_ user-facing change?
no, minor refactor

### How was this patch tested?
existing tests

### Was this patch authored or co-authored using generative AI tooling?
no
Re: [PR] [SPARK-51213][SQL] Keep Expression class info when resolving hint parameters [spark]
yaooqinn commented on PR #49950: URL: https://github.com/apache/spark/pull/49950#issuecomment-2658350458

Thank you @dongjoon-hyun. The failed test passes for me locally; let's run another round in GA.
Re: [PR] [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution [spark]
cloud-fan commented on PR #49947: URL: https://github.com/apache/spark/pull/49947#issuecomment-2658350592

thanks, merging to master/4.0!
Re: [PR] [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution [spark]
cloud-fan closed pull request #49947: [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution
URL: https://github.com/apache/spark/pull/49947
Re: [PR] [SPARK-51204][BUILD] Upgrade `sbt-assembly` to 2.3.1 [spark]
pan3793 commented on PR #49939: URL: https://github.com/apache/spark/pull/49939#issuecomment-2658374563

According to the release notes, it is better to go into 4.0 too.

> - Fixes assemblyOutputPath
> - Fixes assemblyExcludedJars

Spark uses these features, but I am not sure whether it is affected. cc @LuciferYang, you may have more context.
Re: [PR] [SPARK-51209][CORE][FOLLOWUP] Use `user.name` system property first as a fallback [spark]
dongjoon-hyun closed pull request #49949: [SPARK-51209][CORE][FOLLOWUP] Use `user.name` system property first as a fallback
URL: https://github.com/apache/spark/pull/49949
Re: [PR] [SPARK-51209][CORE][FOLLOWUP] Use `user.name` system property first as a fallback [spark]
dongjoon-hyun commented on PR #49949: URL: https://github.com/apache/spark/pull/49949#issuecomment-2658378854

`Document generations` passed at the last commit. Merged to master.
Re: [PR] [SPARK-51208][SQL] `ColumnDefinition.toV1Column` should preserve `EXISTS_DEFAULT` resolution [spark]
cloud-fan commented on code in PR #49942: URL: https://github.com/apache/spark/pull/49942#discussion_r1955382758

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ColumnDefinition.scala:
@@ -75,7 +75,7 @@ case class ColumnDefinition(
   // For v1 CREATE TABLE command, we will resolve and execute the default value expression later
   // in the rule `DataSourceAnalysis`. We just need to put the default value SQL string here.
   metadataBuilder.putString(CURRENT_DEFAULT_COLUMN_METADATA_KEY, default.originalSQL)
-  metadataBuilder.putString(EXISTS_DEFAULT_COLUMN_METADATA_KEY, default.originalSQL)
+  metadataBuilder.putString(EXISTS_DEFAULT_COLUMN_METADATA_KEY, default.child.sql)

Review Comment:
Note that `Expression#sql` is for display only, and it's not guaranteed to parse back. I think we should only use `child.sql` here if `child` is a literal, which means this `ColumnDefinition` has been resolved and constant-folded.
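A minimal sketch of the guard being suggested (hypothetical helper, reusing only the names visible in the hunk above; the actual fix landed in the follow-up PR #49947):

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}

// Only trust Expression#sql for the EXISTS_DEFAULT metadata when the default value
// expression has already been resolved and constant-folded down to a literal;
// otherwise fall back to the SQL text the user originally wrote.
def existsDefaultSql(child: Expression, originalSQL: String): String = child match {
  case lit: Literal => lit.sql
  case _            => originalSQL
}

// existsDefaultSql(Literal(42), "41 + 1")  == "42"
// For an expression that is not yet folded, the original SQL string is kept.
```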
[PR] [SPARK-51212][PYTHON] Add a separated PySpark package for Spark Connect by default [spark]
ueshin opened a new pull request, #49946: URL: https://github.com/apache/spark/pull/49946

### What changes were proposed in this pull request?
Adds a separated PySpark package for Spark Connect by default.
- Rename `pyspark-connect` to `pyspark-client`
- Add a new `pyspark-connect` that depends on `pyspark` but uses Spark Connect by default

### Why are the changes needed?
As discussed in the dev list, we will have an additional package for Spark Connect by default.

### Does this PR introduce _any_ user-facing change?
Yes, there will be three packages for PySpark.
- `pyspark`: contains the whole package to run PySpark, Spark Classic by default
- `pyspark-connect`: depends on `pyspark`, Spark Connect by default
- `pyspark-client`: only contains Python files to work with Spark Connect

### How was this patch tested?
Manually.

### Was this patch authored or co-authored using generative AI tooling?
No.
Re: [PR] [SPARK-51208][SQL] `ColumnDefinition.toV1Column` should preserve `EXISTS_DEFAULT` resolution [spark]
cloud-fan commented on code in PR #49942: URL: https://github.com/apache/spark/pull/49942#discussion_r1955421786

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ColumnDefinition.scala:
@@ -75,7 +75,7 @@ case class ColumnDefinition(
   // For v1 CREATE TABLE command, we will resolve and execute the default value expression later
   // in the rule `DataSourceAnalysis`. We just need to put the default value SQL string here.
   metadataBuilder.putString(CURRENT_DEFAULT_COLUMN_METADATA_KEY, default.originalSQL)
-  metadataBuilder.putString(EXISTS_DEFAULT_COLUMN_METADATA_KEY, default.originalSQL)
+  metadataBuilder.putString(EXISTS_DEFAULT_COLUMN_METADATA_KEY, default.child.sql)

Review Comment:
When we convert a v2 CREATE TABLE command to v1, `ColumnDefinition` is only resolved but not constant-folded (we are still in the analyzer phase).
Re: [PR] [SPARK-51185][Core] Revert simplifications to PartitionedFileUtil API to reduce memory requirements [spark]
LukasRupprecht commented on PR #49915: URL: https://github.com/apache/spark/pull/49915#issuecomment-2658110441

Thanks @cloud-fan for merging this! Will prepare a separate PR for 3.5.
Re: [PR] [SPARK-51177][PYTHON][CONNECT] Add `InvalidCommandInput` to Spark Connect Python client [spark]
itholic closed pull request #49916: [SPARK-51177][PYTHON][CONNECT] Add `InvalidCommandInput` to Spark Connect Python client
URL: https://github.com/apache/spark/pull/49916
Re: [PR] [SPARK-51177][PYTHON][CONNECT] Add `InvalidCommandInput` to Spark Connect Python client [spark]
itholic commented on PR #49916: URL: https://github.com/apache/spark/pull/49916#issuecomment-2658076422

Merged to master and branch-4.0.

Thanks @HyukjinKwon @zhengruifeng for the review.
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1955489998

## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala:
@@ -1020,3 +1024,107 @@ class ExceptionHandlerExec(
   override def reset(): Unit = body.reset()
 }
+
+/**
+ * Executable node for Signal Statement.
+ * @param errorCondition Name of the error condition/SQL State for error that will be thrown.
+ * @param sqlState SQL State of the error that will be thrown.
+ * @param message Error message (either string or variable name).
+ * @param isBuiltinError Whether the error condition is a builtin condition.
+ * @param msgArguments Error message parameters for builtin conditions.
+ * @param session Spark session that SQL script is executed within.
+ * @param origin Origin descriptor for the statement.
+ */
+class SignalStatementExec(
+    val errorCondition: String,
+    val sqlState: String,
+    val message: Either[String, UnresolvedAttribute],
+    val msgArguments: Option[UnresolvedAttribute],
+    val isBuiltinError: Boolean,
+    val session: SparkSession,
+    override val origin: Origin)
+  extends LeafStatementExec
+  with ColumnResolutionHelper with WithOrigin {

Review Comment:
The execution node of SIGNAL just gets the string value by calling `.eval()` on the message expression, which can be either a literal or a variable.
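A minimal standalone sketch of that evaluation pattern (illustrative names only, not the PR's code), assuming the message expression has already been resolved to something foldable such as a string literal:

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.unsafe.types.UTF8String

// Evaluate the (already resolved) message expression without an input row and
// turn the resulting UTF8String into the error message text.
def messageText(messageExpr: Expression): String =
  messageExpr.eval() match {
    case s: UTF8String => s.toString
    case null          => throw new IllegalArgumentException("SIGNAL message must not be NULL")
    case other         => other.toString
  }

// messageText(Literal("custom error"))  ==  "custom error"
```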
Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]
pan3793 commented on code in PR #49814: URL: https://github.com/apache/spark/pull/49814#discussion_r1955489715

## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
@@ -171,6 +172,11 @@ private[hive] class HiveClientImpl(
   private def newState(): SessionState = {
     val hiveConf = newHiveConf(sparkConf, hadoopConf, extraConfig, Some(initClassLoader))
     val state = new SessionState(hiveConf)
+    // When SessionState is initialized, the caller context is overridden by hive
+    // so we need to reset it back to the DRIVER

Review Comment:
@sririshindra Thanks for your effort in testing it! For the Iceberg cases, you can use Spark 3.5 to test; AFAIK, there are no changes related to CallerContext in the Spark 4.0 period.
Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]
sririshindra commented on code in PR #49814: URL: https://github.com/apache/spark/pull/49814#discussion_r1955487024

## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala:
@@ -171,6 +172,11 @@ private[hive] class HiveClientImpl(
   private def newState(): SessionState = {
     val hiveConf = newHiveConf(sparkConf, hadoopConf, extraConfig, Some(initClassLoader))
     val state = new SessionState(hiveConf)
+    // When SessionState is initialized, the caller context is overridden by hive
+    // so we need to reset it back to the DRIVER

Review Comment:
@pan3793, @cnauroth I was finally able to properly test this upstream version in a Docker-based cluster with the current branch. It looks like the change in HiveClientImpl is not needed in Spark 4. I checked whether the caller context is being set during the SessionState initialization in HiveClientImpl, and it looks like it is not. So, once the CallerContext is set inside the SparkContext class, it is not overridden by anything else in the driver process.

```
2025-02-14 02:26:23,249 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/192.168.97.4 cmd=getfileinfo src=/warehouse/sample dst=null perm=null proto=rpc callerContext=SPARK_DRIVER_application_1739496632907_0005
2025-02-14 02:26:23,265 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/192.168.97.4 cmd=listStatus src=/warehouse/sample dst=null perm=null proto=rpc callerContext=SPARK_DRIVER_application_1739496632907_0005
2025-02-14 02:26:25,519 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/192.168.97.5 cmd=open src=/warehouse/sample/part-0-dd473344-76b1-4179-91ae-d15a8da4a888-c000 dst=null perm=null proto=rpc callerContext=SPARK_TASK_application_1739496632907_0005_JId_0_SId_0_0_TId_0_0
2025-02-14 02:26:26,345 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/192.168.97.5 cmd=open src=/warehouse/sample/part-0-dd473344-76b1-4179-91ae-d15a8da4a888-c000 dst=null perm=null proto=rpc callerContext=SPARK_TASK_application_1739496632907_0005_JId_1_SId_1_0_TId_1_0
```

I wasn't able to test with Iceberg, though, because Iceberg doesn't support Spark 4 yet. Once an Iceberg release with Spark 4 support is released, I will retest it and make any needed changes in a separate PR. For now, I removed the change that was in HiveClientImpl.scala.
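For readers unfamiliar with how the `callerContext=` field in the audit lines above gets populated, a rough sketch using Hadoop's caller-context API (simplified illustration only; Spark wires this up through its own helper in `Utils.scala` rather than calling the API directly like this):

```scala
import org.apache.hadoop.ipc.CallerContext

// Whatever caller context is attached to the current thread when an HDFS RPC is
// issued ends up in the callerContext field of the NameNode audit log.
val context = new CallerContext.Builder("SPARK_DRIVER_application_1739496632907_0005").build()
CallerContext.setCurrent(context)
```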
Re: [PR] [SPARK-50655][SS] Move virtual col family related mapping into db layer instead of encoder [spark]
anishshri-db commented on code in PR #49304: URL: https://github.com/apache/spark/pull/49304#discussion_r1955675678

## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala:
@@ -931,7 +941,8 @@ case class RocksDBCheckpointMetadata(
     sstFiles: Seq[RocksDBSstFile],
     logFiles: Seq[RocksDBLogFile],
     numKeys: Long,
-    columnFamilyMapping: Option[Map[String, Short]] = None,
+    numInternalKeys: Long,
+    columnFamilyMapping: Option[Map[String, ColumnFamilyInfo]] = None,

Review Comment:
Done - added format validation tests for serde
Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]
dongjoon-hyun commented on PR #49944: URL: https://github.com/apache/spark/pull/49944#issuecomment-2657968814

Could you review this `Java option` PR when you have some time, @viirya ?
Re: [PR] [SPARK-51208][SQL] `ColumnDefinition.toV1Column` should preserve `EXISTS_DEFAULT` resolution [spark]
cloud-fan commented on code in PR #49942: URL: https://github.com/apache/spark/pull/49942#discussion_r1955383049

## sql/catalyst/src/test/scala/org/apache/spark/sql/types/StructTypeSuite.scala:
@@ -798,4 +800,35 @@ class StructTypeSuite extends SparkFunSuite with SQLHelper {
     assert(
       mapper.readTree(mapWithNestedArray.json) == mapper.readTree(expectedJson))
   }
+
+  test("SPARK-51208: ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution") {
+
+    def validateConvertedDefaults(
+      colName: String,

Review Comment:
super nit: 4 spaces indentation
Re: [PR] [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+ [spark]
dongjoon-hyun closed pull request #49943: [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+
URL: https://github.com/apache/spark/pull/49943
Re: [PR] [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+ [spark]
dongjoon-hyun commented on PR #49943: URL: https://github.com/apache/spark/pull/49943#issuecomment-2657967822

Merged to master for Apache Spark 4.1.0.
Re: [PR] [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+ [spark]
dongjoon-hyun commented on PR #49943: URL: https://github.com/apache/spark/pull/49943#issuecomment-2657966885

Thank you, @huaxingao !
Re: [PR] [SPARK-51204][BUILD] Upgrade `sbt-assembly` to 2.3.1 [spark]
wayneguow commented on PR #49939: URL: https://github.com/apache/spark/pull/49939#issuecomment-2658092268

> Could you re-trigger the failed K8s test, @wayneguow ? Although it looks irrelevant, let's make it sure.

After re-running, it was successful. However, this page does not yet show that the check has passed; there may be some delay. https://github.com/wayneguow/spark/actions/runs/13312413853/job/37202892468
[PR] [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution [spark]
szehon-ho opened a new pull request, #49947: URL: https://github.com/apache/spark/pull/49947

Small follow-up for: https://github.com/apache/spark/pull/49942

### What changes were proposed in this pull request?
Restrict the logic of https://github.com/apache/spark/pull/49942 to expressions that are resolved and constant-folded.

### Why are the changes needed?
The previous PR may have unexpected behavior if the expression is not resolved/folded.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit test

### Was this patch authored or co-authored using generative AI tooling?
No
Re: [PR] [SPARK-51216][BUILD] Remove the useless `bigtop-dist` profile and the related outdated files [spark]
LuciferYang commented on PR #49952: URL: https://github.com/apache/spark/pull/49952#issuecomment-2658399233

cc @pan3793 @dongjoon-hyun
[PR] [BUILD] Remove the useless `bigtop-dist` profile [spark]
LuciferYang opened a new pull request, #49952: URL: https://github.com/apache/spark/pull/49952

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
Re: [PR] [SPARK-51186][PYTHON] Add `StreamingPythonRunnerInitializationException` to PySpark base exception [spark]
itholic closed pull request #49917: [SPARK-51186][PYTHON] Add `StreamingPythonRunnerInitializationException` to PySpark base exception
URL: https://github.com/apache/spark/pull/49917
Re: [PR] [SPARK-51186][PYTHON] Add `StreamingPythonRunnerInitializationException` to PySpark base exception [spark]
itholic commented on PR #49917: URL: https://github.com/apache/spark/pull/49917#issuecomment-2658414570

Merged to master and branch-4.0
Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]
beliefer commented on code in PR #47084: URL: https://github.com/apache/spark/pull/47084#discussion_r1955643680

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala:
@@ -306,11 +296,9 @@ class SimpleFunctionRegistry
   extends SimpleFunctionRegistryBase[Expression]
   with FunctionRegistry {

-  override def clone(): SimpleFunctionRegistry = synchronized {
+  override def clone(): SimpleFunctionRegistry = {

Review Comment:
Yes, I know it. `ConcurrentHashMap` creates weakly consistent iterators, so iteration will not cause a concurrent modification exception. I think the clone of `SimpleFunctionRegistry` can allow concurrent modification.
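As a standalone illustration of the weak-consistency guarantee mentioned above (a toy example, not Spark code): iterating a `ConcurrentHashMap` while it is being modified neither throws `ConcurrentModificationException` nor promises to see entries added after the iterator was created.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

val functions = new ConcurrentHashMap[String, String]()
functions.put("abs", "builtin")
functions.put("concat", "builtin")

// A clone built by iterating the map is a point-in-time-ish snapshot: a concurrent
// put() happening during this loop is safe, but may or may not be copied over.
val cloned = new ConcurrentHashMap[String, String]()
for (entry <- functions.entrySet().asScala) {
  cloned.put(entry.getKey, entry.getValue)
}
```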
Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954022933

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
@@ -18,80 +18,48 @@ package org.apache.spark.sql.catalyst.analysis

 import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, Literal}
-import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, AlterColumnSpec, AlterViewAs, ColumnDefinition, CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan}
-import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
+import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, AlterColumnSpec, AlterTableCommand, AlterViewAs, ColumnDefinition, CreateTable, CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.connector.catalog.TableCatalog
 import org.apache.spark.sql.types.{DataType, StringType}

 /**
- * Resolves default string types in queries and commands. For queries, the default string type is
- * determined by the session's default string type. For DDL, the default string type is the
- * default type of the object (table -> schema -> catalog). However, this is not implemented yet.
- * So, we will just use UTF8_BINARY for now.
+ * Resolves string types in DDL commands, where the string type inherits the
+ * collation from the corresponding object (table/view -> schema -> catalog).
  */
-object ResolveDefaultStringTypes extends Rule[LogicalPlan] {
+object ResolveDDLCommandStringTypes extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = {
-    val newPlan = apply0(plan)
-    if (plan.ne(newPlan)) {
-      // Due to how tree transformations work and StringType object being equal to
-      // StringType("UTF8_BINARY"), we need to transform the plan twice
-      // to ensure the correct results for occurrences of default string type.
-      val finalPlan = apply0(newPlan)
-      RuleExecutor.forceAdditionalIteration(finalPlan)
-      finalPlan
-    } else {
-      newPlan
-    }
-  }
-
-  private def apply0(plan: LogicalPlan): LogicalPlan = {
     if (isDDLCommand(plan)) {
       transformDDL(plan)
     } else {
-      transformPlan(plan, sessionDefaultStringType)
+      // For non-DDL commands no need to do any further resolution of string types
+      plan
     }
   }

-  /**
-   * Returns whether any of the given `plan` needs to have its
-   * default string type resolved.
-   */
-  def needsResolution(plan: LogicalPlan): Boolean = {
-    if (!isDDLCommand(plan) && isDefaultSessionCollationUsed) {
-      return false
+  /** Default collation used, if object level collation is not provided */
+  private def defaultCollation: String = "UTF8_BINARY"
+
+  /** Returns the string type that should be used in a given DDL command */
+  private def stringTypeForDDLCommand(table: LogicalPlan): StringType = {
+    table match {
+      case createTable: CreateTable if createTable.tableSpec.collation.isDefined =>
+        StringType(createTable.tableSpec.collation.get)
+      case createView: CreateView if createView.collation.isDefined =>
+        StringType(createView.collation.get)
+      case alterTable: AlterTableCommand if alterTable.table.resolved =>
+        val collation = Option(alterTable
+          .table.asInstanceOf[ResolvedTable]
+          .table.properties.get(TableCatalog.PROP_COLLATION))
+        if (collation.isDefined) {
+          StringType(collation.get)
+        } else {
+          StringType(defaultCollation)
+        }
+      case _ => StringType(defaultCollation)
     }
-
-    plan.exists(node => needsResolution(node.expressions))
-  }
-
-  /**
-   * Returns whether any of the given `expressions` needs to have its
-   * default string type resolved.
-   */
-  def needsResolution(expressions: Seq[Expression]): Boolean = {
-    expressions.exists(needsResolution)
   }

-  /**
-   * Returns whether the given `expression` needs to have its
-   * default string type resolved.
-   */
-  def needsResolution(expression: Expression): Boolean = {
-    expression.exists(e => transformExpression.isDefinedAt(e))
-  }
-
-  private def isDefaultSessionCollationUsed: Boolean = conf.defaultStringType == StringType
-
-  /**
-   * Returns the default string type that should be used in a given DDL command (for now always
-   * UTF8_BINARY).
-   */
-  private def stringTypeForDDLCommand(table: LogicalPlan): StringType =
-    StringType("UTF8_BINARY")
-
-  /** Returns the session default string type */
-  private def sessionDefaultStringType: StringType =
-    StringType(conf.defaultStringType.collationId)
-
   private def isDDLCommand(plan: LogicalPlan): Boolean = plan exists {
     case _: AddColumns | _: ReplaceColumns | _: AlterColumns => true
     case _ => isCreateOrAlterPlan(pl
Re: [PR] [SPARK-51146][SQL][FOLLOWUP] Respect system env `SPARK_CONNECT_MODE` in places that access the api mode config [spark]
cloud-fan commented on PR #49930: URL: https://github.com/apache/spark/pull/49930#issuecomment-2655862681

@dongjoon-hyun I'd like to cut rc1 on time to kick off the release cycle. We can fail rc1 if there is a lot of pending work, which will likely happen.
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1954045545

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala:
@@ -266,22 +275,41 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
     }
   }

-    if (maybeTempVariableName(nameParts)) {
-      val variableName = if (conf.caseSensitiveAnalysis) {
-        nameParts.last
-      } else {
-        nameParts.last.toLowerCase(Locale.ROOT)
-      }
-      catalogManager.tempVariableManager.get(variableName).map { varDef =>
+    val namePartsCaseAdjusted = if (conf.caseSensitiveAnalysis) {
+      nameParts
+    } else {
+      nameParts.map(_.toLowerCase(Locale.ROOT))
+    }
+
+    SqlScriptingLocalVariableManager.get()
+      // If we are in EXECUTE IMMEDIATE lookup only session variables.
+      .filterNot(_ => AnalysisContext.get.isExecuteImmediate)
+      // If variable name is qualified with session. treat it as a session variable.
+      .filter(_ =>
+        nameParts.length <= 2 && nameParts.init.map(_.toLowerCase(Locale.ROOT)) != Seq("session"))

Review Comment:
can we reuse the function that validates the label name? We can do something like
```
.filterNot(nameParts =>
  nameParts.length > 2 || (nameParts.length == 2 && !isValidLabelName(nameParts.head)))
```
Re: [PR] [SPARK-28973][SQL] Add `TimeType` and support `java.time.LocalTime` as its external type. [spark]
MaxGekk commented on PR #25678: URL: https://github.com/apache/spark/pull/25678#issuecomment-2655891624

I sent the SPIP https://issues.apache.org/jira/browse/SPARK-51162 to the dev list for discussion ([link](https://lists.apache.org/thread/892vkskktqrx1czk9wm6l8vchpydrny2)). @zeddit @Fokko @tundraraj @redblackcoder @melin @hurelhuyag @younggyuchun Please leave comments in the thread if you are still interested in the feature.
Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]
WebdeveloperIsaac commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-2655892185

Can this be reopened? +1, this feature would be of great help; please don't keep this parked.
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1954447510

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala:
@@ -266,22 +275,41 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
     }
   }

-    if (maybeTempVariableName(nameParts)) {
-      val variableName = if (conf.caseSensitiveAnalysis) {
-        nameParts.last
-      } else {
-        nameParts.last.toLowerCase(Locale.ROOT)
-      }
-      catalogManager.tempVariableManager.get(variableName).map { varDef =>
+    val namePartsCaseAdjusted = if (conf.caseSensitiveAnalysis) {
+      nameParts
+    } else {
+      nameParts.map(_.toLowerCase(Locale.ROOT))
+    }
+
+    SqlScriptingLocalVariableManager.get()
+      // If we are in EXECUTE IMMEDIATE lookup only session variables.
+      .filterNot(_ => AnalysisContext.get.isExecuteImmediate)
+      // If variable name is qualified with session. treat it as a session variable.
+      .filter(_ =>
+        nameParts.length <= 2 && nameParts.init.map(_.toLowerCase(Locale.ROOT)) != Seq("session"))

Review Comment:
The `validSessionVariableName` function will return different results depending on nameParts length. In your case we only call it when the length is 2, which is correct behavior, but a little confusing in my opinion. The way it is now is easier to understand IMO.
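A standalone toy version of the rule the two comments above are debating (illustrative only, not the PR code): a one-part name may be a local variable, a two-part name is treated as a local variable only when its qualifier is not `session`, and longer names are never local variables.

```scala
def maybeLocalVariable(nameParts: Seq[String]): Boolean = nameParts match {
  case Seq(_)            => true                                   // x
  case Seq(qualifier, _) => !qualifier.equalsIgnoreCase("session") // label.x, but not session.x
  case _                 => false                                  // a.b.x can never be local
}

// maybeLocalVariable(Seq("x"))             == true
// maybeLocalVariable(Seq("lbl", "x"))      == true
// maybeLocalVariable(Seq("session", "x"))  == false
// maybeLocalVariable(Seq("a", "b", "x"))   == false
```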
Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the code for SupportsOrderingWithinGroup and Mode [spark]
beliefer commented on code in PR #49907: URL: https://github.com/apache/spark/pull/49907#discussion_r1954137856

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala:
@@ -390,10 +390,6 @@ case class PercentileCont(left: Expression, right: Expression, reverse: Boolean
   override protected def withNewChildrenInternal(
       newLeft: Expression, newRight: Expression): PercentileCont =
     this.copy(left = newLeft, right = newRight)
-
-  override def orderingFilled: Boolean = left != UnresolvedWithinGroup

Review Comment:
Before PR https://github.com/apache/spark/pull/48748, the default value was false. That change was not necessary!
[PR] [SPARK-51202][ML][PYTHON] Pass the session in meta algorithm python writers [spark]
zhengruifeng opened a new pull request, #49932: URL: https://github.com/apache/spark/pull/49932

### What changes were proposed in this pull request?
Pass the session in meta algorithm python writers

### Why are the changes needed?
try the best to avoid recreating sessions

### Does this PR introduce _any_ user-facing change?
no, internal change

### How was this patch tested?
existing test should cover

### Was this patch authored or co-authored using generative AI tooling?
no
Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]
dejankrak-db commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954127700

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala:
@@ -18,80 +18,48 @@ package org.apache.spark.sql.catalyst.analysis
Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]
dejankrak-db commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954134354 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -18,80 +18,48 @@ package org.apache.spark.sql.catalyst.analysis import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, Literal} -import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, AlterColumnSpec, AlterViewAs, ColumnDefinition, CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan} -import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor} +import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, AlterColumnSpec, AlterTableCommand, AlterViewAs, ColumnDefinition, CreateTable, CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.connector.catalog.TableCatalog import org.apache.spark.sql.types.{DataType, StringType} /** - * Resolves default string types in queries and commands. For queries, the default string type is - * determined by the session's default string type. For DDL, the default string type is the - * default type of the object (table -> schema -> catalog). However, this is not implemented yet. - * So, we will just use UTF8_BINARY for now. + * Resolves string types in DDL commands, where the string type inherits the + * collation from the corresponding object (table/view -> schema -> catalog). */ -object ResolveDefaultStringTypes extends Rule[LogicalPlan] { +object ResolveDDLCommandStringTypes extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = { -val newPlan = apply0(plan) -if (plan.ne(newPlan)) { - // Due to how tree transformations work and StringType object being equal to - // StringType("UTF8_BINARY"), we need to transform the plan twice - // to ensure the correct results for occurrences of default string type. - val finalPlan = apply0(newPlan) - RuleExecutor.forceAdditionalIteration(finalPlan) - finalPlan -} else { - newPlan -} - } - - private def apply0(plan: LogicalPlan): LogicalPlan = { if (isDDLCommand(plan)) { transformDDL(plan) } else { - transformPlan(plan, sessionDefaultStringType) + // For non-DDL commands no need to do any further resolution of string types + plan } } - /** - * Returns whether any of the given `plan` needs to have its - * default string type resolved. 
- */ - def needsResolution(plan: LogicalPlan): Boolean = { -if (!isDDLCommand(plan) && isDefaultSessionCollationUsed) { - return false + /** Default collation used, if object level collation is not provided */ + private def defaultCollation: String = "UTF8_BINARY" + + /** Returns the string type that should be used in a given DDL command */ + private def stringTypeForDDLCommand(table: LogicalPlan): StringType = { +table match { + case createTable: CreateTable if createTable.tableSpec.collation.isDefined => +StringType(createTable.tableSpec.collation.get) + case createView: CreateView if createView.collation.isDefined => +StringType(createView.collation.get) + case alterTable: AlterTableCommand if alterTable.table.resolved => +val collation = Option(alterTable + .table.asInstanceOf[ResolvedTable] + .table.properties.get(TableCatalog.PROP_COLLATION)) +if (collation.isDefined) { + StringType(collation.get) +} else { + StringType(defaultCollation) +} + case _ => StringType(defaultCollation) Review Comment: Removed V1CreateTablePlan trait altogether, this can be handled separately in a follow up PR as discussed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]
dejankrak-db commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954132769 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -155,22 +123,22 @@ object ResolveDefaultStringTypes extends Rule[LogicalPlan] { dataType.existsRecursively(isDefaultStringType) private def isDefaultStringType(dataType: DataType): Boolean = { +// STRING (without explicit collation) is considered default string type. +// STRING COLLATE (with explicit collation) is not considered +// default string type even when explicit collation is UTF8_BINARY (default collation). dataType match { - case st: StringType => -// should only return true for StringType object and not StringType("UTF8_BINARY") -st.eq(StringType) || st.isInstanceOf[TemporaryStringType] + // should only return true for StringType object and not for StringType("UTF8_BINARY") + case st: StringType => st.eq(StringType) case _ => false } } private def replaceDefaultStringType(dataType: DataType, newType: StringType): DataType = { +// Should replace STRING with the new type. +// Should not replace STRING COLLATE UTF8_BINARY, as that is explicit collation. dataType.transformRecursively { case currentType: StringType if isDefaultStringType(currentType) => -if (currentType == newType) { - TemporaryStringType() -} else { - newType -} +newType Review Comment: As discussed, I have removed V1CreateTablePlan trait for now, as supporting v2 CREATE TABLE command is more critical, and we can handle v1 CREATE TABLE command separately in a follow up change down the road, following the proposed approach. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51202][ML][PYTHON] Pass the session in meta algorithm python writers [spark]
zhengruifeng commented on PR #49932: URL: https://github.com/apache/spark/pull/49932#issuecomment-2656017987 This PR is for master-only (4.1) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]
bcheena opened a new pull request, #49934: URL: https://github.com/apache/spark/pull/49934

### What changes were proposed in this pull request?
Cancel the `Timer` non-daemon thread on stopping the `BarrierCoordinator`.

### Why are the changes needed?
In Barrier Execution Mode, the Spark driver JVM could hang around after calling spark.stop(). Although the Spark Context was shut down, the JVM was still running. The reason is that there is a non-daemon timer thread named `BarrierCoordinator barrier epoch increment timer`, which prevents the driver JVM from stopping.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Run the following scripts locally using ./bin/spark-submit. Without this change, the JVM would hang there and not exit; with this change, it exits successfully.
1. [barrier_example.py](https://gist.github.com/jshmchenxi/5d7e7c61e1eedd03cd6d676699059e9b#file-barrier_example-py)
2. [xgboost-test.py](https://gist.github.com/bcheena/510230e19120eb9ae631dcafa804409f)

### Was this patch authored or co-authored using generative AI tooling?
No

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
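For context, the general shape of the fix is to cancel the non-daemon `java.util.Timer` when the coordinator is stopped. A minimal sketch of the pattern follows; it is not the actual `BarrierCoordinator` code, and the class name and intervals are illustrative only:

```scala
import java.util.{Timer, TimerTask}

// Illustrative only: a non-daemon Timer thread keeps the JVM alive until it is cancelled.
class CoordinatorLike {
  // isDaemon = false, so this thread alone can prevent the driver JVM from exiting.
  private val timer = new Timer("barrier epoch increment timer", false)

  def start(): Unit = {
    timer.scheduleAtFixedRate(new TimerTask {
      override def run(): Unit = { /* e.g. increment the barrier epoch */ }
    }, 0L, 1000L)
  }

  def stop(): Unit = {
    // Cancelling the timer terminates its non-daemon thread, letting the JVM exit cleanly.
    timer.cancel()
    timer.purge()
  }
}
```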
Re: [PR] [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description [spark]
yaooqinn closed pull request #49929: [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description URL: https://github.com/apache/spark/pull/49929 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help [spark]
yaooqinn commented on PR #49931: URL: https://github.com/apache/spark/pull/49931#issuecomment-2656165484 Merged to master and branch-4.0, thank you @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]
mihailoale-db commented on PR #49935: URL: https://github.com/apache/spark/pull/49935#issuecomment-2656322301 @MaxGekk @cloud-fan ptal when you have time. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51203] Enhance force optimize skewed join [spark]
wForget opened a new pull request, #49936: URL: https://github.com/apache/spark/pull/49936

### What changes were proposed in this pull request?

### Why are the changes needed?
ForceOptimizeSkewedJoin allows optimizing a skewed join even if it introduces an extra shuffle, but currently it only works for an aggregation after the join, not for aggregations in the children of the join. Like:
```
HashAggregate
    |
Exchange
    |
HashAggregate       Exchange (skewed side)
    |                   |
   Sort                Sort
      \                /
    SortMergeJoin(isSkewJoin = true)
```
I want to introduce an extra shuffle for the join child so as to optimize the skewed join when forceOptimizeSkewedJoin is enabled. Like:
```
HashAggregate
    |
Exchange
    |
HashAggregate
    |
Exchange (force extra shuffle)    Exchange (skewed side)
    |                                 |
   Sort                              Sort
      \                              /
    SortMergeJoin(isSkewJoin = true)
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Added a unit test.

### Was this patch authored or co-authored using generative AI tooling?
No

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
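For readers who want to try this out, a rough sketch of the configuration and query shape in question; the AQE flags below are existing configs, but the data sizes are made up and whether the skew optimization actually fires depends on the configured skew thresholds:

```scala
import org.apache.spark.sql.functions.sum

// Enable AQE skew-join handling, including the "force" variant discussed in this PR.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.forceOptimizeSkewedJoin", "true")

// One join child is heavily skewed on `key`; the other child contains an aggregation,
// which is the "aggregation in a child of the join" plan shape described above.
val skewed = spark.range(0, 1000000L).selectExpr("id % 10 AS key", "id AS value")
val aggregated = spark.range(0, 1000L)
  .selectExpr("id % 100 AS key", "id AS value")
  .groupBy("key").agg(sum("value").as("total"))

skewed.join(aggregated, "key").collect()
```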
Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]
mihailotim-db commented on code in PR #49935: URL: https://github.com/apache/spark/pull/49935#discussion_r1954499398 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -55,9 +57,9 @@ object PullOutNondeterministic extends Rule[LogicalPlan] { val nondeterToAttr = NondeterministicExpressionCollection.getNondeterministicToAttributes(p.expressions) val newPlan = p.transformExpressions { case e => -nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) +Option(nondeterToAttr.get(e)).map(_.toAttribute).getOrElse(e) } - val newChild = Project(p.child.output ++ nondeterToAttr.values, p.child) + val newChild = Project(p.child.output ++ nondeterToAttr.values.asScala.toSeq, p.child) Review Comment: Let's not use `asScala` because it copies the entire map -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
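A minimal sketch of one way to build the extra projection list without going through `asScala`, assuming `getNondeterministicToAttributes` returns a `java.util.Map[Expression, NamedExpression]` (the actual return type and helper name may differ from what is shown here):

```scala
import java.util.{Map => JMap}

import scala.collection.mutable.ArrayBuffer

import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}

// Copy only the values we need into a Scala Seq, iterating the Java map directly.
def extraProjectList(nondeterToAttr: JMap[Expression, NamedExpression]): Seq[NamedExpression] = {
  val buffer = new ArrayBuffer[NamedExpression](nondeterToAttr.size())
  val it = nondeterToAttr.values().iterator()
  while (it.hasNext) {
    buffer += it.next()
  }
  buffer.toSeq
}
```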
Re: [PR] [SPARK-51183][SQL] Link to Parquet spec in Variant docs [spark]
cashmand commented on PR #49910: URL: https://github.com/apache/spark/pull/49910#issuecomment-2656729838 Hi @dongjoon-hyun, I updated the text to clarify the current status of the project. Let me know what you think. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51146][SQL][FOLLOWUP] Respect system env `SPARK_CONNECT_MODE` in places that access the api mode config [spark]
dongjoon-hyun commented on PR #49930: URL: https://github.com/apache/spark/pull/49930#issuecomment-2656819482 Ya, I agree with you, @cloud-fan . > I'd like the cut rc1 on time to kick off the release cycle. We can fail rc1 if there are many pending work, which will likely happen. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help [spark]
dongjoon-hyun commented on PR #49931: URL: https://github.com/apache/spark/pull/49931#issuecomment-2656806168 Thank you, @yaooqinn ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description [spark]
dongjoon-hyun commented on PR #49929: URL: https://github.com/apache/spark/pull/49929#issuecomment-2656805060 Thank you, @yaooqinn ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51201][SQL] Make Partitioning Hints support byte and short values [spark]
dongjoon-hyun commented on code in PR #49933: URL: https://github.com/apache/spark/pull/49933#discussion_r1954654468 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala: ## @@ -174,9 +174,23 @@ object ResolveHints { * COALESCE Hint accepts names "COALESCE", "REPARTITION", and "REPARTITION_BY_RANGE". */ object ResolveCoalesceHints extends Rule[LogicalPlan] { - -val COALESCE_HINT_NAMES: Set[String] = - Set("COALESCE", "REPARTITION", "REPARTITION_BY_RANGE", "REBALANCE") +private def getNumOfPartitions(hint: UnresolvedHint): (Option[Int], Seq[Any]) = { + hint.parameters match { +case Seq(ByteLiteral(numPartitions), _*) => + (Some(numPartitions.toInt), hint.parameters.tail) +case Seq(ShortLiteral(numPartitions), _*) => + (Some(numPartitions.toInt), hint.parameters.tail) +case Seq(IntegerLiteral(numPartitions), _*) => (Some(numPartitions), hint.parameters.tail) Review Comment: nit, This could be two line like the above two cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51201][SQL] Make Partitioning Hints support byte and short values [spark]
dongjoon-hyun commented on code in PR #49933: URL: https://github.com/apache/spark/pull/49933#discussion_r1954656932 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala: ## @@ -174,9 +174,23 @@ object ResolveHints { * COALESCE Hint accepts names "COALESCE", "REPARTITION", and "REPARTITION_BY_RANGE". */ object ResolveCoalesceHints extends Rule[LogicalPlan] { - -val COALESCE_HINT_NAMES: Set[String] = - Set("COALESCE", "REPARTITION", "REPARTITION_BY_RANGE", "REBALANCE") +private def getNumOfPartitions(hint: UnresolvedHint): (Option[Int], Seq[Any]) = { + hint.parameters match { +case Seq(ByteLiteral(numPartitions), _*) => + (Some(numPartitions.toInt), hint.parameters.tail) +case Seq(ShortLiteral(numPartitions), _*) => + (Some(numPartitions.toInt), hint.parameters.tail) +case Seq(IntegerLiteral(numPartitions), _*) => (Some(numPartitions), hint.parameters.tail) +case _ => (None, hint.parameters) + } +} Review Comment: nit. New empty line between two methods. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]
cloud-fan commented on PR #49935: URL: https://github.com/apache/spark/pull/49935#issuecomment-2656875411 thanks, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]
cloud-fan closed pull request #49935: [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan URL: https://github.com/apache/spark/pull/49935 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]
cloud-fan commented on PR #49772: URL: https://github.com/apache/spark/pull/49772#issuecomment-2656883176 thanks, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]
cloud-fan closed pull request #49772: [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries URL: https://github.com/apache/spark/pull/49772 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]
vladimirg-db opened a new pull request, #49937: URL: https://github.com/apache/spark/pull/49937

### What changes were proposed in this pull request?
Fix correctness with UNION/EXCEPT/INTERSECT inside a view or `EXECUTE IMMEDIATE`. In the following examples the SQL Parser considers UNION/EXCEPT/INTERSECT keywords as aliases and drops the rest of the query:
```
spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4")
spark.sql("SELECT * FROM v1").show()
spark.sql("SELECT * FROM v1").queryExecution.analyzed

spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 EXCEPT SELECT 1 EXCEPT SELECT 2")
spark.sql("SELECT * FROM v1").show()
spark.sql("SELECT * FROM v1").queryExecution.analyzed

spark.sql("CREATE OR REPLACE VIEW t1 AS SELECT 1 AS col1 INTERSECT SELECT 1 INTERSECT SELECT 2 INTERSECT SELECT 2")
spark.sql("SELECT * FROM v1").show()
spark.sql("SELECT * FROM v1").queryExecution.analyzed

spark.sql("DECLARE v INT")
spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v")
spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v").queryExecution.analyzed
spark.sql("SELECT v").show()
```
There's no correctness issue associated with regular queries (without the `VIEW` or `EXECUTE IMMEDIATE`). Apparently that's because we use `ParserInterface.parsePlan` (`singleStatement` term in the Spark SQL grammar) for [regular queries](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/core/src/main/scala/org/apache/spark/sql/classic/SparkSession.scala#L490) and `ParserInterface.parseQuery` (`query` term in the Spark SQL grammar) for [view bodies](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L986) and [EXECUTE IMMEDIATE with INTO](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/executeImmediate.scala#L167). The difference is that `singleStatement` [ends in EOF](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L144). I'm not sure what the actual root cause is, because I don't know much about the SQL Parser.

### Why are the changes needed?
Correctness issue fix.

### Does this PR introduce _any_ user-facing change?
Yes, the results of queries on top of the aforementioned views will now be correct.

### How was this patch tested?
New `view-correctness` suite.

### Was this patch authored or co-authored using generative AI tooling?
No.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
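A small way to observe the discrepancy described above from a Scala shell (pre-fix behavior; `sessionState` is an unstable internal API, so this is a debugging sketch rather than recommended usage):

```scala
val parser = spark.sessionState.sqlParser
val sql = "SELECT 1 AS col1 UNION SELECT 2 UNION SELECT 3"

// parsePlan anchors the statement at EOF and keeps the full Union plan.
println(parser.parsePlan(sql))

// parseQuery (used for view bodies and EXECUTE IMMEDIATE ... INTO) is where the
// trailing set operations were being dropped, per the description above.
println(parser.parseQuery(sql))
```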
[PR] [SPARK-51201][SQL] Make Partitioning Hints support byte and short values [spark]
yaooqinn opened a new pull request, #49933: URL: https://github.com/apache/spark/pull/49933

### What changes were proposed in this pull request?
The `Dataset.hint` method takes `Any*` as parameters for both the partition number and other columns, and the `Partitioning Hints` resolution process treats anything that CAN NOT be resolved to a partition number as a column candidate. For example, cases like the one below will report an ambiguous error message to users.
```scala
val n: Short = 123
spark.range(1).hint("rebalance", n).show
```
```
REBALANCE Hint parameter should include columns, but 123 found.
```
This PR makes Partitioning Hints resolution accept byte and short values and improves the error message for debugging.

### Why are the changes needed?
- A byte or short value can exist in a user's Spark pipeline and be taken as a partition number; with this PR, such runtime errors are reduced
- Improve the error message for debugging
- For developers that build Spark Connect clients in other languages, there are cases in which they cannot handle or distinguish integral types. It's good for the server side to handle these valid cases.

### Does this PR introduce _any_ user-facing change?
Yes, `Dataset.hint("coalesce", 3.toByte)` and `/*+ COALESCE(3S) */` are now valid

### How was this patch tested?
New tests

### Was this patch authored or co-authored using generative AI tooling?
No

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
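Putting the examples from the description together, the usages that become valid with this change look roughly like this (the values are illustrative):

```scala
// DataFrame API: byte and short partition numbers are now accepted by partitioning hints.
val n: Short = 3
spark.range(10).hint("rebalance", n).show()
spark.range(10).hint("coalesce", 2.toByte).show()

// SQL hints: 3S is a SMALLINT literal, previously rejected as a "column" candidate.
spark.sql("SELECT /*+ COALESCE(3S) */ * FROM range(10)").show()
```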
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1954282034 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -350,3 +357,12 @@ class SqlScriptingLabelContext { } } } + +object SqlScriptingLabelContext { + private val forbiddenLabelNames: immutable.Set[Regex] = +immutable.Set("builtin".r, "session".r, "sys.*".r) Review Comment: No, it will only match the string "builtin". Isn't that the correct way? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
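For reference, whether `"builtin".r` hits only the exact string depends on how the `Regex` is applied; a quick Scala 2.13 sketch of the two behaviors being discussed (illustrative label names, not the actual call sites):

```scala
val r = "builtin".r

// Full-string matching: only the exact label name is forbidden.
r.matches("builtin")          // true
r.matches("my_builtin_label") // false

// A substring search would also hit labels that merely contain the word.
r.findFirstIn("my_builtin_label") // Some("builtin")
```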
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954098174 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -329,6 +376,32 @@ class AstBuilder extends DataTypeAstBuilder .duplicateConditionInScope(CurrentOrigin.get, condition.conditionName) } conditions += condition.conditionName -> condition.sqlState +case signalStatement: SignalStatement if signalStatement.sqlState.isEmpty => Review Comment: SGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954099260 ## sql/catalyst/src/main/scala/org/apache/spark/sql/exceptions/SqlScriptingRuntimeException.scala: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.exceptions + +import scala.jdk.CollectionConverters._ + +import org.apache.spark.{SparkThrowable, SparkThrowableHelper} +import org.apache.spark.sql.catalyst.trees.Origin +import org.apache.spark.sql.exceptions.SqlScriptingRuntimeException.errorMessageWithLineNumber + +class SqlScriptingRuntimeException ( +condition: String, +sqlState: String, +message: String, +cause: Throwable, +val origin: Origin, +messageParameters: Map[String, String] = Map.empty, +isBuiltinError: Boolean = false) + extends Exception( +errorMessageWithLineNumber( + origin, + condition, + sqlState, + message, + messageParameters, + isBuiltinError), +cause) + with SparkThrowable { Review Comment: then we shouldn't call it `SqlScriptingRuntimeException`. We should remove `Runtime` from the name -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954110539 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1024,107 @@ class ExceptionHandlerExec( override def reset(): Unit = body.reset() } + +/** + * Executable node for Signal Statement. + * @param errorCondition Name of the error condition/SQL State for error that will be thrown. + * @param sqlState SQL State of the error that will be thrown. + * @param message Error message (either string or variable name). + * @param isBuiltinError Whether the error condition is a builtin condition. + * @param msgArguments Error message parameters for builtin conditions. + * @param session Spark session that SQL script is executed within. + * @param origin Origin descriptor for the statement. + */ +class SignalStatementExec( +val errorCondition: String, +val sqlState: String, +val message: Either[String, UnresolvedAttribute], Review Comment: oh, we use `Either` so that `UnresolvedAttribute` is not resolved by the analyzer but in `SignalStatementExec`. Do we have a requirement for doing so? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]
dejankrak-db commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954124114 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -155,22 +123,22 @@ object ResolveDefaultStringTypes extends Rule[LogicalPlan] { dataType.existsRecursively(isDefaultStringType) private def isDefaultStringType(dataType: DataType): Boolean = { +// STRING (without explicit collation) is considered default string type. +// STRING COLLATE (with explicit collation) is not considered +// default string type even when explicit collation is UTF8_BINARY (default collation). dataType match { - case st: StringType => -// should only return true for StringType object and not StringType("UTF8_BINARY") -st.eq(StringType) || st.isInstanceOf[TemporaryStringType] + // should only return true for StringType object and not for StringType("UTF8_BINARY") + case st: StringType => st.eq(StringType) Review Comment: Yes, this is a quite interesting finding, I was surprised too. Good discussion! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
miland-db commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954155098 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1024,107 @@ class ExceptionHandlerExec( override def reset(): Unit = body.reset() } + +/** + * Executable node for Signal Statement. + * @param errorCondition Name of the error condition/SQL State for error that will be thrown. + * @param sqlState SQL State of the error that will be thrown. + * @param message Error message (either string or variable name). + * @param isBuiltinError Whether the error condition is a builtin condition. + * @param msgArguments Error message parameters for builtin conditions. + * @param session Spark session that SQL script is executed within. + * @param origin Origin descriptor for the statement. + */ +class SignalStatementExec( +val errorCondition: String, +val sqlState: String, +val message: Either[String, UnresolvedAttribute], Review Comment: There is no requirement, but for scripting-specific statements we usually don't go through the Analyzer because we are able to "resolve and execute" them inside the scripting engine. In this case, `SignalStatementExec` just keeps information about the exception that needs to be thrown. When we encounter it during script execution, we just read the necessary information and throw the exception. The example we used was the code for `EXECUTE IMMEDIATE` and how `query_string` is resolved there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]
mihailoale-db opened a new pull request, #49935: URL: https://github.com/apache/spark/pull/49935 ### What changes were proposed in this pull request? Enforce determinism when pulling out non deterministic expressions from logical plan. ### Why are the changes needed? This is needed to avoid plan normalization problem when comparing single-pass and fixed point results. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help [spark]
yaooqinn closed pull request #49931: [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help URL: https://github.com/apache/spark/pull/49931 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]
HyukjinKwon commented on PR #49880: URL: https://github.com/apache/spark/pull/49880#issuecomment-2656206417 I think we will likely miss RC1 - I will have to be away from keyboard like 3 days. Since technically CVE isn't filed yet, and this is an optional distribution, I think we can go ahead with RC 1. I will try to target RC 2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]
HyukjinKwon commented on PR #49880: URL: https://github.com/apache/spark/pull/49880#issuecomment-2656204302 I think I can't just enable SSL by default. We should expose the certificate or use an insecure connection. The access token API cannot be used with SSL, it seems, so I can't reuse this existing token either. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description [spark]
yaooqinn commented on PR #49929: URL: https://github.com/apache/spark/pull/49929#issuecomment-2656160458 Merged to master, thank you @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]
HyukjinKwon commented on code in PR #49880: URL: https://github.com/apache/spark/pull/49880#discussion_r1954272416 ## sql/connect/common/src/main/scala/org/apache/spark/sql/connect/common/config/ConnectCommon.scala: ## @@ -16,9 +16,23 @@ */ package org.apache.spark.sql.connect.common.config +import io.grpc.Metadata + private[sql] object ConnectCommon { val CONNECT_GRPC_BINDING_PORT: Int = 15002 val CONNECT_GRPC_PORT_MAX_RETRIES: Int = 0 val CONNECT_GRPC_MAX_MESSAGE_SIZE: Int = 128 * 1024 * 1024 val CONNECT_GRPC_MARSHALLER_RECURSION_LIMIT: Int = 1024 + val AUTH_TOKEN_META_DATA_KEY: Metadata.Key[String] = +Metadata.Key.of("Authentication", Metadata.ASCII_STRING_MARSHALLER) + + val CONNECT_LOCAL_AUTH_TOKEN_ENV_NAME = "SPARK_CONNECT_LOCAL_AUTH_TOKEN" + private var CONNECT_LOCAL_AUTH_TOKEN: Option[String] = Option( +System.getenv(CONNECT_LOCAL_AUTH_TOKEN_ENV_NAME)) + def setLocalAuthToken(token: String): Unit = CONNECT_LOCAL_AUTH_TOKEN = Option(token) + def getLocalAuthToken: Option[String] = CONNECT_LOCAL_AUTH_TOKEN + def assertLocalAuthToken(token: Option[String]): Unit = token.foreach { t => +assert(CONNECT_LOCAL_AUTH_TOKEN.isDefined) Review Comment: For other places, I will fix it later. But here the assert is correct because if `token` is set, `CONNECT_LOCAL_AUTH_TOKEN` must be set too for local usage. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]
HyukjinKwon commented on code in PR #49880: URL: https://github.com/apache/spark/pull/49880#discussion_r1954273145 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/LocalAuthInterceptor.scala: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.connect.service + +import io.grpc.{Metadata, ServerCall, ServerCallHandler, ServerInterceptor} + +import org.apache.spark.sql.connect.common.config.ConnectCommon + +/** + * A gRPC interceptor to check if the header contains token for authentication. + */ +class LocalAuthInterceptor extends ServerInterceptor { Review Comment: I will scope it down to local usage for now. This whole thing is internal for now, and we don't need to generalize it at this moment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]
LuciferYang commented on code in PR #47084: URL: https://github.com/apache/spark/pull/47084#discussion_r1954366720 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -306,11 +296,9 @@ class SimpleFunctionRegistry extends SimpleFunctionRegistryBase[Expression] with FunctionRegistry { - override def clone(): SimpleFunctionRegistry = synchronized { + override def clone(): SimpleFunctionRegistry = { Review Comment: Originally, this entire function was atomic. If there are additions or deletions to functionBuilders during the cloning process, it would deviate from the previous behavior. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]
wayneguow commented on PR #49938: URL: https://github.com/apache/spark/pull/49938#issuecomment-2657115551 Benchmark jdk17: https://github.com/wayneguow/spark/actions/runs/13310975547 Benchmark jdk21: https://github.com/wayneguow/spark/actions/runs/13310979208 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]
wayneguow opened a new pull request, #49938: URL: https://github.com/apache/spark/pull/49938 ### What changes were proposed in this pull request? This PR aims to reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error. I did some debug investigation and found that in the current master branch, the codegen code has a huge `processNext` method, but in branch-3.5, it is split into many small methods. Because the logic of codegen is different, the previous benchmark cannot run normally. master branch: ``` protected void processNext() throws java.io.IOException { while ( inputadapter_input_0.hasNext()) { ... } } ``` 3.5 branch: ``` public boolean eval(InternalRow i) { boolean value_2997 = Or_498(i); return !globalIsNull_498 && value_2997; } private boolean Or_498(InternalRow i) { ... } ... private boolean Or_(InternalRow i) { ... } ``` ### Why are the changes needed? If we run `SubExprEliminationBenchmark`: ``` build/sbt "sql/Test/runMain org.apache.spark.sql.execution.SubExprEliminationBenchmark" ``` It fails at `CodeGenerator`, details: > [info] Running benchmark: from_json as subExpr in Filter [info] Running case: subExprElimination false, codegen: true [info] 00:08:31.509 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: Failed to compile the generated Java code. [info] org.codehaus.commons.compiler.InternalCompilerException: Compiling "GeneratedClass" in File 'generated.java', Line 1, Column 1: File 'generated.java', Line 24, Column 16: Compiling "processNext()" The root cause is: > [error] Caused by: org.codehaus.commons.compiler.InternalCompilerException: Code grows beyond 64 KB [error] at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:699) [error] at org.codehaus.janino.CodeContext.write(CodeContext.java:558) [error] at org.codehaus.janino.UnitCompiler.write(UnitCompiler.java:13079) [error] at org.codehaus.janino.UnitCompiler.store(UnitCompiler.java:12752) [error] at org.codehaus.janino.UnitCompiler.store(UnitCompiler.java:12730) [error] at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2742) [error] ... 164 more ### Does this PR introduce _any_ user-facing change? No, just fix a test benchmark failure. ### How was this patch tested? Run benchmark tests manually. ### Was this patch authored or co-authored using generative AI tooling? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
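To make the failure mode concrete, a hypothetical sketch of the kind of filter the benchmark builds: an OR chain of `from_json` field accesses whose generated code grows with `numCols` and, when whole-stage codegen cannot split it into smaller methods, eventually exceeds janino's 64 KB per-method limit. The DataFrame and the column count below are made up for illustration and do not reproduce the benchmark's exact setup:

```scala
// Assumes a DataFrame with a string column `value` holding JSON.
val df = spark.range(100).selectExpr("""to_json(named_struct('col0', 'x')) AS value""")

val numCols = 20 // hypothetical; the PR lowers the benchmark's actual setting
val predicate = (0 until numCols)
  .map(i => s"from_json(value, 'col$i STRING').col$i = 'x'")
  .mkString(" OR ")

// The wider the OR chain, the larger the generated predicate code becomes.
df.filter(predicate).count()
```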
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954812502

## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala:
## @@ -1020,3 +1024,107 @@ class ExceptionHandlerExec(
   override def reset(): Unit = body.reset()
 }
+
+/**
+ * Executable node for Signal Statement.
+ * @param errorCondition Name of the error condition/SQL State for error that will be thrown.
+ * @param sqlState SQL State of the error that will be thrown.
+ * @param message Error message (either string or variable name).
+ * @param isBuiltinError Whether the error condition is a builtin condition.
+ * @param msgArguments Error message parameters for builtin conditions.
+ * @param session Spark session that SQL script is executed within.
+ * @param origin Origin descriptor for the statement.
+ */
+class SignalStatementExec(
+    val errorCondition: String,
+    val sqlState: String,
+    val message: Either[String, UnresolvedAttribute],

Review Comment:
   Can you give some examples of other scripting plan nodes that do not use the analyzer to resolve their expressions?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954815751

## sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala:
## @@ -69,6 +70,222 @@ class SqlScriptingExecutionSuite extends QueryTest with SharedSparkSession {
     }
   }
+  // Signal tests
+  test("signal statement - condition") {
+    val sqlScript =
+      """
+        |BEGIN
+        |  DECLARE TEST_CONDITION CONDITION FOR SQLSTATE '12345';
+        |  SIGNAL TEST_CONDITION;
+        |END
+        |""".stripMargin
+    val exception = intercept[SqlScriptingRuntimeException] {
+      verifySqlScriptResult(sqlScript, Seq.empty)
+    }
+    checkError(

Review Comment:
   Can you also run this test manually and check the user-facing error message, to make sure that the error context (line number) is displayed properly?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]
dongjoon-hyun commented on PR #49937: URL: https://github.com/apache/spark/pull/49937#issuecomment-2657424493 Nice catch! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]
vladimirg-db commented on PR #49937: URL: https://github.com/apache/spark/pull/49937#issuecomment-2657422364 @dongjoon-hyun I figured out the problem with the `sql-udf` test. What happened is that in the new golden file tests I created a view _and_ a temporary view with the same name `v1`, and afterwards dropped it only once. I actually didn't know that a regular view may shadow a temporary view with the same name!  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51149][CORE] Log classpath in SparkSubmit on ClassNotFoundException [spark]
vrozov commented on PR #49870: URL: https://github.com/apache/spark/pull/49870#issuecomment-2657458078 @dongjoon-hyun Please review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]
dongjoon-hyun closed pull request #49937: [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE URL: https://github.com/apache/spark/pull/49937 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51206][PYTHON][CONNECT] Move Arrow conversion helpers out of Spark Connect [spark]
wengh opened a new pull request, #49941: URL: https://github.com/apache/spark/pull/49941

### What changes were proposed in this pull request?

Refactor `pyspark.sql.connect.conversion` to move `LocalDataToArrowConversion` and `ArrowTableToRowsConversion` into `pyspark.sql.conversion`. The reason is that `pyspark.sql.connect.conversion` checks for Spark Connect dependencies such as `grpcio` and `pandas`, but `LocalDataToArrowConversion` and `ArrowTableToRowsConversion` don't need these dependencies. `pyspark.sql.connect.conversion` still re-exports the two classes for backward compatibility.

### Why are the changes needed?

Python Data Sources should work without Spark Connect dependencies, but currently they import `LocalDataToArrowConversion` and `ArrowTableToRowsConversion` from `pyspark.sql.connect.conversion`, which pulls in unnecessary dependencies. This change moves these two classes to `pyspark.sql.conversion` so that Python Data Sources run without Spark Connect dependencies.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests should make sure that the changes don't break anything. TODO: Add a new test to ensure that Python Data Sources can be used with minimal dependencies.

### Was this patch authored or co-authored using generative AI tooling?

No

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
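As a rough illustration of the backward-compatibility approach described in the PR above, the shim left behind in `pyspark.sql.connect.conversion` might amount to a simple re-export. This is a sketch inferred from the PR description, not the actual patch, and the exact file contents are an assumption.

```
# Hypothetical sketch of pyspark/sql/connect/conversion.py after the move: the two
# classes now live in the dependency-light pyspark.sql.conversion module, and the
# old import path keeps working by re-exporting them. Not the actual patch contents.
from pyspark.sql.conversion import (  # noqa: F401
    ArrowTableToRowsConversion,
    LocalDataToArrowConversion,
)
```

Code that does not need Spark Connect, such as Python Data Sources, can then import the helpers from `pyspark.sql.conversion` directly and skip the `grpcio`/`pandas` dependency check.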
Re: [PR] [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 [spark]
dongjoon-hyun commented on PR #49940: URL: https://github.com/apache/spark/pull/49940#issuecomment-2657732901 Could you review this test-dependency PR when you have some time, @huaxingao ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 [spark]
dongjoon-hyun commented on PR #49940: URL: https://github.com/apache/spark/pull/49940#issuecomment-2657736954 Thank you, @huaxingao . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 [spark]
dongjoon-hyun closed pull request #49940: [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 URL: https://github.com/apache/spark/pull/49940 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]
dongjoon-hyun commented on PR #49938: URL: https://github.com/apache/spark/pull/49938#issuecomment-2657165478 cc @panbingkun, @cloud-fan, @LuciferYang It seems that we need to make a decision. Are we good with this codegen perf regression? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]
dongjoon-hyun commented on code in PR #49938: URL: https://github.com/apache/spark/pull/49938#discussion_r1954851640

## sql/core/benchmarks/SubExprEliminationBenchmark-results.txt:
## @@ -3,23 +3,23 @@ Benchmark for performance of subexpression elimination
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
+OpenJDK 64-Bit Server VM 17.0.14+7-LTS on Linux 6.8.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 from_json as subExpr in Project:           Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
-subExprElimination false, codegen: true             6438           6551          98         0.0    64378783.5       1.0X
-subExprElimination false, codegen: false            6216           6320         175         0.0    62161826.1       1.0X
-subExprElimination true, codegen: true              1480           1518          39         0.0    14799890.8       4.3X
-subExprElimination true, codegen: false             1321           1429          94         0.0    13212919.6       4.9X
+subExprElimination false, codegen: true             6398           6601         177         0.0    63980070.0       1.0X
+subExprElimination false, codegen: false            6018           6137         123         0.0    60178371.6       1.1X
+subExprElimination true, codegen: true              1288           1352          78         0.0    12883220.0       5.0X
+subExprElimination true, codegen: false             1299           1340          66         0.0    12989399.4       4.9X
 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
+OpenJDK 64-Bit Server VM 17.0.14+7-LTS on Linux 6.8.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 from_json as subExpr in Filter:            Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
-subExprElimination false, codegen: true             7107           7310         207         0.0    71066752.8       1.0X
-subExprElimination false, codegen: false            6738           6781          41         0.0    67375897.0       1.1X
-subExprElimination true, codegen: true              2052           2110          51         0.0    20519152.3       3.5X
-subExprElimination true, codegen: false             2053           2079          33         0.0    20526629.8       3.5X
+subExprElimination false, codegen: true             2474           2512          44         0.0    24744107.3       1.0X
+subExprElimination false, codegen: false            2231           2246          20         0.0    22306061.2       1.1X
+subExprElimination true, codegen: true              2408           2509         100         0.0    24084091.2       1.0X

Review Comment:
   Yes, it's true. The performance regression was the actual reason we didn't make a decision. Here it is, FYI:
   - https://github.com/apache/spark/pull/49411

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954813899

## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala:
## @@ -1020,3 +1024,107 @@ class ExceptionHandlerExec(
   override def reset(): Unit = body.reset()
 }
+
+/**
+ * Executable node for Signal Statement.
+ * @param errorCondition Name of the error condition/SQL State for error that will be thrown.
+ * @param sqlState SQL State of the error that will be thrown.
+ * @param message Error message (either string or variable name).
+ * @param isBuiltinError Whether the error condition is a builtin condition.
+ * @param msgArguments Error message parameters for builtin conditions.
+ * @param session Spark session that SQL script is executed within.
+ * @param origin Origin descriptor for the statement.
+ */
+class SignalStatementExec(
+    val errorCondition: String,
+    val sqlState: String,
+    val message: Either[String, UnresolvedAttribute],
+    val msgArguments: Option[UnresolvedAttribute],
+    val isBuiltinError: Boolean,
+    val session: SparkSession,
+    override val origin: Origin)
+  extends LeafStatementExec
+  with ColumnResolutionHelper with WithOrigin {

Review Comment:
   We don't need to implement `ColumnResolutionHelper` if we let the analyzer resolve the variables.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org