Re: [PR] [SPARK-51179][SQL] Refactor SupportsOrderingWithinGroup so that centralized check [spark]

2025-02-13 Thread via GitHub
beliefer commented on PR #49908: URL: https://github.com/apache/spark/pull/49908#issuecomment-2655798525 ping @cloud-fan cc @mikhailnik-db -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the code for SupportsOrderingWithinGroup and Mode [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49907: URL: https://github.com/apache/spark/pull/49907#discussion_r1954008498 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -390,10 +390,6 @@ case class PercentileCont(left: Expressi

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954019289 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -155,22 +123,22 @@ object ResolveDefaultStringTypes ex

Re: [PR] [WIP][SPARK-7101][SQL] support java.sql.Time [spark]

2025-02-13 Thread via GitHub
MaxGekk commented on PR #28858: URL: https://github.com/apache/spark/pull/28858#issuecomment-2655900699 I sent the SPIP https://issues.apache.org/jira/browse/SPARK-51162 to dev list for discussion ([link](https://lists.apache.org/thread/892vkskktqrx1czk9wm6l8vchpydrny2)). @younggyuchun @do

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1954078549 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -350,3 +357,12 @@ class SqlScriptingLabelContext { } } } + +obje

Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun closed pull request #49944: [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ URL: https://github.com/apache/spark/pull/49944

Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49944: URL: https://github.com/apache/spark/pull/49944#issuecomment-2658003644 Thank you for review and approval, @viirya !

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1955489143 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1024,107 @@ class ExceptionHandlerExec( override def re

Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the code for SupportsOrderingWithinGroup and Mode [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49907: URL: https://github.com/apache/spark/pull/49907#discussion_r1955422516 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -390,10 +390,6 @@ case class PercentileCont(left: Expressi

Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]

2025-02-13 Thread via GitHub
LuciferYang commented on PR #49944: URL: https://github.com/apache/spark/pull/49944#issuecomment-2658349981 late LGTM

[PR] [SPARK-51215][ML][PYTHON][CONNECT] Add a helper function to invoke helper model attr [spark]

2025-02-13 Thread via GitHub
zhengruifeng opened a new pull request, #49951: URL: https://github.com/apache/spark/pull/49951 ### What changes were proposed in this pull request? Add a helper function to invoke helper model attr ### Why are the changes needed? deduplicate code ### Does th

Re: [PR] [SPARK-51213][SQL] Keep Expression class info when resolving hint parameters [spark]

2025-02-13 Thread via GitHub
yaooqinn commented on PR #49950: URL: https://github.com/apache/spark/pull/49950#issuecomment-2658350458 Thank you @dongjoon-hyun, I've passed the failed test locally. Let's run another round in GA.

Re: [PR] [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on PR #49947: URL: https://github.com/apache/spark/pull/49947#issuecomment-2658350592 thanks, merging to master/4.0!

Re: [PR] [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution [spark]

2025-02-13 Thread via GitHub
cloud-fan closed pull request #49947: [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution URL: https://github.com/apache/spark/pull/49947

Re: [PR] [SPARK-51204][BUILD] Upgrade `sbt-assembly` to 2.3.1 [spark]

2025-02-13 Thread via GitHub
pan3793 commented on PR #49939: URL: https://github.com/apache/spark/pull/49939#issuecomment-2658374563 According to the release notes, it is better to go 4.0 too. > - Fixes assemblyOutputPath > - Fixes assemblyExcludedJars Spark uses these features, but I am not sure if it i

Re: [PR] [SPARK-51209][CORE][FOLLOWUP] Use `user.name` system property first as a fallback [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun closed pull request #49949: [SPARK-51209][CORE][FOLLOWUP] Use `user.name` system property first as a fallback URL: https://github.com/apache/spark/pull/49949

Re: [PR] [SPARK-51209][CORE][FOLLOWUP] Use `user.name` system property first as a fallback [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49949: URL: https://github.com/apache/spark/pull/49949#issuecomment-2658378854 `Document generations` passed at the last commit. Merged to master.

Re: [PR] [SPARK-51208][SQL] `ColumnDefinition.toV1Column` should preserve `EXISTS_DEFAULT` resolution [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49942: URL: https://github.com/apache/spark/pull/49942#discussion_r1955382758 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ColumnDefinition.scala: ## @@ -75,7 +75,7 @@ case class ColumnDefinition( // For v1

[PR] [SPARK-51212][PYTHON] Add a separated PySpark package for Spark Connect by default [spark]

2025-02-13 Thread via GitHub
ueshin opened a new pull request, #49946: URL: https://github.com/apache/spark/pull/49946 ### What changes were proposed in this pull request? Adds a separated PySpark package for Spark Connect by default. - Rename `pyspark-connect` to `pyspark-client` - Add a new `pyspark-co

Re: [PR] [SPARK-51208][SQL] `ColumnDefinition.toV1Column` should preserve `EXISTS_DEFAULT` resolution [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49942: URL: https://github.com/apache/spark/pull/49942#discussion_r1955421786 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ColumnDefinition.scala: ## @@ -75,7 +75,7 @@ case class ColumnDefinition( // For v1

Re: [PR] [SPARK-51185][Core] Revert simplifications to PartitionedFileUtil API to reduce memory requirements [spark]

2025-02-13 Thread via GitHub
LukasRupprecht commented on PR #49915: URL: https://github.com/apache/spark/pull/49915#issuecomment-2658110441 Thanks @cloud-fan for merging this! Will prepare a separate PR for 3.5.

Re: [PR] [SPARK-51177][PYTHON][CONNECT] Add `InvalidCommandInput` to Spark Connect Python client [spark]

2025-02-13 Thread via GitHub
itholic closed pull request #49916: [SPARK-51177][PYTHON][CONNECT] Add `InvalidCommandInput` to Spark Connect Python client URL: https://github.com/apache/spark/pull/49916

Re: [PR] [SPARK-51177][PYTHON][CONNECT] Add `InvalidCommandInput` to Spark Connect Python client [spark]

2025-02-13 Thread via GitHub
itholic commented on PR #49916: URL: https://github.com/apache/spark/pull/49916#issuecomment-2658076422 Merged to master and branch-4.0 Thanks @HyukjinKwon @zhengruifeng for the review.

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1955489998 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1024,107 @@ class ExceptionHandlerExec( override def re

Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]

2025-02-13 Thread via GitHub
pan3793 commented on code in PR #49814: URL: https://github.com/apache/spark/pull/49814#discussion_r1955489715 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -171,6 +172,11 @@ private[hive] class HiveClientImpl( private def newState():

Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]

2025-02-13 Thread via GitHub
sririshindra commented on code in PR #49814: URL: https://github.com/apache/spark/pull/49814#discussion_r1955487024 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -171,6 +172,11 @@ private[hive] class HiveClientImpl( private def newSta

Re: [PR] [SPARK-50655][SS] Move virtual col family related mapping into db layer instead of encoder [spark]

2025-02-13 Thread via GitHub
anishshri-db commented on code in PR #49304: URL: https://github.com/apache/spark/pull/49304#discussion_r1955675678 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -931,7 +941,8 @@ case class RocksDBCheckpointMetadata(

Re: [PR] [SPARK-51210][CORE] Add `--enable-native-access=ALL-UNNAMED` to Java options for Java 24+ [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49944: URL: https://github.com/apache/spark/pull/49944#issuecomment-2657968814 Could you review this `Java option` PR when you have some time, @viirya ?

Re: [PR] [SPARK-51208][SQL] `ColumnDefinition.toV1Column` should preserve `EXISTS_DEFAULT` resolution [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49942: URL: https://github.com/apache/spark/pull/49942#discussion_r1955383049 ## sql/catalyst/src/test/scala/org/apache/spark/sql/types/StructTypeSuite.scala: ## @@ -798,4 +800,35 @@ class StructTypeSuite extends SparkFunSuite with SQLHelper {

Re: [PR] [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+ [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun closed pull request #49943: [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+ URL: https://github.com/apache/spark/pull/49943

Re: [PR] [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+ [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49943: URL: https://github.com/apache/spark/pull/49943#issuecomment-2657967822 Merged to master for Apache Spark 4.1.0.

Re: [PR] [SPARK-51209][CORE] Improve `getCurrentUserName` to handle Java 24+ [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49943: URL: https://github.com/apache/spark/pull/49943#issuecomment-2657966885 Thank you, @huaxingao !

Re: [PR] [SPARK-51204][BUILD] Upgrade `sbt-assembly` to 2.3.1 [spark]

2025-02-13 Thread via GitHub
wayneguow commented on PR #49939: URL: https://github.com/apache/spark/pull/49939#issuecomment-2658092268 > Could you re-trigger the failed K8s test, @wayneguow ? Although it looks irrelevant, let's make it sure. After re-running, it was successful. However, this page does not yet sho

[PR] [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution [spark]

2025-02-13 Thread via GitHub
szehon-ho opened a new pull request, #49947: URL: https://github.com/apache/spark/pull/49947 Small follow up for: https://github.com/apache/spark/pull/49942 ### What changes were proposed in this pull request? Restrict the logic of https://github.com/apache/spark/pull/49942 to o

Re: [PR] [SPARK-51216][BUILD] Remove the useless `bigtop-dist` profile and the related outdated files [spark]

2025-02-13 Thread via GitHub
LuciferYang commented on PR #49952: URL: https://github.com/apache/spark/pull/49952#issuecomment-2658399233 cc @pan3793 @dongjoon-hyun

[PR] [BUILD] Remove the useless `bigtop-dist` profile [spark]

2025-02-13 Thread via GitHub
LuciferYang opened a new pull request, #49952: URL: https://github.com/apache/spark/pull/49952 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51186][PYTHON] Add `StreamingPythonRunnerInitializationException` to PySpark base exception [spark]

2025-02-13 Thread via GitHub
itholic closed pull request #49917: [SPARK-51186][PYTHON] Add `StreamingPythonRunnerInitializationException` to PySpark base exception URL: https://github.com/apache/spark/pull/49917

Re: [PR] [SPARK-51186][PYTHON] Add `StreamingPythonRunnerInitializationException` to PySpark base exception [spark]

2025-02-13 Thread via GitHub
itholic commented on PR #49917: URL: https://github.com/apache/spark/pull/49917#issuecomment-2658414570 Merged to master and branch-4.0

Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]

2025-02-13 Thread via GitHub
beliefer commented on code in PR #47084: URL: https://github.com/apache/spark/pull/47084#discussion_r1955643680 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -306,11 +296,9 @@ class SimpleFunctionRegistry extends SimpleF

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954022933 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -18,80 +18,48 @@ package org.apache.spark.sql.catalys

Re: [PR] [SPARK-51146][SQL][FOLLOWUP] Respect system env `SPARK_CONNECT_MODE` in places that access the api mode config [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on PR #49930: URL: https://github.com/apache/spark/pull/49930#issuecomment-2655862681 @dongjoon-hyun I'd like to cut rc1 on time to kick off the release cycle. We can fail rc1 if there is a lot of pending work, which will likely happen.

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1954045545 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -266,22 +275,41 @@ trait ColumnResolutionHelper extends Logg

Re: [PR] [SPARK-28973][SQL] Add `TimeType` and support `java.time.LocalTime` as its external type. [spark]

2025-02-13 Thread via GitHub
MaxGekk commented on PR #25678: URL: https://github.com/apache/spark/pull/25678#issuecomment-2655891624 I sent the SPIP https://issues.apache.org/jira/browse/SPARK-51162 to dev list for discussion ([link](https://lists.apache.org/thread/892vkskktqrx1czk9wm6l8vchpydrny2)). @zeddit @Fokko @t

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2025-02-13 Thread via GitHub
WebdeveloperIsaac commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-2655892185 Can this be reopened? +1. This feature would be of great help; please don't keep it parked.

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-02-13 Thread via GitHub
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1954447510 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -266,22 +275,41 @@ trait ColumnResolutionHelper extends L

Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the code for SupportsOrderingWithinGroup and Mode [spark]

2025-02-13 Thread via GitHub
beliefer commented on code in PR #49907: URL: https://github.com/apache/spark/pull/49907#discussion_r1954137856 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -390,10 +390,6 @@ case class PercentileCont(left: Expressio

[PR] [SPARK-51202][ML][PYTHON] Pass the session in meta algorithm python writers [spark]

2025-02-13 Thread via GitHub
zhengruifeng opened a new pull request, #49932: URL: https://github.com/apache/spark/pull/49932 ### What changes were proposed in this pull request? Pass the session in meta algorithm python writers ### Why are the changes needed? try the best to avoid recreating sessions

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub
dejankrak-db commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954127700 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -18,80 +18,48 @@ package org.apache.spark.sql.cata

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub
dejankrak-db commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954134354 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -18,80 +18,48 @@ package org.apache.spark.sql.cata

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub
dejankrak-db commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954132769 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -155,22 +123,22 @@ object ResolveDefaultStringTypes

Re: [PR] [SPARK-51202][ML][PYTHON] Pass the session in meta algorithm python writers [spark]

2025-02-13 Thread via GitHub
zhengruifeng commented on PR #49932: URL: https://github.com/apache/spark/pull/49932#issuecomment-2656017987 This PR is for master-only (4.1)

[PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-02-13 Thread via GitHub
bcheena opened a new pull request, #49934: URL: https://github.com/apache/spark/pull/49934 ### What changes were proposed in this pull request? Cancelling the `Timer` non-daemon thread on stopping the `BarrierCoordinator` ### Why are the changes needed? In Barrier Execution M

Re: [PR] [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description [spark]

2025-02-13 Thread via GitHub
yaooqinn closed pull request #49929: [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description URL: https://github.com/apache/spark/pull/49929

Re: [PR] [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help [spark]

2025-02-13 Thread via GitHub
yaooqinn commented on PR #49931: URL: https://github.com/apache/spark/pull/49931#issuecomment-2656165484 Merged to master and branch-4.0, thank you @dongjoon-hyun

Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]

2025-02-13 Thread via GitHub
mihailoale-db commented on PR #49935: URL: https://github.com/apache/spark/pull/49935#issuecomment-2656322301 @MaxGekk @cloud-fan ptal when you have time. Thanks

[PR] [SPARK-51203] Enhance force optimize skewed join [spark]

2025-02-13 Thread via GitHub
wForget opened a new pull request, #49936: URL: https://github.com/apache/spark/pull/49936 ### What changes were proposed in this pull request? ### Why are the changes needed? ForceOptimizeSkewedJoin allows optimizing skewed join even if introduce extra shuffl

Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]

2025-02-13 Thread via GitHub
mihailotim-db commented on code in PR #49935: URL: https://github.com/apache/spark/pull/49935#discussion_r1954499398 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -55,9 +57,9 @@ object PullOutNondeterministic extends R

Re: [PR] [SPARK-51183][SQL] Link to Parquet spec in Variant docs [spark]

2025-02-13 Thread via GitHub
cashmand commented on PR #49910: URL: https://github.com/apache/spark/pull/49910#issuecomment-2656729838 Hi @dongjoon-hyun, I updated the text to clarify the current status of the project. Let me know what you think.

Re: [PR] [SPARK-51146][SQL][FOLLOWUP] Respect system env `SPARK_CONNECT_MODE` in places that access the api mode config [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49930: URL: https://github.com/apache/spark/pull/49930#issuecomment-2656819482 Ya, I agree with you, @cloud-fan . > I'd like to cut rc1 on time to kick off the release cycle. We can fail rc1 if there is a lot of pending work, which will likely happen.

Re: [PR] [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49931: URL: https://github.com/apache/spark/pull/49931#issuecomment-2656806168 Thank you, @yaooqinn !

Re: [PR] [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49929: URL: https://github.com/apache/spark/pull/49929#issuecomment-2656805060 Thank you, @yaooqinn !

Re: [PR] [SPARK-51201][SQL] Make Partitioning Hints support byte and short values [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on code in PR #49933: URL: https://github.com/apache/spark/pull/49933#discussion_r1954654468 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala: ## @@ -174,9 +174,23 @@ object ResolveHints { * COALESCE Hint accept

Re: [PR] [SPARK-51201][SQL] Make Partitioning Hints support byte and short values [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on code in PR #49933: URL: https://github.com/apache/spark/pull/49933#discussion_r1954656932 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala: ## @@ -174,9 +174,23 @@ object ResolveHints { * COALESCE Hint accept

Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on PR #49935: URL: https://github.com/apache/spark/pull/49935#issuecomment-2656875411 thanks, merging to master/4.0!

Re: [PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]

2025-02-13 Thread via GitHub
cloud-fan closed pull request #49935: [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan URL: https://github.com/apache/spark/pull/49935

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on PR #49772: URL: https://github.com/apache/spark/pull/49772#issuecomment-2656883176 thanks, merging to master/4.0!

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub
cloud-fan closed pull request #49772: [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries URL: https://github.com/apache/spark/pull/49772

[PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]

2025-02-13 Thread via GitHub
vladimirg-db opened a new pull request, #49937: URL: https://github.com/apache/spark/pull/49937 ### What changes were proposed in this pull request? Fix correctness with UNION/EXCEPT/INTERSECT inside a view or `EXECUTE IMMEDIATE`. In the following examples the SQL Parser consid

[PR] [SPARK-51201][SQL] Make Partitioning Hints support byte and short values [spark]

2025-02-13 Thread via GitHub
yaooqinn opened a new pull request, #49933: URL: https://github.com/apache/spark/pull/49933 ### What changes were proposed in this pull request? The `Dataset.hint` method takes `Any*` as parameters for both the partition number and other columns, and the `Partitioning Hints` resolutio

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-02-13 Thread via GitHub
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1954282034 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -350,3 +357,12 @@ class SqlScriptingLabelContext { } } } + +o

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954098174 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -329,6 +376,32 @@ class AstBuilder extends DataTypeAstBuilder

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954099260 ## sql/catalyst/src/main/scala/org/apache/spark/sql/exceptions/SqlScriptingRuntimeException.scala: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954110539 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1024,107 @@ class ExceptionHandlerExec( override def re

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-13 Thread via GitHub
dejankrak-db commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1954124114 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -155,22 +123,22 @@ object ResolveDefaultStringTypes

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub
miland-db commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954155098 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1024,107 @@ class ExceptionHandlerExec( override def re

[PR] [SPARK-51181] [SQL] Enforce determinism when pulling out non deterministic expressions from logical plan [spark]

2025-02-13 Thread via GitHub
mihailoale-db opened a new pull request, #49935: URL: https://github.com/apache/spark/pull/49935 ### What changes were proposed in this pull request? Enforce determinism when pulling out non deterministic expressions from logical plan. ### Why are the changes needed? This is nee

Re: [PR] [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help [spark]

2025-02-13 Thread via GitHub
yaooqinn closed pull request #49931: [SPARK-51200][BUILD] Add SparkR deprecation info to `README.md` and `make-distribution.sh` help URL: https://github.com/apache/spark/pull/49931

Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]

2025-02-13 Thread via GitHub
HyukjinKwon commented on PR #49880: URL: https://github.com/apache/spark/pull/49880#issuecomment-2656206417 I think we will likely miss RC1 - I will have to be away from keyboard like 3 days. Since technically CVE isn't filed yet, and this is an optional distribution, I think we can go ahea

Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]

2025-02-13 Thread via GitHub
HyukjinKwon commented on PR #49880: URL: https://github.com/apache/spark/pull/49880#issuecomment-2656204302 I think I can't just enable SSL by default. We should expose the certificate or use insecure connection. The access token API cannot be used with SSL it seems so I can't reuse this ex

Re: [PR] [SPARK-51198][CORE][DOCS] Revise `defaultMinPartitions` function description [spark]

2025-02-13 Thread via GitHub
yaooqinn commented on PR #49929: URL: https://github.com/apache/spark/pull/49929#issuecomment-2656160458 Merged to master, thank you @dongjoon-hyun

Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]

2025-02-13 Thread via GitHub
HyukjinKwon commented on code in PR #49880: URL: https://github.com/apache/spark/pull/49880#discussion_r1954272416 ## sql/connect/common/src/main/scala/org/apache/spark/sql/connect/common/config/ConnectCommon.scala: ## @@ -16,9 +16,23 @@ */ package org.apache.spark.sql.connec

Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]

2025-02-13 Thread via GitHub
HyukjinKwon commented on code in PR #49880: URL: https://github.com/apache/spark/pull/49880#discussion_r1954273145 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/LocalAuthInterceptor.scala: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]

2025-02-13 Thread via GitHub
LuciferYang commented on code in PR #47084: URL: https://github.com/apache/spark/pull/47084#discussion_r1954366720 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -306,11 +296,9 @@ class SimpleFunctionRegistry extends Simp

Re: [PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]

2025-02-13 Thread via GitHub
wayneguow commented on PR #49938: URL: https://github.com/apache/spark/pull/49938#issuecomment-2657115551 Benchmark jdk17: https://github.com/wayneguow/spark/actions/runs/13310975547 Benchmark jdk21: https://github.com/wayneguow/spark/actions/runs/13310979208

[PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]

2025-02-13 Thread via GitHub
wayneguow opened a new pull request, #49938: URL: https://github.com/apache/spark/pull/49938 ### What changes were proposed in this pull request? This PR aims to reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error.

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954812502 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1024,107 @@ class ExceptionHandlerExec( override def re

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954815751 ## sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala: ## @@ -69,6 +70,222 @@ class SqlScriptingExecutionSuite extends QueryTest

Re: [PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49937: URL: https://github.com/apache/spark/pull/49937#issuecomment-2657424493 Nice catch!

Re: [PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]

2025-02-13 Thread via GitHub
vladimirg-db commented on PR #49937: URL: https://github.com/apache/spark/pull/49937#issuecomment-2657422364 @dongjoon-hyun I figured out the problem with a `sql-udf` test. So what happened is, in the new golden file tests I created a view _and_ a temporary view with the same name `v1`. And

Re: [PR] [SPARK-51149][CORE] Log classpath in SparkSubmit on ClassNotFoundException [spark]

2025-02-13 Thread via GitHub
vrozov commented on PR #49870: URL: https://github.com/apache/spark/pull/49870#issuecomment-2657458078 @dongjoon-hyun Please review

Re: [PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun closed pull request #49937: [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE URL: https://github.com/apache/spark/pull/49937

[PR] [SPARK-51206][PYTHON][CONNECT] Move Arrow conversion helpers out of Spark Connect [spark]

2025-02-13 Thread via GitHub
wengh opened a new pull request, #49941: URL: https://github.com/apache/spark/pull/49941 ### What changes were proposed in this pull request? Refactor `pyspark.sql.connect.conversion` to move `LocalDataToArrowConversion` and `ArrowTableToRowsConversion` into `pyspark.sql.

Re: [PR] [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49940: URL: https://github.com/apache/spark/pull/49940#issuecomment-2657732901 Could you review this test-dependency PR when you have some time, @huaxingao ?

Re: [PR] [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49940: URL: https://github.com/apache/spark/pull/49940#issuecomment-2657736954 Thank you, @huaxingao . Merged to master.

Re: [PR] [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun closed pull request #49940: [SPARK-51205][BUILD][TESTS] Upgrade `bytebuddy` to 1.17.0 to support Java 25 URL: https://github.com/apache/spark/pull/49940

Re: [PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on PR #49938: URL: https://github.com/apache/spark/pull/49938#issuecomment-2657165478 cc @panbingkun, @cloud-fan, @LuciferYang. It seems that we need to make a decision. Are we good with this codegen perf regression?

Re: [PR] [SPARK-51046][SQL][TEST] Reduce `numCols` in `withFilter()` to prevent `SubExprEliminationBenchmark` from failing due to a Codegen error [spark]

2025-02-13 Thread via GitHub
dongjoon-hyun commented on code in PR #49938: URL: https://github.com/apache/spark/pull/49938#discussion_r1954851640 ## sql/core/benchmarks/SubExprEliminationBenchmark-results.txt: ## @@ -3,23 +3,23 @@ Benchmark for performance of subexpression elimination

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-13 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1954813899 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1024,107 @@ class ExceptionHandlerExec( override def re
