Re: [PR] [SPARK-50751][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_1305`: `UNSUPPORTED_TABLE_CHANGE_IN_JDBC_CATALOG` [spark]

2025-01-20 Thread via GitHub
MaxGekk closed pull request #49395: [SPARK-50751][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_1305`: `UNSUPPORTED_TABLE_CHANGE_IN_JDBC_CATALOG` URL: https://github.com/apache/spark/pull/49395 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-50751][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_1305`: `UNSUPPORTED_TABLE_CHANGE_IN_JDBC_CATALOG` [spark]

2025-01-20 Thread via GitHub
MaxGekk commented on PR #49395: URL: https://github.com/apache/spark/pull/49395#issuecomment-2603895035 +1, LGTM. Merging to master/4.0. Thank you, @itholic.

Re: [PR] [SPARK-50792][SQL][FOLLOWUP] Improve the push down information for binary [spark]

2025-01-20 Thread via GitHub
MaxGekk commented on code in PR #49555: URL: https://github.com/apache/spark/pull/49555#discussion_r1923216437 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/expressions/expressions.scala: ## @@ -388,12 +388,13 @@ private[sql] object HoursTransform { } private

Re: [PR] [SPARK-50895][SQL] Create common interface for expressions which produce default string type [spark]

2025-01-20 Thread via GitHub
MaxGekk commented on code in PR #49576: URL: https://github.com/apache/spark/pull/49576#discussion_r1923211274 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -151,12 +151,15 @@ case class ResolvedCollation(collationName

Re: [PR] [WIP][SPARK-50838][SQL] Performs additional checks inside recursive CTEs to throw an error if a forbidden case is encountered [spark]

2025-01-20 Thread via GitHub
cloud-fan commented on code in PR #49518: URL: https://github.com/apache/spark/pull/49518#discussion_r1923141166 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveWithCTE.scala: ## @@ -49,17 +51,27 @@ object ResolveWithCTE extends Rule[LogicalPlan] {

Re: [PR] [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
itholic commented on PR #49572: URL: https://github.com/apache/spark/pull/49572#issuecomment-2603765598 Merged to master and created a separate PR for branch-4.0: https://github.com/apache/spark/pull/49583. Thanks @HyukjinKwon for the review.

Re: [PR] [WIP][SPARK-50838][SQL] Performs additional checks inside recursive CTEs to throw an error if a forbidden case is encountered [spark]

2025-01-20 Thread via GitHub
cloud-fan commented on code in PR #49518: URL: https://github.com/apache/spark/pull/49518#discussion_r1923135639 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3117,6 +3117,29 @@ ], "sqlState" : "42602" }, + "INVALID_RECURSIVE_REFERENCE" : {

Re: [PR] [4.0][SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
itholic commented on PR #49583: URL: https://github.com/apache/spark/pull/49583#issuecomment-2603764762 cc @HyukjinKwon

[PR] [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
itholic opened a new pull request, #49583: URL: https://github.com/apache/spark/pull/49583 ### What changes were proposed in this pull request? This PR proposes to support `addArtifact(s)` for PySpark. Cherry-pick of https://github.com/apache/spark/pull/49572 for 4.0. ### Why

Re: [PR] [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
itholic closed pull request #49572: [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark URL: https://github.com/apache/spark/pull/49572

Re: [PR] [SPARK-50902][CORE][K8S][TESTS] Add `CRC32C` test cases [spark]

2025-01-20 Thread via GitHub
dongjoon-hyun commented on PR #49582: URL: https://github.com/apache/spark/pull/49582#issuecomment-2603757387 Thank you, @yaooqinn !

Re: [PR] [SPARK-50902][CORE][K8S][TESTS] Add `CRC32C` test cases [spark]

2025-01-20 Thread via GitHub
dongjoon-hyun commented on PR #49582: URL: https://github.com/apache/spark/pull/49582#issuecomment-2603733674 Could you review this PR, @yaooqinn ?

[PR] [SPARK-50902][CORE][K8S][TESTS] Add `CRC32C` test cases [spark]

2025-01-20 Thread via GitHub
dongjoon-hyun opened a new pull request, #49582: URL: https://github.com/apache/spark/pull/49582 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H
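For context on this entry (the PR body above is an empty template): the JDK has shipped a built-in CRC-32C (Castagnoli) implementation, `java.util.zip.CRC32C`, since Java 9. A minimal sketch of using it, not taken from the PR:

```java
import java.util.zip.CRC32C;

public class Crc32cDemo {
    // Compute the CRC-32C checksum of a byte array with the JDK's
    // built-in implementation (available since Java 9).
    static long crc32c(byte[] bytes) {
        CRC32C crc = new CRC32C();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();  // unsigned 32-bit result widened to long
    }

    public static void main(String[] args) {
        // 0xE3069283 is the published CRC-32/ISCSI check value for "123456789".
        System.out.printf("CRC-32C: 0x%08X%n", crc32c("123456789".getBytes()));
    }
}
```

The class differs from `java.util.zip.CRC32` only in the polynomial it uses, which is why test coverage for the two is usually added side by side.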

Re: [PR] [SPARK-50091][SQL] Handle case of aggregates in left-hand operand of IN-subquery [spark]

2025-01-20 Thread via GitHub
bersprockets commented on code in PR #48627: URL: https://github.com/apache/spark/pull/48627#discussion_r1923076982 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala: ## @@ -79,4 +80,20 @@ class RewriteSubquerySuite extends PlanTes

Re: [PR] [SPARK-50091][SQL] Handle case of aggregates in left-hand operand of IN-subquery [spark]

2025-01-20 Thread via GitHub
bersprockets commented on code in PR #48627: URL: https://github.com/apache/spark/pull/48627#discussion_r1923076021 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -246,46 +267,242 @@ object RewritePredicateSubquery extends Rule[Logi

Re: [PR] [SPARK-50879][ML][PYTHON][CONNECT] Support feature scalers on Connect [spark]

2025-01-20 Thread via GitHub
wbo4958 commented on code in PR #49581: URL: https://github.com/apache/spark/pull/49581#discussion_r1923064064 ## mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala: ## @@ -240,6 +240,8 @@ sealed trait Vector extends Serializable { @Since("2.0.0") object Vecto

Re: [PR] [SPARK-50879][ML][PYTHON][CONNECT] Support feature scalers on Connect [spark]

2025-01-20 Thread via GitHub
wbo4958 commented on code in PR #49581: URL: https://github.com/apache/spark/pull/49581#discussion_r1923063332 ## mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala: ## @@ -240,6 +240,8 @@ sealed trait Vector extends Serializable { @Since("2.0.0") object Vecto

Re: [PR] [SPARK-50879][ML][PYTHON][CONNECT] Support feature scalers on Connect [spark]

2025-01-20 Thread via GitHub
zhengruifeng commented on PR #49581: URL: https://github.com/apache/spark/pull/49581#issuecomment-2603634625 cc @wbo4958 @HyukjinKwon

[PR] [SPARK-50879][ML][PYTHON][CONNECT] Support feature scalers on Connect [spark]

2025-01-20 Thread via GitHub
zhengruifeng opened a new pull request, #49581: URL: https://github.com/apache/spark/pull/49581 ### What changes were proposed in this pull request? Support feature scalers on Connect: - org.apache.spark.ml.feature.StandardScaler - org.apache.spark.ml.feature.MaxAbsScaler
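For readers outside ML: `MaxAbsScaler`, one of the scalers listed above, conceptually divides each feature column by its maximum absolute value, mapping values into [-1, 1] without shifting the center. A sketch of that math only (not Spark's API, and not code from the PR):

```java
public class MaxAbsScaleDemo {
    // Scale each column by its maximum absolute value, as MaxAbsScaler
    // does conceptually; rows is a dense row-major matrix.
    static double[][] maxAbsScale(double[][] rows) {
        int cols = rows[0].length;
        double[] maxAbs = new double[cols];
        for (double[] row : rows)
            for (int j = 0; j < cols; j++)
                maxAbs[j] = Math.max(maxAbs[j], Math.abs(row[j]));
        double[][] out = new double[rows.length][cols];
        for (int i = 0; i < rows.length; i++)
            for (int j = 0; j < cols; j++)
                // all-zero columns stay zero instead of dividing by zero
                out[i][j] = maxAbs[j] == 0 ? 0.0 : rows[i][j] / maxAbs[j];
        return out;
    }

    public static void main(String[] args) {
        double[][] scaled = maxAbsScale(new double[][] {{1.0, -2.0}, {2.0, 4.0}});
        // column max-abs values are 2 and 4, so row 0 becomes [0.5, -0.5]
        System.out.println(java.util.Arrays.deepToString(scaled));
    }
}
```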

Re: [PR] [SPARK-50870][SQL] Add the timezone when casting to timestamp in V2ScanRelationPushDown [spark]

2025-01-20 Thread via GitHub
cloud-fan commented on PR #49549: URL: https://github.com/apache/spark/pull/49549#issuecomment-2603610075 After v2 filter translation, ideally v2 sources won't evaluate the Spark `Cast` directly. I think it's only a problem for advanced Spark users who customize the `V2ScanRelationPushDown`
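The underlying issue here is that a cast from string to timestamp is only well-defined relative to a time zone: the same wall-clock string denotes different instants in different zones. A small plain-JDK illustration of why the injected `Cast` must carry zone information (the zone names below are just examples):

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class TimestampZoneDemo {
    // Interpret a zone-less wall-clock timestamp in a given zone.
    static Instant toInstant(String ts, String zone) {
        return LocalDateTime.parse(ts).atZone(ZoneId.of(zone)).toInstant();
    }

    public static void main(String[] args) {
        Instant utc = toInstant("2025-01-20T00:00:00", "UTC");
        Instant la  = toInstant("2025-01-20T00:00:00", "America/Los_Angeles");
        // In January, Los Angeles is UTC-8, so the two instants differ by 8 hours.
        System.out.println(Duration.between(utc, la).toHours());  // 8
    }
}
```

Without an explicit zone on the cast, the answer depends on whichever default the evaluating side happens to use, which is exactly the ambiguity the fix removes.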

Re: [PR] [SPARK-50897][ML][CONNECT] Avoiding instance creation in ServiceLoader [spark]

2025-01-20 Thread via GitHub
grundprinzip closed pull request #49577: [SPARK-50897][ML][CONNECT] Avoiding instance creation in ServiceLoader URL: https://github.com/apache/spark/pull/49577

Re: [PR] [SPARK-50897][ML][CONNECT] Avoiding instance creation in ServiceLoader [spark]

2025-01-20 Thread via GitHub
grundprinzip commented on PR #49577: URL: https://github.com/apache/spark/pull/49577#issuecomment-2603623311 merging to master and branch-4.0

Re: [PR] [SPARK-50875][SQL] Add RTRIM collations to TVF [spark]

2025-01-20 Thread via GitHub
cloud-fan commented on PR #49554: URL: https://github.com/apache/spark/pull/49554#issuecomment-2603612480 thanks, merging to master/4.0!

Re: [PR] [SPARK-50875][SQL] Add RTRIM collations to TVF [spark]

2025-01-20 Thread via GitHub
cloud-fan closed pull request #49554: [SPARK-50875][SQL] Add RTRIM collations to TVF URL: https://github.com/apache/spark/pull/49554

Re: [PR] [SPARK-50870][SQL] Add the timezone when casting to timestamp in V2ScanRelationPushDown [spark]

2025-01-20 Thread via GitHub
cloud-fan closed pull request #49549: [SPARK-50870][SQL] Add the timezone when casting to timestamp in V2ScanRelationPushDown URL: https://github.com/apache/spark/pull/49549

Re: [PR] [SPARK-50894][SQL] Postgres driver version bump to 42.7.5 [spark]

2025-01-20 Thread via GitHub
cloud-fan closed pull request #49575: [SPARK-50894][SQL] Postgres driver version bump to 42.7.5 URL: https://github.com/apache/spark/pull/49575

Re: [PR] [SPARK-50894][SQL] Postgres driver version bump to 42.7.5 [spark]

2025-01-20 Thread via GitHub
cloud-fan commented on PR #49575: URL: https://github.com/apache/spark/pull/49575#issuecomment-2603604069 The failed yarn test is unrelated, merging to master/4.0!

Re: [PR] [SPARK-50900][ML][CONNECT] Add VectorUDT and MatrixUDT to ProtoDataTypes [spark]

2025-01-20 Thread via GitHub
zhengruifeng closed pull request #49580: [SPARK-50900][ML][CONNECT] Add VectorUDT and MatrixUDT to ProtoDataTypes URL: https://github.com/apache/spark/pull/49580

Re: [PR] [SPARK-50900][ML][CONNECT] Add VectorUDT and MatrixUDT to ProtoDataTypes [spark]

2025-01-20 Thread via GitHub
zhengruifeng commented on PR #49580: URL: https://github.com/apache/spark/pull/49580#issuecomment-2603583880 thanks, merged to master/4.0

Re: [PR] [SPARK-XXXX][ML][CONNECT] Avoiding instance creation in ServiceLoader [spark]

2025-01-20 Thread via GitHub
HyukjinKwon commented on code in PR #49577: URL: https://github.com/apache/spark/pull/49577#discussion_r1923016989 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -50,8 +51,18 @@ private[ml] object MLUtils { private def loadOperators(

Re: [PR] [SPARK-50898][ML][PYTHON][CONNECT] Support `FPGrowth` on connect [spark]

2025-01-20 Thread via GitHub
zhengruifeng commented on PR #49579: URL: https://github.com/apache/spark/pull/49579#issuecomment-2603544101 cc @wbo4958

[PR] [SPARK-50900][ML][CONNECT] Add VectorUDT and MatrixUDT to ProtoDataTypes [spark]

2025-01-20 Thread via GitHub
zhengruifeng opened a new pull request, #49580: URL: https://github.com/apache/spark/pull/49580 ### What changes were proposed in this pull request? Add VectorUDT and MatrixUDT to ProtoDataTypes ### Why are the changes needed? 1, to avoid recreating the protobuf messages

Re: [PR] [SPARK-50855][SS][CONNECT] Spark Connect Support for TransformWithState [spark]

2025-01-20 Thread via GitHub
anishshri-db commented on PR #49488: URL: https://github.com/apache/spark/pull/49488#issuecomment-2603535495 Is the CI failure related - https://github.com/jingz-db/spark/actions/runs/12837471455/job/35801692179 ?

[PR] [SPARK-50898][ML][PYTHON][CONNECT] Support `FPGrowth` on connect [spark]

2025-01-20 Thread via GitHub
zhengruifeng opened a new pull request, #49579: URL: https://github.com/apache/spark/pull/49579 ### What changes were proposed in this pull request? Support `FPGrowth` on connect ### Why are the changes needed? for feature parity ### Does this PR introduce _any_ us

Re: [PR] [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
itholic commented on code in PR #49572: URL: https://github.com/apache/spark/pull/49572#discussion_r1922994221 ## python/pyspark/sql/tests/test_artifact.py: ## @@ -0,0 +1,94 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreem

Re: [PR] [SPARK-XXXX][ML][CONNECT] Avoiding instance creation in ServiceLoader [spark]

2025-01-20 Thread via GitHub
wbo4958 commented on PR #49577: URL: https://github.com/apache/spark/pull/49577#issuecomment-2603508131 Hi @grundprinzip, I created a JIRA for this PR, https://issues.apache.org/jira/browse/SPARK-50897; if possible, please change the title to [SPARK-50897].

Re: [PR] [SPARK-XXXX][ML][CONNECT] Avoiding instance creation in ServiceLoader [spark]

2025-01-20 Thread via GitHub
grundprinzip commented on code in PR #49577: URL: https://github.com/apache/spark/pull/49577#discussion_r1922990873 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -50,8 +51,18 @@ private[ml] object MLUtils { private def loadOperators

Re: [PR] [SPARK-50751][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_1305`: `UNSUPPORTED_TABLE_CHANGE_IN_JDBC_CATALOG` [spark]

2025-01-20 Thread via GitHub
itholic commented on code in PR #49395: URL: https://github.com/apache/spark/pull/49395#discussion_r1922986736 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala: ## @@ -444,9 +444,11 @@ class JDBCTableCatalogSuite extends Q

Re: [PR] [SPARK-XXXX][ML][CONNECT] Avoiding instance creation in ServiceLoader [spark]

2025-01-20 Thread via GitHub
wbo4958 commented on code in PR #49577: URL: https://github.com/apache/spark/pull/49577#discussion_r1922971672 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -50,8 +51,18 @@ private[ml] object MLUtils { private def loadOperators(mlCl

Re: [PR] [SPARK-50844][ML][CONNECT] Make model be loaded by ServiceLoader when loading [spark]

2025-01-20 Thread via GitHub
zhengruifeng commented on PR #49569: URL: https://github.com/apache/spark/pull/49569#issuecomment-2603464640 Merged to master/4.0

Re: [PR] [SPARK-50844][ML][CONNECT] Make model be loaded by ServiceLoader when loading [spark]

2025-01-20 Thread via GitHub
zhengruifeng closed pull request #49569: [SPARK-50844][ML][CONNECT] Make model be loaded by ServiceLoader when loading URL: https://github.com/apache/spark/pull/49569

Re: [PR] [SPARK-50582][SQL][PYTHON] Add quote builtin function [spark]

2025-01-20 Thread via GitHub
dongjoon-hyun commented on code in PR #49191: URL: https://github.com/apache/spark/pull/49191#discussion_r1922953699 ## sql/core/src/test/resources/sql-functions/sql-expression-schema.md: ## @@ -165,8 +165,8 @@ | org.apache.spark.sql.catalyst.expressions.If | if | SELECT if(1 <

Re: [PR] [SPARK-50896][INFRA] Add Daily Maven CI to branch-4.0 [spark]

2025-01-20 Thread via GitHub
dongjoon-hyun commented on PR #49578: URL: https://github.com/apache/spark/pull/49578#issuecomment-2603436069 Thank you, @zhengruifeng . Merged to master.

Re: [PR] [SPARK-50896][INFRA] Add Daily Maven CI to branch-4.0 [spark]

2025-01-20 Thread via GitHub
dongjoon-hyun closed pull request #49578: [SPARK-50896][INFRA] Add Daily Maven CI to branch-4.0 URL: https://github.com/apache/spark/pull/49578

Re: [PR] [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
itholic commented on code in PR #49572: URL: https://github.com/apache/spark/pull/49572#discussion_r1922943368 ## python/pyspark/sql/session.py: ## @@ -2100,16 +2104,36 @@ def addArtifacts( file : bool Add a file to be downloaded with this Spark job on ever

Re: [PR] [SPARK-XXXX][ML][CONNECT] Avoiding instance creation in ServiceLoader [spark]

2025-01-20 Thread via GitHub
HyukjinKwon commented on code in PR #49577: URL: https://github.com/apache/spark/pull/49577#discussion_r1922946621 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -50,8 +51,18 @@ private[ml] object MLUtils { private def loadOperators(

Re: [PR] [SPARK-XXXX][ML][CONNECT] Avoiding instance creation in ServiceLoader [spark]

2025-01-20 Thread via GitHub
HyukjinKwon commented on code in PR #49577: URL: https://github.com/apache/spark/pull/49577#discussion_r1922945632 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -50,8 +51,18 @@ private[ml] object MLUtils { private def loadOperators(

Re: [PR] [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
itholic commented on code in PR #49572: URL: https://github.com/apache/spark/pull/49572#discussion_r1922940544 ## python/pyspark/sql/session.py: ## @@ -2100,16 +2104,36 @@ def addArtifacts( file : bool Add a file to be downloaded with this Spark job on ever

Re: [PR] [SPARK-XXXX][ML][CONNECT] Avoiding instance creation in ServiceLoader [spark]

2025-01-20 Thread via GitHub
wbo4958 commented on PR #49577: URL: https://github.com/apache/spark/pull/49577#issuecomment-2603401729 That's a wonderful perf improvement. Thx Martin

Re: [PR] [SPARK-39901][CORE][SQL] Redesign `ignoreCorruptFiles` to make it more accurate by adding a new config `spark.files.ignoreCorruptFiles.errorClasses` [spark]

2025-01-20 Thread via GitHub
github-actions[bot] commented on PR #47090: URL: https://github.com/apache/spark/pull/47090#issuecomment-2603400122 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48922][SQL] Optimize nested data type insertion performance [spark]

2025-01-20 Thread via GitHub
github-actions[bot] closed pull request #47381: [SPARK-48922][SQL] Optimize nested data type insertion performance URL: https://github.com/apache/spark/pull/47381

Re: [PR] [SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests [spark]

2025-01-20 Thread via GitHub
HyukjinKwon closed pull request #49370: [SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests URL: https://github.com/apache/spark/pull/49370

Re: [PR] [SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests [spark]

2025-01-20 Thread via GitHub
HyukjinKwon commented on PR #49370: URL: https://github.com/apache/spark/pull/49370#issuecomment-2603380649 Merged to master.

Re: [PR] [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
HyukjinKwon commented on code in PR #49572: URL: https://github.com/apache/spark/pull/49572#discussion_r1922918869 ## python/pyspark/sql/session.py: ## @@ -2100,16 +2104,36 @@ def addArtifacts( file : bool Add a file to be downloaded with this Spark job on

Re: [PR] [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
HyukjinKwon commented on code in PR #49572: URL: https://github.com/apache/spark/pull/49572#discussion_r1922918527 ## python/pyspark/sql/session.py: ## @@ -2100,16 +2104,36 @@ def addArtifacts( file : bool Add a file to be downloaded with this Spark job on

Re: [PR] [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
HyukjinKwon commented on code in PR #49572: URL: https://github.com/apache/spark/pull/49572#discussion_r1922918130 ## python/pyspark/sql/tests/test_artifact.py: ## @@ -0,0 +1,94 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license ag

Re: [PR] [SPARK-50718][PYTHON] Support `addArtifact(s)` for PySpark [spark]

2025-01-20 Thread via GitHub
HyukjinKwon commented on code in PR #49572: URL: https://github.com/apache/spark/pull/49572#discussion_r1922917740 ## python/pyspark/sql/session.py: ## @@ -39,6 +40,7 @@ ) from pyspark.conf import SparkConf +from pyspark.core.files import SparkFiles Review Comment: Let's

Re: [PR] [SPARK-50893][CONNECT] Mark UDT.DataType optional [spark]

2025-01-20 Thread via GitHub
HyukjinKwon commented on PR #49574: URL: https://github.com/apache/spark/pull/49574#issuecomment-2603372736 Merged to master and branch-4.0.

Re: [PR] [SPARK-50893][CONNECT] Mark UDT.DataType optional [spark]

2025-01-20 Thread via GitHub
HyukjinKwon closed pull request #49574: [SPARK-50893][CONNECT] Mark UDT.DataType optional URL: https://github.com/apache/spark/pull/49574

Re: [PR] [SPARK-47573][K8S] Support custom driver log url [spark]

2025-01-20 Thread via GitHub
EnricoMi commented on PR #45728: URL: https://github.com/apache/spark/pull/45728#issuecomment-2603210753 @dongjoon-hyun fixed

Re: [PR] [SPARK-50582][SQL][PYTHON] Add quote builtin function [spark]

2025-01-20 Thread via GitHub
sarutak commented on code in PR #49191: URL: https://github.com/apache/spark/pull/49191#discussion_r1922832773 ## sql/core/src/test/resources/sql-functions/sql-expression-schema.md: ## @@ -165,8 +165,8 @@ | org.apache.spark.sql.catalyst.expressions.If | if | SELECT if(1 < 2, 'a

[PR] [SPARK-50896][INFRA] Add Daily Maven CI to branch-4.0 [spark]

2025-01-20 Thread via GitHub
dongjoon-hyun opened a new pull request, #49578: URL: https://github.com/apache/spark/pull/49578 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-50895] Create common interface for expressions which produce default string type [spark]

2025-01-20 Thread via GitHub
stefankandic commented on code in PR #49576: URL: https://github.com/apache/spark/pull/49576#discussion_r1922732288 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala: ## @@ -1462,3 +1462,11 @@ case class MultiCommutativeOp( override p

Re: [PR] [SPARK-50870][SQL] Add the timezone when casting to timestamp in V2ScanRelationPushDown [spark]

2025-01-20 Thread via GitHub
MaxGekk commented on PR #49549: URL: https://github.com/apache/spark/pull/49549#issuecomment-2602991541 > This query fails to execute because the injected cast expression lacks the timezone information BTW, @changgyoopark-db can you add a test for the case?

Re: [PR] [SPARK-50894] Postgres driver version bump to 42.7.5 [spark]

2025-01-20 Thread via GitHub
MaxGekk commented on PR #49575: URL: https://github.com/apache/spark/pull/49575#issuecomment-2602972349 @vladanvasi-db Could you set a proper tag in the PR's title, please.

Re: [PR] [SPARK-50895] Create common interface for expressions which produce default string type [spark]

2025-01-20 Thread via GitHub
MaxGekk commented on code in PR #49576: URL: https://github.com/apache/spark/pull/49576#discussion_r1922710739 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala: ## @@ -1462,3 +1462,11 @@ case class MultiCommutativeOp( override protec

Re: [PR] [SPARK-49872][CORE] allow unlimited json size again [spark]

2025-01-20 Thread via GitHub
dongjoon-hyun commented on PR #49163: URL: https://github.com/apache/spark/pull/49163#issuecomment-2602922665 Thank you for updating. Could you make CI happy, @steven-aerts ?

Re: [PR] [WIP][SPARK-50838][SQL] Performs additional checks inside recursive CTEs to throw an error if a forbidden case is encountered [spark]

2025-01-20 Thread via GitHub
milanisvet commented on PR #49518: URL: https://github.com/apache/spark/pull/49518#issuecomment-2602908969 The whole checkRecursion logic is now rewritten and placed in ResolveWithCTE as discussed offline. One note here: I am still not sure if we should keep datatype check since it thro
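For readers following this thread: the checks discussed here reject self-references that recursion cannot evaluate. As a hypothetical illustration (not taken from the PR), standard SQL forbids, among other patterns, aggregating over the recursive reference in the recursive branch:

```sql
WITH RECURSIVE t(n) AS (
  SELECT 1
  UNION ALL
  SELECT MAX(n) + 1 FROM t   -- aggregate over the recursive reference t: forbidden
)
SELECT * FROM t;
```

Engines that follow the standard typically reject such queries at analysis time, which is what moving the checks into the analyzer rule accomplishes.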

Re: [PR] [WIP][SPARK-50838][SQL] Performs additional checks inside recursive CTEs to throw an error if a forbidden case is encountered [spark]

2025-01-20 Thread via GitHub
milanisvet commented on code in PR #49518: URL: https://github.com/apache/spark/pull/49518#discussion_r1922678957 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1043,6 +1044,75 @@ trait CheckAnalysis extends PredicateHelper with

Re: [PR] [WIP][SPARK-50838][SQL] Performs additional checks inside recursive CTEs to throw an error if a forbidden case is encountered [spark]

2025-01-20 Thread via GitHub
milanisvet commented on code in PR #49518: URL: https://github.com/apache/spark/pull/49518#discussion_r1922678467 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1043,6 +1044,75 @@ trait CheckAnalysis extends PredicateHelper with

Re: [PR] [SPARK-49872][CORE] allow unlimited json size again [spark]

2025-01-20 Thread via GitHub
steven-aerts commented on code in PR #49163: URL: https://github.com/apache/spark/pull/49163#discussion_r1922611837 ## core/src/main/scala/org/apache/spark/util/JsonProtocol.scala: ## @@ -69,6 +69,11 @@ private[spark] object JsonProtocol extends JsonUtils { private[util] v

Re: [PR] [SPARK-50844][ML][CONNECT] make model be loaded by ServiceLoader when loading [spark]

2025-01-20 Thread via GitHub
grundprinzip commented on PR #49569: URL: https://github.com/apache/spark/pull/49569#issuecomment-2602758547 @wbo4958 cf https://github.com/apache/spark/pull/49577

[PR] [SPARK-XXXX][ML][CONNECT] Avoiding instance creation in ServiceLoader [spark]

2025-01-20 Thread via GitHub
grundprinzip opened a new pull request, #49577: URL: https://github.com/apache/spark/pull/49577 ### What changes were proposed in this pull request? When converting the iterator of the ServiceLoader call to Scala, we explicitly instantiate all classes that the service loader provides. Sin
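For context: since Java 9, `ServiceLoader.stream()` exposes `Provider` handles whose `type()` method returns the implementation class without running its constructor, whereas iterating the loader directly instantiates every provider. A minimal sketch of the lazy pattern under that assumption (illustrative only, not the PR's code):

```java
import java.util.List;
import java.util.ServiceLoader;
import java.util.stream.Collectors;

public class LazyServiceLoaderDemo {
    // A demo service interface; a real deployment would register providers
    // via META-INF/services entries or module-info `provides` clauses.
    interface DemoService {}

    // List provider classes WITHOUT instantiating them: Provider.type()
    // (Java 9+) returns the class, while Provider.get() would construct
    // the instance, which is the cost being avoided.
    static <T> List<Class<? extends T>> providerClasses(Class<T> service) {
        return ServiceLoader.load(service).stream()
                .map(ServiceLoader.Provider::type)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // No providers are registered for DemoService, so the list is empty,
        // and in any case no provider constructor has run.
        System.out.println(providerClasses(DemoService.class));  // []
    }
}
```

Filtering on the returned classes first and calling `get()` only on the matches keeps startup cost proportional to the providers actually used.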

Re: [PR] [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC [spark]

2025-01-20 Thread via GitHub
EnricoMi commented on PR #49528: URL: https://github.com/apache/spark/pull/49528#issuecomment-2602740419 @MaxGekk thanks for the comments, all addressed in [a03345c0](https://github.com/G-Research/spark/commit/a03345c001f3e58ae80e21abd85b3806e251017d).

Re: [PR] [DRAFT] Generalize collation coercion [spark]

2025-01-20 Thread via GitHub
stefankandic closed pull request #49112: [DRAFT] Generalize collation coercion URL: https://github.com/apache/spark/pull/49112

Re: [PR] [DRAFT] Add back casting for collated types [spark]

2025-01-20 Thread via GitHub
stefankandic closed pull request #49129: [DRAFT] Add back casting for collated types URL: https://github.com/apache/spark/pull/49129

Re: [PR] [SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests [spark]

2025-01-20 Thread via GitHub
changgyoopark-db commented on PR #49370: URL: https://github.com/apache/spark/pull/49370#issuecomment-2602680034 @HyukjinKwon Hi, can you please merge this PR? Thanks!

[PR] [SPARK-50895] Create common interface for expressions which produce default string type [spark]

2025-01-20 Thread via GitHub
stefankandic opened a new pull request, #49576: URL: https://github.com/apache/spark/pull/49576 ### What changes were proposed in this pull request? Introducing a new interface `DefaultStringProducingExpression` which should be inherited by all expressions that produce default st
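A common interface like the one proposed can be sketched as a marker-style interface with a shared default result type (a minimal plain-Java illustration; the names and members here are assumptions, not Spark's actual Catalyst classes):

```java
// Hypothetical sketch: expressions that always produce the default string
// type implement one interface and inherit a single shared dataType().
public class DefaultStringDemo {
    public interface DataType {}

    public static final class StringType implements DataType {
        @Override public String toString() { return "string"; }
    }

    public interface Expression {
        DataType dataType();
    }

    // Any expression producing the default (uncollated) string type implements
    // this interface instead of redeclaring dataType() itself.
    public interface DefaultStringProducingExpression extends Expression {
        @Override default DataType dataType() { return new StringType(); }
    }

    // Example expression: its result type comes entirely from the interface.
    public static final class CurrentDatabase implements DefaultStringProducingExpression {}

    public static void main(String[] args) {
        Expression e = new CurrentDatabase();
        System.out.println(e.dataType()); // prints "string"
    }
}
```

Centralizing the type in one default method means a later change to what "default string type" means touches a single place rather than every expression.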

Re: [PR] [WIP][SPARK-50838][SQL] Add checkRecursion to check if all the rules about recursive queries are fulfilled. Adjust optimizer with UnionLoop cases. [spark]

2025-01-20 Thread via GitHub
milanisvet commented on code in PR #49518: URL: https://github.com/apache/spark/pull/49518#discussion_r1922522863 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3099,6 +3099,29 @@ ], "sqlState" : "42602" }, + "INVALID_RECURSIVE_REFERENCE" :

Re: [PR] [WIP][SPARK-50838][SQL] Add checkRecursion to check if all the rules about recursive queries are fulfilled. Adjust optimizer with UnionLoop cases. [spark]

2025-01-20 Thread via GitHub
milanisvet commented on code in PR #49518: URL: https://github.com/apache/spark/pull/49518#discussion_r1922522608 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3099,6 +3099,29 @@ ], "sqlState" : "42602" }, + "INVALID_RECURSIVE_REFERENCE" :

Re: [PR] [WIP][SPARK-50892][SQL] Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-01-20 Thread via GitHub
milanisvet commented on code in PR #49571: URL: https://github.com/apache/spark/pull/49571#discussion_r1922519875 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +718,140 @@ case class UnionExec(children: Seq[SparkPlan]) exten

Re: [PR] [WIP][SPARK-50892][SQL] Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-01-20 Thread via GitHub
milanisvet commented on code in PR #49571: URL: https://github.com/apache/spark/pull/49571#discussion_r1922518700 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +718,140 @@ case class UnionExec(children: Seq[SparkPlan]) exten

Re: [PR] [WIP][SPARK-50892][SQL] Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-01-20 Thread via GitHub
milanisvet commented on code in PR #49571: URL: https://github.com/apache/spark/pull/49571#discussion_r1922515273 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4520,6 +4520,31 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-01-20 Thread via GitHub
sunxiaoguang commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1922513989 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,91 @@ class MySQLIntegrationSuite

Re: [PR] [WIP][SPARK-50892][SQL] Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-01-20 Thread via GitHub
milanisvet commented on code in PR #49571: URL: https://github.com/apache/spark/pull/49571#discussion_r1922509471 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InlineCTE.scala: ## @@ -61,7 +61,8 @@ case class InlineCTE( // 1) It is fine to inline a

[PR] [SPARK-50894] Postgres driver version bump to 42.7.5 [spark]

2025-01-20 Thread via GitHub
vladanvasi-db opened a new pull request, #49575: URL: https://github.com/apache/spark/pull/49575 ### What changes were proposed in this pull request? In this PR, I propose upgrading the `PostgreSQL` driver to version `42.7.5`. More specifically, in this [PR](https:

Re: [PR] [WIP][SPARK-50892][SQL] Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-01-20 Thread via GitHub
cloud-fan commented on code in PR #49571: URL: https://github.com/apache/spark/pull/49571#discussion_r1922492638 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +718,140 @@ case class UnionExec(children: Seq[SparkPlan]) extend

Re: [PR] [SPARK-50844][ML][CONNECT] make model be loaded by ServiceLoader when loading [spark]

2025-01-20 Thread via GitHub
grundprinzip commented on PR #49569: URL: https://github.com/apache/spark/pull/49569#issuecomment-2602574775 Right now we're using the service loader in a slightly weird way. By default the service loader lets you iterate over a stream of providers and choose which instance to instantiate. Now, we'r
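The stream-based selection being described can be sketched in plain Java (a self-contained demo; the `Greeter`/`Hello` names and the temp-directory service registration are illustrative assumptions, not Spark code):

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceLoader;

public class Demo {
    public interface Greeter { String greet(); }

    public static class Hello implements Greeter {
        public Hello() { System.out.println("Hello instantiated"); }
        @Override public String greet() { return "hello"; }
    }

    // Registers Hello as a Greeter provider under a temp directory, then uses
    // ServiceLoader.stream() to pick a provider by class *without* eagerly
    // instantiating every registered implementation.
    public static String runDemo() throws Exception {
        Path dir = Files.createTempDirectory("svc-demo");
        Path services = dir.resolve("META-INF/services");
        Files.createDirectories(services);
        Files.writeString(services.resolve(Greeter.class.getName()),
                          Hello.class.getName());

        ClassLoader cl = new URLClassLoader(new URL[]{ dir.toUri().toURL() },
                                            Demo.class.getClassLoader());
        ServiceLoader<Greeter> loader = ServiceLoader.load(Greeter.class, cl);

        return loader.stream()
                // Provider.type() loads the class but does NOT run its
                // constructor, so unselected providers are never instantiated.
                .filter(p -> p.type() == Hello.class)
                .findFirst()
                .map(ServiceLoader.Provider::get) // instantiation happens here
                .map(Greeter::greet)
                .orElse("no provider found");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo());
    }
}
```

By contrast, draining `ServiceLoader.iterator()` (e.g. by converting it to a Scala collection) calls the no-arg constructor of every registered provider, even the ones that are ultimately discarded.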

Re: [PR] [WIP][SPARK-50892][SQL] Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-01-20 Thread via GitHub
cloud-fan commented on code in PR #49571: URL: https://github.com/apache/spark/pull/49571#discussion_r1922484925 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4520,6 +4520,31 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.m

Re: [PR] [WIP][SPARK-50892][SQL] Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-01-20 Thread via GitHub
cloud-fan commented on code in PR #49571: URL: https://github.com/apache/spark/pull/49571#discussion_r1922482014 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InlineCTE.scala: ## @@ -61,7 +61,8 @@ case class InlineCTE( // 1) It is fine to inline a C

Re: [PR] Bump google.golang.org/protobuf from 1.36.1 to 1.36.2 [spark-connect-go]

2025-01-20 Thread via GitHub
dependabot[bot] commented on PR #116: URL: https://github.com/apache/spark-connect-go/pull/116#issuecomment-2602500409 Superseded by #119.

[PR] Bump github.com/apache/arrow-go/v18 from 18.0.0 to 18.1.0 [spark-connect-go]

2025-01-20 Thread via GitHub
dependabot[bot] opened a new pull request, #118: URL: https://github.com/apache/spark-connect-go/pull/118 Bumps [github.com/apache/arrow-go/v18](https://github.com/apache/arrow-go) from 18.0.0 to 18.1.0. Release notes sourced from https://github.com/apache/arrow-go/releases

Re: [PR] Bump google.golang.org/protobuf from 1.36.1 to 1.36.2 [spark-connect-go]

2025-01-20 Thread via GitHub
dependabot[bot] closed pull request #116: Bump google.golang.org/protobuf from 1.36.1 to 1.36.2 URL: https://github.com/apache/spark-connect-go/pull/116

Re: [PR] [SPARK-50870][CORE] Add the timezone when casting to timestamp in V2ScanRelationPushDown [spark]

2025-01-20 Thread via GitHub
changgyoopark-db commented on code in PR #49549: URL: https://github.com/apache/spark/pull/49549#discussion_r1922447749 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala: ## @@ -324,12 +324,14 @@ object V2ScanRelationPushDown e

[PR] Bump google.golang.org/protobuf from 1.36.1 to 1.36.3 [spark-connect-go]

2025-01-20 Thread via GitHub
dependabot[bot] opened a new pull request, #119: URL: https://github.com/apache/spark-connect-go/pull/119 Bumps google.golang.org/protobuf from 1.36.1 to 1.36.3. [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=g

Re: [PR] [SPARK-50870][CORE] Add the timezone when casting to timestamp in V2ScanRelationPushDown [spark]

2025-01-20 Thread via GitHub
cloud-fan commented on code in PR #49549: URL: https://github.com/apache/spark/pull/49549#discussion_r1922432053 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala: ## @@ -324,12 +324,14 @@ object V2ScanRelationPushDown extends

Re: [PR] [SPARK-50870][CORE] Add the timezone when casting to timestamp in V2ScanRelationPushDown [spark]

2025-01-20 Thread via GitHub
cloud-fan commented on code in PR #49549: URL: https://github.com/apache/spark/pull/49549#discussion_r1922431657 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala: ## @@ -324,12 +324,14 @@ object V2ScanRelationPushDown extends
