Re: [PR] [SPARK-50792][SQL][FOLLOWUP] Improve the push down information for binary [spark]

2025-01-21 Thread via GitHub
beliefer commented on code in PR #49555: URL: https://github.com/apache/spark/pull/49555#discussion_r1923797461 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/expressions/expressions.scala: ## @@ -388,12 +388,13 @@ private[sql] object HoursTransform { } privat

[PR] [SPARK-50798][SQL][FOLLOWUP] Further improvements to `NormalizePlan` [spark]

2025-01-21 Thread via GitHub
mihailotim-db opened a new pull request, #49585: URL: https://github.com/apache/spark/pull/49585 ### What changes were proposed in this pull request? Improve `NormalizePlan` by fixing normalization of `InheritAnalysisRules` and add normalization for `CommonExpressionId` and expres

[PR] [SPARK-50904][SQL] Fix collation expression walker query execution [spark]

2025-01-21 Thread via GitHub
stefankandic opened a new pull request, #49586: URL: https://github.com/apache/spark/pull/49586 ### What changes were proposed in this pull request? Changing when we collect results in `CollationExpressionWalkerSuite` on borders of changing session default collation. ### Why ar

Re: [PR] [SPARK-50895][SQL] Create common interface for expressions which produce default string type [spark]

2025-01-21 Thread via GitHub
stefankandic commented on code in PR #49576: URL: https://github.com/apache/spark/pull/49576#discussion_r1923877836 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -151,12 +151,15 @@ case class ResolvedCollation(collatio

Re: [PR] [SPARK-48353][SQL] Introduction of Error Handling mechanism in SQL Scripting [spark]

2025-01-21 Thread via GitHub
dusantism-db commented on PR #49427: URL: https://github.com/apache/spark/pull/49427#issuecomment-2605017778 Could we add a test in which a handler is declared but an error is thrown anyway, because it is a different condition? For example declare handler for divide by zero but unresolved c

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
vladimirg-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1923923937 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala: ## @@ -49,6 +49,18 @@ class CatalogManager( // TODO: create a real S

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1923946713 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala: ## @@ -49,6 +49,18 @@ class CatalogManager( // TODO: create a real S

Re: [PR] [WIP][SPARK-50838][SQL]Performs additional checks inside recursive CTEs to throw an error if forbidden case is encountered [spark]

2025-01-21 Thread via GitHub
milanisvet commented on PR #49518: URL: https://github.com/apache/spark/pull/49518#issuecomment-2605067854 As discussed offline, `checkIfSelfReferenceIsPlacedCorrectly` and `checkDataTypesAnchorAndRecursiveTerm` definitions left in `resolveWithCTE` singleton, but invoked in `checkAnalysis`

Re: [PR] [WIP][SPARK-50838][SQL]Performs additional checks inside recursive CTEs to throw an error if forbidden case is encountered [spark]

2025-01-21 Thread via GitHub
milanisvet commented on code in PR #49518: URL: https://github.com/apache/spark/pull/49518#discussion_r1923949925 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3117,6 +3117,29 @@ ], "sqlState" : "42602" }, + "INVALID_RECURSIVE_REFERENCE" :

Re: [PR] [SPARK-50718][PYTHON][4.0] Support `addArtifact(s)` for PySpark [spark]

2025-01-21 Thread via GitHub
itholic closed pull request #49583: [SPARK-50718][PYTHON][4.0] Support `addArtifact(s)` for PySpark URL: https://github.com/apache/spark/pull/49583 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

Re: [PR] [SPARK-50718][PYTHON][4.0] Support `addArtifact(s)` for PySpark [spark]

2025-01-21 Thread via GitHub
itholic commented on PR #49583: URL: https://github.com/apache/spark/pull/49583#issuecomment-2603949067 Merged to branch-4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-50582][SQL][PYTHON] Add quote builtin function [spark]

2025-01-21 Thread via GitHub
sarutak commented on code in PR #49191: URL: https://github.com/apache/spark/pull/49191#discussion_r1923271004 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -3723,3 +3723,40 @@ case class Luhncheck(input: Expression) exte

Re: [PR] [SPARK-50582][SQL][PYTHON] Add quote builtin function [spark]

2025-01-21 Thread via GitHub
sarutak commented on code in PR #49191: URL: https://github.com/apache/spark/pull/49191#discussion_r1923275195 ## sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala: ## @@ -1452,4 +1452,21 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSess

Re: [PR] [SPARK-49646][SQL] add spark config for fixing subquery decorrelation for union/set operations when parentOuterReferences has references not covered in collectedChildOuterReferences [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on PR #49536: URL: https://github.com/apache/spark/pull/49536#issuecomment-2604140193 Can you link to the original PR that did the change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] [SPARK-50091][SQL] Handle case of aggregates in left-hand operand of IN-subquery [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on code in PR #48627: URL: https://github.com/apache/spark/pull/48627#discussion_r1923402160 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -246,46 +267,106 @@ object RewritePredicateSubquery extends Rule[Logical

Re: [PR] [SPARK-50091][SQL] Handle case of aggregates in left-hand operand of IN-subquery [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on code in PR #48627: URL: https://github.com/apache/spark/pull/48627#discussion_r1923402967 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -246,46 +267,106 @@ object RewritePredicateSubquery extends Rule[Logical

Re: [PR] [SPARK-50091][SQL] Handle case of aggregates in left-hand operand of IN-subquery [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on code in PR #48627: URL: https://github.com/apache/spark/pull/48627#discussion_r1923408425 ## sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala: ## @@ -2800,4 +2800,32 @@ class SubquerySuite extends QueryTest checkAnswer(df3, Row(7))

Re: [PR] [SPARK-50902][CORE][K8S][TESTS] Add `CRC32C` test cases [spark]

2025-01-21 Thread via GitHub
LuciferYang commented on PR #49582: URL: https://github.com/apache/spark/pull/49582#issuecomment-2604071321 A failure occurred in https://github.com/dongjoon-hyun/spark/actions/runs/12881008656/job/35910916844 , causing the `KubernetesLocalDiskShuffleDataIOSuite` to not be executed. Alth

Re: [PR] [SPARK-50792][SQL][FOLLOWUP] Improve the push down information for binary [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on code in PR #49555: URL: https://github.com/apache/spark/pull/49555#discussion_r1923350962 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/expressions/expressions.scala: ## @@ -388,12 +388,13 @@ private[sql] object HoursTransform { } priva

Re: [PR] [SPARK-50880][SQL] Add a new visitBinaryComparison method to V2ExpressionSQLBuilder [spark]

2025-01-21 Thread via GitHub
beliefer commented on PR #49556: URL: https://github.com/apache/spark/pull/49556#issuecomment-2604766338 @cloud-fan Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] [SPARK-50895][SQL] Create common interface for expressions which produce default string type [spark]

2025-01-21 Thread via GitHub
stefankandic commented on PR #49576: URL: https://github.com/apache/spark/pull/49576#issuecomment-2604961817 @MaxGekk I have created a separate PR to unblock the test that is failing in the collation expression walker suite #49586. -- This is an automated message from the Apache Git Servi

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1923899724 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala: ## @@ -49,6 +49,18 @@ class CatalogManager( // TODO: create a real SYST

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1923900874 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala: ## @@ -49,6 +49,18 @@ class CatalogManager( // TODO: create a real SYST

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1923862248 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -154,6 +154,9 @@ case class AnalysisContext( referredTempFunctionN

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
dusantism-db commented on PR #49445: URL: https://github.com/apache/spark/pull/49445#issuecomment-2604938126 @cloud-fan @MaxGekk I resolved all comments, could you take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
vladimirg-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1923910427 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala: ## @@ -49,6 +49,18 @@ class CatalogManager( // TODO: create a real S

Re: [PR] [SPARK-48353][SQL] Introduction of Error Handling mechanism in SQL Scripting [spark]

2025-01-21 Thread via GitHub
dusantism-db commented on code in PR #49427: URL: https://github.com/apache/spark/pull/49427#discussion_r1923910371 ## sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala: ## @@ -65,6 +68,426 @@ class SqlScriptingExecutionSuite extends QueryTe

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1923910719 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala: ## @@ -49,6 +49,18 @@ class CatalogManager( // TODO: create a real SYST

Re: [PR] [SPARK-48353][SQL] Introduction of Error Handling mechanism in SQL Scripting [spark]

2025-01-21 Thread via GitHub
miland-db commented on code in PR #49427: URL: https://github.com/apache/spark/pull/49427#discussion_r1923911342 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -159,15 +159,104 @@ class AstBuilder extends DataTypeAstBuilder scrip

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
vladimirg-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1923911359 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala: ## @@ -49,6 +49,18 @@ class CatalogManager( // TODO: create a real S

[PR] [SPARK-50905][SQL][TESTS] Rename `Customer*` to `Custom*` in `SparkSessionExtensionSuite` [spark]

2025-01-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #49587: URL: https://github.com/apache/spark/pull/49587 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [WIP][SPARK-50838][SQL]Performs additional checks inside recursive CTEs to throw an error if forbidden case is encountered [spark]

2025-01-21 Thread via GitHub
milanisvet commented on code in PR #49518: URL: https://github.com/apache/spark/pull/49518#discussion_r1923954633 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveWithCTE.scala: ## @@ -49,17 +51,27 @@ object ResolveWithCTE extends Rule[LogicalPlan] {

Re: [PR] [SPARK-50898][ML][PYTHON][CONNECT] Support `FPGrowth` on connect [spark]

2025-01-21 Thread via GitHub
zhengruifeng commented on PR #49579: URL: https://github.com/apache/spark/pull/49579#issuecomment-2603906459 thanks, merged to master/4.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] [SPARK-50898][ML][PYTHON][CONNECT] Support `FPGrowth` on connect [spark]

2025-01-21 Thread via GitHub
zhengruifeng closed pull request #49579: [SPARK-50898][ML][PYTHON][CONNECT] Support `FPGrowth` on connect URL: https://github.com/apache/spark/pull/49579 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-50582][SQL][PYTHON] Add quote builtin function [spark]

2025-01-21 Thread via GitHub
MaxGekk commented on code in PR #49191: URL: https://github.com/apache/spark/pull/49191#discussion_r1923230780 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -3723,3 +3723,40 @@ case class Luhncheck(input: Expression) exte

Re: [PR] [SPARK-50879][ML][PYTHON][CONNECT] Support feature scalers on Connect [spark]

2025-01-21 Thread via GitHub
zhengruifeng commented on PR #49581: URL: https://github.com/apache/spark/pull/49581#issuecomment-2603917527 thanks, merged to master/4.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] [SPARK-50879][ML][PYTHON][CONNECT] Support feature scalers on Connect [spark]

2025-01-21 Thread via GitHub
zhengruifeng closed pull request #49581: [SPARK-50879][ML][PYTHON][CONNECT] Support feature scalers on Connect URL: https://github.com/apache/spark/pull/49581 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-50858][PYTHON] Add configuration to hide Python UDF stack trace [spark]

2025-01-21 Thread via GitHub
wengh commented on code in PR #49535: URL: https://github.com/apache/spark/pull/49535#discussion_r1924468968 ## python/pyspark/util.py: ## @@ -468,16 +468,19 @@ def handle_worker_exception(e: BaseException, outfile: IO) -> None: and exception traceback info to outfile. JVM

Re: [PR] [SPARK-50858][PYTHON] Add configuration to hide Python UDF stack trace [spark]

2025-01-21 Thread via GitHub
allisonwang-db commented on code in PR #49535: URL: https://github.com/apache/spark/pull/49535#discussion_r1924503930 ## python/pyspark/util.py: ## @@ -468,16 +468,19 @@ def handle_worker_exception(e: BaseException, outfile: IO) -> None: and exception traceback info to out

Re: [PR] [SPARK-50902][CORE][K8S][TESTS] Add `CRC32C` test cases [spark]

2025-01-21 Thread via GitHub
dongjoon-hyun commented on PR #49582: URL: https://github.com/apache/spark/pull/49582#issuecomment-2605130155 Thank you! Now all tests passed. Merged to master/4.0. ![Screenshot 2025-01-21 at 08 00 57](https://github.com/user-attachments/assets/d74967d6-ea5f-47fd-b241-0781257445c1)

Re: [PR] [SPARK-50082][CORE] Remove some unnecessary Jersey-related warning logs [spark]

2025-01-21 Thread via GitHub
pan3793 commented on PR #48611: URL: https://github.com/apache/spark/pull/48611#issuecomment-2605203551 > If the following two dependencies are in the class path, there will be no corresponding warning logs, but we excluded it in this PR: https://github.com/apache/spark/pull/25481 > - `j

Re: [PR] [WIP][SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-01-21 Thread via GitHub
milanisvet commented on code in PR #49571: URL: https://github.com/apache/spark/pull/49571#discussion_r1924001588 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InlineCTE.scala: ## @@ -61,7 +61,8 @@ case class InlineCTE( // 1) It is fine to inline a

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1924348312 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala: ## @@ -49,6 +49,18 @@ class CatalogManager( // TODO: create a real S

Re: [PR] [SPARK-45013][CORE][TEST][3.5] Flaky Test with NPE: track allocated resources by taskId [spark]

2025-01-21 Thread via GitHub
dongjoon-hyun closed pull request #49589: [SPARK-45013][CORE][TEST][3.5] Flaky Test with NPE: track allocated resources by taskId URL: https://github.com/apache/spark/pull/49589 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-49646][SQL] add spark config for fixing subquery decorrelation for union/set operations when parentOuterReferences has references not covered in collectedChildOuterReferences [spark]

2025-01-21 Thread via GitHub
AveryQi115 commented on PR #49536: URL: https://github.com/apache/spark/pull/49536#issuecomment-2605444582 I changed the description and linked the original pr in the pr description. Here's the linked change: https://github.com/apache/spark/pull/48109 -- This is an automated message from

Re: [PR] [SPARK-50639][SQL] Improve warning logging in CacheManager [spark]

2025-01-21 Thread via GitHub
vrozov commented on PR #49276: URL: https://github.com/apache/spark/pull/49276#issuecomment-2605196148 @gengliangwang Please check my reply. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] [SPARK-50883][SQL] Support altering multiple columns in the same command [spark]

2025-01-21 Thread via GitHub
ctring commented on PR #49559: URL: https://github.com/apache/spark/pull/49559#issuecomment-2605472867 @MaxGekk I updated the PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] [SPARK-45013][CORE][TEST][3.5] Flaky Test with NPE: track allocated resources by taskId [spark]

2025-01-21 Thread via GitHub
dongjoon-hyun commented on PR #49589: URL: https://github.com/apache/spark/pull/49589#issuecomment-2605714639 Merged to branch-3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

[PR] [SPARK-50906][SS] Add nullability check for if inputs of to_avro align with schema [spark]

2025-01-21 Thread via GitHub
fanyue-xia opened a new pull request, #49590: URL: https://github.com/apache/spark/pull/49590 ### What changes were proposed in this pull request? Previously, we don't explicitly check when input of `to_avro` is `null` but the schema does not allow `null`. As a result, a NPE w

Re: [PR] [SPARK-50905][SQL][TESTS] Rename `Customer*` to `Custom*` in `SparkSessionExtensionSuite` [spark]

2025-01-21 Thread via GitHub
LuciferYang commented on PR #49587: URL: https://github.com/apache/spark/pull/49587#issuecomment-2605266305 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
davidm-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1924312921 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala: ## @@ -49,6 +49,18 @@ class CatalogManager( // TODO: create a real SYST

Re: [PR] [SPARK-48353][SQL] Introduction of Error Handling mechanism in SQL Scripting [spark]

2025-01-21 Thread via GitHub
davidm-db commented on code in PR #49427: URL: https://github.com/apache/spark/pull/49427#discussion_r1924320046 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -161,25 +161,26 @@ class SqlScriptingParsingContext { transitionTo(S

Re: [PR] [SPARK-50904][SQL] Fix collation expression walker query execution [spark]

2025-01-21 Thread via GitHub
MaxGekk closed pull request #49586: [SPARK-50904][SQL] Fix collation expression walker query execution URL: https://github.com/apache/spark/pull/49586 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-21 Thread via GitHub
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1924066055 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala: ## @@ -49,6 +49,18 @@ class CatalogManager( // TODO: create a real S

Re: [PR] [SPARK-50905][SQL][TESTS] Rename `Customer*` to `Custom*` in `SparkSessionExtensionSuite` [spark]

2025-01-21 Thread via GitHub
dongjoon-hyun commented on PR #49587: URL: https://github.com/apache/spark/pull/49587#issuecomment-2605256634 Could you review this PR, @LuciferYang ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-50905][SQL][TESTS] Rename `Customer*` to `Custom*` in `SparkSessionExtensionSuite` [spark]

2025-01-21 Thread via GitHub
dongjoon-hyun commented on PR #49587: URL: https://github.com/apache/spark/pull/49587#issuecomment-2605260101 Thank you, @MaxGekk ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [SPARK-50905][SQL][TESTS] Rename `Customer*` to `Custom*` in `SparkSessionExtensionSuite` [spark]

2025-01-21 Thread via GitHub
dongjoon-hyun closed pull request #49587: [SPARK-50905][SQL][TESTS] Rename `Customer*` to `Custom*` in `SparkSessionExtensionSuite` URL: https://github.com/apache/spark/pull/49587 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] [SPARK-50905][SQL][TESTS] Rename `Customer*` to `Custom*` in `SparkSessionExtensionSuite` [spark]

2025-01-21 Thread via GitHub
dongjoon-hyun commented on PR #49587: URL: https://github.com/apache/spark/pull/49587#issuecomment-2605264174 `SparkSessionExtensionSuite` passed in the CI. Merged to master/4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-50883][SQL] Support altering multiple columns in the same command [spark]

2025-01-21 Thread via GitHub
ctring commented on code in PR #49559: URL: https://github.com/apache/spark/pull/49559#discussion_r1924272176 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1622,60 +1622,84 @@ trait CheckAnalysis extends PredicateHelper with L

Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]

2025-01-21 Thread via GitHub
github-actions[bot] commented on PR #47084: URL: https://github.com/apache/spark/pull/47084#issuecomment-2606014785 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-50858][PYTHON] Add configuration to hide Python UDF stack trace [spark]

2025-01-21 Thread via GitHub
HyukjinKwon commented on code in PR #49535: URL: https://github.com/apache/spark/pull/49535#discussion_r1924528145 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3459,6 +3459,15 @@ object SQLConf { .checkValues(Set("legacy", "row", "dic

Re: [PR] [SPARK-50855][SS][CONNECT] Spark Connect Support for TransformWithState [spark]

2025-01-21 Thread via GitHub
anishshri-db commented on code in PR #49488: URL: https://github.com/apache/spark/pull/49488#discussion_r1924521840 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: ## @@ -140,28 +144,50 @@ class KeyValueGroupedDataset[K, V] priva

Re: [PR] [SPARK-39901][CORE][SQL] Redesign `ignoreCorruptFiles` to make it more accurate by adding a new config `spark.files.ignoreCorruptFiles.errorClasses` [spark]

2025-01-21 Thread via GitHub
github-actions[bot] closed pull request #47090: [SPARK-39901][CORE][SQL] Redesign `ignoreCorruptFiles` to make it more accurate by adding a new config `spark.files.ignoreCorruptFiles.errorClasses` URL: https://github.com/apache/spark/pull/47090 -- This is an automated message from the Apache

[PR] [ML][CONNECT] Support Transformer [spark]

2025-01-21 Thread via GitHub
wbo4958 opened a new pull request, #49588: URL: https://github.com/apache/spark/pull/49588 ### What changes were proposed in this pull request? This PR adds support transformer on ml connect. Currently, VectorAssembler is fully supported. ### Why are the changes needed?

[PR] [SPARK-45013][TEST][3.5] Flaky Test with NPE: track allocated resources by taskId [spark]

2025-01-21 Thread via GitHub
LuciferYang opened a new pull request, #49589: URL: https://github.com/apache/spark/pull/49589 ### What changes were proposed in this pull request? This PR ensures the runningTasks to be updated before subsequent tasks causing NPE ### Why are the changes needed? fix flakey tests

Re: [PR] [SPARK-50855][SS][CONNECT] Spark Connect Support for TransformWithState [spark]

2025-01-21 Thread via GitHub
jingz-db commented on PR #49488: URL: https://github.com/apache/spark/pull/49488#issuecomment-2605402772 > Is the CI failure related - https://github.com/jingz-db/spark/actions/runs/12837471455/job/35801692179 ? Yes it is related to proto file changes. I just rebased on latest master

Re: [PR] [SPARK-50883][SQL] Support altering multiple columns in the same command [spark]

2025-01-21 Thread via GitHub
ctring commented on code in PR #49559: URL: https://github.com/apache/spark/pull/49559#discussion_r1924273748 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -201,49 +201,55 @@ case class RenameColumn( copy(table

Re: [PR] [SPARK-50902][CORE][K8S][TESTS] Add `CRC32C` test cases [spark]

2025-01-21 Thread via GitHub
dongjoon-hyun closed pull request #49582: [SPARK-50902][CORE][K8S][TESTS] Add `CRC32C` test cases URL: https://github.com/apache/spark/pull/49582 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-45013][TEST][3.5] Flaky Test with NPE: track allocated resources by taskId [spark]

2025-01-21 Thread via GitHub
LuciferYang commented on PR #49589: URL: https://github.com/apache/spark/pull/49589#issuecomment-2605240660 I hope to backport this fix to branch-3.5, as I encountered similar test failures in the daily tests of branch-3.5: - https://github.com/apache/spark/actions/runs/12885594112/job/35

Re: [PR] [SPARK-50904][SQL] Fix collation expression walker query execution [spark]

2025-01-21 Thread via GitHub
MaxGekk commented on PR #49586: URL: https://github.com/apache/spark/pull/49586#issuecomment-2605237204 +1, LGTM. Merging to master/4.0. Thank you, @stefankandic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-50895][SQL] Create common interface for expressions which produce default string type [spark]

2025-01-21 Thread via GitHub
MaxGekk commented on PR #49576: URL: https://github.com/apache/spark/pull/49576#issuecomment-2605524819 +1, LGTM. Merging to master/4.0. Thank you, @stefankandic and @stevomitric for review. -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] [SPARK-50895][SQL] Create common interface for expressions which produce default string type [spark]

2025-01-21 Thread via GitHub
MaxGekk closed pull request #49576: [SPARK-50895][SQL] Create common interface for expressions which produce default string type URL: https://github.com/apache/spark/pull/49576 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-50883][SQL] Support altering multiple columns in the same command [spark]

2025-01-21 Thread via GitHub
scovich commented on code in PR #49559: URL: https://github.com/apache/spark/pull/49559#discussion_r1924223680 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1622,60 +1622,84 @@ trait CheckAnalysis extends PredicateHelper with

Re: [PR] [SPARK-50895][SQL] Create common interface for expressions which produce default string type [spark]

2025-01-21 Thread via GitHub
MaxGekk commented on PR #49576: URL: https://github.com/apache/spark/pull/49576#issuecomment-2605535297 @stefankandic Could you open a PR with backport to `branch-4.0`, please. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] [WIP][SPARK-50838][SQL]Performs additional checks inside recursive CTEs to throw an error if forbidden case is encountered [spark]

2025-01-21 Thread via GitHub
dtenedor commented on code in PR #49518: URL: https://github.com/apache/spark/pull/49518#discussion_r1924249528 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveWithCTE.scala: ## @@ -183,4 +184,52 @@ object ResolveWithCTE extends Rule[LogicalPlan] {

Re: [PR] [SPARK-49700][CONNECT][SQL] Unified Scala Interface for Connect and Classic [spark]

2025-01-21 Thread via GitHub
hvanhovell commented on code in PR #48818: URL: https://github.com/apache/spark/pull/48818#discussion_r1924244051 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala: ## Review Comment: Take another look at DataWriti

Re: [PR] [SPARK-50858][PYTHON] Add configuration to hide Python UDF stack trace [spark]

2025-01-21 Thread via GitHub
allisonwang-db commented on code in PR #49535: URL: https://github.com/apache/spark/pull/49535#discussion_r1924245423 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3459,6 +3459,16 @@ object SQLConf { .checkValues(Set("legacy", "row", "

Re: [PR] [SPARK-50858][PYTHON] Add configuration to hide Python UDF stack trace [spark]

2025-01-21 Thread via GitHub
wengh commented on PR #49535: URL: https://github.com/apache/spark/pull/49535#issuecomment-2605498765 @allisonwang-db @ueshin Could you review this PR that adds configuration to hide Python stack trace from analyze_udtf? -- This is an automated message from the Apache Git Service. To resp

Re: [PR] [SPARK-50792][SQL][FOLLOWUP] Improve the push down information for binary [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on code in PR #49555: URL: https://github.com/apache/spark/pull/49555#discussion_r1923352440 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/expressions/expressions.scala: ## @@ -388,12 +388,13 @@ private[sql] object HoursTransform { } priva

Re: [PR] [SPARK-50880][SQL] Add a new visitBinaryComparison method to V2ExpressionSQLBuilder [spark]

2025-01-21 Thread via GitHub
cloud-fan closed pull request #49556: [SPARK-50880][SQL] Add a new visitBinaryComparison method to V2ExpressionSQLBuilder URL: https://github.com/apache/spark/pull/49556 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-50091][SQL] Handle case of aggregates in left-hand operand of IN-subquery [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on code in PR #48627: URL: https://github.com/apache/spark/pull/48627#discussion_r1923409502 ## sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala: ## @@ -2800,4 +2800,32 @@ class SubquerySuite extends QueryTest checkAnswer(df3, Row(7))

Re: [PR] [SPARK-50880][SQL] Add a new visitBinaryComparison method to V2ExpressionSQLBuilder [spark]

2025-01-21 Thread via GitHub
cloud-fan commented on PR #49556: URL: https://github.com/apache/spark/pull/49556#issuecomment-2604176635 thanks, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[PR] [WIP][SPARK_50903][CONNECT] Let the plan cache only contain analysed plans [spark]

2025-01-21 Thread via GitHub
changgyoopark-db opened a new pull request, #49584: URL: https://github.com/apache/spark/pull/49584 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ##

Re: [PR] [SPARK-50912][PYTHON][TESTS] Skip FrameTakeAdvParityTests.test_take_adv because of OOM for now [spark]

2025-01-21 Thread via GitHub
HyukjinKwon closed pull request #49593: [SPARK-50912][PYTHON][TESTS] Skip FrameTakeAdvParityTests.test_take_adv because of OOM for now URL: https://github.com/apache/spark/pull/49593 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[PR] [SPARK-50912][PYTHON][TESTS] Skip FrameTakeAdvParityTests.test_take_adv because of OOM for now [spark]

2025-01-21 Thread via GitHub
HyukjinKwon opened a new pull request, #49593: URL: https://github.com/apache/spark/pull/49593 ### What changes were proposed in this pull request? This PR proposes to skip `test_take_adv` in Spark Connect only build. Similar with https://github.com/apache/spark/pull/49565 ###

Re: [PR] [SPARK-50912][PYTHON][TESTS] Skip FrameTakeAdvParityTests.test_take_adv because of OOM for now [spark]

2025-01-21 Thread via GitHub
HyukjinKwon commented on PR #49593: URL: https://github.com/apache/spark/pull/49593#issuecomment-2606091709 Merged to master, branch-4.0, and branch-3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-45013][CORE][TEST][3.5] Flaky Test with NPE: track allocated resources by taskId [spark]

2025-01-21 Thread via GitHub
LuciferYang commented on PR #49589: URL: https://github.com/apache/spark/pull/49589#issuecomment-2606161803 Thanks @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] [SPARK-50915][PYTHON][CONNECT] Add `getCondition` and deprecate `getErrorClass` in `PySparkException` [spark]

2025-01-21 Thread via GitHub
HyukjinKwon commented on code in PR #49594: URL: https://github.com/apache/spark/pull/49594#discussion_r1924621839 ## python/docs/source/reference/pyspark.errors.rst: ## @@ -69,7 +69,7 @@ Methods .. autosummary:: :toctree: api/ -PySparkException.getErrorClass +Py

Re: [PR] [SPARK-50915][PYTHON][CONNECT] Add `getCondition` and deprecate `getErrorClass` in `PySparkException` [spark]

2025-01-21 Thread via GitHub
HyukjinKwon commented on code in PR #49594: URL: https://github.com/apache/spark/pull/49594#discussion_r1924621621 ## python/docs/source/reference/pyspark.errors.rst: ## @@ -69,7 +69,7 @@ Methods .. autosummary:: :toctree: api/ -PySparkException.getErrorClass Review

Re: [PR] [SPARK-50853][CORE] Close temp shuffle file writable channel [spark]

2025-01-21 Thread via GitHub
LuciferYang commented on code in PR #49531: URL: https://github.com/apache/spark/pull/49531#discussion_r1924639668 ## core/src/test/scala/org/apache/spark/network/netty/NettyBlockTransferServiceSuite.scala: ## @@ -130,6 +132,62 @@ class NettyBlockTransferServiceSuite assert

Re: [PR] [SPARK-50820][SQL] DSv2: Conditional nullification of metadata columns in DML [spark]

2025-01-21 Thread via GitHub
aokolnychyi commented on code in PR #49493: URL: https://github.com/apache/spark/pull/49493#discussion_r1924643421 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ReplaceDataProjections.scala: ## @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] [SPARK-50578][PYTHON][SS] Add support for new version of state metadata for TransformWithStateInPandas [spark]

2025-01-21 Thread via GitHub
HyukjinKwon commented on PR #49156: URL: https://github.com/apache/spark/pull/49156#issuecomment-2606079406 Just a quick note .. Seems like the test `test_value_state_ttl_expiration` is still flaky when old dependencies are used (https://github.com/apache/spark/actions/runs/12883552117/job/
