Re: [PR] [SPARK-44155] Adding a dev utility to improve error messages based on LLM [spark]

2025-04-22 Thread via GitHub
smileyboy2019 commented on PR #41711: URL: https://github.com/apache/spark/pull/41711#issuecomment-2820643018 May I ask whether there is currently support for using LLMs with Spark?

Re: [PR] [SPARK-51861][SQL][UI] Remove duplicated/unnecessary info of InMemoryRelation Plan Detail [spark]

2025-04-22 Thread via GitHub
cloud-fan commented on PR #50656: URL: https://github.com/apache/spark/pull/50656#issuecomment-2820639069 late LGTM

Re: [PR] [SPARK-51856][ML][CONNECT] Update model size API to count distributed DataFrame size [spark]

2025-04-22 Thread via GitHub
smileyboy2019 commented on PR #50652: URL: https://github.com/apache/spark/pull/50652#issuecomment-2820647076 May I ask whether there is currently support for using LLMs to write Spark code and execute scripts?

Re: [PR] [SPARK-51862][ML][CONNECT][TESTS] Clean up ml cache before ReusedConnectTestCase and ReusedMixedTestCase [spark]

2025-04-22 Thread via GitHub
zhengruifeng closed pull request #50660: [SPARK-51862][ML][CONNECT][TESTS] Clean up ml cache before ReusedConnectTestCase and ReusedMixedTestCase URL: https://github.com/apache/spark/pull/50660

Re: [PR] [SPARK-51862][ML][CONNECT][TESTS] Clean up ml cache before ReusedConnectTestCase and ReusedMixedTestCase [spark]

2025-04-22 Thread via GitHub
zhengruifeng commented on PR #50660: URL: https://github.com/apache/spark/pull/50660#issuecomment-2820660027 thanks, merged to master

Re: [PR] [SPARK-50983][SQL]Part 1.a Add outer scope attributes for SubqueryExpression [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2053490556 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -67,6 +67,8 @@ abstract class PlanExpression[T <: QueryPlan[_]] exte

[PR] [MINOR] Make `README.md` up-to-date [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun opened a new pull request, #83: URL: https://github.com/apache/spark-connect-swift/pull/83 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-50983][SQL]Part 1.a Add outer scope attributes for SubqueryExpression [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2053837425 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala: ## @@ -67,12 +71,20 @@ import org.apache.sp

Re: [PR] [SPARK-50983][SQL]Part 1.a Add outer scope attributes for SubqueryExpression [spark]

2025-04-22 Thread via GitHub
mihailotim-db commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2053843045 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala: ## @@ -46,6 +46,10 @@ import org.apache.sp

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2053533428 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -234,9 +228,9 @@ object AnalysisContext { try f finally { set(orig

Re: [PR] [SPARK-51864] Rename parameters and support case-insensitively [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun commented on PR #81: URL: https://github.com/apache/spark-connect-swift/pull/81#issuecomment-2820493495 Let me merge this to fix the parameter mismatches.

[PR] [SPARK-51864] Rename parameters and support case-insensitively [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun opened a new pull request, #81: URL: https://github.com/apache/spark-connect-swift/pull/81 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51863] Support `join` and `crossJoin` in `DataFrame` [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun commented on PR #80: URL: https://github.com/apache/spark-connect-swift/pull/80#issuecomment-2820490063 Let me merge this to unblock the next dev steps.

Re: [PR] [SPARK-51863] Support `join` and `crossJoin` in `DataFrame` [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun closed pull request #80: [SPARK-51863] Support `join` and `crossJoin` in `DataFrame` URL: https://github.com/apache/spark-connect-swift/pull/80

Re: [PR] [SPARK-51864] Rename parameters and support case-insensitively [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun closed pull request #81: [SPARK-51864] Rename parameters and support case-insensitively URL: https://github.com/apache/spark-connect-swift/pull/81

Re: [PR] [SPARK-51863] Support `join` and `crossJoin` in `DataFrame` [spark-connect-swift]

2025-04-22 Thread via GitHub
peter-toth commented on code in PR #80: URL: https://github.com/apache/spark-connect-swift/pull/80#discussion_r2053564567 ## Sources/SparkConnect/DataFrame.swift: ## @@ -521,6 +521,80 @@ public actor DataFrame: Sendable { } } + /// Join with another `DataFrame`. + /

[PR] [SPARK-51529] Support for TLS connections [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun opened a new pull request, #82: URL: https://github.com/apache/spark-connect-swift/pull/82 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51529] Support `TLS` connections [spark-connect-swift]

2025-04-22 Thread via GitHub
grundprinzip commented on PR #82: URL: https://github.com/apache/spark-connect-swift/pull/82#issuecomment-2820523390 While experimenting with the client, I found another issue: it will always create a new connection instead of reusing an existing connection for the session. Every time whe

[PR] [SPARK-51861][CORE] NettyServer doesn't shutdown if SparkContext initialize failed [spark]

2025-04-22 Thread via GitHub
IsisPolei opened a new pull request, #50661: URL: https://github.com/apache/spark/pull/50661 ### What changes were proposed in this pull request? When obtaining a SparkContext instance using SparkContext.getOrCreate(), if an exception occurs during initialization (such as using incorr
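A minimal sketch of the failure scenario described above, written for illustration only (the object name and the invalid master URL are made up; this is not code from PR #50661):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GetOrCreateFailureRepro {
  def main(args: Array[String]): Unit = {
    // Intentionally broken configuration so that SparkContext initialization fails.
    val conf = new SparkConf()
      .setAppName("repro")
      .setMaster("totally-invalid-master-url")
    try {
      // getOrCreate() may start internal services (e.g. the RPC server)
      // before the initialization failure surfaces as an exception.
      val sc = SparkContext.getOrCreate(conf)
      sc.stop()
    } catch {
      case e: Exception =>
        // If those services are not shut down on failure, their non-daemon
        // threads can keep the JVM alive after this handler returns.
        println(s"SparkContext initialization failed: ${e.getMessage}")
    }
  }
}
```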

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-22 Thread via GitHub
cloud-fan commented on code in PR #50594: URL: https://github.com/apache/spark/pull/50594#discussion_r2053652126 ## core/src/test/scala/org/apache/spark/util/UninterruptibleThreadSuite.scala: ## @@ -115,6 +116,51 @@ class UninterruptibleThreadSuite extends SparkFunSuite { a

Re: [PR] [SPARK-51856][ML][CONNECT] Update model size API to count distributed DataFrame size [spark]

2025-04-22 Thread via GitHub
WeichenXu123 commented on PR #50652: URL: https://github.com/apache/spark/pull/50652#issuecomment-2820776066 > May I ask whether there is currently support for using LLMs to write Spark code and execute scripts? As of now, SparkML does not support LLM models,

Re: [PR] [SPARK-44856][PYTHON] Improve Python UDTF arrow serializer performance [spark]

2025-04-22 Thread via GitHub
HyukjinKwon commented on code in PR #50099: URL: https://github.com/apache/spark/pull/50099#discussion_r2053763675 ## python/pyspark/worker.py: ## @@ -1417,6 +1434,153 @@ def mapper(_, it): return mapper, None, ser, ser +elif eval_type == PythonEvalType.SQL_ARRO

[PR] [SPARK-51814][SS][PYTHON][FOLLLOW-UP] Uses `list(self)` instead of `StructType.fields` for old version compat [spark]

2025-04-22 Thread via GitHub
HyukjinKwon opened a new pull request, #50662: URL: https://github.com/apache/spark/pull/50662 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/50600 that proposes to use `list(self)` instead of `StructType.fields` for

Re: [PR] [SPARK-51609][SQL] Optimize Recursive CTE execution for simple queries [spark]

2025-04-22 Thread via GitHub
Pajaraja commented on code in PR #50402: URL: https://github.com/apache/spark/pull/50402#discussion_r2053779830 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -149,6 +156,36 @@ case class UnionLoopExec( val numPartitions = prevDF.queryE

Re: [PR] [SPARK-50983][SQL]Part 1.a Add outer scope attributes for SubqueryExpression [spark]

2025-04-22 Thread via GitHub
mihailotim-db commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2053843556 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/DynamicPruning.scala: ## @@ -67,6 +70,17 @@ case class DynamicPruningSubquery( copy(

Re: [PR] [MINOR] Make `README.md` up-to-date [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun closed pull request #83: [MINOR] Make `README.md` up-to-date URL: https://github.com/apache/spark-connect-swift/pull/83

Re: [PR] [MINOR] Make `README.md` up-to-date [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun commented on PR #83: URL: https://github.com/apache/spark-connect-swift/pull/83#issuecomment-2820908892 Let me merge this because this is a minor README update.

Re: [PR] [SPARK-51826][K8S][DOCS] Update `YuniKorn` docs with `1.6.2` [spark]

2025-04-22 Thread via GitHub
dongjoon-hyun commented on PR #50617: URL: https://github.com/apache/spark/pull/50617#issuecomment-2820335277 According to the YuniKorn community's recommendation, I backported this documentation patch to `branch-4.0` via 4f65156ea5e57486f47c891dd640bef366e57b61.

Re: [PR] [SPARK-51529] Support `TLS` connections [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun commented on PR #82: URL: https://github.com/apache/spark-connect-swift/pull/82#issuecomment-2820528943 Thank you for verifying the TLS feature, @grundprinzip. Yes, the connection reuse and reconnection feature involves many implementation details, so I'm still working on it.

Re: [PR] [SPARK-51529] Support `TLS` connections [spark-connect-swift]

2025-04-22 Thread via GitHub
dongjoon-hyun closed pull request #82: [SPARK-51529] Support `TLS` connections URL: https://github.com/apache/spark-connect-swift/pull/82

[PR] [CONNECT][TESTS] Print stacktrace message when RootAllocator fails to close [spark]

2025-04-22 Thread via GitHub
LuciferYang opened a new pull request, #50663: URL: https://github.com/apache/spark/pull/50663 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-22 Thread via GitHub
vrozov commented on code in PR #50594: URL: https://github.com/apache/spark/pull/50594#discussion_r2054245540 ## core/src/test/scala/org/apache/spark/util/UninterruptibleThreadSuite.scala: ## @@ -115,6 +116,51 @@ class UninterruptibleThreadSuite extends SparkFunSuite { asse

[PR] [SPARK-51866][CONNECT][TESTS] Close `serializerAllocator` and `deserializerAllocator` when the creation of `CloseableIterator` by `ArrowEncoderSuite#roundTripWithDifferentIOEncoders` fails [spark

2025-04-22 Thread via GitHub
LuciferYang opened a new pull request, #50664: URL: https://github.com/apache/spark/pull/50664 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-04-22 Thread via GitHub
adrians commented on code in PR #50170: URL: https://github.com/apache/spark/pull/50170#discussion_r2054160474 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -298,6 +299,24 @@ object ReorderAssociativeOperator extends Rule[Logica

Re: [PR] [CONNECT][TESTS] Print stacktrace message when RootAllocator fails to close [spark]

2025-04-22 Thread via GitHub
LuciferYang commented on code in PR #50663: URL: https://github.com/apache/spark/pull/50663#discussion_r2054149578 ## sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala: ## @@ -99,40 +99,49 @@ class ArrowEncoderSuite extends C

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
aokolnychyi commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054406158 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala: ## @@ -18,11 +18,12 @@ package org.apache.spark.sql.catalyst.analysis

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-22 Thread via GitHub
LuciferYang commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2821775469 I found that it's not just the aforementioned daily test that has been affected. I noticed that after this pull request was merged into the branch-4.0 submission pipeline, the

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
gengliangwang commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054469701 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala: ## @@ -112,6 +117,27 @@ case class CheckConstraint( with TableConst

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
gengliangwang commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054485796 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -295,7 +295,16 @@ case class AlterTableCollation( ca

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
gengliangwang commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054483558 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala: ## @@ -112,6 +117,27 @@ case class CheckConstraint( with TableConst

Re: [PR] [CONNECT][TESTS] Print stacktrace message when RootAllocator fails to close [spark]

2025-04-22 Thread via GitHub
LuciferYang commented on code in PR #50663: URL: https://github.com/apache/spark/pull/50663#discussion_r2054188981 ## sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala: ## @@ -827,7 +827,8 @@ class ArrowEncoderSuite extends C

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-04-22 Thread via GitHub
adrians commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2821450242 > When there is no pushdown benefit, is there any downside to changing the physical execution from `ArrayContains` to `InSet`? I don't see any downsides: * in the worst case: si
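For illustration only, here is a simplified sketch of the kind of rewrite being discussed; it is a hypothetical Catalyst-style rule, not the implementation in PR #50170, and it ignores null elements, non-foldable arrays, and type-coercion details that the real change would have to handle:

```scala
import org.apache.spark.sql.catalyst.expressions.{ArrayContains, InSet, Literal}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.util.ArrayData
import org.apache.spark.sql.types.ArrayType

// Rewrites array_contains(<literal array>, value) into InSet(value, <literal elements>)
// so that set-membership evaluation (and potential pushdown) can kick in.
object RewriteArrayContainsToInSet extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
    case ArrayContains(Literal(arr: ArrayData, ArrayType(elementType, _)), value) =>
      // Elements stay in their Catalyst-internal representation, which is what InSet expects.
      val elements = arr.toObjectArray(elementType).toSet
      InSet(value, elements)
  }
}
```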

Re: [PR] [SPARK-51865][CONNECT][TESTS] Print stacktrace message when `RootAllocator` fails to close [spark]

2025-04-22 Thread via GitHub
LuciferYang commented on code in PR #50663: URL: https://github.com/apache/spark/pull/50663#discussion_r2054259506 ## sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala: ## @@ -827,7 +827,8 @@ class ArrowEncoderSuite extends C

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
aokolnychyi commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054450115 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala: ## @@ -46,20 +47,55 @@ object ResolveTableSpec extends Rule[LogicalPla

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
gengliangwang commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054452263 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala: ## @@ -18,11 +18,12 @@ package org.apache.spark.sql.catalyst.analys

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
aokolnychyi commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054455549 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala: ## @@ -112,6 +117,27 @@ case class CheckConstraint( with TableConstra

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
gengliangwang commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054458428 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala: ## @@ -46,20 +47,55 @@ object ResolveTableSpec extends Rule[LogicalP

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
aokolnychyi commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054457100 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala: ## @@ -112,6 +117,27 @@ case class CheckConstraint( with TableConstra

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
aokolnychyi commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054462490 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -295,7 +295,16 @@ case class AlterTableCollation( case

[PR] [WIP] [SPARK-51867][ML] Make scala model supporting save / load methods against local filesystem path [spark]

2025-04-22 Thread via GitHub
WeichenXu123 opened a new pull request, #50665: URL: https://github.com/apache/spark/pull/50665 ### What changes were proposed in this pull request? Make Scala models support save / load methods (developer API) against a local filesystem path. ### Why are the chan

Re: [PR] [SPARK-51711][ML][PYTHON][CONNECT] Propagates the active remote spark session to new threads to fix CrossValidator [spark]

2025-04-22 Thread via GitHub
xi-db commented on code in PR #50507: URL: https://github.com/apache/spark/pull/50507#discussion_r2054274968 ## python/pyspark/ml/connect/tuning.py: ## @@ -434,7 +434,7 @@ def _fit(self, dataset: Union[pd.DataFrame, DataFrame]) -> "CrossValidatorModel" tasks = _p

Re: [PR] [SPARK-51711][ML][PYTHON][CONNECT] Propagates the active remote spark session to new threads to fix CrossValidator [spark]

2025-04-22 Thread via GitHub
xi-db commented on PR #50507: URL: https://github.com/apache/spark/pull/50507#issuecomment-2821576150 > Have we tried just replacing SparkSession.getActiveSession() with SparkSession.active()? @zhengruifeng I've tested changing to SparkSession.active() in `tuning.py`, but it doesn'

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
AveryQi115 commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054646724 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2320,20 +2348,57 @@ class Analyzer(override val catalogManager: Catalog

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054653700 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2309,6 +2316,27 @@ class Analyzer(override val catalogManager: Catalo

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
AveryQi115 commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054652763 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2309,6 +2316,27 @@ class Analyzer(override val catalogManager: CatalogM

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054658908 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2309,6 +2316,27 @@ class Analyzer(override val catalogManager: Catalo

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
AveryQi115 commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054657899 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/DynamicPruning.scala: ## @@ -67,6 +68,17 @@ case class DynamicPruningSubquery( copy()

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
AveryQi115 commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054649412 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2320,20 +2348,57 @@ class Analyzer(override val catalogManager: Catalog

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
AveryQi115 commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054657286 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -228,6 +228,67 @@ trait CheckAnalysis extends LookupCatalog with Qu

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054660033 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -228,6 +228,67 @@ trait CheckAnalysis extends LookupCatalog with

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
aokolnychyi commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054660897 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala: ## @@ -46,20 +47,55 @@ object ResolveTableSpec extends Rule[LogicalPla

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054668941 ## sql/core/src/test/resources/sql-tests/analyzer-results/join-lateral.sql.out: ## @@ -1209,7 +1201,7 @@ Project [c1#x, c2#x, count(1)#xL] SELECT * FROM t1, LATER

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054672889 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -228,6 +228,67 @@ trait CheckAnalysis extends LookupCatalog with

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
aokolnychyi commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054663517 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala: ## @@ -46,20 +47,55 @@ object ResolveTableSpec extends Rule[LogicalPla

Re: [PR] [WIP] [SPARK-51867][ML] Make scala model supporting save / load methods against local filesystem path [spark]

2025-04-22 Thread via GitHub
WeichenXu123 commented on code in PR #50665: URL: https://github.com/apache/spark/pull/50665#discussion_r2055233400 ## mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala: ## @@ -233,7 +233,7 @@ private class InternalKMeansModelWriter extends MLWriterFormat with ML

Re: [PR] [WIP] [SPARK-51867][ML] Make scala model supporting save / load methods against local filesystem path [spark]

2025-04-22 Thread via GitHub
WeichenXu123 commented on code in PR #50665: URL: https://github.com/apache/spark/pull/50665#discussion_r2055232971 ## mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala: ## @@ -233,7 +233,7 @@ private class InternalKMeansModelWriter extends MLWriterFormat with ML

[PR] [SPARK-51873][ML] For OneVsRest algorithm, allow using save / load to replace cache [spark]

2025-04-22 Thread via GitHub
WeichenXu123 opened a new pull request, #50672: URL: https://github.com/apache/spark/pull/50672 ### What changes were proposed in this pull request? For the OneVsRest algorithm, allow using save / load to replace cache. ### Why are the changes needed? Dataframe persist

Re: [PR] [SPARK-51869][SS] Create classification for user errors within handleInputRows for Scala TransformWithState [spark]

2025-04-22 Thread via GitHub
anishshri-db commented on code in PR #50667: URL: https://github.com/apache/spark/pull/50667#discussion_r2055293705 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -5233,6 +5233,12 @@ ], "sqlState" : "XXKST" }, + "TRANSFORM_WITH_STATE_USER_FUN

Re: [PR] [SPARK-51869][SS] Create classification for user errors within handleInputRows for Scala TransformWithState [spark]

2025-04-22 Thread via GitHub
anishshri-db commented on code in PR #50667: URL: https://github.com/apache/spark/pull/50667#discussion_r2055295233 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -259,48 +260,62 @@ case class TransformWithStateExec( pr

Re: [PR] [SPARK-51869][SS] Create classification for user errors within handleInputRows for Scala TransformWithState [spark]

2025-04-22 Thread via GitHub
ericm-db commented on code in PR #50667: URL: https://github.com/apache/spark/pull/50667#discussion_r2055297035 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -259,48 +260,62 @@ case class TransformWithStateExec( privat

Re: [PR] [SPARK-51869][SS] Create classification for user errors within handleInputRows for Scala TransformWithState [spark]

2025-04-22 Thread via GitHub
anishshri-db commented on code in PR #50667: URL: https://github.com/apache/spark/pull/50667#discussion_r2055296936 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -259,48 +260,62 @@ case class TransformWithStateExec( pr

Re: [PR] [SPARK-51869][SS] Create classification for user errors within handleInputRows for Scala TransformWithState [spark]

2025-04-22 Thread via GitHub
ericm-db commented on code in PR #50667: URL: https://github.com/apache/spark/pull/50667#discussion_r2055304307 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -259,48 +260,62 @@ case class TransformWithStateExec( privat

Re: [PR] [SPARK-51869][SS] Create classification for user errors within handleInputRows for Scala TransformWithState [spark]

2025-04-22 Thread via GitHub
anishshri-db commented on code in PR #50667: URL: https://github.com/apache/spark/pull/50667#discussion_r2055303990 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -259,48 +260,62 @@ case class TransformWithStateExec( pr

Re: [PR] [SPARK-51869][SS] Create classification for user errors within handleInputRows for Scala TransformWithState [spark]

2025-04-22 Thread via GitHub
ericm-db commented on code in PR #50667: URL: https://github.com/apache/spark/pull/50667#discussion_r2055304909 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -259,48 +260,62 @@ case class TransformWithStateExec( privat

Re: [PR] [SPARK-51869][SS] Create classification for user errors within handleInputRows for Scala TransformWithState [spark]

2025-04-22 Thread via GitHub
anishshri-db commented on code in PR #50667: URL: https://github.com/apache/spark/pull/50667#discussion_r2055306871 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala: ## @@ -2400,6 +2554,170 @@ class TransformWithStateValidationSuite extends

Re: [PR] [SPARK-51869][SS] Create classification for user errors within handleInputRows for Scala TransformWithState [spark]

2025-04-22 Thread via GitHub
anishshri-db commented on code in PR #50667: URL: https://github.com/apache/spark/pull/50667#discussion_r2055307574 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala: ## @@ -2400,6 +2554,170 @@ class TransformWithStateValidationSuite extends

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-04-22 Thread via GitHub
adrians commented on code in PR #50170: URL: https://github.com/apache/spark/pull/50170#discussion_r2054148711 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -298,6 +299,24 @@ object ReorderAssociativeOperator extends Rule[Logica

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
aokolnychyi commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054398110 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -4070,6 +4070,12 @@ ], "sqlState" : "HV091" }, + "NON_DETERMINISTIC_CHECK_CONSTR

Re: [PR] [SPARK-51849][SQL] Refactoring `ResolveDDLCommandStringTypes` [spark]

2025-04-22 Thread via GitHub
cloud-fan commented on PR #50609: URL: https://github.com/apache/spark/pull/50609#issuecomment-2821764859 thanks, merging to master/4.0!

Re: [PR] [SPARK-51849][SQL] Refactoring `ResolveDDLCommandStringTypes` [spark]

2025-04-22 Thread via GitHub
cloud-fan closed pull request #50609: [SPARK-51849][SQL] Refactoring `ResolveDDLCommandStringTypes` URL: https://github.com/apache/spark/pull/50609

Re: [PR] [SPARK-50983][SQL]Part 1.a Add outer scope attributes for SubqueryExpression [spark]

2025-04-22 Thread via GitHub
AveryQi115 commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2054604668 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala: ## @@ -46,6 +46,10 @@ import org.apache.spark

Re: [PR] [SPARK-50983][SQL]Part 1.a Add outer scope attributes for SubqueryExpression [spark]

2025-04-22 Thread via GitHub
AveryQi115 commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2054616260 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala: ## @@ -46,6 +46,10 @@ import org.apache.spark

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
aokolnychyi commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054620282 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala: ## @@ -46,20 +47,55 @@ object ResolveTableSpec extends Rule[LogicalPla

Re: [PR] [SPARK-50983][SQL]Part 1.a Add outer scope attributes for SubqueryExpression [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2054617169 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala: ## @@ -46,6 +46,10 @@ import org.apache.spa

Re: [PR] [SPARK-50983][SQL]Part 1.a Add outer scope attributes for SubqueryExpression [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2054618132 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -412,12 +426,19 @@ case class ScalarSubquery( override def withNe

Re: [PR] [SPARK-51834][SQL] Support end-to-end table constraint management [spark]

2025-04-22 Thread via GitHub
aokolnychyi commented on code in PR #50631: URL: https://github.com/apache/spark/pull/50631#discussion_r2054622395 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -295,7 +295,16 @@ case class AlterTableCollation( case

Re: [PR] [SPARK-50983][SQL]Part 1.b Add analyzer support for nested correlated subqueries [spark]

2025-04-22 Thread via GitHub
AveryQi115 commented on code in PR #50548: URL: https://github.com/apache/spark/pull/50548#discussion_r2054623724 ## sql/core/src/test/resources/sql-tests/analyzer-results/join-lateral.sql.out: ## @@ -1209,7 +1201,7 @@ Project [c1#x, c2#x, count(1)#xL] SELECT * FROM t1, LATERAL

Re: [PR] [SPARK-50983][SQL]Part 1.a Add outer scope attributes for SubqueryExpression [spark]

2025-04-22 Thread via GitHub
vladimirg-db commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2054640056 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala: ## @@ -46,6 +46,10 @@ import org.apache.spa

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-22 Thread via GitHub
Ngone51 commented on code in PR #50594: URL: https://github.com/apache/spark/pull/50594#discussion_r2055342632 ## core/src/test/scala/org/apache/spark/util/UninterruptibleThreadSuite.scala: ## @@ -115,6 +116,45 @@ class UninterruptibleThreadSuite extends SparkFunSuite { ass

Re: [PR] [SPARK-47791][SQL][FOLLOWUP] Avoid invalid JDBC decimal scale [spark]

2025-04-22 Thread via GitHub
cloud-fan commented on code in PR #50673: URL: https://github.com/apache/spark/pull/50673#discussion_r2055373949 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -203,7 +203,11 @@ object JdbcUtils extends Logging with SQLConfHelpe
