Re: [PR] [SPARK-51096][SQL][TESTS] Splitting TransformWithStateSuite into UnsafeRow and Avro encoding suites [spark]

2025-02-06 Thread via GitHub
HeartSaVioR closed pull request #49815: [SPARK-51096][SQL][TESTS] Splitting TransformWithStateSuite into UnsafeRow and Avro encoding suites URL: https://github.com/apache/spark/pull/49815 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1944308460 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/SqlScriptingLogicalPlans.scala: ## @@ -405,3 +406,25 @@ case class ExceptionHandler(

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1944310207 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingParserSuite.scala: ## @@ -2413,6 +2413,21 @@ class SqlScriptingParserSuite extends Spa

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1944313734 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1023,105 @@ class ExceptionHandlerExec( override def re

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1944368452 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -579,23 +592,52 @@ case class AdaptiveSparkPlanExec( al

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1944369641 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -579,23 +593,52 @@ case class AdaptiveSparkPlanExec( al

Re: [PR] [SPARK-48881][SQL] Some dynamic partitions can be compensated to specific partition values [spark]

2025-02-06 Thread via GitHub
fusheng9399 closed pull request #47418: [SPARK-48881][SQL] Some dynamic partitions can be compensated to specific partition values URL: https://github.com/apache/spark/pull/47418

Re: [PR] [SPARK-51050][SQL][TESTS] Add group by alias tests to the group-by-alias.sql [spark]

2025-02-06 Thread via GitHub
MaxGekk commented on PR #49750: URL: https://github.com/apache/spark/pull/49750#issuecomment-2639300183 +1, LGTM. Merging to master/4.0. Thank you, @mihailoale-db and @vladimirg-db @beliefer for review.

Re: [PR] [SPARK-51105][ML][PYTHON][CONNECT][TESTS][4.0] Add parity test for ml functions [spark]

2025-02-06 Thread via GitHub
zhengruifeng closed pull request #49828: [SPARK-51105][ML][PYTHON][CONNECT][TESTS][4.0] Add parity test for ml functions URL: https://github.com/apache/spark/pull/49828

Re: [PR] [SPARK-51105][ML][PYTHON][CONNECT][TESTS] Add parity test for ml functions [spark]

2025-02-06 Thread via GitHub
zhengruifeng commented on PR #49824: URL: https://github.com/apache/spark/pull/49824#issuecomment-2639302102 merged to master

Re: [PR] [SPARK-51105][ML][PYTHON][CONNECT][TESTS][4.0] Add parity test for ml functions [spark]

2025-02-06 Thread via GitHub
zhengruifeng commented on PR #49828: URL: https://github.com/apache/spark/pull/49828#issuecomment-2639301393 merged to 4.0

[PR] [SPARK-51105][ML][PYTHON][CONNECT][TESTS][4.0] Add parity test for ml functions [spark]

2025-02-06 Thread via GitHub
zhengruifeng opened a new pull request, #49828: URL: https://github.com/apache/spark/pull/49828 cherry-pick https://github.com/apache/spark/pull/49824 to 4.0

Re: [PR] [SPARK-51109][SQL] CTE in subquery expression as grouping column [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on PR #49829: URL: https://github.com/apache/spark/pull/49829#issuecomment-2639312458 cc @peter-toth @gengliangwang

[PR] [SPARK-51109][SQL] CTE in subquery expression as grouping column [spark]

2025-02-06 Thread via GitHub
cloud-fan opened a new pull request, #49829: URL: https://github.com/apache/spark/pull/49829 ### What changes were proposed in this pull request? This is a long-standing problem. With the GROUP BY ordinal feature, it's quite easy for users to write a complicated expression as

Re: [PR] [SPARK-48353][SQL] Introduction of Exception Handling mechanism in SQL Scripting [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on code in PR #49427: URL: https://github.com/apache/spark/pull/49427#discussion_r1944276204 ## sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala: ## @@ -65,6 +69,557 @@ class SqlScriptingExecutionSuite extends QueryTes

Re: [PR] [SPARK-51096][SQL] Splitting TransformWithStateSuite into UnsafeRow and Avro encoding suites [spark]

2025-02-06 Thread via GitHub
HeartSaVioR commented on PR #49815: URL: https://github.com/apache/spark/pull/49815#issuecomment-2639113028 The CI failure is not relevant: it only failed in pyspark-pandas, and this PR only touched "Scala tests".

Re: [PR] [SPARK-51096][SQL][TESTS] Splitting TransformWithStateSuite into UnsafeRow and Avro encoding suites [spark]

2025-02-06 Thread via GitHub
HeartSaVioR commented on PR #49815: URL: https://github.com/apache/spark/pull/49815#issuecomment-2639113710 Thanks! Merging to master/4.0.

Re: [PR] [SPARK-51107][LAUNCHER] Minor polish for CommandBuilderUtils class. [spark]

2025-02-06 Thread via GitHub
RocMarshal commented on code in PR #49826: URL: https://github.com/apache/spark/pull/49826#discussion_r1944349098 ## launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java: ## @@ -46,24 +47,15 @@ static boolean isEmpty(String s) { /** Joins a list of stri

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1944355476 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -579,23 +592,52 @@ case class AdaptiveSparkPlanExec( al

Re: [PR] [SPARK-48353][SQL] Introduction of Exception Handling mechanism in SQL Scripting [spark]

2025-02-06 Thread via GitHub
miland-db commented on code in PR #49427: URL: https://github.com/apache/spark/pull/49427#discussion_r1944352830 ## sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala: ## @@ -65,6 +69,557 @@ class SqlScriptingExecutionSuite extends QueryTest

Re: [PR] [SPARK-48353][SQL] Introduction of Exception Handling mechanism in SQL Scripting [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49427: URL: https://github.com/apache/spark/pull/49427#discussion_r1944387771 ## sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala: ## @@ -65,6 +69,557 @@ class SqlScriptingExecutionSuite extends QueryTest

Re: [PR] [SPARK-48353][SQL] Introduction of Exception Handling mechanism in SQL Scripting [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on code in PR #49427: URL: https://github.com/apache/spark/pull/49427#discussion_r1944392967 ## sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala: ## @@ -65,6 +69,557 @@ class SqlScriptingExecutionSuite extends QueryTes

Re: [PR] [SPARK-51105][ML][PYTHON][CONNECT][TESTS] Add parity test for ml functions [spark]

2025-02-06 Thread via GitHub
zhengruifeng closed pull request #49824: [SPARK-51105][ML][PYTHON][CONNECT][TESTS] Add parity test for ml functions URL: https://github.com/apache/spark/pull/49824

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-06 Thread via GitHub
miland-db commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1944520674 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/SqlScriptingLogicalPlans.scala: ## @@ -405,3 +406,25 @@ case class ExceptionHandler(

Re: [PR] [SPARK-51010][SQL] Fix AlterColumnSpec not reporting resolved status correctly [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on PR #49705: URL: https://github.com/apache/spark/pull/49705#issuecomment-2639517537 good catch! merging to master/4.0

Re: [PR] [SPARK-51010][SQL] Fix AlterColumnSpec not reporting resolved status correctly [spark]

2025-02-06 Thread via GitHub
cloud-fan closed pull request #49705: [SPARK-51010][SQL] Fix AlterColumnSpec not reporting resolved status correctly URL: https://github.com/apache/spark/pull/49705

[PR] [SPARK-48881][SQL] Add FillStaticPartitions optimization rule to fill the static partitions of the InsertIntoHadoopFsRelation command [spark]

2025-02-06 Thread via GitHub
fusheng9399 opened a new pull request, #49830: URL: https://github.com/apache/spark/pull/49830 ### What changes were proposed in this pull request? When writing dynamic partitions, some dynamic partitions in InsertIntoHadoo

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-06 Thread via GitHub
ulysses-you commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1944256813 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -344,56 +350,50 @@ case class AdaptiveSparkPlanExec(

Re: [PR] [SPARK-49968][SQL] The split function produces incorrect results with an empty regex and a limit [spark]

2025-02-06 Thread via GitHub
beliefer commented on PR #48470: URL: https://github.com/apache/spark/pull/48470#issuecomment-2639099089 Please investigate more databases; then we can decide which behavior is more suitable.

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1944314508 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1023,105 @@ class ExceptionHandlerExec( override def re

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1944498572 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/SqlScriptingLogicalPlans.scala: ## @@ -405,3 +406,25 @@ case class ExceptionHandler(

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-06 Thread via GitHub
MaxGekk commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1944495189 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3602,6 +3602,12 @@ ], "sqlState" : "42K09" }, + "INVALID_VARIABLE_TYPE_FOR_SIGNAL_S

Re: [PR] [SPARK-51107][CORE] Minor polish for CommandBuilderUtils class. [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on code in PR #49826: URL: https://github.com/apache/spark/pull/49826#discussion_r1944500609 ## launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java: ## @@ -46,24 +47,15 @@ static boolean isEmpty(String s) { /** Joins a list of str

Re: [PR] [SPARK-50982][SQL] Support more SQL/DataFrame read path functionality in single-pass Analyzer [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on code in PR #49658: URL: https://github.com/apache/spark/pull/49658#discussion_r1944266007 ## sql/core/src/test/scala/org/apache/spark/sql/analysis/resolver/ViewResolverSuite.scala: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] [SPARK-51107][LAUNCHER] Minor polish for CommandBuilderUtils class. [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on code in PR #49826: URL: https://github.com/apache/spark/pull/49826#discussion_r1944297105 ## launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java: ## @@ -46,24 +47,15 @@ static boolean isEmpty(String s) { /** Joins a list of str

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-06 Thread via GitHub
miland-db commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1944467073 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -1020,3 +1023,105 @@ class ExceptionHandlerExec( override def re

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-06 Thread via GitHub
miland-db commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1944468496 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingParserSuite.scala: ## @@ -2413,6 +2413,21 @@ class SqlScriptingParserSuite extends Spa

Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]

2025-02-06 Thread via GitHub
miland-db commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1944478910 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/SqlScriptingLogicalPlans.scala: ## @@ -405,3 +406,25 @@ case class ExceptionHandler(

[PR] [SPARK-51108][INFRA] Install Python packages for `yarn` module in `maven_test.yml` [spark]

2025-02-06 Thread via GitHub
LuciferYang opened a new pull request, #49827: URL: https://github.com/apache/spark/pull/49827 ### What changes were proposed in this pull request? This PR aims to add a step that installs Python packages for the `yarn` module in `maven_test.yml`. ### Why are the changes needed? Synchroni

Re: [PR] [SPARK-51108][INFRA] Install Python packages for `yarn` module in `maven_test.yml` [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on code in PR #49827: URL: https://github.com/apache/spark/pull/49827#discussion_r1944327611 ## .github/workflows/maven_test.yml: ## @@ -176,7 +176,7 @@ jobs: python-version: '3.11' architecture: x64 - name: Install Python packa

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1944332029 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -268,25 +276,23 @@ case class AdaptiveSparkPlanExec( def fi

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1944332985 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -523,15 +516,36 @@ case class AdaptiveSparkPlanExec( /**

Re: [PR] [SPARK-50982][SQL] Support more SQL/DataFrame read path functionality in single-pass Analyzer [spark]

2025-02-06 Thread via GitHub
vladimirg-db commented on code in PR #49658: URL: https://github.com/apache/spark/pull/49658#discussion_r1944333768 ## sql/core/src/test/scala/org/apache/spark/sql/analysis/resolver/ViewResolverSuite.scala: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-50982][SQL] Support more SQL/DataFrame read path functionality in single-pass Analyzer [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on code in PR #49658: URL: https://github.com/apache/spark/pull/49658#discussion_r1944338526 ## sql/core/src/test/scala/org/apache/spark/sql/analysis/resolver/ViewResolverSuite.scala: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1944342120 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -344,56 +350,50 @@ case class AdaptiveSparkPlanExec( if

Re: [PR] [SPARK-50881][PYTHON] Use cached schema where possible in conenct dataframe.py [spark]

2025-02-06 Thread via GitHub
zhengruifeng commented on PR #49749: URL: https://github.com/apache/spark/pull/49749#issuecomment-2639347007
> Failed tests dont seem relevant
Please rebase this PR onto the latest master to make sure CI is green.

Re: [PR] [SPARK-51050][SQL][TESTS] Add group by alias tests to the group-by-alias.sql [spark]

2025-02-06 Thread via GitHub
MaxGekk closed pull request #49750: [SPARK-51050][SQL][TESTS] Add group by alias tests to the group-by-alias.sql URL: https://github.com/apache/spark/pull/49750

Re: [PR] [SPARK-51101][ML][PYTHON][CONNECT][TESTS] Add doctest for `pyspark.ml.connect.functions` [spark]

2025-02-06 Thread via GitHub
zhengruifeng commented on PR #49821: URL: https://github.com/apache/spark/pull/49821#issuecomment-2639378260 cc @HyukjinKwon would you mind taking another look?

Re: [PR] [SPARK-50605][CONNECT] Support SQL API mode for easier migration to Spark Connect [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on code in PR #49107: URL: https://github.com/apache/spark/pull/49107#discussion_r1945271197 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala: ## @@ -237,6 +265,16 @@ class YarnClusterSuite extends BaseYarnCluster

Re: [PR] [SPARK-51042][SQL] Read and write the month and days fields of intervals with one call in Unsafe* classes [spark]

2025-02-06 Thread via GitHub
jonathan-albrecht-ibm commented on PR #49737: URL: https://github.com/apache/spark/pull/49737#issuecomment-2640748852 @MaxGekk Thanks for reviewing and merging!

Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]

2025-02-06 Thread via GitHub
vladimirg-db commented on PR #49837: URL: https://github.com/apache/spark/pull/49837#issuecomment-2641040587 Please specify in the PR description that this is for the single-pass Analyzer.

Re: [PR] [SPARK-51093][SQL][TESTS] Fix minor endianness issues in tests. [spark]

2025-02-06 Thread via GitHub
jonathan-albrecht-ibm commented on PR #49812: URL: https://github.com/apache/spark/pull/49812#issuecomment-2640876133 Thanks @MaxGekk, I didn't know I could do that. I reran the failing build and it passed this time.

Re: [PR] [SPARK-51092][SQL] FlatMapGroups: Change the `dataType` of the `timestampTimeoutAttribute` from IntegerType to LongType [spark]

2025-02-06 Thread via GitHub
jonathan-albrecht-ibm commented on PR #49811: URL: https://github.com/apache/spark/pull/49811#issuecomment-2640906566 @HeartSaVioR thanks for reviewing. I understand that a schema change could break backwards compatibility. I'd just like to mention in case it helps that all of the tests, ex

Re: [PR] [SPARK-51042][SQL] Read and write the month and days fields of intervals with one call in Unsafe* classes [spark]

2025-02-06 Thread via GitHub
jonathan-albrecht-ibm commented on PR #49737: URL: https://github.com/apache/spark/pull/49737#issuecomment-2640820183 @MaxGekk Would it be possible to merge this to branch-3.5 as well? I have also built and tested it on a local build of 3.5.4. More generally, should I mention that I'd

Re: [PR] [SPARK-51065][SQL] Disallowing non-nullable schema when Avro encoding is used for TransformWithState [spark]

2025-02-06 Thread via GitHub
ericm-db commented on code in PR #49751: URL: https://github.com/apache/spark/pull/49751#discussion_r1945307417 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -364,17 +364,20 @@ class DriverStatefulProcessorHandleImpl

Re: [PR] [SPARK-51065][SQL] Disallowing non-nullable schema when Avro encoding is used for TransformWithState [spark]

2025-02-06 Thread via GitHub
ericm-db commented on code in PR #49751: URL: https://github.com/apache/spark/pull/49751#discussion_r1945307892 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -1470,6 +1470,39 @@ def check_exception(error): check_exception=chec

Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]

2025-02-06 Thread via GitHub
sririshindra commented on PR #49814: URL: https://github.com/apache/spark/pull/49814#issuecomment-2640817650 > The '__' in the PR description suggest the example was created with an older version of the PR, right? > > I.e. the `SPARK_DRIVER__application` in this line: > > ```

Re: [PR] [SPARK-51065][SQL] Disallowing non-nullable schema when Avro encoding is used for TransformWithState [spark]

2025-02-06 Thread via GitHub
anishshri-db commented on code in PR #49751: URL: https://github.com/apache/spark/pull/49751#discussion_r1945369374 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StateStoreColumnFamilySchemaUtils.scala: ## @@ -47,7 +47,7 @@ object StateStoreColumnFamilySche

Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]

2025-02-06 Thread via GitHub
vladimirg-db commented on code in PR #49837: URL: https://github.com/apache/spark/pull/49837#discussion_r1945454956 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -51,27 +55,39 @@ object PullOutNondeterministic extends

[PR] [SPARK-51104][DOC][FollowUp] Self-host docsearch.min.css in Spark website [spark]

2025-02-06 Thread via GitHub
gengliangwang opened a new pull request, #49838: URL: https://github.com/apache/spark/pull/49838 ### What changes were proposed in this pull request? Follow-up of https://github.com/apache/spark/pull/49823, we need to self-host docsearch.min.css in Spark website as well

Re: [PR] [SPARK-51097] [SS] Adding partition-level metrics for last uploaded snapshot version in RocksDB [spark]

2025-02-06 Thread via GitHub
liviazhu-db commented on PR #49816: URL: https://github.com/apache/spark/pull/49816#issuecomment-2641089134 I thought we were creating a new metric for the snapshot delta lag, not the last uploaded version? Did I miss something? Why did we change this? Now we need to manually compute the la

Re: [PR] [MINOR][SQL] Remove unused TableSchema constructor [spark]

2025-02-06 Thread via GitHub
the-sakthi closed pull request #49645: [MINOR][SQL] Remove unused TableSchema constructor URL: https://github.com/apache/spark/pull/49645

Re: [PR] [MINOR][SQL] Remove unused TableSchema constructor [spark]

2025-02-06 Thread via GitHub
the-sakthi commented on PR #49645: URL: https://github.com/apache/spark/pull/49645#issuecomment-2641097470 Closing this in favor of the above comment.

[PR] [SPARK-50075][FOLLOWUP][DOCS] Add table-valued function API docs [spark]

2025-02-06 Thread via GitHub
ueshin opened a new pull request, #49839: URL: https://github.com/apache/spark/pull/49839 ### What changes were proposed in this pull request? Adds table-valued function API docs. ### Why are the changes needed? Some of the API docs related to table-valued functions are m

Re: [PR] [SQL][SPARK-51113] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]

2025-02-06 Thread via GitHub
vladimirg-db commented on PR #49835: URL: https://github.com/apache/spark/pull/49835#issuecomment-2641054365 I will add more tests for `EXECUTE IMMEDIATE`.

Re: [PR] [SPARK-50639][SQL] Improve warning logging in CacheManager [spark]

2025-02-06 Thread via GitHub
yangguoaws commented on code in PR #49276: URL: https://github.com/apache/spark/pull/49276#discussion_r1944674922 ## sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala: ## @@ -126,7 +126,9 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-02-06 Thread via GitHub
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1944679337 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -266,22 +275,46 @@ trait ColumnResolutionHelper extends L

Re: [PR] [SPARK-50605][CONNECT] Support SQL API mode for easier migration to Spark Connect [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on code in PR #49107: URL: https://github.com/apache/spark/pull/49107#discussion_r1945123620 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala: ## @@ -237,6 +265,16 @@ class YarnClusterSuite extends BaseYarnCluster

Re: [PR] [SQL][SPARK-51113] Fix correctness with UNION/EXCEPT/INTERSECT inside a view [spark]

2025-02-06 Thread via GitHub
vladimirg-db commented on PR #49835: URL: https://github.com/apache/spark/pull/49835#issuecomment-2640573487 @dtenedor @cloud-fan any ideas why the SQL Parser behaves like that in the first place? I mean, I found a fix (it kinda works the same as what we do for regular queries), but not sur

[PR] [SPARK-51112][CONNECT] Avoid using pyarrow's `to_pandas` on an empty table [spark]

2025-02-06 Thread via GitHub
vicennial opened a new pull request, #49834: URL: https://github.com/apache/spark/pull/49834 ### What changes were proposed in this pull request? When the `pyarrow` table is empty, avoid calling the `to_pandas` method due to potential segfault failures. Instead, an empty panda

Re: [PR] [SPARK-51107][CORE] Refactor CommandBuilderUtils#join to reuse the lines about strings join and reduce the redundant lines. [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on PR #49826: URL: https://github.com/apache/spark/pull/49826#issuecomment-2640043258 Although the failure should not be related to the current PR, could you please retrigger the two failed GA tasks? Thanks, @RocMarshal.

[PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]

2025-02-06 Thread via GitHub
mihailoale-db opened a new pull request, #49837: URL: https://github.com/apache/spark/pull/49837 ### What changes were proposed in this pull request? Refactor the `PullOutNondeterministic` rule. ### Why are the changes needed? Better reusability of the `PullOutNondeterministic` rule.

Re: [PR] [SPARK-51104][DOC] Self-host JavaScript and CSS in Spark website [spark]

2025-02-06 Thread via GitHub
nchammas commented on PR #49823: URL: https://github.com/apache/spark/pull/49823#issuecomment-2640232712 > @nchammas This question isn't related to this PR, but I couldn't build the PySpark API docs on macOS using the following command: > > ``` > SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQ

Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]

2025-02-06 Thread via GitHub
sririshindra commented on code in PR #49814: URL: https://github.com/apache/spark/pull/49814#discussion_r1944966711 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -722,6 +722,10 @@ class SparkContext(config: SparkConf) extends Logging { } appStatusSo

Re: [PR] [SPARK-49698][CONNECT][SQL] Add ClassicOnly annotation for classic only methods. [spark]

2025-02-06 Thread via GitHub
hvanhovell commented on PR #49801: URL: https://github.com/apache/spark/pull/49801#issuecomment-2640268767 Merging to master/4.0

[PR] [SPARK-48881][SQL] Add FillStaticPartitions optimization rule [spark]

2025-02-06 Thread via GitHub
fusheng9399 opened a new pull request, #49832: URL: https://github.com/apache/spark/pull/49832 ### What changes were proposed in this pull request? When writing dynamic partitions, some dynamic partitions in InsertIntoHadoopFsRelationCommand can be compensated to specific part

Re: [PR] [SPARK-43415][CONNECT][SQL] Implement `KVGDS.agg` with custom `mapValues` function [spark]

2025-02-06 Thread via GitHub
asfgit closed pull request #49111: [SPARK-43415][CONNECT][SQL] Implement `KVGDS.agg` with custom `mapValues` function URL: https://github.com/apache/spark/pull/49111

Re: [PR] [SPARK-50639][SQL] Improve warning logging in CacheManager [spark]

2025-02-06 Thread via GitHub
yangguoaws commented on code in PR #49276: URL: https://github.com/apache/spark/pull/49276#discussion_r1944708425 ## sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala: ## @@ -206,7 +209,10 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelpe

Re: [PR] [SPARK-51057][SS] Remove scala option based variant API for value state [spark]

2025-02-06 Thread via GitHub
HeartSaVioR commented on PR #49769: URL: https://github.com/apache/spark/pull/49769#issuecomment-2639693771 Thanks! Merging to master/4.0.

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-02-06 Thread via GitHub
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1944727379 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala: ## @@ -73,28 +94,39 @@ class ResolveCatalogs(val catalogManager: Catal

[PR] [SPARK-51111][SS] Avoid consumer rebalancing stuck when starting a spark streaming job [spark]

2025-02-06 Thread via GitHub
yabola opened a new pull request, #49831: URL: https://github.com/apache/spark/pull/49831 (no comment)

Re: [PR] [SPARK-48881][SQL] Add FillStaticPartitions optimization rule [spark]

2025-02-06 Thread via GitHub
fusheng9399 closed pull request #49830: [SPARK-48881][SQL] Add FillStaticPartitions optimization rule URL: https://github.com/apache/spark/pull/49830

Re: [PR] [SPARK-51108][INFRA] Install Python packages for `yarn` module in `maven_test.yml` [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on code in PR #49827: URL: https://github.com/apache/spark/pull/49827#discussion_r1945128803 ## .github/workflows/maven_test.yml: ## @@ -176,7 +176,7 @@ jobs: python-version: '3.11' architecture: x64 - name: Install Python packa

Re: [PR] [SPARK-50854][SS] Make path fully qualified before passing it to FileStreamSink [spark]

2025-02-06 Thread via GitHub
vrozov commented on PR #49654: URL: https://github.com/apache/spark/pull/49654#issuecomment-2640513821 > Though I don't like to make any behavior change as it is not justified that this will help any arbitrary setup. Could you please add the sink option in FileStreamSink, like "qualifyRelat

Re: [PR] [SPARK-51093][SQL][TESTS] Fix minor endianness issues in tests. [spark]

2025-02-06 Thread via GitHub
MaxGekk commented on PR #49812: URL: https://github.com/apache/spark/pull/49812#issuecomment-2640546139 > So its not related to this change. Highly likely you are right, but this particular GitHub action stopped and didn't run the rest tests that might related to your changes. May I a

Re: [PR] [SPARK-51097] [SS] Adding partition-level metrics for last uploaded snapshot version in RocksDB [spark]

2025-02-06 Thread via GitHub
zecookiez commented on PR #49816: URL: https://github.com/apache/spark/pull/49816#issuecomment-2641110063 This was more of an offline discussion, but I think the delta is implied through oldest snapshots as well, so we can also use this to identify problematic partitions

Re: [PR] [SPARK-51104][DOC][FollowUp] Self-host docsearch.min.css in Spark website [spark]

2025-02-06 Thread via GitHub
gengliangwang commented on PR #49838: URL: https://github.com/apache/spark/pull/49838#issuecomment-2641143881 @viirya thanks for the reviews. I am merging this one to master/branch-4.0/branch-3.5

Re: [PR] [SPARK-51110][CORE][SQL] Proper error handling for file status when reading files [spark]

2025-02-06 Thread via GitHub
fusheng9399 commented on code in PR #49833: URL: https://github.com/apache/spark/pull/49833#discussion_r1945962916 ## core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala: ## @@ -250,8 +250,15 @@ private[spark] object HadoopFSUtils extends Logging { "method

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1945970874 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -344,56 +350,50 @@ case class AdaptiveSparkPlanExec( if

Re: [PR] [SPARK-51108][INFRA] Install Python packages for `yarn` module in `maven_test.yml` [spark]

2025-02-06 Thread via GitHub
LuciferYang commented on PR #49827: URL: https://github.com/apache/spark/pull/49827#issuecomment-2641974717 Thanks @HyukjinKwon

Re: [PR] [SPARK-51104][DOC] Self-host JavaScript and CSS in Spark website [spark]

2025-02-06 Thread via GitHub
panbingkun commented on PR #49823: URL: https://github.com/apache/spark/pull/49823#issuecomment-2642050734 LGTM late.

Re: [PR] [SPARK-50655][SS] Move virtual col family related mapping into db layer instead of encoder [spark]

2025-02-06 Thread via GitHub
anishshri-db commented on code in PR #49304: URL: https://github.com/apache/spark/pull/49304#discussion_r1946060254 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -931,7 +941,8 @@ case class RocksDBCheckpointMetadata(

Re: [PR] [SPARK-50655][SS] Move virtual col family related mapping into db layer instead of encoder [spark]

2025-02-06 Thread via GitHub
anishshri-db commented on code in PR #49304: URL: https://github.com/apache/spark/pull/49304#discussion_r1946059793 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -723,20 +920,43 @@ class RocksDB( iter.seekToFirst()

Re: [PR] [SPARK-51065][SQL] Disallowing non-nullable schema when Avro encoding is used for TransformWithState [spark]

2025-02-06 Thread via GitHub
HeartSaVioR commented on code in PR #49751: URL: https://github.com/apache/spark/pull/49751#discussion_r1945987644 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -363,18 +363,36 @@ class DriverStatefulProcessorHandleI

Re: [PR] [SPARK-51065][SQL] Disallowing non-nullable schema when Avro encoding is used for TransformWithState [spark]

2025-02-06 Thread via GitHub
HeartSaVioR commented on code in PR #49751: URL: https://github.com/apache/spark/pull/49751#discussion_r1945998917 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -149,8 +149,11 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-50605][CONNECT][FOLLOW-UP] Install dependencies for Yarn tests in Maven build [spark]

2025-02-06 Thread via GitHub
HyukjinKwon commented on PR #49845: URL: https://github.com/apache/spark/pull/49845#issuecomment-2642022862 @LuciferYang made a PR first :-) https://github.com/apache/spark/pull/49827

Re: [PR] [SPARK-50605][CONNECT][FOLLOW-UP] Install dependencies for Yarn tests in Maven build [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on PR #49845: URL: https://github.com/apache/spark/pull/49845#issuecomment-2642013276 This is not the right fix?

Re: [PR] [SPARK-51067][SQL] Revert session level collation as object level collation will be used instead [spark]

2025-02-06 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1945991404 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultStringTypes.scala: ## @@ -47,7 +46,7 @@ object ResolveDefaultStringTypes extends R
