Re: [PR] [SPARK-51175][CORE] Make `Master` show elapsed time when removing drivers [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49903: URL: https://github.com/apache/spark/pull/49903#issuecomment-2652890785 Merged to master.

Re: [PR] [SPARK-51175][CORE] Make `Master` show elapsed time when removing drivers [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun closed pull request #49903: [SPARK-51175][CORE] Make `Master` show elapsed time when removing drivers URL: https://github.com/apache/spark/pull/49903

Re: [PR] [SPARK-51141][ML][CONNECT][DOCS] Document the Support of ML on Connect [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49902: URL: https://github.com/apache/spark/pull/49902#issuecomment-2652889523 Merged to master/4.0. Thank you, @zhengruifeng and @HyukjinKwon .

Re: [PR] [SPARK-51141][ML][CONNECT][DOCS] Document the Support of ML on Connect [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun closed pull request #49902: [SPARK-51141][ML][CONNECT][DOCS] Document the Support of ML on Connect URL: https://github.com/apache/spark/pull/49902

Re: [PR] [SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on code in PR #49835: URL: https://github.com/apache/spark/pull/49835#discussion_r1952121444 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5554,6 +5554,18 @@ object SQLConf { .booleanConf .createWithDefault(true)

Re: [PR] [SPARK-51097] [SS] Adding state store instance metrics for last uploaded snapshot version in RocksDB [spark]

2025-02-11 Thread via GitHub
HeartSaVioR commented on code in PR #49816: URL: https://github.com/apache/spark/pull/49816#discussion_r1952059417 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2251,6 +2251,18 @@ object SQLConf { .booleanConf .createWithDefault

Re: [PR] [SPARK-51175][CORE] Make `Master` show elapsed time when removing drivers [spark]

2025-02-11 Thread via GitHub
HyukjinKwon commented on PR #49903: URL: https://github.com/apache/spark/pull/49903#issuecomment-2652759158 Let's fill the PR description tho :-).

Re: [PR] [SPARK-50126][PYTHON][CONNECT][3.5] PySpark expr() (expression) SQL Function returns None in Spark Connect [spark]

2025-02-11 Thread via GitHub
the-sakthi commented on PR #49755: URL: https://github.com/apache/spark/pull/49755#issuecomment-2652838021 Thank you, all!

Re: [PR] [SPARK-51163][BUILD] Exclude duplicated jars from connect-repl [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49892: URL: https://github.com/apache/spark/pull/49892#issuecomment-2652832694 It seems that the PR description is outdated. When I compare the AS-IS PR, I get the following. Could you double-check and make the PR description up-to-date, please, @pan3793 ?

Re: [PR] [SPARK-50126][PYTHON][CONNECT][3.5] PySpark expr() (expression) SQL Function returns None in Spark Connect [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49755: URL: https://github.com/apache/spark/pull/49755#issuecomment-2652804846 I fully understand how you feel and thank you for saying like that. I'd recommend to use Apache Spark 4.0.0 because it will arrive this week (before Apache Spark 3.5.5). For t

Re: [PR] [SPARK-51110][CORE][SQL] Proper error handling for file status when reading files [spark]

2025-02-11 Thread via GitHub
fusheng9399 commented on PR #49833: URL: https://github.com/apache/spark/pull/49833#issuecomment-2652818709 Could you please help me review the code again? @MaxGekk

Re: [PR] [SPARK-51152][SQL] Add richer examples for the get_json_object function [spark]

2025-02-11 Thread via GitHub
fusheng9399 commented on code in PR #49875: URL: https://github.com/apache/spark/pull/49875#discussion_r1952078550 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala: ## @@ -42,6 +42,10 @@ import org.apache.spark.unsafe.types.UTF8Strin

Re: [PR] [SPARK-51152][SQL] Add richer examples for the get_json_object function [spark]

2025-02-11 Thread via GitHub
fusheng9399 commented on code in PR #49875: URL: https://github.com/apache/spark/pull/49875#discussion_r1952070180 ## python/pyspark/sql/functions/builtin.py: ## @@ -20115,11 +20115,24 @@ def get_json_object(col: "ColumnOrName", path: str) -> Column: Examples --

Re: [PR] [SPARK-51152][SQL] Add richer examples for the get_json_object function [spark]

2025-02-11 Thread via GitHub
fusheng9399 commented on code in PR #49875: URL: https://github.com/apache/spark/pull/49875#discussion_r1952069454 ## python/pyspark/sql/functions/builtin.py: ## @@ -20115,11 +20115,24 @@ def get_json_object(col: "ColumnOrName", path: str) -> Column: Examples --
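For readers following the get_json_object review threads above, the following is a minimal, hypothetical PySpark sketch of the kind of usage being documented; it is not the exact example added in PR #49875, and the column names and sample JSON are illustrative assumptions.

```python
# Minimal sketch, assuming a local SparkSession; the data and column names
# are illustrative only, not taken from PR #49875.
from pyspark.sql import SparkSession
from pyspark.sql.functions import get_json_object

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame(
    [("1", '{"f1": "value1", "f2": [1, 2, 3]}')], ["key", "jstring"])

df.select(
    get_json_object("jstring", "$.f1").alias("f1"),          # scalar field -> 'value1'
    get_json_object("jstring", "$.f2[0]").alias("f2_first"),  # JSON array element -> '1'
).show()
```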

[PR] Test arrow 18.2.0 [spark]

2025-02-11 Thread via GitHub
LuciferYang opened a new pull request, #49904: URL: https://github.com/apache/spark/pull/49904 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-50126][PYTHON][CONNECT][3.5] PySpark expr() (expression) SQL Function returns None in Spark Connect [spark]

2025-02-11 Thread via GitHub
the-sakthi commented on PR #49755: URL: https://github.com/apache/spark/pull/49755#issuecomment-2652790860 Thanks for taking time to review and leave feedback @dongjoon-hyun and @zhengruifeng Reasons why I thought of this more as a bug and less as an improvement: - Actually I ran

Re: [PR] [SPARK-51163][BUILD] Exclude duplicated jars from connect-repl [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on code in PR #49892: URL: https://github.com/apache/spark/pull/49892#discussion_r1952059048 ## pom.xml: ## @@ -1136,7 +1136,7 @@ org.scala-lang.modules -scala-xml_2.13 +scala-xml_${scala.binary.v

Re: [PR] [SPARK-50126][PYTHON][CONNECT][3.5] PySpark expr() (expression) SQL Function returns None in Spark Connect [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49755: URL: https://github.com/apache/spark/pull/49755#issuecomment-2652783632 For `Spark Connect` perspective, Apache Spark 3.5's Connect is known to be premature when we compare with Apache Spark 4.0's Connect module, isn't it? As you pointed out, there are

Re: [PR] [SPARK-50126][PYTHON][CONNECT][3.5] PySpark expr() (expression) SQL Function returns None in Spark Connect [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49755: URL: https://github.com/apache/spark/pull/49755#issuecomment-2652779466 Well, I disagree with the idea of backporting. Especially, we should be more conservative because Apache Spark 3.5.x is a long-term support version. To @zhengruifeng , please

Re: [PR] [SPARK-51163][BUILD] Exclude duplicated jars from connect-repl [spark]

2025-02-11 Thread via GitHub
pan3793 commented on code in PR #49892: URL: https://github.com/apache/spark/pull/49892#discussion_r1952053090 ## pom.xml: ## @@ -1136,7 +1136,7 @@ org.scala-lang.modules -scala-xml_2.13 +scala-xml_${scala.binary.version

Re: [PR] [SPARK-51163][BUILD] Exclude duplicated jars from connect-repl [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on code in PR #49892: URL: https://github.com/apache/spark/pull/49892#discussion_r1952049761 ## pom.xml: ## @@ -1136,7 +1136,7 @@ org.scala-lang.modules -scala-xml_2.13 +scala-xml_${scala.binary.v

Re: [PR] [SPARK-51163][BUILD] Exclude duplicated jars from connect-repl [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49892: URL: https://github.com/apache/spark/pull/49892#issuecomment-2652766699 Thank you for updating the PR, @pan3793 ! 👍🏻

Re: [PR] [SPARK-51175][CORE] Make `Master` show elapsed time when removing drivers [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49903: URL: https://github.com/apache/spark/pull/49903#issuecomment-2652761374 Thank you, @HyukjinKwon . Yes, it's filled now~ :)

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-11 Thread via GitHub
HeartSaVioR commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1952035735 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## Review Comment: Again, please highlight if there was also a code change other than c

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-11 Thread via GitHub
HeartSaVioR commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1951994759 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -67,6 +67,8 @@ def conf(cls): ) cfg.set("spark.sql.execution.arrow.

Re: [PR] [SPARK-51163][BUILD] Exclude duplicated jars from connect-repl [spark]

2025-02-11 Thread via GitHub
pan3793 commented on PR #49892: URL: https://github.com/apache/spark/pull/49892#issuecomment-2652744804 @dongjoon-hyun Thanks for checking the GHA result. The UT classpath is different from the packaged classpath, I tuned the POM to ensure both are fine. (test steps are updated in PR desc)

[PR] [SPARK-51175][CORE] Make `Master` show elapsed time when removing drivers [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun opened a new pull request, #49903: URL: https://github.com/apache/spark/pull/49903 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51171][BUILD] Upgrade `checkstyle` to 10.21.2 [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49899: URL: https://github.com/apache/spark/pull/49899#issuecomment-2652748625 Merged to master. Thank you, @wayneguow and all!

Re: [PR] [SPARK-51171][BUILD] Upgrade `checkstyle` to 10.21.2 [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun closed pull request #49899: [SPARK-51171][BUILD] Upgrade `checkstyle` to 10.21.2 URL: https://github.com/apache/spark/pull/49899

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1952029141 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -521,6 +513,66 @@ case class AdaptiveSparkPlanExec( this.in

Re: [PR] [SPARK-50126][PYTHON][CONNECT][3.5] PySpark expr() (expression) SQL Function returns None in Spark Connect [spark]

2025-02-11 Thread via GitHub
zhengruifeng commented on PR #49755: URL: https://github.com/apache/spark/pull/49755#issuecomment-2652725024 It is an improvement from the perspective of spark connect, it is also a bug fix if users migrate from spark classic to spark connect. I think there are many PRs like this.

Re: [PR] [SPARK-50666][SQL] Support hint for reading in JDBC data source [spark]

2025-02-11 Thread via GitHub
gabry-lab commented on PR #49564: URL: https://github.com/apache/spark/pull/49564#issuecomment-2652712873 > Member @dongjoon-hyun As I mentioned above, `INDEX(test idx1)` is a hint; `/*+` and `*/` are not parts of it. Do we still use a regex to validate the `/*+` and `*/`?

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-11 Thread via GitHub
liuzqt commented on PR #49715: URL: https://github.com/apache/spark/pull/49715#issuecomment-2652695121 > I think we already did it for all query stages. @liuzqt how did you see result query stage in the UI? I think we need to explicitly match the name to ignore it (updated in [this c

Re: [PR] [SPARK-50126][PYTHON][CONNECT][3.5] PySpark expr() (expression) SQL Function returns None in Spark Connect [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49755: URL: https://github.com/apache/spark/pull/49755#issuecomment-2652688620 Let me close this PR to prevent any accidental merging. We can continue our discussion on this PR.

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-11 Thread via GitHub
liuzqt commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1951996396 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -303,3 +308,43 @@ case class TableCacheQueryStageExec( override def g

Re: [PR] [SPARK-48516][PYTHON][CONNECT] Turn on Arrow optimization for Python UDFs by default [spark]

2025-02-11 Thread via GitHub
xinrong-meng commented on PR #49482: URL: https://github.com/apache/spark/pull/49482#issuecomment-2652683313 Thank you @dongjoon-hyun !

Re: [PR] [SPARK-50126][PYTHON][CONNECT][3.5] PySpark expr() (expression) SQL Function returns None in Spark Connect [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun closed pull request #49755: [SPARK-50126][PYTHON][CONNECT][3.5] PySpark expr() (expression) SQL Function returns None in Spark Connect URL: https://github.com/apache/spark/pull/49755

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1951990003 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -155,22 +123,22 @@ object ResolveDefaultStringTypes ex

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1951988840 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -155,22 +123,22 @@ object ResolveDefaultStringTypes ex

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1951987578 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -18,80 +18,48 @@ package org.apache.spark.sql.catalys

Re: [PR] [MINOR][INFRA] List pip installation before test in python macos test and python connect test [spark]

2025-02-11 Thread via GitHub
zhengruifeng commented on PR #49901: URL: https://github.com/apache/spark/pull/49901#issuecomment-2652682297 thanks, merged to master/4.0

Re: [PR] [MINOR][INFRA] List pip installation before test in python macos test and python connect test [spark]

2025-02-11 Thread via GitHub
zhengruifeng closed pull request #49901: [MINOR][INFRA] List pip installation before test in python macos test and python connect test URL: https://github.com/apache/spark/pull/49901

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1951984152 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -18,80 +18,48 @@ package org.apache.spark.sql.catalys

Re: [PR] [SPARK-50582][SQL][PYTHON] Add quote builtin function [spark]

2025-02-11 Thread via GitHub
sarutak commented on PR #49191: URL: https://github.com/apache/spark/pull/49191#issuecomment-2652662253 Thank you @dongjoon-hyun and @MaxGekk !

Re: [PR] [SPARK-50992][SQL] OOMs and performance issues with AQE in large plans [spark]

2025-02-11 Thread via GitHub
SauronShepherd commented on PR #49724: URL: https://github.com/apache/spark/pull/49724#issuecomment-2652658452 Showing the final plan - if it's not expensive in time and memory consumption - could be a great idea but may require more changes?

Re: [PR] [SPARK-50856][SS][PYTHON][CONNECT] Spark Connect Support for TransformWithStateInPandas In Python [spark]

2025-02-11 Thread via GitHub
HeartSaVioR commented on code in PR #49560: URL: https://github.com/apache/spark/pull/49560#discussion_r1951968651 ## sql/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -1031,6 +1031,26 @@ message GroupMap { // (Optional) The schema for the grouped st

Re: [PR] [SPARK-50582][SQL][PYTHON] Add quote builtin function [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49191: URL: https://github.com/apache/spark/pull/49191#issuecomment-2652607345 Merged to master. Thank you, @sarutak and @MaxGekk .

Re: [PR] [SPARK-50582][SQL][PYTHON] Add quote builtin function [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun closed pull request #49191: [SPARK-50582][SQL][PYTHON] Add quote builtin function URL: https://github.com/apache/spark/pull/49191

[PR] [SPARK-51141][ML][CONNECT][DOCS] Document the Support of ML on Connect [spark]

2025-02-11 Thread via GitHub
zhengruifeng opened a new pull request, #49902: URL: https://github.com/apache/spark/pull/49902 ### What changes were proposed in this pull request? Document the Support of ML on Connect ### Why are the changes needed? to document the status ### Does this P

Re: [PR] [MINOR][INFRA] List pip installation before test in python macos test and python connect test [spark]

2025-02-11 Thread via GitHub
zhengruifeng commented on code in PR #49901: URL: https://github.com/apache/spark/pull/49901#discussion_r1951929591 ## .github/workflows/build_python_connect.yml: ## @@ -72,7 +72,9 @@ jobs: python packaging/connect/setup.py sdist cd dist pip inst

[PR] [MINOR][INFRA] List pip installation before test in python macos test and python connect test [spark]

2025-02-11 Thread via GitHub
zhengruifeng opened a new pull request, #49901: URL: https://github.com/apache/spark/pull/49901 ### What changes were proposed in this pull request? List pip installation before test in python macos test and python connect test ### Why are the changes needed? to impr

Re: [PR] [SPARK-50854][SS] Make path fully qualified before passing it to FileStreamSink [spark]

2025-02-11 Thread via GitHub
HeartSaVioR closed pull request #49654: [SPARK-50854][SS] Make path fully qualified before passing it to FileStreamSink URL: https://github.com/apache/spark/pull/49654

Re: [PR] [SPARK-50854][SS] Make path fully qualified before passing it to FileStreamSink [spark]

2025-02-11 Thread via GitHub
HeartSaVioR commented on PR #49654: URL: https://github.com/apache/spark/pull/49654#issuecomment-2652571224 Thanks! Merging to master/4.0.

Re: [PR] [SPARK-51136][CORE] Set `CallerContext` for History Server [spark]

2025-02-11 Thread via GitHub
cnauroth commented on PR #49858: URL: https://github.com/apache/spark/pull/49858#issuecomment-2652560330 Great, really appreciate your efforts on testing strategy for this change! Thank you, @dongjoon-hyun .

Re: [PR] [SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan [spark]

2025-02-11 Thread via GitHub
HeartSaVioR commented on PR #49897: URL: https://github.com/apache/spark/pull/49897#issuecomment-2652567140 There are two cases: 1. The streaming query has started from Spark 3.5.4 2. The streaming query has started before Spark 3.5.4, and had migrated to Spark 3.5.4 1. W

Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the golden files by print the hex string of binary [spark]

2025-02-11 Thread via GitHub
beliefer commented on PR #49889: URL: https://github.com/apache/spark/pull/49889#issuecomment-2652563094 @LuciferYang @HyukjinKwon Thank you!

Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the golden files by print the hex string of binary [spark]

2025-02-11 Thread via GitHub
LuciferYang commented on PR #49889: URL: https://github.com/apache/spark/pull/49889#issuecomment-2652540448 Merged into master and branch-4.0. Thanks @beliefer and @HyukjinKwon

Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Improve the golden files by print the hex string of binary [spark]

2025-02-11 Thread via GitHub
LuciferYang closed pull request #49889: [SPARK-42746][SQL][FOLLOWUP] Improve the golden files by print the hex string of binary URL: https://github.com/apache/spark/pull/49889

Re: [PR] [SPARK-51173][TESTS] Add `configName` Scalastyle rule [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun closed pull request #49900: [SPARK-51173][TESTS] Add `configName` Scalastyle rule URL: https://github.com/apache/spark/pull/49900

Re: [PR] [SPARK-51173][TESTS] Add `configName` Scalastyle rule [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49900: URL: https://github.com/apache/spark/pull/49900#issuecomment-2652526084 Thank you, @LuciferYang . Scala Linter passed. https://github.com/user-attachments/assets/4cebd329-ad64-44fa-af47-c48c032e009c Let me merge this.

Re: [PR] [SPARK-51146][INFRA] Use awk to update release scripts [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on code in PR #49891: URL: https://github.com/apache/spark/pull/49891#discussion_r1951892767 ## dev/make-distribution.sh: ## @@ -317,9 +317,14 @@ if [ "$MAKE_TGZ" == "true" ]; then TARDIR="$SPARK_HOME/$TARDIR_NAME" rm -rf "$TARDIR" cp -r "$DIST

Re: [PR] [SPARK-51146][INFRA] Use awk to update release scripts [spark]

2025-02-11 Thread via GitHub
pan3793 commented on code in PR #49891: URL: https://github.com/apache/spark/pull/49891#discussion_r1951886878 ## dev/make-distribution.sh: ## @@ -317,9 +317,9 @@ if [ "$MAKE_TGZ" == "true" ]; then TARDIR="$SPARK_HOME/$TARDIR_NAME" rm -rf "$TARDIR" cp -r "$DISTDIR

Re: [PR] [SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan [spark]

2025-02-11 Thread via GitHub
HyukjinKwon closed pull request #49897: [SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan URL: https://github.com/apache/spark/pull/49897

Re: [PR] [SPARK-51173][TESTS] Add `configName` Scalastyle rule [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49900: URL: https://github.com/apache/spark/pull/49900#issuecomment-2652499536 Let me rebase this PR.

Re: [PR] [SPARK-51146][INFRA] Use awk to update release scripts [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on code in PR #49891: URL: https://github.com/apache/spark/pull/49891#discussion_r1951884340 ## dev/make-distribution.sh: ## @@ -317,9 +317,9 @@ if [ "$MAKE_TGZ" == "true" ]; then TARDIR="$SPARK_HOME/$TARDIR_NAME" rm -rf "$TARDIR" cp -r "$DISTD

Re: [PR] [SPARK-51173][TESTS] Add `configName` Scalastyle rule [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49900: URL: https://github.com/apache/spark/pull/49900#issuecomment-2652499362 Thank you, @HyukjinKwon .

Re: [PR] [SPARK-51146][INFRA] Use awk to update release scripts [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on code in PR #49891: URL: https://github.com/apache/spark/pull/49891#discussion_r1951883632 ## dev/make-distribution.sh: ## @@ -317,9 +317,9 @@ if [ "$MAKE_TGZ" == "true" ]; then TARDIR="$SPARK_HOME/$TARDIR_NAME" rm -rf "$TARDIR" cp -r "$DISTD

Re: [PR] [SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan [spark]

2025-02-11 Thread via GitHub
HyukjinKwon commented on PR #49897: URL: https://github.com/apache/spark/pull/49897#issuecomment-2652498476 Merged to master and branch-4.0.

Re: [PR] [SPARK-45891][SQL][FOLLOWUP] Disable `spark.sql.variant.allowReadingShredded` by default [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on PR #49874: URL: https://github.com/apache/spark/pull/49874#issuecomment-2652495423 @pan3793 can you elaborate on the incompatibilities between the spark variant implementation and the new parquet spec?

Re: [PR] [SPARK-51146][INFRA] Use awk to update release scripts [spark]

2025-02-11 Thread via GitHub
pan3793 commented on PR #49891: URL: https://github.com/apache/spark/pull/49891#issuecomment-2652494678 > We don't have a windows script to make a release :-). @HyukjinKwon I mean scripts like `spark-shell2.cmd`. BTW, I remember you had a suggestion on removing `bin/spark-connect-shel

Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]

2025-02-11 Thread via GitHub
cloud-fan closed pull request #49837: [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule URL: https://github.com/apache/spark/pull/49837

Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]

2025-02-11 Thread via GitHub
cloud-fan commented on PR #49837: URL: https://github.com/apache/spark/pull/49837#issuecomment-2652492679 The AQE test failure is unrelated, thanks, merging to master!

Re: [PR] [SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan [spark]

2025-02-11 Thread via GitHub
HeartSaVioR commented on PR #49897: URL: https://github.com/apache/spark/pull/49897#issuecomment-2652486115 Let's fix in master/4.0 first to avoid making more releases shipping this. We should probably think through how to not break 3.5 when fixing. If we weren't released this in 3.5 that'd

[PR] [SPARK-51173][TESTS] Add `configName` Scalastyle rule [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun opened a new pull request, #49900: URL: https://github.com/apache/spark/pull/49900 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Change the delimiter parameter of listagg scala functions from Column to String [spark]

2025-02-11 Thread via GitHub
yaooqinn commented on PR #49879: URL: https://github.com/apache/spark/pull/49879#issuecomment-2652484552 OK, I'll close this, thank you all

Re: [PR] [SPARK-42746][SQL][FOLLOWUP] Change the delimiter parameter of listagg scala functions from Column to String [spark]

2025-02-11 Thread via GitHub
yaooqinn closed pull request #49879: [SPARK-42746][SQL][FOLLOWUP] Change the delimiter parameter of listagg scala functions from Column to String URL: https://github.com/apache/spark/pull/49879

Re: [PR] [SPARK-49699][SS] Disable PruneFilters for streaming workloads [spark]

2025-02-11 Thread via GitHub
HeartSaVioR commented on code in PR #48149: URL: https://github.com/apache/spark/pull/48149#discussion_r1951875300 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3827,6 +3827,15 @@ object SQLConf { .intConf .createWithDefault(Byt

Re: [PR] [SPARK-51173][TESTS] Add `configName` Scalastyle rule [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49900: URL: https://github.com/apache/spark/pull/49900#issuecomment-2652482091 cc @HyukjinKwon and @LuciferYang

Re: [PR] [SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan [spark]

2025-02-11 Thread via GitHub
HyukjinKwon commented on PR #49897: URL: https://github.com/apache/spark/pull/49897#issuecomment-2652479796 H .. @HeartSaVioR WDYT? should we only make this change in master branch alone?

Re: [PR] [SPARK-49699][SS][FOLLOW-UP] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan [spark]

2025-02-11 Thread via GitHub
HyukjinKwon commented on PR #49897: URL: https://github.com/apache/spark/pull/49897#issuecomment-2652472122 sure, will do

Re: [PR] [SPARK-49699][SS][FOLLOW-UP] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan [spark]

2025-02-11 Thread via GitHub
HyukjinKwon commented on PR #49897: URL: https://github.com/apache/spark/pull/49897#issuecomment-2652471560 uhoh

[PR] [SPARK-51171][BUILD] Upgrade `checkstyle` to 10.21.2 [spark]

2025-02-11 Thread via GitHub
wayneguow opened a new pull request, #49899: URL: https://github.com/apache/spark/pull/49899 ### What changes were proposed in this pull request? This PR aims to upgrade `checkstyle` from 10.20.2 to 10.21.2. ### Why are the changes needed? To pick up bug fixes:

Re: [PR] [SPARK-51097] [SS] Adding state store instance metrics for last uploaded snapshot version in RocksDB [spark]

2025-02-11 Thread via GitHub
micheal-o commented on PR #49816: URL: https://github.com/apache/spark/pull/49816#issuecomment-2652464983 @zecookiez Looks good. Thanks for the work here.

Re: [PR] [SPARK-49699][SS][FOLLOW-UP] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49897: URL: https://github.com/apache/spark/pull/49897#issuecomment-2652464889 Could you file a new JIRA issue for this because this should have a fix version `3.5.5`?

Re: [PR] [SPARK-51097] [SS] Adding state store instance metrics for last uploaded snapshot version in RocksDB [spark]

2025-02-11 Thread via GitHub
micheal-o commented on code in PR #49816: URL: https://github.com/apache/spark/pull/49816#discussion_r1951864272 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala: ## @@ -527,9 +527,7 @@ case class StreamingSymmetricHashJoin

Re: [PR] [WIP][SPARK-51083][CORE] Modify JavaUtils to not swallow InterruptedExceptions [spark]

2025-02-11 Thread via GitHub
anishshri-db commented on PR #49796: URL: https://github.com/apache/spark/pull/49796#issuecomment-2652462231 @neilramaswamy - is the PR status still WIP ? if not - lets remove that from the title ?

Re: [PR] [SPARK-51136][CORE] Set `CallerContext` for History Server [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49858: URL: https://github.com/apache/spark/pull/49858#issuecomment-2652460931 All `core` module tests passed. Merged to master for Apache Spark 4.1.0.

Re: [PR] [WIP][SPARK-51083][CORE] Modify JavaUtils to not swallow InterruptedExceptions [spark]

2025-02-11 Thread via GitHub
anishshri-db commented on code in PR #49796: URL: https://github.com/apache/spark/pull/49796#discussion_r1951863044 ## common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java: ## @@ -181,6 +183,8 @@ private static void deleteRecursivelyUsingUnixNative(File file)

Re: [PR] [SPARK-51136][CORE] Set `CallerContext` for History Server [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun closed pull request #49858: [SPARK-51136][CORE] Set `CallerContext` for History Server URL: https://github.com/apache/spark/pull/49858

Re: [PR] [SPARK-51152][SQL] Add an example for get_json_object when the JSON object is of JSON array type [spark]

2025-02-11 Thread via GitHub
panbingkun commented on code in PR #49875: URL: https://github.com/apache/spark/pull/49875#discussion_r1951862714 ## python/pyspark/sql/functions/builtin.py: ## @@ -20115,11 +20115,24 @@ def get_json_object(col: "ColumnOrName", path: str) -> Column: Examples ---

Re: [PR] [SPARK-51152][SQL] Add an example for get_json_object when the JSON object is of JSON array type [spark]

2025-02-11 Thread via GitHub
panbingkun commented on code in PR #49875: URL: https://github.com/apache/spark/pull/49875#discussion_r1951862898 ## python/pyspark/sql/functions/builtin.py: ## @@ -20115,11 +20115,24 @@ def get_json_object(col: "ColumnOrName", path: str) -> Column: Examples ---

Re: [PR] [SPARK-51152][SQL] Add an example for get_json_object when the JSON object is of JSON array type [spark]

2025-02-11 Thread via GitHub
panbingkun commented on code in PR #49875: URL: https://github.com/apache/spark/pull/49875#discussion_r1951859880 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala: ## @@ -42,6 +42,10 @@ import org.apache.spark.unsafe.types.UTF8String

Re: [PR] [SPARK-51170][PYTHON][CONNECT][TESTS] Separate local and local-cluster in pure Python build in master [spark]

2025-02-11 Thread via GitHub
HyukjinKwon closed pull request #49896: [SPARK-51170][PYTHON][CONNECT][TESTS] Separate local and local-cluster in pure Python build in master URL: https://github.com/apache/spark/pull/49896

Re: [PR] [SPARK-51170][PYTHON][CONNECT][TESTS] Separate local and local-cluster in pure Python build in master [spark]

2025-02-11 Thread via GitHub
HyukjinKwon commented on PR #49896: URL: https://github.com/apache/spark/pull/49896#issuecomment-2652444500 Merged to master and branch-4.0.

Re: [PR] [SPARK-51164][CORE][TESTS][FOLLOWUP] Add hadoop.caller.context.enabled for scalatest-maven-plugin. [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun closed pull request #49898: [SPARK-51164][CORE][TESTS][FOLLOWUP] Add hadoop.caller.context.enabled for scalatest-maven-plugin. URL: https://github.com/apache/spark/pull/49898

Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]

2025-02-11 Thread via GitHub
HyukjinKwon commented on PR #49880: URL: https://github.com/apache/spark/pull/49880#issuecomment-2652439762 let me take a look

Re: [PR] [SPARK-51164][CORE][TESTS][FOLLOWUP] Add hadoop.caller.context.enabled for scalatest-maven-plugin. [spark]

2025-02-11 Thread via GitHub
cnauroth commented on PR #49898: URL: https://github.com/apache/spark/pull/49898#issuecomment-2652436677 @dongjoon-hyun , thanks for the quick approval and merge. No worries!

Re: [PR] [SPARK-51164][CORE][TESTS][FOLLOWUP] Add hadoop.caller.context.enabled for scalatest-maven-plugin. [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49898: URL: https://github.com/apache/spark/pull/49898#issuecomment-2652433255 Thank you again, @cnauroth .

Re: [PR] [SPARK-51164][CORE][TESTS][FOLLOWUP] Add hadoop.caller.context.enabled for scalatest-maven-plugin. [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49898: URL: https://github.com/apache/spark/pull/49898#issuecomment-2652433068 Merged to master/4.0/3.5.

Re: [PR] [SPARK-51164][CORE][TESTS][FOLLOWUP] Add hadoop.caller.context.enabled for scalatest-maven-plugin. [spark]

2025-02-11 Thread via GitHub
dongjoon-hyun commented on PR #49898: URL: https://github.com/apache/spark/pull/49898#issuecomment-2652431849 Since the PR builder is SBT, I verified manually. ``` $ build/mvn -pl core test -Dtest=none -Dsuites=org.apache.spark.util.UtilsSuite ... UtilsSuite: - timeConversion
