Re: [PR] [SPARK-51154][BUILD][TESTS] Remove unused `jopt` test dependency [spark]

2025-02-10 Thread via GitHub
cnauroth commented on PR #49877: URL: https://github.com/apache/spark/pull/49877#issuecomment-2649977050 Thank you for the review and commit, @dongjoon-hyun ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[PR] [SPARK-51157][SQL] Add missing @varargs Scala annotation for Scala function APIs [spark]

2025-02-10 Thread via GitHub
yaooqinn opened a new pull request, #49883: URL: https://github.com/apache/spark/pull/49883 ### What changes were proposed in this pull request? This PR adds missing @varargs Scala annotation for Scala function APIs ### Why are the changes needed? To instruct the comp

Re: [PR] [SPARK-51155][CORE] Make `SparkContext` show total runtime after stopping [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun closed pull request #49878: [SPARK-51155][CORE] Make `SparkContext` show total runtime after stopping URL: https://github.com/apache/spark/pull/49878

Re: [PR] [SPARK-51153][BUILD] Remove unused `javax.activation:activation` dependency [spark]

2025-02-10 Thread via GitHub
cnauroth commented on code in PR #49876: URL: https://github.com/apache/spark/pull/49876#discussion_r1950331429 ## LICENSE-binary: ## @@ -501,7 +501,6 @@ core/src/main/resources/org/apache/spark/ui/static/d3.min.js Common Development and Distribution License (CDDL) 1.0 -

Re: [PR] [SPARK-51155][CORE] Make `SparkContext` show total runtime after stopping [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49878: URL: https://github.com/apache/spark/pull/49878#issuecomment-2649973836 Thank you, @yaooqinn .

Re: [PR] [SPARK-51154][BUILD][TESTS] Remove unused `jopt` test dependency [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49877: URL: https://github.com/apache/spark/pull/49877#issuecomment-2649971342 Merged to master/4.0. Thank you again.

Re: [PR] [SPARK-51154][BUILD][TESTS] Remove unused `jopt` test dependency [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun closed pull request #49877: [SPARK-51154][BUILD][TESTS] Remove unused `jopt` test dependency URL: https://github.com/apache/spark/pull/49877

Re: [PR] [SPARK-51153][BUILD] Remove unused `javax.activation:activation` dependency [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun closed pull request #49876: [SPARK-51153][BUILD] Remove unused `javax.activation:activation` dependency URL: https://github.com/apache/spark/pull/49876

Re: [PR] [SPARK-51153][BUILD] Remove unused `javax.activation:activation` dependency [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49876: URL: https://github.com/apache/spark/pull/49876#issuecomment-264995 `OracleIntegrationSuite` failure is irrelevant to this PR. Merged to master/4.0.

Re: [PR] [SPARK-45891][SQL][FOLLOWUP] Disable `spark.sql.variant.allowReadingShredded` by default [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49874: URL: https://github.com/apache/spark/pull/49874#issuecomment-2649957485 cc @chenhao-db, @cashmand , @gene-db , @cloud-fan

Re: [PR] [SPARK-51153][BUILD] Remove unused `javax.activation:activation` dependency [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on code in PR #49876: URL: https://github.com/apache/spark/pull/49876#discussion_r1950312360 ## LICENSE-binary: ## @@ -501,7 +501,6 @@ core/src/main/resources/org/apache/spark/ui/static/d3.min.js Common Development and Distribution License (CDDL) 1.0

[PR] [SPARK-50940][SPARK-50941][ML][PYTHON][CONNECT][FOLLOW-UP] Directly reuse the CrossValidatorModelWriter and TrainValidationSplitModelWriter [spark]

2025-02-10 Thread via GitHub
zhengruifeng opened a new pull request, #49882: URL: https://github.com/apache/spark/pull/49882 ### What changes were proposed in this pull request? Directly reuse the CrossValidatorModelWriter and TrainValidationSplitModelWriter ### Why are the changes needed? to si

Re: [PR] [SPARK-51119][SQL][FOLLOW-UP] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49881: URL: https://github.com/apache/spark/pull/49881#issuecomment-2649928014 cc @cloud-fan

Re: [PR] [SPARK-51152][SQL]Add an example for get_json_object when the JSON object is of JSON array type [spark]

2025-02-10 Thread via GitHub
panbingkun commented on PR #49875: URL: https://github.com/apache/spark/pull/49875#issuecomment-2649925845 Are there any other examples of `wide characters` that need to be shown? Additionally, the `PySpark example` also needs to be updated.

Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub
szehon-ho commented on code in PR #49840: URL: https://github.com/apache/spark/pull/49840#discussion_r1950287853 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -320,6 +319,29 @@ object ResolveDefaultColumns extends QueryE

Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub
szehon-ho commented on PR #49840: URL: https://github.com/apache/spark/pull/49840#issuecomment-2649906152 Synced with @cloud-fan offline; the current code should work for this case. Made a follow-up, https://github.com/apache/spark/pull/49881, to do some cleanup to put the logic in the right p

[PR] [SPARK-51119][SQL][FOLLOW-UP] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub
szehon-ho opened a new pull request, #49881: URL: https://github.com/apache/spark/pull/49881 ### What changes were proposed in this pull request? Code cleanup for https://github.com/apache/spark/pull/49840/. Literal#fromSQL should be the inverse of Literal#sql. The cast handlin
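The follow-up states the invariant that `Literal#fromSQL` should be the inverse of `Literal#sql`. To illustrate only that round-trip property, here is a minimal Python sketch over a tiny, hypothetical literal subset (NULL, booleans, integers, single-quoted strings). `to_sql` and `from_sql` are invented names; this is not Spark's Scala implementation, which also deals with casts and many more types.

```python
# Hypothetical sketch of a SQL-literal round trip; NOT Spark's Literal#sql / Literal#fromSQL.

def to_sql(value):
    """Render a Python value as a SQL literal string (tiny subset only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Escape embedded single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def from_sql(text):
    """Parse a literal produced by to_sql back into a Python value."""
    if text == "NULL":
        return None
    if text in ("true", "false"):
        return text == "true"
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        return text[1:-1].replace("''", "'")
    return int(text)


# The invariant: parsing the rendered SQL yields the same value and type.
for v in [None, True, False, 0, -42, "", "it's"]:
    back = from_sql(to_sql(v))
    assert back == v and type(back) is type(v)
```

Keeping both directions next to each other makes the inverse property easy to test exhaustively for each supported type, which is the point the follow-up makes about where the logic should live.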

Re: [PR] [SPARK-51147][SS] Refactor streaming related classes to a dedicated streaming directory [spark]

2025-02-10 Thread via GitHub
HeartSaVioR commented on PR #49867: URL: https://github.com/apache/spark/pull/49867#issuecomment-2649891533 Thanks! Merging to master/4.0.

Re: [PR] [SPARK-51147][SS] Refactor streaming related classes to a dedicated streaming directory [spark]

2025-02-10 Thread via GitHub
HeartSaVioR closed pull request #49867: [SPARK-51147][SS] Refactor streaming related classes to a dedicated streaming directory URL: https://github.com/apache/spark/pull/49867

Re: [PR] [SPARK-51156][CONNECT] Provide a basic authentication token when running Spark Connect server locally [spark]

2025-02-10 Thread via GitHub
HyukjinKwon commented on PR #49880: URL: https://github.com/apache/spark/pull/49880#issuecomment-2649889401 It's enabled by default, so I think it's fine.

Re: [PR] [SPARK-51136][CORE] Set `CallerContext` for History Server [spark]

2025-02-10 Thread via GitHub
cnauroth commented on PR #49858: URL: https://github.com/apache/spark/pull/49858#issuecomment-2649887978 > BTW, I'm taking this chance to investigate the relevant code while reviewing this PR. It's because the existing test coverage also looks suspicious due to `val callerContextEnabled`. The co

Re: [PR] [SPARK-51136][CORE] Set `CallerContext` for History Server [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49858: URL: https://github.com/apache/spark/pull/49858#issuecomment-2649879155 BTW, I'm taking this chance to investigate the relevant code while reviewing this PR. It's because the existing test coverage also looks suspicious due to `val callerContextEnabled`. Th

[PR] [SPARK-42746][SQL][FOLLOWUP] Change the delimiter parameter of listagg scala functions from Column to String [spark]

2025-02-10 Thread via GitHub
yaooqinn opened a new pull request, #49879: URL: https://github.com/apache/spark/pull/49879 ### What changes were proposed in this pull request? This PR changes the delimiter parameter of listagg scala functions from Column to String ### Why are the changes needed?

Re: [PR] [SPARK-51153][BUILD] Remove unused `javax.activation:activation` dependence [spark]

2025-02-10 Thread via GitHub
cnauroth commented on code in PR #49876: URL: https://github.com/apache/spark/pull/49876#discussion_r1950271725 ## LICENSE-binary: ## @@ -501,7 +501,6 @@ core/src/main/resources/org/apache/spark/ui/static/d3.min.js Common Development and Distribution License (CDDL) 1.0 -

Re: [PR] [SPARK-51155][CORE] Make `SparkContext` show total runtime after stopping [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on code in PR #49878: URL: https://github.com/apache/spark/pull/49878#discussion_r1950271133 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -2405,7 +2405,8 @@ class SparkContext(config: SparkConf) extends Logging { ResourceProfile.

Re: [PR] [SPARK-51155][CORE] Make `SparkContext` show total runtime after stopping [spark]

2025-02-10 Thread via GitHub
cnauroth commented on code in PR #49878: URL: https://github.com/apache/spark/pull/49878#discussion_r1950268117 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -2405,7 +2405,8 @@ class SparkContext(config: SparkConf) extends Logging { ResourceProfile.clear

Re: [PR] [SPARK-51152][SQL]Add an example for get_json_object when the JSON object is of JSON array type [spark]

2025-02-10 Thread via GitHub
fusheng9399 commented on PR #49875: URL: https://github.com/apache/spark/pull/49875#issuecomment-2649869495 Please help review it when you have free time, thanks! @panbingkun

Re: [PR] [SPARK-51153][BUILD] Remove unused `javax.activation:activation` dependence [spark]

2025-02-10 Thread via GitHub
wayneguow commented on code in PR #49876: URL: https://github.com/apache/spark/pull/49876#discussion_r1950264947 ## LICENSE-binary: ## @@ -501,7 +501,6 @@ core/src/main/resources/org/apache/spark/ui/static/d3.min.js Common Development and Distribution License (CDDL) 1.0

[PR] [SPARK-51155][CORE] Make `SparkContext` show total runtime after stopping [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun opened a new pull request, #49878: URL: https://github.com/apache/spark/pull/49878 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H
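The PR body above is still the empty template, so only the title describes the change: report the total runtime when `SparkContext` stops. As a rough Python analogy of that idea (not Spark's code), a long-lived object can record a monotonic start time on construction and log the elapsed time once on `stop()`. `HypotheticalContext` and the log message text are invented for illustration.

```python
import logging
import time

logger = logging.getLogger("example")


class HypotheticalContext:
    """Toy stand-in for a long-lived context; NOT Spark's SparkContext."""

    def __init__(self):
        # A monotonic clock is unaffected by wall-clock adjustments.
        self._start = time.monotonic()
        self._stopped = False

    def stop(self):
        """Stop once and log the total runtime in milliseconds; later calls are no-ops."""
        if self._stopped:
            return None
        self._stopped = True
        runtime_ms = int((time.monotonic() - self._start) * 1000)
        logger.info("Stopped context; total runtime: %d ms", runtime_ms)
        return runtime_ms
```

Guarding with a `_stopped` flag keeps the runtime line from being logged twice if `stop()` is called again, for example from a shutdown hook.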

[PR] [SPARK-51154][KAFKA] Remove unused jopt dependency. [spark]

2025-02-10 Thread via GitHub
cnauroth opened a new pull request, #49877: URL: https://github.com/apache/spark/pull/49877 ### What changes were proposed in this pull request? The Kafka connector declares a dependency on the jopt library. This library is not actually used in the code, so we can remove it. ###

Re: [PR] [SPARK-51154][KAFKA] Remove unused jopt dependency. [spark]

2025-02-10 Thread via GitHub
cnauroth commented on PR #49877: URL: https://github.com/apache/spark/pull/49877#issuecomment-2649831980 Apparently this dependency dates all the way back to [SPARK-1022](https://issues.apache.org/jira/browse/SPARK-1022). That patch added the dependency, but didn't include any code that nee

Re: [PR] [SPARK-51154][KAFKA] Remove unused jopt dependency. [spark]

2025-02-10 Thread via GitHub
cnauroth commented on PR #49877: URL: https://github.com/apache/spark/pull/49877#issuecomment-2649825750 I am seeing some flakiness on these Kafka tests locally, but it's definitely not related to this proposed change. Curious to see how it runs in CI.

Re: [PR] [SPARK-45891][SQL][FOLLOWUP] Disable `spark.sql.variant.allowReadingShredded` by default [spark]

2025-02-10 Thread via GitHub
pan3793 commented on PR #49874: URL: https://github.com/apache/spark/pull/49874#issuecomment-2649824762 @dongjoon-hyun All variant shred-related changes except this one are currently disabled (by internal SQL confs) in Spark 4.0, and only usable as test features, which is safe for future bre

Re: [PR] [SPARK-51149][CORE] Log classpath in SparkSubmit on ClassNotFoundException [spark]

2025-02-10 Thread via GitHub
vrozov commented on PR #49870: URL: https://github.com/apache/spark/pull/49870#issuecomment-2649799708 I don't think that `--verbose` outputs the full classpath, and between enhancing `--verbose` to log classpath and logging classpath on the specific error that relates to the classpath issue
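SparkSubmit runs on the JVM, but the idea argued for here (emit the search path only on the failure that is about the search path, rather than always via `--verbose`) can be sketched with a Python analogy: catch `ModuleNotFoundError` and log `sys.path` before re-raising. `load_main_module` is a hypothetical helper; this is not SparkSubmit's code and says nothing about how the PR implements it.

```python
import importlib
import logging
import sys

logger = logging.getLogger("submit-example")


def load_main_module(name):
    """Hypothetical helper: import `name`; if it is missing, log the search path and re-raise."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        # Only dump the search path when the failure is about the search path itself.
        logger.error("Failed to load %s; search path was:\n%s", name, "\n".join(sys.path))
        raise
```

Re-raising keeps the original error and exit behavior unchanged; the extra log line is purely diagnostic.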

Re: [PR] [SPARK-51153][BUILD] Remove unused `javax.activation:activation` dependence [spark]

2025-02-10 Thread via GitHub
wayneguow commented on PR #49876: URL: https://github.com/apache/spark/pull/49876#issuecomment-2649795409 It comes from https://github.com/apache/spark/pull/22081/files#r209707448, but after #49854, nothing seems to depend on it anymore.

[PR] Add an example for get_json_object when the JSON object is of JSON array type [spark]

2025-02-10 Thread via GitHub
fusheng9399 opened a new pull request, #49875: URL: https://github.com/apache/spark/pull/49875 ### What changes were proposed in this pull request? Add an example for get_json_object when the JSON object is of JSON array type. ### Why are the changes needed? Most use
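The PR adds a documentation example for `get_json_object` applied to a top-level JSON array. To show what a path such as `$[0].a` selects, here is a minimal Python evaluator for an assumed subset of the path syntax (`$`, `.name`, `[index]`). It is an illustration under those assumptions, not Spark's implementation, and it does not cover wildcards or Spark's exact handling of invalid input.

```python
import json
import re

# One path step: ".name" or "[index]" (a hypothetical subset of the path syntax).
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def get_json_path(json_text, path):
    """Return the value at `path` as a string, or None if it is missing or invalid."""
    if not path.startswith("$"):
        return None
    try:
        node = json.loads(json_text)
    except json.JSONDecodeError:
        return None
    pos = 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            return None  # unsupported or malformed step
        name, index = m.group(1), m.group(2)
        if name is not None:
            if not isinstance(node, dict) or name not in node:
                return None
            node = node[name]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = m.end()
    if node is None:
        return None
    # Strings come back unquoted; objects, arrays, numbers and booleans as compact JSON text.
    return node if isinstance(node, str) else json.dumps(node, separators=(",", ":"))
```

For example, `get_json_path('[{"a":"b"},{"a":"c"}]', '$[1].a')` returns `"c"` in this sketch, while `'$.a'` returns `None` because the root is an array, not an object.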

[PR] [SPARK-51153][BUILD] Remove unused `javax.activation:activation` dependence [spark]

2025-02-10 Thread via GitHub
wayneguow opened a new pull request, #49876: URL: https://github.com/apache/spark/pull/49876 ### What changes were proposed in this pull request? This PR aims to remove the unused `javax.activation:activation` dependency. ### Why are the changes needed? Reduce useless

Re: [PR] [SPARK-51097] [SS] Adding state store instance metrics for last uploaded snapshot version in RocksDB [spark]

2025-02-10 Thread via GitHub
micheal-o commented on code in PR #49816: URL: https://github.com/apache/spark/pull/49816#discussion_r1950201910 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2251,6 +2251,19 @@ object SQLConf { .booleanConf .createWithDefault(t

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1950213620 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -303,3 +308,43 @@ case class TableCacheQueryStageExec( override de

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1950212939 ## sql/api/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -1147,6 +1147,77 @@ object functions { */ def sum_distinct(e: Column): Column = Column.fn

Re: [PR] [SPARK-51150][ML] Explicitly pass the session in meta algorithm writers [spark]

2025-02-10 Thread via GitHub
zhengruifeng closed pull request #49871: [SPARK-51150][ML] Explicitly pass the session in meta algorithm writers URL: https://github.com/apache/spark/pull/49871

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2025-02-10 Thread via GitHub
yaooqinn commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1950205886 ## sql/api/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -1147,6 +1147,77 @@ object functions { */ def sum_distinct(e: Column): Column = Column.fn(

Re: [PR] [SPARK-51150][ML] Explicitly pass the session in meta algorithm writers [spark]

2025-02-10 Thread via GitHub
zhengruifeng commented on PR #49871: URL: https://github.com/apache/spark/pull/49871#issuecomment-2649760851 thanks, merged to master for 4.1

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-10 Thread via GitHub
ulysses-you commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1950197446 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -303,3 +308,43 @@ case class TableCacheQueryStageExec( override

Re: [PR] [SPARK-45891][SQL] Rebuild variant binary from shredded data. [spark]

2025-02-10 Thread via GitHub
pan3793 commented on PR #48851: URL: https://github.com/apache/spark/pull/48851#issuecomment-2649741363 I opened https://github.com/apache/spark/pull/49874 to disable the `spark.sql.variant.allowReadingShredded` by default

[PR] [SPARK-45891][SQL][FOLLOWUP] Disable `spark.sql.variant.allowReadingShredded` by default [spark]

2025-02-10 Thread via GitHub
pan3793 opened a new pull request, #49874: URL: https://github.com/apache/spark/pull/49874 ### What changes were proposed in this pull request? Disable `spark.sql.variant.allowReadingShredded` by default ### Why are the changes needed? https://github.com/apache/pa

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on PR #49715: URL: https://github.com/apache/spark/pull/49715#issuecomment-2649745108 I think we already did it for all query stages. @liuzqt how did you see result query stage in the UI?

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1950196523 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -18,80 +18,48 @@ package org.apache.spark.sql.catalys

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-10 Thread via GitHub
ulysses-you commented on PR #49715: URL: https://github.com/apache/spark/pull/49715#issuecomment-2649742747 Shall we ignore `ResultQueryStage` in the Spark UI like other query stages?

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1950192009 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -18,80 +18,48 @@ package org.apache.spark.sql.catalys

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1950192262 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -18,80 +18,48 @@ package org.apache.spark.sql.catalys

Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1950193133 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -156,21 +124,15 @@ object ResolveDefaultStringTypes ex

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1950184718 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -521,6 +513,66 @@ case class AdaptiveSparkPlanExec( this.in

Re: [PR] [MINOR][DOCS] Fix incorrect description of constraint on spark.sql.adaptive.coalescePartitions.minPartitionSize [spark]

2025-02-10 Thread via GitHub
cloud-fan closed pull request #49872: [MINOR][DOCS] Fix incorrect description of constraint on spark.sql.adaptive.coalescePartitions.minPartitionSize URL: https://github.com/apache/spark/pull/49872

Re: [PR] [MINOR][DOCS] Fix incorrect description of constraint on spark.sql.adaptive.coalescePartitions.minPartitionSize [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on PR #49872: URL: https://github.com/apache/spark/pull/49872#issuecomment-2649725958 thanks, merging to master/3.5

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1950185321 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -521,6 +513,66 @@ case class AdaptiveSparkPlanExec( this.in

Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1950184963 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -521,6 +513,66 @@ case class AdaptiveSparkPlanExec( this.in

Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #49840: URL: https://github.com/apache/spark/pull/49840#discussion_r1950180575 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -320,6 +319,29 @@ object ResolveDefaultColumns extends QueryE

Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub
cloud-fan commented on code in PR #49840: URL: https://github.com/apache/spark/pull/49840#discussion_r1950179313 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala: ## @@ -265,6 +267,23 @@ object Literal { s"Literal must have a corresp

Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]

2025-02-10 Thread via GitHub
wayneguow commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2649705472 Thank you all! Also, those two annoying Jersey warning logs are now gone.

Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]

2025-02-10 Thread via GitHub
LuciferYang commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2649688178 Merged into master and branch-4.0. Thanks @wayneguow @zhengruifeng @dongjoon-hyun @pan3793 @vruusmann

[PR] [SPARK-51151][SS] Fix internal and public API naming for `TimeMode.None()` for TransformWithState [spark]

2025-02-10 Thread via GitHub
jingz-db opened a new pull request, #49873: URL: https://github.com/apache/spark/pull/49873 ### What changes were proposed in this pull request? Currently the naming of `TimeMode.None()` is different from the internal case object `case object None extends TimeMode`. This leads to

Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]

2025-02-10 Thread via GitHub
LuciferYang closed pull request #49854: [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 URL: https://github.com/apache/spark/pull/49854

Re: [PR] [SPARK-51149][CORE] Log classpath in SparkSubmit on ClassNotFoundException [spark]

2025-02-10 Thread via GitHub
pan3793 commented on PR #49870: URL: https://github.com/apache/spark/pull/49870#issuecomment-2649684882 It seems you can use `spark-submit --verbose ...` to rerun the failed cases and see the classpath. I think verbose error messages should help administrators diagnose the issue, but th

Re: [PR] [SPARK-51150][ML] Explicitly pass the session in meta algorithm writers [spark]

2025-02-10 Thread via GitHub
zhengruifeng commented on PR #49871: URL: https://github.com/apache/spark/pull/49871#issuecomment-2649670489 The Python side is kind of complicated (a mixture of sc, Connect session and classic session); I will resolve it in separate PRs.

[PR] [SPARK-51150][ML] Explicitly pass the session in meta algorithm writers [spark]

2025-02-10 Thread via GitHub
zhengruifeng opened a new pull request, #49871: URL: https://github.com/apache/spark/pull/49871 ### What changes were proposed in this pull request? Explicitly pass the session to avoid recreating it ### Why are the changes needed? The overhead of getting/creating a session
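The motivation is that nested writers each getting or creating a session adds overhead. A rough Python sketch of the design choice, with entirely hypothetical classes (`Session`, `ModelWriter`, `MetaAlgorithmWriter`; none of them are Spark ML's API), resolves the session once in the outer writer and passes it down, and counts lookups to make the saving visible.

```python
class Session:
    """Hypothetical session type; counts how often a get-or-create lookup happens."""

    lookups = 0
    _active = None

    @classmethod
    def get_or_create(cls):
        cls.lookups += 1
        if cls._active is None:
            cls._active = cls()
        return cls._active


class ModelWriter:
    """Hypothetical writer: uses the session it is given, otherwise looks one up."""

    def __init__(self, session=None):
        self.session = session if session is not None else Session.get_or_create()


class MetaAlgorithmWriter:
    """Hypothetical writer for a meta algorithm that owns several nested sub-models."""

    def __init__(self, session, num_sub_models):
        self.session = session
        # Pass the already-resolved session down instead of letting each
        # nested writer perform its own get-or-create lookup.
        self.children = [ModelWriter(session) for _ in range(num_sub_models)]
```

With five sub-models, only the single outer `Session.get_or_create()` call is counted; constructing `ModelWriter()` without a session would add one more lookup per writer.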

[PR] [MINOR][DOCS] Fix incorrect description of constraint on spark.sql.adaptive.coalescePartitions.minPartitionSize [spark]

2025-02-10 Thread via GitHub
JoshRosen opened a new pull request, #49872: URL: https://github.com/apache/spark/pull/49872 ### What changes were proposed in this pull request? This PR addresses a minor problem in the SQL performance guide's description of the `spark.sql.adaptive.coalescePartitions.minPartitionSize

Re: [PR] Revert "[SPARK-51140][ML][4.0] Sort the params before saving" [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49869: URL: https://github.com/apache/spark/pull/49869#issuecomment-2649649588 Thank you for your swift action!

Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]

2025-02-10 Thread via GitHub
pan3793 commented on code in PR #49814: URL: https://github.com/apache/spark/pull/49814#discussion_r1950146770 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -171,6 +172,11 @@ private[hive] class HiveClientImpl( private def newState():

Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub
szehon-ho commented on code in PR #49840: URL: https://github.com/apache/spark/pull/49840#discussion_r1950141582 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala: ## @@ -265,6 +267,23 @@ object Literal { s"Literal must have a corresp

Re: [PR] Revert "[SPARK-51140][ML][4.0] Sort the params before saving" [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49869: URL: https://github.com/apache/spark/pull/49869#issuecomment-2649645368 BTW, you can also revert directly from `branch-4.0` in this case, @zhengruifeng ~

Re: [PR] Revert "[SPARK-51140][ML][4.0] Sort the params before saving" [spark]

2025-02-10 Thread via GitHub
zhengruifeng closed pull request #49869: Revert "[SPARK-51140][ML][4.0] Sort the params before saving" URL: https://github.com/apache/spark/pull/49869

Re: [PR] [SPARK-50917][EXAMPLES] Add Pi Scala example to work both for Connect and Classic [spark]

2025-02-10 Thread via GitHub
yaooqinn commented on PR #49617: URL: https://github.com/apache/spark/pull/49617#issuecomment-2649652651 Thank you very much, @dongjoon-hyun , @cloud-fan , @HyukjinKwon .

Re: [PR] [SPARK-51148][BUILD] Upgrade `zstd-jni` to 1.5.6-10 [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun closed pull request #49868: [SPARK-51148][BUILD] Upgrade `zstd-jni` to 1.5.6-10 URL: https://github.com/apache/spark/pull/49868

Re: [PR] [SPARK-51148][BUILD] Upgrade `zstd-jni` to 1.5.6-10 [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49868: URL: https://github.com/apache/spark/pull/49868#issuecomment-2649651435 Merged to master~

Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2649647703 Oh, ya. Sorry, I missed that, @zhengruifeng . :)

Re: [PR] Revert "[SPARK-51140][ML][4.0] Sort the params before saving" [spark]

2025-02-10 Thread via GitHub
zhengruifeng commented on PR #49869: URL: https://github.com/apache/spark/pull/49869#issuecomment-2649648624 merged to 4.0

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-02-10 Thread via GitHub
sunxiaoguang commented on PR #49453: URL: https://github.com/apache/spark/pull/49453#issuecomment-2649646282 Hello @yaooqinn I'm wondering if you could take some time to look at the latest code and share any comments you might have?

Re: [PR] [SPARK-50582][SQL][PYTHON] Add quote builtin function [spark]

2025-02-10 Thread via GitHub
sarutak commented on PR #49191: URL: https://github.com/apache/spark/pull/49191#issuecomment-2649643699 Ah, O.K, I'll change them.

[PR] [SPARK-51149][CORE] Log classpath in SparkSubmit on ClassNotFoundException [spark]

2025-02-10 Thread via GitHub
vrozov opened a new pull request, #49870: URL: https://github.com/apache/spark/pull/49870 ### What changes were proposed in this pull request? The PR adds logging of the classpath in SparkSubmit when SparkApp throws `ClassNotFoundException`. ### Why are the changes needed? When

Re: [PR] [SPARK-51147][SS] Refactor streaming related classes to a dedicated streaming directory [spark]

2025-02-10 Thread via GitHub
anishshri-db commented on code in PR #49867: URL: https://github.com/apache/spark/pull/49867#discussion_r1950133011 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonMicroBatchStream.scala: ## @@ -120,4 +120,3 @@ object PythonMicroBatchStream

Re: [PR] [SPARK-51142][ML][CONNECT] ML protobufs clean up [spark]

2025-02-10 Thread via GitHub
zhengruifeng commented on PR #49862: URL: https://github.com/apache/spark/pull/49862#issuecomment-2649641840 merged to master/4.0

Re: [PR] [SPARK-51142][ML][CONNECT] ML protobufs clean up [spark]

2025-02-10 Thread via GitHub
zhengruifeng closed pull request #49862: [SPARK-51142][ML][CONNECT] ML protobufs clean up URL: https://github.com/apache/spark/pull/49862

Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]

2025-02-10 Thread via GitHub
zhengruifeng commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2649640350 > cc @zhengruifeng I have approved https://github.com/apache/spark/pull/49854#issuecomment-2646737700, not sure why I am not in the `Reviewers` list :)

[PR] Revert "[SPARK-51140][ML] Sort the params before saving" [spark]

2025-02-10 Thread via GitHub
zhengruifeng opened a new pull request, #49869: URL: https://github.com/apache/spark/pull/49869 This reverts commit fab541d43395a61c1b295aa46717d183bf4236ff. It is just an improvement, no need to backport it to 4.0

Re: [PR] [SPARK-50582][SQL][PYTHON] Add quote builtin function [spark]

2025-02-10 Thread via GitHub
sarutak commented on PR #49191: URL: https://github.com/apache/spark/pull/49191#issuecomment-2649638435 @dongjoon-hyun No problem. If the current change seems good to you, I'll rebase once more.

Re: [PR] [SPARK-51140][ML] Sort the params before saving [spark]

2025-02-10 Thread via GitHub
zhengruifeng commented on PR #49861: URL: https://github.com/apache/spark/pull/49861#issuecomment-2649636424 > IIUC, this is not a mandatory part of `Spark Connect ML`. This is a more general and orthogonal improvement, isn't it? correct, I will revert it in 4.0

Re: [PR] [SPARK-51140][ML] Sort the params before saving [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49861: URL: https://github.com/apache/spark/pull/49861#issuecomment-2649634431 IIUC, this is not a mandatory part of `Spark Connect ML`. This is a more general and orthogonal improvement, isn't it?

Re: [PR] [SPARK-51140][ML] Sort the params before saving [spark]

2025-02-10 Thread via GitHub
dongjoon-hyun commented on PR #49861: URL: https://github.com/apache/spark/pull/49861#issuecomment-2649633463 Thank you. Yes, it would be great if we set a good example which complies with the Apache Spark policy, @zhengruifeng . Otherwise, we are unable to prevent the other contributors'

Re: [PR] [SPARK-51142][ML][CONNECT] ML protobufs clean up [spark]

2025-02-10 Thread via GitHub
zhengruifeng commented on PR #49862: URL: https://github.com/apache/spark/pull/49862#issuecomment-2649632272 I am going to merge this into 4.0, otherwise the protobuf compatibility between 4.0 and 4.1 will be broken

Re: [PR] [SPARK-51140][ML] Sort the params before saving [spark]

2025-02-10 Thread via GitHub
zhengruifeng commented on PR #49861: URL: https://github.com/apache/spark/pull/49861#issuecomment-2649628695 @dongjoon-hyun ah, sorry, shall we revert it in 4.0?

Re: [PR] [SPARK-51112][CONNECT] Avoid using pyarrow's `to_pandas` on an empty table [spark]

2025-02-10 Thread via GitHub
HyukjinKwon closed pull request #49834: [SPARK-51112][CONNECT] Avoid using pyarrow's `to_pandas` on an empty table URL: https://github.com/apache/spark/pull/49834

Re: [PR] [SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver [spark]

2025-02-10 Thread via GitHub
cnauroth commented on code in PR #49814: URL: https://github.com/apache/spark/pull/49814#discussion_r1950100085 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -171,6 +172,11 @@ private[hive] class HiveClientImpl( private def newState()
