Re: [PR] [SPARK-48985][CONNECT] Connect Compatible Expression Constructors [spark]

2024-07-26 Thread via GitHub
zhengruifeng commented on code in PR #47464: URL: https://github.com/apache/spark/pull/47464#discussion_r1692601307 ## connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1893,33 +1855,6 @@ class SparkConnectPlanner( val

Re: [PR] [SPARK-49002][SQL] Consistently handle invalid location/path values for all database objects [spark]

2024-07-26 Thread via GitHub
yaooqinn commented on PR #47485: URL: https://github.com/apache/spark/pull/47485#issuecomment-2252246044 cc @cloud-fan @dongjoon-hyun @HyukjinKwon thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[PR] Add bouncyCastle-related test dependencies to the `hive-thrift` module to fix the Maven daily test. [spark]

2024-07-26 Thread via GitHub
LuciferYang opened a new pull request, #47496: URL: https://github.com/apache/spark/pull/47496 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [SPARK-49013] Change key in collationsMap for Map and Array types in scala [spark]

2024-07-26 Thread via GitHub
stefankandic opened a new pull request, #47497: URL: https://github.com/apache/spark/pull/47497 ### What changes were proposed in this pull request? When deserializing map/array that is not part of the struct field, the key in collation map should just be `{"element": collatio

[PR] [SPARK-49014][BUILD] Bump Apache Avro to 1.12.0 [spark]

2024-07-26 Thread via GitHub
Fokko opened a new pull request, #47498: URL: https://github.com/apache/spark/pull/47498 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was t

Re: [PR] [SPARK-48998][ML] Meta algorithms save/load model with SparkSession [spark]

2024-07-26 Thread via GitHub
zhengruifeng commented on PR #47477: URL: https://github.com/apache/spark/pull/47477#issuecomment-2252444974 merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-48998][ML] Meta algorithms save/load model with SparkSession [spark]

2024-07-26 Thread via GitHub
zhengruifeng closed pull request #47477: [SPARK-48998][ML] Meta algorithms save/load model with SparkSession URL: https://github.com/apache/spark/pull/47477 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[PR] [SPARK-49000][SQL][WIP] Fix "select count(distinct 1) from t" where t is empty table [spark]

2024-07-26 Thread via GitHub
nikolamand-db opened a new pull request, #47499: URL: https://github.com/apache/spark/pull/47499 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SC-170296] GROUP BY with MapType nested inside complex type [spark]

2024-07-26 Thread via GitHub
nebojsa-db commented on code in PR #47331: URL: https://github.com/apache/spark/pull/47331#discussion_r1692937421 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -892,132 +892,108 @@ case class MapFromEntries(child: Expr

Re: [PR] [SPARK-49013] Change key in collationsMap for Map and Array types in scala [spark]

2024-07-26 Thread via GitHub
stefankandic commented on PR #47497: URL: https://github.com/apache/spark/pull/47497#issuecomment-2252576861 @HyukjinKwon Please take a look if you can as you have the context from the same pyspark change. -- This is an automated message from the Apache Git Service. To respond to the mess

[PR] [SPARK-49015][CORE] Connect Server should respect `spark.log.structuredLogging.enabled` [spark]

2024-07-26 Thread via GitHub
pan3793 opened a new pull request, #47500: URL: https://github.com/apache/spark/pull/47500 ### What changes were proposed in this pull request? Currently, structured logging is always activated regardless of the value of `spark.log.structuredLogging.enabled`. ### Why are th
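A minimal sketch of the intent (not the PR's actual diff; the helper name and the default value shown are illustrative): consult `spark.log.structuredLogging.enabled` before switching structured logging on, instead of enabling it unconditionally.

```scala
// Hedged sketch: gate structured logging on the config flag rather than always enabling it.
import org.apache.spark.SparkConf
import org.apache.spark.internal.Logging

def initLogging(conf: SparkConf): Unit = {
  // The default shown here is illustrative only.
  if (conf.getBoolean("spark.log.structuredLogging.enabled", defaultValue = true)) {
    Logging.enableStructuredLogging()
  } else {
    Logging.disableStructuredLogging()
  }
}
```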

Re: [PR] [SPARK-49003][SQL][COLLATION] Fix calculating hash value of collated strings [spark]

2024-07-26 Thread via GitHub
ilicmarkodb closed pull request #47486: [SPARK-49003][SQL][COLLATION] Fix calculating hash value of collated strings URL: https://github.com/apache/spark/pull/47486 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[PR] [SPARK-49017] Insert statement fails when multiple parameters are being used [spark]

2024-07-26 Thread via GitHub
mihailom-db opened a new pull request, #47501: URL: https://github.com/apache/spark/pull/47501 ### What changes were proposed in this pull request? Fix for multiple parameters support. ### Why are the changes needed? The use of multiple parameters with identifiers was broken

Re: [PR] [SPARK-48989][SQL] Fix error result of throwing an exception when the `count` parameter of `SUBSTRING_INDEX` is null [spark]

2024-07-26 Thread via GitHub
wayneguow commented on PR #47481: URL: https://github.com/apache/spark/pull/47481#issuecomment-2252714575 cc @cloud-fan @miland-db -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [SPARK-48989][SQL] Fix error result of throwing an exception when the `count` parameter of `SUBSTRING_INDEX` is null [spark]

2024-07-26 Thread via GitHub
miland-db commented on PR #47481: URL: https://github.com/apache/spark/pull/47481#issuecomment-2252797606 What happens in non-codegen path if we pass `null` as a parameter? Does it also fail? Change for codegen path looks good. What happens if we pass some other string value instead
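To make the question above concrete, a small hedged check (not part of the PR; the session setup is illustrative): with a NULL `count`, `substring_index` is expected to return NULL rather than throw, in both the codegen and interpreted paths.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("substring-index-null").getOrCreate()

// Codegen path (default): expected result is a single NULL row.
spark.sql("SELECT substring_index('a.b.c', '.', NULL)").show()

// Interpreted path: spark.sql.codegen.factoryMode is an internal flag; NO_CODEGEN
// forces interpreted expression evaluation and should give the same NULL.
spark.conf.set("spark.sql.codegen.factoryMode", "NO_CODEGEN")
spark.sql("SELECT substring_index('a.b.c', '.', NULL)").show()
```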

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thrift` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
LuciferYang commented on PR #47496: URL: https://github.com/apache/spark/pull/47496#issuecomment-2252872110 cc @HyukjinKwon @yaooqinn -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-48989][SQL] Fix error result of throwing an exception when the `count` parameter of `SUBSTRING_INDEX` is null [spark]

2024-07-26 Thread via GitHub
wayneguow commented on code in PR #47481: URL: https://github.com/apache/spark/pull/47481#discussion_r1693164370 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala: ## @@ -356,6 +356,8 @@ class StringExpressionsSuite extends Spa

Re: [PR] [SPARK-49015][CORE] Connect Server should respect `spark.log.structuredLogging.enabled` [spark]

2024-07-26 Thread via GitHub
pan3793 commented on PR #47500: URL: https://github.com/apache/spark/pull/47500#issuecomment-2252881716 cc @gengliangwang @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] [SPARK-48989][SQL] Fix error result of throwing an exception when the `count` parameter of `SUBSTRING_INDEX` is null [spark]

2024-07-26 Thread via GitHub
wayneguow commented on code in PR #47481: URL: https://github.com/apache/spark/pull/47481#discussion_r1693171507 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala: ## @@ -356,6 +356,8 @@ class StringExpressionsSuite extends Spa

[PR] Ilicmarkodb/fix string hash [spark]

2024-07-26 Thread via GitHub
ilicmarkodb opened a new pull request, #47502: URL: https://github.com/apache/spark/pull/47502 ### What changes were proposed in this pull request? Changed hash function to be collation aware. ### Why are the changes needed? We were getting the wrong hash for collated str
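The general idea, as a hedged sketch rather than the PR's actual code: hash a string's collation key instead of its raw bytes, so that strings that compare equal under the collation also hash equally. ICU4J (which Spark's collation support builds on) is used here; `collationAwareHash` and `hashBytes` are illustrative names.

```scala
import java.util.Locale
import com.ibm.icu.text.Collator

// Sketch: derive a collation key and hash its bytes, so that "abc" and "ABC" collide
// under a case-insensitive (secondary-strength) collation.
def collationAwareHash(s: String, hashBytes: Array[Byte] => Long): Long = {
  val collator = Collator.getInstance(Locale.ROOT)
  collator.setStrength(Collator.SECONDARY) // roughly: ignore case differences
  hashBytes(collator.getCollationKey(s).toByteArray)
}
```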

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thrift` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
LuciferYang commented on PR #47496: URL: https://github.com/apache/spark/pull/47496#issuecomment-2252885983 https://github.com/apache/spark/blob/5ccf9ba958f492c1eb4dde22a647ba75aba63d8e/pom.xml#L3165-L3166 https://github.com/apache/spark/blob/5ccf9ba958f492c1eb4dde22a647ba75aba63d8e/

[PR] [SPARK-49018] Fix approx_count_distinct not working correctly with collation [spark]

2024-07-26 Thread via GitHub
viktorluc-db opened a new pull request, #47503: URL: https://github.com/apache/spark/pull/47503 ### What changes were proposed in this pull request? Fix for approx_count_distinct not working correctly with collated strings. ### Why are the changes needed? approx_count_distinc
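A hedged illustration of the expected semantics (not the PR's test; assumes a spark-shell session where `spark` is in scope and the `collate` SQL function is available): under a case-insensitive collation, 'a' and 'A' are the same value, so `approx_count_distinct` should report 1.

```scala
spark.sql(
  "SELECT approx_count_distinct(collate(c, 'UNICODE_CI')) FROM VALUES ('a'), ('A') AS t(c)"
).show()
// Expected: 1. Feeding the HLL sketch raw binary values instead could count the two
// case variants separately, which is the kind of wrong result the fix targets.
```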

[PR] [WIP] Test removing the `hadoop.security.key.provider.path` configuration with global scope. [spark]

2024-07-26 Thread via GitHub
LuciferYang opened a new pull request, #47504: URL: https://github.com/apache/spark/pull/47504 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP] Test removing the `hadoop.security.key.provider.path` configuration with global scope. [spark]

2024-07-26 Thread via GitHub
LuciferYang commented on code in PR #47504: URL: https://github.com/apache/spark/pull/47504#discussion_r1693195055 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala: ## @@ -21,12 +21,18 @@ import java.util.Random import org.apa

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #47496: URL: https://github.com/apache/spark/pull/47496#issuecomment-2252916931 I agree with adding this because it looks correct. However, let's figure out why this started causing a problem yesterday, @LuciferYang and @yaooqinn. If you don't mind, let's hold on

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
LuciferYang commented on PR #47496: URL: https://github.com/apache/spark/pull/47496#issuecomment-2252926189 @dongjoon-hyun As I mentioned in the PR description: 1. `sql-on-files.sql` added `CREATE TABLE sql_on_files.test_orc USING ORC AS SELECT 1;` yesterday 2. we have configured `hado

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #47496: URL: https://github.com/apache/spark/pull/47496#issuecomment-2252925497 I updated my comment. I'm going to merge this right now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun closed pull request #47496: [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test URL: https://github.com/apache/spark/pull/47496 -- This is an automated message from the Apache Git Service. To respond

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #47496: URL: https://github.com/apache/spark/pull/47496#issuecomment-2252930370 Thank you, @LuciferYang and @yaooqinn . To @LuciferYang , yes, what I asked is why `CREATE TABLE sql_on_files.test_orc USING ORC AS SELECT 1;` is special in the `FakeKeyProvi

Re: [PR] [SPARK-48989][SQL] Fix error result of throwing an exception when the `count` parameter of `SUBSTRING_INDEX`(with Codegen) is null or string [spark]

2024-07-26 Thread via GitHub
wayneguow commented on code in PR #47481: URL: https://github.com/apache/spark/pull/47481#discussion_r1693214038 ## sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala: ## @@ -424,6 +424,29 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSess

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
LuciferYang commented on PR #47496: URL: https://github.com/apache/spark/pull/47496#issuecomment-2252940141 Yes, that's a good question, but it seems that no other test cases in the `hive-thriftserver` module have created and written data in Orc format before yesterday ... ![image](h

Re: [PR] Ilicmarkodb/fix string hash [spark]

2024-07-26 Thread via GitHub
stefankandic commented on code in PR #47502: URL: https://github.com/apache/spark/pull/47502#discussion_r1693213018 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala: ## @@ -565,7 +565,15 @@ abstract class InterpretedHashFunction { case a

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
yaooqinn commented on PR #47496: URL: https://github.com/apache/spark/pull/47496#issuecomment-2252940898 It looks like the Orc test suites live in the `sql/core` and `sql/hive` modules; maybe it's the first time for `sql/hive-thriftserver` to touch Orc tests, especially DDLs. -- This is an automa

Re: [PR] Ilicmarkodb/fix string hash [spark]

2024-07-26 Thread via GitHub
stefankandic commented on code in PR #47502: URL: https://github.com/apache/spark/pull/47502#discussion_r1693220878 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala: ## @@ -620,6 +620,33 @@ class HashExpressionsSuite extends Spa

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #47496: URL: https://github.com/apache/spark/pull/47496#issuecomment-2252946939 Oh, got it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] [SPARK-49012][SQL][BUILD] Add bouncycastle-related test dependencies to the `hive-thriftserver` module to fix the Maven daily test [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #47496: URL: https://github.com/apache/spark/pull/47496#issuecomment-2252953065 I thought `ThriftServerQueryTestSuite` covers all data sources because it extends `SQLQueryTestSuite`. > class ThriftServerQueryTestSuite extends SQLQueryTestSuite with SharedThr

Re: [PR] [SPARK-48989][SQL] Fix error result of throwing an exception when the `count` parameter of `SUBSTRING_INDEX`(with Codegen) is null or string [spark]

2024-07-26 Thread via GitHub
miland-db commented on code in PR #47481: URL: https://github.com/apache/spark/pull/47481#discussion_r1693234173 ## sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala: ## @@ -424,6 +424,29 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSess

Re: [PR] [SPARK-48989][SQL] Fix error result of throwing an exception when the `count` parameter of `SUBSTRING_INDEX`(with Codegen) is null or string [spark]

2024-07-26 Thread via GitHub
miland-db commented on PR #47481: URL: https://github.com/apache/spark/pull/47481#issuecomment-2252966580 LGTM from my side -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-26 Thread via GitHub
GideonPotok commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1693245942 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -1789,44 +1798,90 @@ class CollationSQLExpressionsSuite s"named

Re: [PR] [SPARK-48382] Add `reconciler` to `spark-operator` module [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun closed pull request #12: [SPARK-48382] Add `reconciler` to `spark-operator` module URL: https://github.com/apache/spark-kubernetes-operator/pull/12 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] [SPARK-48964][SQL][DOCS] Fix the discrepancy between implementation, comment and documentation of option `recursive.fields.max.depth` in ProtoBuf connector [spark]

2024-07-26 Thread via GitHub
wayneguow commented on PR #47458: URL: https://github.com/apache/spark/pull/47458#issuecomment-2252994418 also cc @gengliangwang, please take a look when you have time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-48382] Add `reconciler` to `spark-operator` module [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#issuecomment-2253008350 Please make a new PR for `SparkOperator.java` because it's worth it. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[PR] [SPARK-48382][FOLLOWUP] Use `final` keyword for the applicable variables [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun opened a new pull request, #24: URL: https://github.com/apache/spark-kubernetes-operator/pull/24 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48382][FOLLOWUP] Use `final` keyword for the applicable variables [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #24: URL: https://github.com/apache/spark-kubernetes-operator/pull/24#issuecomment-2253045166 Could you review this PR, @jiangzho ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-48986][CONNECT][SQL] Add ColumnNode Intermediate Representation [spark]

2024-07-26 Thread via GitHub
hvanhovell commented on PR #47466: URL: https://github.com/apache/spark/pull/47466#issuecomment-2253073540 Merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] [SPARK-49019] Use `try-with-resources` to test `KubernetesClientFactory.buildKubernetesClient` [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun opened a new pull request, #25: URL: https://github.com/apache/spark-kubernetes-operator/pull/25 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing chang

Re: [PR] [SPARK-49003][SQL] Ilicmarkodb/fix string hash [spark]

2024-07-26 Thread via GitHub
ilicmarkodb commented on code in PR #47502: URL: https://github.com/apache/spark/pull/47502#discussion_r1693311745 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala: ## @@ -620,6 +620,33 @@ class HashExpressionsSuite extends Spar

Re: [PR] [SPARK-48382][FOLLOWUP] Use `final` keyword for the applicable variables [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #24: URL: https://github.com/apache/spark-kubernetes-operator/pull/24#issuecomment-2253094635 Could you review this PR, @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] [SPARK-49019] Use `try-with-resources` to test `KubernetesClientFactory.buildKubernetesClient` [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #25: URL: https://github.com/apache/spark-kubernetes-operator/pull/25#issuecomment-2253094837 Could you review this PR, @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[PR] [WIP][SPARK-49000][SQL] Fix aggregation for distinct literal [spark]

2024-07-26 Thread via GitHub
uros-db opened a new pull request, #47505: URL: https://github.com/apache/spark/pull/47505 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[PR] [SPARK-49016][SQL] Queries from raw CSV files are disallowed when the referenced columns only include the internal corrupt record column [spark]

2024-07-26 Thread via GitHub
wayneguow opened a new pull request, #47506: URL: https://github.com/apache/spark/pull/47506 ### What changes were proposed in this pull request? From the SQL migration guide: https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-22-to-23 https
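For context, a hedged sketch of the migration-guide behavior being referenced (paths and schema are illustrative): a query that references only the internal corrupt record column is disallowed, and caching the parsed DataFrame first is the documented workaround.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("csv-corrupt-record").getOrCreate()

val df = spark.read
  .schema("a INT, _corrupt_record STRING")
  .csv("/tmp/example.csv")

// Disallowed since Spark 2.3: only _corrupt_record is referenced.
// df.filter(df("_corrupt_record").isNotNull).count()

// Documented workaround: cache (or save) the fully parsed DataFrame first.
val cached = df.cache()
cached.filter(cached("_corrupt_record").isNotNull).count()
```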

Re: [PR] [SPARK-48382][FOLLOWUP] Use `final` keyword for the applicable variables [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #24: URL: https://github.com/apache/spark-kubernetes-operator/pull/24#issuecomment-2253128649 Could you review this PR, @huaxingao ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] [SPARK-49019] Use `try-with-resources` to test `KubernetesClientFactory.buildKubernetesClient` [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #25: URL: https://github.com/apache/spark-kubernetes-operator/pull/25#issuecomment-2253128850 Could you review this PR, @huaxingao ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] [SPARK-48382][FOLLOWUP] Use `final` keyword for the applicable variables [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun closed pull request #24: [SPARK-48382][FOLLOWUP] Use `final` keyword for the applicable variables URL: https://github.com/apache/spark-kubernetes-operator/pull/24 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-48382][FOLLOWUP] Use `final` keyword for the applicable variables [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #24: URL: https://github.com/apache/spark-kubernetes-operator/pull/24#issuecomment-2253142444 Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] [SPARK-48382][FOLLOWUP] Use `final` keyword for the applicable variables [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #24: URL: https://github.com/apache/spark-kubernetes-operator/pull/24#issuecomment-2253141475 Thank you, @viirya and @huaxingao . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] [SPARK-49019] Use `try-with-resources` to test `KubernetesClientFactory.buildKubernetesClient` [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #25: URL: https://github.com/apache/spark-kubernetes-operator/pull/25#issuecomment-2253142816 Thank you, @viirya and @huaxingao . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] [SPARK-49019] Use `try-with-resources` to test `KubernetesClientFactory.buildKubernetesClient` [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #25: URL: https://github.com/apache/spark-kubernetes-operator/pull/25#issuecomment-2253143592 Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] [SPARK-49019] Use `try-with-resources` to test `KubernetesClientFactory.buildKubernetesClient` [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun closed pull request #25: [SPARK-49019] Use `try-with-resources` to test `KubernetesClientFactory.buildKubernetesClient` URL: https://github.com/apache/spark-kubernetes-operator/pull/25 -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] [SPARK-42307][SQL] Assign name for error _LEGACY_ERROR_TEMP_2232 [spark]

2024-07-26 Thread via GitHub
junyuc25 commented on code in PR #47354: URL: https://github.com/apache/spark/pull/47354#discussion_r1693364730 ## sql/core/src/test/scala/org/apache/spark/sql/RowSuite.scala: ## @@ -123,4 +123,17 @@ class RowSuite extends SparkFunSuite with SharedSparkSession { paramete

Re: [PR] [SPARK-42307][SQL] Assign name for error _LEGACY_ERROR_TEMP_2232 [spark]

2024-07-26 Thread via GitHub
allisonwang-db commented on code in PR #47354: URL: https://github.com/apache/spark/pull/47354#discussion_r1693367860 ## sql/core/src/test/scala/org/apache/spark/sql/RowSuite.scala: ## @@ -123,4 +123,17 @@ class RowSuite extends SparkFunSuite with SharedSparkSession { pa

Re: [PR] [SPARK-48821][SQL] Support Update in DataFrameWriterV2 [spark]

2024-07-26 Thread via GitHub
huaxingao commented on PR #47233: URL: https://github.com/apache/spark/pull/47233#issuecomment-2253170537 I had an offline discussion with @cloud-fan. The proposed API is
```
spark
  .update(tableName)
  .set(...)
  .where(...)
  .execute()
```
The reason for having th

Re: [PR] [SPARK-48999][SS] Divide PythonStreamingDataSourceSimpleSuite [spark]

2024-07-26 Thread via GitHub
allisonwang-db closed pull request #47479: [SPARK-48999][SS] Divide PythonStreamingDataSourceSimpleSuite URL: https://github.com/apache/spark/pull/47479 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-48999][SS] Divide PythonStreamingDataSourceSimpleSuite [spark]

2024-07-26 Thread via GitHub
allisonwang-db commented on PR #47479: URL: https://github.com/apache/spark/pull/47479#issuecomment-2253214804 Thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

[PR] [SPARK-49020] Avoid `raw` type usage [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun opened a new pull request, #26: URL: https://github.com/apache/spark-kubernetes-operator/pull/26 ### What changes were proposed in this pull request? This PR aims to avoid `raw` type usage. ### Why are the changes needed? We need to use generic types lik

Re: [PR] [SPARK-49020] Avoid `raw` type usage [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #26: URL: https://github.com/apache/spark-kubernetes-operator/pull/26#issuecomment-2253225023 Could you review this, @jiangzho and @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] [SPARK-49009][SQL][PYTHON] Make Column APIs and functions accept Enums [spark]

2024-07-26 Thread via GitHub
ueshin commented on PR #47495: URL: https://github.com/apache/spark/pull/47495#issuecomment-2253293672 The failure seems not related to this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] [SPARK-49009][SQL][PYTHON] Make Column APIs and functions accept Enums [spark]

2024-07-26 Thread via GitHub
ueshin commented on PR #47495: URL: https://github.com/apache/spark/pull/47495#issuecomment-2253293999 Thanks! merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] [SPARK-49009][SQL][PYTHON] Make Column APIs and functions accept Enums [spark]

2024-07-26 Thread via GitHub
ueshin closed pull request #47495: [SPARK-49009][SQL][PYTHON] Make Column APIs and functions accept Enums URL: https://github.com/apache/spark/pull/47495 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-48986][CONNECT][SQL] Add ColumnNode Intermediate Representation [spark]

2024-07-26 Thread via GitHub
asfgit closed pull request #47466: [SPARK-48986][CONNECT][SQL] Add ColumnNode Intermediate Representation URL: https://github.com/apache/spark/pull/47466 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[PR] [SPARK-48931][SS][FOLLOWUP] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-26 Thread via GitHub
riyaverm-db opened a new pull request, #47507: URL: https://github.com/apache/spark/pull/47507 ### What changes were proposed in this pull request? Updating migration doc for #47393 ### Why are the changes needed? Better visibility of the change. #

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-26 Thread via GitHub
riyaverm-db commented on PR #47393: URL: https://github.com/apache/spark/pull/47393#issuecomment-2253457933 @HeartSaVioR Updated the migration doc here. https://github.com/apache/spark/pull/47507 -- This is an automated message from the Apache Git Service. To respond to the message, pleas

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-26 Thread via GitHub
bogao007 commented on PR #47133: URL: https://github.com/apache/spark/pull/47133#issuecomment-2253557239 @HyukjinKwon I got some other dependency errors for tests running in yarn and k8s ``` [info] - run Python application in yarn-client mode *** FAILED *** (4 seconds, 30 milliseconds

Re: [PR] [SPARK-48931][SS][FOLLOWUP] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-26 Thread via GitHub
riyaverm-db commented on PR #47507: URL: https://github.com/apache/spark/pull/47507#issuecomment-2253604118 @HeartSaVioR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[PR] [SPARK-XXXX] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-26 Thread via GitHub
ericm-db opened a new pull request, #47508: URL: https://github.com/apache/spark/pull/47508 ### What changes were proposed in this pull request? Implementing validation for the TransformWithStateExec operator, so that it can't restart with a different TimeMode and OutputMode,

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-26 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1693683286 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -425,7 +432,10 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-26 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1693683616 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -301,17 +301,32 @@ class DriverStatefulProcessorHandle

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-26 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1693683837 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -301,17 +301,32 @@ class DriverStatefulProcessorHandle

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-26 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1693684522 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateVariableUtils.scala: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Softw

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-26 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1693684652 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreErrors.scala: ## @@ -173,8 +173,51 @@ object StateStoreErrors { StateStore

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-26 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1693685002 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreErrors.scala: ## @@ -173,8 +173,51 @@ object StateStoreErrors { StateStore

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-26 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1693685832 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -441,6 +451,66 @@ case class TransformWithStateExec( n

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-26 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1693686472 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala: ## @@ -983,6 +1006,77 @@ class TransformWithStateSuite extends StateStore

Re: [PR] [DO-NOT-MERGE][SPARK-47047][SS] Add changes to support reading transformWithState value state variables [spark]

2024-07-26 Thread via GitHub
anishshri-db closed pull request #47238: [DO-NOT-MERGE][SPARK-47047][SS] Add changes to support reading transformWithState value state variables URL: https://github.com/apache/spark/pull/47238 -- This is an automated message from the Apache Git Service. To respond to the message, please log o

Re: [PR] [DO-NOT-MERGE][SPARK-47047][SS] Add changes to support reading transformWithState value state variables [spark]

2024-07-26 Thread via GitHub
anishshri-db commented on PR #47238: URL: https://github.com/apache/spark/pull/47238#issuecomment-2253643239 Will cover the changes as part of Jing's PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[PR] [SPARK-49033][CORE] Support server-side environment variable replacement in REST Submission API [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun opened a new pull request, #47509: URL: https://github.com/apache/spark/pull/47509 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-49020] Avoid `raw` type usage [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #26: URL: https://github.com/apache/spark-kubernetes-operator/pull/26#issuecomment-2253660716 Thank you, @viirya . Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] [SPARK-49020] Avoid `raw` type usage [spark-kubernetes-operator]

2024-07-26 Thread via GitHub
dongjoon-hyun closed pull request #26: [SPARK-49020] Avoid `raw` type usage URL: https://github.com/apache/spark-kubernetes-operator/pull/26 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] [SPARK-47870][SQL] Optimize predicate after push extra predicate through join [spark]

2024-07-26 Thread via GitHub
github-actions[bot] commented on PR #46085: URL: https://github.com/apache/spark/pull/46085#issuecomment-2253671372 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions [spark]

2024-07-26 Thread via GitHub
github-actions[bot] closed pull request #45446: [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions URL: https://github.com/apache/spark/pull/45446 -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] [SPARK-47279][CORE]When the messageLoop encounter a fatal exception, such as oom, exit the JVM to avoid the driver hanging forever [spark]

2024-07-26 Thread via GitHub
github-actions[bot] closed pull request #45385: [SPARK-47279][CORE]When the messageLoop encounter a fatal exception, such as oom, exit the JVM to avoid the driver hanging forever URL: https://github.com/apache/spark/pull/45385 -- This is an automated message from the Apache Git Service. To r

Re: [PR] [SPARK-49033][CORE] Support server-side environment variable replacement in REST Submission API [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #47509: URL: https://github.com/apache/spark/pull/47509#issuecomment-2253676008 Could you review this when you have some time, @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[PR] [SPARK-49032] Add schema path in metadata table entry, verify expected version and add operator metadata related test for operator metadata format v2 [spark]

2024-07-26 Thread via GitHub
anishshri-db opened a new pull request, #47510: URL: https://github.com/apache/spark/pull/47510 ### What changes were proposed in this pull request? Add schema path in metadata table entry, verify expected version and add operator metadata related test for operator metadata format v2

Re: [PR] [SPARK-49032][SS] Add schema path in metadata table entry, verify expected version and add operator metadata related test for operator metadata format v2 [spark]

2024-07-26 Thread via GitHub
anishshri-db commented on PR #47510: URL: https://github.com/apache/spark/pull/47510#issuecomment-2253686033 @ericm-db @HeartSaVioR - could you PTAL ? thx -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-49033][CORE] Support server-side `environmentVariables` replacement in REST Submission API [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #47509: URL: https://github.com/apache/spark/pull/47509#issuecomment-2253711762 Could you review this PR when you have some time, @yaooqinn ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] [SPARK-49033][CORE] Support server-side `environmentVariables` replacement in REST Submission API [spark]

2024-07-26 Thread via GitHub
viirya commented on code in PR #47509: URL: https://github.com/apache/spark/pull/47509#discussion_r1693781033 ## core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala: ## @@ -174,6 +174,13 @@ private[rest] class StandaloneSubmitRequestServlet( conf: Sp

Re: [PR] [SPARK-49033][CORE] Support server-side `environmentVariables` replacement in REST Submission API [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on code in PR #47509: URL: https://github.com/apache/spark/pull/47509#discussion_r1693782472 ## core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala: ## @@ -174,6 +174,13 @@ private[rest] class StandaloneSubmitRequestServlet( c

Re: [PR] [SPARK-49033][CORE] Support server-side `environmentVariables` replacement in REST Submission API [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on PR #47509: URL: https://github.com/apache/spark/pull/47509#issuecomment-2253752891 Thank you, @viirya . It's removed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] [SPARK-49033][CORE] Support server-side `environmentVariables` replacement in REST Submission API [spark]

2024-07-26 Thread via GitHub
dongjoon-hyun commented on code in PR #47509: URL: https://github.com/apache/spark/pull/47509#discussion_r1693797107 ## core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala: ## @@ -216,6 +221,7 @@ private[rest] class StandaloneSubmitRequestServlet( //

Re: [PR] [SPARK-49033][CORE] Support server-side `environmentVariables` replacement in REST Submission API [spark]

2024-07-26 Thread via GitHub
viirya commented on code in PR #47509: URL: https://github.com/apache/spark/pull/47509#discussion_r1693788302 ## core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala: ## @@ -216,6 +221,7 @@ private[rest] class StandaloneSubmitRequestServlet( // Filter
