[PR] [SPARK-49035][PYTHON] Eliminate TypeVar `ColumnOrName_` [spark]

2024-07-28 Thread via GitHub
zhengruifeng opened a new pull request, #47512: URL: https://github.com/apache/spark/pull/47512 ### What changes were proposed in this pull request? Eliminate TypeVar `ColumnOrName_` ### Why are the changes needed? unify the usage of `ColumnOrName` ### Does this PR
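
For readers unfamiliar with the distinction the PR title draws, here is a minimal sketch of a bounded TypeVar next to a plain union alias. The `Column` class is a hypothetical stand-in so the snippet runs without Spark, and the two definitions are assumptions about how such names are typically declared, not text taken from the PR.

```python
from typing import List, TypeVar, Union


class Column:
    """Hypothetical stand-in for a Spark column object (assumption, not PySpark's class)."""

    def __init__(self, name: str) -> None:
        self.name = name


# Plain union alias: a value is either a Column or a column name.
ColumnOrName = Union[Column, str]

# Bounded TypeVar (assumed shape): makes every function that uses it generic.
ColumnOrName_ = TypeVar("ColumnOrName_", bound=ColumnOrName)


def names_with_typevar(cols: List[ColumnOrName_]) -> List[str]:
    # Generic signature; the runtime behavior is the same as the alias version.
    return [c.name if isinstance(c, Column) else c for c in cols]


def names_with_alias(cols: List[ColumnOrName]) -> List[str]:
    # Non-generic signature using only the alias.
    return [c.name if isinstance(c, Column) else c for c in cols]
```

Both functions behave identically at runtime; the difference is visible only to a static type checker, which is why a single alias can be enough when the generic parameter is never needed.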

Re: [PR] [SPARK-49035][PYTHON] Eliminate TypeVar `ColumnOrName_` [spark]

2024-07-28 Thread via GitHub
HyukjinKwon commented on PR #47512: URL: https://github.com/apache/spark/pull/47512#issuecomment-2254592358 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-49035][PYTHON] Eliminate TypeVar `ColumnOrName_` [spark]

2024-07-28 Thread via GitHub
HyukjinKwon closed pull request #47512: [SPARK-49035][PYTHON] Eliminate TypeVar `ColumnOrName_` URL: https://github.com/apache/spark/pull/47512

Re: [PR] [SPARK-47618][CORE] Use `Magic Committer` for all S3 buckets by default [spark]

2024-07-28 Thread via GitHub
github-actions[bot] commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2254727181 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48900] Add `reason` field for all internal calls for job/stage cancellation [spark]

2024-07-28 Thread via GitHub
cloud-fan commented on code in PR #47374: URL: https://github.com/apache/spark/pull/47374#discussion_r1694383494 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -154,14 +154,14 @@ abstract class QueryStageExec extends LeafExecNode {

Re: [PR] [SPARK-49016][SQL] Queries from raw CSV files are disallowed when the referenced columns only include the internal corrupt record column [spark]

2024-07-28 Thread via GitHub
wayneguow commented on code in PR #47506: URL: https://github.com/apache/spark/pull/47506#discussion_r1694406012 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala: ## @@ -1739,6 +1739,32 @@ abstract class CSVSuite Row(1, Date.valueOf

Re: [PR] [SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation [spark]

2024-07-28 Thread via GitHub
cloud-fan commented on code in PR #47484: URL: https://github.com/apache/spark/pull/47484#discussion_r1694406485 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -248,10 +249,14 @@ case class PreprocessTableCreation(catalog: SessionCatalo

Re: [PR] [SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation [spark]

2024-07-28 Thread via GitHub
cloud-fan commented on code in PR #47484: URL: https://github.com/apache/spark/pull/47484#discussion_r1694406744 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -263,12 +268,14 @@ case class PreprocessTableCreation(catalog: SessionCatalo

Re: [PR] [SPARK-47618][CORE] Use `Magic Committer` for all S3 buckets by default [spark]

2024-07-28 Thread via GitHub
dongjoon-hyun commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2254783067 I removed `Stale` tag.

Re: [PR] [SPARK-49014][BUILD] Bump Apache Avro to 1.12.0 [spark]

2024-07-28 Thread via GitHub
dongjoon-hyun commented on code in PR #47498: URL: https://github.com/apache/spark/pull/47498#discussion_r1694418472 ## pom.xml: ## @@ -359,6 +359,11 @@ false + + avro-release-candidate + Avro Release Candidate + https://repository.apac

Re: [PR] [SPARK-49014][BUILD] Bump Apache Avro to 1.12.0 [spark]

2024-07-28 Thread via GitHub
dongjoon-hyun commented on code in PR #47498: URL: https://github.com/apache/spark/pull/47498#discussion_r1694419392 ## pom.xml: ## @@ -359,6 +359,11 @@ false + + avro-release-candidate + Avro Release Candidate + https://repository.apac

Re: [PR] [SPARK-49032][SS] Add schema path in metadata table entry, verify expected version and add operator metadata related test for operator metadata format v2 [spark]

2024-07-28 Thread via GitHub
ericm-db commented on code in PR #47510: URL: https://github.com/apache/spark/pull/47510#discussion_r1694428539 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/metadata/StateMetadataSource.scala: ## @@ -47,7 +47,8 @@ case class StateMetadataTableEn

Re: [PR] [SPARK-45891][SQL][PYTHON][VARIANT] Add support for interval types in the Variant Spec [spark]

2024-07-28 Thread via GitHub
LuciferYang commented on code in PR #47473: URL: https://github.com/apache/spark/pull/47473#discussion_r1694435200 ## common/utils/src/main/scala/org/apache/spark/util/DayTimeIntervalUtils.java: ## @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-45891][SQL][PYTHON][VARIANT] Add support for interval types in the Variant Spec [spark]

2024-07-28 Thread via GitHub
LuciferYang commented on code in PR #47473: URL: https://github.com/apache/spark/pull/47473#discussion_r1694436591 ## common/utils/src/main/scala/org/apache/spark/util/DayTimeIntervalUtils.java: ## @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-45891][SQL][PYTHON][VARIANT] Add support for interval types in the Variant Spec [spark]

2024-07-28 Thread via GitHub
LuciferYang commented on code in PR #47473: URL: https://github.com/apache/spark/pull/47473#discussion_r1694436695 ## common/utils/src/main/scala/org/apache/spark/util/DayTimeIntervalUtils.java: ## @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-45891][SQL][PYTHON][VARIANT] Add support for interval types in the Variant Spec [spark]

2024-07-28 Thread via GitHub
LuciferYang commented on code in PR #47473: URL: https://github.com/apache/spark/pull/47473#discussion_r1694436983 ## common/utils/src/main/scala/org/apache/spark/util/DayTimeIntervalUtils.java: ## @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-45891][SQL][PYTHON][VARIANT] Add support for interval types in the Variant Spec [spark]

2024-07-28 Thread via GitHub
LuciferYang commented on code in PR #47473: URL: https://github.com/apache/spark/pull/47473#discussion_r1694437591 ## common/utils/src/main/scala/org/apache/spark/util/DayTimeIntervalUtils.java: ## @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-45891][SQL][PYTHON][VARIANT] Add support for interval types in the Variant Spec [spark]

2024-07-28 Thread via GitHub
LuciferYang commented on code in PR #47473: URL: https://github.com/apache/spark/pull/47473#discussion_r1694438032 ## common/utils/src/main/scala/org/apache/spark/util/DayTimeIntervalUtils.java: ## @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
ericm-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694439733 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreErrors.scala: ## @@ -173,8 +173,51 @@ object StateStoreErrors { StateStoreProv

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
ericm-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694439844 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala: ## @@ -983,6 +1006,77 @@ class TransformWithStateSuite extends StateStoreMetr

[PR] [SPARK-49036] Simplify assertion test code [spark-kubernetes-operator]

2024-07-28 Thread via GitHub
dongjoon-hyun opened a new pull request, #27: URL: https://github.com/apache/spark-kubernetes-operator/pull/27 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-49032][SS] Add schema path in metadata table entry, verify expected version and add operator metadata related test for operator metadata format v2 [spark]

2024-07-28 Thread via GitHub
anishshri-db commented on code in PR #47510: URL: https://github.com/apache/spark/pull/47510#discussion_r1694459213 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/metadata/StateMetadataSource.scala: ## @@ -47,7 +47,8 @@ case class StateMetadataTab

Re: [PR] [SPARK-49032][SS] Add schema path in metadata table entry, verify expected version and add operator metadata related test for operator metadata format v2 [spark]

2024-07-28 Thread via GitHub
ericm-db commented on code in PR #47510: URL: https://github.com/apache/spark/pull/47510#discussion_r1694459690 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/metadata/StateMetadataSource.scala: ## @@ -47,7 +47,8 @@ case class StateMetadataTableEn

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694461576 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -666,4 +705,3 @@ object TransformWithStateExec { } } /

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694462013 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreErrors.scala: ## @@ -173,8 +173,51 @@ object StateStoreErrors { StateStore

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694465328 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -441,6 +443,43 @@ case class TransformWithStateExec( n

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694465769 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3803,6 +3803,12 @@ ], "sqlState" : "42802" }, + "STATEFUL_PROCESSOR_DUPLICATE_

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694466068 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3852,12 +3858,24 @@ ], "sqlState" : "42802" }, + "STATE_STORE_INVALID_CONFIG_A

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694466441 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3852,12 +3858,24 @@ ], "sqlState" : "42802" }, + "STATE_STORE_INVALID_CONFIG_A

Re: [PR] [SPARK-49002][SQL] Consistently handle invalid location/path values for all database objects [spark]

2024-07-28 Thread via GitHub
LuciferYang commented on code in PR #47485: URL: https://github.com/apache/spark/pull/47485#discussion_r1694471506 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1411,7 +1411,8 @@ class SessionCatalog( parts.map { part =>

Re: [PR] [SPARK-49002][SQL] Consistently handle invalid location/path values for all database objects [spark]

2024-07-28 Thread via GitHub
LuciferYang commented on PR #47485: URL: https://github.com/apache/spark/pull/47485#issuecomment-2254871821 This is fine for me, but please forgive my greed: the current `field` is a string, which developers can fill in quite arbitrarily. Can we possibly make it more standardized?

Re: [PR] [SPARK-49036] Exclude `JUnitAssertionsShouldIncludeMessage/JUnitTestContainsTooManyAsserts` PMD rules and simplify test code [spark-kubernetes-operator]

2024-07-28 Thread via GitHub
dongjoon-hyun commented on PR #27: URL: https://github.com/apache/spark-kubernetes-operator/pull/27#issuecomment-2254876112 cc @jiangzho and @viirya

Re: [PR] [SPARK-49036] Exclude `JUnitAssertionsShouldIncludeMessage/JUnitTestContainsTooManyAsserts` PMD rules and simplify test code [spark-kubernetes-operator]

2024-07-28 Thread via GitHub
dongjoon-hyun commented on code in PR #27: URL: https://github.com/apache/spark-kubernetes-operator/pull/27#discussion_r1694480262 ## config/pmd/ruleset.xml: ## @@ -21,13 +21,12 @@ Spark Operator Ruleset - + + + + - - - Review Com

Re: [PR] [SPARK-49036] Exclude `JUnitAssertionsShouldIncludeMessage/JUnitTestContainsTooManyAsserts` PMD rules and simplify test code [spark-kubernetes-operator]

2024-07-28 Thread via GitHub
dongjoon-hyun commented on code in PR #27: URL: https://github.com/apache/spark-kubernetes-operator/pull/27#discussion_r1694480849 ## config/pmd/ruleset.xml: ## @@ -21,13 +21,12 @@ Spark Operator Ruleset - + + Review Comment: This is recommended, but we

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
ericm-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694484271 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3852,12 +3858,24 @@ ], "sqlState" : "42802" }, + "STATE_STORE_INVALID_CONFIG_AFTER

Re: [PR] Operator 0.1.0 [spark-kubernetes-operator]

2024-07-28 Thread via GitHub
dongjoon-hyun commented on PR #2: URL: https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2254902900 For the record, the following are merged. - #13 - #14 - #15 - #16 - #17 - #18 - #19 - #20 - #21 - #22 - #23 - #12 - #24

[PR] K8s with pvc fix [spark]

2024-07-28 Thread via GitHub
gantashalavenki opened a new pull request, #47513: URL: https://github.com/apache/spark/pull/47513 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-48821][SQL] Support Update in DataFrameWriterV2 [spark]

2024-07-28 Thread via GitHub
cloud-fan commented on PR #47233: URL: https://github.com/apache/spark/pull/47233#issuecomment-2254953947 If we want more compile-time safety, we can also specify the where condition in `execute(...)`, as there should be at most one where condition for an UPDATE command. I don't have a stro

Re: [PR] [SC-170296] GROUP BY with MapType nested inside complex type [spark]

2024-07-28 Thread via GitHub
cloud-fan commented on code in PR #47331: URL: https://github.com/apache/spark/pull/47331#discussion_r1694560083 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -892,132 +892,108 @@ case class MapFromEntries(child: Expre

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694606014 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3852,12 +3858,24 @@ ], "sqlState" : "42802" }, + "STATE_STORE_INVALID_CONFIG_A

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-28 Thread via GitHub
anishshri-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1694606570 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateVariableUtils.scala: ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Soft

Re: [PR] [SPARK-49002][SQL] Consistently handle invalid location/path values for all database objects [spark]

2024-07-28 Thread via GitHub
yaooqinn commented on PR #47485: URL: https://github.com/apache/spark/pull/47485#issuecomment-2255024802 Hi @LuciferYang, I have considered this, and the error message fits somewhat arbitrary `field`s

Re: [PR] [SPARK-49002][SQL] Consistently handle invalid location/path values for all database objects [spark]

2024-07-28 Thread via GitHub
LuciferYang commented on PR #47485: URL: https://github.com/apache/spark/pull/47485#issuecomment-2255029849 > Hi @LuciferYang, I have considered this, and the error message fits somewhat arbitrary `field`s ok

Re: [PR] [SPARK-48901][SPARK-48916][SS][PYTHON] Introduce clusterBy DataStreamWriter API [spark]

2024-07-28 Thread via GitHub
HeartSaVioR closed pull request #47376: [SPARK-48901][SPARK-48916][SS][PYTHON] Introduce clusterBy DataStreamWriter API URL: https://github.com/apache/spark/pull/47376

Re: [PR] [SPARK-48901][SPARK-48916][SS][PYTHON] Introduce clusterBy DataStreamWriter API [spark]

2024-07-28 Thread via GitHub
HeartSaVioR commented on PR #47376: URL: https://github.com/apache/spark/pull/47376#issuecomment-2255078465 Thanks! Merging to master.

Re: [PR] [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0 [spark]

2024-07-28 Thread via GitHub
LuciferYang commented on PR #45583: URL: https://github.com/apache/spark/pull/45583#issuecomment-2255088028 Sorry to disturb everyone, but when I execute `OrcEncryptionSuite` on my M2 Max, I find that there are some differences when using Hadoop 3.4.0 and Hadoop 3.3.4. `build/sbt