Re: [PR] [SPARK-48628][CORE] Add task peak on/off heap memory metrics [spark]

2024-07-25 Thread via GitHub
mridulm commented on code in PR #47192: URL: https://github.com/apache/spark/pull/47192#discussion_r1690907811 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -110,9 +112,22 @@ class TaskMetrics private[spark] () extends Serializable { * joins. The

Re: [PR] [SPARK-48844][FOLLOWUP][TESTS] Cleanup duplicated data resource files in hive-thriftserver test [spark]

2024-07-25 Thread via GitHub
yaooqinn commented on code in PR #47480: URL: https://github.com/apache/spark/pull/47480#discussion_r1690912988 ## sql/core/src/test/resources/sql-tests/inputs/sql-on-files.sql: ## @@ -1,19 +1,30 @@ +CREATE DATABASE IF NOT EXISTS sql_on_files; -- Parquet +CREATE TABLE sql_on_fi

Re: [PR] [SPARK-48829][BUILD] Upgrade `RoaringBitmap` to 1.2.1 [spark]

2024-07-25 Thread via GitHub
LuciferYang commented on PR #47247: URL: https://github.com/apache/spark/pull/47247#issuecomment-2249601828 ready to go?

Re: [PR] [ONLY TEST][HOLD] Upgrade rocksdbjni to 9.4.0 [spark]

2024-07-25 Thread via GitHub
LuciferYang commented on PR #47207: URL: https://github.com/apache/spark/pull/47207#issuecomment-2249610870 Has there been any new progress on this one?

Re: [PR] [TEST janino] v3.1.12 VS v3.1.9 [spark]

2024-07-25 Thread via GitHub
LuciferYang commented on PR #47455: URL: https://github.com/apache/spark/pull/47455#issuecomment-2249631043 Are there any conclusions from this test?

Re: [PR] [TEST janino] v3.1.12 VS v3.1.9 [spark]

2024-07-25 Thread via GitHub
panbingkun commented on PR #47455: URL: https://github.com/apache/spark/pull/47455#issuecomment-2249724363 > Are there any conclusions from this test? From the testing, it seems that there is no significant difference in performance. Regarding the version of `janino`, we seem to b

Re: [PR] [SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation [spark]

2024-07-25 Thread via GitHub
vladimirg-db closed pull request #47424: [SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation URL: https://github.com/apache/spark/pull/47424

Re: [PR] [TEST janino] v3.1.12 VS v3.1.9 [spark]

2024-07-25 Thread via GitHub
LuciferYang commented on PR #47455: URL: https://github.com/apache/spark/pull/47455#issuecomment-2249732282 Since this component is very critical, if there is no noticeable performance enhancement or critical bug fix, I recommend maintaining the use of the current version. cc @cloud-fan

[PR] [SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation [spark]

2024-07-25 Thread via GitHub
vladimirg-db opened a new pull request, #47484: URL: https://github.com/apache/spark/pull/47484 ### What changes were proposed in this pull request? Use `HashSet`/`HashMap` instead of doing linear searches over the `Seq`. In case of 1000s of partitions this significantly i
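The change described above follows a standard pattern: build a hash-based lookup once instead of calling `Seq.contains` (an O(n) scan) inside a loop. The sketch below is not the PR's code; `specifiedPartitionCols` and `schemaFieldNames` are hypothetical names used only for illustration.
```scala
// Minimal sketch of the technique, not the actual PreprocessTableCreation change.
import scala.collection.immutable.HashSet

def checkPartitionColumns(
    specifiedPartitionCols: Seq[String],
    schemaFieldNames: Seq[String]): Unit = {
  // Before: specifiedPartitionCols.foreach(c => require(schemaFieldNames.contains(c)))
  // is O(n * m), because each `contains` scans the whole Seq.

  // After: build the set once, then each membership check is O(1) on average.
  val fieldSet: HashSet[String] = HashSet(schemaFieldNames: _*)
  specifiedPartitionCols.foreach { col =>
    require(fieldSet.contains(col), s"Partition column $col is not defined in the schema")
  }
}
```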

Re: [PR] [SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation [spark]

2024-07-25 Thread via GitHub
vladimirg-db commented on PR #47424: URL: https://github.com/apache/spark/pull/47424#issuecomment-2249742061 Recreated my fork again... Also deleted apache-spark-ci-image https://github.com/apache/spark/pull/47484

Re: [PR] [SPARK-48308][CORE][3.5] Unify getting data schema without partition columns in FileSourceStrategy [spark]

2024-07-25 Thread via GitHub
cloud-fan commented on PR #47483: URL: https://github.com/apache/spark/pull/47483#issuecomment-2249767736 This fixes a regression caused by https://github.com/apache/spark/pull/46565/files#diff-fbc6da30b8372e4f9aeb35ccf0d39eb796715d192c7eaeab109376584de0790eR121, and makes Delta Lake pass a

Re: [PR] [SPARK-48308][CORE][3.5] Unify getting data schema without partition columns in FileSourceStrategy [spark]

2024-07-25 Thread via GitHub
yaooqinn commented on PR #47483: URL: https://github.com/apache/spark/pull/47483#issuecomment-2249780123 Do we need this for branch 3.4?

Re: [PR] [SPARK-48308][CORE][3.5] Unify getting data schema without partition columns in FileSourceStrategy [spark]

2024-07-25 Thread via GitHub
yaooqinn commented on PR #47483: URL: https://github.com/apache/spark/pull/47483#issuecomment-2249812967 Merged to branch-3.5, thank you all

Re: [PR] [SPARK-48308][CORE][3.5] Unify getting data schema without partition columns in FileSourceStrategy [spark]

2024-07-25 Thread via GitHub
yaooqinn closed pull request #47483: [SPARK-48308][CORE][3.5] Unify getting data schema without partition columns in FileSourceStrategy URL: https://github.com/apache/spark/pull/47483

Re: [PR] [TEST janino] v3.1.12 VS v3.1.9 [spark]

2024-07-25 Thread via GitHub
panbingkun commented on PR #47455: URL: https://github.com/apache/spark/pull/47455#issuecomment-2249830972 Actually, it has some bug fixes, for example: [spark compilation failed with ArrayIndexOutOfBoundsException](https://github.com/janino-compiler/janino/issues/208)

Re: [PR] [TEST janino] v3.1.12 VS v3.1.9 [spark]

2024-07-25 Thread via GitHub
LuciferYang commented on PR #47455: URL: https://github.com/apache/spark/pull/47455#issuecomment-2249837072 > Actually, it has some bug fixes, such as typical ones: [Fixed issue `spark compilation failed with ArrayIndexOutOfBoundsException`](https://github.com/janino-compiler/janino/issues/

Re: [PR] [SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation [spark]

2024-07-25 Thread via GitHub
vladimirg-db commented on code in PR #47484: URL: https://github.com/apache/spark/pull/47484#discussion_r1691116665 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -248,10 +249,15 @@ case class PreprocessTableCreation(catalog: SessionCat

Re: [PR] [SPARK-48308][CORE][3.5] Unify getting data schema without partition columns in FileSourceStrategy [spark]

2024-07-25 Thread via GitHub
cloud-fan commented on PR #47483: URL: https://github.com/apache/spark/pull/47483#issuecomment-2249894610 It's fine to skip 3.4 as https://github.com/apache/spark/pull/46565 was not merged to 3.4 either.

Re: [PR] [SPARK-48844][FOLLOWUP][TESTS] Cleanup duplicated data resource files in hive-thriftserver test [spark]

2024-07-25 Thread via GitHub
yaooqinn closed pull request #47480: [SPARK-48844][FOLLOWUP][TESTS] Cleanup duplicated data resource files in hive-thriftserver test URL: https://github.com/apache/spark/pull/47480

Re: [PR] [SPARK-48308][CORE][3.5] Unify getting data schema without partition columns in FileSourceStrategy [spark]

2024-07-25 Thread via GitHub
yaooqinn commented on PR #47483: URL: https://github.com/apache/spark/pull/47483#issuecomment-2249906895 Thank you @cloud-fan

Re: [PR] [SPARK-48844][FOLLOWUP][TESTS] Cleanup duplicated data resource files in hive-thriftserver test [spark]

2024-07-25 Thread via GitHub
yaooqinn commented on PR #47480: URL: https://github.com/apache/spark/pull/47480#issuecomment-2249910142 Merged to master, thank you @cloud-fan

Re: [PR] [SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation [spark]

2024-07-25 Thread via GitHub
vladimirg-db commented on PR #47484: URL: https://github.com/apache/spark/pull/47484#issuecomment-2250017548 @HyukjinKwon hi. Finally managed to make the Actions in my fork work. Tests passed.

Re: [PR] [SPARK-48344][SQL] SQL API change to support execution of compound statements [spark]

2024-07-25 Thread via GitHub
miland-db commented on code in PR #47403: URL: https://github.com/apache/spark/pull/47403#discussion_r1691277611 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreter.scala: ## @@ -73,11 +74,19 @@ case class SqlScriptingInterpreter() { .map

Re: [PR] [MINOR][DOCS] Update doc `sql/README.md` [spark]

2024-07-25 Thread via GitHub
HyukjinKwon commented on PR #47476: URL: https://github.com/apache/spark/pull/47476#issuecomment-2250101451 Merged to master.

Re: [PR] [SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation [spark]

2024-07-25 Thread via GitHub
vladimirg-db commented on code in PR #47484: URL: https://github.com/apache/spark/pull/47484#discussion_r1691291589 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -248,10 +249,15 @@ case class PreprocessTableCreation(catalog: SessionCat

Re: [PR] [MINOR][DOCS] Update doc `sql/README.md` [spark]

2024-07-25 Thread via GitHub
HyukjinKwon closed pull request #47476: [MINOR][DOCS] Update doc `sql/README.md` URL: https://github.com/apache/spark/pull/47476

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-25 Thread via GitHub
HeartSaVioR commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1691375121 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -208,13 +208,16 @@ class IncrementalExecution( }

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-25 Thread via GitHub
HeartSaVioR commented on PR #47445: URL: https://github.com/apache/spark/pull/47445#issuecomment-2250228784 Thanks! Merging to master.

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-25 Thread via GitHub
HeartSaVioR closed pull request #47445: [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator URL: https://github.com/apache/spark/pull/47445

[PR] [WIP][SPARK-49002][SQL] Consistently handle invalid location/path values for all database objects [spark]

2024-07-25 Thread via GitHub
yaooqinn opened a new pull request, #47485: URL: https://github.com/apache/spark/pull/47485 … ### What changes were proposed in this pull request? We are now consistently handling invalid location/path values for all database objects in this pull request. Before th
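The description is cut off above. As a hedged illustration of the general idea (not the PR's implementation), rejecting a malformed location can be as simple as refusing values that do not parse as a URI; `validateLocation` below is a hypothetical helper.
```scala
// Hypothetical helper, not Spark's code: fail fast on location strings that are
// empty or cannot be parsed as a URI, before they reach the catalog.
import java.net.{URI, URISyntaxException}

def validateLocation(location: String): URI = {
  if (location == null || location.trim.isEmpty) {
    throw new IllegalArgumentException("Location must be a non-empty string")
  }
  try {
    new URI(location)
  } catch {
    case e: URISyntaxException =>
      throw new IllegalArgumentException(s"Invalid location: $location", e)
  }
}
```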

Re: [PR] [SPARK-32086][YARN]Bug fix for RemoveBroadcast RPC failed after executor is shutdown [spark]

2024-07-25 Thread via GitHub
esnhysythh commented on PR #28921: URL: https://github.com/apache/spark/pull/28921#issuecomment-2250349381 This pull request is dangerous. It may cause deadlock problems (I have experimented with it).

Re: [PR] [SPARK-48985][CONNECT] Connect Compatible Expression Constructors [spark]

2024-07-25 Thread via GitHub
hvanhovell commented on code in PR #47464: URL: https://github.com/apache/spark/pull/47464#discussion_r1691630792 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -519,6 +519,8 @@ object FunctionRegistry { expressionBuilder

Re: [PR] [SPARK-48985][CONNECT] Connect Compatible Expression Constructors [spark]

2024-07-25 Thread via GitHub
hvanhovell commented on code in PR #47464: URL: https://github.com/apache/spark/pull/47464#discussion_r1691639698 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -519,6 +519,8 @@ object FunctionRegistry { expressionBuilder

[PR] [SPARK-49003][SQL][COLLATION] Fix calculating hash value of collated strings [spark]

2024-07-25 Thread via GitHub
ilicmarkodb opened a new pull request, #47486: URL: https://github.com/apache/spark/pull/47486 ### What changes were proposed in this pull request? Fixed calculating hash value of collated strings. Changed hashing function to use proper hash for collated strings. ### Why are the
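The description is truncated, but the underlying issue can be sketched: when a collation treats different byte sequences as equal (for example case-insensitively), hashing the raw string breaks the equals/hashCode contract, so the hash must be derived from a collation key instead. The snippet below is an illustration using the JDK's `java.text.Collator` (runnable in the Scala REPL), not Spark's collation code.
```scala
// Illustration only: strings that compare equal under the collation must hash equally,
// so hash the collation key bytes rather than the raw string.
import java.text.Collator
import java.util.Locale

val collator = Collator.getInstance(Locale.ROOT)
collator.setStrength(Collator.PRIMARY) // ignore case differences when comparing

def collatedHash(s: String): Int =
  java.util.Arrays.hashCode(collator.getCollationKey(s).toByteArray)

assert(collator.compare("SPARK", "spark") == 0)        // equal under this collation
assert(collatedHash("SPARK") == collatedHash("spark")) // so their hashes match
assert("SPARK".hashCode != "spark".hashCode)           // raw String hashes do not
```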

Re: [PR] [SPARK-48901][SPARK-48916][SS][PYTHON] Introduce clusterBy DataStreamWriter API [spark]

2024-07-25 Thread via GitHub
chirag-s-db commented on PR #47376: URL: https://github.com/apache/spark/pull/47376#issuecomment-2250755596 @HeartSaVioR https://github.com/apache/spark/pull/47301 has been merged, ready for review again!

[PR] Mima [spark]

2024-07-25 Thread via GitHub
xupefei opened a new pull request, #47487: URL: https://github.com/apache/spark/pull/47487 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [MINOR][DOCS] Update doc `sql/README.md` [spark]

2024-07-25 Thread via GitHub
amaliujia commented on code in PR #47476: URL: https://github.com/apache/spark/pull/47476#discussion_r1691718137 ## sql/README.md: ## @@ -3,7 +3,8 @@ Spark SQL This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API.

Re: [PR] [SPARK-48503][SQL] Allow grouping on expressions in scalar subqueries, if they are bound to outer rows [spark]

2024-07-25 Thread via GitHub
agubichev commented on PR #47388: URL: https://github.com/apache/spark/pull/47388#issuecomment-2250792171 > To extend it a bit more, shall we allow `where func(T_inner.x) = T_outer.date group by T_inner.x` if the `func` guarantees to produce different results for different values of `T_inne
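For readers skimming the thread, this is the shape of query under discussion, using hypothetical tables `months(month)` and `orders(o_date, amount)` and a `spark` session as in spark-shell: the inner query groups on an expression rather than a bare column, but that expression is bound to the outer row by the correlation predicate, so the scalar subquery still returns at most one row per outer row.
```scala
// Hypothetical example of a grouping expression bound to the outer row.
// date_trunc('MONTH', o.o_date) is pinned by the equality with m.month,
// so each outer row sees at most one group.
spark.sql("""
  SELECT m.month,
         (SELECT SUM(o.amount)
          FROM orders o
          WHERE date_trunc('MONTH', o.o_date) = m.month
          GROUP BY date_trunc('MONTH', o.o_date)) AS monthly_total
  FROM months m
""")
```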

[PR] [SPARK-49005][K8S][3.5] Use `17-jammy` tag instead of `17` to prevent Pyth… [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun opened a new pull request, #47488: URL: https://github.com/apache/spark/pull/47488 …on 3.12 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[PR] [SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun opened a new pull request, #47489: URL: https://github.com/apache/spark/pull/47489 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[PR] [SPARK-49006] Implement purging for OperatorStateMetadataV2 and StateSchemaV3 files [spark]

2024-07-25 Thread via GitHub
ericm-db opened a new pull request, #47490: URL: https://github.com/apache/spark/pull/47490 ### What changes were proposed in this pull request? Currently, OperatorStateMetadataV2 and StateSchemaV3 files are written for every new query run. This PR will implement purging files
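The description is cut off above, but the general shape of such a purge is easy to sketch. The following is not the PR's code, just a hedged illustration: list per-batch files in a metadata directory, parse the batch id from the file name, and delete everything older than a retention threshold.
```scala
// Generic sketch, not Spark's OperatorStateMetadataV2 implementation: delete
// per-batch metadata files whose batch id is below `minBatchIdToRetain`.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

def purgeOldMetadataFiles(
    metadataDir: Path,
    minBatchIdToRetain: Long,
    hadoopConf: Configuration): Unit = {
  val fs = metadataDir.getFileSystem(hadoopConf)
  if (fs.exists(metadataDir)) {
    fs.listStatus(metadataDir)
      .map(_.getPath)
      // File names are assumed to be plain batch ids, e.g. "42"; skip anything else.
      .flatMap(p => scala.util.Try(p.getName.toLong).toOption.map(id => (id, p)))
      .filter { case (batchId, _) => batchId < minBatchIdToRetain }
      .foreach { case (_, path) => fs.delete(path, false) }
  }
}
```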

Re: [PR] [SPARK-49005][K8S][3.5] Use `17-jammy` tag instead of `17` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47488: URL: https://github.com/apache/spark/pull/47488#issuecomment-2250942959 cc @yaooqinn and @viirya

Re: [PR] [SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17-jre` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47489: URL: https://github.com/apache/spark/pull/47489#issuecomment-2250943259 cc @yaooqinn and @viirya

Re: [PR] [SPARK-49005][K8S][3.5] Use `17-jammy` tag instead of `17` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47488: URL: https://github.com/apache/spark/pull/47488#issuecomment-2250938296 cc @yaooqinn and @viirya

Re: [PR] [SPARK-49005][K8S][3.5] Use `17-jammy` tag instead of `17` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47488: URL: https://github.com/apache/spark/pull/47488#issuecomment-2250955060 Also, cc @huaxingao

Re: [PR] [SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17-jre` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47489: URL: https://github.com/apache/spark/pull/47489#issuecomment-2250955410 Also, cc @huaxingao

Re: [PR] [SPARK-49005][K8S][3.5] Use `17-jammy` tag instead of `17` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47488: URL: https://github.com/apache/spark/pull/47488#issuecomment-2250968487 Thank you so much, @huaxingao !

Re: [PR] [SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17-jre` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47489: URL: https://github.com/apache/spark/pull/47489#issuecomment-2250972235 Thank you, @huaxingao !

Re: [PR] [SPARK-45891][SQL][PYTHON][VARIANT] Add support for interval types in the Variant Spec [spark]

2024-07-25 Thread via GitHub
gene-db commented on code in PR #47473: URL: https://github.com/apache/spark/pull/47473#discussion_r1691812112 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java: ## @@ -120,6 +120,12 @@ public class VariantUtil { // Long string value. The content

Re: [PR] [SPARK-45891][SQL][PYTHON][VARIANT] Add support for interval types in the Variant Spec [spark]

2024-07-25 Thread via GitHub
harshmotw-db commented on code in PR #47473: URL: https://github.com/apache/spark/pull/47473#discussion_r1691852875 ## common/variant/src/main/java/org/apache/spark/types/variant/Variant.java: ## @@ -113,6 +126,11 @@ public String getString() { return VariantUtil.getString(

Re: [PR] [SPARK-49005][K8S][3.5] Use `17-jammy` tag instead of `17` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47488: URL: https://github.com/apache/spark/pull/47488#issuecomment-2251060362 Let me merge this to recover the CIs.

Re: [PR] [SPARK-49005][K8S][3.5] Use `17-jammy` tag instead of `17` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun closed pull request #47488: [SPARK-49005][K8S][3.5] Use `17-jammy` tag instead of `17` to prevent Python 3.12 URL: https://github.com/apache/spark/pull/47488

Re: [PR] [SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17-jre` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47489: URL: https://github.com/apache/spark/pull/47489#issuecomment-2251061801 Let me merge this to recover the CIs.

Re: [PR] [SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17-jre` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun closed pull request #47489: [SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17-jre` to prevent Python 3.12 URL: https://github.com/apache/spark/pull/47489

Re: [PR] [SPARK-49005][K8S][3.5] Use `17-jammy` tag instead of `17` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47488: URL: https://github.com/apache/spark/pull/47488#issuecomment-2251069589 Thank you, @viirya !

Re: [PR] [SPARK-45891][SQL][PYTHON][VARIANT] Add support for interval types in the Variant Spec [spark]

2024-07-25 Thread via GitHub
harshmotw-db commented on code in PR #47473: URL: https://github.com/apache/spark/pull/47473#discussion_r1691939585 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java: ## @@ -377,11 +405,52 @@ public static long getLong(byte[] value, int pos) {

Re: [PR] [SPARK-45891][SQL][PYTHON][VARIANT] Add support for interval types in the Variant Spec [spark]

2024-07-25 Thread via GitHub
harshmotw-db commented on code in PR #47473: URL: https://github.com/apache/spark/pull/47473#discussion_r1691942388 ## common/variant/src/main/java/org/apache/spark/types/variant/Variant.java: ## @@ -88,6 +91,16 @@ public long getLong() { return VariantUtil.getLong(value, p

Re: [PR] [wip]Metadata vcf [spark]

2024-07-25 Thread via GitHub
ericm-db closed pull request #47446: [wip]Metadata vcf URL: https://github.com/apache/spark/pull/47446

Re: [PR] [SPARK-48821][SQL] Support Update in DataFrameWriterV2 [spark]

2024-07-25 Thread via GitHub
szehon-ho commented on PR #47233: URL: https://github.com/apache/spark/pull/47233#issuecomment-2251363935 Changed the api to be:
```
spark.table(tableNameAsString)
  .update(Map("salary" -> lit(-1)))
  .where($"pk" >= 2)
  .execute()
```
Can add op

Re: [PR] [SPARK-48996][SQL][PYTHON] Allow bare literals for __and__ and __or__ of Column [spark]

2024-07-25 Thread via GitHub
ueshin commented on PR #47474: URL: https://github.com/apache/spark/pull/47474#issuecomment-2251366183 The failure seems not related to this PR.

Re: [PR] [SPARK-48996][SQL][PYTHON] Allow bare literals for __and__ and __or__ of Column [spark]

2024-07-25 Thread via GitHub
ueshin commented on PR #47474: URL: https://github.com/apache/spark/pull/47474#issuecomment-2251366450 Thanks! merging to master.

Re: [PR] [SPARK-48996][SQL][PYTHON] Allow bare literals for __and__ and __or__ of Column [spark]

2024-07-25 Thread via GitHub
ueshin closed pull request #47474: [SPARK-48996][SQL][PYTHON] Allow bare literals for __and__ and __or__ of Column URL: https://github.com/apache/spark/pull/47474

[PR] [SPARK-49007][CORE] Improve `MasterPage` to support custom title [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun opened a new pull request, #47491: URL: https://github.com/apache/spark/pull/47491 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-48628][CORE] Add task peak on/off heap memory metrics [spark]

2024-07-25 Thread via GitHub
liuzqt commented on code in PR #47192: URL: https://github.com/apache/spark/pull/47192#discussion_r1692144786 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -110,9 +112,22 @@ class TaskMetrics private[spark] () extends Serializable { * joins. The v

Re: [PR] [SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17-jre` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47489: URL: https://github.com/apache/spark/pull/47489#issuecomment-2251424159 Thank you, @viirya !

Re: [PR] [SPARK-49007][CORE] Improve `MasterPage` to support custom title [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47491: URL: https://github.com/apache/spark/pull/47491#issuecomment-2251426017 Could you review this PR, @huaxingao ?

Re: [PR] [SPARK-48628][CORE] Add task peak on/off heap memory metrics [spark]

2024-07-25 Thread via GitHub
liuzqt commented on PR #47192: URL: https://github.com/apache/spark/pull/47192#issuecomment-2251439038 > Take a look at `peakExecutionMemory` within spark-core. We should be exposing the new metrics as part of the api - both at task level, and at stage level (distributions for ex). W
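As background for the quoted suggestion, a "peak" metric differs from the usual counters: it records a running maximum rather than accumulating a sum. A minimal sketch (not Spark's `TaskMetrics` code) of what tracking peak on/off-heap memory looks like:
```scala
// Illustration only, not Spark's TaskMetrics: a peak metric keeps the maximum
// observed value instead of a running total.
class PeakMemoryMetric {
  private var peakOnHeapBytes: Long = 0L
  private var peakOffHeapBytes: Long = 0L

  // Called whenever current usage is sampled, e.g. after a memory acquisition.
  def update(currentOnHeapBytes: Long, currentOffHeapBytes: Long): Unit = {
    peakOnHeapBytes = math.max(peakOnHeapBytes, currentOnHeapBytes)
    peakOffHeapBytes = math.max(peakOffHeapBytes, currentOffHeapBytes)
  }

  def snapshot: (Long, Long) = (peakOnHeapBytes, peakOffHeapBytes)
}
```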

[PR] Rebased value state [spark]

2024-07-25 Thread via GitHub
jingz-db opened a new pull request, #47492: URL: https://github.com/apache/spark/pull/47492 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

[PR] [SPARK-49008][PYTHON] Use `ParamSpec` to propagate `func` signature in `transform` [spark]

2024-07-25 Thread via GitHub
nicklamiller opened a new pull request, #47493: URL: https://github.com/apache/spark/pull/47493 ### What changes were proposed in this pull request? Propagate function signature of `func` in `DataFrame(...).transform(...)`. ### Why are the changes needed? Propagating the func

Re: [PR] [SPARK-48628][CORE] Add task peak on/off heap memory metrics [spark]

2024-07-25 Thread via GitHub
JoshRosen commented on code in PR #47192: URL: https://github.com/apache/spark/pull/47192#discussion_r1692177886 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -110,9 +112,22 @@ class TaskMetrics private[spark] () extends Serializable { * joins. Th

Re: [PR] [SPARK-49007][CORE] Improve `MasterPage` to support custom title [spark]

2024-07-25 Thread via GitHub
dongjoon-hyun commented on PR #47491: URL: https://github.com/apache/spark/pull/47491#issuecomment-2251459571 Thank you, @huaxingao !

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-25 Thread via GitHub
bogao007 commented on PR #47133: URL: https://github.com/apache/spark/pull/47133#issuecomment-2251495351 Hi @HyukjinKwon, I'm getting the test errors below with my PR:
```
== ERROR [0.255s]: test_termination_sigterm

[PR] [SPARK-49010] Add unit tests for XML schema inference case sensitivity [spark]

2024-07-25 Thread via GitHub
shujingyang-db opened a new pull request, #47494: URL: https://github.com/apache/spark/pull/47494 ### What changes were proposed in this pull request? Currently, XML respects the case sensitivity SQLConf (default to false) in the schema inference but we lack unit tests to veri

[PR] [SPARK-49009][SQL][PYTHON] Make Column APIs and functions accept Enums [spark]

2024-07-25 Thread via GitHub
ueshin opened a new pull request, #47495: URL: https://github.com/apache/spark/pull/47495 ### What changes were proposed in this pull request? Make Column APIs and functions accept `Enum`s. ### Why are the changes needed? `Enum`s can be accepted in Column APIs and functio

Re: [PR] [SPARK-47829][SQL] Text Datasource supports Zstd compression codec [spark]

2024-07-25 Thread via GitHub
github-actions[bot] closed pull request #46026: [SPARK-47829][SQL] Text Datasource supports Zstd compression codec URL: https://github.com/apache/spark/pull/46026

Re: [PR] [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions [spark]

2024-07-25 Thread via GitHub
github-actions[bot] commented on PR #45446: URL: https://github.com/apache/spark/pull/45446#issuecomment-2251630213 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47279][CORE]When the messageLoop encounter a fatal exception, such as oom, exit the JVM to avoid the driver hanging forever [spark]

2024-07-25 Thread via GitHub
github-actions[bot] commented on PR #45385: URL: https://github.com/apache/spark/pull/45385#issuecomment-2251630277 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47835][NETWORK] Remove switch for remoteReadNioBufferConversion [spark]

2024-07-25 Thread via GitHub
github-actions[bot] closed pull request #46030: [SPARK-47835][NETWORK] Remove switch for remoteReadNioBufferConversion URL: https://github.com/apache/spark/pull/46030

Re: [PR] [SPARK-48985][CONNECT] Connect Compatible Expression Constructors [spark]

2024-07-25 Thread via GitHub
hvanhovell commented on code in PR #47464: URL: https://github.com/apache/spark/pull/47464#discussion_r1692311896 ## connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1893,33 +1855,6 @@ class SparkConnectPlanner( val un

Re: [PR] [SPARK-48985][CONNECT] Connect Compatible Expression Constructors [spark]

2024-07-25 Thread via GitHub
hvanhovell commented on code in PR #47464: URL: https://github.com/apache/spark/pull/47464#discussion_r1692314400 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -5705,11 +5693,8 @@ object functions { * @group datetime_funcs * @since 3.2.0 */ -

Re: [PR] [SPARK-48985][CONNECT] Connect Compatible Expression Constructors [spark]

2024-07-25 Thread via GitHub
hvanhovell commented on code in PR #47464: URL: https://github.com/apache/spark/pull/47464#discussion_r1692315764 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -5743,7 +5728,7 @@ object functions { * @since 3.2.0 */ def session_window(timeColu

Re: [PR] [SPARK-48985][CONNECT] Connect Compatible Expression Constructors [spark]

2024-07-25 Thread via GitHub
hvanhovell commented on code in PR #47464: URL: https://github.com/apache/spark/pull/47464#discussion_r1692320832 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTimeWindows.scala: ## @@ -257,10 +259,11 @@ object SessionWindowing extends Rule[Logical

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-25 Thread via GitHub
HyukjinKwon commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1692328985 ## python/pyspark/sql/pandas/group_ops.py: ## @@ -358,6 +364,140 @@ def applyInPandasWithState( ) return DataFrame(jdf, self.session) +def t

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-25 Thread via GitHub
HyukjinKwon commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1692326848 ## python/pyspark/sql/streaming/stateful_processor.py: ## @@ -0,0 +1,180 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-25 Thread via GitHub
HyukjinKwon commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1692329891 ## python/pyspark/sql/pandas/group_ops.py: ## @@ -358,6 +364,140 @@ def applyInPandasWithState( ) return DataFrame(jdf, self.session) +def t

Re: [PR] [SPARK-48829][BUILD] Upgrade `RoaringBitmap` to 1.2.1 [spark]

2024-07-25 Thread via GitHub
panbingkun commented on PR #47247: URL: https://github.com/apache/spark/pull/47247#issuecomment-2251747973 > ready to go? Yea, ready for review, thanks!

Re: [PR] [SPARK-48985][CONNECT] Connect Compatible Expression Constructors [spark]

2024-07-25 Thread via GitHub
hvanhovell commented on code in PR #47464: URL: https://github.com/apache/spark/pull/47464#discussion_r1692345753 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -767,6 +771,7 @@ object FunctionRegistry { expression[EqualN

Re: [PR] [SPARK-48985][CONNECT] Connect Compatible Expression Constructors [spark]

2024-07-25 Thread via GitHub
hvanhovell commented on code in PR #47464: URL: https://github.com/apache/spark/pull/47464#discussion_r1692345905 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -519,6 +519,8 @@ object FunctionRegistry { expressionBuilder

Re: [PR] [ONLY TEST][HOLD] Upgrade rocksdbjni to 9.4.0 [spark]

2024-07-25 Thread via GitHub
panbingkun commented on PR #47207: URL: https://github.com/apache/spark/pull/47207#issuecomment-2251754892 > Has there been any new progress on this one? Let's wait a little longer; I think version 9.5 should be released soon https://github.com/user-attachments/assets/c31f7463-8222-4

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-25 Thread via GitHub
HyukjinKwon commented on PR #47133: URL: https://github.com/apache/spark/pull/47133#issuecomment-2251778289 The test failure doesn't look related to me. Can you reproduce it locally?

Re: [PR] [SPARK-49006] Implement purging for OperatorStateMetadataV2 and StateSchemaV3 files [spark]

2024-07-25 Thread via GitHub
anishshri-db commented on code in PR #47490: URL: https://github.com/apache/spark/pull/47490#discussion_r1692363879 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -79,6 +79,38 @@ class IncrementalExecution( StreamingT

Re: [PR] [SPARK-49006] Implement purging for OperatorStateMetadataV2 and StateSchemaV3 files [spark]

2024-07-25 Thread via GitHub
anishshri-db commented on code in PR #47490: URL: https://github.com/apache/spark/pull/47490#discussion_r1692364074 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadata.scala: ## @@ -313,3 +314,89 @@ class OperatorStateMetadataV2Reader(

Re: [PR] [SPARK-49006] Implement purging for OperatorStateMetadataV2 and StateSchemaV3 files [spark]

2024-07-25 Thread via GitHub
anishshri-db commented on code in PR #47490: URL: https://github.com/apache/spark/pull/47490#discussion_r1692364600 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -79,6 +79,38 @@ class IncrementalExecution( StreamingT

Re: [PR] [SPARK-48503][SQL] Allow grouping on expressions in scalar subqueries, if they are bound to outer rows [spark]

2024-07-25 Thread via GitHub
cloud-fan commented on PR #47388: URL: https://github.com/apache/spark/pull/47388#issuecomment-2251791506 thanks, merging to master!

Re: [PR] [SPARK-48503][SQL] Allow grouping on expressions in scalar subqueries, if they are bound to outer rows [spark]

2024-07-25 Thread via GitHub
cloud-fan closed pull request #47388: [SPARK-48503][SQL] Allow grouping on expressions in scalar subqueries, if they are bound to outer rows URL: https://github.com/apache/spark/pull/47388

Re: [PR] [SPARK-45787][SQL] Support Catalog.listColumns for clustering columns [spark]

2024-07-25 Thread via GitHub
cloud-fan commented on PR #47451: URL: https://github.com/apache/spark/pull/47451#issuecomment-2251797510 thanks, merging to master!

Re: [PR] [SPARK-45787][SQL] Support Catalog.listColumns for clustering columns [spark]

2024-07-25 Thread via GitHub
cloud-fan closed pull request #47451: [SPARK-45787][SQL] Support Catalog.listColumns for clustering columns URL: https://github.com/apache/spark/pull/47451

Re: [PR] [SPARK-49005][K8S][3.5] Use `17-jammy` tag instead of `17` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
yaooqinn commented on PR #47488: URL: https://github.com/apache/spark/pull/47488#issuecomment-2251851725 Late LGTM

Re: [PR] [SPARK-49005][K8S][3.4] Use `17-jammy` tag instead of `17-jre` to prevent Python 3.12 [spark]

2024-07-25 Thread via GitHub
yaooqinn commented on PR #47489: URL: https://github.com/apache/spark/pull/47489#issuecomment-2251851929 Late LGTM
