Re: [PR] [SPARK-48441][SQL] Fix StringTrim behaviour for non-UTF8_BINARY collations [spark]

2024-07-14 Thread via GitHub
uros-db commented on code in PR #46762: URL: https://github.com/apache/spark/pull/46762#discussion_r1677116591 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -841,117 +842,255 @@ public static UTF8String translate(final UTF

Re: [PR] [SPARK-48441][SQL] Fix StringTrim behaviour for non-UTF8_BINARY collations [spark]

2024-07-14 Thread via GitHub
uros-db commented on code in PR #46762: URL: https://github.com/apache/spark/pull/46762#discussion_r1677116630 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -841,117 +842,255 @@ public static UTF8String translate(final UTF

Re: [PR] [SPARK-48441][SQL] Fix StringTrim behaviour for non-UTF8_BINARY collations [spark]

2024-07-14 Thread via GitHub
uros-db commented on code in PR #46762: URL: https://github.com/apache/spark/pull/46762#discussion_r1677116677 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -841,117 +842,255 @@ public static UTF8String translate(final UTF

[PR] [SPARK-48892][ML] Avoid per-row param read in `Tokenizer` [spark]

2024-07-14 Thread via GitHub
zhengruifeng opened a new pull request, #47342: URL: https://github.com/apache/spark/pull/47342 ### What changes were proposed in this pull request? Inspired by https://github.com/apache/spark/pull/47258, I am checking other ML implementations, and find that we can also optimize `Tokenize

[PR] [SPARK-48893][SQL][DOCS] Add some examples for `linearRegression` built-in functions [spark]

2024-07-14 Thread via GitHub
wayneguow opened a new pull request, #47343: URL: https://github.com/apache/spark/pull/47343 ### What changes were proposed in this pull request? This PR aims to add some extra examples for `linearRegression` built-in functions. ### Why are the changes needed?

[PR] [SPARK-48894][TESTS] Upgrade `docker-java` to 3.4.0 [spark]

2024-07-14 Thread via GitHub
wayneguow opened a new pull request, #47344: URL: https://github.com/apache/spark/pull/47344 ### What changes were proposed in this pull request? This PR aims to upgrade `docker-java` to 3.4.0. ### Why are the changes needed? There are some improvements, such as:

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-14 Thread via GitHub
uros-db commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1677169283 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -106,11 +155,13 @@ case class Mode( val collationAwareBuffer =

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-14 Thread via GitHub
uros-db commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1677169498 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -106,11 +155,13 @@ case class Mode( val collationAwareBuffer =

[PR] [SPARK-44728][PYTHON][DOCS][FOLLOWUP] Add a missing param doc in python api `partitioning` functions docs. [spark]

2024-07-14 Thread via GitHub
wayneguow opened a new pull request, #47345: URL: https://github.com/apache/spark/pull/47345 ### What changes were proposed in this pull request? Add a missing param in func docs of `partitioning.py`. ### Why are the changes needed? - Make python api docs bett

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-14 Thread via GitHub
uros-db commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1677169573 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -86,6 +75,66 @@ case class Mode( buffer } + private def

Re: [PR] [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API [spark]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on PR #47328: URL: https://github.com/apache/spark/pull/47328#issuecomment-2227419001 Thank you for reverting this, Hyukjin. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API [spark]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on PR #47341: URL: https://github.com/apache/spark/pull/47341#issuecomment-2227419448 Thank you, @HyukjinKwon and @zhengruifeng . In the PR description, could you add specific JIRA issue links for the following ? > In order to leverage Catalyst optimizer and

Re: [PR] [SPARK-47307][SQL][3.5] Add a config to optionally chunk base64 strings [spark]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #47325: URL: https://github.com/apache/spark/pull/47325#discussion_r1677194257 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3229,6 +3229,15 @@ object SQLConf { .booleanConf .createWithDefault(f

Re: [PR] [SPARK-47307][SQL][3.5] Add a config to optionally chunk base64 strings [spark]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on PR #47325: URL: https://github.com/apache/spark/pull/47325#issuecomment-2227463911 However, SPARK-48658 was merged as an improvement JIRA, @yaooqinn . Do you mean we need to convert it as a bug fix? ![Screenshot 2024-07-14 at 12 45 23](https://github.com/us

Re: [PR] [SPARK-47307][SQL][3.5] Add a config to optionally chunk base64 strings [spark]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on PR #47325: URL: https://github.com/apache/spark/pull/47325#issuecomment-2227464217 If we need to change the issue type, please comment on your initial PR to get a consensus. - https://github.com/apache/spark/pull/47017 .

Re: [PR] [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API [spark]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #47341: URL: https://github.com/apache/spark/pull/47341#discussion_r1677196178 ## mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala: ## @@ -411,7 +411,10 @@ private[ml] object DefaultParamsWriter { paramMap: Option[JValue]

Re: [PR] [SPARK-48879][SQL] Expand the charset list with Chinese Standard Charsets [spark]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #47320: URL: https://github.com/apache/spark/pull/47320#discussion_r1677197220 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -3840,8 +3840,8 @@ object functions { /** * Computes the first

[PR] [SPARK-XXXX]Adding `column` and `functions` packages. [spark-connect-go]

2024-07-14 Thread via GitHub
grundprinzip opened a new pull request, #35: URL: https://github.com/apache/spark-connect-go/pull/35 ### What changes were proposed in this pull request? This patch provides additional base capabilities that are needed to parallelize development more by adding very skeleton behavior for t

[PR] [SPARK-48895][R][INFRA] Use R 4.4.1 in `windows` R GitHub Action job [spark]

2024-07-14 Thread via GitHub
dongjoon-hyun opened a new pull request, #47346: URL: https://github.com/apache/spark/pull/47346 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677217454 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/SparkOperator.java: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677217641 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/SparkOperator.java: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677217877 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/SparkOperator.java: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677217977 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/SparkOperator.java: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677218036 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/SparkOperator.java: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677218778 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/SparkOperator.java: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677219991 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/client/RetryInterceptor.java: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Sof

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677221277 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677221410 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677221687 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677221774 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677222082 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r167742 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677222344 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677218778 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/SparkOperator.java: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677222621 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677222819 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677222871 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677223415 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677223517 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677223707 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677223767 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677223887 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677224052 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677224380 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677225172 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677225772 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677226181 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48895][R][INFRA] Use R 4.4.1 in `windows` R GitHub Action job [spark]

2024-07-14 Thread via GitHub
HyukjinKwon commented on PR #47346: URL: https://github.com/apache/spark/pull/47346#issuecomment-2227518425 Merged to master.

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677226325 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48895][R][INFRA] Use R 4.4.1 in `windows` R GitHub Action job [spark]

2024-07-14 Thread via GitHub
HyukjinKwon closed pull request #47346: [SPARK-48895][R][INFRA] Use R 4.4.1 in `windows` R GitHub Action job URL: https://github.com/apache/spark/pull/47346

Re: [PR] [SPARK-48714][SPARK-48794][FOLLOW-UP][PYTHON][DOCS] Add `mergeInto` to API reference [spark]

2024-07-14 Thread via GitHub
HyukjinKwon commented on PR #47329: URL: https://github.com/apache/spark/pull/47329#issuecomment-2227518543 Merged to master.

Re: [PR] [SPARK-48714][SPARK-48794][FOLLOW-UP][PYTHON][DOCS] Add `mergeInto` to API reference [spark]

2024-07-14 Thread via GitHub
HyukjinKwon closed pull request #47329: [SPARK-48714][SPARK-48794][FOLLOW-UP][PYTHON][DOCS] Add `mergeInto` to API reference URL: https://github.com/apache/spark/pull/47329

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677224380 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677226607 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677226653 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677226950 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API [spark]

2024-07-14 Thread via GitHub
HyukjinKwon commented on code in PR #47341: URL: https://github.com/apache/spark/pull/47341#discussion_r1677226969 ## mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala: ## @@ -411,7 +411,10 @@ private[ml] object DefaultParamsWriter { paramMap: Option[JValue] =

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677226950 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API [spark]

2024-07-14 Thread via GitHub
HyukjinKwon commented on PR #47341: URL: https://github.com/apache/spark/pull/47341#issuecomment-2227519717 Addressed all 👍

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677227113 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677227481 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/context/SparkAppContext.java: ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Sof

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677227405 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/context/SparkAppContext.java: ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Sof

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677228212 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/context/SparkAppContext.java: ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Sof

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677228487 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677228212 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/context/SparkAppContext.java: ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Sof

Re: [PR] [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API [spark]

2024-07-14 Thread via GitHub
HyukjinKwon commented on PR #47341: URL: https://github.com/apache/spark/pull/47341#issuecomment-2227522251 Separated PR to https://github.com/apache/spark/pull/47347.

Re: [PR] [SPARK-48896][ML][MLLIB] Avoid repartition when writing out the metadata [spark]

2024-07-14 Thread via GitHub
HyukjinKwon commented on PR #47347: URL: https://github.com/apache/spark/pull/47347#issuecomment-2227522284 cc @WeichenXu123 @zhengruifeng

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677230903 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/decorators/DriverResourceDecorator.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the

Re: [PR] [SPARK-48613][SQL] SPJ: Support auto-shuffle one side + less join keys than partition keys [spark]

2024-07-14 Thread via GitHub
sunchao closed pull request #47064: [SPARK-48613][SQL] SPJ: Support auto-shuffle one side + less join keys than partition keys URL: https://github.com/apache/spark/pull/47064

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677230957 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/decorators/DriverResourceDecorator.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the

Re: [PR] [SPARK-48613][SQL] SPJ: Support auto-shuffle one side + less join keys than partition keys [spark]

2024-07-14 Thread via GitHub
sunchao commented on PR #47064: URL: https://github.com/apache/spark/pull/47064#issuecomment-2227524570 Merge to master. Thanks!

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677231159 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/JVMMetricSet.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677231470 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/JVMMetricSet.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677231870 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsService.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Softw

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677232108 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsService.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Softw

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677232430 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsSystem.java: ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Softw

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677232577 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsSystem.java: ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Softw

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1677233296 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsSystemFactory.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apach

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-14 Thread via GitHub
dongjoon-hyun commented on PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#issuecomment-2227535340 I finished this round review because there are too many. Could you address some? -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] [SPARK-47702][CORE] Remove Shuffle service endpoint from the locations list when RDD block is removed form a node. [spark]

2024-07-14 Thread via GitHub
github-actions[bot] commented on PR #45836: URL: https://github.com/apache/spark/pull/45836#issuecomment-2227537097 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-21195][CORE] Dynamically register metrics from sources as they are reported [spark]

2024-07-14 Thread via GitHub
github-actions[bot] closed pull request #45883: [SPARK-21195][CORE] Dynamically register metrics from sources as they are reported URL: https://github.com/apache/spark/pull/45883 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [PR] [SPARK-27950][DSTREAM] Configurable dynamodb url so kinesis-asl works with localstack [spark]

2024-07-14 Thread via GitHub
github-actions[bot] closed pull request #45619: [SPARK-27950][DSTREAM] Configurable dynamodb url so kinesis-asl works with localstack URL: https://github.com/apache/spark/pull/45619 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-48896][ML][MLLIB] Avoid repartition when writing out the metadata [spark]

2024-07-14 Thread via GitHub
HyukjinKwon commented on code in PR #47347: URL: https://github.com/apache/spark/pull/47347#discussion_r1677237810 ## mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala: ## @@ -198,10 +198,10 @@ object NaiveBayesModel extends Loader[NaiveBayesModel] {

Re: [PR] [SPARK-48880][CORE] Avoid throw NullPointerException if driver plugin fails to initialize [spark]

2024-07-14 Thread via GitHub
ulysses-you commented on PR #47321: URL: https://github.com/apache/spark/pull/47321#issuecomment-2227560346 @yaooqinn it does not require touch spark internal classes in driver plugin. This error happened if we start spark-shell using `SparkConnectPlugin` but failed to bind address(The 150

Re: [PR] [SPARK-48834][SQL] Disable variant input/output to python scalar UDFs, UDTFs, UDAFs during query compilation [spark]

2024-07-14 Thread via GitHub
HyukjinKwon commented on PR #47253: URL: https://github.com/apache/spark/pull/47253#issuecomment-2227593424 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-48834][SQL] Disable variant input/output to python scalar UDFs, UDTFs, UDAFs during query compilation [spark]

2024-07-14 Thread via GitHub
HyukjinKwon closed pull request #47253: [SPARK-48834][SQL] Disable variant input/output to python scalar UDFs, UDTFs, UDAFs during query compilation URL: https://github.com/apache/spark/pull/47253 -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] [SPARK-48879][SQL] Expand the charset list with Chinese Standard Charsets [spark]

2024-07-14 Thread via GitHub
yaooqinn commented on code in PR #47320: URL: https://github.com/apache/spark/pull/47320#discussion_r1677265633 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -3840,8 +3840,8 @@ object functions { /** * Computes the first argu

Re: [PR] [SPARK-48885][SQL] Make some subclasses of RuntimeReplaceable override replacement to lazy val [spark]

2024-07-14 Thread via GitHub
yaooqinn commented on PR #47333: URL: https://github.com/apache/spark/pull/47333#issuecomment-2227610633 Thank you @dongjoon-hyun. It makes sense to change `percentiles` too, I guess I was thinking that they are just identities of `private lazy val percentile = new Percentile(left, r

Re: [PR] [SPARK-47307][SQL][3.5] Add a config to optionally chunk base64 strings [spark]

2024-07-14 Thread via GitHub
yaooqinn commented on PR #47325: URL: https://github.com/apache/spark/pull/47325#issuecomment-2227614856 Okay, based on the information provided by @dongjoon-hyun and the [Policy of backporting bugfixes](https://www.mail-archive.com/dev@spark.apache.org/msg10284.html), I think we can only

Re: [PR] [SPARK-48873] Use UnsafeRow in JSON parser. [spark]

2024-07-14 Thread via GitHub
LuciferYang commented on code in PR #47310: URL: https://github.com/apache/spark/pull/47310#discussion_r1677272149 ## sql/core/benchmarks/DataSourceReadBenchmark-results.txt: ## @@ -1,431 +1,438 @@ -DataSourceReadBenchmark-jdk21-results.txt===

Re: [PR] [SPARK-48873][SQL] Use UnsafeRow in JSON parser. [spark]

2024-07-14 Thread via GitHub
LuciferYang commented on code in PR #47310: URL: https://github.com/apache/spark/pull/47310#discussion_r1677272492 ## sql/core/benchmarks/DataSourceReadBenchmark-results.txt: ## @@ -1,431 +1,438 @@ -DataSourceReadBenchmark-jdk21-results.txt===

Re: [PR] [SPARK-48873][SQL] Use UnsafeRow in JSON parser. [spark]

2024-07-14 Thread via GitHub
LuciferYang commented on code in PR #47310: URL: https://github.com/apache/spark/pull/47310#discussion_r1677272149 ## sql/core/benchmarks/DataSourceReadBenchmark-results.txt: ## @@ -1,431 +1,438 @@ -DataSourceReadBenchmark-jdk21-results.txt===

Re: [PR] [SPARK-48886][SS] Add version info to changelog v2 to allow for easier evolution [spark]

2024-07-14 Thread via GitHub
HeartSaVioR commented on PR #47336: URL: https://github.com/apache/spark/pull/47336#issuecomment-2227627161 Shall we follow the way we have been doing for versioning? We tend to reserve the first line for version and use the format `s"v$version"` - this applies to offset/commit log as well

Re: [PR] [SPARK-48886][SS] Add version info to changelog v2 to allow for easier evolution [spark]

2024-07-14 Thread via GitHub
HeartSaVioR commented on code in PR #47336: URL: https://github.com/apache/spark/pull/47336#discussion_r1677278405 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -193,11 +208,18 @@ class RocksDBFileManager( versio

Re: [PR] [SPARK-48888][SS] Remove snapshot creation based on changelog ops size [spark]

2024-07-14 Thread via GitHub
HeartSaVioR commented on PR #47338: URL: https://github.com/apache/spark/pull/47338#issuecomment-2227634609 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-48888][SS] Remove snapshot creation based on changelog ops size [spark]

2024-07-14 Thread via GitHub
HeartSaVioR closed pull request #47338: [SPARK-48888][SS] Remove snapshot creation based on changelog ops size URL: https://github.com/apache/spark/pull/47338 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above
