Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-16 Thread via GitHub
bogao007 commented on PR #47133: URL: https://github.com/apache/spark/pull/47133#issuecomment-2231824089 > @bogao007 - test failure seems related ? > > ``` > [error] /home/runner/work/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/python/TransformWithStateInPand

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-16 Thread via GitHub
anishshri-db commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1680065191 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or mo

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-16 Thread via GitHub
anishshri-db commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1680066012 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or mo

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-16 Thread via GitHub
anishshri-db commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1680066533 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or mo

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-16 Thread via GitHub
anishshri-db commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1680066276 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -0,0 +1,152 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or mo

Re: [PR] [SPARK-48903][SS] Set the RocksDB last snapshot version correctly on remote load [spark]

2024-07-16 Thread via GitHub
anishshri-db commented on code in PR #47363: URL: https://github.com/apache/spark/pull/47363#discussion_r1680081616 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -1663,9 +1670,8 @@ class RocksDBSuite extends AlsoTestWithChan

Re: [PR] [SPARK-48900] Add `reason` field for `cancelJobGroup` and `cancelJobsWithTag` [spark]

2024-07-16 Thread via GitHub
mingkangli-db commented on PR #47361: URL: https://github.com/apache/spark/pull/47361#issuecomment-2231904382 @cloud-fan Hi Wenchen, since last time you reviewed it, I addressed the comments and also synced the changes to R, Python, and Java `SparkContext` API, making it consistent with the

Re: [PR] [SPARK-48892][ML] Avoid per-row param read in `Tokenizer` [spark]

2024-07-16 Thread via GitHub
zhengruifeng closed pull request #47342: [SPARK-48892][ML] Avoid per-row param read in `Tokenizer` URL: https://github.com/apache/spark/pull/47342 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-48892][ML] Avoid per-row param read in `Tokenizer` [spark]

2024-07-16 Thread via GitHub
zhengruifeng commented on PR #47342: URL: https://github.com/apache/spark/pull/47342#issuecomment-2231972215 merged to master

[PR] [SPARK-48917][BUILD] Upgrade tink to 1.14.0 [spark]

2024-07-16 Thread via GitHub
panbingkun opened a new pull request, #47377: URL: https://github.com/apache/spark/pull/47377 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-16 Thread via GitHub
bogao007 commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1680168088 ## python/pyspark/sql/streaming/__init__.py: ## @@ -19,3 +19,4 @@ from pyspark.sql.streaming.readwriter import DataStreamReader, DataStreamWriter # noqa: F401 from

Re: [PR] [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API [spark]

2024-07-16 Thread via GitHub
HyukjinKwon commented on PR #47341: URL: https://github.com/apache/spark/pull/47341#issuecomment-2232029263 Merged to master.

Re: [PR] [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API [spark]

2024-07-16 Thread via GitHub
HyukjinKwon closed pull request #47341: [SPARK-48883][ML][R] Replace RDD read / write API invocation with Dataframe read / write API URL: https://github.com/apache/spark/pull/47341

Re: [PR] [SPARK-47602][CORE][K8S][FOLLOWUP] Improve structure logging for isExecutorIdleTimedOut [spark]

2024-07-16 Thread via GitHub
github-actions[bot] closed pull request #45849: [SPARK-47602][CORE][K8S][FOLLOWUP] Improve structure logging for isExecutorIdleTimedOut URL: https://github.com/apache/spark/pull/45849

Re: [PR] [SPARK-47649][SQL] Make the parameter `inputs` of the function `[csv|parquet|orc|json|text|xml](paths: String*)` non empty [spark]

2024-07-16 Thread via GitHub
github-actions[bot] commented on PR #45776: URL: https://github.com/apache/spark/pull/45776#issuecomment-2232054848 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48510] Fix for UDAF `toColumn` API when running tests in Maven [spark]

2024-07-16 Thread via GitHub
HyukjinKwon commented on PR #47368: URL: https://github.com/apache/spark/pull/47368#issuecomment-2232077921 @itholic wanna try merging this?

Re: [PR] [SPARK-48821][SQL] Support Update in DataFrameWriterV2 [spark]

2024-07-16 Thread via GitHub
huaxingao commented on PR #47233: URL: https://github.com/apache/spark/pull/47233#issuecomment-2232086743 I took a look at Delta Lake's implementation for [update](https://github.com/delta-io/delta/blob/master/spark/src/main/scala/io/delta/tables/DeltaTable.scala#L234), which uses executeUp

Re: [PR] [SPARK-48917][BUILD] Upgrade tink to 1.14.0 [spark]

2024-07-16 Thread via GitHub
panbingkun commented on PR #47377: URL: https://github.com/apache/spark/pull/47377#issuecomment-2232087368 It depends on `4.27.0

Re: [PR] [ONLY TEST][SPARK-48917][BUILD] Upgrade tink to 1.14.0 [spark]

2024-07-16 Thread via GitHub
panbingkun closed pull request #47377: [ONLY TEST][SPARK-48917][BUILD] Upgrade tink to 1.14.0 URL: https://github.com/apache/spark/pull/47377

Re: [PR] [ONLY TEST][SPARK-48917][BUILD] Upgrade tink to 1.14.0 [spark]

2024-07-16 Thread via GitHub
panbingkun commented on PR #47377: URL: https://github.com/apache/spark/pull/47377#issuecomment-2232113942 https://github.com/user-attachments/assets/0ecddfae-fae6-4f0b-bdb6-8458164a15fe

Re: [PR] [SPARK-48903][SS] Set the RocksDB last snapshot version correctly on remote load [spark]

2024-07-16 Thread via GitHub
HeartSaVioR closed pull request #47363: [SPARK-48903][SS] Set the RocksDB last snapshot version correctly on remote load URL: https://github.com/apache/spark/pull/47363

Re: [PR] [SPARK-48903][SS] Set the RocksDB last snapshot version correctly on remote load [spark]

2024-07-16 Thread via GitHub
HeartSaVioR commented on PR #47363: URL: https://github.com/apache/spark/pull/47363#issuecomment-2232122796 Thanks! Merged to master.

Re: [PR] [SPARK-48510][CONNECT][FOLLOW-UP] Fix for UDAF `toColumn` API when running tests in Maven [spark]

2024-07-16 Thread via GitHub
itholic closed pull request #47368: [SPARK-48510][CONNECT][FOLLOW-UP] Fix for UDAF `toColumn` API when running tests in Maven URL: https://github.com/apache/spark/pull/47368

Re: [PR] [SPARK-48510][CONNECT][FOLLOW-UP] Fix for UDAF `toColumn` API when running tests in Maven [spark]

2024-07-16 Thread via GitHub
itholic commented on PR #47368: URL: https://github.com/apache/spark/pull/47368#issuecomment-2232155789 Merged to master

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on code in PR #47378: URL: https://github.com/apache/spark/pull/47378#discussion_r1680301810 ## sql/api/pom.xml: ## @@ -86,7 +92,7 @@ true -../api/src/main/antlr4 +

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on code in PR #47378: URL: https://github.com/apache/spark/pull/47378#discussion_r1680302416 ## connector/connect/client/jvm/pom.xml: ## @@ -116,49 +90,18 @@ false true + Review Comment: Most of the shading i

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on code in PR #47378: URL: https://github.com/apache/spark/pull/47378#discussion_r1680303391 ## connect/server/pom.xml: ## @@ -36,70 +36,25 @@ - org.apache.spark Review Comment: This has all been moved to connect-api.

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on code in PR #47378: URL: https://github.com/apache/spark/pull/47378#discussion_r1680303100 ## connect/common/pom.xml: ## @@ -39,59 +39,6 @@ spark-sql-api_${scala.binary.version} ${project.version} - Review Comme

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on code in PR #47378: URL: https://github.com/apache/spark/pull/47378#discussion_r1680312565 ## sql/connect-api/pom.xml: ## @@ -0,0 +1,312 @@ + + + +http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:sch

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on code in PR #47378: URL: https://github.com/apache/spark/pull/47378#discussion_r1680313273 ## sql/connect-api/pom.xml: ## @@ -0,0 +1,312 @@ + + + +http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:sch

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on code in PR #47378: URL: https://github.com/apache/spark/pull/47378#discussion_r1680313747 ## connector/connect/client/jvm/pom.xml: ## @@ -116,49 +90,18 @@ false true + Review Comment: ... and yes I still n

[PR] [SPARK-48920][BUILD][3.5] Upgrade ORC to 1.9.4 [spark]

2024-07-16 Thread via GitHub
williamhyun opened a new pull request, #47379: URL: https://github.com/apache/spark/pull/47379 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on code in PR #47378: URL: https://github.com/apache/spark/pull/47378#discussion_r1680314151 ## project/SparkBuild.scala: ## @@ -674,23 +664,76 @@ object SparkConnectCommon { // Exclude `scala-library` from assembly. (assembly / assemblyPackageScal

Re: [PR] [SPARK-48920][BUILD][3.5] Upgrade ORC to 1.9.4 [spark]

2024-07-16 Thread via GitHub
williamhyun commented on PR #47379: URL: https://github.com/apache/spark/pull/47379#issuecomment-2232214327 cc: @yaooqinn , @dongjoon-hyun

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on code in PR #47378: URL: https://github.com/apache/spark/pull/47378#discussion_r1680315462 ## project/SparkBuild.scala: ## @@ -713,85 +756,9 @@ object SparkConnectCommon { } } -object SparkConnect { - import BuildCommons.protoVersion - +object Spark

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on PR #47378: URL: https://github.com/apache/spark/pull/47378#issuecomment-2232216536 For the reviewers. Most of this PR is mechanical, renaming imports to their new shaded names. Please focus on the Maven and SBT build files first!

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-16 Thread via GitHub
panbingkun commented on PR #47364: URL: https://github.com/apache/spark/pull/47364#issuecomment-2232225702 Currently, only the `normalized` collation name is shown.

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
HyukjinKwon commented on PR #47378: URL: https://github.com/apache/spark/pull/47378#issuecomment-2232225650 cc @LuciferYang if you find some time to review.

Re: [PR] add possibility to set log filename & disable spark log rotation [spark]

2024-07-16 Thread via GitHub
HyukjinKwon commented on PR #47373: URL: https://github.com/apache/spark/pull/47373#issuecomment-2232228940 Can you make a PR against the master branch?

Re: [PR] [SPARK-47307][DOCS][FOLLOWUP] Add a migration guide for the behavior change of base64 function [spark]

2024-07-16 Thread via GitHub
HyukjinKwon commented on PR #47371: URL: https://github.com/apache/spark/pull/47371#issuecomment-2232230103 @allisonwang-db wanna try merging this?

[PR] [SPARK-48921][SQL] ScalaUDF in subquery should run through analyzer [spark]

2024-07-16 Thread via GitHub
viirya opened a new pull request, #47380: URL: https://github.com/apache/spark/pull/47380 ### What changes were proposed in this pull request? We got a customer issue that a `MergeInto` query on Iceberg table works earlier but cannot work after upgrading to Spark 3.4.

Re: [PR] [SPARK-47307][DOCS][FOLLOWUP] Add a migration guide for the behavior change of base64 function [spark]

2024-07-16 Thread via GitHub
allisonwang-db closed pull request #47371: [SPARK-47307][DOCS][FOLLOWUP] Add a migration guide for the behavior change of base64 function URL: https://github.com/apache/spark/pull/47371

Re: [PR] [SPARK-47307][DOCS][FOLLOWUP] Add a migration guide for the behavior change of base64 function [spark]

2024-07-16 Thread via GitHub
allisonwang-db commented on PR #47371: URL: https://github.com/apache/spark/pull/47371#issuecomment-2232238622 Merged to master and branch-3.5

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-16 Thread via GitHub
panbingkun commented on PR #47364: URL: https://github.com/apache/spark/pull/47364#issuecomment-2232241876 If necessary, we can also show columns: `CaseSensitivity` and `AccentSensitivity`

Re: [PR] [SPARK-48920][BUILD][3.5] Upgrade ORC to 1.9.4 [spark]

2024-07-16 Thread via GitHub
williamhyun commented on PR #47379: URL: https://github.com/apache/spark/pull/47379#issuecomment-2232245366 Thank you, @yaooqinn , @dongjoon-hyun , @HyukjinKwon !

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
pan3793 commented on PR #47378: URL: https://github.com/apache/spark/pull/47378#issuecomment-2232246362 I remember there are issues for maven to consume a shaded module in the same project. i.e. you must run `mvn install -pl ` first, otherwise `mvn test` or `mvn package` can not see the sha

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
hvanhovell commented on PR #47378: URL: https://github.com/apache/spark/pull/47378#issuecomment-2232276170 @pan3793 thanks for the input. I did check maven package and that seemed to work for packaging (this is the command I used: `build/mvn package -pl connector/connect/client/jvm -am`). I

[PR] [SPARK-48922][SQL] Optimize complex type insertion performance [spark]

2024-07-16 Thread via GitHub
wForget opened a new pull request, #47381: URL: https://github.com/apache/spark/pull/47381 ### What changes were proposed in this pull request? To improve insertion performance, there is no need to add transform expressions when there is no conversion for complex types.

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-16 Thread via GitHub
panbingkun commented on PR #47364: URL: https://github.com/apache/spark/pull/47364#issuecomment-2232283575 Another option: Only display `name`, `provider` and `version` when execute `SHOW COLLATIONS ...` And when execute `DESCRIBE COLLATIONS ...`, will display: `name`, `provider`, `ve

Re: [PR] [SPARK-48889][SS] testStream to unload state stores before finishing [spark]

2024-07-16 Thread via GitHub
HeartSaVioR commented on PR #47339: URL: https://github.com/apache/spark/pull/47339#issuecomment-2232289776 https://github.com/siying/spark/runs/27529815714 This only failed with Docker integration test `org.apache.spark.sql.jdbc.OracleIntegrationSuite` which is unrelated.

Re: [PR] [SPARK-48921][SQL] ScalaUDF in subquery should run through analyzer [spark]

2024-07-16 Thread via GitHub
dongjoon-hyun commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1680352198 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala: ## @@ -299,4 +300,58 @@ class ResolveSubquerySuite extends Analy

Re: [PR] [SPARK-48889][SS] testStream to unload state stores before finishing [spark]

2024-07-16 Thread via GitHub
HeartSaVioR commented on PR #47339: URL: https://github.com/apache/spark/pull/47339#issuecomment-2232290247 Thanks! Merging to master/3.5/3.4 (if there's no merge conflict).

Re: [PR] [SPARK-48917][BUILD] Upgrade tink to 1.14.0 [spark]

2024-07-16 Thread via GitHub
LuciferYang commented on PR #47377: URL: https://github.com/apache/spark/pull/47377#issuecomment-2232290639 Thank you for pinging me, @dongjoon-hyun Yes, due to the compatibility issue with protobuf-java, although excluding it may be a workaround, I prefer to wait for the official releas

Re: [PR] [SPARK-48921][SQL] ScalaUDF in subquery should run through analyzer [spark]

2024-07-16 Thread via GitHub
dongjoon-hyun commented on PR #47380: URL: https://github.com/apache/spark/pull/47380#issuecomment-2232291809 Do you happen to know which JIRA issue caused this regression, @viirya ? > after upgrading to Spark 3.4.

Re: [PR] [SPARK-48889][SS] testStream to unload state stores before finishing [spark]

2024-07-16 Thread via GitHub
HeartSaVioR closed pull request #47339: [SPARK-48889][SS] testStream to unload state stores before finishing URL: https://github.com/apache/spark/pull/47339

[PR] [MINOR][SQL] Fix CollationFactorySuite [spark]

2024-07-16 Thread via GitHub
panbingkun opened a new pull request, #47382: URL: https://github.com/apache/spark/pull/47382 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [MINOR][SQL] Fix CollationFactorySuite [spark]

2024-07-16 Thread via GitHub
panbingkun commented on PR #47382: URL: https://github.com/apache/spark/pull/47382#issuecomment-2232303157 > Please use a proper JIRA ID for code change. Especially, this is a kind of `Fix`. Okay, let me file it.

Re: [PR] [SPARK-48396] Support configuring max cores can be used for SQL [spark]

2024-07-16 Thread via GitHub
yabola commented on PR #46713: URL: https://github.com/apache/spark/pull/46713#issuecomment-2232302650 I would like to describe the usage scenario: In a scenario where multiple users are sharing 2048 core long running SQL cluster. Some users may have non-standard queries that use a large

Re: [PR] [MINOR][SQL] Fix CollationFactorySuite [spark]

2024-07-16 Thread via GitHub
panbingkun commented on code in PR #47382: URL: https://github.com/apache/spark/pull/47382#discussion_r1680359737 ## common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala: ## @@ -154,8 +151,8 @@ class CollationFactorySuite extends AnyFunSuite wit

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
LuciferYang commented on PR #47378: URL: https://github.com/apache/spark/pull/47378#issuecomment-2232303765 Thank you for pinging me, @HyukjinKwon

Re: [PR] [SPARK-48923][SQL][TESTS] Fix the incorrect logic of `CollationFactorySuite` [spark]

2024-07-16 Thread via GitHub
panbingkun commented on code in PR #47382: URL: https://github.com/apache/spark/pull/47382#discussion_r1680361637 ## common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala: ## @@ -154,8 +151,8 @@ class CollationFactorySuite extends AnyFunSuite wit

Re: [PR] [SPARK-48917][BUILD] Upgrade tink to 1.14.0 [spark]

2024-07-16 Thread via GitHub
panbingkun closed pull request #47377: [SPARK-48917][BUILD] Upgrade tink to 1.14.0 URL: https://github.com/apache/spark/pull/47377

Re: [PR] [SPARK-48917][BUILD] Upgrade tink to 1.14.0 [spark]

2024-07-16 Thread via GitHub
panbingkun commented on PR #47377: URL: https://github.com/apache/spark/pull/47377#issuecomment-2232309413 > Unfortunately, it seems that we had a previous PR and we decided to close, @panbingkun . > > * [[SPARK-48814][BUILD] Upgrade `tink` to 1.14.0  #47221](https://github.com/apache

Re: [PR] [SPARK-48923][SQL][TESTS] Fix the incorrect logic of `CollationFactorySuite` [spark]

2024-07-16 Thread via GitHub
dongjoon-hyun commented on PR #47382: URL: https://github.com/apache/spark/pull/47382#issuecomment-2232315194 Also, cc @dbatomic , @cloud-fan , @MaxGekk from the original PR. - #44968

Re: [PR] [SPARK-48510][2/2] Support UDAF `toColumn` API in Spark Connect [spark]

2024-07-16 Thread via GitHub
LuciferYang commented on code in PR #46849: URL: https://github.com/apache/spark/pull/46849#discussion_r1680369394 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/UserDefinedFunctionE2ETestSuite.scala: ## @@ -388,6 +378,66 @@ class UserDefinedFunctionE2ETestS

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
LuciferYang commented on PR #47378: URL: https://github.com/apache/spark/pull/47378#issuecomment-2232321196 > I am not sure how much of an issue this is since we use SBT for CI. There are multiple daily tests now using Maven for testing.

Re: [PR] [SPARK-48510][2/2] Support UDAF `toColumn` API in Spark Connect [spark]

2024-07-16 Thread via GitHub
HyukjinKwon commented on code in PR #46849: URL: https://github.com/apache/spark/pull/46849#discussion_r1680376356 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/UserDefinedFunctionE2ETestSuite.scala: ## @@ -388,6 +378,66 @@ class UserDefinedFunctionE2ETestS

Re: [PR] [SPARK-48920][BUILD][3.5] Upgrade ORC to 1.9.4 [spark]

2024-07-16 Thread via GitHub
dongjoon-hyun commented on PR #47379: URL: https://github.com/apache/spark/pull/47379#issuecomment-2232348708 Merged to branch-3.5 for Apache Spark 3.5.2. Thank you all!

Re: [PR] [SPARK-48920][BUILD][3.5] Upgrade ORC to 1.9.4 [spark]

2024-07-16 Thread via GitHub
dongjoon-hyun closed pull request #47379: [SPARK-48920][BUILD][3.5] Upgrade ORC to 1.9.4 URL: https://github.com/apache/spark/pull/47379

Re: [PR] [SPARK-48921][SQL] ScalaUDF in subquery should run through analyzer [spark]

2024-07-16 Thread via GitHub
viirya commented on PR #47380: URL: https://github.com/apache/spark/pull/47380#issuecomment-2232423385 > Do you happen to know which JIRA issue is related to this regression, @viirya ? > > > after upgrading to Spark 3.4. Thank you for review, @dongjoon-hyun. It is not ca

Re: [PR] [SPARK-48921][SQL] ScalaUDF in subquery should run through analyzer [spark]

2024-07-16 Thread via GitHub
viirya commented on PR #47380: URL: https://github.com/apache/spark/pull/47380#issuecomment-2232425180 I re-triggered the failed `Run Docker integration tests`. All CIs are passed now: https://github.com/viirya/spark-1/actions/runs/9967182407/job/27542878853

Re: [PR] [SPARK-48921][SQL] ScalaUDF in subquery should run through analyzer [spark]

2024-07-16 Thread via GitHub
dongjoon-hyun commented on PR #47380: URL: https://github.com/apache/spark/pull/47380#issuecomment-2232428003 Got it. Feel free to merge and backport, @viirya ~

Re: [PR] [SPARK-48921][SQL] ScalaUDF in subquery should run through analyzer [spark]

2024-07-16 Thread via GitHub
viirya commented on PR #47380: URL: https://github.com/apache/spark/pull/47380#issuecomment-2232429329 Thank you @dongjoon-hyun. I will keep it for a day and merge if no more comments.

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
LuciferYang commented on PR #47378: URL: https://github.com/apache/spark/pull/47378#issuecomment-2232430016 @hvanhovell local run ``` build/mvn clean install -DskipTests -Phive build/mvn test -pl connector/connect/client/jvm -Phive ``` then ``` [ERROR] Test

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
LuciferYang commented on PR #47378: URL: https://github.com/apache/spark/pull/47378#issuecomment-2232457945 local run ``` ./dev/test-dependencies.sh --replace-manifest git diff ``` ``` diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
LuciferYang commented on code in PR #47378: URL: https://github.com/apache/spark/pull/47378#discussion_r1680434473 ## project/SparkBuild.scala: ## @@ -674,23 +664,76 @@ object SparkConnectCommon { // Exclude `scala-library` from assembly. (assembly / assemblyPackageSca

Re: [PR] [SPARK-48919][CONNECT] Move connect code generation and dependency management to a separate project [spark]

2024-07-16 Thread via GitHub
LuciferYang commented on PR #47378: URL: https://github.com/apache/spark/pull/47378#issuecomment-2232464638 https://github.com/apache/spark/blob/3a245558be882ae94f507976e4e4fb8c1d9bf344/dev/sparktestsupport/modules.py#L323-L334 Although the `connect-api` module does not have test case
