Re: [PR] [SPARK-51461] Setup `SparkConnect` Swift package structure and CI to test `build` [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on code in PR #4: URL: https://github.com/apache/spark-connect-swift/pull/4#discussion_r1988317890 ## README.md: ## @@ -1 +1,15 @@ -# Apache Spark Connect Client for Swift language +# Apache Spark Connect Client for Swift + +[![GitHub Actions Build](http

Re: [PR] [SPARK-51440][SS] classify the NPE when null topic field value is in kafka message data and there is no topic option [spark]

2025-03-11 Thread via GitHub
HeartSaVioR commented on PR #50214: URL: https://github.com/apache/spark/pull/50214#issuecomment-2712885987 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-51462][SQL] Support typed literals of the TIME data type [spark]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #50228: URL: https://github.com/apache/spark/pull/50228#issuecomment-2712883926 Merged to master for Apache Spark 4.1.0.

Re: [PR] [SPARK-51440][SS] classify the NPE when null topic field value is in kafka message data and there is no topic option [spark]

2025-03-11 Thread via GitHub
HeartSaVioR closed pull request #50214: [SPARK-51440][SS] classify the NPE when null topic field value is in kafka message data and there is no topic option URL: https://github.com/apache/spark/pull/50214

Re: [PR] [SPARK-51462][SQL] Support typed literals of the TIME data type [spark]

2025-03-11 Thread via GitHub
LuciferYang commented on PR #50228: URL: https://github.com/apache/spark/pull/50228#issuecomment-2712885221 late LGTM

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #6: URL: https://github.com/apache/spark-connect-swift/pull/6#issuecomment-2712886857 Could you review this PR to use `Apache Arrow Swift` source code in `Apache Spark Connect for Swift` when you have some time, @MaxGekk ? This is a temporary usage until Apache

[PR] [SPARK-51463] Add `Spark Connect`-generated `Swift` source code [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun opened a new pull request, #5: URL: https://github.com/apache/spark-connect-swift/pull/5 ### What changes were proposed in this pull request? This PR aims to add `Spark Connect`-generated `Swift` source code. ### Why are the changes needed? These files are g

Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]

2025-03-11 Thread via GitHub
jjayadeep06 commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2712735092 > Hello @jjayadeep06 . I saw how #50020 on master switched to a daemon thread. Do you think we should construct the `Timer` here on branch-3.5 with a daemon thread too for extra safet

Re: [PR] [SPARK-51468][SQL] Revert "From json/xml should not change collations in the given schema" [spark]

2025-03-11 Thread via GitHub
stefankandic commented on PR #50234: URL: https://github.com/apache/spark/pull/50234#issuecomment-2713967892 @MaxGekk can you take a look?

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
vrozov commented on code in PR #50231: URL: https://github.com/apache/spark/pull/50231#discussion_r1989606369 ## core/src/test/scala/org/apache/spark/SparkContextSuite.scala: ## @@ -243,7 +243,8 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
HyukjinKwon commented on code in PR #50231: URL: https://github.com/apache/spark/pull/50231#discussion_r1989391203 ## core/src/test/scala/org/apache/spark/SparkContextSuite.scala: ## @@ -243,7 +243,8 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with E

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-11 Thread via GitHub
LuciferYang commented on code in PR #50232: URL: https://github.com/apache/spark/pull/50232#discussion_r1988965720 ## sql/hive/src/main/java/org/apache/hadoop/hive/ql/exec/SparkDefaultUDFMethodResolver.java: ## @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
vrozov commented on code in PR #50231: URL: https://github.com/apache/spark/pull/50231#discussion_r1988324765 ## sql/core/src/test/scala/org/apache/spark/sql/artifact/StubClassLoaderSuite.scala: ## @@ -101,7 +102,7 @@ class StubClassLoaderSuite extends SparkFunSuite { //

[PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun opened a new pull request, #6: URL: https://github.com/apache/spark-connect-swift/pull/6 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
HyukjinKwon commented on code in PR #50231: URL: https://github.com/apache/spark/pull/50231#discussion_r1989388752 ## core/src/test/scala/org/apache/spark/SparkContextSuite.scala: ## @@ -243,7 +243,8 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with E

Re: [PR] [SPARK-51468][SQL] Revert "From json/xml should not change collations in the given schema" [spark]

2025-03-11 Thread via GitHub
MaxGekk closed pull request #50234: [SPARK-51468][SQL] Revert "From json/xml should not change collations in the given schema" URL: https://github.com/apache/spark/pull/50234

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
HyukjinKwon commented on PR #50231: URL: https://github.com/apache/spark/pull/50231#issuecomment-2714485887 -1 if we disable the tests. We are introducing a set of technical debt to remove the other.

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on code in PR #6: URL: https://github.com/apache/spark-connect-swift/pull/6#discussion_r1989528793 ## Sources/SparkConnect/ArrowArray.swift: ## @@ -0,0 +1,331 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on code in PR #6: URL: https://github.com/apache/spark-connect-swift/pull/6#discussion_r1989531383 ## Sources/SparkConnect/ArrowArray.swift: ## @@ -0,0 +1,331 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #6: URL: https://github.com/apache/spark-connect-swift/pull/6#issuecomment-2715038929 Thank you so much all for being interested in this new codebase and taking a look at this to help, @MaxGekk , @viirya and @huaxingao . Merged to main.

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun closed pull request #6: [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 URL: https://github.com/apache/spark-connect-swift/pull/6

[PR] [SPARK-51467][UI] Make tables of the environment page filterable [spark]

2025-03-11 Thread via GitHub
yaooqinn opened a new pull request, #50233: URL: https://github.com/apache/spark/pull/50233 ### What changes were proposed in this pull request? This PR adds multi-column filtering ability to the tables of the environment page. ### Why are the changes needed?

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-11 Thread via GitHub
pan3793 commented on PR #50232: URL: https://github.com/apache/spark/pull/50232#issuecomment-2713149012 cc @cloud-fan @dongjoon-hyun @LuciferYang @wangyum @yaooqinn this is an alternative of SPARK-51449 (https://github.com/apache/spark/pull/50222), the advantages of this one are: - appli

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-11 Thread via GitHub
pan3793 commented on code in PR #50232: URL: https://github.com/apache/spark/pull/50232#discussion_r1988688097 ## sql/hive/src/main/java/org/apache/hadoop/hive/ql/exec/SparkDefaultUDFMethodResolver.java: ## @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation (AS

[PR] [SPARK-51469][SQL] Improve MapKeyDedupPolicy so that avoid calling toString [spark]

2025-03-11 Thread via GitHub
beliefer opened a new pull request, #50235: URL: https://github.com/apache/spark/pull/50235 ### What changes were proposed in this pull request? This PR proposes to improve `MapKeyDedupPolicy` so that avoid calling `toString`. ### Why are the changes needed? Currently, there

Re: [PR] [MINOR][SQL][TESTS] Remove duplicated plan node check in DataFrameSetOperationsSuite [spark]

2025-03-11 Thread via GitHub
Surbhi-Vijay commented on PR #50227: URL: https://github.com/apache/spark/pull/50227#issuecomment-2714027337 All the checks have passed. @viirya can you please help in merging it?

Re: [PR] [SPARK-51445][CORE][SQL][SS][CONNECT] Change the never changed `var` to `val` [spark]

2025-03-11 Thread via GitHub
beliefer commented on code in PR #50219: URL: https://github.com/apache/spark/pull/50219#discussion_r1986579208 ## core/src/test/scala/org/apache/spark/rpc/TestRpcEndpoint.scala: ## @@ -26,15 +26,15 @@ class TestRpcEndpoint extends ThreadSafeRpcEndpoint with TripleEquals {

[PR] [SPARK-51468][SQL] Revert "From json/xml should not change collations in the given schema" [spark]

2025-03-11 Thread via GitHub
stefankandic opened a new pull request, #50234: URL: https://github.com/apache/spark/pull/50234 ### What changes were proposed in this pull request? After removing session-level collation (#49772) we can also revert the PR that changed the behavior of `from_json` and `from_xml` ex

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-11 Thread via GitHub
pan3793 commented on code in PR #50232: URL: https://github.com/apache/spark/pull/50232#discussion_r1989120281 ## sql/hive/src/main/java/org/apache/hadoop/hive/ql/exec/SparkDefaultUDFMethodResolver.java: ## @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-11 Thread via GitHub
pan3793 commented on code in PR #50232: URL: https://github.com/apache/spark/pull/50232#discussion_r1989258901 ## sql/hive/src/main/java/org/apache/hadoop/hive/ql/exec/SparkDefaultUDFMethodResolver.java: ## @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-11 Thread via GitHub
pan3793 commented on code in PR #50232: URL: https://github.com/apache/spark/pull/50232#discussion_r1989262042 ## sql/hive-thriftserver/src/test/resources/log4j2.properties: ## @@ -92,3 +92,6 @@ logger.parquet2.level = error logger.thriftserver.name = org.apache.spark.sql.hi

Re: [PR] [SPARK-51457][R][INFRA] Use R 4.4.3 in `windows` R GitHub Action job [spark]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #50229: URL: https://github.com/apache/spark/pull/50229#issuecomment-2712033606 Thank you, @HyukjinKwon .

Re: [PR] [SPARK-51307][SQL][3.5] locationUri in CatalogStorageFormat shall be decoded for display [spark]

2025-03-11 Thread via GitHub
yaooqinn commented on PR #50164: URL: https://github.com/apache/spark/pull/50164#issuecomment-2702905855 Merged to branch 3.5, thank you again @dongjoon-hyun

Re: [PR] [SPARK-49008][PYTHON] Use `ParamSpec` to propagate `func` signature in `transform` [spark]

2025-03-11 Thread via GitHub
MaicoTimmerman commented on PR #47493: URL: https://github.com/apache/spark/pull/47493#issuecomment-2704339832 Can we please remove the stale tag and consider merging this MR? The implementation provided by the author is compatible with all supported Python versions.

Re: [PR] [SPARK-51365][TESTS] Test maven + macos [spark]

2025-03-11 Thread via GitHub
LuciferYang commented on code in PR #50178: URL: https://github.com/apache/spark/pull/50178#discussion_r1982726209 ## sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala: ## @@ -79,6 +79,8 @@ trait SharedSparkSessionBase StaticSQLConf.WAREHOUSE_PATH

Re: [PR] [SPARK-48922][SQL] Optimize nested data type insertion performance [spark]

2025-03-11 Thread via GitHub
wForget commented on PR #47381: URL: https://github.com/apache/spark/pull/47381#issuecomment-2705357436 > Hi @wForget This PR looks important, do you plan to reopen this and rebase on top of [SPARK-49352](https://issues.apache.org/jira/browse/SPARK-49352) ? #47843 Sure, I will rebase

[PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
vrozov opened a new pull request, #50231: URL: https://github.com/apache/spark/pull/50231 ### What changes were proposed in this pull request? remove jar files from the Apache Spark repo and disable affected tests. ### Why are the changes needed? Apache source releases must n

Re: [PR] [SPARK-51418][SQL] Fix DataSource PARTITON TABLE w/ Hive type incompatible partition columns [spark]

2025-03-11 Thread via GitHub
yaooqinn closed pull request #50182: [SPARK-51418][SQL] Fix DataSource PARTITON TABLE w/ Hive type incompatible partition columns URL: https://github.com/apache/spark/pull/50182

Re: [PR] [SPARK-51340][ML][CONNECT] Model size estimation for linear classification & regression models [spark]

2025-03-11 Thread via GitHub
zhengruifeng commented on code in PR #50106: URL: https://github.com/apache/spark/pull/50106#discussion_r1982391327 ## mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala: ## @@ -312,6 +322,18 @@ class FMClassificationModel private[classification] ( c

Re: [PR] [SPARK-51365][SQL][TESTS] Add Envs to control the number of `SHUFFLE_EXCHANGE/RESULT_QUERY_STAGE` threads used in test cases related to `SharedSparkSession/TestHive` [spark]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #50206: URL: https://github.com/apache/spark/pull/50206#issuecomment-2709314647 Thank you, @LuciferYang . Merged to master/4.0 for Apache Spark 4.0.0.

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #6: URL: https://github.com/apache/spark-connect-swift/pull/6#issuecomment-2714672334 Could you review this PR when you have some time, @huaxingao ?

Re: [PR] [SPARK-50759][SQL] Deprecate a few legacy Catalog APIs [spark]

2025-03-11 Thread via GitHub
cloud-fan commented on PR #50085: URL: https://github.com/apache/spark/pull/50085#issuecomment-2714957423 I'm fine with it, cc @dongjoon-hyun

Re: [PR] [SPARK-51468][SQL] Revert "From json/xml should not change collations in the given schema" [spark]

2025-03-11 Thread via GitHub
MaxGekk commented on PR #50234: URL: https://github.com/apache/spark/pull/50234#issuecomment-2714475389 +1, LGTM. Merging to master/4.0. Thank you, @stefankandic.

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
vrozov commented on PR #50231: URL: https://github.com/apache/spark/pull/50231#issuecomment-2714762542 @HyukjinKwon removing jars from Apache source release is a must otherwise ASF may pull out release. By voting -1 on the PR I assume that you will provide another PR shortly with removed ja

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
MaxGekk commented on code in PR #6: URL: https://github.com/apache/spark-connect-swift/pull/6#discussion_r1989490231 ## Sources/SparkConnect/ArrowArray.swift: ## @@ -0,0 +1,331 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] [SPARK-50759][SQL] Deprecate a few legacy Catalog APIs [spark]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #50085: URL: https://github.com/apache/spark/pull/50085#issuecomment-2714972304 Also, cc @yaooqinn and @LuciferYang , too

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
viirya commented on PR #6: URL: https://github.com/apache/spark-connect-swift/pull/6#issuecomment-2714985680 For the changes, do you plan to propose them to Arrow Swift repo? ```swift - public enum ArrowTypeId { + public enum ArrowTypeId: Sendable { - public enum Info {

Re: [PR] [SPARK-43131][SQL][WIP] Add labels to identify UDFs [spark]

2025-03-11 Thread via GitHub
tgravescs commented on PR #49775: URL: https://github.com/apache/spark/pull/49775#issuecomment-2714416194 @HyukjinKwon Was there a reason the UDF: tag was removed in https://github.com/apache/spark/commit/fe3e34dda68fd54212df1dd01b8acb9a9bc6a0ad.. I'm not seeing a lot in the pr or issue ab

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #6: URL: https://github.com/apache/spark-connect-swift/pull/6#issuecomment-271568 Just to give the reviewers the background, `Swift 6` compiler becomes more like `Rust` compiler in terms of `Concurrency and Data Safety` check. - https://www.swift.org/blo

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #6: URL: https://github.com/apache/spark-connect-swift/pull/6#issuecomment-2714994992 Thank you for review, @viirya . Yes, it's required when Apache Arrow starts to support Swift 6.0.

Re: [PR] [SPARK-51465] Use `Apache Arrow Swift` 19.0.1 [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #6: URL: https://github.com/apache/spark-connect-swift/pull/6#issuecomment-2715003949 After building the initial working code, we are going to enter a QA period by adding more unit test coverage. And all issues are going to the upstreams because we don't wa

Re: [PR] [SPARK-48922][SQL] Optimize nested data type insertion performance [spark]

2025-03-11 Thread via GitHub
kazuyukitanimura commented on PR #47381: URL: https://github.com/apache/spark/pull/47381#issuecomment-2715013121 Hi @wForget Just checking if you had a chance

Re: [PR] [SPARK-51097][SS] Re-introduce RocksDB state store's last uploaded snapshot version instance metrics [spark]

2025-03-11 Thread via GitHub
ericm-db commented on code in PR #50195: URL: https://github.com/apache/spark/pull/50195#discussion_r1989916366 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -1465,6 +1470,7 @@ class RocksDB( log"with uniqueId: ${MDC(LogK

[PR] [SPARK-51472] Add gRPC `Client` actor [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun opened a new pull request, #7: URL: https://github.com/apache/spark-connect-swift/pull/7 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51097][SS] Re-introduce RocksDB state store's last uploaded snapshot version instance metrics [spark]

2025-03-11 Thread via GitHub
zecookiez commented on code in PR #50195: URL: https://github.com/apache/spark/pull/50195#discussion_r1989911570 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -1465,6 +1470,7 @@ class RocksDB( log"with uniqueId: ${MDC(Log

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
vrozov commented on code in PR #50231: URL: https://github.com/apache/spark/pull/50231#discussion_r1989929619 ## core/src/test/scala/org/apache/spark/SparkContextSuite.scala: ## @@ -243,7 +243,8 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu

Re: [PR] [SPARK-51447][SQL] Add `stringToTime` and `stringToTimeAnsi` [spark]

2025-03-11 Thread via GitHub
MaxGekk closed pull request #50220: [SPARK-51447][SQL] Add `stringToTime` and `stringToTimeAnsi` URL: https://github.com/apache/spark/pull/50220

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
vrozov commented on PR #50231: URL: https://github.com/apache/spark/pull/50231#issuecomment-2711918242 They are. Please see https://lists.apache.org/thread/0ro5yn6lbbpmvmqp2px3s2pf7cwljlc4

Re: [PR] [SPARK-51452][UI] Improve Thread dump table search [spark]

2025-03-11 Thread via GitHub
dongjoon-hyun closed pull request #50225: [SPARK-51452][UI] Improve Thread dump table search URL: https://github.com/apache/spark/pull/50225

[PR] [Minor][SQL][Tests] Remove duplicated plan node check in DataFrameSetOperationsSuite [spark]

2025-03-11 Thread via GitHub
Surbhi-Vijay opened a new pull request, #50227: URL: https://github.com/apache/spark/pull/50227 ### What changes were proposed in this pull request? Remove duplicated plan node check in DataFrameSetOperationsSuite ### Why are the changes needed? Code is unnecessarily checking for

[PR] [SPARK-51452][UI] Improve Thread dump table search [spark]

2025-03-11 Thread via GitHub
yaooqinn opened a new pull request, #50225: URL: https://github.com/apache/spark/pull/50225 ### What changes were proposed in this pull request? This PR improves thread dump search by: - query with pattern `tr[id^="thread_"]` w/o further pattern matching - short-circuit the se

Re: [PR] Initial Implementation [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #1: URL: https://github.com/apache/spark-connect-swift/pull/1#issuecomment-2715087794 This is simply rebased to the `main` branch because the following were merged successfully. - #2 - #3 - #4 - #5 - #6

Re: [PR] [SPARK-51446][SQL] Improve the codecNameMap for the compression codec [spark]

2025-03-11 Thread via GitHub
LuciferYang commented on code in PR #50221: URL: https://github.com/apache/spark/pull/50221#discussion_r1987085431 ## sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/HadoopCompressionCodec.java: ## @@ -53,11 +51,16 @@ public CompressionCodec getCompressionCodec() {

Re: [PR] [Minor][SQL][Tests] Remove duplicated plan node check in DataFrameSetOperationsSuite [spark]

2025-03-11 Thread via GitHub
Surbhi-Vijay commented on code in PR #50227: URL: https://github.com/apache/spark/pull/50227#discussion_r1987599381 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala: ## @@ -1493,8 +1493,6 @@ class DataFrameSetOperationsSuite extends QueryTest

Re: [PR] [SPARK-51461] Setup `SparkConnect` Swift package structure and CI to test `build` [spark-connect-swift]

2025-03-11 Thread via GitHub
yaooqinn commented on PR #4: URL: https://github.com/apache/spark-connect-swift/pull/4#issuecomment-2712452005 > Just FYI, this needs at least `4.0.0-rc2` because `Spark Connect` of `4.0.0-preview2` is insufficient like the following, @yaooqinn . > > ``` > $ git checkout v4.0.0-rc

Re: [PR] [Minor][SQL][Tests] Remove duplicated plan node check in DataFrameSetOperationsSuite [spark]

2025-03-11 Thread via GitHub
viirya commented on code in PR #50227: URL: https://github.com/apache/spark/pull/50227#discussion_r1987440881 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala: ## @@ -1493,8 +1493,6 @@ class DataFrameSetOperationsSuite extends QueryTest

Re: [PR] [SPARK-51429][Connect] Add "Acknowledgement" message to ExecutePlanResponse [spark]

2025-03-11 Thread via GitHub
vicennial closed pull request #50193: [SPARK-51429][Connect] Add "Acknowledgement" message to ExecutePlanResponse URL: https://github.com/apache/spark/pull/50193

Re: [PR] [SPARK-51269][SQL] SQLConf should manage the default value for avro compression level [spark]

2025-03-11 Thread via GitHub
yaooqinn commented on PR #50021: URL: https://github.com/apache/spark/pull/50021#issuecomment-2710080229 Shouldn't the PR scope be `Simplify AvroCompressionCodec by removing defaultCompressionLevel`?

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on code in PR #50159: URL: https://github.com/apache/spark/pull/50159#discussion_r1983844231 ## python/pyspark/sql/connect/client/core.py: ## @@ -666,7 +666,7 @@ def __init__( elif user_id is not None: self._user_id = user_id

Re: [PR] [SPARK-51269][SQL] SQLConf should manage the default value for avro compression level [spark]

2025-03-11 Thread via GitHub
beliefer commented on PR #50021: URL: https://github.com/apache/spark/pull/50021#issuecomment-2710093839 > Shouldn't the PR scope be `Simplify AvroCompressionCodec by removing defaultCompressionLevel`? Sounds good to me.

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-11 Thread via GitHub
beliefer commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1983086793 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -119,7 +126,9 @@ private[spark] class BarrierCoordinator( // A timer task that ensures we

Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]

2025-03-11 Thread via GitHub
cloud-fan commented on PR #49928: URL: https://github.com/apache/spark/pull/49928#issuecomment-2705425799 @vrozov can you remove the java test? https://github.com/apache/spark/pull/49928#discussion_r1981021185

Re: [PR] [SPARK-51307][SQL][3.5] locationUri in CatalogStorageFormat shall be decoded for display [spark]

2025-03-11 Thread via GitHub
yaooqinn closed pull request #50164: [SPARK-51307][SQL][3.5] locationUri in CatalogStorageFormat shall be decoded for display URL: https://github.com/apache/spark/pull/50164

Re: [PR] [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC [spark]

2025-03-11 Thread via GitHub
LuciferYang commented on PR #49528: URL: https://github.com/apache/spark/pull/49528#issuecomment-2703076175 cc @beliefer FYI

Re: [PR] [SPARK-51097][SS] Re-introduce RocksDB state store's last uploaded snapshot version instance metrics [spark]

2025-03-11 Thread via GitHub
ericm-db commented on code in PR #50195: URL: https://github.com/apache/spark/pull/50195#discussion_r1989829264 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -1465,6 +1470,7 @@ class RocksDB( log"with uniqueId: ${MDC(LogK

Re: [PR] [SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files [spark]

2025-03-11 Thread via GitHub
HyukjinKwon commented on PR #50215: URL: https://github.com/apache/spark/pull/50215#issuecomment-2715147840 my concern is that this is a breaking change ... We will at least have to update the migration guide

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
HyukjinKwon commented on PR #50231: URL: https://github.com/apache/spark/pull/50231#issuecomment-2715158826 No, I agree with removing jars but disagree with removing tests. > I assume that you will provide another PR shortly with removed jars and tests being enabled No, I cas

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
HyukjinKwon commented on code in PR #50231: URL: https://github.com/apache/spark/pull/50231#discussion_r1989810019 ## core/src/test/scala/org/apache/spark/SparkContextSuite.scala: ## @@ -243,7 +243,8 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with E

[PR] [WIP][SQL] Support cast from string to time [spark]

2025-03-11 Thread via GitHub
MaxGekk opened a new pull request, #50236: URL: https://github.com/apache/spark/pull/50236 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [MINOR][SQL][TESTS] Remove duplicated plan node check in DataFrameSetOperationsSuite [spark]

2025-03-11 Thread via GitHub
viirya closed pull request #50227: [MINOR][SQL][TESTS] Remove duplicated plan node check in DataFrameSetOperationsSuite URL: https://github.com/apache/spark/pull/50227 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] [MINOR][SQL][TESTS] Remove duplicated plan node check in DataFrameSetOperationsSuite [spark]

2025-03-11 Thread via GitHub
viirya commented on PR #50227: URL: https://github.com/apache/spark/pull/50227#issuecomment-2715225395 Thanks @Surbhi-Vijay @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] [SPARK-51097][SS] Re-introduce RocksDB state store's last uploaded snapshot version instance metrics [spark]

2025-03-11 Thread via GitHub
ericm-db commented on code in PR #50195: URL: https://github.com/apache/spark/pull/50195#discussion_r1989857079 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -1465,6 +1470,7 @@ class RocksDB( log"with uniqueId: ${MDC(LogK

Re: [PR] [SPARK-51458] Add GitHub Action job to check ASF license [spark-connect-swift]

2025-03-11 Thread via GitHub
dongjoon-hyun merged PR #2: URL: https://github.com/apache/spark-connect-swift/pull/2 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr..

Re: [PR] [SPARK-51454][SQL] Support cast from time to string [spark]

2025-03-11 Thread via GitHub
dongjoon-hyun closed pull request #50224: [SPARK-51454][SQL] Support cast from time to string URL: https://github.com/apache/spark/pull/50224 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-11 Thread via GitHub
cnauroth commented on PR #50173: URL: https://github.com/apache/spark/pull/50173#issuecomment-2704380243 Thank you, @LuciferYang ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] [SPARK-51318][TESTS] Remove `jar` files from Apache Spark repository and disable affected tests [spark]

2025-03-11 Thread via GitHub
vrozov commented on PR #50231: URL: https://github.com/apache/spark/pull/50231#issuecomment-2715399820 > No, I agree with removing jars but disagree with removing tests. > > > I assume that you will provide another PR shortly with removed jars and tests being enabled > > No, I

Re: [PR] [SPARK-51097][SS] Re-introduce RocksDB state store's last uploaded snapshot version instance metrics [spark]

2025-03-11 Thread via GitHub
zecookiez commented on code in PR #50195: URL: https://github.com/apache/spark/pull/50195#discussion_r1989942743 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala: ## @@ -203,20 +203,40 @@ trait StateStoreWriter def operatorStateMet

Re: [PR] [SPARK-51452][UI] Improve Thread dump table search [spark]

2025-03-11 Thread via GitHub
dongjoon-hyun commented on PR #50225: URL: https://github.com/apache/spark/pull/50225#issuecomment-2711437716 I also built and verified manually this PR. Merged to master for Apache Spark 4.1.0. -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [PR] [SPARK-51418][SQL] Fix DataSource PARTITON TABLE w/ Hive type incompatible partition columns [spark]

2025-03-11 Thread via GitHub
yaooqinn commented on PR #50182: URL: https://github.com/apache/spark/pull/50182#issuecomment-2709472881 Merged to master and 4.0, thank you again @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] Bump golang.org/x/oauth2 from 0.25.0 to 0.27.0 [spark-connect-go]

2025-03-11 Thread via GitHub
dependabot[bot] closed pull request #127: Bump golang.org/x/oauth2 from 0.25.0 to 0.27.0 URL: https://github.com/apache/spark-connect-go/pull/127 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] [MINOR][BUILD]: Fix merge_spark_pr script for no jira case [spark]

2025-03-11 Thread via GitHub
viirya opened a new pull request, #50237: URL: https://github.com/apache/spark/pull/50237 ### What changes were proposed in this pull request? This patch tries to fix the merge script `merge_spark_pr` for no jira case by defining the `asf_jira` variable in the script.

Re: [PR] [SPARK-44856][PYTHON] Improve Python UDTF arrow serializer performance [spark]

2025-03-11 Thread via GitHub
HyukjinKwon commented on PR #50099: URL: https://github.com/apache/spark/pull/50099#issuecomment-2715305664 Let me add a legacy conf ... to be safer .. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[PR] [SPARK-51471][SS] classify the ASSERT error when offset/timestamp in startOffset is larger than the endOffset. [spark]

2025-03-11 Thread via GitHub
huanliwang-db opened a new pull request, #50238: URL: https://github.com/apache/spark/pull/50238 ### What changes were proposed in this pull request? We are throwing out the assertion error now when offset/timestamp in startOffset is larger than the endOffset. This could happ

Re: [PR] [SPARK-50820][SQL] DSv2: Conditional nullification of metadata columns in DML [spark]

2025-03-11 Thread via GitHub
huaxingao commented on code in PR #49493: URL: https://github.com/apache/spark/pull/49493#discussion_r1988353194 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala: ## @@ -132,52 +164,119 @@ trait RewriteRowLevelCommand extends Rul

Re: [PR] [SPARK-49488][SQL][FOLLOWUP] Use correct MySQL datetime functions when pushing down EXTRACT [spark]

2025-03-11 Thread via GitHub
beliefer commented on PR #50112: URL: https://github.com/apache/spark/pull/50112#issuecomment-2703041060 @cloud-fan @dongjoon-hyun Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-11 Thread via GitHub
cloud-fan commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1990231759 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1675,6 +1676,86 @@ class SessionCatalog( } } + /** + *

Re: [PR] [SPARK-50464] Support unsigned integer for Arrow [spark]

2025-03-11 Thread via GitHub
miscenko commented on PR #49022: URL: https://github.com/apache/spark/pull/49022#issuecomment-2716153988 > > Hi, I am curious, what is the status of this PR? Was it abandoned ? > > it seems that nobody is interested in reviewing this pr. ಥ_ಥ It's a pity, I think this is a very u

Re: [PR] [SPARK-51446][SQL] Improve the codecNameMap for the compression codec [spark]

2025-03-11 Thread via GitHub
beliefer closed pull request #50221: [SPARK-51446][SQL] Improve the codecNameMap for the compression codec URL: https://github.com/apache/spark/pull/50221 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] [SPARK-XXXXX][SQL] Don't insert redundant ColumnarToRowExec [spark]

2025-03-11 Thread via GitHub
viirya commented on PR #50239: URL: https://github.com/apache/spark/pull/50239#issuecomment-2715607848 I will add JIRA later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
