Re: [PR] [SPARK-51490] Support `iOS`, `watchOS`, and `tvOS` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #13: URL: https://github.com/apache/spark-connect-swift/pull/13#issuecomment-2718747123 Thank you so much, @huaxingao ! Merged to main -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-51484][SS] Remove unused function `private def newDFSFileName(String)` from `RocksDBFileManager` [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #50249: [SPARK-51484][SS] Remove unused function `private def newDFSFileName(String)` from `RocksDBFileManager` URL: https://github.com/apache/spark/pull/50249

Re: [PR] [SPARK-51488][SQL] Support the TIME keyword as a data type [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50250: URL: https://github.com/apache/spark/pull/50250#issuecomment-2718780657 Merged to master. Thank you, @MaxGekk .

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #50232: [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation URL: https://github.com/apache/spark/pull/50232

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50232: URL: https://github.com/apache/spark/pull/50232#issuecomment-2718785462 Thank you, @pan3793 and @LuciferYang . Merged to master. Could you make a backporting PR to branch-4.0 to pass CI there once more, @pan3793 ?

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-12 Thread via GitHub
LuciferYang commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2718534144 Will `dev/create-release/spark-rm/Dockerfile` be fixed in a separate pull request?

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #50122: [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk URL: https://github.com/apache/spark/pull/50122

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50122: URL: https://github.com/apache/spark/pull/50122#issuecomment-2718823408 Thank you, @attilapiros and @Ngone51 . BTW, could you make two backporting PRs to branch-4.0 and branch-3.5 in order to make it sure to pass all CIs there, @attilapiros ?

Re: [PR] [SPARK-51490] Support `iOS`, `watchOS`, and `tvOS` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #13: [SPARK-51490] Support `iOS`, `watchOS`, and `tvOS` URL: https://github.com/apache/spark-connect-swift/pull/13

Re: [PR] [SPARK-51488][SQL] Support the TIME keyword as a data type [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #50250: [SPARK-51488][SQL] Support the TIME keyword as a data type URL: https://github.com/apache/spark/pull/50250

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50122: URL: https://github.com/apache/spark/pull/50122#issuecomment-2718824584 Also, cc @mridulm , too

Re: [PR] [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 [spark]

2025-03-12 Thread via GitHub
deniskuzZ commented on code in PR #50213: URL: https://github.com/apache/spark/pull/50213#discussion_r1991644845 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala: ## @@ -1210,7 +1210,7 @@ class HiveQuerySuite extends HiveComparisonTest with SQ

[PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z`… [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun opened a new pull request, #14: URL: https://github.com/apache/spark-connect-swift/pull/14 … version ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing cha

[PR] [SPARK-51492][SS]FileStreamSource: Avoid expensive file concatenation if not needed [spark]

2025-03-12 Thread via GitHub
siying opened a new pull request, #50257: URL: https://github.com/apache/spark/pull/50257 ### What changes were proposed in this pull request? In two places where constructing a log line can be very or a little bit expensive, avoid logTrace() call as a whole. ### Why are the ch

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2718898426 Could you review this PR when you have some time, @attilapiros ?

[PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-12 Thread via GitHub
zhengruifeng opened a new pull request, #50255: URL: https://github.com/apache/spark/pull/50255 ### What changes were proposed in this pull request? Refresh testing images for pyarrow 19 ### Why are the changes needed? to test against the latest pyarrow ### Doe

Re: [PR] [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 [spark]

2025-03-12 Thread via GitHub
deniskuzZ commented on code in PR #50213: URL: https://github.com/apache/spark/pull/50213#discussion_r1991860382 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -885,20 +884,6 @@ private[hive] class HiveClientImpl( // Since HIVE-182

Re: [PR] [SPARK-51338][INFRA] Add automated CI build for `connect-examples` [spark]

2025-03-12 Thread via GitHub
LuciferYang commented on code in PR #50187: URL: https://github.com/apache/spark/pull/50187#discussion_r1991901327 ## connect-examples/server-library-example/pom.xml: ## @@ -36,7 +36,8 @@ UTF-8 2.13 2.13.15 -3.25.4 -4.0.0-preview2 +4.29.3 +4.1.0-SN

Re: [PR] [MINOR][PYTHON] Generate with a newline at the end of error-conditions.json [spark]

2025-03-12 Thread via GitHub
HyukjinKwon closed pull request #50254: [MINOR][PYTHON] Generate with a newline at the end of error-conditions.json URL: https://github.com/apache/spark/pull/50254

[PR] [MINOR] Add compatibility badges [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun opened a new pull request, #12: URL: https://github.com/apache/spark-connect-swift/pull/12 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [MINOR][PYTHON] Generate with a newline at the end of error-conditions.json [spark]

2025-03-12 Thread via GitHub
HyukjinKwon commented on PR #50254: URL: https://github.com/apache/spark/pull/50254#issuecomment-2718496912 Merged to master.

Re: [PR] [SPARK-48922][SQL] Avoid redundant array transform of identical expression for map type [spark]

2025-03-12 Thread via GitHub
wForget commented on PR #50245: URL: https://github.com/apache/spark/pull/50245#issuecomment-2716878368 @viirya @dongjoon-hyun @kazuyukitanimura could you please take a look?

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-12 Thread via GitHub
LuciferYang commented on code in PR #50255: URL: https://github.com/apache/spark/pull/50255#discussion_r1991919622 ## dev/spark-test-image/python-313/Dockerfile: ## @@ -67,7 +67,7 @@ RUN apt-get update && apt-get install -y \ && rm -rf /var/lib/apt/lists/* -ARG BASIC_PI

Re: [PR] [SPARK-51472] Add gRPC `SparkConnectClient` actor [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #7: [SPARK-51472] Add gRPC `SparkConnectClient` actor URL: https://github.com/apache/spark-connect-swift/pull/7

Re: [PR] [SPARK-51481] Add `RuntimeConf` actor [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #9: URL: https://github.com/apache/spark-connect-swift/pull/9#issuecomment-2716593313 Please wait for a while until I finish the first migration~ `4.1.0` will be replaced properly in a week.

Re: [PR] [MINOR] Add compatibility badges [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #12: URL: https://github.com/apache/spark-connect-swift/pull/12#issuecomment-2718610578 Thank you! Merged to main.

Re: [PR] [MINOR] Add compatibility badges [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #12: [MINOR] Add compatibility badges URL: https://github.com/apache/spark-connect-swift/pull/12

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-12 Thread via GitHub
zhengruifeng commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1991972859 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ShowProceduresCommand.scala: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] [SPARK-51481] Add `RuntimeConf` actor [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #9: URL: https://github.com/apache/spark-connect-swift/pull/9#issuecomment-2718880137 To @yaooqinn , I applied new `version` scheme for all. > By the way, can we unbind the version tracker of this project to the Spark main repo? It might be inconvenient for

[PR] [SPARK-51491][PYTHON] Simplify boxplot with subquery APIs [spark]

2025-03-12 Thread via GitHub
zhengruifeng opened a new pull request, #50258: URL: https://github.com/apache/spark/pull/50258 ### What changes were proposed in this pull request? Simplify boxplot with subquery APIs ### Why are the changes needed? 1, to make the code simple; 2, according to my exper

Re: [PR] [SPARK-51473][ML][CONNECT] ML transformed dataframe keep a reference to the model [spark]

2025-03-12 Thread via GitHub
zhengruifeng commented on code in PR #50199: URL: https://github.com/apache/spark/pull/50199#discussion_r1992191679 ## python/pyspark/ml/util.py: ## @@ -185,29 +185,40 @@ def wrapped(self: "JavaWrapper", dataset: "ConnectDataFrame") -> Any: assert isinstance(

Re: [PR] [SPARK-51484][SS] Remove unused function `private def newDFSFileName(String)` from `RocksDBFileManager` [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50249: URL: https://github.com/apache/spark/pull/50249#issuecomment-2718779093 Merged to master. Thank you, @LuciferYang and @beliefer .

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
attilapiros commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2718970521 I see this connect-swift-x.y.z is already used in the tickets: https://github.com/user-attachments/assets/6f7088a6-4f08-4a2a-b44e-d7d30be79eff

Re: [PR] [SPARK-51474][SQL] Don't insert redundant ColumnarToRowExec for node supporting both columnar and row output [spark]

2025-03-12 Thread via GitHub
viirya closed pull request #50239: [SPARK-51474][SQL] Don't insert redundant ColumnarToRowExec for node supporting both columnar and row output URL: https://github.com/apache/spark/pull/50239

Re: [PR] [SPARK-51474][SQL] Don't insert redundant ColumnarToRowExec for node supporting both columnar and row output [spark]

2025-03-12 Thread via GitHub
viirya commented on PR #50239: URL: https://github.com/apache/spark/pull/50239#issuecomment-2717957833 Thank you @dongjoon-hyun

Re: [PR] [SPARK-51482][SQL] Support cast from string to time [spark]

2025-03-12 Thread via GitHub
LuciferYang commented on code in PR #50236: URL: https://github.com/apache/spark/pull/50236#discussion_r1991323757 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -1337,6 +1350,35 @@ case class Cast( } } + private[this] def c

Re: [PR] [SPARK-50005][SQL] Enhance method verifyNotReadPath to identify subqueries hidden in the filter conditions. [spark]

2025-03-12 Thread via GitHub
xunxunmimi5577 commented on PR #48640: URL: https://github.com/apache/spark/pull/48640#issuecomment-2717698592 @panbingkun Could you please help review this?

[PR] [WIP][SQL] Support the TIME keyword as a data type [spark]

2025-03-12 Thread via GitHub
MaxGekk opened a new pull request, #50250: URL: https://github.com/apache/spark/pull/50250 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-51482][SQL] Support cast from string to time [spark]

2025-03-12 Thread via GitHub
MaxGekk commented on PR #50236: URL: https://github.com/apache/spark/pull/50236#issuecomment-2718536341 Merging to master. Thank you, @LuciferYang @beliefer for review.

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-12 Thread via GitHub
zhengruifeng commented on code in PR #50255: URL: https://github.com/apache/spark/pull/50255#discussion_r1991952614 ## dev/spark-test-image/python-313/Dockerfile: ## @@ -67,7 +67,7 @@ RUN apt-get update && apt-get install -y \ && rm -rf /var/lib/apt/lists/* -ARG BASIC_P

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-12 Thread via GitHub
zhengruifeng commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2718596875 > Will `dev/create-release/spark-rm/Dockerfile` be fixed in a separate pull request? I think we can update/sync dev/create-release/spark-rm/Dockerfile just before each release

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-12 Thread via GitHub
LuciferYang commented on code in PR #50255: URL: https://github.com/apache/spark/pull/50255#discussion_r1991916210 ## dev/spark-test-image/python-313/Dockerfile: ## @@ -67,7 +67,7 @@ RUN apt-get update && apt-get install -y \ && rm -rf /var/lib/apt/lists/* -ARG BASIC_PI

Re: [PR] [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 [spark]

2025-03-12 Thread via GitHub
simhadri-g commented on code in PR #50213: URL: https://github.com/apache/spark/pull/50213#discussion_r1991865569 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -885,20 +884,6 @@ private[hive] class HiveClientImpl( // Since HIVE-18

Re: [PR] [SPARK-51097][SS] Re-introduce RocksDB state store's last uploaded snapshot version instance metrics [spark]

2025-03-12 Thread via GitHub
zecookiez commented on PR #50195: URL: https://github.com/apache/spark/pull/50195#issuecomment-2718664791 @HeartSaVioR can you take a quick look at this? Thanks :)

Re: [PR] [WIP] [SPARK-51097] [SS] Split apart SparkPlan metrics and instance metrics [spark]

2025-03-12 Thread via GitHub
zecookiez closed pull request #50157: [WIP] [SPARK-51097] [SS] Split apart SparkPlan metrics and instance metrics URL: https://github.com/apache/spark/pull/50157

Re: [PR] [MINOR][SQL] Reuse `dateTimeUtilsCls` in `Cast` [spark]

2025-03-12 Thread via GitHub
LuciferYang commented on PR #50251: URL: https://github.com/apache/spark/pull/50251#issuecomment-2718508143 ![image](https://github.com/user-attachments/assets/c86b11c4-1994-4e66-be06-1481a1419e97) all passed

Re: [PR] [MINOR][SQL] Reuse `dateTimeUtilsCls` in `Cast` [spark]

2025-03-12 Thread via GitHub
LuciferYang closed pull request #50251: [MINOR][SQL] Reuse `dateTimeUtilsCls` in `Cast` URL: https://github.com/apache/spark/pull/50251

Re: [PR] [SPARK-51482][SQL] Support cast from string to time [spark]

2025-03-12 Thread via GitHub
MaxGekk closed pull request #50236: [SPARK-51482][SQL] Support cast from string to time URL: https://github.com/apache/spark/pull/50236

[PR] [SPARK-51489][SQL] Represent SQL Script in Spark UI [spark]

2025-03-12 Thread via GitHub
dusantism-db opened a new pull request, #50256: URL: https://github.com/apache/spark/pull/50256 ### What changes were proposed in this pull request? Initial representation of SQL Scripts in the Spark UI. This PR introduces a `"SQL Script ID"` column in the All Executions table in

[PR] [SPARK-51490] Support iOS, watchOS, and tvOS [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun opened a new pull request, #13: URL: https://github.com/apache/spark-connect-swift/pull/13 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-12 Thread via GitHub
LuciferYang commented on code in PR #50232: URL: https://github.com/apache/spark/pull/50232#discussion_r1990820088 ## sql/hive/src/main/java/org/apache/hadoop/hive/ql/exec/HiveFunctionRegistryUtils.java: ## @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] [SPARK-51483] Add `SparkSession` and `DataFrame` actors [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #10: URL: https://github.com/apache/spark-connect-swift/pull/10#issuecomment-2716686412 According to the review comment, I didn't mention `4.0.0 RC2` in `README.md` yet. - https://github.com/apache/spark-connect-swift/pull/4#discussion_r1988302867 For

Re: [PR] [SPARK-51469][SQL] Improve MapKeyDedupPolicy so that avoid calling toString [spark]

2025-03-12 Thread via GitHub
beliefer commented on PR #50235: URL: https://github.com/apache/spark/pull/50235#issuecomment-2716796972 ping @dongjoon-hyun @yaooqinn @LuciferYang

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2719042611 Thank you, @attilapiros . This PR aims to filter when it chooses `Fixed Version`.

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992241770 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -38,9 +38,20 @@ import org.apache.spark.sql.types.Str

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
attilapiros commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2718979814 Let me do a quick test..

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2719073678 I switched the issue type from `Improvement` to `Test`. Feel free to switch back if it looks improper to you, @zhengruifeng . For the code wise, I'm +1 if this doesn't cause a

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2719045920 To test, you can try to merge this PR (SPARK-51493) with this script. https://github.com/user-attachments/assets/eed46cc9-cbea-44ed-909a-10d640e3a678

Re: [PR] [SPARK-51473][ML][CONNECT] ML transformed dataframe keep a reference to the model [spark]

2025-03-12 Thread via GitHub
zhengruifeng commented on code in PR #50199: URL: https://github.com/apache/spark/pull/50199#discussion_r1992274916 ## python/pyspark/ml/util.py: ## @@ -185,29 +185,40 @@ def wrapped(self: "JavaWrapper", dataset: "ConnectDataFrame") -> Any: assert isinstance(

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
attilapiros commented on PR #50122: URL: https://github.com/apache/spark/pull/50122#issuecomment-2719130827 Backport to branch-4.0: https://github.com/apache/spark/pull/50259

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
attilapiros commented on PR #50122: URL: https://github.com/apache/spark/pull/50122#issuecomment-2719136015 Backport to branch-3.5: https://github.com/apache/spark/pull/50260

[PR] [SPARK-43221][CORE][4.0] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
attilapiros opened a new pull request, #50259: URL: https://github.com/apache/spark/pull/50259 **This is a backport to branch-4.0 from master.** Thanks for @yorksity who reported this error and even provided a PR for it. This solution very different from https://github.com/apache/s

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2719150536 Thank you so much, @attilapiros !

Re: [PR] [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 [spark]

2025-03-12 Thread via GitHub
simhadri-g commented on code in PR #50213: URL: https://github.com/apache/spark/pull/50213#discussion_r1990961751 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -885,20 +884,6 @@ private[hive] class HiveClientImpl( // Since HIVE-18

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
Ngone51 commented on code in PR #50122: URL: https://github.com/apache/spark/pull/50122#discussion_r1990890014 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -862,31 +862,50 @@ class BlockManagerMasterEndpoint( private def getLocations

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-12 Thread via GitHub
pan3793 commented on code in PR #50232: URL: https://github.com/apache/spark/pull/50232#discussion_r1990737841 ## sql/hive-thriftserver/pom.xml: ## @@ -148,16 +148,6 @@ byte-buddy-agent test - Review Comment: Done by updating the `SparkBuild.scala`, i

Re: [PR] [SPARK-51365][TESTS] Test maven + macos [spark]

2025-03-12 Thread via GitHub
LuciferYang closed pull request #50178: [SPARK-51365][TESTS] Test maven + macos URL: https://github.com/apache/spark/pull/50178

Re: [PR] [SPARK-48922][SQL] Avoid redundant array transform of identical expression for map type [spark]

2025-03-12 Thread via GitHub
beliefer commented on code in PR #50245: URL: https://github.com/apache/spark/pull/50245#discussion_r1991031211 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -459,11 +459,28 @@ object TableOutputResolver extends SQLConfHel

Re: [PR] [SPARK-48922][SQL] Avoid redundant array transform of identical expression for map type [spark]

2025-03-12 Thread via GitHub
beliefer commented on code in PR #50245: URL: https://github.com/apache/spark/pull/50245#discussion_r1990965674 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -459,11 +459,28 @@ object TableOutputResolver extends SQLConfHel

Re: [PR] [SPARK-49485][CORE] Fix speculative task hang bug due to remaining executor lay on same host [spark]

2025-03-12 Thread via GitHub
buska88 commented on code in PR #47949: URL: https://github.com/apache/spark/pull/47949#discussion_r1990965346 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -308,13 +328,59 @@ private[spark] class ExecutorAllocationManager( tasksPerExecuto

Re: [PR] [SPARK-48922][SQL] Avoid redundant array transform of identical expression for map type [spark]

2025-03-12 Thread via GitHub
wForget commented on code in PR #50245: URL: https://github.com/apache/spark/pull/50245#discussion_r1990969615 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -459,11 +459,28 @@ object TableOutputResolver extends SQLConfHelp

[PR] [SPARK-51484][SS] Remove unused function `private def newDFSFileName(String)` from `RocksDBFileManager` [spark]

2025-03-12 Thread via GitHub
LuciferYang opened a new pull request, #50249: URL: https://github.com/apache/spark/pull/50249 ### What changes were proposed in this pull request? This pr aims to remove unused function `private def newDFSFileName(String)` from `RocksDBFileManager`, It is no longer used after SPARK-49770

Re: [PR] [SPARK-51482][SQL] Support cast from string to time [spark]

2025-03-12 Thread via GitHub
MaxGekk commented on PR #50236: URL: https://github.com/apache/spark/pull/50236#issuecomment-2717411710 @yaooqinn @LuciferYang @dongjoon-hyun @HyukjinKwon Could you take a look at the PR when you have time, please.

Re: [PR] [SPARK-51482][SQL] Support cast from string to time [spark]

2025-03-12 Thread via GitHub
MaxGekk commented on code in PR #50236: URL: https://github.com/apache/spark/pull/50236#discussion_r1991652407 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -1337,6 +1350,35 @@ case class Cast( } } + private[this] def castT

Re: [PR] [MINOR][PYTHON] Generate with a newline at the end of error-conditions.json [spark]

2025-03-12 Thread via GitHub
HyukjinKwon commented on PR #50254: URL: https://github.com/apache/spark/pull/50254#issuecomment-2718324059 cc @LuciferYang

Re: [PR] [SPARK-51338][INFRA] Add automated CI build for `connect-examples` [spark]

2025-03-12 Thread via GitHub
vicennial commented on code in PR #50187: URL: https://github.com/apache/spark/pull/50187#discussion_r1991794909 ## connect-examples/server-library-example/pom.xml: ## @@ -36,7 +36,8 @@ UTF-8 2.13 2.13.15 -3.25.4 -4.0.0-preview2 +4.29.3 +4.1.0-SNAP

[PR] [MINOR][PYTHON] Generate with a newline at the end of error-conditions.json [spark]

2025-03-12 Thread via GitHub
HyukjinKwon opened a new pull request, #50254: URL: https://github.com/apache/spark/pull/50254 ### What changes were proposed in this pull request? This PR proposes to generate with a newline at the end of `error-conditions.json`. ### Why are the changes needed? To make

Re: [PR] [MINOR][DOC] Improve the documentation for configuration. [spark]

2025-03-12 Thread via GitHub
0xbadidea closed pull request #50247: [MINOR][DOC] Improve the documentation for configuration. URL: https://github.com/apache/spark/pull/50247

Re: [PR] [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 [spark]

2025-03-12 Thread via GitHub
deniskuzZ commented on code in PR #50213: URL: https://github.com/apache/spark/pull/50213#discussion_r1991644845 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala: ## @@ -1210,7 +1210,7 @@ class HiveQuerySuite extends HiveComparisonTest with SQ

Re: [PR] [SPARK-50188][CONNECT][PYTHON] When the connect client starts, print the server's webUrl [spark]

2025-03-12 Thread via GitHub
panbingkun commented on PR #48720: URL: https://github.com/apache/spark/pull/48720#issuecomment-2716707617 > +1 for the idea. Could you rebase this once more, @panbingkun ? Sorry, I have been struggling with a new job recently, and it has been updated! Thanks.

[PR] [MINOR][SQL] Reuse `dateTimeUtilsCls` in `Cast` [spark]

2025-03-12 Thread via GitHub
MaxGekk opened a new pull request, #50251: URL: https://github.com/apache/spark/pull/50251 ### What changes were proposed in this pull request? In the PR, I propose to re-use the method `dateTimeUtilsCls()` instead of `org.apache.spark.sql.catalyst.util.DateTimeUtils` in the `Cast` expres

[PR] [SPARK-51485] Add `How to use in your apps` section to `README.md` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun opened a new pull request, #11: URL: https://github.com/apache/spark-connect-swift/pull/11 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]

2025-03-12 Thread via GitHub
aokolnychyi opened a new pull request, #50253: URL: https://github.com/apache/spark/pull/50253 ### What changes were proposed in this pull request? This PR adds DSv2 APIs for constraints as per SPIP [doc](https://docs.google.com/document/d/1EHjB4W1LjiXxsK_G7067j9pPX0y15LUF

Re: [PR] [MINOR][PYTHON] Reformat error classes [spark]

2025-03-12 Thread via GitHub
HyukjinKwon commented on PR #50241: URL: https://github.com/apache/spark/pull/50241#issuecomment-2718308691 Merged to master.

Re: [PR] [MINOR][PYTHON] Reformat error classes [spark]

2025-03-12 Thread via GitHub
HyukjinKwon commented on code in PR #50241: URL: https://github.com/apache/spark/pull/50241#discussion_r1991779317 ## python/pyspark/errors/error-conditions.json: ## @@ -1208,4 +1208,4 @@ "Index must be non-zero." ] } -} +} Review Comment: it is a generated fi

Re: [PR] [MINOR][PYTHON] Reformat error classes [spark]

2025-03-12 Thread via GitHub
HyukjinKwon closed pull request #50241: [MINOR][PYTHON] Reformat error classes URL: https://github.com/apache/spark/pull/50241

[PR] [SPARK-51271][PYTHON] Filter serialization for Python Data Source filter pushdown [spark]

2025-03-12 Thread via GitHub
wengh opened a new pull request, #50252: URL: https://github.com/apache/spark/pull/50252 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was t

Re: [PR] [SPARK-51476][PYTHON][DOCS] Update document type conversion for Pandas UDFs (pyarrow 17.0.0, pandas 2.2.3, Python 3.11) [spark]

2025-03-12 Thread via GitHub
HyukjinKwon closed pull request #50242: [SPARK-51476][PYTHON][DOCS] Update document type conversion for Pandas UDFs (pyarrow 17.0.0, pandas 2.2.3, Python 3.11) URL: https://github.com/apache/spark/pull/50242

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
Ngone51 commented on code in PR #50122: URL: https://github.com/apache/spark/pull/50122#discussion_r1990890014 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -862,31 +862,50 @@ class BlockManagerMasterEndpoint( private def getLocations

Re: [PR] [SPARK-51467][UI] Make tables of the environment page filterable [spark]

2025-03-12 Thread via GitHub
yaooqinn commented on PR #50233: URL: https://github.com/apache/spark/pull/50233#issuecomment-2716542978 > The overloaded action issues might be considered later in an independent JIRA (if needed later). Thank you @dongjoon-hyun. For potential UX improvements, I'd leave it for

Re: [PR] [SPARK-51485] Add `How to use in your apps` section to `README.md` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #11: URL: https://github.com/apache/spark-connect-swift/pull/11#issuecomment-2718378823 Thank you, @viirya . Merged to main.

Re: [PR] [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 [spark]

2025-03-12 Thread via GitHub
vrozov commented on code in PR #50213: URL: https://github.com/apache/spark/pull/50213#discussion_r1991823559 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -885,20 +884,6 @@ private[hive] class HiveClientImpl( // Since HIVE-18238(

Re: [PR] [SPARK-51485] Add `How to use in your apps` section to `README.md` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #11: [SPARK-51485] Add `How to use in your apps` section to `README.md` URL: https://github.com/apache/spark-connect-swift/pull/11

Re: [PR] [SPARK-51476][PYTHON][DOCS] Update document type conversion for Pandas UDFs (pyarrow 17.0.0, pandas 2.2.3, Python 3.11) [spark]

2025-03-12 Thread via GitHub
HyukjinKwon commented on PR #50242: URL: https://github.com/apache/spark/pull/50242#issuecomment-2718310865 Merged to master.

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2719154602 Merged to main~ (I also used this new script to merge this PR.) ``` ... Would you like to update an associated JIRA? (y/N): y Enter a JIRA id [SPARK-51493]:

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992391021 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -65,6 +65,7 @@ case object StoreTaskCompletionListener extends RocksDB

[PR] [SPARK-43221][CORE][3.5] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
attilapiros opened a new pull request, #50260: URL: https://github.com/apache/spark/pull/50260 **This is a backport to branch-3.5 from master.** Thanks to @yorksity, who reported this error and even provided a PR for it. This solution is very different from https://github.com/apache/s
