Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992390682 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinator.scala: ## @@ -168,9 +220,99 @@ private class StateStoreCoordinator(ove

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992490417 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinator.scala: ## @@ -168,9 +220,99 @@ private class StateStoreCoordinator(ove

Re: [PR] [SPARK-48922][SQL][3.5] Avoid redundant array transform of identical expression for map type [spark]

2025-03-12 Thread via GitHub
kazuyukitanimura commented on PR #50265: URL: https://github.com/apache/spark/pull/50265#issuecomment-2719960307 LGTM Thank you @wForget -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992795230 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinator.scala: ## @@ -168,9 +220,99 @@ private class StateStoreCoordinator(ove

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992793749 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinator.scala: ## @@ -168,9 +220,99 @@ private class StateStoreCoordinator(ove

Re: [PR] [SPARK-51469][SQL] Improve MapKeyDedupPolicy so that avoid calling toString [spark]

2025-03-12 Thread via GitHub
beliefer closed pull request #50235: [SPARK-51469][SQL] Improve MapKeyDedupPolicy so that avoid calling toString URL: https://github.com/apache/spark/pull/50235

Re: [PR] [SPARK-48922][SQL] Avoid redundant array transform of identical expression for map type [spark]

2025-03-12 Thread via GitHub
kazuyukitanimura commented on PR #50245: URL: https://github.com/apache/spark/pull/50245#issuecomment-2719938188 Thank you @wForget late LGTM

Re: [PR] [SPARK-51494][BUILD] Upgrade to Apache parent pom 33 [spark]

2025-03-12 Thread via GitHub
cnauroth commented on PR #50261: URL: https://github.com/apache/spark/pull/50261#issuecomment-2719935801 I appreciate it. Thank you, @dongjoon-hyun !

Re: [PR] [SPARK-51469][SQL] Improve MapKeyDedupPolicy so that avoid calling toString [spark]

2025-03-12 Thread via GitHub
beliefer commented on PR #50235: URL: https://github.com/apache/spark/pull/50235#issuecomment-2719934092 @dongjoon-hyun To be honest, it has extremely low performance overhead. Let me update the PR's description. I thought it would be good to cache the string value.

Re: [PR] [SPARK-51495] Add `Integration Test` GitHub Action job with `4.0.0-preview2` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on code in PR #15: URL: https://github.com/apache/spark-connect-swift/pull/15#discussion_r1992468710 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -81,6 +81,7 @@ struct DataFrameTests { await spark.stop() } +#if !os(Linux) Review Comm

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-12 Thread via GitHub
jayadeep-jayaraman commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1992757166 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -134,11 +138,8 @@ private[spark] class BarrierCoordinator( // Cancel the curre

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-12 Thread via GitHub
beliefer commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1985896731 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonScanBuilder.scala: ## @@ -25,6 +27,40 @@ class PythonScanBuilder( ds: Python

Re: [PR] [SPARK-51492][SS]FileStreamSource: Avoid expensive string concatenation if not needed [spark]

2025-03-12 Thread via GitHub
siying closed pull request #50257: [SPARK-51492][SS]FileStreamSource: Avoid expensive string concatenation if not needed URL: https://github.com/apache/spark/pull/50257

Re: [PR] [SPARK-51492][SS]FileStreamSource: Avoid expensive string concatenation if not needed [spark]

2025-03-12 Thread via GitHub
siying commented on PR #50257: URL: https://github.com/apache/spark/pull/50257#issuecomment-2719796281 This seems to be lazily evaluated, so it's not a problem.

Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]

2025-03-12 Thread via GitHub
jjayadeep06 commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2719772362 > @jjayadeep06 I'm sorry for the suggestion at [#50020 (comment)](https://github.com/apache/spark/pull/50020#issuecomment-2705780937) I want you to create a backport PR for branch-3.5, s

Re: [PR] [SPARK-51481] Add `RuntimeConf` actor [spark-connect-swift]

2025-03-12 Thread via GitHub
yaooqinn commented on PR #9: URL: https://github.com/apache/spark-connect-swift/pull/9#issuecomment-2719566864 Thank you @dongjoon-hyun

Re: [PR] [SPARK-47484][SQL] Allow trailing comma in column definition list [spark]

2025-03-12 Thread via GitHub
Nicolas-Parot-Alvarez-Paidy commented on PR #45593: URL: https://github.com/apache/spark/pull/45593#issuecomment-2719763453 I hope this will be reconsidered. DuckDB, Snowflake but also the latest versions of Java, Python and Scala all support it. It clearly is a movement in the so

Re: [PR] [SPARK-43221][CORE][3.5] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
attilapiros commented on PR #50260: URL: https://github.com/apache/spark/pull/50260#issuecomment-2719656434 Fixed. It was a difference between Scala 2.12 vs 2.13. The `Option#zip` gives back an `Iterable` on Scala 2.12 instead of an `Option`: ``` # scala Welcome to Scala 2.12.8

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-12 Thread via GitHub
beliefer commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1992622771 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonDataSourceV2.scala: ## @@ -52,6 +52,11 @@ class PythonDataSourceV2 extends TableP

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-12 Thread via GitHub
beliefer commented on PR #49961: URL: https://github.com/apache/spark/pull/49961#issuecomment-2719735902 cc @cloud-fan @allisonwang-db

Re: [PR] [MINOR][SQL] Reuse `dateTimeUtilsCls` in `Cast` [spark]

2025-03-12 Thread via GitHub
LuciferYang commented on PR #50251: URL: https://github.com/apache/spark/pull/50251#issuecomment-2718510472 Merged into master. Thanks @MaxGekk

[PR] [SPARK-48922][SQL][3.5] Avoid redundant array transform of identical expression for map type [spark]

2025-03-12 Thread via GitHub
wForget opened a new pull request, #50265: URL: https://github.com/apache/spark/pull/50265 ### What changes were proposed in this pull request? Similar to #47843, this patch avoids ArrayTransform in `resolveMapType` function if the resolution expression is the same as input param.

Re: [PR] [SPARK-48922][SQL] Avoid redundant array transform of identical expression for map type [spark]

2025-03-12 Thread via GitHub
beliefer commented on PR #50245: URL: https://github.com/apache/spark/pull/50245#issuecomment-2719714521 @wForget Could you create a backport PR for branch-3.5?

Re: [PR] [SPARK-48922][SQL] Avoid redundant array transform of identical expression for map type [spark]

2025-03-12 Thread via GitHub
beliefer closed pull request #50245: [SPARK-48922][SQL] Avoid redundant array transform of identical expression for map type URL: https://github.com/apache/spark/pull/50245

Re: [PR] [SPARK-51449][BUILD] Restore hive-llap-common to compile scope [spark]

2025-03-12 Thread via GitHub
pan3793 closed pull request #50222: [SPARK-51449][BUILD] Restore hive-llap-common to compile scope URL: https://github.com/apache/spark/pull/50222

Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]

2025-03-12 Thread via GitHub
beliefer commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2719643055 @jjayadeep06 I'm sorry for the suggestion at https://github.com/apache/spark/pull/50020#issuecomment-2705780937 I want you to create a backport PR for branch-3.5, since I read the issue w

Re: [PR] [SPARK-51449][BUILD] Restore hive-llap-common to compile scope [spark]

2025-03-12 Thread via GitHub
pan3793 commented on PR #50222: URL: https://github.com/apache/spark/pull/50222#issuecomment-2719620776 Closing in favor of SPARK-51449 (https://github.com/apache/spark/pull/50222)

[PR] [TEST-ONLY][4.0] Test pyarrow 4.0 [spark]

2025-03-12 Thread via GitHub
zhengruifeng opened a new pull request, #50262: URL: https://github.com/apache/spark/pull/50262 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### Ho

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-12 Thread via GitHub
pan3793 commented on PR #50232: URL: https://github.com/apache/spark/pull/50232#issuecomment-2719619876 @dongjoon-hyun I opened https://github.com/apache/spark/pull/50264 for 4.0 backport

Re: [PR] [SPARK-51479][SQL] Nullable in Row Level Operation Column is not correct [spark]

2025-03-12 Thread via GitHub
huaxingao commented on code in PR #50246: URL: https://github.com/apache/spark/pull/50246#discussion_r1992558449 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala: ## @@ -273,9 +273,8 @@ trait RewriteRowLevelCommand extends Rule[L

[PR] Bump golang.org/x/net from 0.34.0 to 0.36.0 [spark-connect-go]

2025-03-12 Thread via GitHub
dependabot[bot] opened a new pull request, #130: URL: https://github.com/apache/spark-connect-go/pull/130 Bumps [golang.org/x/net](https://github.com/golang/net) from 0.34.0 to 0.36.0. Commits https://github.com/golang/net/commit/85d1d54551b68719346cb9fec24b911da4e452a1

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-12 Thread via GitHub
tedyu commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1992516501 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -134,11 +138,8 @@ private[spark] class BarrierCoordinator( // Cancel the current active Tim

[PR] Spark 51272 51016 combined [spark]

2025-03-12 Thread via GitHub
ahshahid opened a new pull request, #50263: URL: https://github.com/apache/spark/pull/50263 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

Re: [PR] [SPARK-51494][BUILD] Upgrade to Apache parent pom 33 [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #50261: [SPARK-51494][BUILD] Upgrade to Apache parent pom 33 URL: https://github.com/apache/spark/pull/50261

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992486033 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinator.scala: ## @@ -168,9 +220,99 @@ private class StateStoreCoordinator(ove

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992483531 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinator.scala: ## @@ -66,9 +86,9 @@ object StateStoreCoordinatorRef extends Lo

Re: [PR] [WIP][SPARK-XXXX][Collation] Prevent Regex with collated strings [spark]

2025-03-12 Thread via GitHub
github-actions[bot] closed pull request #49020: [WIP][SPARK-XXXX][Collation] Prevent Regex with collated strings URL: https://github.com/apache/spark/pull/49020

Re: [PR] [SPARK-51495] Add `Integration Test` GitHub Action job with `4.0.0-preview2` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #15: URL: https://github.com/apache/spark-connect-swift/pull/15#issuecomment-2719427616 Could you review this when you have some time, @viirya ?

Re: [PR] [SPARK-51495] Add `Integration Test` GitHub Action job with `4.0.0-preview2` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #15: URL: https://github.com/apache/spark-connect-swift/pull/15#issuecomment-2719437716 Thank you, @viirya . Merged to main.

Re: [PR] [SPARK-51495] Add `Integration Test` GitHub Action job with `4.0.0-preview2` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on code in PR #15: URL: https://github.com/apache/spark-connect-swift/pull/15#discussion_r1992475005 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -81,6 +81,7 @@ struct DataFrameTests { await spark.stop() } +#if !os(Linux) Review Comm

Re: [PR] [SPARK-51495] Add `Integration Test` GitHub Action job with `4.0.0-preview2` [spark-connect-swift]

2025-03-12 Thread via GitHub
viirya commented on code in PR #15: URL: https://github.com/apache/spark-connect-swift/pull/15#discussion_r1992473785 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -81,6 +81,7 @@ struct DataFrameTests { await spark.stop() } +#if !os(Linux) Review Comment:

Re: [PR] [SPARK-43221][CORE][4.0] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #50259: [SPARK-43221][CORE][4.0] Host local block fetching should use a block status of a block stored on disk URL: https://github.com/apache/spark/pull/50259

Re: [PR] [SPARK-50424][INFRA] Extract the common content of `Dockerfile` from `Docs`, `Linter`, and `SparkR` test images [spark]

2025-03-12 Thread via GitHub
github-actions[bot] commented on PR #48967: URL: https://github.com/apache/spark/pull/48967#issuecomment-2719432174 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-XXXX][Avro] Fix avro deserialization breaking for UnionType[null, Record] [spark]

2025-03-12 Thread via GitHub
github-actions[bot] commented on PR #49019: URL: https://github.com/apache/spark/pull/49019#issuecomment-2719432155 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [WIP][SPARK-xxxx][Collation] Guard collation regex expressions behind a flag [spark]

2025-03-12 Thread via GitHub
github-actions[bot] commented on PR #49026: URL: https://github.com/apache/spark/pull/49026#issuecomment-2719432121 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47672][SQL] Avoid double eval from filter pushDown [spark]

2025-03-12 Thread via GitHub
github-actions[bot] commented on PR #45802: URL: https://github.com/apache/spark/pull/45802#issuecomment-2719432206 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992458658 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2236,6 +2236,19 @@ object SQLConf { .booleanConf .createWithDefault(t

Re: [PR] [SPARK-51495] Add `Integration Test` GitHub Action job [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on code in PR #15: URL: https://github.com/apache/spark-connect-swift/pull/15#discussion_r1992468989 ## Tests/SparkConnectTests/RuntimeConfTests.swift: ## @@ -31,7 +31,7 @@ struct RuntimeConfTests { _ = try await client.connect(UUID().uuidString)

Re: [PR] [SPARK-51495] Add `Integration Test` GitHub Action job [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on code in PR #15: URL: https://github.com/apache/spark-connect-swift/pull/15#discussion_r1992469308 ## Tests/SparkConnectTests/SparkSessionTests.swift: ## @@ -56,7 +56,6 @@ struct SparkSessionTests { @Test func conf() async throws { let spark

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-12 Thread via GitHub
aokolnychyi commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1992411229 ## docs/sql-ref-ansi-compliance.md: ## @@ -648,6 +648,7 @@ Below is a list of all the keywords in Spark SQL. |PRECEDING|non-reserved|non-reserved|non-reserved| |P

Re: [PR] [SPARK-51349][SQL][TESTS] Change precedence of null and "null" in sorting in QueryTest [spark]

2025-03-12 Thread via GitHub
harshmotw-db commented on code in PR #50108: URL: https://github.com/apache/spark/pull/50108#discussion_r1992384542 ## sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala: ## @@ -326,7 +326,13 @@ object QueryTest extends Assertions { // For binary arrays, we conver

[PR] [SPARK-43221][CORE][3.5] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
attilapiros opened a new pull request, #50260: URL: https://github.com/apache/spark/pull/50260 **This is a backport to branch-3.5 from master.** Thanks to @yorksity, who reported this error and even provided a PR for it. This solution is very different from https://github.com/apache/s

Re: [PR] [SPARK-51490] Support `iOS`, `watchOS`, and `tvOS` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #13: URL: https://github.com/apache/spark-connect-swift/pull/13#issuecomment-2718738366 Could you review this when you have some time, please, @huaxingao ?

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992404121 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinator.scala: ## @@ -168,9 +220,99 @@ private class StateStoreCoordinator(ove

[PR] [MINOR][DOC] Improve the documentation for configuration [spark]

2025-03-12 Thread via GitHub
0xbadidea opened a new pull request, #50247: URL: https://github.com/apache/spark/pull/50247 ### What changes were proposed in this pull request? rephrased documentation for configuration ### Why are the changes needed? Fixed grammar. ### Does t

Re: [PR] [SPARK-31561][SQL] Add QUALIFY clause [spark]

2025-03-12 Thread via GitHub
ebonnal commented on PR #39691: URL: https://github.com/apache/spark/pull/39691#issuecomment-2719284145 Any plans to finalize this work? The review doesn’t seem to highlight any major blockers, right? Was it abandoned following an internal discussion? cc @wangyum 🙏🏻

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992391021 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -65,6 +65,7 @@ case object StoreTaskCompletionListener extends RocksDB

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2719154602 Merged to main~ (I also used this new script to merge this PR.) ``` ... Would you like to update an associated JIRA? (y/N): y Enter a JIRA id [SPARK-51493]:

Re: [PR] [SPARK-51473][ML][CONNECT] ML transformed dataframe keep a reference to the model [spark]

2025-03-12 Thread via GitHub
zhengruifeng commented on code in PR #50199: URL: https://github.com/apache/spark/pull/50199#discussion_r1992343085 ## python/pyspark/ml/util.py: ## @@ -113,6 +113,11 @@ def invoke_remote_attribute_relation( methods, obj_ref = _extract_id_methods(instance._java_obj) me

Re: [PR] [SPARK-51473][ML][CONNECT] ML transformed dataframe keep a reference to the model [spark]

2025-03-12 Thread via GitHub
zhengruifeng commented on code in PR #50199: URL: https://github.com/apache/spark/pull/50199#discussion_r1992347220 ## python/pyspark/ml/classification.py: ## @@ -909,7 +912,10 @@ def evaluate(self, dataset: DataFrame) -> "LinearSVCSummary": if not isinstance(dataset,

Re: [PR] [SPARK-51473][ML][CONNECT] ML transformed dataframe keep a reference to the model [spark]

2025-03-12 Thread via GitHub
zhengruifeng commented on code in PR #50199: URL: https://github.com/apache/spark/pull/50199#discussion_r1992344748 ## python/pyspark/ml/classification.py: ## @@ -889,7 +889,10 @@ def summary(self) -> "LinearSVCTrainingSummary": # type: ignore[override] trained on the

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #14: [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version URL: https://github.com/apache/spark-connect-swift/pull/14

[PR] Run test [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun opened a new pull request, #15: URL: https://github.com/apache/spark-connect-swift/pull/15 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2719150536 Thank you so much, @attilapiros !

[PR] [SPARK-43221][CORE][4.0] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
attilapiros opened a new pull request, #50259: URL: https://github.com/apache/spark/pull/50259 **This is a backport to branch-4.0 from master.** Thanks to @yorksity, who reported this error and even provided a PR for it. This solution is very different from https://github.com/apache/s

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
attilapiros commented on PR #50122: URL: https://github.com/apache/spark/pull/50122#issuecomment-2719136015 Backport to branch-3.5: https://github.com/apache/spark/pull/50260

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
attilapiros commented on PR #50122: URL: https://github.com/apache/spark/pull/50122#issuecomment-2719130827 Backport to branch-4.0: https://github.com/apache/spark/pull/50259

Re: [PR] [SPARK-51473][ML][CONNECT] ML transformed dataframe keep a reference to the model [spark]

2025-03-12 Thread via GitHub
zhengruifeng commented on code in PR #50199: URL: https://github.com/apache/spark/pull/50199#discussion_r1992274916 ## python/pyspark/ml/util.py: ## @@ -185,29 +185,40 @@ def wrapped(self: "JavaWrapper", dataset: "ConnectDataFrame") -> Any: assert isinstance(

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2719045920 To test, you can try to merge this PR (SPARK-51493) with this script. https://github.com/user-attachments/assets/eed46cc9-cbea-44ed-909a-10d640e3a678

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2719073678 I switched the issue type from `Improvement` to `Test`. Feel free to switch back if it looks improper to you, @zhengruifeng . Code-wise, I'm +1 if this doesn't cause a

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-12 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1992241770 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -38,9 +38,20 @@ import org.apache.spark.sql.types.Str

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2719042611 Thank you, @attilapiros . This PR aims to filter when it chooses `Fixed Version`.

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
attilapiros commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2718979814 Let me do a quick test.

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
attilapiros commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2718970521 I see this connect-swift-x.y.z is already used in the tickets: https://github.com/user-attachments/assets/6f7088a6-4f08-4a2a-b44e-d7d30be79eff

Re: [PR] [SPARK-51484][SS] Remove unused function `private def newDFSFileName(String)` from `RocksDBFileManager` [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50249: URL: https://github.com/apache/spark/pull/50249#issuecomment-2718779093 Merged to master. Thank you, @LuciferYang and @beliefer . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] [SPARK-51473][ML][CONNECT] ML transformed dataframe keep a reference to the model [spark]

2025-03-12 Thread via GitHub
zhengruifeng commented on code in PR #50199: URL: https://github.com/apache/spark/pull/50199#discussion_r1992191679 ## python/pyspark/ml/util.py: ## @@ -185,29 +185,40 @@ def wrapped(self: "JavaWrapper", dataset: "ConnectDataFrame") -> Any: assert isinstance(

[PR] [SPARK-51491][PYTHON] Simplify boxplot with subquery APIs [spark]

2025-03-12 Thread via GitHub
zhengruifeng opened a new pull request, #50258: URL: https://github.com/apache/spark/pull/50258 ### What changes were proposed in this pull request? Simplify boxplot with subquery APIs ### Why are the changes needed? 1, to make the code simple; 2, according to my exper

Re: [PR] [SPARK-51481] Add `RuntimeConf` actor [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #9: URL: https://github.com/apache/spark-connect-swift/pull/9#issuecomment-2718880137 To @yaooqinn , I applied new `version` scheme for all. > By the way, can we unbind the version tracker of this project to the Spark main repo? It might be inconvenient for

Re: [PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z` version [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-connect-swift/pull/14#issuecomment-2718898426 Could you review this PR when you have some time, @attilapiros ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

Re: [PR] [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 [spark]

2025-03-12 Thread via GitHub
deniskuzZ commented on code in PR #50213: URL: https://github.com/apache/spark/pull/50213#discussion_r1991644845 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala: ## @@ -1210,7 +1210,7 @@ class HiveQuerySuite extends HiveComparisonTest with SQ

[PR] [SPARK-51492][SS]FileStreamSource: Avoid expensive file concatenation if not needed [spark]

2025-03-12 Thread via GitHub
siying opened a new pull request, #50257: URL: https://github.com/apache/spark/pull/50257 ### What changes were proposed in this pull request? In two places where constructing a log line can be very or a little bit expensive, avoid logTrace() call as a whole. ### Why are the ch

[PR] [SPARK-51493] Refine `merge_spark_pr.py` to use `connect-swift-x.y.z`… [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun opened a new pull request, #14: URL: https://github.com/apache/spark-connect-swift/pull/14 … version ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing cha

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50122: URL: https://github.com/apache/spark/pull/50122#issuecomment-2718824584 Also, cc @mridulm , too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50122: URL: https://github.com/apache/spark/pull/50122#issuecomment-2718823408 Thank you, @attilapiros and @Ngone51 . BTW, could you make two backporting PRs to branch-4.0 and branch-3.5 to make sure all CIs pass there, @attilapiros ? --

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-12 Thread via GitHub
LuciferYang commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2718534144 Will `dev/create-release/spark-rm/Dockerfile` be fixed in a separate pull request? -- This is an automated message from the Apache Git Service. To respond to the message

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #50122: [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk URL: https://github.com/apache/spark/pull/50122 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50232: URL: https://github.com/apache/spark/pull/50232#issuecomment-2718785462 Thank you, @pan3793 and @LuciferYang . Merged to master. Could you make a backporting PR to branch-4.0 to pass CI there once more, @pan3793 ? -- This is an automated messag

Re: [PR] [SPARK-51488][SQL] Support the TIME keyword as a data type [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #50250: [SPARK-51488][SQL] Support the TIME keyword as a data type URL: https://github.com/apache/spark/pull/50250 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #50232: [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation URL: https://github.com/apache/spark/pull/50232 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-51488][SQL] Support the TIME keyword as a data type [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #50250: URL: https://github.com/apache/spark/pull/50250#issuecomment-2718780657 Merged to master. Thank you, @MaxGekk . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-51484][SS] Remove unused function `private def newDFSFileName(String)` from `RocksDBFileManager` [spark]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #50249: [SPARK-51484][SS] Remove unused function `private def newDFSFileName(String)` from `RocksDBFileManager` URL: https://github.com/apache/spark/pull/50249 -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] [SPARK-51490] Support `iOS`, `watchOS`, and `tvOS` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun closed pull request #13: [SPARK-51490] Support `iOS`, `watchOS`, and `tvOS` URL: https://github.com/apache/spark-connect-swift/pull/13 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] [SPARK-51490] Support `iOS`, `watchOS`, and `tvOS` [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun commented on PR #13: URL: https://github.com/apache/spark-connect-swift/pull/13#issuecomment-2718747123 Thank you so much, @huaxingao ! Merged to main -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[PR] [SPARK-51490] Support iOS, watchOS, and tvOS [spark-connect-swift]

2025-03-12 Thread via GitHub
dongjoon-hyun opened a new pull request, #13: URL: https://github.com/apache/spark-connect-swift/pull/13 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?
