Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]
beliefer commented on code in PR #50020:
URL: https://github.com/apache/spark/pull/50020#discussion_r1973314920

## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala:

@@ -122,23 +124,40 @@ private[spark] class BarrierCoordinator(
     // Init a TimerTask for a barrier() call.
     private def initTimerTask(state: ContextBarrierState): Unit = {
       timerTask = new TimerTask {
-        override def run(): Unit = state.synchronized {
-          // Timeout current barrier() call, fail all the sync requests.
-          requesters.foreach(_.sendFailure(new SparkException("The coordinator didn't get all " +
-            s"barrier sync requests for barrier epoch $barrierEpoch from $barrierId within " +
-            s"$timeoutInSecs second(s).")))
-          cleanupBarrierStage(barrierId)
-        }
+        override def run(): Unit =
+          try {
+            state.synchronized {
+              if (!Thread.currentThread().isInterrupted()) {
+                // Timeout current barrier() call, fail all the sync requests.
+                requesters.foreach(
+                  _.sendFailure(new SparkException("The coordinator didn't get all " +
+                    s"barrier sync requests for barrier epoch" +
+                    s" $barrierEpoch from $barrierId within " +
+                    s"$timeoutInSecs second(s).")))
+                cleanupBarrierStage(barrierId)
+              }
+            }
+          } catch {
+            case _: InterruptedException =>
+              // Handle interruption gracefully
+              Thread.currentThread().interrupt()
+            case e: Exception => new SparkException("Error during " +
+              s"running of barrier tasks for " +
+              s"$barrierId", e)
+          } finally {
+            // Ensure cleanup happens even if interrupted or exception occurs
+            cleanupBarrierStage(barrierId)
+            state.clear()
+          }
       }
     }

-    // Cancel the current active TimerTask and release resources.
+    /* Cancel the tasks scheduled to run inside the ScheduledExecutor Threadpool
+     * The original implementation was clearing java.util.Timer and java.util.TimerTasks
+     * This became a no-op when java.util.Timer was replaced with ScheduledThreadPoolExecutor
+     */
     private def cancelTimerTask(): Unit = {
-      if (timerTask != null) {
-        timerTask.cancel()

Review Comment:
timerTask.cancel() doesn't work?
[PR] [SPARK-51337][SQL] Add maxRows to CTERelationRef [spark]
vladimirg-db opened a new pull request, #50104:
URL: https://github.com/apache/spark/pull/50104

### What changes were proposed in this pull request?
Add a `maxRows` field to `CTERelationRef`.

### Why are the changes needed?
The Analyzer validates scalar subqueries by checking whether they output just one row: https://github.com/vladimirg-db/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ValidateSubqueryExpression.scala#L144. This works in the fixed-point Analyzer, because CTEs are inlined before `CheckAnalysis`. However, in the single-pass Analyzer, we validate the subquery right after it is resolved, so `CTERelationRef` must report the correct `maxRows` as well.

### Does this PR introduce _any_ user-facing change?
There should be no changes to existing Catalyst behavior.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.
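For illustration only (this query is not from the PR; it only shows the kind of plan the check cares about): the scalar subquery below is provably single-row only if the row-count bound from the CTE body survives through the `CTERelationRef`, which is what propagating `maxRows` enables before inlining.

```scala
// Scalar subquery whose single-row guarantee comes from a LIMIT inside the CTE.
// With fixed-point analysis, the CTE is inlined before CheckAnalysis, so the
// Limit node supplies maxRows = 1; before inlining, the CTERelationRef itself
// has to report that bound for the same validation to succeed.
val df = spark.sql(
  """WITH t AS (SELECT id FROM range(10) LIMIT 1)
    |SELECT (SELECT id FROM t) AS single_value
    |""".stripMargin)
df.show()
```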
Re: [PR] [SPARK-51332][SQL] DS V2 supports push down BIT_AND, BIT_OR, BIT_XOR, BIT_COUNT and BIT_GET [spark]
beliefer commented on PR #50097: URL: https://github.com/apache/spark/pull/50097#issuecomment-2687554240 ping @cloud-fan
Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]
beliefer commented on code in PR #50020:
URL: https://github.com/apache/spark/pull/50020#discussion_r1973321076

## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala:

@@ -80,8 +81,9 @@ private[spark] class BarrierCoordinator(
       states.forEachValue(1, clearStateConsumer)
       states.clear()
       listenerBus.removeListener(listener)
-      ThreadUtils.shutdown(timer)
     } finally {
+      timerFuture.foreach(_.cancel(true))
+      ThreadUtils.shutdown(timer)

Review Comment:
I checked other code; `ThreadUtils.shutdown(...)` is not placed into a finally block. Could you give me some explanation? Thank you.
[PR] [WIP][SQL] Add `TimeType` [spark]
MaxGekk opened a new pull request, #50103:
URL: https://github.com/apache/spark/pull/50103

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
By running new tests:
```
$ build/sbt "test:testOnly *DataTypeSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.
Re: [PR] [SPARK-50849][Connect] Add example project to demonstrate Spark Connect Server Libraries [spark]
vicennial commented on PR #49604:
URL: https://github.com/apache/spark/pull/49604#issuecomment-2687751626

Thanks for the comment @LuciferYang

> I have some doubts about merging this part of the code into the main codebase, as it seems more like an independent project to me.

It is independent, but this is due to the independence property of the Spark Connect Client (which is, by design, a thin client with lightweight dependencies). Since this serves to demonstrate extending Spark Connect and using the extension via a Spark Connect Client, I chose to mimic how a practical example would look, and thus there is no direct connection to the parent Spark tree.

> Additionally, I haven't found any process for building and testing it in any of Spark CI (if there is one, please point it out to me). What is the recognized way to ensure its integrity and health after we make changes to it?

You are right, there is no automated process in the Spark CI at the moment. In an earlier [comment](https://issues.apache.org/jira/browse/SPARK-51338), I requested to separate out the addition of the automated build step so that the somewhat intricate complexity of modifying the CI can be tackled in a fresh PR. I've opened up this ticket to track the follow-up: https://issues.apache.org/jira/browse/SPARK-51338. I am actively looking into the process to accomplish this step, albeit a bit slowly, as I haven't modified the CI in the past.

> Additionally, in what form will this be released to users?

This largely serves as a reference skeleton project structure (accompanied by the documentation in https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc1-docs/_site/app-dev-spark-connect.html). There is no intention at the moment to have users depend on this project; rather, they can use it as a base for their own separate/independent development.
[PR] [SPARK-51340][ML][CONNECT] Model size estimation for linear classification & regression models [spark]
zhengruifeng opened a new pull request, #50106:
URL: https://github.com/apache/spark/pull/50106

### What changes were proposed in this pull request?
Model size estimation for linear classification & regression models

### Why are the changes needed?
Pre-training model size estimation is required to control the model cache at the driver.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added tests

### Was this patch authored or co-authored using generative AI tooling?
No
Re: [PR] [SPARK-51325] Check in source code for `smallJar.jar` [spark]
vicennial closed pull request #50092: [SPARK-51325] Check in source code for `smallJar.jar` URL: https://github.com/apache/spark/pull/50092
Re: [PR] [SPARK-51325] Check in source code for `smallJar.jar` [spark]
vicennial commented on PR #50092: URL: https://github.com/apache/spark/pull/50092#issuecomment-2687688931 Understood. Closing this PR and holding the code changes until we have a direction based on the dev list discussion
Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]
jayadeep-jayaraman commented on code in PR #50020:
URL: https://github.com/apache/spark/pull/50020#discussion_r1973338339

## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala:

@@ -122,23 +124,40 @@ private[spark] class BarrierCoordinator(
     // Init a TimerTask for a barrier() call.
     private def initTimerTask(state: ContextBarrierState): Unit = {
       timerTask = new TimerTask {
-        override def run(): Unit = state.synchronized {
-          // Timeout current barrier() call, fail all the sync requests.
-          requesters.foreach(_.sendFailure(new SparkException("The coordinator didn't get all " +
-            s"barrier sync requests for barrier epoch $barrierEpoch from $barrierId within " +
-            s"$timeoutInSecs second(s).")))
-          cleanupBarrierStage(barrierId)
-        }
+        override def run(): Unit =
+          try {
+            state.synchronized {
+              if (!Thread.currentThread().isInterrupted()) {
+                // Timeout current barrier() call, fail all the sync requests.
+                requesters.foreach(
+                  _.sendFailure(new SparkException("The coordinator didn't get all " +
+                    s"barrier sync requests for barrier epoch" +
+                    s" $barrierEpoch from $barrierId within " +
+                    s"$timeoutInSecs second(s).")))
+                cleanupBarrierStage(barrierId)
+              }
+            }
+          } catch {
+            case _: InterruptedException =>
+              // Handle interruption gracefully
+              Thread.currentThread().interrupt()
+            case e: Exception => new SparkException("Error during " +
+              s"running of barrier tasks for " +
+              s"$barrierId", e)
+          } finally {
+            // Ensure cleanup happens even if interrupted or exception occurs
+            cleanupBarrierStage(barrierId)
+            state.clear()
+          }
       }
     }

-    // Cancel the current active TimerTask and release resources.
+    /* Cancel the tasks scheduled to run inside the ScheduledExecutor Threadpool
+     * The original implementation was clearing java.util.Timer and java.util.TimerTasks
+     * This became a no-op when java.util.Timer was replaced with ScheduledThreadPoolExecutor
+     */
     private def cancelTimerTask(): Unit = {
-      if (timerTask != null) {
-        timerTask.cancel()

Review Comment:
It does not work. The explanation is available [here](https://github.com/apache/spark/pull/47956#issuecomment-2328035086). Basically, the issue is that we moved from the `TimerTask` API to `ScheduledThreadPoolExecutor`, so the cancel method is no longer directly available: `ScheduledThreadPoolExecutor` returns a `ScheduledFuture`, and we need to use the [cancel](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/Future.html) method of `Future`.
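A minimal sketch of the difference being described (the scheduling parameters and task body are illustrative, not the actual BarrierCoordinator code):

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}

object CancelSketch {
  def main(args: Array[String]): Unit = {
    val scheduler = Executors.newScheduledThreadPool(1)

    // With ScheduledThreadPoolExecutor, a scheduled Runnable (even one that is a
    // java.util.TimerTask) is tracked through the ScheduledFuture returned by
    // schedule(); the executor never consults TimerTask's own cancellation flag,
    // so calling TimerTask.cancel() does not stop the scheduled run.
    val task: Runnable = () => println("timeout fired")
    val future: ScheduledFuture[_] = scheduler.schedule(task, 10, TimeUnit.SECONDS)

    // Cancellation has to go through the Future instead; passing `true` also
    // interrupts the task if it is already running.
    future.cancel(true)

    scheduler.shutdown()
  }
}
```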
Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]
jayadeep-jayaraman commented on code in PR #50020:
URL: https://github.com/apache/spark/pull/50020#discussion_r1973345489

## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala:

@@ -80,8 +81,9 @@ private[spark] class BarrierCoordinator(
       states.forEachValue(1, clearStateConsumer)
       states.clear()
       listenerBus.removeListener(listener)
-      ThreadUtils.shutdown(timer)
     } finally {
+      timerFuture.foreach(_.cancel(true))
+      ThreadUtils.shutdown(timer)

Review Comment:
I moved it to `finally` to ensure that even if there are any failures in the previous cleanup steps, we still shut down the timer. But I can move it to the `try` block if it is a concern.
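A stripped-down sketch of the ordering argument (the step names are placeholders, not the real BarrierCoordinator members):

```scala
// If any earlier cleanup step throws, the finally block still runs, so the timer
// pool is shut down and its non-daemon thread cannot keep the JVM alive. Moving
// the shutdown into the try block would lose that guarantee.
def stopSketch(cleanupSteps: Seq[() => Unit], shutdownTimer: () => Unit): Unit = {
  try {
    cleanupSteps.foreach(_.apply())
  } finally {
    shutdownTimer()
  }
}
```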
[PR] Add java/scala version in response [spark]
garlandz-db opened a new pull request, #50102:
URL: https://github.com/apache/spark/pull/50102

### What changes were proposed in this pull request?
* Piggyback off the Spark version response and include other env properties like the Java/Scala version.

### Why are the changes needed?
* The client is not able to know which Java/Scala version the server uses, which can be problematic with UDFs when the client has a different build environment than the server.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
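One plausible way to collect these properties on the server side (a sketch of the idea only; `ServerEnvInfo` and where such fields would be attached in the Connect response are assumptions, not part of this PR):

```scala
import scala.util.Properties

// Build-environment details a server could report alongside its Spark version,
// so a client can detect UDF-incompatible build environments up front.
case class ServerEnvInfo(sparkVersion: String, javaVersion: String, scalaVersion: String)

def currentEnvInfo(sparkVersion: String): ServerEnvInfo =
  ServerEnvInfo(
    sparkVersion = sparkVersion,
    javaVersion = System.getProperty("java.version"),  // e.g. "17.0.12"
    scalaVersion = Properties.versionNumberString)     // e.g. "2.13.14"
```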
[PR] [SPARK-51339] Remove `IllegalImportsChecker` for `scala.collection.Seq/IndexedSeq` from `scalastyle-config.xml` [spark]
LuciferYang opened a new pull request, #50105:
URL: https://github.com/apache/spark/pull/50105

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]
cloud-fan commented on code in PR #50040:
URL: https://github.com/apache/spark/pull/50040#discussion_r1973575824

## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala:

@@ -841,20 +841,24 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSparkSession with Befo
   }

   test("SPARK-51281: create/replace file source tables") {
+    def checkResults(df: DataFrame): Unit = {
+      checkAnswer(df, spark.range(10).toDF())
+    }

Review Comment:
Let's not over-engineer. `spark.range(10)` is already very short.
Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]
cloud-fan commented on PR #50040: URL: https://github.com/apache/spark/pull/50040#issuecomment-2687956522 thanks for the review, merging to master/4.0/3.5!
Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]
cloud-fan closed pull request #50040: [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option URL: https://github.com/apache/spark/pull/50040
Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]
beliefer commented on code in PR #50020:
URL: https://github.com/apache/spark/pull/50020#discussion_r1973575314

## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala:

@@ -122,23 +124,40 @@ private[spark] class BarrierCoordinator(
     // Init a TimerTask for a barrier() call.
     private def initTimerTask(state: ContextBarrierState): Unit = {
       timerTask = new TimerTask {
-        override def run(): Unit = state.synchronized {
-          // Timeout current barrier() call, fail all the sync requests.
-          requesters.foreach(_.sendFailure(new SparkException("The coordinator didn't get all " +
-            s"barrier sync requests for barrier epoch $barrierEpoch from $barrierId within " +
-            s"$timeoutInSecs second(s).")))
-          cleanupBarrierStage(barrierId)
-        }
+        override def run(): Unit =
+          try {
+            state.synchronized {
+              if (!Thread.currentThread().isInterrupted()) {
+                // Timeout current barrier() call, fail all the sync requests.
+                requesters.foreach(
+                  _.sendFailure(new SparkException("The coordinator didn't get all " +
+                    s"barrier sync requests for barrier epoch" +
+                    s" $barrierEpoch from $barrierId within " +
+                    s"$timeoutInSecs second(s).")))
+                cleanupBarrierStage(barrierId)
+              }
+            }
+          } catch {
+            case _: InterruptedException =>
+              // Handle interruption gracefully
+              Thread.currentThread().interrupt()
+            case e: Exception => new SparkException("Error during " +
+              s"running of barrier tasks for " +
+              s"$barrierId", e)
+          } finally {
+            // Ensure cleanup happens even if interrupted or exception occurs
+            cleanupBarrierStage(barrierId)
+            state.clear()
+          }
       }
     }

-    // Cancel the current active TimerTask and release resources.
+    /* Cancel the tasks scheduled to run inside the ScheduledExecutor Threadpool
+     * The original implementation was clearing java.util.Timer and java.util.TimerTasks
+     * This became a no-op when java.util.Timer was replaced with ScheduledThreadPoolExecutor
+     */
     private def cancelTimerTask(): Unit = {
-      if (timerTask != null) {
-        timerTask.cancel()

Review Comment:
Got it. Thank you!
Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]
beliefer commented on code in PR #50020:
URL: https://github.com/apache/spark/pull/50020#discussion_r1973607118

## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala:

@@ -122,23 +124,40 @@ private[spark] class BarrierCoordinator(
     // Init a TimerTask for a barrier() call.
     private def initTimerTask(state: ContextBarrierState): Unit = {
       timerTask = new TimerTask {
-        override def run(): Unit = state.synchronized {
-          // Timeout current barrier() call, fail all the sync requests.
-          requesters.foreach(_.sendFailure(new SparkException("The coordinator didn't get all " +
-            s"barrier sync requests for barrier epoch $barrierEpoch from $barrierId within " +
-            s"$timeoutInSecs second(s).")))
-          cleanupBarrierStage(barrierId)
-        }
+        override def run(): Unit =
+          try {
+            state.synchronized {
+              if (!Thread.currentThread().isInterrupted()) {
+                // Timeout current barrier() call, fail all the sync requests.
+                requesters.foreach(
+                  _.sendFailure(new SparkException("The coordinator didn't get all " +
+                    s"barrier sync requests for barrier epoch" +
+                    s" $barrierEpoch from $barrierId within " +
+                    s"$timeoutInSecs second(s).")))
+                cleanupBarrierStage(barrierId)
+              }
+            }
+          } catch {
+            case _: InterruptedException =>
+              // Handle interruption gracefully
+              Thread.currentThread().interrupt()
+            case e: Exception => new SparkException("Error during " +
+              s"running of barrier tasks for " +
+              s"$barrierId", e)
+          } finally {
+            // Ensure cleanup happens even if interrupted or exception occurs
+            cleanupBarrierStage(barrierId)
+            state.clear()
+          }

Review Comment:
Why do we add this block?
[PR] [SPARK-51341][SQL] Cancel time task with suitable way. [spark]
beliefer opened a new pull request, #50107:
URL: https://github.com/apache/spark/pull/50107

### What changes were proposed in this pull request?
This PR proposes to cancel the timer task in a suitable way.

### Why are the changes needed?
According to the discussion at https://github.com/apache/spark/pull/47956#issuecomment-2328035086, TimerTask.cancel() doesn't work.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
'No'.
Re: [PR] [SPARK-49756][SQL][FOLLOWUP] Use correct pgsql datetime fields when pushing down EXTRACT [spark]
cloud-fan commented on PR #50101: URL: https://github.com/apache/spark/pull/50101#issuecomment-2688008852 and also cc @yaooqinn
Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]
cloud-fan commented on PR #49928: URL: https://github.com/apache/spark/pull/49928#issuecomment-2688013194 The change LGTM, can we add a test in `DataFrameReaderWriterSuite`?
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
cloud-fan commented on PR #49726: URL: https://github.com/apache/spark/pull/49726#issuecomment-2688007524 Can we implement the SIGNAL statement as a `SELECT raise_error(...)`? Then it's consistent that every scripting statement generates a DataFrame.
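For reference, a sketch of the suggested lowering using the existing `raise_error` SQL function (how SIGNAL's condition name and message text would be mapped onto the argument is left open here; the message below is made up):

```scala
// raise_error produces a DataFrame whose evaluation throws, so a SIGNAL
// statement lowered to this shape would still "generate a DataFrame" like
// every other scripting statement.
spark.sql("SELECT raise_error('SIGNAL: custom condition raised') AS signal").collect()
// throws a runtime exception carrying the message above when evaluated
```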
Re: [PR] [SPARK-51299][SQL][UI] MetricUtils.stringValue should filter metric values with initValue rather than a hardcoded value [spark]
cloud-fan commented on PR #50055: URL: https://github.com/apache/spark/pull/50055#issuecomment-2688019216 @jiwen624 I think you are right, do you have a concrete metric that has the problem? For metrics that we only need a sum, `0` is OK as the initial value and we don't filter it out.
Re: [PR] [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display [spark]
cloud-fan commented on code in PR #50074:
URL: https://github.com/apache/spark/pull/50074#discussion_r1973628519

## sql/core/src/test/resources/sql-tests/results/describe.sql.out:

@@ -890,6 +890,48 @@ a string CONCAT('a\n b\n ', 'c\n d')
 b    int    42

+-- !query
+CREATE TABLE f PARTITIONED BY (B, C) AS SELECT 'APACHE' A, CAST('SPARK' AS BINARY) B, TIMESTAMP'2018-11-17 13:33:33' C
+-- !query schema
+struct<>
+-- !query output
+
+
+-- !query
+DESC FORMATTED f PARTITION (B='SPARK', C=TIMESTAMP'2018-11-17 13:33:33')
+-- !query schema
+struct
+-- !query output
+A    string
+B    binary
+C    timestamp
+# Partition Information
+# col_name    data_type    comment
+B    binary
+C    timestamp
+
+# Detailed Partition Information
+Database    default
+Table    f
+Partition Values    [B=SPARK, C=2018-11-17 13:33:33]
+Location [not included in comparison]/{warehouse_dir}/f/B=SPARK/C=2018-11-17 13%3A33%3A33
+Partition Parameters    [numFiles=1, totalSize=486, transient_lastDdlTime=[not included in comparison]]

Review Comment:
we have `transient_lastDdlTime` because it's a hive table?
Re: [PR] [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display [spark]
cloud-fan commented on code in PR #50074:
URL: https://github.com/apache/spark/pull/50074#discussion_r1973633764

## sql/core/src/test/resources/sql-tests/results/describe.sql.out:

@@ -890,6 +890,48 @@ a string CONCAT('a\n b\n ', 'c\n d')
 b    int    42

+-- !query
+CREATE TABLE f PARTITIONED BY (B, C) AS SELECT 'APACHE' A, CAST('SPARK' AS BINARY) B, TIMESTAMP'2018-11-17 13:33:33' C
+-- !query schema
+struct<>
+-- !query output
+
+
+-- !query
+DESC FORMATTED f PARTITION (B='SPARK', C=TIMESTAMP'2018-11-17 13:33:33')
+-- !query schema
+struct
+-- !query output
+A    string
+B    binary
+C    timestamp
+# Partition Information
+# col_name    data_type    comment
+B    binary
+C    timestamp
+
+# Detailed Partition Information
+Database    default
+Table    f
+Partition Values    [B=SPARK, C=2018-11-17 13:33:33]
+Location [not included in comparison]/{warehouse_dir}/f/B=SPARK/C=2018-11-17 13%3A33%3A33
+Partition Parameters    [numFiles=1, totalSize=486, transient_lastDdlTime=[not included in comparison]]

Review Comment:
Maybe we can test with `... AS JSON` to avoid seeing these unrelated fields.
Re: [PR] [SPARK-51342][SQL] Add `TimeType` [spark]
the-sakthi commented on code in PR #50103:
URL: https://github.com/apache/spark/pull/50103#discussion_r1974366151

## common/utils/src/main/resources/error/error-conditions.json:

@@ -6190,6 +6190,12 @@
     ],
     "sqlState" : "42000"
   },
+  "UNSUPPORTED_TIME_PRECISION" : {

Review Comment:
Nit: UNSUPPORTED_TIME_SECONDS_PRECISION? Though the precision part comes only from the seconds in the time, so I guess it's fine either way?
Re: [PR] [SPARK-51326][CONNECT][4.0] Remove LazyExpression proto message [spark]
ueshin commented on PR #50094: URL: https://github.com/apache/spark/pull/50094#issuecomment-263482 Thanks! merging to branch-4.0.
Re: [PR] [SPARK-51342][SQL] Add `TimeType` [spark]
the-sakthi commented on PR #50103: URL: https://github.com/apache/spark/pull/50103#issuecomment-2689165941 LGTM
Re: [PR] [SPARK-51339][BUILD] Remove `IllegalImportsChecker` for `s.c.Seq/IndexedSeq` from `scalastyle-config.xml` [spark]
the-sakthi commented on PR #50105: URL: https://github.com/apache/spark/pull/50105#issuecomment-2689144092 LGTM
Re: [PR] [SPARK-51344] Fix `ENV` key value format in `*.template` [spark-docker]
dongjoon-hyun commented on PR #82: URL: https://github.com/apache/spark-docker/pull/82#issuecomment-2689182775 Thank you, @viirya !
Re: [PR] [SPARK-51344] Fix `ENV` key value format in `*.template` [spark-docker]
dongjoon-hyun closed pull request #82: [SPARK-51344] Fix `ENV` key value format in `*.template` URL: https://github.com/apache/spark-docker/pull/82
Re: [PR] [SPARK-51270][SQL] Support UUID type in Variant [spark]
gene-db commented on code in PR #50025:
URL: https://github.com/apache/spark/pull/50025#discussion_r1974322433

## common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java:

@@ -240,6 +242,18 @@ public void appendBinary(byte[] binary) {
     writePos += binary.length;
   }

+  public void appendUuid(UUID uuid) {

Review Comment:
Is there a way to test this `appendUuid` api?
Re: [PR] [SPARK-51301][BUILD] Bump zstd-jni 1.5.7-1 [spark]
luben commented on code in PR #50057:
URL: https://github.com/apache/spark/pull/50057#discussion_r1974306557

## core/benchmarks/ZStandardBenchmark-jdk21-results.txt:

@@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec
-OpenJDK 64-Bit Server VM 21.0.6+7-LTS on Linux 6.8.0-1020-azure
+OpenJDK 64-Bit Server VM 21.0.6+7-LTS on Linux 6.8.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 Benchmark ZStandardCompressionCodec:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 ----------------------------------------------------------------------------------------------------------------------------------------
-Compression 1 times at level 1 without buffer pool                  656             668          12         0.0       65591.6       1.0X
-Compression 1 times at level 2 without buffer pool                  709             711           2         0.0       70934.6       0.9X
-Compression 1 times at level 3 without buffer pool                  814             818           5         0.0       81370.9       0.8X
-Compression 1 times at level 1 with buffer pool                     601             603           2         0.0       60100.1       1.1X
-Compression 1 times at level 2 with buffer pool                     634             636           2         0.0       63449.9       1.0X
-Compression 1 times at level 3 with buffer pool                     748             753           5         0.0       74789.7       0.9X
+Compression 1 times at level 1 without buffer pool                  650             666          11         0.0       65044.6       1.0X
+Compression 1 times at level 2 without buffer pool                  701             703           2         0.0       70071.2       0.9X
+Compression 1 times at level 3 without buffer pool                  804             807           2         0.0       80430.6       0.8X
+Compression 1 times at level 1 with buffer pool                     598             599           1         0.0       59810.5       1.1X
+Compression 1 times at level 2 with buffer pool                     631             640          17         0.0       63096.7       1.0X
+Compression 1 times at level 3 with buffer pool                     751             753           4         0.0       75051.1       0.9X

-OpenJDK 64-Bit Server VM 21.0.6+7-LTS on Linux 6.8.0-1020-azure
+OpenJDK 64-Bit Server VM 21.0.6+7-LTS on Linux 6.8.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 Benchmark ZStandardCompressionCodec:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 ----------------------------------------------------------------------------------------------------------------------------------------
-Decompression 1 times from level 1 without buffer pool              817             818           1         0.0       81723.2       1.0X
-Decompression 1 times from level 2 without buffer pool              817             818           1         0.0       81729.4       1.0X
-Decompression 1 times from level 3 without buffer pool              817             818           1         0.0       81719.7       1.0X
-Decompression 1 times from level 1 with buffer pool                 749             757          14         0.0       74864.9       1.1X
-Decompression 1 times from level 2 with buffer pool                 748             749           1         0.0       74789.1       1.1X
-Decompression 1 times from level 3 with buffer pool                 748             749           1         0.0       74811.6       1.1X
+Decompression 1 times from level 1 without buffer pool              823             831          14         0.0       82254.2       1.0X
+Decompression 1 times from level 2 without buffer pool              824             827           5         0.0       82359.1       1.0X
+Decompression 1 times from level 3 without buffer pool              819             821           2         0.0       81925.1       1.0X
+Decompression 1 times from level 1 with buffer pool                 753             755           3         0.0       75284.1       1.1X
+Decompression 1 times from level 2 with buffer pool                 753             755           1         0.0       75344.7       1.1X
+Decompression 1 times from level 3 with buffer pool                 753             754           1         0.0       75263.0       1.1X

-OpenJDK 64-Bit Server VM 21.0.6+7-LTS on Linux 6.8.0-1020-azure
+OpenJDK 64-Bit Server VM 21.0.6+7-LTS on Linux 6.8.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 Parallel Compression at level 3:          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
-Parallel Compression w
Re: [PR] [SPARK-51326][CONNECT][4.0] Remove LazyExpression proto message [spark]
ueshin commented on PR #50094: URL: https://github.com/apache/spark/pull/50094#issuecomment-262567 The remaining test failures are not related to this PR.
Re: [PR] [SPARK-51278][FOLLOWUP][DOCS] Update JSON format from documentation [spark]
the-sakthi commented on PR #50100: URL: https://github.com/apache/spark/pull/50100#issuecomment-2689210402 LGTM
Re: [PR] [SPARK-51270][SQL] Support UUID type in Variant [spark]
cashmand commented on code in PR #50025:
URL: https://github.com/apache/spark/pull/50025#discussion_r1974435292

## common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java:

@@ -240,6 +242,18 @@ public void appendBinary(byte[] binary) {
     writePos += binary.length;
   }

+  public void appendUuid(UUID uuid) {

Review Comment:
Good catch, I hadn't noticed that this function wasn't tested. I modified one of the tests I added to use VariantBuilder to construct its UUID, so that it uses this function. This exposed a bug - when I switched to using `ByteBuffer.wrap`, I didn't update the indices that I wrote the UUID to, so it was always writing them at index 0 of the full value array.
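Roughly the shape such a check could take (the builder constructor flag and the read-back step are assumptions based on how the existing variant tests use `VariantBuilder`; the exact assertion depends on the accessors this PR adds):

```scala
import java.util.UUID
import org.apache.spark.types.variant.VariantBuilder

// Exercise the new API from the diff by appending a fixed UUID.
val builder = new VariantBuilder(false)        // assumed allowDuplicateKeys flag
val uuid = UUID.fromString("00112233-4455-6677-8899-aabbccddeeff")
builder.appendUuid(uuid)
val variant = builder.result()                 // assumed result() accessor
// assert: decoding `variant` yields `uuid` back
```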
Re: [PR] [SPARK-50855][SS][CONNECT] Spark Connect Support for TransformWithState In Scala [spark]
jingz-db commented on code in PR #49488:
URL: https://github.com/apache/spark/pull/49488#discussion_r1974452111

## sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/streaming/TransformWithStateConnectSuite.scala:

Review Comment:
> I've realized that you just ported the tests from python to Scala

Exactly. I am porting the Python tests as connect Scala tests because the Python tests are essentially end-to-end tests as well.

> without leveraging better test framework. Would we just make this test suite be concise as much as possible rather than blind port?

Unfortunately `StreamTest` and `MemoryStream` are not visible from the connect/client/jvm directory we are currently using for connect-related suites. You may find all our other streaming tests are also end-to-end tests and did not make use of the StreamTest framework as well: https://github.com/apache/spark/blob/master/sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/streaming/FlatMapGroupsWithStateStreamingSuite.scala#L36
Re: [PR] [SPARK-50855][SS][CONNECT] Spark Connect Support for TransformWithState In Scala [spark]
jingz-db commented on code in PR #49488:
URL: https://github.com/apache/spark/pull/49488#discussion_r1974458075

## sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/streaming/TransformWithStateConnectSuite.scala:

@@ -0,0 +1,489 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect.streaming
+
+import java.io.{BufferedWriter, FileWriter}
+import java.nio.file.{Files, Paths}
+import java.sql.Timestamp
+
+import org.scalatest.concurrent.Eventually.eventually
+import org.scalatest.concurrent.Futures.timeout
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.{DataFrame, Dataset, Encoders, Row}
+import org.apache.spark.sql.connect.SparkSession
+import org.apache.spark.sql.connect.test.{QueryTest, RemoteSparkSession}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.streaming.{ListState, MapState, OutputMode, StatefulProcessor, StatefulProcessorWithInitialState, TimeMode, TimerValues, TTLConfig, ValueState}
+import org.apache.spark.sql.types._
+
+case class InputRowForConnectTest(key: String, value: String)
+case class OutputRowForConnectTest(key: String, value: String)
+case class StateRowForConnectTest(count: Long)
+
+// A basic stateful processor which will return the occurrences of key
+class BasicCountStatefulProcessor
+    extends StatefulProcessor[String, InputRowForConnectTest, OutputRowForConnectTest]
+    with Logging {
+  @transient protected var _countState: ValueState[StateRowForConnectTest] = _
+
+  override def init(outputMode: OutputMode, timeMode: TimeMode): Unit = {
+    _countState = getHandle.getValueState[StateRowForConnectTest](
+      "countState",
+      Encoders.product[StateRowForConnectTest],
+      TTLConfig.NONE)
+  }
+
+  override def handleInputRows(
+      key: String,
+      inputRows: Iterator[InputRowForConnectTest],
+      timerValues: TimerValues): Iterator[OutputRowForConnectTest] = {
+    val count =
+      _countState.getOption().getOrElse(StateRowForConnectTest(0L)).count + inputRows.toSeq.length
+    _countState.update(StateRowForConnectTest(count))
+    Iterator(OutputRowForConnectTest(key, count.toString))
+  }
+}
+
+// A stateful processor with initial state which will return the occurrences of key
+class TestInitialStatefulProcessor
+    extends StatefulProcessorWithInitialState[
+      String,
+      (String, String),
+      (String, String),
+      (String, String, String)]
+    with Logging {
+  @transient protected var _countState: ValueState[Long] = _
+
+  override def init(outputMode: OutputMode, timeMode: TimeMode): Unit = {
+    _countState = getHandle.getValueState[Long]("countState", Encoders.scalaLong, TTLConfig.NONE)
+  }
+
+  override def handleInputRows(
+      key: String,
+      inputRows: Iterator[(String, String)],
+      timerValues: TimerValues): Iterator[(String, String)] = {
+    val count = _countState.getOption().getOrElse(0L) + inputRows.toSeq.length
+    _countState.update(count)
+    Iterator((key, count.toString))
+  }
+
+  override def handleInitialState(
+      key: String,
+      initialState: (String, String, String),
+      timerValues: TimerValues): Unit = {
+    val count = _countState.getOption().getOrElse(0L) + 1
+    _countState.update(count)
+  }
+}
+
+case class OutputEventTimeRow(key: String, outputTimestamp: Timestamp)
+
+// A stateful processor which will return timestamp of the first item from input rows
+class ChainingOfOpsStatefulProcessor
+    extends StatefulProcessor[String, (String, Timestamp), OutputEventTimeRow] {
+  override def init(outputMode: OutputMode, timeMode: TimeMode): Unit = {}
+
+  override def handleInputRows(
+      key: String,
+      inputRows: Iterator[(String, Timestamp)],
+      timerValues: TimerValues): Iterator[OutputEventTimeRow] = {
+    val timestamp = inputRows.next()._2
+    Iterator(OutputEventTimeRow(key, timestamp))
+  }
+}
+
+// A basic stateful processor contains composite state variables and TTL
+class TTLTestStatefulProcessor
+    extends StatefulProcessor[String, (String, String), (String, String)] {
+  import java.time.Duration
+
+  @transient protected var countSt
Re: [PR] [SPARK-50855][SS][CONNECT] Spark Connect Support for TransformWithState In Scala [spark]
jingz-db commented on code in PR #49488: URL: https://github.com/apache/spark/pull/49488#discussion_r1974452111 ## sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/streaming/TransformWithStateConnectSuite.scala: ## Review Comment: > I've realized that you just ported the tests from python to Scala Exactly. I am porting the Python tests as connect Scala tests because the Python tests are essentially end-to-end tests as well. > without leveraging a better test framework. Would we just make this test suite as concise as possible rather than a blind port? Unfortunately `StreamTest` and `MemoryStream` are not visible from the connect/client/jvm directory we are currently using for connect-related suites. You may find our other streaming tests are also end-to-end tests and do not make use of the StreamTest framework either: https://github.com/apache/spark/blob/master/sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/streaming/FlatMapGroupsWithStateStreamingSuite.scala#L36 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51337][SQL] Add maxRows to CTERelationDef and CTERelationRef [spark]
cloud-fan closed pull request #50104: [SPARK-51337][SQL] Add maxRows to CTERelationDef and CTERelationRef URL: https://github.com/apache/spark/pull/50104 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51337][SQL] Add maxRows to CTERelationDef and CTERelationRef [spark]
cloud-fan commented on PR #50104: URL: https://github.com/apache/spark/pull/50104#issuecomment-2689469635 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage [spark]
cloud-fan commented on PR #50069: URL: https://github.com/apache/spark/pull/50069#issuecomment-2689447332 This is a test-only PR and the other test failures are definitely unrelated. Thanks, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage [spark]
cloud-fan closed pull request #50069: [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage URL: https://github.com/apache/spark/pull/50069 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [WIP][SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]
wengh commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1974559481 ## python/pyspark/sql/tests/test_python_datasource.py: ## @@ -246,6 +248,137 @@ def reader(self, schema) -> "DataSourceReader": assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1")]) self.assertEqual(df.select(spark_partition_id()).distinct().count(), 2) +def test_filter_pushdown(self): +class TestDataSourceReader(DataSourceReader): +def __init__(self): +self.has_filter = False + +def pushdownFilters(self, filters: List[Filter]) -> Iterable[Filter]: +assert set(filters) == { +EqualTo(("x",), 1), +EqualTo(("y",), 2), +}, filters +self.has_filter = True +# pretend we support x = 1 filter but in fact we don't +# so we only return y = 2 filter +yield filters[filters.index(EqualTo(("y",), 2))] + +def partitions(self): +assert self.has_filter +return super().partitions() + +def read(self, partition): +assert self.has_filter +yield [1, 1] +yield [1, 2] +yield [2, 2] + +class TestDataSource(DataSource): +@classmethod +def name(cls): +return "test" + +def schema(self): +return "x int, y int" + +def reader(self, schema) -> "DataSourceReader": +return TestDataSourceReader() + +self.spark.dataSource.register(TestDataSource) +df = self.spark.read.format("test").load().filter("x = 1 and y = 2") Review Comment: added pushed filters to the physical plan description and a scala test for the physical plan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [WIP][SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]
wengh commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1974559481 ## python/pyspark/sql/tests/test_python_datasource.py: ## @@ -246,6 +248,137 @@ def reader(self, schema) -> "DataSourceReader": assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1")]) self.assertEqual(df.select(spark_partition_id()).distinct().count(), 2) +def test_filter_pushdown(self): +class TestDataSourceReader(DataSourceReader): +def __init__(self): +self.has_filter = False + +def pushdownFilters(self, filters: List[Filter]) -> Iterable[Filter]: +assert set(filters) == { +EqualTo(("x",), 1), +EqualTo(("y",), 2), +}, filters +self.has_filter = True +# pretend we support x = 1 filter but in fact we don't +# so we only return y = 2 filter +yield filters[filters.index(EqualTo(("y",), 2))] + +def partitions(self): +assert self.has_filter +return super().partitions() + +def read(self, partition): +assert self.has_filter +yield [1, 1] +yield [1, 2] +yield [2, 2] + +class TestDataSource(DataSource): +@classmethod +def name(cls): +return "test" + +def schema(self): +return "x int, y int" + +def reader(self, schema) -> "DataSourceReader": +return TestDataSourceReader() + +self.spark.dataSource.register(TestDataSource) +df = self.spark.read.format("test").load().filter("x = 1 and y = 2") Review Comment: added the description and a scala test for this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50855][SS][CONNECT] Spark Connect Support for TransformWithState In Scala [spark]
jingz-db commented on code in PR #49488: URL: https://github.com/apache/spark/pull/49488#discussion_r1974088879 ## sql/connect/common/src/main/protobuf/spark/connect/relations.proto: ## Review Comment: Yes, Scala & Python share the same connect client protocol. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51278][FOLLOWUP][DOCS] Update JSON format from documentation [spark]
HyukjinKwon closed pull request #50100: [SPARK-51278][FOLLOWUP][DOCS] Update JSON format from documentation URL: https://github.com/apache/spark/pull/50100 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51278][FOLLOWUP][DOCS] Update JSON format from documentation [spark]
HyukjinKwon commented on PR #50100: URL: https://github.com/apache/spark/pull/50100#issuecomment-2689394019 Merged to master and branch-4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] Dev/milast/recurisve cte [spark]
github-actions[bot] closed pull request #48878: Dev/milast/recurisve cte URL: https://github.com/apache/spark/pull/48878 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51335] Publish Apache Spark 3.5.5 to docker registry [spark-docker]
dongjoon-hyun commented on PR #80: URL: https://github.com/apache/spark-docker/pull/80#issuecomment-2689111825 Thank you for the pointer, @pan3793 . Let me try in this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51344] Fix `ENV` key value format in `*.template` [spark-docker]
dongjoon-hyun opened a new pull request, #82: URL: https://github.com/apache/spark-docker/pull/82 ### What changes were proposed in this pull request? This PR aims to fix the `ENV` key value format in `*.template`. ### Why are the changes needed? To follow the Docker guideline and fix the following legacy format. - https://docs.docker.com/reference/build-checks/legacy-key-value-format/ ``` - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format ``` ### Does this PR introduce _any_ user-facing change? No. This removes a warning for now. ### How was this patch tested? Manual review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
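For reference, the legacy and fixed forms differ only in the `=` separator between key and value. A minimal illustration of the change described above (the variable name here is illustrative, not necessarily what the templates contain):
```
# Legacy form - triggers the LegacyKeyValueFormat build check
ENV SPARK_HOME /opt/spark

# Fixed form recommended by Docker
ENV SPARK_HOME=/opt/spark
```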
Re: [PR] [SPARK-51298][WIP] Support variant in CSV scan [spark]
the-sakthi commented on PR #50052: URL: https://github.com/apache/spark/pull/50052#issuecomment-2689223159 Hello @chenhao-db, thanks for the changes. Could we please get some description in the JIRA and responses to the PR template questions above? It helps a lot with understanding the context for reviewing the PR. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51345] Remove all EOL versions from `spark-docker` repository [spark-docker]
dongjoon-hyun opened a new pull request, #83: URL: https://github.com/apache/spark-docker/pull/83 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51345] Remove all EOL versions from `spark-docker` repository [spark-docker]
dongjoon-hyun commented on PR #83: URL: https://github.com/apache/spark-docker/pull/83#issuecomment-2689220984 cc @Yikun , @yaooqinn , @LuciferYang , @yaooqinn , @itholic , @viirya -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51288][DOCS] Add link for Scala API of Spark Connect [spark]
the-sakthi commented on PR #50042: URL: https://github.com/apache/spark/pull/50042#issuecomment-2689227610 Nice. LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [WIP][SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]
wengh commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1974189473 ## python/pyspark/sql/tests/test_python_datasource.py: ## @@ -246,6 +248,137 @@ def reader(self, schema) -> "DataSourceReader": assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1")]) self.assertEqual(df.select(spark_partition_id()).distinct().count(), 2) +def test_filter_pushdown(self): +class TestDataSourceReader(DataSourceReader): +def __init__(self): +self.has_filter = False + +def pushdownFilters(self, filters: List[Filter]) -> Iterable[Filter]: +assert set(filters) == { +EqualTo(("x",), 1), +EqualTo(("y",), 2), +}, filters +self.has_filter = True +# pretend we support x = 1 filter but in fact we don't +# so we only return y = 2 filter +yield filters[filters.index(EqualTo(("y",), 2))] + +def partitions(self): +assert self.has_filter +return super().partitions() + +def read(self, partition): +assert self.has_filter +yield [1, 1] +yield [1, 2] +yield [2, 2] + +class TestDataSource(DataSource): +@classmethod +def name(cls): +return "test" + +def schema(self): +return "x int, y int" + +def reader(self, schema) -> "DataSourceReader": +return TestDataSourceReader() + +self.spark.dataSource.register(TestDataSource) +df = self.spark.read.format("test").load().filter("x = 1 and y = 2") Review Comment: like this ``` *(1) Project [x#296, y#297] +- *(1) Filter ((isnotnull(x#296) AND isnotnull(y#297)) AND (y#297 = 2)) +- BatchScan test[x#296, y#297] (Python) PushedFilters: [EqualTo(x,1)], ReadSchema: struct, ShortName: test RuntimeFilters: [] ``` ## python/pyspark/sql/tests/test_python_datasource.py: ## @@ -246,6 +248,137 @@ def reader(self, schema) -> "DataSourceReader": assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1")]) self.assertEqual(df.select(spark_partition_id()).distinct().count(), 2) +def test_filter_pushdown(self): +class TestDataSourceReader(DataSourceReader): +def __init__(self): +self.has_filter = False + +def pushdownFilters(self, filters: List[Filter]) -> Iterable[Filter]: +assert set(filters) == { +EqualTo(("x",), 1), +EqualTo(("y",), 2), +}, filters +self.has_filter = True +# pretend we support x = 1 filter but in fact we don't +# so we only return y = 2 filter +yield filters[filters.index(EqualTo(("y",), 2))] + +def partitions(self): +assert self.has_filter +return super().partitions() + +def read(self, partition): +assert self.has_filter +yield [1, 1] +yield [1, 2] +yield [2, 2] + +class TestDataSource(DataSource): +@classmethod +def name(cls): +return "test" + +def schema(self): +return "x int, y int" + +def reader(self, schema) -> "DataSourceReader": +return TestDataSourceReader() + +self.spark.dataSource.register(TestDataSource) +df = self.spark.read.format("test").load().filter("x = 1 and y = 2") Review Comment: like this ``` *(1) Project [x#296, y#297] +- *(1) Filter ((isnotnull(x#296) AND isnotnull(y#297)) AND (y#297 = 2)) +- BatchScan test[x#296, y#297] (Python) PushedFilters: [EqualTo(x,1)], ReadSchema: struct RuntimeFilters: [] ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51345] Remove all EOL versions from `spark-docker` repository [spark-docker]
dongjoon-hyun commented on PR #83: URL: https://github.com/apache/spark-docker/pull/83#issuecomment-2689239268 Thank you, @viirya . No~ AFAIK, the published docker images will exist like the published Maven jars. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51326][CONNECT] Remove LazyExpression proto message [spark]
ueshin closed pull request #50093: [SPARK-51326][CONNECT] Remove LazyExpression proto message URL: https://github.com/apache/spark/pull/50093 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51326][CONNECT] Remove LazyExpression proto message [spark]
ueshin commented on PR #50093: URL: https://github.com/apache/spark/pull/50093#issuecomment-2688892681 I reran the compatibility test after #50094 was merged and it passed. - https://github.com/ueshin/apache-spark/actions/runs/13554631961/job/37946210197 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51347] Enable Ingress and Service Support for Spark Driver [spark-kubernetes-operator]
jiangzho opened a new pull request, #159: URL: https://github.com/apache/spark-kubernetes-operator/pull/159 ### What changes were proposed in this pull request? This PR adds support for launching Ingress and Services with Spark Applications. ### Why are the changes needed? There's a common ask from users to expose the driver via an ingress and service to serve HTTP / HTTPS traffic from outside the cluster - for example, to access the Spark UI. As pod template support is already available, this feature can also be handy for those who would like to expose additional service ports in the same pod. Therefore, this PR aims to introduce this feature in a more general way instead of just exposing the Spark UI. ### Does this PR introduce _any_ user-facing change? No - unreleased version ### How was this patch tested? CIs ### Was this patch authored or co-authored using generative AI tooling? No -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51322][SQL] Better error message for streaming subquery expression [spark]
cloud-fan closed pull request #50088: [SPARK-51322][SQL] Better error message for streaming subquery expression URL: https://github.com/apache/spark/pull/50088 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51322][SQL] Better error message for streaming subquery expression [spark]
cloud-fan commented on PR #50088: URL: https://github.com/apache/spark/pull/50088#issuecomment-2687357950 thanks for the review, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-49756][SQL][FOLLOWUP] Use correct pgsql datetime fields when pushing down EXTRACT [spark]
cloud-fan opened a new pull request, #50101: URL: https://github.com/apache/spark/pull/50101 ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/48210 to fix correctness issues caused by pgsql filter pushdown. These datetime fields were picked wrongly before, see https://neon.tech/postgresql/postgresql-date-functions/postgresql-extract ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? Yes, query result is corrected, but this bug is not released yet. ### How was this patch tested? updated test ### Was this patch authored or co-authored using generative AI tooling? no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
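As an illustration of the kind of field mismatch this follow-up corrects (the date and field below are my own example, not taken from the PR's tests): Spark's `dayofweek`/`DAY_OF_WEEK` numbers days 1 (Sunday) through 7 (Saturday), while Postgres' `EXTRACT(DOW ...)` numbers them 0 (Sunday) through 6 (Saturday), so translating one directly into the other shifts every result. A spark-shell style sketch, assuming the usual `spark` SparkSession:
```
// Spark numbers the week 1 (Sunday) .. 7 (Saturday):
spark.sql("SELECT dayofweek(DATE'2018-11-17')").show()   // 7, a Saturday
// Postgres numbers it 0 (Sunday) .. 6 (Saturday), so the same Saturday is:
//   SELECT EXTRACT(DOW FROM DATE '2018-11-17');   -- 6
// Pushing Spark's DAY_OF_WEEK straight down as DOW therefore changes results.
```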
Re: [PR] [SPARK-49756][SQL][FOLLOWUP] Use correct pgsql datetime fields when pushing down EXTRACT [spark]
cloud-fan commented on PR #50101: URL: https://github.com/apache/spark/pull/50101#issuecomment-2687368062 cc @beliefer @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]
cloud-fan commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1973251531 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5545,6 +5545,15 @@ object SQLConf { .booleanConf .createWithDefault(false) + val LEGACY_DF_WRITER_V2_IGNORE_PATH_OPTION = +buildConf("spark.sql.legacy.dataFrameWriterV2IgnorePathOption") + .internal() + .doc("When set to true, DataFrameWriterV2 ignores the 'path' option and always write data " + +"to the default table location.") + .version("3.5.5") Review Comment: ```suggestion .version("3.5.6") ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50994][CORE] Perform RDD conversion under tracked execution [spark]
cloud-fan commented on PR #49678: URL: https://github.com/apache/spark/pull/49678#issuecomment-2687493241 thanks, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48375][SQL] Add support for SIGNAL statement [spark]
cloud-fan commented on code in PR #49726: URL: https://github.com/apache/spark/pull/49726#discussion_r1973284907 ## sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala: ## @@ -69,6 +70,222 @@ class SqlScriptingExecutionSuite extends QueryTest with SharedSparkSession { } } + // Signal tests + test("signal statement - condition") { +val sqlScript = + """ +|BEGIN +| DECLARE TEST_CONDITION CONDITION FOR SQLSTATE '12345'; +| SIGNAL TEST_CONDITION; +|END +|""".stripMargin +val exception = intercept[SqlScriptingRuntimeException] { + verifySqlScriptResult(sqlScript, Seq.empty) +} +checkError( Review Comment: posting the error message here as a PR comment is good enough. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51335] Publish Apache Spark 3.5.5 to docker registry [spark-docker]
dongjoon-hyun opened a new pull request, #80: URL: https://github.com/apache/spark-docker/pull/80 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50994][CORE] Perform RDD conversion under tracked execution [spark]
cloud-fan commented on code in PR #49678: URL: https://github.com/apache/spark/pull/49678#discussion_r1973281569 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -2721,6 +2721,25 @@ class DataFrameSuite extends QueryTest parameters = Map("name" -> ".whatever") ) } + + test("SPARK-50994: RDD conversion is performed with execution context") { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { Review Comment: I see, let's merge it as it is. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50994][CORE] Perform RDD conversion under tracked execution [spark]
cloud-fan closed pull request #49678: [SPARK-50994][CORE] Perform RDD conversion under tracked execution URL: https://github.com/apache/spark/pull/49678 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51336] Upgrade `upload-artifact` to v4 in `spark-docker` repository [spark-docker]
dongjoon-hyun opened a new pull request, #81: URL: https://github.com/apache/spark-docker/pull/81 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51310][SQL] Resolve the type of default string producing expressions [spark]
cloud-fan commented on PR #50053: URL: https://github.com/apache/spark/pull/50053#issuecomment-2687505463 thanks, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51310][SQL] Resolve the type of default string producing expressions [spark]
cloud-fan closed pull request #50053: [SPARK-51310][SQL] Resolve the type of default string producing expressions URL: https://github.com/apache/spark/pull/50053 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51341][CORE] Cancel time task with suitable way. [spark]
beliefer commented on code in PR #50107: URL: https://github.com/apache/spark/pull/50107#discussion_r1973643904 ## core/src/main/scala/org/apache/spark/BarrierTaskContext.scala: ## @@ -300,11 +300,7 @@ object BarrierTaskContext { @Since("2.4.0") def get(): BarrierTaskContext = TaskContext.get().asInstanceOf[BarrierTaskContext] - private val timer = {-val executor = ThreadUtils.newDaemonSingleThreadScheduledExecutor( - "Barrier task timer for barrier() calls.") -assert(executor.isInstanceOf[ScheduledThreadPoolExecutor]) -executor.asInstanceOf[ScheduledThreadPoolExecutor] - } + private val timer = ThreadUtils.newSingleThreadScheduledExecutor( Review Comment: The original code used a non-daemon thread here. Please see https://github.com/apache/spark/commit/5d5b3a54b7b5fb4308fe40da696ba805c72983fc#diff-e9f66d336ded6632366be06dbd70e74afe9595cfc92a962f2c8554f09af53d0fL286 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
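For context on the daemon vs. non-daemon distinction discussed above: a non-daemon scheduler thread keeps the JVM alive until it is shut down explicitly, which is why stopping or cancelling the timer matters. A minimal sketch using plain JDK APIs (not the actual `ThreadUtils` code; paste into a Scala REPL):
```
import java.util.concurrent.{Executors, ThreadFactory}

// Default-factory threads are non-daemon: the JVM will not exit while this
// scheduler is alive, so it must be shut down on stop().
val nonDaemonTimer = Executors.newSingleThreadScheduledExecutor()

// Daemon variant, roughly what a "daemon" thread utility would do:
val daemonTimer = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
  override def newThread(r: Runnable): Thread = {
    val t = new Thread(r, "barrier-timer")
    t.setDaemon(true) // daemon threads do not block JVM shutdown
    t
  }
})
```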
Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1973705329 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -122,23 +124,40 @@ private[spark] class BarrierCoordinator( // Init a TimerTask for a barrier() call. private def initTimerTask(state: ContextBarrierState): Unit = { timerTask = new TimerTask { -override def run(): Unit = state.synchronized { - // Timeout current barrier() call, fail all the sync requests. - requesters.foreach(_.sendFailure(new SparkException("The coordinator didn't get all " + -s"barrier sync requests for barrier epoch $barrierEpoch from $barrierId within " + -s"$timeoutInSecs second(s)."))) - cleanupBarrierStage(barrierId) -} +override def run(): Unit = + try { +state.synchronized { + if (!Thread.currentThread().isInterrupted()) { +// Timeout current barrier() call, fail all the sync requests. +requesters.foreach( + _.sendFailure(new SparkException("The coordinator didn't get all " + +s"barrier sync requests for barrier epoch +" + +s" $barrierEpoch from $barrierId within " + +s"$timeoutInSecs second(s)."))) +cleanupBarrierStage(barrierId) + } +} + } catch { +case _: InterruptedException => + // Handle interruption gracefully + Thread.currentThread().interrupt() +case e: Exception => new SparkException("Error during " + + s"running of barrier tasks for " + + s"$barrierId", e) + } finally { +// Ensure cleanup happens even if interrupted or exception occurs +cleanupBarrierStage(barrierId) +state.clear() + } Review Comment: `cleanupBarrierStage` is called as part of the cleanup in `OnStageCompleted`, and `state.clear` is called [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala#L230) as part of `OnStop`, hence I introduced them in the `finally` block so the same cleanup also runs when the task is interrupted or fails. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
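For readers following the interruption handling above, a small self-contained sketch of the pattern under discussion (JDK scheduler APIs only; names and messages are illustrative, not the PR's actual code): cancelling the `ScheduledFuture` with `mayInterruptIfRunning = true` is the `ScheduledThreadPoolExecutor` counterpart of `TimerTask.cancel()`, and a task that catches `InterruptedException` should restore the interrupt flag before its `finally` cleanup runs.
```
import java.util.concurrent.{Executors, TimeUnit}

object BarrierTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val timeoutTask = scheduler.schedule(new Runnable {
      override def run(): Unit = {
        try {
          if (!Thread.currentThread().isInterrupted) {
            println("timeout fired: would fail the pending sync requests here")
          }
        } catch {
          case _: InterruptedException =>
            Thread.currentThread().interrupt() // restore the interrupt status
        } finally {
          println("cleanup runs whether the task completed or was interrupted")
        }
      }
    }, 10, TimeUnit.SECONDS)

    // Counterpart of TimerTask.cancel(): interrupt the task if it is running.
    timeoutTask.cancel(true)
    scheduler.shutdown()
  }
}
```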
Re: [PR] [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display [spark]
cloud-fan commented on code in PR #50074: URL: https://github.com/apache/spark/pull/50074#discussion_r1973627334 ## sql/core/src/test/resources/sql-tests/inputs/describe.sql: ## @@ -122,6 +122,12 @@ DESC TABLE EXTENDED e; DESC FORMATTED e; +CREATE TABLE f PARTITIONED BY (B, C) AS SELECT 'APACHE' A, CAST('SPARK' AS BINARY) B, TIMESTAMP'2018-11-17 13:33:33' C; Review Comment: Does it create a hive table? Shall we put `USING json` to explicitly use a Spark native data source? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage [spark]
mihailoale-db commented on PR #50069: URL: https://github.com/apache/spark/pull/50069#issuecomment-2688377111 @MaxGekk @cloud-fan Failures don't seem related to changes? Please check when you have time. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51336] Upgrade `upload-artifact` to v4 in `spark-docker` repository [spark-docker]
dongjoon-hyun merged PR #81: URL: https://github.com/apache/spark-docker/pull/81 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51336] Upgrade `upload-artifact` to v4 in `spark-docker` repository [spark-docker]
dongjoon-hyun commented on PR #81: URL: https://github.com/apache/spark-docker/pull/81#issuecomment-2688408435 Thank you, @viirya ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51270][SQL] Support UUID type in Variant [spark]
cashmand commented on code in PR #50025: URL: https://github.com/apache/spark/pull/50025#discussion_r1973900282 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java: ## @@ -240,6 +242,19 @@ public void appendBinary(byte[] binary) { writePos += binary.length; } + public void appendUuid(UUID uuid) { +checkCapacity(1 + 16); +writeBuffer[writePos++] = primitiveHeader(UUID); + +// UUID is stored big-endian, so don't use writeLong. +ByteBuffer buffer = ByteBuffer.allocate(16); Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51335] Publish Apache Spark 3.5.5 to docker registry [spark-docker]
viirya commented on PR #80: URL: https://github.com/apache/spark-docker/pull/80#issuecomment-2688494398 This error happened (even after re-triggering):
```
54.80 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
55.22 Segmentation fault (core dumped)
55.27 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
55.64 Segmentation fault (core dumped)
: Sub-process /usr/bin/dpkg returned an error code (1)
```
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51336] Upgrade `upload-artifact` to v4 in `spark-docker` repository [spark-docker]
LuciferYang commented on PR #81: URL: https://github.com/apache/spark-docker/pull/81#issuecomment-2688468782 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51335] Publish Apache Spark 3.5.5 to docker registry [spark-docker]
dongjoon-hyun commented on PR #80: URL: https://github.com/apache/spark-docker/pull/80#issuecomment-2688513900 Let me hold on this because this is not a blocker For Apache Spark 3.5.5 release announcement. After announcing, I'll come back to this. Thank you, @viirya . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51335] Publish Apache Spark 3.5.5 to docker registry [spark-docker]
dongjoon-hyun commented on PR #80: URL: https://github.com/apache/spark-docker/pull/80#issuecomment-2688511310 Ya, it seems to come from the Qemu environment on GitHub Action. When I build the docker locally, it seems to work. ``` $ docker build . [+] Building 423.4s (12/12) FINISHED docker:desktop-linux => [internal] load build definition from Dockerfile 0.0s => => transferring dockerfile: 3.08kB 0.0s => [internal] load metadata for docker.io/library/eclipse-temurin:17-jammy 1.3s => [auth] library/eclipse-temurin:pull token for registry-1.docker.io 0.0s => [internal] load .dockerignore 0.0s => => transferring context: 2B 0.0s => [1/6] FROM docker.io/library/eclipse-temurin:17-jammy@sha256:1886bea164fd66274feb2031e5a12be3be830949c57491c51cd3d432dafda89f 0.0s => => resolve docker.io/library/eclipse-temurin:17-jammy@sha256:1886bea164fd66274feb2031e5a12be3be830949c57491c51cd3d432dafda89f 0.0s => => sha256:1886bea164fd66274feb2031e5a12be3be830949c57491c51cd3d432dafda89f 6.55kB / 6.55kB 0.0s => => sha256:db3ac3d3f94412d3debc28603f37c00a46e773d619df4e21ef018ab1ce6681cd 1.94kB / 1.94kB 0.0s => => sha256:c1f6b0a95ff9591d42a346fca88e1cd27633e580e7514397e79f3d25598d30ee 6.10kB / 6.10kB 0.0s => [internal] load build context 0.0s => => transferring context: 4.78kB 0.0s => [2/6] RUN groupadd --system --gid=185 spark && useradd --system --uid=185 --gid=spark spark 0.1s => [3/6] RUN set -ex; apt-get update; apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu libnss-wrapper; mkdir -p /opt/spark; mkdir /opt/spark/python; mkdir -p 15.1s => [4/6] RUN set -ex; export SPARK_TMP="$(mktemp -d)"; cd $SPARK_TMP; wget -nv -O spark.tgz "https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz";; wget -nv -O spark.tgz.asc "https://archi 406.1s => [5/6] COPY entrypoint.sh /opt/ 0.0s => [6/6] WORKDIR /opt/spark/work-dir 0.0s => exporting to image
Re: [PR] [SPARK-51335] Publish Apache Spark 3.5.5 to docker registry [spark-docker]
viirya commented on PR #80: URL: https://github.com/apache/spark-docker/pull/80#issuecomment-2688522180 Yes. Looks like an issue from the infrastructure. Thanks @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51335] Publish Apache Spark 3.5.5 to docker registry [spark-docker]
pan3793 commented on PR #80: URL: https://github.com/apache/spark-docker/pull/80#issuecomment-2688551158 this might be a solution https://github.com/apache/iceberg/pull/12262 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49756][SQL][FOLLOWUP] Use correct pgsql datetime fields when pushing down EXTRACT [spark]
beliefer commented on code in PR #50101: URL: https://github.com/apache/spark/pull/50101#discussion_r1974672789 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala: ## @@ -303,12 +303,27 @@ private case class PostgresDialect() class PostgresSQLBuilder extends JDBCSQLBuilder { override def visitExtract(field: String, source: String): String = { - field match { -case "DAY_OF_YEAR" => s"EXTRACT(DOY FROM $source)" -case "YEAR_OF_WEEK" => s"EXTRACT(YEAR FROM $source)" -case "DAY_OF_WEEK" => s"EXTRACT(DOW FROM $source)" -case _ => super.visitExtract(field, source) + // SECOND, MINUTE, HOUR, QUARTER, YEAR, DAY are identical on postgres and spark + // MONTH is different, postgres returns 0-11, spark returns 1-12. + // Postgres also returns 1-12 but just for interval columns, so without source + // data type, we cannot know how to push down it Review Comment:  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display [spark]
yaooqinn commented on code in PR #50074: URL: https://github.com/apache/spark/pull/50074#discussion_r1974620691 ## sql/core/src/test/resources/sql-tests/inputs/describe.sql: ## @@ -122,6 +122,12 @@ DESC TABLE EXTENDED e; DESC FORMATTED e; +CREATE TABLE f PARTITIONED BY (B, C) AS SELECT 'APACHE' A, CAST('SPARK' AS BINARY) B, TIMESTAMP'2018-11-17 13:33:33' C; Review Comment: Not necessary, w/o the `USING` keyword it defaults to parquet -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display [spark]
yaooqinn commented on code in PR #50074: URL: https://github.com/apache/spark/pull/50074#discussion_r1974624507 ## sql/core/src/test/resources/sql-tests/results/describe.sql.out: ## @@ -890,6 +890,48 @@ a string CONCAT('a\n b\n ', 'c\n d') b int 42 +-- !query +CREATE TABLE f PARTITIONED BY (B, C) AS SELECT 'APACHE' A, CAST('SPARK' AS BINARY) B, TIMESTAMP'2018-11-17 13:33:33' C +-- !query schema +struct<> +-- !query output + + + +-- !query +DESC FORMATTED f PARTITION (B='SPARK', C=TIMESTAMP'2018-11-17 13:33:33') +-- !query schema +struct +-- !query output +A string +B binary +C timestamp +# Partition Information +# col_name data_type comment +B binary +C timestamp + +# Detailed Partition Information +Database default +Table f +Partition Values [B=SPARK, C=2018-11-17 13:33:33] +Location [not included in comparison]/{warehouse_dir}/f/B=SPARK/C=2018-11-17 13%3A33%3A33 +Partition Parameters [numFiles=1, totalSize=486, transient_lastDdlTime=[not included in comparison]] Review Comment: > we have transient_lastDdlTime because it's a hive table? This is not from table parameters, but partition stats, I'm not aware of the history of filtering out some of the hive properties. However, it's not related to this PR. > Maybe we can test with ... AS JSON to avoid seeing these unrelated fields. We already have an `AS JSON` test variant, unfortunately transient_lastDdlTime is still there -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
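For readers wondering about the `13%3A33%3A33` in the partition location quoted above: the stored location URI percent-encodes characters such as `:`, and the PR is about showing a decoded, human-readable form. A minimal sketch of such decoding with plain `java.net` APIs (the warehouse path is illustrative, and this is not necessarily how the PR implements the display change):
```
import java.net.URI

// ':' is stored as %3A and ' ' as %20 in the partition location URI.
val stored = "file:/warehouse/f/B=SPARK/C=2018-11-17%2013%3A33%3A33"
val readable = new URI(stored).getPath
println(readable) // /warehouse/f/B=SPARK/C=2018-11-17 13:33:33
```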
Re: [PR] [SPARK-51339][BUILD] Remove `IllegalImportsChecker` for `s.c.Seq/IndexedSeq` from `scalastyle-config.xml` [spark]
LuciferYang commented on PR #50105: URL: https://github.com/apache/spark/pull/50105#issuecomment-2689564664 Merged into master and branch-4.0. Thanks @HyukjinKwon and @the-sakthi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49756][SQL][FOLLOWUP] Use correct pgsql datetime fields when pushing down EXTRACT [spark]
cloud-fan commented on code in PR #50101: URL: https://github.com/apache/spark/pull/50101#discussion_r1974712264 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala: ## @@ -304,11 +309,25 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCT assert(rows7.length === 1) assert(rows7(0).getString(0) === "alex") -val df8 = sql(s"SELECT name FROM $tbl WHERE dayofweek(date1) = 4") -checkFilterPushed(df8) -val rows8 = df8.collect() -assert(rows8.length === 1) -assert(rows8(0).getString(0) === "alex") +withClue("dayofweek") { + val dow = sql(s"SELECT dayofweek(date1) FROM $tbl WHERE name = 'alex'") +.collect().head.getInt(0) + val df = sql(s"SELECT name FROM $tbl WHERE dayofweek(date1) = $dow") Review Comment: A better idea is probably having a test infra that automatically checks the result between JDBC pushdown on and off. I'll leave it to followups. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51352] Use Spark 3.5.5 in E2E tests [spark-kubernetes-operator]
dongjoon-hyun closed pull request #160: [SPARK-51352] Use Spark 3.5.5 in E2E tests URL: https://github.com/apache/spark-kubernetes-operator/pull/160 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51347] Enable Ingress and Service Support for Spark Driver [spark-kubernetes-operator]
dongjoon-hyun closed pull request #159: [SPARK-51347] Enable Ingress and Service Support for Spark Driver URL: https://github.com/apache/spark-kubernetes-operator/pull/159 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51347] Enable Ingress and Service Support for Spark Driver [spark-kubernetes-operator]
dongjoon-hyun commented on PR #159: URL: https://github.com/apache/spark-kubernetes-operator/pull/159#issuecomment-2689793279 BTW, I noticed that this is the first commit with a different email, @jiangzho . Are you going to use this one?
```
$ git log --author=zh | grep 'Author:' | sort | uniq -c
   1 Author: Zhou JIANG
  20 Author: zhou-jiang
```
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org