Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-03-04 Thread via GitHub
aokolnychyi commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1979905938 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -400,18 +401,23 @@ abstract class InMemoryBaseTable( val

[PR] [WIP][SPARK-51384][SQL] Support `java.time.LocalTime` as the external type of `TimeType` [spark]

2025-03-04 Thread via GitHub
MaxGekk opened a new pull request, #50153: URL: https://github.com/apache/spark/pull/50153 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [WIP][SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-04 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1979781701 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +717,177 @@ case class UnionExec(children: Seq[SparkPlan]) extends

[PR] [SPARK-50855][CONNECT][TESTS][FOLLOWUP] Refactor `TransformWithStateConnectSuite` to run `DROP TABLE IF EXISTS my_sink` in `beforeAll/afterEach` [spark]

2025-03-04 Thread via GitHub
LuciferYang opened a new pull request, #50155: URL: https://github.com/apache/spark/pull/50155 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [SPARK-51389][BUILD] Allow full customization of profiles in scalastyle [spark]

2025-03-04 Thread via GitHub
cnauroth opened a new pull request, #50156: URL: https://github.com/apache/spark/pull/50156 ### What changes were proposed in this pull request? In the `scalastyle` helper script, activate profiles docker-integration-tests and kubernetes-integration-tests in the default value of `SPA

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-04 Thread via GitHub
anoopj commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1980076364 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -311,4 +311,49 @@ default boolean purgeTable(Identifier ident) throws Unsu

Re: [PR] [SPARK-51097] [SS] Split apart SparkPlan metrics and instance metrics [spark]

2025-03-04 Thread via GitHub
zecookiez commented on PR #50157: URL: https://github.com/apache/spark/pull/50157#issuecomment-2699060133 @cloud-fan I was recommended to get your advice on this change, as this is related to https://github.com/apache/spark/pull/49816 and was merged in recently -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] [SPARK-51390][INFRA] Add more dependencies in LICENSE-binary for Spark 4.0.0 release [spark]

2025-03-04 Thread via GitHub
cnauroth opened a new pull request, #50158: URL: https://github.com/apache/spark/pull/50158 ### What changes were proposed in this pull request? While reviewing Spark 4.0.0 RC2, I noticed the binary distribution included several jars that were not covered in the license file. This cha

Re: [PR] [SPARK-51272][CORE] Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-04 Thread via GitHub
ahshahid commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2698932772 @attilapiros @squito , Given that there is a guarantee that DagScheduler::onReceive(event: DAGSchedulerEvent) is always going to be invoked in single thread of EventLoop and in NO SIT
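The guarantee being invoked here (all DAGSchedulerEvents are handled sequentially on a single EventLoop thread, so event handlers cannot race with one another) can be modeled with a toy sketch. This is an illustrative Python analogue, not Spark's actual Scala `EventLoop`:

```python
import queue
import threading

class EventLoop:
    """Toy model of a single-consumer event loop (illustrative only):
    every posted event is handled sequentially on one thread, so the
    handler needs no locking against itself."""

    _STOP = object()  # sentinel: drain remaining events, then exit

    def __init__(self, on_receive):
        self._queue = queue.Queue()
        self._on_receive = on_receive
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            event = self._queue.get()
            if event is EventLoop._STOP:
                break
            self._on_receive(event)

    def start(self):
        self._thread.start()

    def post(self, event):
        self._queue.put(event)

    def stop(self):
        # Enqueue the sentinel so already-posted events are drained first.
        self._queue.put(EventLoop._STOP)
        self._thread.join()

# Even with many producers, events are handled one at a time, in FIFO order.
handled = []
loop = EventLoop(handled.append)
loop.start()
for i in range(100):
    loop.post(i)
loop.stop()
```

Because handling is confined to one thread, state touched only from `on_receive` needs no synchronization, which is the property the comment relies on.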

Re: [PR] [MINOR][CORE] Remove unused `private[executor] var httpUrlConnectionTimeoutMillis` from `ExecutorClassLoader` [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun closed pull request #50152: [MINOR][CORE] Remove unused `private[executor] var httpUrlConnectionTimeoutMillis` from `ExecutorClassLoader` URL: https://github.com/apache/spark/pull/50152

Re: [PR] [SPARK-51367][BUILD] Upgrade slf4j to 2.0.17 [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun closed pull request #50115: [SPARK-51367][BUILD] Upgrade slf4j to 2.0.17 URL: https://github.com/apache/spark/pull/50115

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-04 Thread via GitHub
aokolnychyi commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1980111812 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableBuilderImpl.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-04 Thread via GitHub
liviazhu-db commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1980112100 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinatorSuite.scala: ## @@ -155,16 +156,296 @@ class StateStoreCoordinatorSui

Re: [PR] [SPARK-51387][BUILD] Upgrade Netty to 4.1.119.Final [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun closed pull request #50150: [SPARK-51387][BUILD] Upgrade Netty to 4.1.119.Final URL: https://github.com/apache/spark/pull/50150

Re: [PR] [SPARK-51387][BUILD] Upgrade Netty to 4.1.119.Final [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun commented on PR #50150: URL: https://github.com/apache/spark/pull/50150#issuecomment-2698481466 Merged to master/4.0.

Re: [PR] [SPARK-51349][SQL][TESTS] Change precedence of null and "null" in sorting in QueryTest [spark]

2025-03-04 Thread via GitHub
harshmotw-db commented on code in PR #50108: URL: https://github.com/apache/spark/pull/50108#discussion_r1980130494 ## sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala: ## @@ -326,7 +326,13 @@ object QueryTest extends Assertions { // For binary arrays, we conver

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-04 Thread via GitHub
anoopj commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1980110293 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableBuilderImpl.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-04 Thread via GitHub
aokolnychyi commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1980125783 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableBuilderImpl.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-04 Thread via GitHub
aokolnychyi commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1980124261 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableBuilderImpl.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun commented on PR #50159: URL: https://github.com/apache/spark/pull/50159#issuecomment-2699448851 The comment is addressed, @HyukjinKwon. Thank you.

Re: [PR] [SPARK-51383][PYTHON][CONNECT] Avoid making RPC calls if clients are already known as stopped [spark]

2025-03-04 Thread via GitHub
HyukjinKwon commented on PR #50145: URL: https://github.com/apache/spark/pull/50145#issuecomment-2699478854 Merged to master and branch-4.0.

[PR] [SPARK-51097] [SS] Revert RocksDB instance metrics changes for branch-4.0 [spark]

2025-03-04 Thread via GitHub
zecookiez opened a new pull request, #50165: URL: https://github.com/apache/spark/pull/50165 ### What changes were proposed in this pull request? SPARK-51097 Similar to #50161, this change reverts the changes in branch-4.0 introduced from SPARK-51097. More specifica

[PR] [SPARK-51307][SQL][3.5] locationUri in CatalogStorageFormat shall be decoded for display [spark]

2025-03-04 Thread via GitHub
yaooqinn opened a new pull request, #50164: URL: https://github.com/apache/spark/pull/50164 ### What changes were proposed in this pull request? This PR uses CatalogUtils.URIToString instead of URI.toString to decode the location URI. ### Why are the changes needed?
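The underlying issue is the familiar gap between a percent-encoded `URI.toString` and the decoded form users expect to see in display output. A minimal Python illustration with a made-up table location (the actual fix swaps in `CatalogUtils.URIToString` on the Scala side):

```python
from urllib.parse import quote, unquote

# Hypothetical table location containing characters that get percent-encoded.
raw_path = "/warehouse/db 1/tbl=a#b"

# What a raw URI.toString-style rendering yields: escapes everywhere.
encoded = "file://" + quote(raw_path)

# For display, decode the percent-escapes back to the original characters.
display = unquote(encoded)
```

Here `encoded` is `file:///warehouse/db%201/tbl%3Da%23b` while `display` restores the readable `file:///warehouse/db 1/tbl=a#b`, which is the behavior the PR wants for `locationUri`.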

[PR] [SPARK-51394][ML] Optimize out the additional shuffle in stats tests [spark]

2025-03-04 Thread via GitHub
zhengruifeng opened a new pull request, #50166: URL: https://github.com/apache/spark/pull/50166 ### What changes were proposed in this pull request? Optimize out the additional shuffle in stats tests ### Why are the changes needed? for simplification ### Do

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-04 Thread via GitHub
srowen commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1980625111 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -119,26 +121,43 @@ private[spark] class BarrierCoordinator( // A timer task that ensures we

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-04 Thread via GitHub
beliefer commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1980624198 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -80,8 +81,9 @@ private[spark] class BarrierCoordinator( states.forEachValue(1, clearStat

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-04 Thread via GitHub
jjayadeep06 commented on PR #50020: URL: https://github.com/apache/spark/pull/50020#issuecomment-2699707030 > I still don't see how this addresses the non-daemon thread hanging around. Wouldn't we need to make some executor use daemon threads somewhere? The code below sets up the time

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-04 Thread via GitHub
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1980639195 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -119,26 +121,43 @@ private[spark] class BarrierCoordinator( // A timer task that ensure

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-04 Thread via GitHub
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1980641859 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -80,8 +81,9 @@ private[spark] class BarrierCoordinator( states.forEachValue(1, clearS

Re: [PR] [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display [spark]

2025-03-04 Thread via GitHub
yaooqinn commented on PR #50074: URL: https://github.com/apache/spark/pull/50074#issuecomment-2699607782 Thank you @dongjoon-hyun. I have cherry-picked this to 4.0 and will send a backport PR for 3.5.6.

Re: [PR] [SPARK-51390][INFRA] Add more dependencies in LICENSE-binary for Spark 4.0.0 release [spark]

2025-03-04 Thread via GitHub
yaooqinn closed pull request #50158: [SPARK-51390][INFRA] Add more dependencies in LICENSE-binary for Spark 4.0.0 release URL: https://github.com/apache/spark/pull/50158

Re: [PR] [SPARK-51390][INFRA] Add more dependencies in LICENSE-binary for Spark 4.0.0 release [spark]

2025-03-04 Thread via GitHub
yaooqinn commented on PR #50158: URL: https://github.com/apache/spark/pull/50158#issuecomment-2699587578 Merged to master, thank you @cnauroth

[PR] [SPARK-51393][PYTHON] Fallback to regular Python UDF when Arrow is not found but Arrow-optimized Python UDFs enabled [spark]

2025-03-04 Thread via GitHub
HyukjinKwon opened a new pull request, #50160: URL: https://github.com/apache/spark/pull/50160 ### What changes were proposed in this pull request? This PR extracts legitimate improvement in https://github.com/apache/spark/pull/49482. Falls back regular Python UDF when Arrow is not f
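The fallback being proposed can be sketched as a capability check at UDF-creation time: if the Arrow-optimized path is requested but `pyarrow` cannot be imported, warn and degrade to the regular pickled UDF path. The helper below is a hypothetical illustration of the idea, not PySpark's actual `_create_py_udf` logic:

```python
import importlib.util
import warnings

def create_udf(f, arrow_enabled=True):
    """Pick the Arrow-optimized path only if pyarrow is importable;
    otherwise warn and fall back to a regular Python UDF.
    (Sketch only: real PySpark does this inside _create_py_udf.)"""
    if arrow_enabled and importlib.util.find_spec("pyarrow") is None:
        warnings.warn(
            "Arrow optimization is enabled but 'pyarrow' is not installed; "
            "falling back to a regular Python UDF."
        )
        arrow_enabled = False
    mode = "arrow" if arrow_enabled else "regular"
    return mode, f
```

The key design point is that a missing optional dependency produces a warning plus a working (slower) UDF, rather than an import error at call time.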

Re: [PR] [WIP][SPARK-51180][BUILD] Upgrade Arrow to 19.0.0 [spark]

2025-03-04 Thread via GitHub
LuciferYang commented on PR #49909: URL: https://github.com/apache/spark/pull/49909#issuecomment-2699591042 > Just a question. Is there any update because we have 19.0.1 already, @aimtsou and @LuciferYang ? > > https://arrow.apache.org/release/19.0.1.html No, the latest `arrow-

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-04 Thread via GitHub
srowen commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1980563457 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -122,23 +124,40 @@ private[spark] class BarrierCoordinator( // Init a TimerTask for a barrie

Re: [PR] [SPARK-51386][CORE][SQL] Assign name to error conditions _LEGACY_ERROR_TEMP_3300-3302 [spark]

2025-03-04 Thread via GitHub
pan3793 commented on code in PR #50149: URL: https://github.com/apache/spark/pull/50149#discussion_r1980564113 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -1737,6 +1743,12 @@ ], "sqlState" : "42822" }, + "GROW_POINTER_ARRAY_OUT_OF_MEMORY"

[PR] [MINOR] Remove unused imports [spark]

2025-03-04 Thread via GitHub
HyukjinKwon opened a new pull request, #50162: URL: https://github.com/apache/spark/pull/50162 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch

Re: [PR] [MINOR] Remove unused imports [spark]

2025-03-04 Thread via GitHub
HyukjinKwon closed pull request #50162: [MINOR] Remove unused imports URL: https://github.com/apache/spark/pull/50162

Re: [PR] [MINOR] Remove unused imports [spark]

2025-03-04 Thread via GitHub
HyukjinKwon commented on PR #50162: URL: https://github.com/apache/spark/pull/50162#issuecomment-2699615468 Merged to master and branch-4.0.

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-04 Thread via GitHub
aokolnychyi commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1980111091 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableBuilderImpl.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] [SPARK-51390][INFRA] Add more dependencies in LICENSE-binary for Spark 4.0.0 release [spark]

2025-03-04 Thread via GitHub
cnauroth commented on PR #50158: URL: https://github.com/apache/spark/pull/50158#issuecomment-2699756902 @yaooqinn, I appreciate the review and merge. Thank you!

Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]

2025-03-04 Thread via GitHub
vrozov commented on PR #49928: URL: https://github.com/apache/spark/pull/49928#issuecomment-2699769133 @cloud-fan Added tests, please check.

Re: [PR] [SPARK-51383][PYTHON][CONNECT] Avoid making RPC calls if clients are already known as stopped [spark]

2025-03-04 Thread via GitHub
HyukjinKwon closed pull request #50145: [SPARK-51383][PYTHON][CONNECT] Avoid making RPC calls if clients are already known as stopped URL: https://github.com/apache/spark/pull/50145

Re: [PR] [SPARK-51276][PYTHON] Enable spark.sql.execution.arrow.pyspark.enabled by default [spark]

2025-03-04 Thread via GitHub
HyukjinKwon commented on PR #50036: URL: https://github.com/apache/spark/pull/50036#issuecomment-2699557086 Actually let me revert this ... there are too many subtle behaviour differences ... I will improve here more, and enable it back.

Re: [PR] [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun commented on PR #50074: URL: https://github.com/apache/spark/pull/50074#issuecomment-2698538770 Merged to master. Since this is reported as a `Bug`, could you make backporting PRs, @yaooqinn?

Re: [PR] [SPARK-51330][PYTHON] Enable spark.sql.execution.pythonUDTF.arrow.enabled by default [spark]

2025-03-04 Thread via GitHub
HyukjinKwon closed pull request #50096: [SPARK-51330][PYTHON] Enable spark.sql.execution.pythonUDTF.arrow.enabled by default URL: https://github.com/apache/spark/pull/50096

Re: [PR] [SPARK-51330][PYTHON] Enable spark.sql.execution.pythonUDTF.arrow.enabled by default [spark]

2025-03-04 Thread via GitHub
HyukjinKwon commented on PR #50096: URL: https://github.com/apache/spark/pull/50096#issuecomment-2699536553 Let me close this for now. I think the behaviour diff is too much.

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-04 Thread via GitHub
beliefer commented on PR #50020: URL: https://github.com/apache/spark/pull/50020#issuecomment-2699539363 cc @srowen @dongjoon-hyun

Re: [PR] [SPARK-48516][PYTHON][CONNECT] Turn on Arrow optimization for Python UDFs by default [spark]

2025-03-04 Thread via GitHub
HyukjinKwon commented on PR #49482: URL: https://github.com/apache/spark/pull/49482#issuecomment-2699566083 There are some subtle diffs found. I will revert this for now, and enable it back later after improving this more.

Re: [PR] [SPARK-51393][PYTHON] Fallback to regular Python UDF when Arrow is not found but Arrow-optimized Python UDFs enabled [spark]

2025-03-04 Thread via GitHub
ueshin commented on code in PR #50160: URL: https://github.com/apache/spark/pull/50160#discussion_r1980598264 ## python/pyspark/sql/connect/udf.py: ## @@ -78,6 +80,15 @@ def _create_py_udf( eval_type: int = PythonEvalType.SQL_BATCHED_UDF +if is_arrow_enabled: +

Re: [PR] [MINOR] Remove unused imports [spark]

2025-03-04 Thread via GitHub
HyukjinKwon commented on PR #50162: URL: https://github.com/apache/spark/pull/50162#issuecomment-2699615202 Gonna merge to recover the CI.

[PR] [SPARK-51277][PYTHON][FOLLOW-UP] Remove 0-arg check in Spark Connect side as well [spark]

2025-03-04 Thread via GitHub
HyukjinKwon opened a new pull request, #50163: URL: https://github.com/apache/spark/pull/50163 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/commit/9efaa91deb502cadbaddb8ca4c934bd8529aeaad that removes the 0-arg check in

[PR] [SPARK-51097] [SS] Revert RocksDB instance metrics changes [spark]

2025-03-04 Thread via GitHub
zecookiez opened a new pull request, #50161: URL: https://github.com/apache/spark/pull/50161 ### What changes were proposed in this pull request? SPARK-51097 This change reverts the changes in master introduced from SPARK-51097. More specifically, this reverts the f

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-04 Thread via GitHub
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1980609388 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -122,23 +124,40 @@ private[spark] class BarrierCoordinator( // Init a TimerTask for a b

Re: [PR] [SPARK-51341][CORE] Cancel time task with suitable way. [spark]

2025-03-04 Thread via GitHub
beliefer commented on code in PR #50107: URL: https://github.com/apache/spark/pull/50107#discussion_r1980618502 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -959,7 +965,9 @@ private[spark] class TaskSchedulerImpl( barrierCoordinator.s

Re: [PR] [SPARK-51277][PYTHON][FOLLOW-UP] Remove 0-arg check in Spark Connect side as well [spark]

2025-03-04 Thread via GitHub
HyukjinKwon closed pull request #50163: [SPARK-51277][PYTHON][FOLLOW-UP] Remove 0-arg check in Spark Connect side as well URL: https://github.com/apache/spark/pull/50163

Re: [PR] [SPARK-51277][PYTHON][FOLLOW-UP] Remove 0-arg check in Spark Connect side as well [spark]

2025-03-04 Thread via GitHub
HyukjinKwon commented on PR #50163: URL: https://github.com/apache/spark/pull/50163#issuecomment-2699779702 Merged to master and branch-4.0.

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1978853315 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -400,18 +401,23 @@ abstract class InMemoryBaseTable( val s

Re: [PR] [SPARK-51373] [SS] Removing extra copy for column family prefix from 'ReplyChangelog' [spark]

2025-03-04 Thread via GitHub
HeartSaVioR closed pull request #50119: [SPARK-51373] [SS] Removing extra copy for column family prefix from 'ReplyChangelog' URL: https://github.com/apache/spark/pull/50119

Re: [PR] [SPARK-51373] [SS] Removing extra copy for column family prefix from 'ReplyChangelog' [spark]

2025-03-04 Thread via GitHub
HeartSaVioR commented on PR #50119: URL: https://github.com/apache/spark/pull/50119#issuecomment-2696615232 Thanks! Merging to master/4.0.

Re: [PR] [SPARK-51321][SQL] Add rpad and lpad support for PostgresDialect and MsSQLServerDialect expression pushdown [spark]

2025-03-04 Thread via GitHub
beliefer commented on code in PR #50060: URL: https://github.com/apache/spark/pull/50060#discussion_r197621 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala: ## @@ -91,6 +91,14 @@ class MsSqlServerIntegration

Re: [PR] [SPARK-51321][SQL] Add rpad and lpad support for PostgresDialect and MsSQLServerDialect expression pushdown [spark]

2025-03-04 Thread via GitHub
beliefer commented on code in PR #50060: URL: https://github.com/apache/spark/pull/50060#discussion_r1978890809 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala: ## @@ -91,6 +91,14 @@ class MsSqlServerIntegration

[PR] [MINOR][DOCS] Add `spark.taskMetrics.trackUpdatedBlockStatuses` description for configuration.md [spark]

2025-03-04 Thread via GitHub
tomscut opened a new pull request, #50147: URL: https://github.com/apache/spark/pull/50147 ### What changes were proposed in this pull request? Add `spark.taskMetrics.trackUpdatedBlockStatuses` description for configuration.md. ### Does this PR introduce _any_ user-faci

Re: [PR] [SPARK-49488][SQL][FOLLOWUP] Use correct MySQL datetime functions when pushing down EXTRACT [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #50112: URL: https://github.com/apache/spark/pull/50112#discussion_r1979415722 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -195,17 +198,19 @@ class MySQLIntegrationSuite ex

Re: [PR] [SPARK-51341][CORE] Cancel time task with suitable way. [spark]

2025-03-04 Thread via GitHub
srowen commented on code in PR #50107: URL: https://github.com/apache/spark/pull/50107#discussion_r1979417965 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -959,7 +965,9 @@ private[spark] class TaskSchedulerImpl( barrierCoordinator.sto

Re: [PR] [SPARK-49507][SQL] Fix the case issue after enabling metastorePartitionPruningFastFallback [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #47998: URL: https://github.com/apache/spark/pull/47998#discussion_r1979410799 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala: ## @@ -417,8 +417,11 @@ private[client] class Shim_v2_0 extends Shim with Logging {

[PR] [SPARK-51385][SQL] Normalize out projection added in DeduplicateRelations for union child output deduplication [spark]

2025-03-04 Thread via GitHub
vladimirg-db opened a new pull request, #50148: URL: https://github.com/apache/spark/pull/50148 ### What changes were proposed in this pull request? Strip away extra projection added by `DeduplicateRelations` when comparing logical plans. `DeduplicateRelations` puts one extra `

Re: [PR] [WIP][SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-04 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1979639801 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1031,6 +1040,9 @@ object ColumnPruning extends Rule[LogicalPlan] {

Re: [PR] [WIP][SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-04 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1979641417 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +717,177 @@ case class UnionExec(children: Seq[SparkPlan]) extends

Re: [PR] [SPARK-48538][SQL][FOLLOWUP] Add info logs when overriding HiveConf [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun commented on PR #50133: URL: https://github.com/apache/spark/pull/50133#issuecomment-2697945572 May I ask what is the expectation to expose this info to the users, @pan3793? If there is no action item for the users, I'd prefer to keep the existing muted way too.

Re: [PR] [SPARK-51386][CORE][SQL] Assign name to error conditions _LEGACY_ERROR_TEMP_3300-3302 [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun commented on code in PR #50149: URL: https://github.com/apache/spark/pull/50149#discussion_r1979644060 ## core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java: ## @@ -278,15 +278,15 @@ private long trySpillAndAcquire( } } catch (ClosedByInt

Re: [PR] [SPARK-51029][INFRA][FOLLOWUP] Remove LICENSE and NOTICE for unused/duplicated hive deps [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun closed pull request #50146: [SPARK-51029][INFRA][FOLLOWUP] Remove LICENSE and NOTICE for unused/duplicated hive deps URL: https://github.com/apache/spark/pull/50146

Re: [PR] [SPARK-51334][CONNECT] Add java/scala version in analyze spark_version response [spark]

2025-03-04 Thread via GitHub
grundprinzip commented on code in PR #50102: URL: https://github.com/apache/spark/pull/50102#discussion_r1979698080 ## sql/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -261,6 +261,10 @@ message AnalyzePlanResponse { message SparkVersion { string ver

Re: [PR] [SPARK-51349][SQL][TESTS] Change precedence of null and "null" in sorting in QueryTest [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #50108: URL: https://github.com/apache/spark/pull/50108#discussion_r1979387686 ## sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala: ## @@ -326,7 +326,13 @@ object QueryTest extends Assertions { // For binary arrays, we convert i

Re: [PR] [SPARK-51357][SQL] Preserve plan change logging level for views [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #50118: URL: https://github.com/apache/spark/pull/50118#discussion_r1979391152 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ViewResolution.scala: ## @@ -21,7 +21,7 @@ import org.apache.spark.sql.catalyst.plans.logical.{

Re: [PR] [SPARK-51366][SQL] Add a new visitCaseWhen method to V2ExpressionSQLBuilder [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #50129: URL: https://github.com/apache/spark/pull/50129#discussion_r1979443722 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java: ## @@ -253,6 +254,18 @@ protected String visitNot(String v) { prot
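A `visitCaseWhen` method of this kind typically just assembles the `CASE WHEN ... THEN ... [ELSE ...] END` string from child SQL that has already been generated by the builder. A rough sketch of that assembly (hypothetical signature, not the actual `V2ExpressionSQLBuilder` API):

```python
def visit_case_when(branches, else_value=None):
    """Assemble a CASE WHEN expression from pre-rendered child SQL.

    branches: list of (condition_sql, value_sql) pairs, already rendered.
    else_value: optional pre-rendered SQL for the ELSE branch.
    (Sketch only; the real method lives in V2ExpressionSQLBuilder in Java.)"""
    parts = ["CASE"]
    for cond, value in branches:
        parts.append(f"WHEN {cond} THEN {value}")
    if else_value is not None:
        parts.append(f"ELSE {else_value}")
    parts.append("END")
    return " ".join(parts)
```

A dedicated visit method lets dialects override just the CASE WHEN rendering without re-implementing the generic expression traversal.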

Re: [PR] [SPARK-51349][SQL][TESTS] Change precedence of null and "null" in sorting in QueryTest [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #50108: URL: https://github.com/apache/spark/pull/50108#discussion_r1979390083 ## sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala: ## @@ -326,7 +326,13 @@ object QueryTest extends Assertions { // For binary arrays, we convert i

Re: [PR] [SPARK-51386][CORE][SQL] Assign name to error conditions _LEGACY_ERROR_TEMP_3300-3302 [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun commented on code in PR #50149: URL: https://github.com/apache/spark/pull/50149#discussion_r1979644060 ## core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java: ## @@ -278,15 +278,15 @@ private long trySpillAndAcquire( } } catch (ClosedByInt

Re: [PR] [SPARK-51386][CORE][SQL] Assign name to error conditions _LEGACY_ERROR_TEMP_3300-3302 [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun commented on code in PR #50149: URL: https://github.com/apache/spark/pull/50149#discussion_r1979653482 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -1737,6 +1743,12 @@ ], "sqlState" : "42822" }, + "GROW_POINTER_ARRAY_OUT_OF_ME

[PR] [SPARK-51387][BUILD] Upgrade Netty to 4.1.119.Final [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun opened a new pull request, #50150: URL: https://github.com/apache/spark/pull/50150 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-49488][SQL][FOLLOWUP] Use correct MySQL datetime functions when pushing down EXTRACT [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #50112: URL: https://github.com/apache/spark/pull/50112#discussion_r1979427090 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala: ## @@ -54,10 +54,31 @@ private case class MySQLDialect() extends JdbcDialect with SQLConfHel

Re: [PR] [SPARK-51357][SQL] Preserve plan change logging level for views [spark]

2025-03-04 Thread via GitHub
vladimirg-db commented on code in PR #50118: URL: https://github.com/apache/spark/pull/50118#discussion_r1979404973 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ViewResolution.scala: ## @@ -21,7 +21,7 @@ import org.apache.spark.sql.catalyst.plans.logica

Re: [PR] [SPARK-50615][FOLLOWUP][SQL] Avoid dropping metadata in the push rule. [spark]

2025-03-04 Thread via GitHub
cloud-fan closed pull request #50121: [SPARK-50615][FOLLOWUP][SQL] Avoid dropping metadata in the push rule. URL: https://github.com/apache/spark/pull/50121 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-50615][FOLLOWUP][SQL] Avoid dropping metadata in the push rule. [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on PR #50121: URL: https://github.com/apache/spark/pull/50121#issuecomment-2697533239 thanks, merging to master/4.0!

Re: [PR] [SPARK-49488][SQL][FOLLOWUP] Use correct MySQL datetime functions when pushing down EXTRACT [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #50112: URL: https://github.com/apache/spark/pull/50112#discussion_r1979434974 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala: ## @@ -54,10 +54,31 @@ private case class MySQLDialect() extends JdbcDialect with SQLConfHel

Re: [PR] [SPARK-51342][SQL] Add `TimeType` [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #50103: URL: https://github.com/apache/spark/pull/50103#discussion_r1979441532 ## sql/api/src/main/scala/org/apache/spark/sql/types/TimeType.scala: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +
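
The SPARK-51342 thread above adds a `TimeType` to Catalyst. As an illustrative aside (the physical encoding below is an assumption for the sketch, not a statement about Spark's chosen representation), one plausible encoding for a time-of-day value is microseconds since midnight, converted from `java.time.LocalTime`:

```java
import java.time.LocalTime;

public class TimeTypeMicros {
    // Illustrative only: convert a wall-clock time to microseconds
    // since midnight, a common compact encoding for TIME values.
    static long toMicros(LocalTime t) {
        return t.toNanoOfDay() / 1_000L;
    }

    public static void main(String[] args) {
        System.out.println(toMicros(LocalTime.of(1, 0)));  // 3600000000
    }
}
```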

Re: [PR] [SPARK-49488][SQL][FOLLOWUP] Use correct MySQL datetime functions when pushing down EXTRACT [spark]

2025-03-04 Thread via GitHub
cloud-fan commented on code in PR #50112: URL: https://github.com/apache/spark/pull/50112#discussion_r1979431491 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala: ## @@ -54,10 +54,31 @@ private case class MySQLDialect() extends JdbcDialect with SQLConfHel
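
The SPARK-49488 follow-up threads above fix `MySQLDialect` to emit correct MySQL datetime functions when pushing down EXTRACT. A hedged sketch of the general shape (the specific field names and fallbacks here are illustrative, not the dialect's exact mapping): fields that MySQL's `EXTRACT(unit FROM expr)` does not cover directly are routed to dedicated MySQL functions:

```java
public class MySQLExtractSql {
    // Hypothetical mapping: a couple of fields go to MySQL-specific
    // functions, everything else falls back to the standard EXTRACT form.
    static String visitExtract(String field, String source) {
        switch (field) {
            case "DAY_OF_YEAR": return "DAYOFYEAR(" + source + ")";
            case "WEEK":        return "WEEKOFYEAR(" + source + ")";
            default:            return "EXTRACT(" + field + " FROM " + source + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(visitExtract("DAY_OF_YEAR", "ts"));  // DAYOFYEAR(ts)
        System.out.println(visitExtract("YEAR", "ts"));         // EXTRACT(YEAR FROM ts)
    }
}
```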

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-04 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1979750719 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinator.scala: ## @@ -168,9 +204,85 @@ private class StateStoreCoordinator(ove

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-04 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1979752276 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -134,6 +134,9 @@ class RocksDB( rocksDbOptions.setStatistics(new Sta

Re: [PR] [SPARK-51334][CONNECT] Add java/scala version in analyze spark_version response [spark]

2025-03-04 Thread via GitHub
garlandz-db commented on code in PR #50102: URL: https://github.com/apache/spark/pull/50102#discussion_r1979801805 ## sql/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -261,6 +261,10 @@ message AnalyzePlanResponse { message SparkVersion { string vers

Re: [PR] [SPARK-51367][BUILD] Upgrade slf4j to 2.0.17 [spark]

2025-03-04 Thread via GitHub
LuciferYang commented on PR #50115: URL: https://github.com/apache/spark/pull/50115#issuecomment-2698515891 Thank you @dongjoon-hyun

Re: [PR] [MINOR][CORE] Remove unused `private[executor] var httpUrlConnectionTimeoutMillis` from `ExecutorClassLoader` [spark]

2025-03-04 Thread via GitHub
LuciferYang commented on PR #50152: URL: https://github.com/apache/spark/pull/50152#issuecomment-2698516477 Thank you @dongjoon-hyun

Re: [PR] [SPARK-50855][CONNECT][TESTS][FOLLOWUP] Refactor `TransformWithStateConnectSuite` to run `DROP TABLE IF EXISTS my_sink` in `beforeAll/afterEach` [spark]

2025-03-04 Thread via GitHub
LuciferYang commented on PR #50155: URL: https://github.com/apache/spark/pull/50155#issuecomment-2698517492 Thank you @dongjoon-hyun ~

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-04 Thread via GitHub
anoopj commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1979992466 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -311,4 +311,49 @@ default boolean purgeTable(Identifier ident) throws Unsu

Re: [PR] [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun closed pull request #50074: [SPARK-51307][SQL] locationUri in CatalogStorageFormat shall be decoded for display URL: https://github.com/apache/spark/pull/50074

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-04 Thread via GitHub
anoopj commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1980002481 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableBuilderImpl.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-04 Thread via GitHub
anoopj commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1980084846 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -311,4 +311,49 @@ default boolean purgeTable(Identifier ident) throws Unsu
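
The SPARK-51372 threads above introduce a builder pattern in `TableCatalog` (with a `TableBuilderImpl` in the diff context). A toy stand-in for the shape of such an API (method names like `withColumn`/`withProperty`/`create` are assumptions for illustration, not the PR's actual interface): each step returns the builder so table creation reads as one fluent chain instead of a long positional method call:

```java
import java.util.HashMap;
import java.util.Map;

public class TableBuilderSketch {
    // Toy builder: accumulates columns and properties, then "creates".
    static final class TableBuilder {
        private final String name;
        private final Map<String, String> columns = new HashMap<>();
        private final Map<String, String> properties = new HashMap<>();

        TableBuilder(String name) { this.name = name; }

        TableBuilder withColumn(String col, String type) {
            columns.put(col, type);
            return this;  // fluent: each step returns the builder
        }

        TableBuilder withProperty(String key, String value) {
            properties.put(key, value);
            return this;
        }

        String create() {
            return "table " + name + " cols=" + columns.size()
                + " props=" + properties.size();
        }
    }

    public static void main(String[] args) {
        String t = new TableBuilder("t1")
            .withColumn("id", "BIGINT")
            .withProperty("owner", "spark")
            .create();
        System.out.println(t);  // table t1 cols=1 props=1
    }
}
```

A builder also lets catalogs add new optional creation parameters later without breaking existing callers.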

Re: [PR] [SPARK-50855][CONNECT][TESTS][FOLLOWUP] Refactor `TransformWithStateConnectSuite` to run `DROP TABLE IF EXISTS my_sink` in `beforeAll/afterEach` [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun closed pull request #50155: [SPARK-50855][CONNECT][TESTS][FOLLOWUP] Refactor `TransformWithStateConnectSuite` to run `DROP TABLE IF EXISTS my_sink` in `beforeAll/afterEach` URL: https://github.com/apache/spark/pull/50155

Re: [PR] [SPARK-50855][CONNECT][TESTS][FOLLOWUP] Refactor `TransformWithStateConnectSuite` to run `DROP TABLE IF EXISTS my_sink` in `beforeAll/afterEach` [spark]

2025-03-04 Thread via GitHub
dongjoon-hyun commented on PR #50155: URL: https://github.com/apache/spark/pull/50155#issuecomment-2699044698 Merged to master.

[PR] [SPARK-51097] [SS] Split apart SparkPlan metrics and instance metrics [spark]

2025-03-04 Thread via GitHub
zecookiez opened a new pull request, #50157: URL: https://github.com/apache/spark/pull/50157 ### What changes were proposed in this pull request? SPARK-51097 This PR aims to fix the issue of instance metrics being propagated all the way to Spark UI, even when uninit