Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
beliefer closed pull request #50020: [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator URL: https://github.com/apache/spark/pull/50020 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] [SPARK-51432][SQL][PYTHON] Throw a proper exception when Arrow schemas are mismatched [spark]

2025-03-06 Thread via GitHub
HyukjinKwon commented on PR #50201: URL: https://github.com/apache/spark/pull/50201#issuecomment-2705692088 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1984374189 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -132,13 +136,15 @@ private[spark] class BarrierCoordinator( } } -// Cancel

[PR] session extension doc fix [spark]

2025-03-06 Thread via GitHub
Aishwarya-Lakshmi-M opened a new pull request, #50204: URL: https://github.com/apache/spark/pull/50204 **What changes were proposed in this pull request?** Spark session extensions comments enhancement **Why are the changes needed?** The list of sequence given in spark session ex

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
beliefer commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1984560713 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -132,13 +136,15 @@ private[spark] class BarrierCoordinator( } } -// Cancel th

Re: [PR] [SPARK-51432][SQL][PYTHON] Throw a proper exception when Arrow schemas are mismatched [spark]

2025-03-06 Thread via GitHub
HyukjinKwon closed pull request #50201: [SPARK-51432][SQL][PYTHON] Throw a proper exception when Arrow schemas are mismatched URL: https://github.com/apache/spark/pull/50201

Re: [PR] Revert "[SPARK-51396][SQL] RuntimeConfig.getOption shouldn't use exceptions for control flow" [spark]

2025-03-06 Thread via GitHub
HyukjinKwon commented on PR #50200: URL: https://github.com/apache/spark/pull/50200#issuecomment-2705388045 (since this is a clean revert)

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-06 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1984505354 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3197,106 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

Re: [PR] [SPARK-51432][SQL][PYTHON] Throw a proper exception when Arrow schemas are mismatched [spark]

2025-03-06 Thread via GitHub
ueshin commented on code in PR #50201: URL: https://github.com/apache/spark/pull/50201#discussion_r1984423181 ## sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala: ## @@ -252,8 +253,10 @@ case class MapPartitionsInRWithArrowExec( val outputProject = Uns

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-06 Thread via GitHub
wengh commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1984418894 ## python/pyspark/sql/datasource.py: ## @@ -234,6 +249,62 @@ def streamReader(self, schema: StructType) -> "DataSourceStreamReader": ) +ColumnPath = Tuple[s

Re: [PR] [SPARK-47849][PYTHON][CONNECT] Change release script to release pyspark-client [spark]

2025-03-06 Thread via GitHub
HyukjinKwon closed pull request #50203: [SPARK-47849][PYTHON][CONNECT] Change release script to release pyspark-client URL: https://github.com/apache/spark/pull/50203

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-06 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1984474247 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -268,6 +268,11 @@ class InMemoryTableCatalog extends BasicInM

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-06 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1984505894 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3197,106 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-06 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1984499520 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3197,106 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-06 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1984473827 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala: ## @@ -1664,3 +1664,21 @@ case class Call( override protected def with

Re: [PR] [SPARK-50511][PYTHON][FOLLOWUP] Avoid wrapping streaming Python data source error messages [spark]

2025-03-06 Thread via GitHub
HyukjinKwon closed pull request #49532: [SPARK-50511][PYTHON][FOLLOWUP] Avoid wrapping streaming Python data source error messages URL: https://github.com/apache/spark/pull/49532

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-06 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1984471013 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -28,13 +28,7 @@ import scala.util.{Failure, Random, Success, Try} import

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-06 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1984470634 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ProcedureCatalog.java: ## @@ -34,4 +41,11 @@ public interface ProcedureCatalog extends CatalogPlu

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-06 Thread via GitHub
cloud-fan commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1984468771 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1675,6 +1676,86 @@ class SessionCatalog( } } + /** + *

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-06 Thread via GitHub
cloud-fan commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1984459708 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1675,6 +1676,86 @@ class SessionCatalog( } } + /** + *

Re: [PR] Revert "[SPARK-51396][SQL] RuntimeConfig.getOption shouldn't use exceptions for control flow" [spark]

2025-03-06 Thread via GitHub
HyukjinKwon closed pull request #50200: Revert "[SPARK-51396][SQL] RuntimeConfig.getOption shouldn't use exceptions for control flow" URL: https://github.com/apache/spark/pull/50200

[PR] log executing SQL text [spark]

2025-03-06 Thread via GitHub
gabry-lab opened a new pull request, #50202: URL: https://github.com/apache/spark/pull/50202 ### What changes were proposed in this pull request? log executing SQL text ### Why are the changes needed? SQL text is useful to debug JDBC connector ### Does this PR introduc

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-06 Thread via GitHub
cloud-fan commented on PR #50137: URL: https://github.com/apache/spark/pull/50137#issuecomment-2705425290 > Would we deprecate the existing create/replace Table() methods (In TableCatalog and StagingTableCatalog) for a new one that takes in TableInfo instead? Yea this is expected. In

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-06 Thread via GitHub
zecookiez commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r1984179128 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -38,9 +38,14 @@ import org.apache.spark.sql.types.Str

Re: [PR] [SPARK-51432][SQL][PYTHON] Throw a proper exception when Arrow schemas are mismatched [spark]

2025-03-06 Thread via GitHub
HyukjinKwon commented on PR #50201: URL: https://github.com/apache/spark/pull/50201#issuecomment-2705494364 Oops, I mistakenly removed your comment. Yes, it has to be != and fixed

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1984414322 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -132,13 +136,15 @@ private[spark] class BarrierCoordinator( } } -// Cancel

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-06 Thread via GitHub
attilapiros commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1984397600 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3197,106 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocal

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
jayadeep-jayaraman commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1984389105 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -132,13 +136,15 @@ private[spark] class BarrierCoordinator( } } -//

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1984389492 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -132,13 +136,15 @@ private[spark] class BarrierCoordinator( } } -// Cancel

[PR] [SPARK-51432][SQL][PYTHON] Throw a proper exception when Arrow schemas are mismatched [spark]

2025-03-06 Thread via GitHub
HyukjinKwon opened a new pull request, #50201: URL: https://github.com/apache/spark/pull/50201 ### What changes were proposed in this pull request? This PR proposes to raise a proper error instead of `assert` when the returned Arrow schema is mismatched with specified SQL return type.

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
beliefer commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1984387678 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -132,13 +136,15 @@ private[spark] class BarrierCoordinator( } } -// Cancel th

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1984378935 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -80,8 +81,9 @@ private[spark] class BarrierCoordinator( states.forEachValue(1, clearS

Re: [PR] [SPARK-51396][SQL] RuntimeConfig.getOption shouldn't use exceptions for control flow [spark]

2025-03-06 Thread via GitHub
JoshRosen commented on code in PR #50167: URL: https://github.com/apache/spark/pull/50167#discussion_r1984355016 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -6623,12 +6623,14 @@ class SQLConf extends Serializable with Logging with SqlApiCon

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1984373008 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -132,13 +136,15 @@ private[spark] class BarrierCoordinator( } } -// Cancel

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-06 Thread via GitHub
allisonwang-db commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1984271562 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1675,6 +1676,91 @@ class SessionCatalog( } } + /** +

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
beliefer commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1984367043 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -132,13 +136,15 @@ private[spark] class BarrierCoordinator( } } -// Cancel th

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-06 Thread via GitHub
beliefer commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1984362676 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -132,13 +136,15 @@ private[spark] class BarrierCoordinator( } } -// Cancel th

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-06 Thread via GitHub
allisonwang-db commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1984333875 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2655,6 +2656,93 @@ class Analyzer(override val catalogManager: Cata

Re: [PR] [SPARK-51396][SQL] RuntimeConfig.getOption shouldn't use exceptions for control flow [spark]

2025-03-06 Thread via GitHub
JoshRosen commented on code in PR #50167: URL: https://github.com/apache/spark/pull/50167#discussion_r1984351866 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -6623,12 +6623,14 @@ class SQLConf extends Serializable with Logging with SqlApiCon

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-06 Thread via GitHub
dongjoon-hyun commented on code in PR #50159: URL: https://github.com/apache/spark/pull/50159#discussion_r1983844231 ## python/pyspark/sql/connect/client/core.py: ## @@ -666,7 +666,7 @@ def __init__( elif user_id is not None: self._user_id = user_id

Re: [PR] log executing jdbc query sql text [spark]

2025-03-06 Thread via GitHub
gabry-lab closed pull request #49039: log executing jdbc query sql text URL: https://github.com/apache/spark/pull/49039

Re: [PR] [SPARK-51417][CONNECT] Give a second to wait for Spark Connect server to fully start [spark]

2025-03-06 Thread via GitHub
HyukjinKwon commented on PR #50181: URL: https://github.com/apache/spark/pull/50181#issuecomment-2705267278 Merged to master and branch-4.0.

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-06 Thread via GitHub
allisonwang-db commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1984269314 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2636,6 +2637,93 @@ class Analyzer(override val catalogManager: Cata

[PR] [WIP][ML][CONNECT] ML transformed dataframe keep a reference to the model [spark]

2025-03-06 Thread via GitHub
zhengruifeng opened a new pull request, #50199: URL: https://github.com/apache/spark/pull/50199 ### What changes were proposed in this pull request? add the model link in the transformed dataframe ### Why are the changes needed? https://github.com/apache/spark/pull/49948 disabled

Re: [PR] [SPARK-50511][PYTHON][FOLLOWUP] Avoid wrapping streaming Python data source error messages [spark]

2025-03-06 Thread via GitHub
allisonwang-db commented on PR #49532: URL: https://github.com/apache/spark/pull/49532#issuecomment-2705286688 cc @dongjoon-hyun the tests should be fixed now. I'd like to include this fix in 4.0 as well.

[PR] [SPARK-51429][Connect] Add "Acknowledgement" message to ExecutePlanResponse [spark]

2025-03-06 Thread via GitHub
vicennial opened a new pull request, #50193: URL: https://github.com/apache/spark/pull/50193 ### What changes were proposed in this pull request? Adds an `Acknowledgement` message in `ExecutePlanResponse` which is essentially an empty response containing metadata from the

Re: [PR] [SPARK-50511][PYTHON][FOLLOWUP] Avoid wrapping streaming Python data source error messages [spark]

2025-03-06 Thread via GitHub
HyukjinKwon commented on PR #49532: URL: https://github.com/apache/spark/pull/49532#issuecomment-2705289056 Merged to master and branch-4.0.

Re: [PR] [SPARK-51341][CORE] Cancel time task with suitable way. [spark]

2025-03-06 Thread via GitHub
beliefer closed pull request #50107: [SPARK-51341][CORE] Cancel time task with suitable way. URL: https://github.com/apache/spark/pull/50107

[PR] [SPARK-51430][PYTHON] Stop PySpark context logger from propagating logs to stdout [spark]

2025-03-06 Thread via GitHub
allisonwang-db opened a new pull request, #50198: URL: https://github.com/apache/spark/pull/50198 ### What changes were proposed in this pull request? This PR stops PySpark context logger from propagating logs to stdout. ### Why are the changes needed? To improve

Re: [PR] [SPARK-51417][CONNECT] Give a second to wait for Spark Connect server to fully start [spark]

2025-03-06 Thread via GitHub
HyukjinKwon closed pull request #50181: [SPARK-51417][CONNECT] Give a second to wait for Spark Connect server to fully start URL: https://github.com/apache/spark/pull/50181

Re: [PR] corrected link to mllib-guide.md [spark]

2025-03-06 Thread via GitHub
github-actions[bot] commented on PR #48968: URL: https://github.com/apache/spark/pull/48968#issuecomment-2705237613 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [DRAFT][SQL] Old collation resolution PR [spark]

2025-03-06 Thread via GitHub
github-actions[bot] commented on PR #48844: URL: https://github.com/apache/spark/pull/48844#issuecomment-2705237673 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-50413][SQL] Add a flag to insert commands to know if they are part of CTAS [spark]

2025-03-06 Thread via GitHub
github-actions[bot] closed pull request #48956: [SPARK-50413][SQL] Add a flag to insert commands to know if they are part of CTAS URL: https://github.com/apache/spark/pull/48956

Re: [PR] [SPARK-45278][YARN] Support executor bind address in Yarn executors [spark]

2025-03-06 Thread via GitHub
github-actions[bot] commented on PR #47892: URL: https://github.com/apache/spark/pull/47892#issuecomment-2705237707 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-51069][SQL] Add big-endian support to UnsafeRowUtils.validateStructuralIntegrityWithReasonImpl [spark]

2025-03-06 Thread via GitHub
jonathan-albrecht-ibm commented on code in PR #49773: URL: https://github.com/apache/spark/pull/49773#discussion_r1983617226 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/UnsafeRowUtils.scala: ## @@ -74,23 +92,23 @@ object UnsafeRowUtils { case (field,

Re: [PR] [MINOR][DOCS] Fix small grammatical nit [spark]

2025-03-06 Thread via GitHub
HyukjinKwon commented on PR #50196: URL: https://github.com/apache/spark/pull/50196#issuecomment-2705169573 Merged to master and branch-4.0.

Re: [PR] [MINOR][DOCS] Fix small grammatical nit [spark]

2025-03-06 Thread via GitHub
HyukjinKwon closed pull request #50196: [MINOR][DOCS] Fix small grammatical nit URL: https://github.com/apache/spark/pull/50196

Re: [PR] [SPARK-51338] Add automated CI build for `connect-examples` [spark]

2025-03-06 Thread via GitHub
HyukjinKwon commented on code in PR #50187: URL: https://github.com/apache/spark/pull/50187#discussion_r1984185390 ## .github/workflows/build_and_test.yml: ## @@ -1290,3 +1292,35 @@ jobs: cd ui-test npm install --save-dev node --experimental-vm-m

Re: [PR] [SPARK-51338] Add automated CI build for `connect-examples` [spark]

2025-03-06 Thread via GitHub
HyukjinKwon commented on code in PR #50187: URL: https://github.com/apache/spark/pull/50187#discussion_r1984184646 ## .github/workflows/build_and_test.yml: ## @@ -92,6 +92,7 @@ jobs: pyspark_pandas_modules=`cd dev && python -c "import sparktestsupport.modules as m; p

Re: [PR] [SPARK-48922][SQL] Optimize nested data type insertion performance [spark]

2025-03-06 Thread via GitHub
kazuyukitanimura commented on PR #47381: URL: https://github.com/apache/spark/pull/47381#issuecomment-2705164126 Hi @wForget This PR looks important, do you plan to reopen this and rebase on top of [SPARK-49352](https://issues.apache.org/jira/browse/SPARK-49352) ? https://github.com/a

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-06 Thread via GitHub
szehon-ho commented on PR #50109: URL: https://github.com/apache/spark/pull/50109#issuecomment-2705156517 Addressed review (thanks @allisonwang-db ). Also had an offline chat with @cloud-fan , we will not support LIKE in the first cut to avoid consistency problems as different c

Re: [PR] [SPARK-51416][CONNECT] Remove SPARK_CONNECT_MODE when starting Spark Connect server [spark]

2025-03-06 Thread via GitHub
HyukjinKwon closed pull request #50180: [SPARK-51416][CONNECT] Remove SPARK_CONNECT_MODE when starting Spark Connect server URL: https://github.com/apache/spark/pull/50180

Re: [PR] [SPARK-51416][CONNECT] Remove SPARK_CONNECT_MODE when starting Spark Connect server [spark]

2025-03-06 Thread via GitHub
HyukjinKwon commented on PR #50180: URL: https://github.com/apache/spark/pull/50180#issuecomment-2705140055 Merged to master and branch-4.0.

[PR] [WIP][SPARK-51395][SQL] Refine handling of default values in procedures [spark]

2025-03-06 Thread via GitHub
aokolnychyi opened a new pull request, #50197: URL: https://github.com/apache/spark/pull/50197 ### What changes were proposed in this pull request? This PR refines handling of default values in procedures that will be released in 4.0. ### Why are the changes needed?

Re: [PR] [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC [spark]

2025-03-06 Thread via GitHub
beliefer commented on code in PR #49528: URL: https://github.com/apache/spark/pull/49528#discussion_r1983218929 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala: ## @@ -58,17 +59,20 @@ class JdbcRelationProvider extends Creata

Re: [PR] [SPARK-51149][CORE] Log classpath in SparkSubmit on ClassNotFoundException [spark]

2025-03-06 Thread via GitHub
vrozov commented on PR #49870: URL: https://github.com/apache/spark/pull/49870#issuecomment-2704363307 @hvanhovell ?

Re: [PR] [SPARK-51408][YARN][TESTS] AmIpFilterSuite#testProxyUpdate fails in some networks [spark]

2025-03-06 Thread via GitHub
cnauroth commented on PR #50173: URL: https://github.com/apache/spark/pull/50173#issuecomment-2705058426 FYI, apache/hadoop#7478 has a follow-up to keep in sync with the changes here to randomize the fake host name.

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-06 Thread via GitHub
peter-toth commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2705032975 @Pajaraja, I've opened a small PR: https://github.com/Pajaraja/spark/pull/1 to your branch to refactor and fix limit handling. Let's discuss the changes if you have concerns. cc

Re: [PR] [SPARK-51402][SQL][TESTS] Test TimeType in UDF [spark]

2025-03-06 Thread via GitHub
calilisantos commented on code in PR #50194: URL: https://github.com/apache/spark/pull/50194#discussion_r1984097168 ## sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala: ## @@ -1197,6 +1197,34 @@ class UDFSuite extends QueryTest with SharedSparkSession { Row(Ro

[PR] [SPARK-51097] [SS] Re-introduce RocksDB state store's last uploaded snapshot version instance metrics [spark]

2025-03-06 Thread via GitHub
zecookiez opened a new pull request, #50195: URL: https://github.com/apache/spark/pull/50195 … ### What changes were proposed in this pull request? SPARK-51097 #50161 recently had to revert the changes in #49816 due to instance metrics showing up on SparkUI, cau

Re: [PR] [SPARK-51412][K8S] Upgrade Gradle to 8.13 [spark-kubernetes-operator]

2025-03-06 Thread via GitHub
viirya commented on PR #165: URL: https://github.com/apache/spark-kubernetes-operator/pull/165#issuecomment-2704959084 Okay, I got it. Thanks @dongjoon-hyun :)

[PR] [MINOR][DOCS] Fix small grammatical nit [spark]

2025-03-06 Thread via GitHub
the-sakthi opened a new pull request, #50196: URL: https://github.com/apache/spark/pull/50196 ### What changes were proposed in this pull request? A very minor grammatical nit. ### Why are the changes needed? To fix a very minor grammatical nit introduced while taking care of

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-06 Thread via GitHub
gengliangwang commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1983880092 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -311,4 +311,49 @@ default boolean purgeTable(Identifier ident) throw

Re: [PR] [SPARK-51412][K8S] Upgrade Gradle to 8.13 [spark-kubernetes-operator]

2025-03-06 Thread via GitHub
dongjoon-hyun commented on PR #165: URL: https://github.com/apache/spark-kubernetes-operator/pull/165#issuecomment-2704925908 BTW, @viirya . Please check your Apache email box. I sent an email to you. :)

Re: [PR] [SPARK-51402][SQL][TESTS] Test TimeType in UDF [spark]

2025-03-06 Thread via GitHub
MaxGekk commented on code in PR #50194: URL: https://github.com/apache/spark/pull/50194#discussion_r1983987486 ## sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala: ## @@ -862,7 +862,7 @@ class UDFSuite extends QueryTest with SharedSparkSession { .select(myUdf1(

Re: [PR] [SPARK-51424][BUILD] Upgrade ORC to 2.1.1 [spark]

2025-03-06 Thread via GitHub
dongjoon-hyun commented on PR #50189: URL: https://github.com/apache/spark/pull/50189#issuecomment-2704736909 All tests passed. https://github.com/user-attachments/assets/068e298e-14e0-4b00-83c3-d8c4e2d2733d

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-06 Thread via GitHub
ahshahid commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2704813644 > @ahshahid as the solution became very different please update its description too in the PR description. @attilapiros. Sincere thanks for the detailed review.

Re: [PR] [SPARK-51424][BUILD] Upgrade ORC to 2.1.1 [spark]

2025-03-06 Thread via GitHub
dongjoon-hyun closed pull request #50189: [SPARK-51424][BUILD] Upgrade ORC to 2.1.1 URL: https://github.com/apache/spark/pull/50189

[PR] [SPARK-51402][SQL][TESTS] Test TimeType in UDF [spark]

2025-03-06 Thread via GitHub
calilisantos opened a new pull request, #50194: URL: https://github.com/apache/spark/pull/50194 ## **What changes were proposed in this pull request?** Write tests for TimeType in UDF as input parameters and results. ## **Why are the changes needed?** It follows https://issues.ap

Re: [PR] [SPARK-22876][YARN] Respect YARN AM failure validity interval [spark]

2025-03-06 Thread via GitHub
Kimahriman commented on PR #42570: URL: https://github.com/apache/spark/pull/42570#issuecomment-2704798779 Gentle ping, has been working great for us for over a year now -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-44639][SS][YARN] Use Java tmp dir for local RocksDB state storage on Yarn [spark]

2025-03-06 Thread via GitHub
Kimahriman commented on PR #42301: URL: https://github.com/apache/spark/pull/42301#issuecomment-2704793961 Updated to create a util function for this behavior to make it more clear what's happening, and less specific to RocksDB, though the RocksDB state store is the only use case currently.

Re: [PR] [SPARK-51424][BUILD] Upgrade ORC to 2.1.1 [spark]

2025-03-06 Thread via GitHub
dongjoon-hyun commented on PR #50189: URL: https://github.com/apache/spark/pull/50189#issuecomment-2704756151 Merged to master/4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [SPARK-51424][BUILD] Upgrade ORC to 2.1.1 [spark]

2025-03-06 Thread via GitHub
dongjoon-hyun commented on PR #50189: URL: https://github.com/apache/spark/pull/50189#issuecomment-2704738138 Could you review this PR, @huaxingao ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-06 Thread via GitHub
dongjoon-hyun closed pull request #50159: [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` URL: https://github.com/apache/spark/pull/50159 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-06 Thread via GitHub
dongjoon-hyun commented on PR #50159: URL: https://github.com/apache/spark/pull/50159#issuecomment-2704653734 Thank you, @viirya and @HyukjinKwon . Merged to master/4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-06 Thread via GitHub
viirya commented on code in PR #50159: URL: https://github.com/apache/spark/pull/50159#discussion_r1983857249 ## python/pyspark/sql/connect/client/core.py: ## @@ -666,7 +666,7 @@ def __init__( elif user_id is not None: self._user_id = user_id else:

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-06 Thread via GitHub
dongjoon-hyun commented on code in PR #50159: URL: https://github.com/apache/spark/pull/50159#discussion_r1983844231 ## python/pyspark/sql/connect/client/core.py: ## @@ -666,7 +666,7 @@ def __init__( elif user_id is not None: self._user_id = user_id

Re: [PR] [SPARK-51365][TESTS] Test maven + macos [spark]

2025-03-06 Thread via GitHub
LuciferYang commented on PR #50178: URL: https://github.com/apache/spark/pull/50178#issuecomment-2704523092 The issue should be resolvable. The commit history of this pr is a bit messy. I'll submit a clean one and add more descriptions tomorrow -- This is an automated message from the

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-06 Thread via GitHub
viirya commented on code in PR #50159: URL: https://github.com/apache/spark/pull/50159#discussion_r1983788372 ## python/pyspark/sql/connect/client/core.py: ## @@ -666,7 +666,7 @@ def __init__( elif user_id is not None: self._user_id = user_id else:

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-06 Thread via GitHub
anoopj commented on PR #50137: URL: https://github.com/apache/spark/pull/50137#issuecomment-2704514930 > I think we just need to have an interface to hold all table information, and let createTable/replaceTable take it instead of many parameters. Thanks for the feedback. Your suggest

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-06 Thread via GitHub
viirya commented on code in PR #50159: URL: https://github.com/apache/spark/pull/50159#discussion_r1983784966 ## python/pyspark/sql/connect/client/core.py: ## @@ -666,7 +666,7 @@ def __init__( elif user_id is not None: self._user_id = user_id else:

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-06 Thread via GitHub
viirya commented on code in PR #50159: URL: https://github.com/apache/spark/pull/50159#discussion_r1983775276 ## python/pyspark/sql/connect/client/core.py: ## @@ -666,7 +666,7 @@ def __init__( elif user_id is not None: self._user_id = user_id else:

Re: [PR] [SPARK-51097] [SS] [4.0] Revert RocksDB instance metrics changes [spark]

2025-03-06 Thread via GitHub
zecookiez closed pull request #50165: [SPARK-51097] [SS] [4.0] Revert RocksDB instance metrics changes URL: https://github.com/apache/spark/pull/50165 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-51391][SQL][CONNECT] Fix `SparkConnectClient` to respect `SPARK_USER` and `user.name` [spark]

2025-03-06 Thread via GitHub
viirya commented on PR #50159: URL: https://github.com/apache/spark/pull/50159#issuecomment-2704463043 Looking at this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

[PR] [WIP][SPARK-51428][SQL] Reassign Aliases for collated expression trees deterministically [spark]

2025-03-06 Thread via GitHub
vladimirg-db opened a new pull request, #50192: URL: https://github.com/apache/spark/pull/50192 ### What changes were proposed in this pull request? Reassign Aliases for collated expression trees deterministically at the end of Analysis. ### Why are the changes needed?

Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]

2025-03-06 Thread via GitHub
vrozov commented on PR #49928: URL: https://github.com/apache/spark/pull/49928#issuecomment-2704361837 @cloud-fan any action on my side before the PR is merged? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[PR] [SPARK-51425][Connect] Add client API to set custom `operation_id` [spark]

2025-03-06 Thread via GitHub
vicennial opened a new pull request, #50191: URL: https://github.com/apache/spark/pull/50191 ### What changes were proposed in this pull request? Adds an additional optional parameter to the Scala/Python APIs to allow a user to explicitly set an `operation_id`. ### Why

[PR] [WIP][SQL] Add `TimeFormatter` [spark]

2025-03-06 Thread via GitHub
MaxGekk opened a new pull request, #50190: URL: https://github.com/apache/spark/pull/50190 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was
