Re: [PR] [SPARK-51365][SQL][TESTS] Reduce `SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD/RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD` for tests related to `SharedSparkSession/TestHive` when using `macOS + Apple S

2025-03-07 Thread via GitHub
LuciferYang commented on code in PR #50206: URL: https://github.com/apache/spark/pull/50206#discussion_r1985999507 ## .github/workflows/build_maven_java21_macos15.yml: ## @@ -36,5 +36,9 @@ jobs: os: macos-15 envs: >- { - "OBJC_DISABLE_INITIALIZE_F

Re: [PR] [SPARK-51365][SQL][TESTS] Reduce `SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD/RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD` for tests related to `SharedSparkSession/TestHive` when using `macOS + Apple S

2025-03-07 Thread via GitHub
LuciferYang commented on code in PR #50206: URL: https://github.com/apache/spark/pull/50206#discussion_r1985995682 ## .github/workflows/build_and_test.yml: ## @@ -1290,3 +1290,20 @@ jobs: cd ui-test npm install --save-dev node --experimental-vm-m

Re: [PR] [MINOR][SQL] Improve readability of JDBC truncate table condition check [spark]

2025-03-07 Thread via GitHub
jinkachy closed pull request #50207: [MINOR][SQL] Improve readability of JDBC truncate table condition check URL: https://github.com/apache/spark/pull/50207 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [WIP][SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1985966155 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala: ## @@ -108,7 +108,7 @@ trait BroadcastExchangeLike extends Exchange {

Re: [PR] [WIP][SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1985965443 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsPollingSnapshotSource.scala: ## @@ -63,7 +63,7 @@ class Executor

Re: [PR] [MINOR][SQL] Improve readability of JDBC truncate table condition check [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #50207: URL: https://github.com/apache/spark/pull/50207#discussion_r1985965363 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala: ## @@ -54,7 +54,7 @@ class JdbcRelationProvider extends Creatabl

Re: [PR] [SPARK-50639][SQL] Improve warning logging in CacheManager [spark]

2025-03-07 Thread via GitHub
vrozov commented on PR #49276: URL: https://github.com/apache/spark/pull/49276#issuecomment-2704365043 @hvanhovell ? @dongjoon-hyun ?

Re: [PR] [SPARK-51298][SQL] Support variant in CSV scan [spark]

2025-03-07 Thread via GitHub
sandip-db commented on code in PR #50052: URL: https://github.com/apache/spark/pull/50052#discussion_r1972960791 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala: ## @@ -68,6 +69,11 @@ class CSVFileFormat extends TextBasedFileFormat w

Re: [PR] [SPARK-51437][CORE] Let timeoutCheckingTask could response thread interrupt [spark]

2025-03-07 Thread via GitHub
beliefer closed pull request #50211: [SPARK-51437][CORE] Let timeoutCheckingTask could response thread interrupt URL: https://github.com/apache/spark/pull/50211
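The technique named in the PR title — letting a periodic checking task respond to thread interrupts — can be sketched standalone; the class and method names below are illustrative, not the actual Spark code:

```java
public class InterruptibleChecker {
    // Hypothetical periodic check loop. Because Thread.sleep() throws
    // InterruptedException, the loop exits promptly once its thread is interrupted.
    static void runChecks() {
        while (!Thread.currentThread().isInterrupted()) {
            // ... perform the periodic timeout check here ...
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt status
                return;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Thread t = new Thread(InterruptibleChecker::runChecks);
        t.start();
        Thread.sleep(250);  // let a few check iterations run
        t.interrupt();      // request shutdown
        t.join(2000);       // the loop should exit well within this window
        System.out.println("stopped=" + !t.isAlive());
    }
}
```

A loop that swallows `InterruptedException` without restoring the interrupt status (or that blocks on an uninterruptible call) would keep running after `interrupt()`, which is the failure mode such a fix targets.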

Re: [PR] [WIP][SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1985955247 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/FetchedDataPool.scala: ## @@ -139,7 +139,7 @@ private[consumer] class FetchedDataPool

Re: [PR] [WIP][SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1985959471 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -403,7 +403,7 @@ private[deploy] class Worker( // We have exceeded the initial regis

Re: [PR] [WIP][SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1985958868 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -214,10 +214,10 @@ private[deploy] class Master( applicationMetricsSystem.report()

Re: [PR] [WIP][SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1985958618 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -214,10 +214,10 @@ private[deploy] class Master( applicationMetricsSystem.report()

Re: [PR] [WIP][SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1985957890 ## core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala: ## @@ -277,7 +277,7 @@ private[spark] class StandaloneAppClient( override def o

Re: [PR] [WIP][SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1985954100 ## core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala: ## @@ -250,7 +250,7 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock) ov
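The `mayInterruptIfRunning` flag in this PR's title is the standard parameter of `java.util.concurrent.Future.cancel`; a minimal standalone sketch (not the Spark code under review) of what passing `true` changes:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancelDemo {
    // Returns true if cancel(true) actually interrupted the in-flight task.
    static boolean cancelInterruptsRunningTask() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch interrupted = new CountDownLatch(1);
        Future<?> task = pool.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // simulate long-running work
            } catch (InterruptedException e) {
                interrupted.countDown(); // reached only if the worker thread is interrupted
            }
        });
        started.await();
        // cancel(false) only prevents a not-yet-started task from running;
        // cancel(true) additionally interrupts the thread already running the task.
        task.cancel(true);
        boolean wasInterrupted = interrupted.await(5, TimeUnit.SECONDS);
        pool.shutdownNow();
        return wasInterrupted;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("interrupted=" + cancelInterruptsRunningTask());
    }
}
```

With `cancel(false)` the sleeping task would keep running to completion despite being marked cancelled, which is why the choice of flag matters for tasks that must stop promptly.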

Re: [PR] [SPARK-51437][CORE] Let timeoutCheckingTask could response thread interrupt [spark]

2025-03-07 Thread via GitHub
beliefer commented on PR #50211: URL: https://github.com/apache/spark/pull/50211#issuecomment-2707940074 I make a mistake.

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1985910327 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/UserDefinedPythonDataSource.scala: ## @@ -300,6 +321,94 @@ private class UserDefinedPyt

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1985908707 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonScanBuilder.scala: ## @@ -25,6 +27,40 @@ class PythonScanBuilder( ds: Python

Re: [PR] [SPARK-51437][CORE] Let timeoutCheckingTask could response thread interrupt [spark]

2025-03-07 Thread via GitHub
beliefer commented on PR #50211: URL: https://github.com/apache/spark/pull/50211#issuecomment-2707862706 ping @srowen @dongjoon-hyun @LuciferYang

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1985905297 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonScan.scala: ## @@ -16,26 +16,43 @@ */ package org.apache.spark.sql.execution.d

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1985895545 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonScanBuilder.scala: ## @@ -25,6 +27,40 @@ class PythonScanBuilder( ds: Python

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1985896731 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonScanBuilder.scala: ## @@ -25,6 +27,40 @@ class PythonScanBuilder( ds: Python

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-07 Thread via GitHub
attilapiros commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1985859145 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1898,24 +1898,34 @@ private[spark] class DAGScheduler( // Make sure the task's acc

Re: [PR] [DRAFT][SQL] Old collation resolution PR [spark]

2025-03-07 Thread via GitHub
github-actions[bot] closed pull request #48844: [DRAFT][SQL] Old collation resolution PR URL: https://github.com/apache/spark/pull/48844

Re: [PR] [MINOR][SQL] Format the SqlBaseParser.g4 [spark]

2025-03-07 Thread via GitHub
beliefer commented on PR #49987: URL: https://github.com/apache/spark/pull/49987#issuecomment-2707785792 @dongjoon-hyun Thank you!

Re: [PR] corrected link to mllib-guide.md [spark]

2025-03-07 Thread via GitHub
github-actions[bot] closed pull request #48968: corrected link to mllib-guide.md URL: https://github.com/apache/spark/pull/48968

Re: [PR] [SPARK-45278][YARN] Support executor bind address in Yarn executors [spark]

2025-03-07 Thread via GitHub
github-actions[bot] closed pull request #47892: [SPARK-45278][YARN] Support executor bind address in Yarn executors URL: https://github.com/apache/spark/pull/47892

Re: [PR] [SPARK-51364][SQL][TESTS] Improve the integration tests for external data source by check filter pushed down [spark]

2025-03-07 Thread via GitHub
beliefer commented on PR #50126: URL: https://github.com/apache/spark/pull/50126#issuecomment-2707762020 @dongjoon-hyun @LuciferYang Thank you!

Re: [PR] [SPARK-51386][CORE][SQL] Assign name to error conditions _LEGACY_ERROR_TEMP_3300-3302 [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #50149: URL: https://github.com/apache/spark/pull/50149#issuecomment-2707390437 Merged to master. Thank you, @pan3793 and all.

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-07 Thread via GitHub
aokolnychyi commented on PR #50137: URL: https://github.com/apache/spark/pull/50137#issuecomment-2707494998 @cloud-fan, I don't have a strong opinion on this one and can go either way. One point to keep in mind that Spark will not have the guarantee that the newly added information to

Re: [PR] [MINOR][SQL] Format the SqlBaseParser.g4 [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #49987: URL: https://github.com/apache/spark/pull/49987#issuecomment-2707398360 If there is no proper automation tool, I still prefer to keep it in the AS-IS status because this is controversial, @beliefer . However, I removed my request change on this PR. I'll

[PR] [SPARK-51440] classify the NPE when null topic field value is in kafka message data and there is no topic option [spark]

2025-03-07 Thread via GitHub
huanliwang-db opened a new pull request, #50214: URL: https://github.com/apache/spark/pull/50214 ### What changes were proposed in this pull request? We are throwing out the NPE now when null topic field value is in kafka message data and there is no topic option. Intr

Re: [PR] [MINOR][CORE] Remove redundant synchronized in ThreadUtils [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun closed pull request #50210: [MINOR][CORE] Remove redundant synchronized in ThreadUtils URL: https://github.com/apache/spark/pull/50210

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-07 Thread via GitHub
wengh commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1985505029 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonScanBuilder.scala: ## @@ -25,6 +27,40 @@ class PythonScanBuilder( ds: PythonDat

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-07 Thread via GitHub
wengh commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1985560600 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/UserDefinedPythonDataSource.scala: ## @@ -300,6 +321,94 @@ private class UserDefinedPython

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on code in PR #50122: URL: https://github.com/apache/spark/pull/50122#discussion_r1985684458 ## core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala: ## @@ -474,6 +474,26 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with

Re: [PR] [SPARK-51386][CORE][SQL] Assign name to error conditions _LEGACY_ERROR_TEMP_3300-3302 [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun closed pull request #50149: [SPARK-51386][CORE][SQL] Assign name to error conditions _LEGACY_ERROR_TEMP_3300-3302 URL: https://github.com/apache/spark/pull/50149

Re: [PR] [SPARK-51425][Connect] Add client API to set custom `operation_id` [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun closed pull request #50191: [SPARK-51425][Connect] Add client API to set custom `operation_id` URL: https://github.com/apache/spark/pull/50191

Re: [PR] [MINOR][CORE] Remove redundant synchronized in ThreadUtils [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #50210: URL: https://github.com/apache/spark/pull/50210#issuecomment-2707360419 Merged to master. Thank you, @jinkachy and all.

Re: [PR] [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common [spark]

2025-03-07 Thread via GitHub
vrozov commented on code in PR #49971: URL: https://github.com/apache/spark/pull/49971#discussion_r1985642033 ## sql/connect/common/pom.xml: ## @@ -142,8 +218,26 @@ org.spark-project.spark:unused

Re: [PR] [MINOR][CORE] Remove redundant synchronized in ThreadUtils [spark]

2025-03-07 Thread via GitHub
jinkachy commented on PR #50210: URL: https://github.com/apache/spark/pull/50210#issuecomment-2707270252 > Please config github action. done

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-07 Thread via GitHub
allisonwang-db commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1985591167 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1675,6 +1676,86 @@ class SessionCatalog( } } + /** +

Re: [PR] [SPARK-51288][DOCS] Add link for Scala API of Spark Connect [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun closed pull request #50042: [SPARK-51288][DOCS] Add link for Scala API of Spark Connect URL: https://github.com/apache/spark/pull/50042

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-07 Thread via GitHub
allisonwang-db commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1985588345 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1675,6 +1676,86 @@ class SessionCatalog( } } + /** +

Re: [PR] [SPARK-51334][CONNECT] Add java/scala version in analyze spark_version response [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #50102: URL: https://github.com/apache/spark/pull/50102#issuecomment-2707231534 To @grundprinzip and @garlandz-db , could you propose new messages instead of touching the existing message `SparkVersion `? This kind of piggy-backing is not a good design choice b

Re: [PR] [SPARK-51364][SQL][TESTS] Improve the integration tests for external data source by check filter pushed down [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun closed pull request #50126: [SPARK-51364][SQL][TESTS] Improve the integration tests for external data source by check filter pushed down URL: https://github.com/apache/spark/pull/50126

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-07 Thread via GitHub
wengh commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1985529476 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonScanBuilder.scala: ## @@ -25,6 +27,40 @@ class PythonScanBuilder( ds: PythonDat

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-07 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1985528366 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3197,106 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSpa

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-07 Thread via GitHub
wengh commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1985500079 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/UserDefinedPythonDataSource.scala: ## @@ -300,6 +321,94 @@ private class UserDefinedPython

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2707102450 Thank you, @pan3793 ! And, sorry for your inconvenience.

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
pan3793 commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2707065925 > > For this one PR, I believe we need a verification for different HMS versions to make it sure. > > that's a valid concern, since Spark CI only covers embedded HMS client case, l

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
pan3793 commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2707088340 Get your point, and let me respond in the voting mail

Re: [PR] [SPARK-51029][BUILD] Remove `hive-llap-common` compile dependency [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #49725: URL: https://github.com/apache/spark/pull/49725#issuecomment-2707079221 Thank you for the comments, @pan3793 .

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2707077185 The RC is supposed to gather those kind of feedbacks and difficulties. There is no Apache Spark 4.0.0 until we have a community-blessed one.

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2707075252 No, what I meant here was that your concern is legitimate. So, you can raise your concerns to the broader audience, @pan3793 . For example, dev@spark instead of this PR which is com

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2707044958 > In short, my conclusion is, we should and must keep all jars required by Hive built-in UDF to allow `o.a.h.hive.ql.exec.FunctionRegistry` initialization, for other jars like commo

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
pan3793 commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2707031841 In short, my conclusion is, we should and must keep all jars required by Hive built-in UDF to allow `o.a.h.hive.ql.exec.FunctionRegistry` initialization, for other jars like commons-lang,

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
pan3793 commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2707020792 > Are you assuming to rebuild all Hive UDF jars here? @dongjoon-hyun I never made such an assumption, most of the existing UDFs should work without any change, except to: the UDFs e

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2706978635 BTW, thank you for taking a look at removing this. I support your direction and I hope we can revisit this with you for Apache Spark 4.1.0 timeframe, @pan3793 .

Re: [PR] [WIP][SPARK-51395][SQL] Refine handling of default values in procedures [spark]

2025-03-07 Thread via GitHub
aokolnychyi closed pull request #50197: [WIP][SPARK-51395][SQL] Refine handling of default values in procedures URL: https://github.com/apache/spark/pull/50197

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2706983644 I added this to a subtask of SPARK-48231 .

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
dongjoon-hyun commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2706974909 Thank you for checking, @pan3793 . Are you assuming to rebuild all Hive UDF jars here? I'm wondering if you are presenting the test result with old Hive built-UDF jars here.

Re: [PR] [SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies [spark]

2025-03-07 Thread via GitHub
pan3793 commented on PR #46521: URL: https://github.com/apache/spark/pull/46521#issuecomment-2706965438 > For this one PR, I believe we need a verification for different HMS versions to make it sure. @dongjoon-hyun I managed to set up an env to test the IsolatedClassLoader, it works

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-07 Thread via GitHub
attilapiros commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1985391930 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -3185,16 +3197,106 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocal

Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]

2025-03-07 Thread via GitHub
vrozov commented on code in PR #49928: URL: https://github.com/apache/spark/pull/49928#discussion_r1985384660 ## sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameReaderWriterSuite.java: ## @@ -152,4 +159,16 @@ public void testOrcAPI() { spark.read().schema(sche

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-07 Thread via GitHub
attilapiros commented on PR #50122: URL: https://github.com/apache/spark/pull/50122#issuecomment-2706931664 cc @gengliangwang

[PR] [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 [spark]

2025-03-07 Thread via GitHub
vrozov opened a new pull request, #50213: URL: https://github.com/apache/spark/pull/50213 ### What changes were proposed in this pull request? Upgrade Hive compile time dependency to 4.0.1 ### Why are the changes needed? Apache Hive 1.x, 2.x and 3.x are EOL ### Doe

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-07 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1985302503 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-07 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1985300518 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [MINOR][SPARK-CORE] Remove redundant synchronized in ThreadUtils [spark]

2025-03-07 Thread via GitHub
beliefer commented on code in PR #50210: URL: https://github.com/apache/spark/pull/50210#discussion_r1985118560 ## core/src/main/scala/org/apache/spark/util/ThreadUtils.scala: ## @@ -65,7 +65,7 @@ private[spark] object ThreadUtils { } } -override def isTerminat

Re: [PR] [SPARK-51365][SQL][TESTS] Reduce `SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD/RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD` for tests related to `SharedSparkSession/TestHive` when using `macOS + Apple S

2025-03-07 Thread via GitHub
LuciferYang commented on code in PR #50206: URL: https://github.com/apache/spark/pull/50206#discussion_r1985261482 ## sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala: ## @@ -79,6 +80,15 @@ trait SharedSparkSessionBase StaticSQLConf.WAREHOUSE_PAT

Re: [PR] [SPARK-51314][DOCS][PS] Add proper note for distributed-sequence about indeterministic case [spark]

2025-03-07 Thread via GitHub
the-sakthi commented on PR #50086: URL: https://github.com/apache/spark/pull/50086#issuecomment-2704944701 I noticed a very minor grammatical nit here, apologies for the oversight. Have created a PR to quickly address that: https://github.com/apache/spark/pull/50196 @itholic

[PR] [SPARK-51438][SQL] Make CatalystDataToProtobuf and ProtobufDataToCatalyst properly comparable and hashable [spark]

2025-03-07 Thread via GitHub
vladimirg-db opened a new pull request, #50212: URL: https://github.com/apache/spark/pull/50212 ### What changes were proposed in this pull request? Hand-roll `equals` and `hashCode` for `CatalystDataToProtobuf` and `ProtobufDataToCatalyst`. ### Why are the changes needed?

Re: [PR] [SPARK-51425][Connect] Add client API to set custom `operation_id` [spark]

2025-03-07 Thread via GitHub
vicennial commented on PR #50191: URL: https://github.com/apache/spark/pull/50191#issuecomment-2706706239 Thanks for the review! CI is green after some lint changes :)

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-07 Thread via GitHub
peter-toth commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2706641950 > However, we may push down local limit without global limit and at the end they can be very far away. I think we disagree a bit here. While the above is true, a `LocalLimit(n)`

Re: [PR] [SPARK-50992][SQL] OOMs and performance issues with AQE in large plans [spark]

2025-03-07 Thread via GitHub
JackBuggins commented on PR #49724: URL: https://github.com/apache/spark/pull/49724#issuecomment-2706585219 Strongly agree with @SauronShepherd, many will have workflows where the final plan on the UI is not critical, many opt to debug and understand plans via explain. Off and Off wi

Re: [PR] [MINOR][SPARK-CORE] Remove redundant synchronized in ThreadUtils [spark]

2025-03-07 Thread via GitHub
beliefer commented on PR #50210: URL: https://github.com/apache/spark/pull/50210#issuecomment-2706543971 Please configure GitHub Actions.

Re: [PR] [SPARK-51366][SQL] Add a new visitCaseWhen method to V2ExpressionSQLBuilder [spark]

2025-03-07 Thread via GitHub
beliefer closed pull request #50129: [SPARK-51366][SQL] Add a new visitCaseWhen method to V2ExpressionSQLBuilder URL: https://github.com/apache/spark/pull/50129

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-07 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1985046279 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/cteOperators.scala: ## @@ -40,7 +40,8 @@ case class UnionLoop( id: Long, anchor:

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-07 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1985063422 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-07 Thread via GitHub
Pajaraja commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2706383952 > A side note, for those use cases where the `UnionLoop` is infinite we should probably introduce a config similar to `spark.sql.cteRecursionLevelLimit`, but to limit the number of rows re

[PR] [WIP][SPARK-51437][CORE] Let timeoutCheckingTask respond to thread interrupts [spark]

2025-03-07 Thread via GitHub
beliefer opened a new pull request, #50211: URL: https://github.com/apache/spark/pull/50211 ### What changes were proposed in this pull request? This PR proposes to let `timeoutCheckingTask` respond to thread interrupts. ### Why are the changes needed? Currently, we cance
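The fix being described can be sketched in plain Java (an illustration only, not Spark's actual `timeoutCheckingTask` code): a scheduled task that observes thread interruption, so that cancelling its future with `mayInterruptIfRunning = true` actually stops it even while it is blocking.

```java
import java.util.concurrent.*;

// Sketch: a periodic task that honors interruption, so
// ScheduledFuture.cancel(true) can stop it mid-sleep.
public class InterruptibleTask {
    public static boolean runOnce() throws Exception {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch observed = new CountDownLatch(1);
        ScheduledFuture<?> f = ses.scheduleAtFixedRate(() -> {
            try {
                Thread.sleep(60_000);                // simulated long blocking work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // restore the interrupt status
                observed.countDown();                // record that we saw the interrupt
            }
        }, 0, 1, TimeUnit.SECONDS);
        Thread.sleep(200);                           // let the task start blocking
        f.cancel(true);                              // true => interrupt the running thread
        boolean sawInterrupt = observed.await(5, TimeUnit.SECONDS);
        ses.shutdownNow();
        return sawInterrupt;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("task observed interrupt: " + runOnce());
    }
}
```

Without the `try`/`catch` around the blocking call (or an explicit `Thread.currentThread().isInterrupted()` check in a loop), `cancel(true)` would set the interrupt flag but the task would never react to it.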

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-07 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r1985050893 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [MINOR][SQL] Slightly refactor and optimize illegality check in Recursive CTE Subqueries [spark]

2025-03-07 Thread via GitHub
cloud-fan commented on code in PR #50208: URL: https://github.com/apache/spark/pull/50208#discussion_r1985031301 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1037,10 +1034,26 @@ trait CheckAnalysis extends LookupCatalog with

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-07 Thread via GitHub
cloud-fan commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2706427848 > I think LocalLimit(n)'s purpose is to provide a cheap max n row limiter. We don't have a user-facing API for local limit and local limit is always generated from a global limit,
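The local/global split being discussed can be illustrated outside Spark with a toy model (hypothetical code, not Spark's operators): a local limit cheaply caps each partition at n rows, and a global limit then caps the merged stream at n rows overall.

```java
import java.util.*;
import java.util.stream.*;

// Toy model of Spark's LocalLimit / GlobalLimit pair: partitions are
// plain lists, LocalLimit caps each partition, GlobalLimit caps the union.
public class LimitSketch {
    static List<List<Integer>> localLimit(List<List<Integer>> partitions, int n) {
        return partitions.stream()
            .map(p -> p.stream().limit(n).collect(Collectors.toList()))
            .collect(Collectors.toList());
    }

    static List<Integer> globalLimit(List<List<Integer>> partitions, int n) {
        return partitions.stream().flatMap(List::stream).limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
            Arrays.asList(1, 2, 3, 4), Arrays.asList(5, 6, 7));
        List<List<Integer>> local = localLimit(parts, 2);  // [[1, 2], [5, 6]]
        System.out.println(globalLimit(local, 2));         // [1, 2]
    }
}
```

This also shows why a local limit alone is not a correct final answer: it can still leave up to n rows per partition, which is why the global limit must follow it.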

Re: [PR] [MINOR][SQL] Slightly refactor and optimize illegality check in Recursive CTE Subqueries [spark]

2025-03-07 Thread via GitHub
peter-toth commented on code in PR #50208: URL: https://github.com/apache/spark/pull/50208#discussion_r1985025062 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1037,10 +1034,26 @@ trait CheckAnalysis extends LookupCatalog with

Re: [PR] [MINOR][SQL] Slightly refactor and optimize illegality check in Recursive CTE Subqueries [spark]

2025-03-07 Thread via GitHub
cloud-fan closed pull request #50208: [MINOR][SQL] Slightly refactor and optimize illegality check in Recursive CTE Subqueries URL: https://github.com/apache/spark/pull/50208

Re: [PR] [MINOR][SQL] Slightly refactor and optimize illegality check in Recursive CTE Subqueries [spark]

2025-03-07 Thread via GitHub
cloud-fan commented on PR #50208: URL: https://github.com/apache/spark/pull/50208#issuecomment-2706396016 thanks, merging to master/4.0!

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-07 Thread via GitHub
peter-toth commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2706392622 > This makes sense, but I wonder how would we tell apart the cases where it's an infinite recursion, and we're returning the first k (k modifiable in flag) results vs a finite (but ver

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-07 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1984471179 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala: ## @@ -651,5 +651,4 @@ class InMemoryCatalog( requireDbExists(db)

[PR] [MINOR][SPARK-CORE] Remove redundant synchronized in ThreadUtils [spark]

2025-03-07 Thread via GitHub
jinkachy opened a new pull request, #50210: URL: https://github.com/apache/spark/pull/50210 ### What changes were proposed in this pull request? This PR removes the redundant `synchronized` keyword from the `isTerminated` method in `sameThreadExecutorService()` imp
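The reasoning behind removing the lock can be illustrated with a minimal same-thread executor (a hypothetical sketch, not Spark's actual `ThreadUtils` implementation): when the only mutable state is a `volatile` flag and every task runs inline on the caller's thread, a read-only method such as `isTerminated()` gains nothing from `synchronized`.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a same-thread ExecutorService. The shutdown flag is volatile,
// so reads are already safely published without any synchronized block.
public class SameThreadExecutor extends AbstractExecutorService {
    private volatile boolean shutdown = false;

    @Override public void execute(Runnable command) { command.run(); }
    @Override public void shutdown() { shutdown = true; }
    @Override public List<Runnable> shutdownNow() {
        shutdown = true;
        return Collections.emptyList();
    }
    @Override public boolean isShutdown() { return shutdown; }
    // Tasks run inline on the caller thread, so there are never in-flight
    // tasks after shutdown: shutdown and terminated coincide, and reading
    // a volatile field needs no lock.
    @Override public boolean isTerminated() { return shutdown; }
    @Override public boolean awaitTermination(long timeout, TimeUnit unit) {
        return shutdown;
    }

    public static void main(String[] args) {
        SameThreadExecutor ex = new SameThreadExecutor();
        ex.execute(() -> System.out.println("ran inline"));
        ex.shutdown();
        System.out.println(ex.isTerminated());
    }
}
```

The `synchronized` keyword would only matter if `isTerminated()` had to observe a compound invariant across several fields; a single `volatile` read is already atomic and visible.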

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-07 Thread via GitHub
peter-toth commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2706311534 + A side note, for those use cases where the `UnionLoop` is infinite we should probably introduce a config similar to `spark.sql.cteRecursionLevelLimit`, but limit the number of rows ret
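The proposed safeguard could work along these lines (a sketch with hypothetical limits, modeled on the existing `spark.sql.cteRecursionLevelLimit`): the loop driver tracks both the recursion depth and the accumulated row count, and aborts when either limit is exceeded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a recursion driver guarded by a level limit and a row limit.
// Not Spark code; the step function and limits are illustrative only.
public class RecursionGuard {
    static List<Integer> run(int levelLimit, long rowLimit) {
        List<Integer> result = new ArrayList<>();
        List<Integer> current = Arrays.asList(0);   // anchor of the recursion
        int level = 0;
        while (!current.isEmpty()) {
            if (level++ > levelLimit)
                throw new IllegalStateException("recursion level limit exceeded");
            result.addAll(current);                 // accumulate this iteration
            if (result.size() > rowLimit)
                throw new IllegalStateException("recursion row limit exceeded");
            List<Integer> next = new ArrayList<>();
            for (int v : current)
                if (v < 100) next.add(v + 1);       // recursive step (terminates at 100)
            current = next;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 10_000).size());
    }
}
```

A level limit alone does not bound memory when each iteration multiplies the row count, which is why a separate row-count cap is useful for infinite or wide recursions.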

Re: [PR] [SPARK-51029][BUILD] Remove `hive-llap-common` compile dependency [spark]

2025-03-07 Thread via GitHub
pan3793 commented on PR #49725: URL: https://github.com/apache/spark/pull/49725#issuecomment-2706298521 Sorry, I can't follow the decision of removing `hive-llap-common-2.3.10.jar` from the Spark dist; it technically breaks the "Support Hive UDF" feature. Without this jar, Spark is not able to

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-07 Thread via GitHub
cloud-fan commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r1984617330 ## python/pyspark/sql/datasource.py: ## @@ -234,6 +249,62 @@ def streamReader(self, schema: StructType) -> "DataSourceStreamReader": ) +ColumnPath = Tup

[PR] [WIP][SPARK-51436][SQL] Change the mayInterruptIfRunning from true to false [spark]

2025-03-07 Thread via GitHub
beliefer opened a new pull request, #50209: URL: https://github.com/apache/spark/pull/50209 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? 'No'. ### How was t
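For context on the change in the title, Java's `Future.cancel(mayInterruptIfRunning)` (which Scala's futures delegate to on the JVM) behaves as follows; this is a standalone illustration of the standard library semantics, not the Spark code being modified. Passing `true` interrupts a task that has already started, while `false` only prevents not-yet-started tasks from running.

```java
import java.util.concurrent.*;

// Demonstrates the two cancel modes: cancel(true) interrupts a running
// task, cancel(false) leaves a running task untouched.
public class CancelSemantics {
    public static boolean interrupted(boolean mayInterruptIfRunning) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch sawInterrupt = new CountDownLatch(1);
        Future<?> f = pool.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000);          // long-running work
            } catch (InterruptedException e) {
                sawInterrupt.countDown();      // the interrupt reached the task
            }
        });
        started.await();                       // make sure the task is running
        f.cancel(mayInterruptIfRunning);
        boolean result = sawInterrupt.await(1, TimeUnit.SECONDS);
        pool.shutdownNow();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cancel(true)  interrupts: " + interrupted(true));
        System.out.println("cancel(false) interrupts: " + interrupted(false));
    }
}
```

Switching from `true` to `false` therefore trades prompt termination for safety: the task is never interrupted mid-operation, but it also runs to completion on its own.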

[PR] [MINOR][SQL] Slightly refactor and optimize illegality check in Recursive CTE Subqueries [spark]

2025-03-07 Thread via GitHub
Pajaraja opened a new pull request, #50208: URL: https://github.com/apache/spark/pull/50208 ### What changes were proposed in this pull request? Change the place where we check whether there is a recursive CTE within a subquery. Also, change implementation to be instead of collecting

[PR] Revert "[SPARK-51396][SQL] RuntimeConfig.getOption shouldn't use exceptions for control flow" [spark]

2025-03-07 Thread via GitHub
JoshRosen opened a new pull request, #50200: URL: https://github.com/apache/spark/pull/50200 ### What changes were proposed in this pull request? This reverts commit db06293dd100b4f2a4efe3e7624a9be2345e6575 / https://github.com/apache/spark/pull/50167. That PR introduced a subt
