Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-08 Thread via GitHub
jjayadeep06 commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1983092651 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -80,8 +82,13 @@ private[spark] class BarrierCoordinator( states.forEachValue(1, clear

Re: [PR] [SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-08 Thread via GitHub
srowen commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1986124610 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/FetchedDataPool.scala: ## @@ -139,7 +139,7 @@ private[consumer] class FetchedDataPool(

Re: [PR] [SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-08 Thread via GitHub
srowen commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1986206274 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/FetchedDataPool.scala: ## @@ -139,7 +139,7 @@ private[consumer] class FetchedDataPool(

Re: [PR] [SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-08 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1986207234 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/FetchedDataPool.scala: ## @@ -139,7 +139,7 @@ private[consumer] class FetchedDataPool

Re: [PR] [SPARK-51338][INFRA] Add automated CI build for `connect-examples` [spark]

2025-03-08 Thread via GitHub
srowen commented on PR #50187: URL: https://github.com/apache/spark/pull/50187#issuecomment-2708415916 Can the examples module simply point to SNAPSHOT versions like everything else in the build? The main branch code is always pointing at unreleased code, but on release, those SNAPSHOT vers

Re: [PR] [SPARK-51338][INFRA] Add automated CI build for `connect-examples` [spark]

2025-03-08 Thread via GitHub
LuciferYang commented on PR #50187: URL: https://github.com/apache/spark/pull/50187#issuecomment-2708419792 > Can the examples module simply point to SNAPSHOT versions like everything else in the build? the main branch code is always pointing at unreleased code, but on release, those SNAPSH

Re: [PR] [SPARK-51338] Add automated CI build for `connect-examples` [spark]

2025-03-08 Thread via GitHub
hvanhovell commented on PR #50187: URL: https://github.com/apache/spark/pull/50187#issuecomment-2704355339 @HyukjinKwon can you take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[PR] Change host to ip [spark]

2025-03-08 Thread via GitHub
AryelSouza opened a new pull request, #50216: URL: https://github.com/apache/spark/pull/50216 An issue was opened about the necessity of changing host to ip because ip would be deprecated for the documentation

Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]

2025-03-08 Thread via GitHub
vrozov commented on PR #49928: URL: https://github.com/apache/spark/pull/49928#issuecomment-2708396116 @cloud-fan @LuciferYang I prefer to keep the Java test, as it does not hurt and: 1. There is a similar test in R even though it is not R-specific. 2. Other tests in `JavaDataFr

Re: [PR] [SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-08 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1986190446 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/FetchedDataPool.scala: ## @@ -139,7 +139,7 @@ private[consumer] class FetchedDataPool

Re: [PR] [SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-08 Thread via GitHub
srowen commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1986190918 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/FetchedDataPool.scala: ## @@ -139,7 +139,7 @@ private[consumer] class FetchedDataPool(

Re: [PR] Remove session string calls [spark]

2025-03-08 Thread via GitHub
github-actions[bot] commented on PR #48974: URL: https://github.com/apache/spark/pull/48974#issuecomment-2708582247 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-08 Thread via GitHub
beliefer commented on code in PR #50209: URL: https://github.com/apache/spark/pull/50209#discussion_r1986202471 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/FetchedDataPool.scala: ## @@ -139,7 +139,7 @@ private[consumer] class FetchedDataPool

Re: [PR] [SPARK-51365][SQL][TESTS] Reduce `SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD/RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD` for tests related to `SharedSparkSession/TestHive` when using `macOS + Apple S

2025-03-08 Thread via GitHub
dongjoon-hyun commented on code in PR #50206: URL: https://github.com/apache/spark/pull/50206#discussion_r1986220526 ## .github/workflows/build_maven_java21_macos15.yml: ## @@ -36,5 +36,9 @@ jobs: os: macos-15 envs: >- { - "OBJC_DISABLE_INITIALIZE

Re: [PR] [SPARK-51365][SQL][TESTS] Reduce `SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD/RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD` for tests related to `SharedSparkSession/TestHive` when using `macOS + Apple S

2025-03-08 Thread via GitHub
LuciferYang commented on code in PR #50206: URL: https://github.com/apache/spark/pull/50206#discussion_r1986220726 ## .github/workflows/build_maven_java21_macos15.yml: ## @@ -36,5 +36,9 @@ jobs: os: macos-15 envs: >- { - "OBJC_DISABLE_INITIALIZE_F

[PR] [SPARK-51444][CORE] Remove unreachable code from `TaskSchedulerImpl#statusUpdate` [spark]

2025-03-08 Thread via GitHub
LuciferYang opened a new pull request, #50218: URL: https://github.com/apache/spark/pull/50218 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51402][SQL][TESTS] Test TimeType in UDF [spark]

2025-03-08 Thread via GitHub
MaxGekk commented on code in PR #50194: URL: https://github.com/apache/spark/pull/50194#discussion_r1986236836 ## dev/create-release/release-build.sh: ## @@ -137,6 +137,12 @@ if [[ "$1" == "finalize" ]]; then --repository-url https://upload.pypi.org/legacy/ \ "pyspark_

Re: [PR] [SPARK-51436][CORE][SQL][K8s][SS] Fix bug that cancel Future specified mayInterruptIfRunning with true [spark]

2025-03-08 Thread via GitHub
beliefer commented on PR #50209: URL: https://github.com/apache/spark/pull/50209#issuecomment-2708241246 ping @srowen @dongjoon-hyun @LuciferYang

Re: [PR] [SPARK-51429][Connect] Add "Acknowledgement" message to ExecutePlanResponse [spark]

2025-03-08 Thread via GitHub
vicennial commented on PR #50193: URL: https://github.com/apache/spark/pull/50193#issuecomment-2708400817 Putting this on hold ATM as some unexpected complications popped up (particularly the interactions with response indices and response caching)

Re: [PR] [SPARK-51298][SQL] Support variant in CSV scan [spark]

2025-03-08 Thread via GitHub
sandip-db commented on code in PR #50052: URL: https://github.com/apache/spark/pull/50052#discussion_r1986121542 ## sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala: ## @@ -760,12 +760,9 @@ class CsvFunctionsSuite extends QueryTest with SharedSparkSession {

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-08 Thread via GitHub
cloud-fan commented on code in PR #49471: URL: https://github.com/apache/spark/pull/49471#discussion_r1982863500 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1675,6 +1676,91 @@ class SessionCatalog( } } + /** + *

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-08 Thread via GitHub
cloud-fan commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2705416036 @peter-toth Ideally recursive CTE should stop if the last iteration generates no data. Pushing down the LIMIT and applying an early stop is an optimization and should not change the qu

[PR] [SPARK-51443] Fix singleVariantColumn in DSv2 and readStream. [spark]

2025-03-08 Thread via GitHub
chenhao-db opened a new pull request, #50217: URL: https://github.com/apache/spark/pull/50217 ### What changes were proposed in this pull request? The current JSON `singleVariantColumn` mode doesn't work in DSv2 and `spark.readStream`. This PR fixes the two cases: - DSv1 calls `Jso

Re: [PR] [SPARK-45265][SQL] Support Hive 4.0 metastore [spark]

2025-03-08 Thread via GitHub
hidataplus commented on code in PR #48823: URL: https://github.com/apache/spark/pull/48823#discussion_r1986028539 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -1030,7 +1030,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, h

Re: [PR] [SPARK-51338][INFRA] Add automated CI build for `connect-examples` [spark]

2025-03-08 Thread via GitHub
LuciferYang commented on PR #50187: URL: https://github.com/apache/spark/pull/50187#issuecomment-2708296150 I've noticed a rather peculiar issue here. It seems that the `connect-examples` project is dependent on a released version of Spark, which means we can only update to a new version af

[PR] [SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files [spark]

2025-03-08 Thread via GitHub
ganeshashree opened a new pull request, #50215: URL: https://github.com/apache/spark/pull/50215 ### What changes were proposed in this pull request? This PR sets INT64 as the default timestamp type for Parquet files. ### Why are the changes needed?

Re: [PR] [SPARK-43221][CORE] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-08 Thread via GitHub
attilapiros commented on code in PR #50122: URL: https://github.com/apache/spark/pull/50122#discussion_r1986090749 ## core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala: ## @@ -474,6 +474,26 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with P

Re: [PR] [SPARK-45265][SQL] Support Hive 4.0 metastore [spark]

2025-03-08 Thread via GitHub
hidataplus commented on code in PR #48823: URL: https://github.com/apache/spark/pull/48823#discussion_r1986020465 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -177,8 +179,10 @@ private[hive] class HiveClientImpl( // got changed. We

Re: [PR] [SPARK-51365][SQL][TESTS] Reduce `SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD/RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD` for tests related to `SharedSparkSession/TestHive` when using `macOS + Apple S

2025-03-08 Thread via GitHub
LuciferYang commented on PR #50206: URL: https://github.com/apache/spark/pull/50206#issuecomment-2708248561 The PR title and description will be updated after the finalization of the plan.

Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]

2025-03-08 Thread via GitHub
LuciferYang commented on PR #49928: URL: https://github.com/apache/spark/pull/49928#issuecomment-2708280968 > @vrozov can you remove the java test? [#49928 (comment)](https://github.com/apache/spark/pull/49928#discussion_r1981021185) +1 for @cloud-fan's comments