Re: [PR] [SPARK-51922] [SS] Fix UTFDataFormatException thrown from StateStoreChangelogReaderFactory for v1 [spark]

2025-04-25 Thread via GitHub
HeartSaVioR closed pull request #50721: [SPARK-51922] [SS] Fix UTFDataFormatException thrown from StateStoreChangelogReaderFactory for v1 URL: https://github.com/apache/spark/pull/50721 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-51921][PYTHON][SS] Use long type for TTL duration in millisecond [spark]

2025-04-25 Thread via GitHub
anishshri-db commented on PR #50719: URL: https://github.com/apache/spark/pull/50719#issuecomment-2831892176 cc - @HeartSaVioR - PTAL, thanks !

Re: [PR] [SPARK-51904][SS] Removing async metadata purging for StateSchemaV3 and ignoring non-batch files when listing OperatorMetadata files [spark]

2025-04-25 Thread via GitHub
HeartSaVioR closed pull request #50700: [SPARK-51904][SS] Removing async metadata purging for StateSchemaV3 and ignoring non-batch files when listing OperatorMetadata files URL: https://github.com/apache/spark/pull/50700

Re: [PR] [SPARK-51817][SPARK-49578][CONNECT] Re-introduce ansiConfig fields in messageParameters of CAST_INVALID_INPUT and CAST_OVERFLOW [spark]

2025-04-25 Thread via GitHub
nija-at commented on PR #50604: URL: https://github.com/apache/spark/pull/50604#issuecomment-2830397277 @cloud-fan thanks. I've fixed these now. Waiting for CI to confirm.

Re: [PR] [SPARK-51820][SQL][CONNECT] Address remaining issues for new `group`/`order` by ordinal approach [spark]

2025-04-25 Thread via GitHub
mihailoale-db commented on PR #50699: URL: https://github.com/apache/spark/pull/50699#issuecomment-2830576716 IIUC we changed the API/approach for functions that explicitly add `Sort`/`Aggregate`, but there are other functions/rules that do that implicitly (e.g `randomSplit`)? @cloud-fan ar

Re: [PR] [SPARK-51805] [SQL] Get function with improper argument should throw proper exception instead of an internal one [spark]

2025-04-25 Thread via GitHub
mihailoale-db commented on code in PR #50590: URL: https://github.com/apache/spark/pull/50590#discussion_r2060432450 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala: ## @@ -314,6 +315,30 @@ case class GetArrayItem( })

Re: [PR] [SPARK-51918][CORE] Executor exit wait for out/err appenders to stop + flush remaining data [spark]

2025-04-25 Thread via GitHub
m8719-github closed pull request #50715: [SPARK-51918][CORE] Executor exit wait for out/err appenders to stop + flush remaining data URL: https://github.com/apache/spark/pull/50715

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-25 Thread via GitHub
mridulm commented on code in PR #50594: URL: https://github.com/apache/spark/pull/50594#discussion_r2060457816 ## core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala: ## @@ -69,10 +75,22 @@ private[spark] class UninterruptibleThread( } uninterruptib

Re: [PR] [SPARK-51805] [SQL] Get function with improper argument should throw proper exception instead of an internal one [spark]

2025-04-25 Thread via GitHub
mihailoale-db commented on code in PR #50590: URL: https://github.com/apache/spark/pull/50590#discussion_r2060463599 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala: ## @@ -314,6 +315,30 @@ case class GetArrayItem( })

Re: [PR] [SPARK-51827][SS][CONNECT] Support Spark Connect on transformWithState in PySpark [spark]

2025-04-25 Thread via GitHub
HeartSaVioR commented on PR #50704: URL: https://github.com/apache/spark/pull/50704#issuecomment-2830351699 @hvanhovell @HyukjinKwon @jingz-db Please take a look, thanks!

Re: [PR] [SPARK-51805] [SQL] Get function with improper argument should throw proper exception instead of an internal one [spark]

2025-04-25 Thread via GitHub
cloud-fan commented on code in PR #50590: URL: https://github.com/apache/spark/pull/50590#discussion_r2060321814 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala: ## @@ -314,6 +315,30 @@ case class GetArrayItem( }) }

[PR] [SPARK-51710] Passing an empty array to PySpark DataFrame.dropDuplicates should behave the same as passing no arguments [spark]

2025-04-25 Thread via GitHub
joke1196 opened a new pull request, #50714: URL: https://github.com/apache/spark/pull/50714 ### What changes were proposed in this pull request? This PR aligns the behavior of `DataFrame.dropDuplicates([])` to be the same as `DataFrame.dropDuplicates()`. ### Why are
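
The proposed alignment can be sketched as a toy model in plain Python (this is an illustration of the intended semantics, not Spark's actual implementation):

```python
def drop_duplicates(rows, subset=None):
    """Toy model of DataFrame.dropDuplicates over a list of dict rows.

    The behavior the PR proposes: an empty subset list is treated the
    same as passing no subset, i.e. deduplicate over all columns.
    """
    use_all = not subset  # None and [] both mean "all columns"
    seen = set()
    out = []
    for row in rows:  # each row is a dict of column -> value
        key = tuple(sorted(row.items())) if use_all else tuple(row[c] for c in subset)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out
```

With this reading, `drop_duplicates(rows, [])` and `drop_duplicates(rows)` return identical results.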

[PR] [SPARK-51918][CORE] Wait for out/err appenders to stop + flush remaining data [spark]

2025-04-25 Thread via GitHub
m8719-github opened a new pull request, #50715: URL: https://github.com/apache/spark/pull/50715 ### What changes were proposed in this pull request? Fix executor exit routine to wait for stdout/stderr appenders to stop and flush remaining data. ### Why are the chang

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-25 Thread via GitHub
mridulm commented on code in PR #50594: URL: https://github.com/apache/spark/pull/50594#discussion_r2060409987 ## core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala: ## @@ -69,10 +75,22 @@ private[spark] class UninterruptibleThread( } uninterruptib

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-25 Thread via GitHub
mridulm commented on code in PR #50594: URL: https://github.com/apache/spark/pull/50594#discussion_r2060420903 ## core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala: ## @@ -69,10 +75,22 @@ private[spark] class UninterruptibleThread( } uninterruptib

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-25 Thread via GitHub
mridulm commented on code in PR #50594: URL: https://github.com/apache/spark/pull/50594#discussion_r2060475882 ## core/src/test/scala/org/apache/spark/util/UninterruptibleThreadSuite.scala: ## @@ -115,6 +116,45 @@ class UninterruptibleThreadSuite extends SparkFunSuite { ass

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-25 Thread via GitHub
mridulm commented on code in PR #50594: URL: https://github.com/apache/spark/pull/50594#discussion_r2060471211 ## core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala: ## @@ -92,11 +110,17 @@ private[spark] class UninterruptibleThread( * interrupted until it

Re: [PR] [SPARK-51728][SQL] Add SELECT EXCEPT Support [spark]

2025-04-25 Thread via GitHub
Gschiavon commented on PR #50536: URL: https://github.com/apache/spark/pull/50536#issuecomment-2830770986 @HyukjinKwon any thoughts on this?

[PR] [PYTHON] Allow overwriting statically registered Python Data Source [spark]

2025-04-25 Thread via GitHub
wengh opened a new pull request, #50716: URL: https://github.com/apache/spark/pull/50716 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was t

[PR] [SPARK-51671][SQL] Add column pruning to Recursive CTEs [spark]

2025-04-25 Thread via GitHub
Pajaraja opened a new pull request, #50717: URL: https://github.com/apache/spark/pull/50717 ### What changes were proposed in this pull request? Modify the ColumnPruning optimizer rule to successfully prune UnionLoops. For this, ColumnPruning had to be migrated to inside the abstract Opti

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-25 Thread via GitHub
mridulm commented on PR #50594: URL: https://github.com/apache/spark/pull/50594#issuecomment-2829589932 Ok, I see the problem. The expectation is for begin and end to be within try/finally - but the usual coding pattern would result in the try being used to catch InterruptedException and h
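
The deadlock-avoidance pattern this PR's title describes — decide whether to interrupt while holding the lock, but invoke the actual interrupt after releasing it — can be sketched in Python (illustrative names only, not Spark's Scala `UninterruptibleThread`):

```python
import threading


class InterruptibleWorker:
    """Sketch of the fix pattern: never call a potentially blocking
    interrupt while holding our own lock, so the interrupt path cannot
    deadlock against a thread that is waiting for that lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._uninterruptible = False
        self._pending = False      # interrupt deferred until interruptible again
        self.interrupted = False   # stand-in for Thread.interrupt()'s side effect

    def interrupt(self):
        with self._lock:
            if self._uninterruptible:
                self._pending = True   # defer; deliver when the section ends
                should_call = False
            else:
                should_call = True
        if should_call:
            self.interrupted = True    # the real interrupt happens here, lock-free

    def run_uninterruptibly(self, body):
        with self._lock:
            self._uninterruptible = True
        try:
            body()
        finally:
            with self._lock:
                self._uninterruptible = False
                deliver = self._pending
                self._pending = False
            if deliver:
                self.interrupted = True  # again, outside the lock
```

The key design point is that the lock only protects the decision and the flags; the interrupt call itself always runs after the lock is released.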

Re: [PR] [SPARK-51911] Support `lateralJoin` in `DataFrame` [spark-connect-swift]

2025-04-25 Thread via GitHub
dongjoon-hyun closed pull request #88: [SPARK-51911] Support `lateralJoin` in `DataFrame` URL: https://github.com/apache/spark-connect-swift/pull/88

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-25 Thread via GitHub
mridulm commented on PR #50594: URL: https://github.com/apache/spark/pull/50594#issuecomment-2829599847 Actually, I think my formulation above will still have deadlock - though for a more involved reason, sigh.

Re: [PR] [SPARK-51911] Support `lateralJoin` in `DataFrame` [spark-connect-swift]

2025-04-25 Thread via GitHub
dongjoon-hyun commented on PR #88: URL: https://github.com/apache/spark-connect-swift/pull/88#issuecomment-2829594978 Let me merge this as a foundation of the future work.

[PR] [SPARK-51898][PYTHON][CONNECT][TESTS][FOLLOW-UP] Fix inconsistent exception [spark]

2025-04-25 Thread via GitHub
zhengruifeng opened a new pull request, #50708: URL: https://github.com/apache/spark/pull/50708 ### What changes were proposed in this pull request? `toJSON` and `rdd` throw `PySparkNotImplementedError` in connect mode, but `PySparkAttributeError` in connect-only mode

[PR] [MINOR][PYTHON][DOCS] Add 4 missing functions to API references [spark]

2025-04-25 Thread via GitHub
zhengruifeng opened a new pull request, #50709: URL: https://github.com/apache/spark/pull/50709 ### What changes were proposed in this pull request? Add 4 missing functions to API references ### Why are the changes needed? for docs ### Does this PR introduc

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-25 Thread via GitHub
mridulm commented on code in PR #50594: URL: https://github.com/apache/spark/pull/50594#discussion_r2059722561 ## core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala: ## @@ -69,10 +75,22 @@ private[spark] class UninterruptibleThread( } uninterruptib

[PR] [SPARK-51915][PYTHON][CONNECT][TESTS] Enable SparkConnectDataFrameDebug in connect-only mode [spark]

2025-04-25 Thread via GitHub
zhengruifeng opened a new pull request, #50710: URL: https://github.com/apache/spark/pull/50710 ### What changes were proposed in this pull request? Enable SparkConnectDataFrameDebug in connect-only mode ### Why are the changes needed? to improve test coverage

[PR] [SPARK-51916] Add `create_(scala|table)_function` and `drop_function` test scripts [spark-connect-swift]

2025-04-25 Thread via GitHub
dongjoon-hyun opened a new pull request, #90: URL: https://github.com/apache/spark-connect-swift/pull/90 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51916] Add `create_(scala|table)_function` and `drop_function` test scripts [spark-connect-swift]

2025-04-25 Thread via GitHub
dongjoon-hyun commented on PR #90: URL: https://github.com/apache/spark-connect-swift/pull/90#issuecomment-2829694817 Thank you, @viirya ! Merged to main.

Re: [PR] [SPARK-51916] Add `create_(scala|table)_function` and `drop_function` test scripts [spark-connect-swift]

2025-04-25 Thread via GitHub
dongjoon-hyun closed pull request #90: [SPARK-51916] Add `create_(scala|table)_function` and `drop_function` test scripts URL: https://github.com/apache/spark-connect-swift/pull/90

Re: [PR] [WIP][SPARK-51899][SQL] Implement an error allowlist for spark.catalog.listTables() [spark]

2025-04-25 Thread via GitHub
heyihong commented on code in PR #50696: URL: https://github.com/apache/spark/pull/50696#discussion_r2059778984 ## sql/core/src/main/scala/org/apache/spark/sql/classic/Catalog.scala: ## @@ -184,25 +185,19 @@ class Catalog(sparkSession: SparkSession) extends catalog.Catalog with

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-04-25 Thread via GitHub
mridulm commented on PR #50594: URL: https://github.com/apache/spark/pull/50594#issuecomment-2829532707 > @mridulm Do you refer to "[SPARK-51821](https://issues.apache.org/jira/browse/SPARK-51821) uninterruptibleLock deadlock" test? No, it will not fail. Why do you think it would fail?

Re: [PR] [SPARK-51805] [SQL] Get function with improper argument should throw proper exception instead of an internal one [spark]

2025-04-25 Thread via GitHub
cloud-fan closed pull request #50590: [SPARK-51805] [SQL] Get function with improper argument should throw proper exception instead of an internal one URL: https://github.com/apache/spark/pull/50590

Re: [PR] [SPARK-51805] [SQL] Get function with improper argument should throw proper exception instead of an internal one [spark]

2025-04-25 Thread via GitHub
cloud-fan commented on PR #50590: URL: https://github.com/apache/spark/pull/50590#issuecomment-2830878038 thanks, merging to master/4.0!

Re: [PR] [SPARK-51596][SS] Fix concurrent StateStoreProvider maintenance and closing [spark]

2025-04-25 Thread via GitHub
liviazhu-db closed pull request #50391: [SPARK-51596][SS] Fix concurrent StateStoreProvider maintenance and closing URL: https://github.com/apache/spark/pull/50391

Re: [PR] [SPARK-51596][SS] Fix concurrent StateStoreProvider maintenance and closing [spark]

2025-04-25 Thread via GitHub
liviazhu-db commented on PR #50391: URL: https://github.com/apache/spark/pull/50391#issuecomment-2831035233 Replicated by https://github.com/apache/spark/pull/50595

Re: [PR] [SPARK-51596][SS] Fix concurrent StateStoreProvider maintenance and closing [spark]

2025-04-25 Thread via GitHub
anishshri-db commented on code in PR #50595: URL: https://github.com/apache/spark/pull/50595#discussion_r2060769106 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala: ## @@ -1095,6 +1131,14 @@ object StateStore extends Logging { }

[PR] [SPARK-51921] Use long type for TTL duration in millisecond [spark]

2025-04-25 Thread via GitHub
bogao007 opened a new pull request, #50719: URL: https://github.com/apache/spark/pull/50719 ### What changes were proposed in this pull request? Use long type (int64 for protobuf) for TTL duration in millisecond ### Why are the changes needed? Allow users to set l
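
The motivation for widening the type can be illustrated with quick arithmetic: even a month-long TTL expressed in milliseconds already overflows a signed 32-bit integer, so a long (int64 in protobuf) is needed for realistic durations:

```python
# A 30-day TTL in milliseconds vs. the signed 32-bit ceiling.
thirty_days_ms = 30 * 24 * 60 * 60 * 1000  # 2_592_000_000
int32_max = 2**31 - 1                      # 2_147_483_647
print(thirty_days_ms > int32_max)          # True: int32 cannot hold it
print(thirty_days_ms <= 2**63 - 1)         # True: int64 comfortably can
```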

Re: [PR] [SPARK-51921][PYTHON][SS] Use long type for TTL duration in millisecond [spark]

2025-04-25 Thread via GitHub
bogao007 commented on code in PR #50719: URL: https://github.com/apache/spark/pull/50719#discussion_r2060945740 ## python/pyspark/sql/tests/pandas/helper/helper_pandas_transform_with_state.py: ## @@ -1159,7 +1159,8 @@ class PandasMapStateLargeTTLProcessor(PandasMapStateProcesso

[PR] [DO NOT MERGE]Testing nested correlations handling [spark]

2025-04-25 Thread via GitHub
AveryQi115 opened a new pull request, #50720: URL: https://github.com/apache/spark/pull/50720 # I use this pr to test the handling altogether and modify/add testcases, do not merge ### What changes were proposed in this pull request? ### Why are the changes needed?

Re: [PR] [SPARK-51921][PYTHON][SS] Use long type for TTL duration in millisecond [spark]

2025-04-25 Thread via GitHub
anishshri-db commented on code in PR #50719: URL: https://github.com/apache/spark/pull/50719#discussion_r2060942288 ## python/pyspark/sql/streaming/proto/StateMessage_pb2.py: ## @@ -40,7 +40,7 @@ DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile( - b'\n;org/apac

[PR] [SPARK-51922] [SS] Fix UTFDataFormatException thrown from StateStoreChangelogReaderFactory for v1 [spark]

2025-04-25 Thread via GitHub
liviazhu-db opened a new pull request, #50721: URL: https://github.com/apache/spark/pull/50721 ### What changes were proposed in this pull request? Catch the UTFDataFormatException thrown for v1 in the StateStoreChangelogReaderFactory and assign the version to 1. ##
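
The version-detection idea described in the summary can be sketched in Python; the on-disk layout, names, and exception set here are illustrative assumptions, not Spark's actual changelog format:

```python
import io
import struct


def detect_changelog_version(stream):
    """Sketch of the fix: newer changelog files begin with a readUTF-style
    version string such as "v2" (2-byte big-endian length, then UTF-8
    bytes), while v1 files begin with raw record data, so the read can
    fail or yield a non-version string. Any failure falls back to v1."""
    start = stream.tell()
    try:
        header = stream.read(2)
        (length,) = struct.unpack(">H", header)
        text = stream.read(length).decode("utf-8")
        if text.startswith("v"):
            return int(text[1:])
    except (struct.error, UnicodeDecodeError, ValueError):
        pass  # malformed header: treat as a v1 file
    stream.seek(start)  # rewind so a v1 reader sees the whole file
    return 1
```

The important property is that the fallback path rewinds the stream, so the legacy reader still consumes the file from the beginning.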

Re: [PR] [SPARK-51921][PYTHON][SS] Use long type for TTL duration in millisecond [spark]

2025-04-25 Thread via GitHub
anishshri-db commented on code in PR #50719: URL: https://github.com/apache/spark/pull/50719#discussion_r2060942687 ## python/pyspark/sql/tests/pandas/helper/helper_pandas_transform_with_state.py: ## @@ -1159,7 +1159,8 @@ class PandasMapStateLargeTTLProcessor(PandasMapStateProc

Re: [PR] [SPARK-51921][PYTHON][SS] Use long type for TTL duration in millisecond [spark]

2025-04-25 Thread via GitHub
bogao007 commented on code in PR #50719: URL: https://github.com/apache/spark/pull/50719#discussion_r2060945937 ## python/pyspark/sql/streaming/proto/StateMessage_pb2.py: ## @@ -40,7 +40,7 @@ DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile( - b'\n;org/apache/s

Re: [PR] [WIP][SPARK-50983][SQL] Support Nested Correlated Subqueries for Analyzer [spark]

2025-04-25 Thread via GitHub
AveryQi115 closed pull request #49660: [WIP][SPARK-50983][SQL] Support Nested Correlated Subqueries for Analyzer URL: https://github.com/apache/spark/pull/49660

[PR] [MINOR][DOCS] Add migration note for mapInPandas and mapInArrow validation [spark]

2025-04-25 Thread via GitHub
wengh opened a new pull request, #50722: URL: https://github.com/apache/spark/pull/50722 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was t

Re: [PR] [SPARK-51921][PYTHON][SS] Use long type for TTL duration in millisecond [spark]

2025-04-25 Thread via GitHub
bogao007 commented on code in PR #50719: URL: https://github.com/apache/spark/pull/50719#discussion_r2060950968 ## python/pyspark/sql/tests/pandas/helper/helper_pandas_transform_with_state.py: ## @@ -1159,7 +1159,8 @@ class PandasMapStateLargeTTLProcessor(PandasMapStateProcesso

Re: [PR] [SPARK-51739][PYTHON] Validate Arrow schema from mapInArrow & mapInPandas & DataSource [spark]

2025-04-25 Thread via GitHub
wengh commented on PR #50531: URL: https://github.com/apache/spark/pull/50531#issuecomment-2831521899 > Thanks for the fix! But this is a breaking change. Can we document this in the migration guide? @allisonwang-db https://github.com/apache/spark/pull/50722

Re: [PR] [MINOR][PYTHON][DOCS] Add 4 missing functions to API references [spark]

2025-04-25 Thread via GitHub
zhengruifeng closed pull request #50709: [MINOR][PYTHON][DOCS] Add 4 missing functions to API references URL: https://github.com/apache/spark/pull/50709

[PR] DESCRIPTION [spark]

2025-04-25 Thread via GitHub
klu2300030052 opened a new pull request, #50712: URL: https://github.com/apache/spark/pull/50712 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [MINOR][PYTHON][DOCS] Add 4 missing functions to API references [spark]

2025-04-25 Thread via GitHub
zhengruifeng commented on PR #50709: URL: https://github.com/apache/spark/pull/50709#issuecomment-2829976924 merged to master

Re: [PR] [SPARK-51915][PYTHON][CONNECT][TESTS] Enable SparkConnectDataFrameDebug in connect-only mode [spark]

2025-04-25 Thread via GitHub
zhengruifeng commented on PR #50710: URL: https://github.com/apache/spark/pull/50710#issuecomment-2829980550 merged to master

Re: [PR] [SPARK-51915][PYTHON][CONNECT][TESTS] Enable SparkConnectDataFrameDebug in connect-only mode [spark]

2025-04-25 Thread via GitHub
zhengruifeng closed pull request #50710: [SPARK-51915][PYTHON][CONNECT][TESTS] Enable SparkConnectDataFrameDebug in connect-only mode URL: https://github.com/apache/spark/pull/50710

Re: [PR] [SPARK-51898][PYTHON][CONNECT][TESTS][FOLLOW-UP] Fix inconsistent exception [spark]

2025-04-25 Thread via GitHub
HyukjinKwon commented on code in PR #50708: URL: https://github.com/apache/spark/pull/50708#discussion_r2059904698 ## python/pyspark/sql/connect/dataframe.py: ## @@ -2190,20 +2190,18 @@ def localCheckpoint( assert isinstance(checkpointed._plan, plan.CachedRemoteRelation

Re: [PR] [SPARK-51898][PYTHON][CONNECT][TESTS][FOLLOW-UP] Fix inconsistent exception [spark]

2025-04-25 Thread via GitHub
zhengruifeng commented on code in PR #50708: URL: https://github.com/apache/spark/pull/50708#discussion_r2059968399 ## python/pyspark/sql/connect/dataframe.py: ## @@ -2190,20 +2190,18 @@ def localCheckpoint( assert isinstance(checkpointed._plan, plan.CachedRemoteRelatio

[PR] [SPARK-51917] Add `DataFrameWriterV2` actor [spark-connect-swift]

2025-04-25 Thread via GitHub
dongjoon-hyun opened a new pull request, #91: URL: https://github.com/apache/spark-connect-swift/pull/91 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[PR] Update CONTRIBUTING.md [spark]

2025-04-25 Thread via GitHub
klu2300030052 opened a new pull request, #50713: URL: https://github.com/apache/spark/pull/50713 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

[PR] [SPARK-51914][SQL] Add com.mysql.cj to spark.sql.hive.metastore.sharedPrefixes [spark]

2025-04-25 Thread via GitHub
yaooqinn opened a new pull request, #50711: URL: https://github.com/apache/spark/pull/50711 ### What changes were proposed in this pull request? This PR adds `com.mysql.cj` to `spark.sql.hive.metastore.sharedPrefixes` ### Why are the changes needed? Following upst

Re: [PR] [WIP][SPARK-51899][SQL] Implement an error allowlist for spark.catalog.listTables() [spark]

2025-04-25 Thread via GitHub
HyukjinKwon commented on code in PR #50696: URL: https://github.com/apache/spark/pull/50696#discussion_r2059902276 ## sql/core/src/main/scala/org/apache/spark/sql/classic/Catalog.scala: ## @@ -184,25 +185,19 @@ class Catalog(sparkSession: SparkSession) extends catalog.Catalog w

Re: [PR] [PYTHON] Allow overwriting statically registered Python Data Source [spark]

2025-04-25 Thread via GitHub
wengh commented on code in PR #50716: URL: https://github.com/apache/spark/pull/50716#discussion_r2060518095 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -520,4 +520,6 @@ The following example demonstrates how to implement a basic Data Source using Ar Usa

Re: [PR] [PYTHON] Allow overwriting statically registered Python Data Source [spark]

2025-04-25 Thread via GitHub
wengh commented on PR #50716: URL: https://github.com/apache/spark/pull/50716#issuecomment-2830832282 @allisonwang-db @HyukjinKwon please take a look

Re: [PR] [SPARK-51596][SS] Fix concurrent StateStoreProvider maintenance and closing [spark]

2025-04-25 Thread via GitHub
anishshri-db commented on code in PR #50595: URL: https://github.com/apache/spark/pull/50595#discussion_r2060583685 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala: ## @@ -1009,14 +1013,46 @@ object StateStore extends Logging {

[PR] [SPARK-51920][SS][PYTHON] Fix composite/nested StructType in value state for python [spark]

2025-04-25 Thread via GitHub
jingz-db opened a new pull request, #50718: URL: https://github.com/apache/spark/pull/50718 ### What changes were proposed in this pull request? Fix a bug in TransformWithStateInPandas in Python. ### Why are the changes needed? Currently, all user provided state

Re: [PR] [SPARK-51900][SQL] Properly throw datatype mismatch in single-pass Analyzer [spark]

2025-04-25 Thread via GitHub
MaxGekk closed pull request #50697: [SPARK-51900][SQL] Properly throw datatype mismatch in single-pass Analyzer URL: https://github.com/apache/spark/pull/50697

Re: [PR] [SPARK-51900][SQL] Properly throw datatype mismatch in single-pass Analyzer [spark]

2025-04-25 Thread via GitHub
MaxGekk commented on PR #50697: URL: https://github.com/apache/spark/pull/50697#issuecomment-2831140760 +1, LGTM. Merging to master. Thank you, @vladimirg-db.

Re: [PR] [SPARK-51739][PYTHON] Validate Arrow schema from mapInArrow & mapInPandas & DataSource [spark]

2025-04-25 Thread via GitHub
allisonwang-db commented on PR #50531: URL: https://github.com/apache/spark/pull/50531#issuecomment-2831044013 Thanks for the fix! But this is a breaking change. Can we document this in the migration guide?

Re: [PR] [SPARK-51596][SS] Fix concurrent StateStoreProvider maintenance and closing [spark]

2025-04-25 Thread via GitHub
ericm-db commented on code in PR #50595: URL: https://github.com/apache/spark/pull/50595#discussion_r2060768228 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala: ## @@ -1095,6 +1131,14 @@ object StateStore extends Logging { } }

Re: [PR] [SPARK-51904][SS] Removing async metadata purging for StateSchemaV3 and ignoring non-batch files when listing OperatorMetadata files [spark]

2025-04-25 Thread via GitHub
HeartSaVioR commented on PR #50700: URL: https://github.com/apache/spark/pull/50700#issuecomment-2831641630 Thanks! Merging to master/4.0 (since this makes trouble with schema evolution).

Re: [PR] [SPARK-51922] [SS] Fix UTFDataFormatException thrown from StateStoreChangelogReaderFactory for v1 [spark]

2025-04-25 Thread via GitHub
anishshri-db commented on code in PR #50721: URL: https://github.com/apache/spark/pull/50721#discussion_r2060954104 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreChangelog.scala: ## @@ -368,7 +368,10 @@ class StateStoreChangelogReaderFactory

Re: [PR] [SPARK-51922] [SS] Fix UTFDataFormatException thrown from StateStoreChangelogReaderFactory for v1 [spark]

2025-04-25 Thread via GitHub
liviazhu-db commented on code in PR #50721: URL: https://github.com/apache/spark/pull/50721#discussion_r2060955056 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreChangelog.scala: ## @@ -368,7 +368,10 @@ class StateStoreChangelogReaderFactory(

Re: [PR] [SPARK-49919][SQL] Add special limits support for return content as JSON dataset [spark]

2025-04-25 Thread via GitHub
github-actions[bot] closed pull request #48407: [SPARK-49919][SQL] Add special limits support for return content as JSON dataset URL: https://github.com/apache/spark/pull/48407

Re: [PR] [SPARK-50758][K8S]Mounts the krb5 config map on the executor pod [spark]

2025-04-25 Thread via GitHub
github-actions[bot] commented on PR #49467: URL: https://github.com/apache/spark/pull/49467#issuecomment-2831655569 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-50800][PYTHON][TESTS] Upgrade python to 3.11 in Python Packaging test [spark]

2025-04-25 Thread via GitHub
github-actions[bot] commented on PR #49464: URL: https://github.com/apache/spark/pull/49464#issuecomment-2831655578 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48665][PYTHON][CONNECT] Support providing a dict in pyspark lit to create a map. [spark]

2025-04-25 Thread via GitHub
github-actions[bot] closed pull request #49318: [SPARK-48665][PYTHON][CONNECT] Support providing a dict in pyspark lit to create a map. URL: https://github.com/apache/spark/pull/49318

Re: [PR] [SPARK-50802][SQL] Remove ApplyCharTypePadding rule [spark]

2025-04-25 Thread via GitHub
github-actions[bot] closed pull request #49470: [SPARK-50802][SQL] Remove ApplyCharTypePadding rule URL: https://github.com/apache/spark/pull/49470

Re: [PR] [SPARK-51922] [SS] Fix UTFDataFormatException thrown from StateStoreChangelogReaderFactory for v1 [spark]

2025-04-25 Thread via GitHub
HeartSaVioR commented on PR #50721: URL: https://github.com/apache/spark/pull/50721#issuecomment-2831874969 Thanks! Merging to master/4.0 (The fix is straightforward and low risk.)