Re: [PR] [SPARK-51902][SQL] Enforce check constraint on table insertion [spark]

2025-05-01 Thread via GitHub
gengliangwang commented on code in PR #50761: URL: https://github.com/apache/spark/pull/50761#discussion_r2070951619 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableConstraint.scala: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on PR #50230: URL: https://github.com/apache/spark/pull/50230#issuecomment-2846072525 I miss the logging: I would prefer to have some log lines that help figure out what happened regarding the row-based checksums during a run, at least at debug level. WDYT?

Re: [PR] [SPARK-51972][SS] State Store file integrity verification using checksum [spark]

2025-05-01 Thread via GitHub
anishshri-db commented on code in PR #50773: URL: https://github.com/apache/spark/pull/50773#discussion_r2070974574 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ChecksumCheckpointFileManager.scala: ## @@ -0,0 +1,512 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070893252 ## core/src/main/java/org/apache/spark/shuffle/checksum/RowBasedChecksum.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-51902][SQL] Enforce check constraint on table insertion [spark]

2025-05-01 Thread via GitHub
aokolnychyi commented on code in PR #50761: URL: https://github.com/apache/spark/pull/50761#discussion_r2070887598 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableConstraint.scala: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070895177 ## core/src/main/java/org/apache/spark/shuffle/checksum/RowBasedChecksum.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [MINOR][PS][DOC] Update pandas API on Spark option doc [spark]

2025-05-01 Thread via GitHub
ueshin commented on PR #50777: URL: https://github.com/apache/spark/pull/50777#issuecomment-2845935604 The test failure should be fixed by #50778. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-51964][SQL] Correctly resolve attributes from hidden output in ORDER BY and HAVING on top of an Aggregate in single-pass Analyzer [spark]

2025-05-01 Thread via GitHub
sririshindra commented on code in PR #50769: URL: https://github.com/apache/spark/pull/50769#discussion_r2070179418 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/AggregateResolver.scala: ## @@ -329,4 +332,17 @@ class AggregateResolver(operatorRes

Re: [PR] [SPARK-51016][SQL] Non-deterministic SQL expressions should set indeterminate map stage output level [spark]

2025-05-01 Thread via GitHub
ahshahid commented on PR #50757: URL: https://github.com/apache/spark/pull/50757#issuecomment-2845350793 I also doubt that users would be able to specify the indeterminacy flag in the map operations, as I think it is going to make it complicated for users to understand its impact, and if inad

Re: [PR] [SPARK-51016][SQL] Non-deterministic SQL expressions should set indeterminate map stage output level [spark]

2025-05-01 Thread via GitHub
peter-toth commented on PR #50757: URL: https://github.com/apache/spark/pull/50757#issuecomment-2845349959 Thanks @cloud-fan, somehow I missed that PR. Runtime shuffle checksum seems like a good idea, but it must come with some costs as well.
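The trade-off being discussed can be sketched outside Spark: a row-based checksum that is insensitive to row order detects a retried map stage that lost or altered rows while tolerating mere reordering. This is an illustrative sketch of the general idea in plain Python (the `row_checksum` helper is hypothetical), not the actual implementation in PR #50230:

```python
import zlib

def row_checksum(rows):
    # XOR of per-row hashes: XOR is commutative, so the checksum does not
    # depend on the order in which rows arrive. A retry that only reorders
    # rows produces the same value; a lost or altered row changes it.
    acc = 0
    for row in rows:
        acc ^= zlib.crc32(repr(row).encode())
    return acc

full = row_checksum([(1, "a"), (2, "b"), (3, "c")])
reordered = row_checksum([(3, "c"), (1, "a"), (2, "b")])  # same rows, new order
missing = row_checksum([(1, "a"), (2, "b")])              # one row lost on retry
```

The cost alluded to above is visible in the sketch: every row must be hashed on the shuffle write path. One known weakness of plain XOR combining is that duplicate rows cancel each other out.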

Re: [PR] [SPARK-51978] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark-kubernetes-operator]

2025-05-01 Thread via GitHub
viirya commented on PR #180: URL: https://github.com/apache/spark-kubernetes-operator/pull/180#issuecomment-2845367431 Pending CI

Re: [PR] [SPARK-51016][SQL] Non-deterministic SQL expressions should set indeterminate map stage output level [spark]

2025-05-01 Thread via GitHub
ahshahid commented on code in PR #50757: URL: https://github.com/apache/spark/pull/50757#discussion_r2070588096 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala: ## @@ -103,13 +103,21 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]]

Re: [PR] [SPARK-51016][SQL] Non-deterministic SQL expressions should set indeterminate map stage output level [spark]

2025-05-01 Thread via GitHub
ahshahid commented on code in PR #50757: URL: https://github.com/apache/spark/pull/50757#discussion_r2070581649 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala: ## @@ -103,13 +103,21 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]]

Re: [PR] [SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations [spark]

2025-05-01 Thread via GitHub
liviazhu-db commented on code in PR #50742: URL: https://github.com/apache/spark/pull/50742#discussion_r2070593834 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -446,17 +459,48 @@ private[sql] class RocksDBStateS

[PR] [SPARK-51977] Improve `SparkSQLRepl` to support multiple lines [spark-connect-swift]

2025-05-01 Thread via GitHub
dongjoon-hyun opened a new pull request, #102: URL: https://github.com/apache/spark-connect-swift/pull/102 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51291] Reclassify validation errors thrown from state store loading [spark]

2025-05-01 Thread via GitHub
micheal-o commented on code in PR #50045: URL: https://github.com/apache/spark/pull/50045#discussion_r2069927112 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -324,6 +324,17 @@ "The change log writer version cannot be ." ] }, +

Re: [PR] [SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations [spark]

2025-05-01 Thread via GitHub
ericm-db commented on code in PR #50742: URL: https://github.com/apache/spark/pull/50742#discussion_r2070607523 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -446,17 +459,48 @@ private[sql] class RocksDBStateStor

Re: [PR] [SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations [spark]

2025-05-01 Thread via GitHub
liviazhu-db commented on code in PR #50742: URL: https://github.com/apache/spark/pull/50742#discussion_r2070612638 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -446,17 +459,48 @@ private[sql] class RocksDBStateS

Re: [PR] [SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations [spark]

2025-05-01 Thread via GitHub
ericm-db commented on code in PR #50742: URL: https://github.com/apache/spark/pull/50742#discussion_r2070615297 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -446,17 +459,48 @@ private[sql] class RocksDBStateStor

Re: [PR] [SPARK-51978] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark-kubernetes-operator]

2025-05-01 Thread via GitHub
dongjoon-hyun commented on PR #180: URL: https://github.com/apache/spark-kubernetes-operator/pull/180#issuecomment-2845432060 Thank you, @viirya !

Re: [PR] [SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations [spark]

2025-05-01 Thread via GitHub
ericm-db commented on code in PR #50742: URL: https://github.com/apache/spark/pull/50742#discussion_r2070624489 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -446,17 +459,48 @@ private[sql] class RocksDBStateStor

Re: [PR] [SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations [spark]

2025-05-01 Thread via GitHub
ericm-db commented on PR #50742: URL: https://github.com/apache/spark/pull/50742#issuecomment-2845440177 > Looks good! Could you add a test in StateStoreRDDSuite to check the ThreadLocal logic correctly passes the readstore to the writestore too? Yup, working on that rn!

Re: [PR] [SPARK-51964][SQL] Correctly resolve attributes from hidden output in ORDER BY and HAVING on top of an Aggregate in single-pass Analyzer [spark]

2025-05-01 Thread via GitHub
sririshindra commented on PR #50769: URL: https://github.com/apache/spark/pull/50769#issuecomment-2845007884 @vladimirg-db, could you please point me to any existing tests that might have covered this scenario? I am thinking there should be a test that checks for the following query

Re: [PR] [SPARK-51964][SQL] Correctly resolve attributes from hidden output in ORDER BY and HAVING on top of an Aggregate in single-pass Analyzer [spark]

2025-05-01 Thread via GitHub
sririshindra commented on PR #50769: URL: https://github.com/apache/spark/pull/50769#issuecomment-2845039363 > @sririshindra here's a test that fails for single-pass Analyzer at the moment: https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/order-by.sql

Re: [PR] [SPARK-51973][K8S][BUILD] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark]

2025-05-01 Thread via GitHub
dongjoon-hyun commented on PR #50775: URL: https://github.com/apache/spark/pull/50775#issuecomment-2845044708 All K8s related unit and integration tests passed.

Re: [PR] [SPARK-51973][K8S][BUILD] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark]

2025-05-01 Thread via GitHub
dongjoon-hyun commented on PR #50775: URL: https://github.com/apache/spark/pull/50775#issuecomment-2845045137 Could you review this PR when you have some time, @LuciferYang ?

Re: [PR] [SPARK-51964][SQL] Correctly resolve attributes from hidden output in ORDER BY and HAVING on top of an Aggregate in single-pass Analyzer [spark]

2025-05-01 Thread via GitHub
vladimirg-db commented on PR #50769: URL: https://github.com/apache/spark/pull/50769#issuecomment-2845048241 @sririshindra single-pass Analyzer is a project to replace the current fixed-point Analyzer and is currently under development. It's not yet enabled by default (not running in CI). I

Re: [PR] [SPARK-51973][K8S][BUILD] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark]

2025-05-01 Thread via GitHub
LuciferYang commented on PR #50775: URL: https://github.com/apache/spark/pull/50775#issuecomment-2845064073 Merged into master. Thanks @dongjoon-hyun

Re: [PR] [SPARK-51973][K8S][BUILD] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark]

2025-05-01 Thread via GitHub
LuciferYang closed pull request #50775: [SPARK-51973][K8S][BUILD] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 URL: https://github.com/apache/spark/pull/50775

Re: [PR] [SPARK-51973][K8S][BUILD] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark]

2025-05-01 Thread via GitHub
LuciferYang commented on PR #50775: URL: https://github.com/apache/spark/pull/50775#issuecomment-2845061751 It seems that the PySpark failure is not related to the current PR. We can merge this one.

Re: [PR] [SPARK-51016][SQL] Non-deterministic SQL expressions should set indeterminate map stage output level [spark]

2025-05-01 Thread via GitHub
peter-toth commented on code in PR #50757: URL: https://github.com/apache/spark/pull/50757#discussion_r2069998646 ## core/src/main/scala/org/apache/spark/rdd/RDD.scala: ## Review Comment: 1., 2. This PR doesn't change deterministic calculation of plan nodes so it shouldn't

Re: [PR] [SPARK-51964][SQL] Correctly resolve attributes from hidden output in ORDER BY and HAVING on top of an Aggregate in single-pass Analyzer [spark]

2025-05-01 Thread via GitHub
vladimirg-db commented on PR #50769: URL: https://github.com/apache/spark/pull/50769#issuecomment-2844630029 @cloud-fan tests passed, please take a look.

Re: [PR] [SPARK-51016][SQL] Non-deterministic SQL expressions should set indeterminate map stage output level [spark]

2025-05-01 Thread via GitHub
cloud-fan commented on PR #50757: URL: https://github.com/apache/spark/pull/50757#issuecomment-2844972082 My worry is that `Expression#deterministic` is a bit abused in Spark, e.g. `SparkPartitionID` and `InputFileName` are marked as nondeterministic, but they produce the same result when
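The distinction drawn here can be illustrated with plain Python stand-ins (hypothetical functions, not Spark code): an expression like `rand()` yields different values when a partition is recomputed, while something in the spirit of `SparkPartitionID` is marked nondeterministic yet reproduces the same value on a retry of the same partition:

```python
import random

def rand_expr(partition_id):
    # Truly nondeterministic: a task retry recomputes different values,
    # so the shuffle output of the partition can change between attempts.
    return random.random()

def partition_id_expr(partition_id):
    # Analogous to SparkPartitionID: marked nondeterministic in Spark
    # (its value depends on task placement), yet a retry of the *same*
    # partition reproduces exactly the same result.
    return partition_id

first_attempt = partition_id_expr(3)
retry = partition_id_expr(3)  # task retried on the same partition
```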

[PR] [MINOR][PS][DOC] Update pandas API on Spark option doc [spark]

2025-05-01 Thread via GitHub
ueshin opened a new pull request, #50777: URL: https://github.com/apache/spark/pull/50777 ### What changes were proposed in this pull request? Updates pandas API on Spark option doc. ### Why are the changes needed? The descriptions for some options are outdated. ##

Re: [PR] [SPARK-51978] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark-kubernetes-operator]

2025-05-01 Thread via GitHub
dongjoon-hyun closed pull request #180: [SPARK-51978] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 URL: https://github.com/apache/spark-kubernetes-operator/pull/180

Re: [PR] [SPARK-51906][SQL] Dsv2 expressions in alter table add columns [spark]

2025-05-01 Thread via GitHub
szehon-ho commented on code in PR #50701: URL: https://github.com/apache/spark/pull/50701#discussion_r2069680175 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala: ## @@ -3560,11 +3560,10 @@ class DataSourceV2SQLSuiteV1Filter val excep

[PR] [SPARK-51966][PYTHON] Replace select.select() with select.poll() when running on POSIX os [spark]

2025-05-01 Thread via GitHub
wjszlachta-man opened a new pull request, #50774: URL: https://github.com/apache/spark/pull/50774 ### What changes were proposed in this pull request? On glibc-based Linux systems `select()` can monitor only file descriptor numbers that are less than `FD_SETSIZE` (1024). This i
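The limitation and the proposed replacement can be demonstrated with a minimal sketch, assuming a POSIX system where `select.poll()` is available (the pipe and descriptors here are illustrative, not the PR's code):

```python
import os
import select

# select.select() rejects any fd >= FD_SETSIZE (1024) on glibc, because it
# marks descriptors in a fixed-size bitmap. select.poll() takes an explicit
# list of registered fds instead, so it has no such ceiling.
r, w = os.pipe()
os.write(w, b"ping")                 # make the read end readable

poller = select.poll()
poller.register(r, select.POLLIN)    # watch the read end for incoming data
events = poller.poll(1000)           # timeout in milliseconds

ready_fds = [fd for fd, ev in events if ev & select.POLLIN]
os.close(r)
os.close(w)
```

`poller.poll()` returns `(fd, eventmask)` pairs, so callers that previously unpacked the three lists returned by `select.select()` need a small translation layer.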

[PR] [SPARK-51976] Add `array`, `map`, `timestamp`, `posexplode` test queries [spark-connect-swift]

2025-05-01 Thread via GitHub
dongjoon-hyun opened a new pull request, #101: URL: https://github.com/apache/spark-connect-swift/pull/101 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51871] Improve `SQLTests` to check column names [spark-connect-swift]

2025-05-01 Thread via GitHub
dongjoon-hyun commented on PR #85: URL: https://github.com/apache/spark-connect-swift/pull/85#issuecomment-2844701280 Thank you, @viirya ! Merged to main.

Re: [PR] [SPARK-51871] Improve `SQLTests` to check column names [spark-connect-swift]

2025-05-01 Thread via GitHub
dongjoon-hyun closed pull request #85: [SPARK-51871] Improve `SQLTests` to check column names URL: https://github.com/apache/spark-connect-swift/pull/85

Re: [PR] [SPARK-51847][PYTHON] Extend PySpark testing framework util functions with basic data tests [spark]

2025-05-01 Thread via GitHub
stanlocht commented on PR #50644: URL: https://github.com/apache/spark/pull/50644#issuecomment-2844704988 Hi @HyukjinKwon, @zhengruifeng, @asl3, just following up to see if you might have a chance to review the PR when time allows. Appreciate your time and input!

Re: [PR] [SPARK-51964][SQL] Correctly resolve attributes from hidden output in ORDER BY and HAVING on top of an Aggregate in single-pass Analyzer [spark]

2025-05-01 Thread via GitHub
vladimirg-db commented on code in PR #50769: URL: https://github.com/apache/spark/pull/50769#discussion_r2070189488 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/AggregateResolver.scala: ## @@ -109,7 +111,8 @@ class AggregateResolver(operatorReso

Re: [PR] [SPARK-51976] Add `array`, `map`, `timestamp`, `posexplode` test queries [spark-connect-swift]

2025-05-01 Thread via GitHub
dongjoon-hyun commented on PR #101: URL: https://github.com/apache/spark-connect-swift/pull/101#issuecomment-2845281778 Thank you! Merged to main.

Re: [PR] [SPARK-51976] Add `array`, `map`, `timestamp`, `posexplode` test queries [spark-connect-swift]

2025-05-01 Thread via GitHub
dongjoon-hyun closed pull request #101: [SPARK-51976] Add `array`, `map`, `timestamp`, `posexplode` test queries URL: https://github.com/apache/spark-connect-swift/pull/101

[PR] [SPARK-51978] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark-kubernetes-operator]

2025-05-01 Thread via GitHub
dongjoon-hyun opened a new pull request, #180: URL: https://github.com/apache/spark-kubernetes-operator/pull/180 ### What changes were proposed in this pull request? This PR aims to upgrade `kubernetes-client` to 7.2.0 like Apache Spark. - https://github.com/apache/spark/pull/50775

Re: [PR] [SPARK-51016][SQL] Non-deterministic SQL expressions should set indeterminate map stage output level [spark]

2025-05-01 Thread via GitHub
ahshahid commented on code in PR #50757: URL: https://github.com/apache/spark/pull/50757#discussion_r2070564298 ## core/src/main/scala/org/apache/spark/rdd/RDD.scala: ## Review Comment: > @ahshahid Regarding point 3 I was open for your change and asked you to extend your i

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
peter-toth commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070569934 ## core/src/main/java/org/apache/spark/shuffle/checksum/RowBasedChecksum.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[PR] [SPARK-51979][SQL][TEST] Add more SQL Query tests for SQL TVF return columns [spark]

2025-05-01 Thread via GitHub
allisonwang-db opened a new pull request, #50776: URL: https://github.com/apache/spark/pull/50776 ### What changes were proposed in this pull request? This PR adds more SQL query tests for SQL User-defined table function with various valid and invalid return columns. ##

Re: [PR] [SPARK-51016][SQL] Non-deterministic SQL expressions should set indeterminate map stage output level [spark]

2025-05-01 Thread via GitHub
ahshahid commented on PR #50757: URL: https://github.com/apache/spark/pull/50757#issuecomment-2845342978 IMHO, the indeterminacy of an expression should be judged only on the basis of whether a shuffle stage can lose/add rows because of the indeterministic nature of the expressio

Re: [PR] [SPARK-51979][SQL][TESTS] Add more SQL Query tests for SQL TVF return columns [spark]

2025-05-01 Thread via GitHub
allisonwang-db commented on PR #50776: URL: https://github.com/apache/spark/pull/50776#issuecomment-2845344033 cc @cloud-fan

Re: [PR] [SPARK-51973][K8S][BUILD] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark]

2025-05-01 Thread via GitHub
dongjoon-hyun commented on PR #50775: URL: https://github.com/apache/spark/pull/50775#issuecomment-2845096329 Thank you, @LuciferYang !

[PR] [SPARK-51973][K8S][BUILD] Upgrade `kubernetes-client` to 7.2.0 for K8s 1.33 [spark]

2025-05-01 Thread via GitHub
dongjoon-hyun opened a new pull request, #50775: URL: https://github.com/apache/spark/pull/50775 …s 1.33 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51958][CORE][SQL][TESTS] Replace hardcoded ICU version with `com.ibm.icu.util.VersionInfo.ICU_VERSION` in tests [spark]

2025-05-01 Thread via GitHub
LuciferYang commented on PR #50764: URL: https://github.com/apache/spark/pull/50764#issuecomment-2844795461 Thanks @dongjoon-hyun @HyukjinKwon and @zhengruifeng

Re: [PR] [SPARK-51977] Improve `SparkSQLRepl` to support multiple lines [spark-connect-swift]

2025-05-01 Thread via GitHub
dongjoon-hyun commented on PR #102: URL: https://github.com/apache/spark-connect-swift/pull/102#issuecomment-2845240542 Thank you, @viirya . Merged to main.

Re: [PR] [SPARK-51977] Improve `SparkSQLRepl` to support multiple lines [spark-connect-swift]

2025-05-01 Thread via GitHub
dongjoon-hyun closed pull request #102: [SPARK-51977] Improve `SparkSQLRepl` to support multiple lines URL: https://github.com/apache/spark-connect-swift/pull/102

Re: [PR] [SPARK-51291] Reclassify validation errors thrown from state store loading [spark]

2025-05-01 Thread via GitHub
liviazhu-db commented on code in PR #50045: URL: https://github.com/apache/spark/pull/50045#discussion_r2070512349 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -324,6 +324,17 @@ "The change log writer version cannot be ." ] },

Re: [PR] [SPARK-51291] Reclassify validation errors thrown from state store loading [spark]

2025-05-01 Thread via GitHub
liviazhu-db commented on code in PR #50045: URL: https://github.com/apache/spark/pull/50045#discussion_r2070515391 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -426,7 +423,7 @@ private[sql] class HDFSBackedSt

Re: [PR] [SPARK-51291] Reclassify validation errors thrown from state store loading [spark]

2025-05-01 Thread via GitHub
liviazhu-db commented on code in PR #50045: URL: https://github.com/apache/spark/pull/50045#discussion_r2070519903 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreErrors.scala: ## @@ -435,15 +452,31 @@ class StateStoreFailedToGetChangelogWrite

Re: [PR] [SPARK-51291] Reclassify validation errors thrown from state store loading [spark]

2025-05-01 Thread via GitHub
liviazhu-db commented on code in PR #50045: URL: https://github.com/apache/spark/pull/50045#discussion_r2070521155 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreErrors.scala: ## @@ -435,15 +452,31 @@ class StateStoreFailedToGetChangelogWrite

Re: [PR] [SPARK-51976] Add `array`, `map`, `timestamp`, `posexplode` test queries [spark-connect-swift]

2025-05-01 Thread via GitHub
dongjoon-hyun commented on PR #101: URL: https://github.com/apache/spark-connect-swift/pull/101#issuecomment-2845256608 Could you review this test case PR when you have some time, @huaxingao ?

Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]

2025-05-01 Thread via GitHub
vrozov commented on PR #49928: URL: https://github.com/apache/spark/pull/49928#issuecomment-2844998622 @cloud-fan The positive test cases are already covered in the `JavaDataFrameReaderWriterSuite.java`. Please see https://github.com/apache/spark/blob/7019d5e63b7218049bacf3392716bf6faf8f82a

Re: [PR] [SPARK-51964][SQL] Correctly resolve attributes from hidden output in ORDER BY and HAVING on top of an Aggregate in single-pass Analyzer [spark]

2025-05-01 Thread via GitHub
vladimirg-db commented on PR #50769: URL: https://github.com/apache/spark/pull/50769#issuecomment-2845016716 @sririshindra here's a test that fails for single-pass Analyzer at the moment: https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/order-by.sql#L

Re: [PR] [SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations [spark]

2025-05-01 Thread via GitHub
anishshri-db commented on code in PR #50742: URL: https://github.com/apache/spark/pull/50742#discussion_r2070673830 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala: ## @@ -27,6 +27,28 @@ import org.apache.spark.sql.internal.SessionSt

Re: [PR] [SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations [spark]

2025-05-01 Thread via GitHub
ericm-db commented on code in PR #50742: URL: https://github.com/apache/spark/pull/50742#discussion_r2070674011 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala: ## @@ -27,6 +27,28 @@ import org.apache.spark.sql.internal.SessionState

Re: [PR] [SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations [spark]

2025-05-01 Thread via GitHub
anishshri-db commented on code in PR #50742: URL: https://github.com/apache/spark/pull/50742#discussion_r2070681239 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala: ## @@ -27,6 +27,43 @@ import org.apache.spark.sql.internal.SessionSt

[PR] Enable `--use-pep517` in `dev/run-pip-tests` [spark]

2025-05-01 Thread via GitHub
LuciferYang opened a new pull request, #50778: URL: https://github.com/apache/spark/pull/50778 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [SPARK-51983][PS] Prepare the test environment for pandas API on Spark with ANSI mode enabled [spark]

2025-05-01 Thread via GitHub
ueshin opened a new pull request, #50779: URL: https://github.com/apache/spark/pull/50779 ### What changes were proposed in this pull request? Prepares the test environment for pandas API on Spark with ANSI mode enabled. - Remove forcibly disabling ANSI mode in tests - Add a

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070923376 ## core/src/main/java/org/apache/spark/shuffle/checksum/RowBasedChecksum.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070928028 ## core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java: ## @@ -330,7 +344,8 @@ private long[] mergeSpillsUsingStandardWriter(SpillInfo[] spil

Re: [PR] [SPARK-51972][SS] State Store file integrity verification using checksum [spark]

2025-05-01 Thread via GitHub
anishshri-db commented on code in PR #50773: URL: https://github.com/apache/spark/pull/50773#discussion_r2070916110 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -1247,6 +1255,8 @@ class RocksDB( silentDeleteRecursively(loc

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070921381 ## core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java: ## @@ -199,6 +214,14 @@ public long[] getPartitionLengths() { return par

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070928418 ## core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java: ## @@ -163,6 +167,13 @@ public long getPeakMemoryUsedBytes() { return peakMemoryU

[PR] [SPARK-51981][Structured Streaming] Add JobTags to QueryStartedEvent [spark]

2025-05-01 Thread via GitHub
gjxdxh opened a new pull request, #50780: URL: https://github.com/apache/spark/pull/50780 ### What changes were proposed in this pull request? Adding a new jobTags parameter for QueryStartedEvent so that it can be connected to the actual Spark Connect command that triggered this s

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070945270 ## sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRowChecksum.scala: ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] [SPARK-51847][PYTHON] Extend PySpark testing framework util functions with basic data tests [spark]

2025-05-01 Thread via GitHub
asl3 commented on code in PR #50644: URL: https://github.com/apache/spark/pull/50644#discussion_r2059059636 ## python/pyspark/testing/utils.py: ## @@ -580,6 +598,7 @@ def compare_datatypes_ignore_nullable(dt1: Any, dt2: Any): if TYPE_CHECKING: import pandas + Review Com

Re: [PR] [SPARK-51016][SQL] Non-deterministic SQL expressions should set indeterminate map stage output level [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50757: URL: https://github.com/apache/spark/pull/50757#discussion_r2070814828 ## core/src/main/scala/org/apache/spark/rdd/RDD.scala: ## Review Comment: Please check this lines in RDD.scala: https://github.com/apache/spark/blob/085bfc

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070910587 ## core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java: ## @@ -104,6 +105,14 @@ final class BypassMergeSortShuffleWriter private l

Re: [PR] [SPARK-51902][SQL] Enforce check constraint on table insertion [spark]

2025-05-01 Thread via GitHub
gengliangwang commented on code in PR #50761: URL: https://github.com/apache/spark/pull/50761#discussion_r2070911373 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -544,6 +544,14 @@ ], "sqlState" : "56000" }, + "CHECK_CONSTRAINT_VIOLATION" :
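The diff adds a `CHECK_CONSTRAINT_VIOLATION` entry to `error-conditions.json`, but the excerpt cuts off before its body. A plausible shape following that file's existing conventions is sketched below; the message text, parameter names, and `sqlState` are guesses, not the PR's actual entry:

```json
"CHECK_CONSTRAINT_VIOLATION" : {
  "message" : [
    "CHECK constraint <constraintName> <expression> violated by row with values (<values>)."
  ],
  "sqlState" : "23001"
}
```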

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070910837 ## core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java: ## @@ -132,6 +141,8 @@ final class BypassMergeSortShuffleWriter this.ser

Re: [PR] [SPARK-51902][SQL] Enforce check constraint on table insertion [spark]

2025-05-01 Thread via GitHub
gengliangwang commented on code in PR #50761: URL: https://github.com/apache/spark/pull/50761#discussion_r2070913049 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableConstraint.scala: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] [SPARK-51902][SQL] Enforce check constraint on table insertion [spark]

2025-05-01 Thread via GitHub
gengliangwang commented on code in PR #50761: URL: https://github.com/apache/spark/pull/50761#discussion_r2070938781 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableConstraint.scala: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070938550 ## core/src/main/java/org/apache/spark/shuffle/checksum/RowBasedChecksum.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-51902][SQL] Enforce check constraint on table insertion [spark]

2025-05-01 Thread via GitHub
gengliangwang commented on code in PR #50761: URL: https://github.com/apache/spark/pull/50761#discussion_r2070941624 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala: ## @@ -259,3 +263,94 @@ case class ForeignKeyConstraint( copy(use
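SPARK-51902 enforces CHECK constraints at insertion time: conceptually, each incoming row is evaluated against the constraint predicate and the write fails on the first violation. A minimal sketch of that contract, not Spark's actual expression machinery in `constraints.scala`:

```python
class CheckConstraintViolation(Exception):
    """Raised when an inserted row fails a CHECK constraint."""


def enforce_check(rows, predicate, constraint_name: str, expression: str):
    """Yield rows unchanged, raising on the first row that fails the CHECK
    predicate, mirroring how an enforced constraint rejects an insertion."""
    for row in rows:
        if not predicate(row):
            raise CheckConstraintViolation(
                f"CHECK constraint {constraint_name} ({expression}) "
                f"violated by row {row!r}")
        yield row


rows = [{"price": 10}, {"price": -1}]
checked = enforce_check(rows, lambda r: r["price"] >= 0,
                        "positive_price", "price >= 0")
print(next(checked))  # {'price': 10}
```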

Re: [PR] [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters [spark]

2025-05-01 Thread via GitHub
attilapiros commented on code in PR #50230: URL: https://github.com/apache/spark/pull/50230#discussion_r2070942000 ## core/src/main/java/org/apache/spark/shuffle/checksum/RowBasedChecksum.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-51902][SQL] Enforce check constraint on table insertion [spark]

2025-05-01 Thread via GitHub
gengliangwang commented on code in PR #50761: URL: https://github.com/apache/spark/pull/50761#discussion_r2070943644 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableConstraint.scala: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] [SPARK-51016][SQL] Non-deterministic SQL expressions should set indeterminate map stage output level [spark]

2025-05-01 Thread via GitHub
ahshahid commented on PR #50757: URL: https://github.com/apache/spark/pull/50757#issuecomment-2845812865 I am out right now... will check back... but what I'm trying to say is that a map stage should be marked indeterminate iff the partitioner is using an indeterministic val. And afaik only

Re: [PR] [SPARK-51883][DOCS][PYTHON] Python Data Source user guide for filter pushdown [spark]

2025-05-01 Thread via GitHub
wengh commented on code in PR #50684: URL: https://github.com/apache/spark/pull/50684#discussion_r2071064159 ## python/pyspark/sql/datasource.py: ## @@ -539,6 +539,11 @@ def pushFilters(self, filters: List["Filter"]) -> Iterable["Filter"]: This method is allowed to mod
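The docstring under review notes that `pushFilters` is allowed to modify its input and must return the filters the source cannot handle, so Spark can re-apply them. A standalone sketch of that contract, using stand-in filter classes rather than importing `pyspark.sql.datasource`, so the pushed-down versus remaining split is easy to see:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EqualTo:
    """Stand-in for a pyspark.sql.datasource equality filter."""
    attribute: str
    value: object


class MyReader:
    """Toy reader that can only push equality filters on 'id' down to the
    underlying source; everything else is returned for Spark to evaluate."""

    def __init__(self) -> None:
        self.pushed: List[EqualTo] = []

    def push_filters(self, filters: List[object]) -> Iterable[object]:
        remaining = []
        for f in filters:
            if isinstance(f, EqualTo) and f.attribute == "id":
                self.pushed.append(f)   # handled at the source
            else:
                remaining.append(f)     # Spark must re-apply these
        return remaining


reader = MyReader()
left_over = list(reader.push_filters([EqualTo("id", 7), EqualTo("name", "a")]))
print(left_over)  # [EqualTo(attribute='name', value='a')]
```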

Re: [PR] [SPARK-51883][DOCS][PYTHON] Python Data Source user guide for filter pushdown [spark]

2025-05-01 Thread via GitHub
wengh commented on code in PR #50684: URL: https://github.com/apache/spark/pull/50684#discussion_r2071064597 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -356,17 +356,28 @@ For library that are used inside a method, it must be imported inside the method

Re: [PR] [SPARK-51821][CORE] Call interrupt() without holding uninterruptibleLock to avoid possible deadlock [spark]

2025-05-01 Thread via GitHub
vrozov commented on code in PR #50594: URL: https://github.com/apache/spark/pull/50594#discussion_r2071066514 ## core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala: ## @@ -92,11 +110,27 @@ private[spark] class UninterruptibleThread( * interrupted until it
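The PR's theme is that calling `interrupt()` while holding `uninterruptibleLock` can deadlock, because the interrupted thread may itself need that lock. The general pattern, in a toy Python sketch (not the Scala `UninterruptibleThread` code): record the decision under the lock, but perform the potentially blocking call only after releasing it.

```python
import threading


class Uninterruptible:
    """Toy illustration of the pattern: never invoke interrupt() (or any call
    that may block or take another lock) while holding our own monitor."""

    def __init__(self, target):
        self._lock = threading.Lock()
        self._pending = False
        self._target = target

    def interrupt(self) -> None:
        should_call = False
        with self._lock:
            # Only record the decision under the lock ...
            self._pending = True
            should_call = True
        # ... and perform the actual (possibly blocking) call after
        # releasing it, so no lock-ordering deadlock is possible.
        if should_call:
            self._target()


calls = []
u = Uninterruptible(lambda: calls.append("interrupted"))
u.interrupt()
print(calls)  # ['interrupted']
```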

Re: [PR] [SPARK-51906][SQL] Dsv2 expressions in alter table add columns [spark]

2025-05-01 Thread via GitHub
szehon-ho commented on PR #50701: URL: https://github.com/apache/spark/pull/50701#issuecomment-2846276506 @cloud-fan @aokolnychyi can you guys take a look when you have time? Thanks

Re: [PR] [SPARK-51906][SQL] Dsv2 expressions in alter table add columns [spark]

2025-05-01 Thread via GitHub
szehon-ho commented on code in PR #50701: URL: https://github.com/apache/spark/pull/50701#discussion_r2071075799 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3994,10 +3994,45 @@ class Analyzer(override val catalogManager: CatalogM

Re: [PR] [SPARK-51596][SS] Fix concurrent StateStoreProvider maintenance and closing [spark]

2025-05-01 Thread via GitHub
micheal-o commented on code in PR #50595: URL: https://github.com/apache/spark/pull/50595#discussion_r2071086887 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala: ## @@ -1173,6 +1225,107 @@ object StateStore extends Logging { } }

Re: [PR] [SPARK-51980][PYTHON][TESTS] Enable `--use-pep517` in `dev/run-pip-tests` [spark]

2025-05-01 Thread via GitHub
dongjoon-hyun closed pull request #50778: [SPARK-51980][PYTHON][TESTS] Enable `--use-pep517` in `dev/run-pip-tests` URL: https://github.com/apache/spark/pull/50778

Re: [PR] [SPARK-51596][SS] Fix concurrent StateStoreProvider maintenance and closing [spark]

2025-05-01 Thread via GitHub
micheal-o commented on code in PR #50595: URL: https://github.com/apache/spark/pull/50595#discussion_r2071091954 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala: ## @@ -,60 +1159,64 @@ object StateStore extends Logging { } }
