Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2102671633 ## sql/core/src/test/scala/org/apache/spark/sql/execution/RecursiveCTESuite.scala: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
Pajaraja commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2102694244 ## sql/core/src/test/scala/org/apache/spark/sql/execution/RecursiveCTESuite.scala: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
Pajaraja commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2102694804 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -183,11 +183,24 @@ case class UnionLoopExec( // Main loop for obtaining the r

Re: [PR] [SPARK-52181] Increase variant size limit to 128MiB [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on PR #50913: URL: https://github.com/apache/spark/pull/50913#issuecomment-2901398320 @tedyu you are welcome to add such a config, but I think it's hard as variant component is an individual module that does not depend on spark sql. -- This is an automated message from

Re: [PR] [SPARK-50137][HIVE] Avoid fallback to Hive-incompatible ways when table creation fails by thrift exception [spark]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on code in PR #48668: URL: https://github.com/apache/spark/pull/48668#discussion_r2102684669 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala: ## @@ -241,4 +242,33 @@ class HiveExternalCatalogSuite extends ExternalCatal

Re: [PR] [SPARK-50137][SQL] Avoid fallback to Hive-incompatible ways when table creation fails by thrift exception [spark]

2025-05-22 Thread via GitHub
dongjoon-hyun closed pull request #50985: [SPARK-50137][SQL] Avoid fallback to Hive-incompatible ways when table creation fails by thrift exception URL: https://github.com/apache/spark/pull/50985

Re: [PR] [SPARK-50137][SQL] Avoid fallback to Hive-incompatible ways when table creation fails by thrift exception [spark]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #50985: URL: https://github.com/apache/spark/pull/50985#issuecomment-2901424585 Merged to master/4.0. If you want this in branch-3.5, please make a backporting PR. I couldn't backport directly because there is a code conflict, @wecharyu .

Re: [PR] [SPARK-50137][SQL] Avoid fallback to Hive-incompatible ways when table creation fails by thrift exception [spark]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #50985: URL: https://github.com/apache/spark/pull/50985#issuecomment-2901431462 I assigned SPARK-50137 to you. Thank you for your contribution.

Re: [PR] [SPARK-52219][SQL] Schema level collation support for tables [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50937: URL: https://github.com/apache/spark/pull/50937#discussion_r2102705961 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ApplyDefaultCollationToStringType.scala: ## @@ -91,6 +94,50 @@ object ApplyDefaultCollationToStr

Re: [PR] [SPARK-52240][SQL] Fix VectorizedDeltaLengthByteArrayReader.readBinary to handle current row [spark]

2025-05-22 Thread via GitHub
djspiewak commented on PR #50966: URL: https://github.com/apache/spark/pull/50966#issuecomment-2901442631 Small world! :D

Re: [PR] [SPARK-52181] Increase variant size limit to 128MiB [spark]

2025-05-22 Thread via GitHub
tedyu commented on PR #50913: URL: https://github.com/apache/spark/pull/50913#issuecomment-2901457189 It seems we can pass `Configuration` explicitly to `VariantVal.toString`: `def toString(conf: VariantConf): String`, where `VariantConf` is a lightweight case class. This

Re: [PR] [SPARK-52260][SQL][TESTS] Add test for Update/Merge Into/Delete From table [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50982: URL: https://github.com/apache/spark/pull/50982#discussion_r2102720850 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DeleteFromTableSuiteBase.scala: ## @@ -63,6 +63,31 @@ abstract class DeleteFromTableSuiteBase extends RowLe

Re: [PR] [SPARK-52236][SQL] Standardize analyze exceptions for default value [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50960: URL: https://github.com/apache/spark/pull/50960#discussion_r2102726510 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -583,21 +583,29 @@ object ResolveDefaultColumns extends Query

Re: [PR] [SPARK-52153][SQL] Fix from_json and to_json with variant [spark]

2025-05-22 Thread via GitHub
chenhao-db commented on PR #50901: URL: https://github.com/apache/spark/pull/50901#issuecomment-2901649624 > where is the fix to respect `from_json` options? It is https://github.com/apache/spark/pull/50901/files#diff-33cdfba36c601bb3ecf067f47580b0627758f056f3d2d05984ec4dad24154cf5R15

Re: [PR] [SPARK-52032][SQL] Fix orc filter pushdown with null-safe equality operator [spark]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on code in PR #50932: URL: https://github.com/apache/spark/pull/50932#discussion_r2102859149 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala: ## @@ -202,6 +202,10 @@ private[sql] object OrcFilters extends OrcFilt

Re: [PR] [SPARK-52153][SQL] Fix from_json and to_json with variant [spark]

2025-05-22 Thread via GitHub
cloud-fan closed pull request #50901: [SPARK-52153][SQL] Fix from_json and to_json with variant URL: https://github.com/apache/spark/pull/50901

Re: [PR] [SPARK-52153][SQL] Fix from_json and to_json with variant [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on PR #50901: URL: https://github.com/apache/spark/pull/50901#issuecomment-2901689446 thanks, merging to master/4.0!

Re: [PR] [SPARK-51847][PYTHON] Extend PySpark testing framework util functions with basic data tests [spark]

2025-05-22 Thread via GitHub
stanlocht commented on PR #50644: URL: https://github.com/apache/spark/pull/50644#issuecomment-2901316972 Hi @HyukjinKwon, @zhengruifeng, @asl3 — just checking in on this PR. I’ve made the changes based on the earlier feedback, so let me know if there’s anything else you’d like to see. Woul

Re: [PR] [MINOR][INFRA] Limit build_python_3.11_macos execution time to up to 2 hours [spark]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #50978: URL: https://github.com/apache/spark/pull/50978#issuecomment-2901540665 Thank you, @zhengruifeng and @LuciferYang !

Re: [PR] [SPARK-52233][SQL] Fix map_zip_with for Floating Point Types [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50967: URL: https://github.com/apache/spark/pull/50967#discussion_r2102779173 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -1124,23 +1124,40 @@ case class MapZipWith(left: Expression

Re: [PR] Corrected row index usage when exploding packed arrays in vectorized reader [spark]

2025-05-22 Thread via GitHub
djspiewak commented on PR #46928: URL: https://github.com/apache/spark/pull/46928#issuecomment-2901448498 @LuciferYang Slightly different situation here as jars and class files are not the same as parquet. Is there an existing example of how to generate *test* parquet files? As I noted, the

Re: [PR] [SPARK-52236][SQL] Standardize analyze exceptions for default value [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on PR #50960: URL: https://github.com/apache/spark/pull/50960#issuecomment-2901486665 The AQE test failure is unrelated, thanks, merging to master!

Re: [PR] [SPARK-52236][SQL] Standardize analyze exceptions for default value [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50960: URL: https://github.com/apache/spark/pull/50960#discussion_r2102738375 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -583,21 +583,29 @@ object ResolveDefaultColumns extends Query

Re: [PR] [SPARK-52236][SQL] Standardize analyze exceptions for default value [spark]

2025-05-22 Thread via GitHub
cloud-fan closed pull request #50960: [SPARK-52236][SQL] Standardize analyze exceptions for default value URL: https://github.com/apache/spark/pull/50960

Re: [PR] [SPARK-52065][SQL] Produce another plan tree with output columns (name, data type, nullability) in plan change logging [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50852: URL: https://github.com/apache/spark/pull/50852#discussion_r2102753811 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala: ## @@ -59,10 +59,15 @@ class PlanChangeLogger[TreeType <: TreeNode[_]] extends

Re: [PR] [SPARK-52065][SQL] Produce another plan tree with output columns (name, data type, nullability) in plan change logging [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50852: URL: https://github.com/apache/spark/pull/50852#discussion_r2102754407 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala: ## @@ -59,10 +59,15 @@ class PlanChangeLogger[TreeType <: TreeNode[_]] extends

Re: [PR] [SPARK-52260][SQL][TESTS] Add test for Update/Merge Into/Delete From table [spark]

2025-05-22 Thread via GitHub
gengliangwang commented on code in PR #50982: URL: https://github.com/apache/spark/pull/50982#discussion_r2102932267 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DeleteFromTableSuiteBase.scala: ## @@ -63,6 +63,31 @@ abstract class DeleteFromTableSuiteBase extends R

Re: [PR] [SPARK-50137][SQL] Avoid fallback to Hive-incompatible ways when table creation fails by thrift exception [spark]

2025-05-22 Thread via GitHub
sunchao commented on PR #50985: URL: https://github.com/apache/spark/pull/50985#issuecomment-2901833681 thank you @wecharyu for the quick fix

Re: [PR] [SPARK-52219][SQL] Schema level collation support for tables [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50937: URL: https://github.com/apache/spark/pull/50937#discussion_r2103004729 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -143,6 +144,28 @@ case class AlterDatabasePropertiesCommand( } } +/** + * A c

Re: [PR] [SPARK-52219][SQL] Schema level collation support for tables [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50937: URL: https://github.com/apache/spark/pull/50937#discussion_r2103007477 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesSuiteBase.scala: ## @@ -110,8 +110,11 @@ trait AlterNamespaceSetPropertie

Re: [PR] [SPARK-52219][SQL] Schema level collation support for tables [spark]

2025-05-22 Thread via GitHub
ilicmarkodb commented on code in PR #50937: URL: https://github.com/apache/spark/pull/50937#discussion_r2103011349 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesSuiteBase.scala: ## @@ -110,8 +110,11 @@ trait AlterNamespaceSetPropert

Re: [PR] [SPARK-52219][SQL] Schema level collation support for tables [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50937: URL: https://github.com/apache/spark/pull/50937#discussion_r2103016789 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesSuiteBase.scala: ## @@ -110,8 +110,11 @@ trait AlterNamespaceSetPropertie

Re: [PR] [SPARK-52219][SQL] Schema level collation support for tables [spark]

2025-05-22 Thread via GitHub
ilicmarkodb commented on code in PR #50937: URL: https://github.com/apache/spark/pull/50937#discussion_r2103038567 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -143,6 +144,28 @@ case class AlterDatabasePropertiesCommand( } } +/** + * A

Re: [PR] [SPARK-52260][SQL][TESTS] Add test for Update/Merge Into/Delete From table [spark]

2025-05-22 Thread via GitHub
gengliangwang commented on PR #50982: URL: https://github.com/apache/spark/pull/50982#issuecomment-2902233304 Merging to master

Re: [PR] [SPARK-52260][SQL][TESTS] Add test for Update/Merge Into/Delete From table [spark]

2025-05-22 Thread via GitHub
gengliangwang closed pull request #50982: [SPARK-52260][SQL][TESTS] Add test for Update/Merge Into/Delete From table URL: https://github.com/apache/spark/pull/50982

[PR] Move the benchmark code related to Sorter from `SorterSuite` to `SorterBenchmark` [spark]

2025-05-22 Thread via GitHub
LuciferYang opened a new pull request, #50987: URL: https://github.com/apache/spark/pull/50987 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2102673919 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -183,11 +183,24 @@ case class UnionLoopExec( // Main loop for obtaining the

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2102675985 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -183,11 +183,24 @@ case class UnionLoopExec( // Main loop for obtaining the

Re: [PR] [SPARK-52060][SQL] Make `OneRowRelationExec` node [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on PR #50849: URL: https://github.com/apache/spark/pull/50849#issuecomment-2901588073 @richardc-db can you do a rebase to re-trigger the CI? We fixed an OOM test issue recently.

Re: [PR] [SPARK-52233][SQL] Fix map_zip_with for Floating Point Types [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50967: URL: https://github.com/apache/spark/pull/50967#discussion_r2102780173 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -1124,23 +1124,40 @@ case class MapZipWith(left: Expression

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2102809199 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -183,11 +183,24 @@ case class UnionLoopExec( // Main loop for obtaining the

Re: [PR] [SPARK-52226] [SQL] Fix unusual equality checks in three operators [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50949: URL: https://github.com/apache/spark/pull/50949#discussion_r2102816476 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -48,8 +48,8 @@ case class BatchScanExec( // TODO: unify the equ

Re: [PR] [SPARK-52153][SQL] Fix from_json and to_json with variant [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on code in PR #50901: URL: https://github.com/apache/spark/pull/50901#discussion_r2102823215 ## common/variant/src/main/java/org/apache/spark/types/variant/Variant.java: ## @@ -316,9 +316,15 @@ static void toJsonImpl(byte[] value, byte[] metadata, int pos, S

Re: [PR] [SPARK-52153][SQL] Fix from_json and to_json with variant [spark]

2025-05-22 Thread via GitHub
cloud-fan commented on PR #50901: URL: https://github.com/apache/spark/pull/50901#issuecomment-2901636001 where is the fix to respect `from_json` options?

Re: [PR] [SPARK-52153][SQL] Fix from_json and to_json with variant [spark]

2025-05-22 Thread via GitHub
chenhao-db commented on code in PR #50901: URL: https://github.com/apache/spark/pull/50901#discussion_r2102830177 ## common/variant/src/main/java/org/apache/spark/types/variant/Variant.java: ## @@ -316,9 +316,15 @@ static void toJsonImpl(byte[] value, byte[] metadata, int pos,

Re: [PR] [SPARK-52181] Increase variant size limit to 128MiB [spark]

2025-05-22 Thread via GitHub
tedyu commented on PR #50913: URL: https://github.com/apache/spark/pull/50913#issuecomment-2902029149 bq. The new OOM in tests is because we changed the test code to use a bigger input. With the release containing this PR, some users would feed their workloads with bigger input.

Re: [PR] [SPARK-52181] Increase variant size limit to 128MiB [spark]

2025-05-22 Thread via GitHub
chenhao-db commented on PR #50913: URL: https://github.com/apache/spark/pull/50913#issuecomment-2902013690 Increasing the limit won't bring any new OOM to existing workloads. The new OOM in tests is because we changed the test code to use a bigger input. I'm not sure whether we are ab

Re: [PR] [SPARK-52224][CONNECT][PYTHON] Introduce pyyaml as a dependency for the Python client [spark]

2025-05-22 Thread via GitHub
xinrong-meng commented on PR #50944: URL: https://github.com/apache/spark/pull/50944#issuecomment-2902061024 LGTM thank you!

Re: [PR] [SPARK-52226] [SQL] Fix unusual equality checks in three operators [spark]

2025-05-22 Thread via GitHub
li-boxuan commented on code in PR #50949: URL: https://github.com/apache/spark/pull/50949#discussion_r2103113932 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala: ## @@ -48,8 +48,8 @@ case class BatchScanExec( // TODO: unify the equ

Re: [PR] [SPARK-52236][SQL] Standardize analyze exceptions for default value [spark]

2025-05-22 Thread via GitHub
szehon-ho commented on code in PR #50960: URL: https://github.com/apache/spark/pull/50960#discussion_r2103151918 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -583,21 +583,29 @@ object ResolveDefaultColumns extends Query

[PR] [SPARK-52264][PS][TESTS] Test divide-by-zero behavior with more numeric data types [spark]

2025-05-22 Thread via GitHub
xinrong-meng opened a new pull request, #50988: URL: https://github.com/apache/spark/pull/50988 ### What changes were proposed in this pull request? Test divide-by-zero behavior with more numeric data types ### Why are the changes needed? To ensure that divide-by-zero operati

Re: [PR] [SPARK-52224][CONNECT][PYTHON] Introduce pyyaml as a dependency for the Python client [spark]

2025-05-22 Thread via GitHub
sryza closed pull request #50944: [SPARK-52224][CONNECT][PYTHON] Introduce pyyaml as a dependency for the Python client URL: https://github.com/apache/spark/pull/50944

[PR] [SPARK-52265][SQL][TEST] Fix regex leading to empty PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite [spark]

2025-05-22 Thread via GitHub
efaracci018 opened a new pull request, #50989: URL: https://github.com/apache/spark/pull/50989 ### What changes were proposed in this pull request? Fix the version parsing logic in `HiveExternalCatalogVersionsSuite` to properly handle new artifact paths in https://dist.apache.org/rep

Re: [PR] [SPARK-52262][SQL] swap order of withConnection and classifyException in loadTable [spark]

2025-05-22 Thread via GitHub
urosstan-db commented on code in PR #50986: URL: https://github.com/apache/spark/pull/50986#discussion_r2103353356 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -1118,4 +1118,18 @@ private[v2] trait V2JDBCTest extends S

[PR] [SPARK-52267] Match field ID in ParquetToSparkSchemaConverter [spark]

2025-05-22 Thread via GitHub
chenhao-db opened a new pull request, #50990: URL: https://github.com/apache/spark/pull/50990 ### What changes were proposed in this pull request? In the vectorized Parquet reader, there are two classes to resolve the Parquet schema when reading a Parquet file: - `ParquetReadSu

Re: [PR] [SPARK-52267] Match field ID in ParquetToSparkSchemaConverter [spark]

2025-05-22 Thread via GitHub
chenhao-db commented on PR #50990: URL: https://github.com/apache/spark/pull/50990#issuecomment-2902564582 @cloud-fan @jackierwzhang Please take a look. Thanks!

[PR] [SPARK-52268] Add `variant` SQL test and answer file [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun opened a new pull request, #167: URL: https://github.com/apache/spark-connect-swift/pull/167 ### What changes were proposed in this pull request? This PR aims to add `variant` SQL test and answer file. ### Why are the changes needed? To add a test coverage f

[PR] [SPARK-52269] Add `cast` SQL test and answer file [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun opened a new pull request, #168: URL: https://github.com/apache/spark-connect-swift/pull/168 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-52269] Add `cast` SQL test and answer file [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #168: URL: https://github.com/apache/spark-connect-swift/pull/168#issuecomment-2902616213 Could you review this test PR when you have some time, @huaxingao ?

Re: [PR] [SPARK-52268] Add `variant` SQL test and answer file [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun closed pull request #167: [SPARK-52268] Add `variant` SQL test and answer file URL: https://github.com/apache/spark-connect-swift/pull/167

Re: [PR] [SPARK-52268] Add `variant` SQL test and answer file [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #167: URL: https://github.com/apache/spark-connect-swift/pull/167#issuecomment-2902608653 Could you review this test PR when you have some time, @huaxingao ?

Re: [PR] [SPARK-52269] Add `cast` SQL test and answer file [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun closed pull request #168: [SPARK-52269] Add `cast` SQL test and answer file URL: https://github.com/apache/spark-connect-swift/pull/168

Re: [PR] [SPARK-52269] Add `cast` SQL test and answer file [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #168: URL: https://github.com/apache/spark-connect-swift/pull/168#issuecomment-2902624629 Thank you, @huaxingao . Merged to main.

Re: [PR] [SPARK-52268] Add `variant` SQL test and answer file [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #167: URL: https://github.com/apache/spark-connect-swift/pull/167#issuecomment-2902623249 Thank you, @huaxingao . Merged to main.

Re: [PR] [WIP] [SPARK-47547] BloomFilter fpp degradation [spark]

2025-05-22 Thread via GitHub
peter-toth commented on PR #50933: URL: https://github.com/apache/spark/pull/50933#issuecomment-2900562227 > Should we create a dedicated BloomFilterImplV2 class for the fixed logic, just so we can keep the old V1 implementation for deserializing old byte streams? I don't think we ne

Re: [PR] [SPARK-52219][SQL] Schema level collation support for tables [spark]

2025-05-22 Thread via GitHub
ilicmarkodb commented on code in PR #50937: URL: https://github.com/apache/spark/pull/50937#discussion_r2102233242 ## sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala: ## @@ -304,6 +304,21 @@ private[sql] trait SQLTestUtilsBase } } + /** + * Drop

Re: [PR] [SPARK-52179][INFRA][FOLLOW-UP] Skip dryruns in forked repository [spark]

2025-05-22 Thread via GitHub
HyukjinKwon closed pull request #50984: [SPARK-52179][INFRA][FOLLOW-UP] Skip dryruns in forked repository URL: https://github.com/apache/spark/pull/50984

Re: [PR] [SPARK-52179][INFRA][FOLLOW-UP] Skip dryruns in forked repository [spark]

2025-05-22 Thread via GitHub
HyukjinKwon commented on PR #50984: URL: https://github.com/apache/spark/pull/50984#issuecomment-2900727649 Merged to master.

Re: [PR] [SPARK-52257][PYTHON][TESTS] Add tests for chained arrow UDF [spark]

2025-05-22 Thread via GitHub
zhengruifeng closed pull request #50977: [SPARK-52257][PYTHON][TESTS] Add tests for chained arrow UDF URL: https://github.com/apache/spark/pull/50977

Re: [PR] [SPARK-52257][PYTHON][TESTS] Add tests for chained arrow UDF [spark]

2025-05-22 Thread via GitHub
zhengruifeng commented on PR #50977: URL: https://github.com/apache/spark/pull/50977#issuecomment-2900287693 thanks, merged to master

Re: [PR] [MINOR][INFRA] Limit build_python_3.11_macos execution time to up to 2 hours [spark]

2025-05-22 Thread via GitHub
zhengruifeng closed pull request #50978: [MINOR][INFRA] Limit build_python_3.11_macos execution time to up to 2 hours URL: https://github.com/apache/spark/pull/50978

[PR] [SPARK-52258][BUILD][FOLLOW-UP] Add retries when connecting to the ASF repo [spark]

2025-05-22 Thread via GitHub
HyukjinKwon opened a new pull request, #50983: URL: https://github.com/apache/spark/pull/50983 ### What changes were proposed in this pull request? This PR adds retries when connecting to the ASF repo. ### Why are the changes needed? `curl` to the ASF repo is actually flaky.

Re: [PR] [MINOR][INFRA] Limit build_python_3.11_macos execution time to up to 2 hours [spark]

2025-05-22 Thread via GitHub
zhengruifeng commented on PR #50978: URL: https://github.com/apache/spark/pull/50978#issuecomment-2900299073 thanks, merged to master

Re: [PR] [SPARK-52179][INFRA][FOLLOW-UP] Skip dryruns in forked repository [spark]

2025-05-22 Thread via GitHub
HyukjinKwon commented on code in PR #50984: URL: https://github.com/apache/spark/pull/50984#discussion_r2101934211 ## .github/workflows/release.yml: ## @@ -76,7 +76,9 @@ jobs: name: Release Apache Spark (dryrun and RC) runs-on: ubuntu-latest # Do not allow dispatc

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
Pajaraja commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2101939744 ## sql/core/src/test/resources/sql-tests/results/cte-recursion.sql.out: ## @@ -1475,3 +1475,68 @@ struct 3 4 5 + + +-- !query +WITH RECURSIVE randoms(val) AS ( +

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
Pajaraja commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2101938410 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -183,11 +183,24 @@ case class UnionLoopExec( // Main loop for obtaining the r

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
Pajaraja commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2101938808 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1267,6 +1267,9 @@ case class Shuffle(child: Expression, ran

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
Pajaraja commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2101885084 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -183,11 +183,24 @@ case class UnionLoopExec( // Main loop for obtaining the r

[PR] [SPARK-52179][INFRA][FOLLOW-UP] Skip dryruns in forked repository [spark]

2025-05-22 Thread via GitHub
HyukjinKwon opened a new pull request, #50984: URL: https://github.com/apache/spark/pull/50984 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/50911 that skips dryruns in forked repositories. ### Why are the change

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
Pajaraja commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2101939434 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala: ## @@ -114,6 +115,8 @@ case class Rand(child: Expression, hideSeed: B

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
Pajaraja commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2101956985 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala: ## @@ -165,6 +168,8 @@ case class Randn(child: Expression, hideSeed:

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
Pajaraja commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2101967964 ## sql/core/src/test/scala/org/apache/spark/sql/execution/RecursiveCTESuite.scala: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[PR] [SPARK-50137][HIVE] Avoid fallback to Hive-incompatible ways when table creation fails by thrift exception [spark]

2025-05-22 Thread via GitHub
wecharyu opened a new pull request, #50985: URL: https://github.com/apache/spark/pull/50985 ### What changes were proposed in this pull request? Enhance datasource table creation: do not fall back to the Hive-incompatible way if the failure was caused by a thrift exception. ### Why ar

Re: [PR] [SPARK-50137][HIVE] Avoid fallback to Hive-incompatible ways when table creation fails by thrift exception [spark]

2025-05-22 Thread via GitHub
wecharyu commented on code in PR #48668: URL: https://github.com/apache/spark/pull/48668#discussion_r2101996196 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala: ## @@ -241,4 +242,33 @@ class HiveExternalCatalogSuite extends ExternalCatalogSui

Re: [PR] [SPARK-52232][SQL] Fix non-deterministic queries to produce different results at every step [spark]

2025-05-22 Thread via GitHub
Pajaraja commented on code in PR #50957: URL: https://github.com/apache/spark/pull/50957#discussion_r2101927721 ## sql/core/src/test/scala/org/apache/spark/sql/execution/RecursiveCTESuite.scala: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-51272][CORE][3.5] Aborting instead of continuing partially completed indeterminate result stage at ResubmitFailedStages [spark]

2025-05-22 Thread via GitHub
mridulm commented on PR #50946: URL: https://github.com/apache/spark/pull/50946#issuecomment-2900137227 +CC @shardulm94 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] [SPARK-52258][BUILD][FOLLOW-UP] Add retries when connecting to the ASF repo [spark]

2025-05-22 Thread via GitHub
HyukjinKwon closed pull request #50983: [SPARK-52258][BUILD][FOLLOW-UP] Add retries when connecting to the ASF repo URL: https://github.com/apache/spark/pull/50983 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[PR] [SPARK-52262][SQL] swap order of withConnection and classifyException in loadTable [spark]

2025-05-22 Thread via GitHub
alekjarmov opened a new pull request, #50986: URL: https://github.com/apache/spark/pull/50986 ### What changes were proposed in this pull request? Swap order of `withConnection` and `classifyException` in `loadTable` since connection errors were swallowed with `FAILED_JDBC.TAB

Re: [PR] [SPARK-52181] Increase variant size limit to 128MiB [spark]

2025-05-22 Thread via GitHub
tedyu commented on PR #50913: URL: https://github.com/apache/spark/pull/50913#issuecomment-2900987837 I am still trying to understand the implication of OOM in tests. The previous comment provided one refactor that avoids unnecessary creation of BigDecimal objects. In my opinion, providing a c

Re: [PR] [SPARK-52265][SQL][TEST] Fix regex leading to empty PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite [spark]

2025-05-22 Thread via GitHub
LuciferYang commented on PR #50989: URL: https://github.com/apache/spark/pull/50989#issuecomment-2903266768 It seems that new issues may still arise after the 4.0 release with this pr: ``` build/sbt "hive/testOnly org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite" -Phive

Re: [PR] [SPARK-52271] Upgrade Spark to 4.0.0 in CIs and docs [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #169: URL: https://github.com/apache/spark-connect-swift/pull/169#issuecomment-2903342082 This PR is ready. - https://dist.apache.org/repos/dist/release/spark/spark-4.0.0/ -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] [SPARK-52271] Upgrade Spark to 4.0.0 in CIs and docs [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #169: URL: https://github.com/apache/spark-connect-swift/pull/169#issuecomment-2903349319 Wow. Thank you, @viirya ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-52271] Upgrade Spark to 4.0.0 in CIs and docs [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun closed pull request #169: [SPARK-52271] Upgrade Spark to 4.0.0 in CIs and docs URL: https://github.com/apache/spark-connect-swift/pull/169 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-52271] Upgrade Spark to 4.0.0 in CIs and docs [spark-connect-swift]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #169: URL: https://github.com/apache/spark-connect-swift/pull/169#issuecomment-2903355952 Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] [SPARK-52272][SQL] HiveExternalCatalog alter table should update column comments [spark]

2025-05-22 Thread via GitHub
szehon-ho commented on code in PR #50991: URL: https://github.com/apache/spark/pull/50991#discussion_r2103631127 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -684,11 +684,23 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, ha

Re: [PR] [SPARK-52272][SQL] HiveExternalCatalog alter table should update column comments [spark]

2025-05-22 Thread via GitHub
szehon-ho commented on code in PR #50991: URL: https://github.com/apache/spark/pull/50991#discussion_r2103613204 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -684,11 +684,23 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, ha

[PR] [SPARK-52225][BUILD][FOLLOW-UP] Change -it to -ti in Docker execution in release script [spark]

2025-05-22 Thread via GitHub
HyukjinKwon opened a new pull request, #50994: URL: https://github.com/apache/spark/pull/50994 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/50945 that changes -it to -ti in Docker execution in release script. ##

Re: [PR] [SPARK-52225][BUILD][FOLLOW-UP] Change -it to -ti in Docker execution in release script [spark]

2025-05-22 Thread via GitHub
HyukjinKwon closed pull request #50994: [SPARK-52225][BUILD][FOLLOW-UP] Change -it to -ti in Docker execution in release script URL: https://github.com/apache/spark/pull/50994 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] [SPARK-52225][BUILD][FOLLOW-UP] Change -it to -ti in Docker execution in release script [spark]

2025-05-22 Thread via GitHub
HyukjinKwon commented on PR #50994: URL: https://github.com/apache/spark/pull/50994#issuecomment-2903106133 Merged to master, branch-4.0 and branch-3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-52005] Upgrade Spark build dependency to 4.0.0 [spark-kubernetes-operator]

2025-05-22 Thread via GitHub
dongjoon-hyun commented on PR #221: URL: https://github.com/apache/spark-kubernetes-operator/pull/221#issuecomment-2902679483 Currently, this is in a `Draft` status until the release manager finishes the upload steps. -- This is an automated message from the Apache Git Service. To respond
