Re: [PR] [SPARK-51309][BUILD] Upgrade rocksdbjni to 9.10.0 [spark]

2025-02-26 Thread via GitHub
LuciferYang closed pull request #50076: [SPARK-51309][BUILD] Upgrade rocksdbjni to 9.10.0 URL: https://github.com/apache/spark/pull/50076 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-51309][BUILD] Upgrade rocksdbjni to 9.10.0 [spark]

2025-02-26 Thread via GitHub
LuciferYang commented on PR #50076: URL: https://github.com/apache/spark/pull/50076#issuecomment-2684999399 Merged into master. Thanks @wayneguow and @yaooqinn

Re: [PR] [SPARK-51310][SQL] Resolve the type of default string producing expressions [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on code in PR #50053: URL: https://github.com/apache/spark/pull/50053#discussion_r1971577627 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -121,6 +124,29 @@ object ResolveDDLCommandStringTypes

Re: [PR] [SPARK-51324][SQL] Fix nested FOR statement throwing error if empty result [spark]

2025-02-26 Thread via GitHub
miland-db commented on code in PR #50090: URL: https://github.com/apache/spark/pull/50090#discussion_r1971579683 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -208,7 +208,7 @@ class TriggerToExceptionHandlerMap( } object Tri

Re: [PR] [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on code in PR #50031: URL: https://github.com/apache/spark/pull/50031#discussion_r1971589160 ## sql/api/src/test/scala/org/apache/spark/sql/StaticProcedureSuiteBase.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[PR] [SPARK-51325] Check in source code for `smallJar.jar` [spark]

2025-02-26 Thread via GitHub
vicennial opened a new pull request, #50092: URL: https://github.com/apache/spark/pull/50092 ### What changes were proposed in this pull request? Adds source code to create `smallJar.jar`. The jar is regenerated using the source provided and its CRC is updated. ### Why

Re: [PR] [SPARK-51325] Check in source code for `smallJar.jar` [spark]

2025-02-26 Thread via GitHub
vicennial commented on PR #50092: URL: https://github.com/apache/spark/pull/50092#issuecomment-2685944709 @dongjoon-hyun @vrozov Would a change like this, where the source code is included, be acceptable for keeping the JAR?

Re: [PR] [SPARK-51325] Check in source code for `smallJar.jar` [spark]

2025-02-26 Thread via GitHub
dongjoon-hyun commented on PR #50092: URL: https://github.com/apache/spark/pull/50092#issuecomment-2685993435 Thank you for pinging me. However, I would recommend participating in the dev mailing list before taking any actions on code, @vicennial.

Re: [PR] [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice [spark]

2025-02-26 Thread via GitHub
szehon-ho commented on PR #50031: URL: https://github.com/apache/spark/pull/50031#issuecomment-2686011498 Also wanted to give credit to @raveeram-db for identifying the issue and the idea for the fix!

Re: [PR] [SPARK-51324][SQL] Fix nested FOR statement throwing error if empty result [spark]

2025-02-26 Thread via GitHub
miland-db commented on code in PR #50090: URL: https://github.com/apache/spark/pull/50090#discussion_r1971687991 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -208,7 +208,7 @@ class TriggerToExceptionHandlerMap( } object Tri

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971693941 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala: ## @@ -270,6 +270,8 @@ object DeserializerBuildHelper { enc: Agn

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971695592 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2841,6 +2860,265 @@ class DatasetSuite extends QueryTest checkDataset(Seq(seqMutabl

Re: [PR] [SPARK-51310][SQL] Resolve the type of default string producing expressions [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on code in PR #50053: URL: https://github.com/apache/spark/pull/50053#discussion_r1971639864 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -121,6 +124,29 @@ object ResolveDDLCommandStringTypes

Re: [PR] [SPARK-51270][SQL] Support UUID type in Variant [spark]

2025-02-26 Thread via GitHub
cashmand commented on code in PR #50025: URL: https://github.com/apache/spark/pull/50025#discussion_r1971807592 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantShreddingWriter.java: ## @@ -283,6 +283,11 @@ private static Object tryTypedShred( ret

Re: [PR] [SPARK-51270][SQL] Support UUID type in Variant [spark]

2025-02-26 Thread via GitHub
cashmand commented on code in PR #50025: URL: https://github.com/apache/spark/pull/50025#discussion_r1971821381 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java: ## @@ -497,6 +508,19 @@ public static String getString(byte[] value, int pos) { t

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971629784 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SerializerBuildHelper.scala: ## @@ -429,7 +430,10 @@ object SerializerBuildHelper { Literal(c

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971266224 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2841,6 +2860,265 @@ class DatasetSuite extends QueryTest checkDataset(Seq(seqMutabl

Re: [PR] [SPARK-51315][SQL] Enabling object level collations by default [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on PR #50082: URL: https://github.com/apache/spark/pull/50082#issuecomment-2685102193 thanks, merging to master/4.0!

Re: [PR] [SPARK-51315][SQL] Enabling object level collations by default [spark]

2025-02-26 Thread via GitHub
cloud-fan closed pull request #50082: [SPARK-51315][SQL] Enabling object level collations by default URL: https://github.com/apache/spark/pull/50082

Re: [PR] [SPARK-51289][SQL] Throw a proper error message for not fully implemented `SQLTableFunction` [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on code in PR #50073: URL: https://github.com/apache/spark/pull/50073#discussion_r1971581507 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSQLFunctionNode.scala: ## @@ -34,8 +36,8 @@ object EliminateSQLFunctionNode extends Ru

Re: [PR] [SPARK-51310][SQL] Resolve the type of default string producing expressions [spark]

2025-02-26 Thread via GitHub
stefankandic commented on code in PR #50053: URL: https://github.com/apache/spark/pull/50053#discussion_r1971607731 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -121,6 +124,29 @@ object ResolveDDLCommandStringTyp

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971723345 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala: ## @@ -458,7 +460,10 @@ object DeserializerBuildHelper { Liter

Re: [PR] [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice [spark]

2025-02-26 Thread via GitHub
hvanhovell commented on code in PR #50031: URL: https://github.com/apache/spark/pull/50031#discussion_r1971823853 ## sql/api/src/test/scala/org/apache/spark/sql/StaticProcedureSuiteBase.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-51324][SQL] Fix nested FOR statement throwing error if empty result [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on code in PR #50090: URL: https://github.com/apache/spark/pull/50090#discussion_r1971591219 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -208,7 +208,7 @@ class TriggerToExceptionHandlerMap( } object Tri

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971704899 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2841,6 +2860,265 @@ class DatasetSuite extends QueryTest checkDataset(Seq(seqMutabl

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on PR #50023: URL: https://github.com/apache/spark/pull/50023#issuecomment-2685258037 > Looks good overall. > > I am fine with merging it as is. Or I can wait a bit so you can address the comments. I would like to get this in by RC2 (this will be cut end of th

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-02-26 Thread via GitHub
sunxiaoguang commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1971726586 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,84 @@ class MySQLIntegrationSuite

Re: [PR] [SPARK-51322][SQL] Better error message for streaming subquery expression [spark]

2025-02-26 Thread via GitHub
viirya commented on PR #50088: URL: https://github.com/apache/spark/pull/50088#issuecomment-2685203127 KafkaSourceStressForDontFailOnDataLossSuite failed, but it looks unrelated. ``` [info] Cause: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownTop

Re: [PR] [SPARK-51310][SQL] Resolve the type of default string producing expressions [spark]

2025-02-26 Thread via GitHub
stefankandic commented on code in PR #50053: URL: https://github.com/apache/spark/pull/50053#discussion_r1971895942 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -121,6 +124,29 @@ object ResolveDDLCommandStringTyp

Re: [PR] [SPARK-50994][CORE] Perform RDD conversion under tracked execution [spark]

2025-02-26 Thread via GitHub
BOOTMGR commented on code in PR #49678: URL: https://github.com/apache/spark/pull/49678#discussion_r1971949558 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -2721,6 +2721,25 @@ class DataFrameSuite extends QueryTest parameters = Map("name" ->

Re: [PR] [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common [spark]

2025-02-26 Thread via GitHub
vicennial commented on code in PR #49971: URL: https://github.com/apache/spark/pull/49971#discussion_r1972274605 ## pom.xml: ## @@ -860,6 +861,56 @@ ${protobuf.version} provided + Review Comment: What's the reasoning behind moving/adding these

Re: [PR] [SPARK-51280][CONNECT] Improve RESPONSE_ALREADY_RECEIVED error class [spark]

2025-02-26 Thread via GitHub
the-sakthi commented on code in PR #50091: URL: https://github.com/apache/spark/pull/50091#discussion_r1972363827 ## python/pyspark/sql/connect/client/reattach.py: ## @@ -261,14 +261,19 @@ def _call_iter(self, iter_fun: Callable) -> Any: return iter_fun() e

Re: [PR] [SPARK-51314][DOCS][PS] Add proper note for distributed-sequence about indeterministic case [spark]

2025-02-26 Thread via GitHub
the-sakthi commented on code in PR #50086: URL: https://github.com/apache/spark/pull/50086#discussion_r1972374723 ## python/docs/source/user_guide/pandas_on_spark/options.rst: ## @@ -208,6 +208,16 @@ This is conceptually equivalent to the PySpark example as below: >>> spar

Re: [PR] [SPARK-51322][SQL] Better error message for streaming subquery expression [spark]

2025-02-26 Thread via GitHub
the-sakthi commented on PR #50088: URL: https://github.com/apache/spark/pull/50088#issuecomment-2686132211 LGTM

Re: [PR] [SPARK-51323] Duplicate "total" on Py SQL metrics [spark]

2025-02-26 Thread via GitHub
the-sakthi commented on PR #50089: URL: https://github.com/apache/spark/pull/50089#issuecomment-2686130142 LGTM (also, could you please add a screenshot of the fixed issue, as you did in the Jira description, if possible?)

Re: [PR] [SPARK-50792][SQL][FOLLOWUP] Test case should reuse the exists table. [spark]

2025-02-26 Thread via GitHub
the-sakthi commented on PR #50087: URL: https://github.com/apache/spark/pull/50087#issuecomment-2686134977 The mentioned JIRA is in "resolved" state. We either ought to reopen it or open a new jira?

Re: [PR] [SPARK-51280][CONNECT] Improve RESPONSE_ALREADY_RECEIVED error class [spark]

2025-02-26 Thread via GitHub
changgyoopark-db commented on code in PR #50091: URL: https://github.com/apache/spark/pull/50091#discussion_r1972397157 ## python/pyspark/sql/connect/client/reattach.py: ## @@ -261,14 +261,19 @@ def _call_iter(self, iter_fun: Callable) -> Any: return iter_fun()

Re: [PR] [SPARK-50639][SQL] Improve warning logging in CacheManager [spark]

2025-02-26 Thread via GitHub
vrozov commented on PR #49276: URL: https://github.com/apache/spark/pull/49276#issuecomment-2686232066 @gengliangwang ^^^

Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]

2025-02-26 Thread via GitHub
vrozov commented on PR #49928: URL: https://github.com/apache/spark/pull/49928#issuecomment-2686228988 @HeartSaVioR can you please suggest another committer to review this small PR?

[PR] [SPARK-51326][CONNECT] Remove LazyExpression proto message [spark]

2025-02-26 Thread via GitHub
ueshin opened a new pull request, #50093: URL: https://github.com/apache/spark/pull/50093 ### What changes were proposed in this pull request? Removes `LazyExpression` proto message. No feature using this proto message has been released yet, so it's safe to remove from master a

Re: [PR] [SPARK-51270][SQL] Support UUID type in Variant [spark]

2025-02-26 Thread via GitHub
chenhao-db commented on code in PR #50025: URL: https://github.com/apache/spark/pull/50025#discussion_r1972565165 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantBuilder.java: ## @@ -240,6 +242,19 @@ public void appendBinary(byte[] binary) { writePos +

Re: [PR] [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice [spark]

2025-02-26 Thread via GitHub
aokolnychyi commented on code in PR #50031: URL: https://github.com/apache/spark/pull/50031#discussion_r1972592710 ## sql/api/src/test/scala/org/apache/spark/sql/StaticProcedureSuiteBase.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common [spark]

2025-02-26 Thread via GitHub
vrozov commented on PR #49971: URL: https://github.com/apache/spark/pull/49971#issuecomment-2686486753 @vicennial please see my response

Re: [PR] [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common [spark]

2025-02-26 Thread via GitHub
vrozov commented on code in PR #49971: URL: https://github.com/apache/spark/pull/49971#discussion_r1972602804 ## sql/connect/common/pom.xml: ## @@ -142,8 +218,26 @@ org.spark-project.spark:unused

[PR] [SPARK-51326][CONNECT][4.0] Remove LazyExpression proto message [spark]

2025-02-26 Thread via GitHub
ueshin opened a new pull request, #50094: URL: https://github.com/apache/spark/pull/50094 ### What changes were proposed in this pull request? This is a backport of #50093. Removes `LazyExpression` proto message. As any feature using this proto message is not released yet

Re: [PR] [SPARK-51326][CONNECT] Remove LazyExpression proto message [spark]

2025-02-26 Thread via GitHub
ueshin commented on PR #50093: URL: https://github.com/apache/spark/pull/50093#issuecomment-2686499500 As "Protobuf breaking change detection and Python CodeGen check" failed with: ``` Error: Previously present message "LazyExpression" was deleted from file. Error: Previously pre

Re: [PR] [SPARK-51277][PYTHON] Implement 0-arg implementation in Arrow-optimized Python UDF [spark]

2025-02-26 Thread via GitHub
HyukjinKwon commented on PR #50084: URL: https://github.com/apache/spark/pull/50084#issuecomment-2686511930 Merged to master and branch-4.0.

Re: [PR] [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on PR #50031: URL: https://github.com/apache/spark/pull/50031#issuecomment-2686523433 thanks, merging to master/4.0!

Re: [PR] [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice [spark]

2025-02-26 Thread via GitHub
cloud-fan closed pull request #50031: [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice URL: https://github.com/apache/spark/pull/50031

Re: [PR] [SPARK-51324][SQL] Fix nested FOR statement throwing error if empty result [spark]

2025-02-26 Thread via GitHub
cloud-fan closed pull request #50090: [SPARK-51324][SQL] Fix nested FOR statement throwing error if empty result URL: https://github.com/apache/spark/pull/50090

Re: [PR] [SPARK-51324][SQL] Fix nested FOR statement throwing error if empty result [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on PR #50090: URL: https://github.com/apache/spark/pull/50090#issuecomment-2686526586 thanks, merging to master/4.0!

Re: [PR] Dev/milast/recurisve cte [spark]

2025-02-26 Thread via GitHub
github-actions[bot] commented on PR #48878: URL: https://github.com/apache/spark/pull/48878#issuecomment-2686505170 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-51314][DOCS][PS] Add proper note for distributed-sequence about indeterministic case [spark]

2025-02-26 Thread via GitHub
itholic commented on code in PR #50086: URL: https://github.com/apache/spark/pull/50086#discussion_r1972664726 ## python/docs/source/user_guide/pandas_on_spark/options.rst: ## @@ -208,6 +208,16 @@ This is conceptually equivalent to the PySpark example as below: >>> spark_d

Re: [PR] [SPARK-51314][DOCS][PS] Add proper note for distributed-sequence about indeterministic case [spark]

2025-02-26 Thread via GitHub
the-sakthi commented on PR #50086: URL: https://github.com/apache/spark/pull/50086#issuecomment-2686579395 LGTM

Re: [PR] [SPARK-51314][DOCS][PS] Add proper note for distributed-sequence about indeterministic case [spark]

2025-02-26 Thread via GitHub
the-sakthi commented on code in PR #50086: URL: https://github.com/apache/spark/pull/50086#discussion_r1972666469 ## python/docs/source/user_guide/pandas_on_spark/options.rst: ## @@ -208,6 +208,16 @@ This is conceptually equivalent to the PySpark example as below: >>> spar

Re: [PR] [SPARK-51280][CONNECT] Improve RESPONSE_ALREADY_RECEIVED error class [spark]

2025-02-26 Thread via GitHub
the-sakthi commented on PR #50091: URL: https://github.com/apache/spark/pull/50091#issuecomment-2686580342 LGTM (provided the builds pass) :)

Re: [PR] [SPARK-51277][PYTHON] Implement 0-arg implementation in Arrow-optimized Python UDF [spark]

2025-02-26 Thread via GitHub
HyukjinKwon closed pull request #50084: [SPARK-51277][PYTHON] Implement 0-arg implementation in Arrow-optimized Python UDF URL: https://github.com/apache/spark/pull/50084

[PR] [SPARK-51329][ML][PYTHON] Add `numFeatures` for clustering models [spark]

2025-02-26 Thread via GitHub
zhengruifeng opened a new pull request, #50095: URL: https://github.com/apache/spark/pull/50095 ### What changes were proposed in this pull request? Add `numFeatures` for clustering models ### Why are the changes needed? for feature parity between python and scala, th

[PR] [SPARK-51330][PYTHON] Enable spark.sql.execution.pythonUDTF.arrow.enabled by default [spark]

2025-02-26 Thread via GitHub
HyukjinKwon opened a new pull request, #50096: URL: https://github.com/apache/spark/pull/50096 ### What changes were proposed in this pull request? This PR enables `spark.sql.execution.pythonUDTF.arrow.enabled` by default. ### Why are the changes needed? We enabled Arrow

Re: [PR] [SPARK-51330][PYTHON] Enable spark.sql.execution.pythonUDTF.arrow.enabled by default [spark]

2025-02-26 Thread via GitHub
HyukjinKwon commented on PR #50096: URL: https://github.com/apache/spark/pull/50096#issuecomment-2686694904 cc @allisonwang-db

Re: [PR] [SPARK-51206][PYTHON][CONNECT] Move Arrow conversion helpers out of Spark Connect [spark]

2025-02-26 Thread via GitHub
HyukjinKwon closed pull request #49941: [SPARK-51206][PYTHON][CONNECT] Move Arrow conversion helpers out of Spark Connect URL: https://github.com/apache/spark/pull/49941

Re: [PR] [SPARK-51206][PYTHON][CONNECT] Move Arrow conversion helpers out of Spark Connect [spark]

2025-02-26 Thread via GitHub
HyukjinKwon commented on PR #49941: URL: https://github.com/apache/spark/pull/49941#issuecomment-2686706677 Merged to master and branch-4.0.

Re: [PR] [SPARK-50792][SQL][FOLLOWUP] Test case should reuse the exists table. [spark]

2025-02-26 Thread via GitHub
beliefer commented on PR #50087: URL: https://github.com/apache/spark/pull/50087#issuecomment-2686744384 > The mentioned JIRA is in "resolved" state. We either ought to reopen it or open a new jira? It doesn't matter. The feature in 4.0.0 has not been released yet.

Re: [PR] [SPARK-51314][DOCS][PS] Add proper note for distributed-sequence about indeterministic case [spark]

2025-02-26 Thread via GitHub
itholic commented on code in PR #50086: URL: https://github.com/apache/spark/pull/50086#discussion_r1972761013 ## python/docs/source/user_guide/pandas_on_spark/options.rst: ## @@ -208,6 +208,16 @@ This is conceptually equivalent to the PySpark example as below: >>> spark_d

Re: [PR] [SPARK-51316][PYTHON] Allow Arrow batches in bytes instead of number of rows [spark]

2025-02-26 Thread via GitHub
viirya commented on code in PR #50080: URL: https://github.com/apache/spark/pull/50080#discussion_r1971109180 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonArrowInput.scala: ## @@ -140,3 +141,53 @@ private[python] trait BasicPythonArrowInput extends Pyt

Re: [PR] [SPARK-51316][PYTHON] Allow Arrow batches in bytes instead of number of rows [spark]

2025-02-26 Thread via GitHub
viirya commented on code in PR #50080: URL: https://github.com/apache/spark/pull/50080#discussion_r1971108130 ## python/pyspark/sql/tests/arrow/test_arrow_map.py: ## @@ -194,6 +199,15 @@ def tearDownClass(cls): ReusedSQLTestCase.tearDownClass() +class MapInArrowWit

[PR] [SPARK-51314][DOCS][PS] Add proper note for distributed-sequence about indeterministic case [spark]

2025-02-26 Thread via GitHub
itholic opened a new pull request, #50086: URL: https://github.com/apache/spark/pull/50086 ### What changes were proposed in this pull request? This PR proposes to add a proper note for distributed-sequence about the indeterministic case ### Why are the changes needed?

Re: [PR] [SPARK-51315][SQL] Enabling object level collations by default [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on code in PR #50082: URL: https://github.com/apache/spark/pull/50082#discussion_r1971351719 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -48,12 +48,18 @@ object ResolveDDLCommandStringTypes e

Re: [PR] [SPARK-51265][SQL] IncrementalExecution should set the command execution code correctly [spark]

2025-02-26 Thread via GitHub
cloud-fan closed pull request #50037: [SPARK-51265][SQL] IncrementalExecution should set the command execution code correctly URL: https://github.com/apache/spark/pull/50037

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1971357977 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala: ## @@ -839,4 +839,30 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSpark

Re: [PR] [SPARK-51315][SQL] Enabling object level collations by default [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on code in PR #50082: URL: https://github.com/apache/spark/pull/50082#discussion_r1971352052 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -48,12 +48,18 @@ object ResolveDDLCommandStringTypes e

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on PR #50040: URL: https://github.com/apache/spark/pull/50040#issuecomment-2684585464 We only have the API doc: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriterV2.html

Re: [PR] [SPARK-51265][SQL] IncrementalExecution should set the command execution code correctly [spark]

2025-02-26 Thread via GitHub
cloud-fan commented on PR #50037: URL: https://github.com/apache/spark/pull/50037#issuecomment-2684580221 thanks for the review, merging to master/4.0!

Re: [PR] [SPARK-51315][SQL] Enabling object level collations by default [spark]

2025-02-26 Thread via GitHub
dejankrak-db commented on code in PR #50082: URL: https://github.com/apache/spark/pull/50082#discussion_r1971394139 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -48,12 +48,18 @@ object ResolveDDLCommandStringType

[PR] [SPARK-50792][SQL][FOLLOWUP] Test case should reuse the exists table. [spark]

2025-02-26 Thread via GitHub
beliefer opened a new pull request, #50087: URL: https://github.com/apache/spark/pull/50087 ### What changes were proposed in this pull request? This PR proposes to update the test case to reuse the existing table. ### Why are the changes needed? https://github.com/apache/spark/pu

Re: [PR] [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage [spark]

2025-02-26 Thread via GitHub
mihailoale-db commented on code in PR #50069: URL: https://github.com/apache/spark/pull/50069#discussion_r1971387629 ## sql/core/src/test/resources/sql-tests/inputs/order-by.sql: ## @@ -0,0 +1,24 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUE

Re: [PR] [SPARK-51321][SQL] Add rpad and lpad support for PostgresDialect and MsSQLServerDialect expression pushdown [spark]

2025-02-26 Thread via GitHub
milosstojanovic commented on PR #50060: URL: https://github.com/apache/spark/pull/50060#issuecomment-2684639209 > Please create an issue for track down. Done SPARK-51321 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971266224 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2841,6 +2860,265 @@ class DatasetSuite extends QueryTest checkDataset(Seq(seqMutabl

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971211739 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala: ## @@ -142,6 +142,19 @@ case class OptionNestedGeneric[T](list:

Re: [PR] [SPARK-51315][SQL] Enabling object level collations by default [spark]

2025-02-26 Thread via GitHub
dejankrak-db commented on PR #50082: URL: https://github.com/apache/spark/pull/50082#issuecomment-2684374914 @cloud-fan, @stefankandic, @stevomitric, please take a look when you find some time, thanks! -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971213487 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/AgnosticEncoder.scala: ## @@ -286,5 +286,6 @@ object AgnosticEncoders { override def dataType

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971221056 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2841,6 +2860,265 @@ class DatasetSuite extends QueryTest checkDataset(Seq(seqMutabl

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971232668 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2841,6 +2860,265 @@ class DatasetSuite extends QueryTest checkDataset(Seq(seqMutabl

Re: [PR] [SPARK-51316][PYTHON] Allow Arrow batches in bytes instead of number of rows [spark]

2025-02-26 Thread via GitHub
HyukjinKwon commented on code in PR #50080: URL: https://github.com/apache/spark/pull/50080#discussion_r1971201977 ## python/pyspark/sql/tests/arrow/test_arrow_map.py: ## @@ -194,6 +199,15 @@ def tearDownClass(cls): ReusedSQLTestCase.tearDownClass() +class MapInArr

Re: [PR] [SPARK-51312][SQL] Fix createDataFrame from RDD[Row] [spark]

2025-02-26 Thread via GitHub
MaxGekk closed pull request #50079: [SPARK-51312][SQL] Fix createDataFrame from RDD[Row] URL: https://github.com/apache/spark/pull/50079 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-02-26 Thread via GitHub
beliefer commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1971156786 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,84 @@ class MySQLIntegrationSuite exte

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-02-26 Thread via GitHub
beliefer commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1971150984 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,84 @@ class MySQLIntegrationSuite exte

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-02-26 Thread via GitHub
beliefer commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1971153903 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,84 @@ class MySQLIntegrationSuite exte

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, BIGINT and BLOB types [spark]

2025-02-26 Thread via GitHub
beliefer commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1971158932 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,84 @@ class MySQLIntegrationSuite exte

Re: [PR] [SPARK-49960][SQL] Custom ExpressionEncoder support and TransformingEncoder fixes [spark]

2025-02-26 Thread via GitHub
chris-twiner commented on code in PR #50023: URL: https://github.com/apache/spark/pull/50023#discussion_r1971206736 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala: ## @@ -228,7 +236,8 @@ case class ExpressionEncoder[T]( * return

Re: [PR] [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage [spark]

2025-02-26 Thread via GitHub
beliefer commented on code in PR #50069: URL: https://github.com/apache/spark/pull/50069#discussion_r1971260413 ## sql/core/src/test/resources/sql-tests/inputs/order-by.sql: ## @@ -0,0 +1,24 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES Re

Re: [PR] [SPARK-51303] [SQL] [TESTS] Extend `ORDER BY` testing coverage [spark]

2025-02-26 Thread via GitHub
vladimirg-db commented on code in PR #50069: URL: https://github.com/apache/spark/pull/50069#discussion_r1971262627 ## sql/core/src/test/resources/sql-tests/inputs/order-by.sql: ## @@ -0,0 +1,24 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES

Re: [PR] [SPARK-51312][SQL] Fix createDataFrame from RDD[Row] [spark]

2025-02-26 Thread via GitHub
MaxGekk commented on PR #50079: URL: https://github.com/apache/spark/pull/50079#issuecomment-2684443652 +1, LGTM. Merging to master. Thank you, @mihailom-db and @hvanhovell @cloud-fan for review. -- This is an automated message from the Apache Git Service. To respond to the message, ple
