[PR] [SPARK-51292][SQL] Remove unnecessary inheritance from PlanTestBase, ExpressionEvalHelper and PlanTest [spark]

2025-02-21 Thread via GitHub
beliefer opened a new pull request, #50047: URL: https://github.com/apache/spark/pull/50047 ### What changes were proposed in this pull request? This PR proposes to remove unnecessary inheritance from `PlanTestBase`, `ExpressionEvalHelper` and `PlanTest`. ### Why are the change

Re: [PR] [SPARK-51258][SQL][FOLLOWUP] Remove unnecessary inheritance from SQLConfHelper [spark]

2025-02-21 Thread via GitHub
beliefer commented on PR #50046: URL: https://github.com/apache/spark/pull/50046#issuecomment-2676003297 ping @LuciferYang @MaxGekk @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] [SPARK-51258][SQL][FOLLOWUP] Remove unnecessary inheritance from SQLConfHelper [spark]

2025-02-21 Thread via GitHub
beliefer opened a new pull request, #50046: URL: https://github.com/apache/spark/pull/50046 ### What changes were proposed in this pull request? This PR proposes to remove unnecessary inheritance from `SQLConfHelper`. ### Why are the changes needed? 1. There are some trait al
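The kind of cleanup this PR describes can be illustrated with plain Scala traits. The names below are stand-ins, not the actual Spark diff: when a mixed-in trait already extends `SQLConfHelper`, repeating it in a subclass's `extends` clause adds nothing to the linearization.

```scala
// Hypothetical stand-ins for the real Spark traits; only the redundant
// mix-in pattern matters here, not the actual members involved.
trait SQLConfHelper
trait PlanTest extends SQLConfHelper

// Before: `with SQLConfHelper` is redundant, since it is already
// inherited via PlanTest.
class SuiteBefore extends PlanTest with SQLConfHelper

// After: same member set and same conformance, shorter declaration.
class SuiteAfter extends PlanTest
```

Both classes conform to `SQLConfHelper` either way, so dropping the explicit mix-in is purely a cleanup.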

Re: [PR] [SPARK-51258][SQL] Remove unnecessary inheritance from SQLConfHelper [spark]

2025-02-21 Thread via GitHub
beliefer commented on PR #50005: URL: https://github.com/apache/spark/pull/50005#issuecomment-2675971847 Merged into branch-4.0/master @LuciferYang @cloud-fan Thank you!

Re: [PR] [SPARK-51258][SQL] Remove unnecessary inheritance from SQLConfHelper [spark]

2025-02-21 Thread via GitHub
beliefer closed pull request #50005: [SPARK-51258][SQL] Remove unnecessary inheritance from SQLConfHelper URL: https://github.com/apache/spark/pull/50005

Re: [PR] [SPARK-51283][SQL][TESTS] Add test cases for LZ4 and SNAPPY for text [spark]

2025-02-21 Thread via GitHub
beliefer commented on PR #50043: URL: https://github.com/apache/spark/pull/50043#issuecomment-2675965441 @dongjoon-hyun @LuciferYang Thank you!

Re: [PR] [SPARK-51156][CONNECT] Static token authentication support in Spark Connect [spark]

2025-02-21 Thread via GitHub
HyukjinKwon commented on PR #50006: URL: https://github.com/apache/spark/pull/50006#issuecomment-2675925891 There's a conflict with my PR merged. I resolved them on my own 👍

Re: [PR] [SPARK-51016][SQL] Fix for incorrect results on retry for Left Outer Join with indeterministic join keys [spark]

2025-02-21 Thread via GitHub
ahshahid commented on PR #50029: URL: https://github.com/apache/spark/pull/50029#issuecomment-2675895477 > It is unclear to me why the changes to spark core are required - marking the RDD with the appropriate `DeterministicLevel` should be sufficient. I have described the race conditi

Re: [PR] [SPARK-51016][SQL] Fix for incorrect results on retry for Left Outer Join with indeterministic join keys [spark]

2025-02-21 Thread via GitHub
ahshahid commented on PR #50029: URL: https://github.com/apache/spark/pull/50029#issuecomment-2675898956 > It is unclear to me why the changes to spark core are required - marking the RDD with the appropriate `DeterministicLevel` should be sufficient. In case of ShuffleMap stage, the base

Re: [PR] [SPARK-51016][SQL] Fix for incorrect results on retry for Left Outer Join with indeterministic join keys [spark]

2025-02-21 Thread via GitHub
mridulm commented on PR #50029: URL: https://github.com/apache/spark/pull/50029#issuecomment-2675889144 It is unclear to me why the changes to spark core are required - marking the RDD with the appropriate `DeterministicLevel` should be sufficient.
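The disagreement in this thread centers on Spark's `DeterministicLevel`. As a rough toy model (this is not Spark's scheduler code; only the enum value names mirror `org.apache.spark.rdd.DeterministicLevel`), an `INDETERMINATE` stage output forces the whole stage to be retried rather than only the lost partitions, because a partial recompute may pair rows inconsistently with the surviving output:

```scala
// Toy model only: mirrors the value names of
// org.apache.spark.rdd.DeterministicLevel, not Spark's scheduling logic.
object DeterministicLevel extends Enumeration {
  val DETERMINATE, UNORDERED, INDETERMINATE = Value
}

// If output is INDETERMINATE, a recomputed partition may not be consistent
// with the partitions that survived, so every partition must be rerun.
def partitionsToRecompute(
    level: DeterministicLevel.Value,
    lost: Set[Int],
    all: Set[Int]): Set[Int] =
  if (level == DeterministicLevel.INDETERMINATE) all else lost
```

Under this model, marking the join-key RDD `INDETERMINATE` is what makes a retry recompute all partitions together, which is the behavior the thread is debating.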

Re: [PR] [SPARK-51276][PYTHON] Enable spark.sql.execution.arrow.pyspark.enabled by default [spark]

2025-02-21 Thread via GitHub
HyukjinKwon closed pull request #50036: [SPARK-51276][PYTHON] Enable spark.sql.execution.arrow.pyspark.enabled by default URL: https://github.com/apache/spark/pull/50036

Re: [PR] [SPARK-51279][CONNECT] Avoid constant sleep for waiting Spark Connect server in Scala [spark]

2025-02-21 Thread via GitHub
HyukjinKwon closed pull request #50039: [SPARK-51279][CONNECT] Avoid constant sleep for waiting Spark Connect server in Scala URL: https://github.com/apache/spark/pull/50039

Re: [PR] [SPARK-50110][SQL] Fix read csv fails when data contains spaces before and after [spark]

2025-02-21 Thread via GitHub
github-actions[bot] closed pull request #48653: [SPARK-50110][SQL] Fix read csv fails when data contains spaces before and after URL: https://github.com/apache/spark/pull/48653

[PR] [SPARK-51291] [WIP] Reclassify validation errors thrown from state store loading [spark]

2025-02-21 Thread via GitHub
liviazhu-db opened a new pull request, #50045: URL: https://github.com/apache/spark/pull/50045 ### What changes were proposed in this pull request? Created new `CANNOT_LOAD_STATE_STORE.KEY_ROW_FORMAT_VALIDATION_FAILURE`/`CANNOT_LOAD_STATE_STORE.VALUE_ROW_FORMAT_VALIDATION_

Re: [PR] [SPARK-50272][SQL] Merge options of table and relation in FallBackFileSourceV2 [spark]

2025-02-21 Thread via GitHub
github-actions[bot] closed pull request #48801: [SPARK-50272][SQL] Merge options of table and relation in FallBackFileSourceV2 URL: https://github.com/apache/spark/pull/48801

Re: [PR] [SPARK-49992][SQL] Session level collation should not impact DDL queries [spark]

2025-02-21 Thread via GitHub
github-actions[bot] commented on PR #48436: URL: https://github.com/apache/spark/pull/48436#issuecomment-2675868359 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-49008][PYTHON] Use `ParamSpec` to propagate `func` signature in `transform` [spark]

2025-02-21 Thread via GitHub
github-actions[bot] closed pull request #47493: [SPARK-49008][PYTHON] Use `ParamSpec` to propagate `func` signature in `transform` URL: https://github.com/apache/spark/pull/47493

Re: [PR] [SPARK-50357][Core]If a stage contains ExpandExec, the CoalesceShuffl… [spark]

2025-02-21 Thread via GitHub
github-actions[bot] closed pull request #48819: [SPARK-50357][Core]If a stage contains ExpandExec, the CoalesceShuffl… URL: https://github.com/apache/spark/pull/48819

Re: [PR] [SPARK-50160][SQL][KAFKA] KafkaWriteTask: allow customizing record timestamp [spark]

2025-02-21 Thread via GitHub
github-actions[bot] closed pull request #48695: [SPARK-50160][SQL][KAFKA] KafkaWriteTask: allow customizing record timestamp URL: https://github.com/apache/spark/pull/48695

Re: [PR] [SPARK-51279][CONNECT] Avoid constant sleep for waiting Spark Connect server in Scala [spark]

2025-02-21 Thread via GitHub
HyukjinKwon commented on PR #50039: URL: https://github.com/apache/spark/pull/50039#issuecomment-2675867975 Merged to master and branch-4.0.

Re: [PR] [SPARK-51276][PYTHON] Enable spark.sql.execution.arrow.pyspark.enabled by default [spark]

2025-02-21 Thread via GitHub
HyukjinKwon commented on PR #50036: URL: https://github.com/apache/spark/pull/50036#issuecomment-2675866833 Merged to master and branch-4.0.

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-21 Thread via GitHub
dongjoon-hyun commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1965769460 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5545,6 +5545,15 @@ object SQLConf { .booleanConf .createWithDefault(f

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-21 Thread via GitHub
aokolnychyi commented on PR #50044: URL: https://github.com/apache/spark/pull/50044#issuecomment-2675807959 cc @cloud-fan @szehon-ho @amaliujia @gengliangwang @dongjoon-hyun @viirya @huaxingao

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-21 Thread via GitHub
aokolnychyi commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1966310415 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/V2WriteAnalysisSuite.scala: ## @@ -423,8 +423,8 @@ abstract class V2WriteAnalysisSuiteBase ext

Re: [PR] [SPARK-51278][PYTHON] Use appropriate structure of JSON format for `PySparkLogger` [spark]

2025-02-21 Thread via GitHub
gengliangwang commented on code in PR #50038: URL: https://github.com/apache/spark/pull/50038#discussion_r1966226373 ## python/pyspark/logger/logger.py: ## @@ -66,10 +67,21 @@ def format(self, record: logging.LogRecord) -> str: } if record.exc_info:

Re: [PR] [SPARK-11077] [SQL] Join elimination in Catalyst [spark]

2025-02-21 Thread via GitHub
jzhuge commented on PR #9089: URL: https://github.com/apache/spark/pull/9089#issuecomment-2675622872 Is there any desire to resurrect this PR? e.g., without any breaking changes (new hint only, no api change)

Re: [PR] [SPARK-51273][SQL] Spark Connect Call Procedure runs the procedure twice [spark]

2025-02-21 Thread via GitHub
szehon-ho commented on PR #50031: URL: https://github.com/apache/spark/pull/50031#issuecomment-2675569385 @aokolnychyi @cloud-fan do you guys want to take a look, thanks!

Re: [PR] [SPARK-51278][PYTHON] Use appropriate structure of JSON format for `PySparkLogger` [spark]

2025-02-21 Thread via GitHub
ueshin commented on code in PR #50038: URL: https://github.com/apache/spark/pull/50038#discussion_r1966078097 ## python/pyspark/logger/logger.py: ## @@ -66,10 +67,21 @@ def format(self, record: logging.LogRecord) -> str: } if record.exc_info: exc_t

Re: [PR] [SPARK-51283][SQL][TESTS] Add test cases for LZ4 and SNAPPY for text [spark]

2025-02-21 Thread via GitHub
dongjoon-hyun commented on PR #50043: URL: https://github.com/apache/spark/pull/50043#issuecomment-2674960601 Thank you, @beliefer and @LuciferYang . Merged to master/4.0.

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-21 Thread via GitHub
aokolnychyi commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1966030259 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3534,7 +3534,8 @@ class Analyzer(override val catalogManager: CatalogM

Re: [PR] [SPARK-51092][SS] Skip the v1 FlatMapGroupsWithState tests with timeout on big endian platforms [spark]

2025-02-21 Thread via GitHub
jonathan-albrecht-ibm commented on PR #49811: URL: https://github.com/apache/spark/pull/49811#issuecomment-2675439317 @HeartSaVioR, thanks for reviewing and merging!

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-21 Thread via GitHub
ericm-db commented on PR #50030: URL: https://github.com/apache/spark/pull/50030#issuecomment-2675381639 Is there a feasible way to combine the tests so we don't have to define them separately for RocksDBStateStoreProvider and HDFSStateStoreProvider?

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-21 Thread via GitHub
ericm-db commented on PR #50030: URL: https://github.com/apache/spark/pull/50030#issuecomment-2675392417 Hm, maybe for that check it could be dependent on which state store provider it is? I think there's just a lot of duplication right now

Re: [PR] [SPARK-51252] [SS] Add instance metrics for last uploaded snapshot version in HDFS State Stores [spark]

2025-02-21 Thread via GitHub
zecookiez commented on PR #50030: URL: https://github.com/apache/spark/pull/50030#issuecomment-2675388682 There is some overlapping logic between the two types of tests that I could see if we can simplify, but HDFS state store do their snapshot upload checks a little bit differently [(see t

Re: [PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-21 Thread via GitHub
aokolnychyi commented on code in PR #50044: URL: https://github.com/apache/spark/pull/50044#discussion_r1966035022 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -718,6 +724,11 @@ private class BufferedRowsReader( sche

[PR] [SPARK-51290][SQL] Enable filling default values in DSv2 writes [spark]

2025-02-21 Thread via GitHub
aokolnychyi opened a new pull request, #50044: URL: https://github.com/apache/spark/pull/50044 ### What changes were proposed in this pull request? This PR enables filling default values in DSv2 writes. ### Why are the changes needed? These changes are

Re: [PR] [SPARK-51274][PYTHON] PySparkLogger should respect the expected keyword arguments [spark]

2025-02-21 Thread via GitHub
ueshin commented on PR #50032: URL: https://github.com/apache/spark/pull/50032#issuecomment-2675309319 Thanks! merging to master/4.0.

Re: [PR] [SPARK-51274][PYTHON] PySparkLogger should respect the expected keyword arguments [spark]

2025-02-21 Thread via GitHub
ueshin closed pull request #50032: [SPARK-51274][PYTHON] PySparkLogger should respect the expected keyword arguments URL: https://github.com/apache/spark/pull/50032

Re: [PR] [SPARK-51156][CONNECT] Static token authentication support in Spark Connect [spark]

2025-02-21 Thread via GitHub
Kimahriman commented on PR #50006: URL: https://github.com/apache/spark/pull/50006#issuecomment-2675251514 > @Kimahriman for scala you probably need to change `org.apache.spark.sql.connect.client.SparkConnectClient.Configuration.credentials`. If you don't want SSL I guess you can compose `I

Re: [PR] [SPARK-51276][PYTHON] Enable spark.sql.execution.arrow.pyspark.enabled by default [spark]

2025-02-21 Thread via GitHub
dongjoon-hyun commented on PR #50036: URL: https://github.com/apache/spark/pull/50036#issuecomment-2674943318 cc @cloud-fan since this is targeting Apache Spark 4.0.0.

Re: [PR] [SPARK-51244][INFRA][3.5] Upgrade left Github Action image from `ubuntu-20.04` to `ubuntu-22.04` and solved the `TPCDSQueryBenchmark` compatibility issue [spark]

2025-02-21 Thread via GitHub
dongjoon-hyun closed pull request #49988: [SPARK-51244][INFRA][3.5] Upgrade left Github Action image from `ubuntu-20.04` to `ubuntu-22.04` and solved the `TPCDSQueryBenchmark` compatibility issue URL: https://github.com/apache/spark/pull/49988

Re: [PR] [SPARK-51263][CORE][SQL][TESTS] Clean up unnecessary `invokePrivate` method calls in test code [spark]

2025-02-21 Thread via GitHub
dongjoon-hyun closed pull request #50012: [SPARK-51263][CORE][SQL][TESTS] Clean up unnecessary `invokePrivate` method calls in test code URL: https://github.com/apache/spark/pull/50012

Re: [PR] [SPARK-51283][SQL][TESTS] Add test cases for LZ4 and SNAPPY for text [spark]

2025-02-21 Thread via GitHub
dongjoon-hyun closed pull request #50043: [SPARK-51283][SQL][TESTS] Add test cases for LZ4 and SNAPPY for text URL: https://github.com/apache/spark/pull/50043

Re: [PR] [SPARK-51282][ML][PYTHON][CONNECT] Optimize OneVsRestModel transform by eliminating the JVM-Python data exchange [spark]

2025-02-21 Thread via GitHub
srowen commented on code in PR #50041: URL: https://github.com/apache/spark/pull/50041#discussion_r1965664294 ## python/pyspark/sql/internal.py: ## @@ -130,3 +130,42 @@ def make_interval(unit: str, e: Union[Column, int, float]) -> Column: "SECOND": "secs",

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-02-21 Thread via GitHub
jjayadeep06 commented on PR #50020: URL: https://github.com/apache/spark/pull/50020#issuecomment-2674692917 @mridulm / @LuciferYang - Can you please review when you get a chance ?

Re: [PR] [SPARK-51282][ML][PYTHON][CONNECT] Optimize OneVsRestModel transform by eliminating the JVM-Python data exchange [spark]

2025-02-21 Thread via GitHub
zhengruifeng commented on code in PR #50041: URL: https://github.com/apache/spark/pull/50041#discussion_r1965485829 ## python/pyspark/sql/internal.py: ## @@ -130,3 +130,42 @@ def make_interval(unit: str, e: Union[Column, int, float]) -> Column: "SECOND": "secs",

Re: [PR] [SPARK-51282][ML][PYTHON][CONNECT] Optimize OneVsRestModel transform by eliminating the JVM-Python data exchange [spark]

2025-02-21 Thread via GitHub
zhengruifeng commented on code in PR #50041: URL: https://github.com/apache/spark/pull/50041#discussion_r1965493411 ## python/pyspark/sql/internal.py: ## @@ -130,3 +130,42 @@ def make_interval(unit: str, e: Union[Column, int, float]) -> Column: "SECOND": "secs",

Re: [PR] [SPARK-51283][SQL] Add test cases for LZ4 and SNAPPY for text [spark]

2025-02-21 Thread via GitHub
beliefer commented on PR #50043: URL: https://github.com/apache/spark/pull/50043#issuecomment-2674433462 ping @dongjoon-hyun @LuciferYang

Re: [PR] [SPARK-51284][SQL] Fix SQL Script execution for empty result [spark]

2025-02-21 Thread via GitHub
cloud-fan closed pull request #50024: [SPARK-51284][SQL] Fix SQL Script execution for empty result URL: https://github.com/apache/spark/pull/50024

Re: [PR] [SPARK-51284][SQL] Fix SQL Script execution for empty result [spark]

2025-02-21 Thread via GitHub
cloud-fan commented on PR #50024: URL: https://github.com/apache/spark/pull/50024#issuecomment-2674411204 thanks, merging to master/4.0!

Re: [PR] [SPARK-51288] Add link for Scala API of Spark Connect [spark]

2025-02-21 Thread via GitHub
HyukjinKwon commented on PR #50042: URL: https://github.com/apache/spark/pull/50042#issuecomment-2674287953 Merged to master and branch-4.0.

Re: [PR] [SPARK-51282][ML][PYTHON][CONNECT] Optimize OneVsRestModel transform by eliminating the JVM-Python data exchange [spark]

2025-02-21 Thread via GitHub
srowen commented on code in PR #50041: URL: https://github.com/apache/spark/pull/50041#discussion_r1965290086 ## python/pyspark/sql/internal.py: ## @@ -130,3 +130,42 @@ def make_interval(unit: str, e: Union[Column, int, float]) -> Column: "SECOND": "secs",

Re: [PR] [SPARK-50785][SQL] Refactor FOR statement to utilize local variables properly. [spark]

2025-02-21 Thread via GitHub
davidm-db commented on code in PR #50026: URL: https://github.com/apache/spark/pull/50026#discussion_r1965209406 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -823,35 +827,41 @@ class ForStatementExec( override def hasN

Re: [PR] [SPARK-50785][SQL] Refactor FOR statement to utilize local variables properly. [spark]

2025-02-21 Thread via GitHub
dusantism-db commented on code in PR #50026: URL: https://github.com/apache/spark/pull/50026#discussion_r1965190136 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -823,35 +827,41 @@ class ForStatementExec( override def h

Re: [PR] [SPARK-51284][SQL] Fix SQL Script execution for empty result [spark]

2025-02-21 Thread via GitHub
davidm-db commented on code in PR #50024: URL: https://github.com/apache/spark/pull/50024#discussion_r1965188151 ## sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingE2eSuite.scala: ## @@ -139,6 +145,24 @@ class SqlScriptingE2eSuite extends QueryTest with Share

[PR] [SPARK-51283][SQL] Add test cases for LZ4 and SNAPPY for text [spark]

2025-02-21 Thread via GitHub
beliefer opened a new pull request, #50043: URL: https://github.com/apache/spark/pull/50043 ### What changes were proposed in this pull request? This PR proposes to add test cases for `LZ4` and `SNAPPY` for text. ### Why are the changes needed? Currently, Spark missing the te
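The codecs under test can be exercised through the text source's `compression` write option. The following is a sketch only, not the PR's actual test code; it assumes a live `SparkSession` named `spark`, and the path is a placeholder:

```scala
// Sketch: round-trip a tiny DataFrame through the text source with a given
// codec. Assumes an existing SparkSession `spark`; "/tmp/..." is a placeholder.
import org.apache.spark.sql.SparkSession

def roundTripText(spark: SparkSession, codec: String): Unit = {
  import spark.implicits._
  val path = s"/tmp/text-$codec"
  Seq("a", "b", "c").toDF("value").write
    .mode("overwrite")
    .option("compression", codec) // e.g. "lz4" or "snappy"
    .text(path)
  assert(spark.read.text(path).count() == 3)
}
```

Running this for each codec name is the shape of coverage the PR title describes: the read side must transparently decode whatever the write side produced.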

[PR] add link [spark]

2025-02-21 Thread via GitHub
xupefei opened a new pull request, #50042: URL: https://github.com/apache/spark/pull/50042 ### What changes were proposed in this pull request? Follows https://github.com/apache/spark/pull/47332 ### Why are the changes needed? ### Does this PR introduce _any_

[PR] [SPARK-51282][ML][CONNECT] Optimize OneVsRestModel transform by eliminating the JVM-Python data exchange [spark]

2025-02-21 Thread via GitHub
zhengruifeng opened a new pull request, #50041: URL: https://github.com/apache/spark/pull/50041 ### What changes were proposed in this pull request? Optimize OneVsRestModel transform by eliminating the JVM-Python data exchange ### Why are the changes needed? for better perfor

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-21 Thread via GitHub
cloud-fan commented on PR #50040: URL: https://github.com/apache/spark/pull/50040#issuecomment-2673951588 @dongjoon-hyun shall we include it in 3.5.5? Also cc @aokolnychyi

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-21 Thread via GitHub
cloud-fan commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1965099273 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala: ## @@ -839,4 +839,26 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSpark

Re: [PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-21 Thread via GitHub
cloud-fan commented on code in PR #50040: URL: https://github.com/apache/spark/pull/50040#discussion_r1965098875 ## sql/core/src/main/scala/org/apache/spark/sql/classic/DataFrameWriterV2.scala: ## @@ -146,25 +148,30 @@ final class DataFrameWriterV2[T] private[sql](table: String

[PR] [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option [spark]

2025-02-21 Thread via GitHub
cloud-fan opened a new pull request, #50040: URL: https://github.com/apache/spark/pull/50040 ### What changes were proposed in this pull request? Unlike `DataFrameWriter.saveAsTable` where we explicitly get the "path" option and treat it as table location, `DataFrameWriterV2`

Re: [PR] [SPARK-51265][SQL] IncrementalExecution should set the command execution code correctly [spark]

2025-02-21 Thread via GitHub
HeartSaVioR commented on code in PR #50037: URL: https://github.com/apache/spark/pull/50037#discussion_r1965082740 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -70,8 +70,9 @@ class IncrementalExecution( val currentSta

Re: [PR] [SPARK-51263][CORE][SQL][TESTS] Clean up unnecessary `invokePrivate` method calls in test code [spark]

2025-02-21 Thread via GitHub
LuciferYang commented on PR #50012: URL: https://github.com/apache/spark/pull/50012#issuecomment-2673898056 Thank you @beliefer ~

Re: [PR] [SPARK-51275][PYTHON][ML][CONNECT] Session propagation in python readwrite [spark]

2025-02-21 Thread via GitHub
zhengruifeng commented on PR #50035: URL: https://github.com/apache/spark/pull/50035#issuecomment-2673838043 thanks, merged to master

Re: [PR] [SPARK-51275][PYTHON][ML][CONNECT] Session propagation in python readwrite [spark]

2025-02-21 Thread via GitHub
zhengruifeng closed pull request #50035: [SPARK-51275][PYTHON][ML][CONNECT] Session propagation in python readwrite URL: https://github.com/apache/spark/pull/50035

Re: [PR] [SPARK-51265][SQL] IncrementalExecution should set the command execution code correctly [spark]

2025-02-21 Thread via GitHub
HeartSaVioR commented on code in PR #50037: URL: https://github.com/apache/spark/pull/50037#discussion_r1965052300 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -70,8 +70,9 @@ class IncrementalExecution( val currentSta

Re: [PR] [SPARK-51279][CONNECT] Avoid constant sleep for waiting Spark Connect server in Scala [spark]

2025-02-21 Thread via GitHub
HyukjinKwon commented on code in PR #50039: URL: https://github.com/apache/spark/pull/50039#discussion_r1965048253 ## sql/connect/common/src/main/scala/org/apache/spark/sql/connect/SparkSession.scala: ## @@ -706,6 +710,33 @@ object SparkSession extends SparkSessionCompanion with

Re: [PR] [SPARK-51279][CONNECT] Avoid constant sleep for waiting Spark Connect server in Scala [spark]

2025-02-21 Thread via GitHub
HyukjinKwon commented on PR #50039: URL: https://github.com/apache/spark/pull/50039#issuecomment-2673867838 cc @cloud-fan

Re: [PR] [SPARK-51267][CONNECT] Match local Spark Connect server logic between Python and Scala [spark]

2025-02-21 Thread via GitHub
HyukjinKwon commented on code in PR #50017: URL: https://github.com/apache/spark/pull/50017#discussion_r1965048523 ## sql/connect/common/src/main/scala/org/apache/spark/sql/connect/SparkSession.scala: ## @@ -712,37 +729,41 @@ object SparkSession extends SparkSessionCompanion wit

[PR] [SPARK-51279][CONNECT] Avoid constant sleep for waiting Spark Connect server in Scala [spark]

2025-02-21 Thread via GitHub
HyukjinKwon opened a new pull request, #50039: URL: https://github.com/apache/spark/pull/50039 ### What changes were proposed in this pull request? This PR proposes to address https://github.com/apache/spark/pull/50017#discussion_r1963027036 by avoiding constant sleep but waiting unt
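The approach the PR title describes (poll until the server is ready instead of sleeping a fixed amount) can be sketched as follows; `isReady` and the duration defaults are assumptions for illustration, not the PR's actual names:

```scala
import scala.concurrent.duration._

// Instead of a constant Thread.sleep(n), poll the readiness check on a short
// interval and return as soon as it succeeds or the deadline passes.
def waitUntilReady(
    isReady: () => Boolean,
    timeout: FiniteDuration = 30.seconds,
    pollInterval: FiniteDuration = 100.millis): Boolean = {
  val deadline = System.nanoTime() + timeout.toNanos
  while (System.nanoTime() < deadline) {
    if (isReady()) return true
    Thread.sleep(pollInterval.toMillis)
  }
  isReady() // one final check at the deadline
}
```

This bounds the worst-case wait at `timeout` while making the common case return after roughly one `pollInterval`, rather than always paying the full fixed sleep.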