Re: [PR] [SPARK-51762][SQL] Fix Resolution of Views in Single-Pass Analyzer When Bridging is Enabled [spark]

2025-04-14 Thread via GitHub
vladimirg-db commented on code in PR #50555: URL: https://github.com/apache/spark/pull/50555#discussion_r2042241844 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/ViewResolver.scala: ## @@ -101,14 +108,16 @@ class ViewResolver(resolver: Resolver,

Re: [PR] [SPARK-46640][FOLLOW-UP] Consider the whole expression tree when excluding subquery references [spark]

2025-04-14 Thread via GitHub
cloud-fan commented on PR #50570: URL: https://github.com/apache/spark/pull/50570#issuecomment-2803553726 thanks, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] [SPARK-51799] Support user-specified schema in `DataFrameReader` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun commented on PR #58: URL: https://github.com/apache/spark-connect-swift/pull/58#issuecomment-2803560982 Could you review this PR, @yaooqinn ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-51439][SQL] Support SQL UDF with DEFAULT argument [spark]

2025-04-14 Thread via GitHub
wengh commented on code in PR #50408: URL: https://github.com/apache/spark/pull/50408#discussion_r2043336354 ## sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala: ## @@ -218,9 +224,11 @@ case class StructField( metadata.contains(EXISTS_DEFAULT_COLUMN_METAD

Re: [PR] [SPARK-51439][SQL] Support SQL UDF with DEFAULT argument [spark]

2025-04-14 Thread via GitHub
wengh commented on code in PR #50408: URL: https://github.com/apache/spark/pull/50408#discussion_r2043344382 ## sql/core/src/test/scala/org/apache/spark/sql/execution/SQLFunctionSuite.scala: ## @@ -74,4 +74,17 @@ class SQLFunctionSuite extends QueryTest with SharedSparkSession

Re: [PR] [SPARK-46640][FOLLOW-UP] Consider the whole expression tree when excluding subquery references [spark]

2025-04-14 Thread via GitHub
cloud-fan closed pull request #50570: [SPARK-46640][FOLLOW-UP] Consider the whole expression tree when excluding subquery references URL: https://github.com/apache/spark/pull/50570 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-46640][FOLLOW-UP] Consider the whole expression tree when excluding subquery references [spark]

2025-04-14 Thread via GitHub
nikhilsheoran-db commented on PR #50570: URL: https://github.com/apache/spark/pull/50570#issuecomment-2802936378 > Can we re-trigger the Github Action jobs? Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-51688][PYTHON] Use Unix Domain Socket between Python and JVM communication [spark]

2025-04-14 Thread via GitHub
HyukjinKwon closed pull request #50466: [SPARK-51688][PYTHON] Use Unix Domain Socket between Python and JVM communication URL: https://github.com/apache/spark/pull/50466 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-51688][PYTHON] Use Unix Domain Socket between Python and JVM communication [spark]

2025-04-14 Thread via GitHub
HyukjinKwon commented on PR #50466: URL: https://github.com/apache/spark/pull/50466#issuecomment-2803500466 Merged to master. I will follow up on CI and refactoring. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] [SPARK-51796] [SQL] Disallow Sort order expressions under non-Sort operators [spark]

2025-04-14 Thread via GitHub
cloud-fan commented on PR #50582: URL: https://github.com/apache/spark/pull/50582#issuecomment-2803483143 There are cases where an expression containing `SortOrder` (e.g. `ListAgg`) can appear in non-sort operators. We need to find all of these cases. -- This is an automated message from

Re: [PR] [SPARK-51800][INFRA] Set up the CI for UDS in PySpark [spark]

2025-04-14 Thread via GitHub
HyukjinKwon commented on PR #50585: URL: https://github.com/apache/spark/pull/50585#issuecomment-2803517828 cc @zhengruifeng @LuciferYang @ueshin FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-51439][SQL] Support SQL UDF with DEFAULT argument [spark]

2025-04-14 Thread via GitHub
allisonwang-db commented on code in PR #50408: URL: https://github.com/apache/spark/pull/50408#discussion_r2043286455 ## sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala: ## @@ -218,9 +224,11 @@ case class StructField( metadata.contains(EXISTS_DEFAULT_COL

[PR] [SPARK-51800][INFRA] Set up the CI for UDS in PySpark [spark]

2025-04-14 Thread via GitHub
HyukjinKwon opened a new pull request, #50585: URL: https://github.com/apache/spark/pull/50585 ### What changes were proposed in this pull request? This PR is a follow-up of https://github.com/apache/spark/pull/50466 that sets up the CI for UDS in Python. ### Why are the changes ne

Re: [PR] [SPARK-51791][ML] `ImputerModel` stores coefficients with arrays instead of dataframe [spark]

2025-04-14 Thread via GitHub
zhengruifeng closed pull request #50578: [SPARK-51791][ML] `ImputerModel` stores coefficients with arrays instead of dataframe URL: https://github.com/apache/spark/pull/50578 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] [SPARK-51771][SQL] Add DSv2 APIs for ALTER TABLE ADD/DROP CONSTRAINT [spark]

2025-04-14 Thread via GitHub
cloud-fan commented on code in PR #50561: URL: https://github.com/apache/spark/pull/50561#discussion_r2043786095 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala: ## @@ -296,6 +297,49 @@ private[sql] object CatalogV2Util { } } +

Re: [PR] [SPARK-51611][SQL] New iteration of single-pass Analyzer functionality [spark]

2025-04-14 Thread via GitHub
iwanttobepowerful commented on PR #50406: URL: https://github.com/apache/spark/pull/50406#issuecomment-2804008495 @vladimirg-db hi, is the new Analyzer's principle similar to Calcite's validator? -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] [SPARK-51800][INFRA] Set up the CI for UDS in PySpark [spark]

2025-04-14 Thread via GitHub
LuciferYang commented on PR #50585: URL: https://github.com/apache/spark/pull/50585#issuecomment-2803945674 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

Re: [PR] [SPARK-51638][CORE] Fix fetching the remote disk stored RDD blocks via the external shuffle service [spark]

2025-04-14 Thread via GitHub
mridulm commented on code in PR #50439: URL: https://github.com/apache/spark/pull/50439#discussion_r2043250802 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -863,36 +863,36 @@ class BlockManagerMasterEndpoint( blockId: BlockId,

Re: [PR] [SPARK-51799] Support user-specified schema in `DataFrameReader` [spark-connect-swift]

2025-04-14 Thread via GitHub
yaooqinn commented on code in PR #58: URL: https://github.com/apache/spark-connect-swift/pull/58#discussion_r2043405612 ## Tests/SparkConnectTests/DataFrameReaderTests.swift: ## @@ -85,4 +85,27 @@ struct DataFrameReaderTests { }) await spark.stop() } + + @Test +

Re: [PR] [SPARK-51688][PYTHON][FOLLOW-UP] Implement UDS in Accumulators [spark]

2025-04-14 Thread via GitHub
HyukjinKwon commented on PR #50587: URL: https://github.com/apache/spark/pull/50587#issuecomment-2803731737 cc @ueshin -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] [SPARK-51768][SS][TESTS] Create Failure Injection Test for Streaming offset and commit log write failures [spark]

2025-04-14 Thread via GitHub
HeartSaVioR commented on code in PR #50559: URL: https://github.com/apache/spark/pull/50559#discussion_r2043086506 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBCheckpointFailureInjectionSuite.scala: ## @@ -414,6 +414,84 @@ class RocksDBCheckpo

Re: [PR] Test kryo 4.0.3 [spark]

2025-04-14 Thread via GitHub
LuciferYang commented on PR #50586: URL: https://github.com/apache/spark/pull/50586#issuecomment-2803694558 cc @yaooqinn Let's test 4.0.3 first -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Test kryo 4.0.3 [spark]

2025-04-14 Thread via GitHub
LuciferYang commented on code in PR #50586: URL: https://github.com/apache/spark/pull/50586#discussion_r2043498047 ## pom.xml: ## @@ -493,6 +494,16 @@ chill-java ${chill.version} + +com.esotericsoftware +kryo-shaded +${kryo

Re: [PR] [SPARK-51799] Support user-specified schema in `DataFrameReader` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun commented on code in PR #58: URL: https://github.com/apache/spark-connect-swift/pull/58#discussion_r2043483481 ## Tests/SparkConnectTests/DataFrameReaderTests.swift: ## @@ -85,4 +85,27 @@ struct DataFrameReaderTests { }) await spark.stop() } + + @Tes

Re: [PR] [SPARK-51799] Support user-specified schema in `DataFrameReader` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun commented on PR #58: URL: https://github.com/apache/spark-connect-swift/pull/58#issuecomment-2803683781 Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-51799] Support user-specified schema in `DataFrameReader` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun closed pull request #58: [SPARK-51799] Support user-specified schema in `DataFrameReader` URL: https://github.com/apache/spark-connect-swift/pull/58 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-04-14 Thread via GitHub
attilapiros commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2801616710 > If yes, we could be more aggressive when handling this case - > Invalidate all downstream shuffle output > Any result stage which has/had started, and not completed - fail t

Re: [PR] [SPARK-51793] Support `ddlParse` and `jsonToDdl` in `SparkConnectClient` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun commented on PR #57: URL: https://github.com/apache/spark-connect-swift/pull/57#issuecomment-2803378343 Thank you, @viirya . Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[PR] [SPARK-51799] Support user-specified schema in `DataFrameReader` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun opened a new pull request, #58: URL: https://github.com/apache/spark-connect-swift/pull/58 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51793] Support `ddlParse` and `jsonToDdl` in `SparkConnectClient` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun closed pull request #57: [SPARK-51793] Support `ddlParse` and `jsonToDdl` in `SparkConnectClient` URL: https://github.com/apache/spark-connect-swift/pull/57 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] [SPARK-48728][SQL] Support ignoreNulls for collect_list and collect_set [spark]

2025-04-14 Thread via GitHub
github-actions[bot] closed pull request #47149: [SPARK-48728][SQL] Support ignoreNulls for collect_list and collect_set URL: https://github.com/apache/spark/pull/47149 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] [SPARK-50600][SQL] Set analyzed on analysis failure [spark]

2025-04-14 Thread via GitHub
github-actions[bot] commented on PR #49236: URL: https://github.com/apache/spark/pull/49236#issuecomment-2803412462 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-51768][SS][TESTS] Create Failure Injection Test for Streaming offset and commit log write failures [spark]

2025-04-14 Thread via GitHub
HeartSaVioR commented on code in PR #50559: URL: https://github.com/apache/spark/pull/50559#discussion_r2043090343 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBCheckpointFailureInjectionSuite.scala: ## @@ -414,6 +414,84 @@ class RocksDBCheckpo

Re: [PR] [SPARK-51395][SQL][TESTS][FOLLOW-UP] Explicitly sets failOnError in Abs at tests [spark]

2025-04-14 Thread via GitHub
HyukjinKwon closed pull request #50577: [SPARK-51395][SQL][TESTS][FOLLOW-UP] Explicitly sets failOnError in Abs at tests URL: https://github.com/apache/spark/pull/50577 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-51395][SQL][TESTS][FOLLOW-UP] Explicitly sets failOnError in Abs at tests [spark]

2025-04-14 Thread via GitHub
HyukjinKwon commented on PR #50577: URL: https://github.com/apache/spark/pull/50577#issuecomment-2803265876 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-51688][PYTHON] Use Unix Domain Socket between Python and JVM communication [spark]

2025-04-14 Thread via GitHub
ueshin commented on code in PR #50466: URL: https://github.com/apache/spark/pull/50466#discussion_r2042971192 ## python/pyspark/ml/deepspeed/tests/test_deepspeed_distributor.py: ## @@ -227,7 +227,7 @@ def setUpClass(cls) -> None: conf = conf.set(k, v) conf

Re: [PR] [SPARK-51688][PYTHON] Use Unix Domain Socket between Python and JVM communication [spark]

2025-04-14 Thread via GitHub
ueshin commented on code in PR #50466: URL: https://github.com/apache/spark/pull/50466#discussion_r2042983687 ## core/src/main/scala/org/apache/spark/internal/config/Python.scala: ## @@ -70,6 +70,29 @@ private[spark] object Python { .booleanConf .createWithDefault(fals

Re: [PR] [SPARK-51688][PYTHON] Use Unix Domain Socket between Python and JVM communication [spark]

2025-04-14 Thread via GitHub
ueshin commented on code in PR #50466: URL: https://github.com/apache/spark/pull/50466#discussion_r2042985313 ## core/src/main/scala/org/apache/spark/internal/config/Python.scala: ## @@ -70,6 +70,29 @@ private[spark] object Python { .booleanConf .createWithDefault(fals

Re: [PR] [SPARK-51553][SQL] Modify EXTRACT to support TIME data type [spark]

2025-04-14 Thread via GitHub
vinodkc commented on code in PR #50558: URL: https://github.com/apache/spark/pull/50558#discussion_r2042988479 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala: ## @@ -4292,6 +4292,15 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with Adapt

Re: [PR] [SPARK-51795][BUILD] Bump Parquet 1.15.1 [spark]

2025-04-14 Thread via GitHub
HyukjinKwon commented on PR #50583: URL: https://github.com/apache/spark/pull/50583#issuecomment-2803269819 branch-3.4 is EOL -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-51716][SQL] Support serializing Variant to XML [spark]

2025-04-14 Thread via GitHub
HyukjinKwon commented on PR #50560: URL: https://github.com/apache/spark/pull/50560#issuecomment-2803261187 @xiaonanyang-db mind filling in the PR description? e.g., you added the unit test, and your change will be tested with that -- This is an automated message from the Apache Git Service.

Re: [PR] [SPARK-51768][SS][TESTS] Create Failure Injection Test for Streaming offset and commit log write failures [spark]

2025-04-14 Thread via GitHub
HeartSaVioR commented on code in PR #50559: URL: https://github.com/apache/spark/pull/50559#discussion_r2043006686 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBCheckpointFailureInjectionSuite.scala: ## @@ -414,6 +414,84 @@ class RocksDBCheckpo

Re: [PR] [SPARK-51758][SS] Apply late record filtering based on watermark only if timeMode is passed as EventTime to the transformWithState operator [spark]

2025-04-14 Thread via GitHub
HeartSaVioR closed pull request #50550: [SPARK-51758][SS] Apply late record filtering based on watermark only if timeMode is passed as EventTime to the transformWithState operator URL: https://github.com/apache/spark/pull/50550 -- This is an automated message from the Apache Git Service. To

Re: [PR] [SPARK-51747][SQL][FOLLOW-UP] Data source cached plan conf and migration guide [spark]

2025-04-14 Thread via GitHub
gengliangwang commented on PR #50571: URL: https://github.com/apache/spark/pull/50571#issuecomment-2803639033 Thanks, merging to master/3.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-04-14 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r2042430708 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1554,6 +1554,7 @@ private[spark] class DAGScheduler( case sms: ShuffleMapStage if s

Re: [PR] [SPARK-51795][BUILD] Bump Parquet 1.15.1 [spark]

2025-04-14 Thread via GitHub
CarterFendley commented on PR #50583: URL: https://github.com/apache/spark/pull/50583#issuecomment-2802419359 Tagging @pan3793 @LuciferYang @cloud-fan @yaooqinn @the-sakthi @dongjoon-hyun for visibility. -- This is an automated message from the Apache Git Service. To respond to the messag

Re: [PR] [SPARK-51780][SQL] Implement Describe Procedure [spark]

2025-04-14 Thread via GitHub
szehon-ho commented on PR #50569: URL: https://github.com/apache/spark/pull/50569#issuecomment-2802408850 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

Re: [PR] [SPARK-51716][SQL] Support serializing Variant to XML [spark]

2025-04-14 Thread via GitHub
xiaonanyang-db commented on PR #50560: URL: https://github.com/apache/spark/pull/50560#issuecomment-2802434892 Hey @HyukjinKwon , can you help merge this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] [SPARK-51549][BUILD][3.5] Bump Parquet 1.15.1 [spark]

2025-04-14 Thread via GitHub
CarterFendley commented on PR #50528: URL: https://github.com/apache/spark/pull/50528#issuecomment-2802433472 @LuciferYang @dongjoon-hyun Do you know when we will get a release of this? The latest `3.5.5` is still critically vulnerable. -- This is an automated message from the Apach

Re: [PR] [SPARK-51747][SQL][FOLLOW-UP] Data source cached plan conf and migration guide [spark]

2025-04-14 Thread via GitHub
szehon-ho commented on code in PR #50571: URL: https://github.com/apache/spark/pull/50571#discussion_r2042624601 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala: ## @@ -1376,32 +1398,67 @@ abstract class DDLSuite extends QueryTest with DDLSuiteB

[PR] [SPARK-51797][SQL] Add table name to TableConstraint and remove parser dependencies [spark]

2025-04-14 Thread via GitHub
gengliangwang opened a new pull request, #50584: URL: https://github.com/apache/spark/pull/50584 ### What changes were proposed in this pull request? 1. Made `tableName` a class member of `TableConstraint` trait and its implementations - Added `tableName` as a requir

[PR] [SPARK-51795][BUILD] Bump Parquet 1.15.1 [spark]

2025-04-14 Thread via GitHub
CarterFendley opened a new pull request, #50583: URL: https://github.com/apache/spark/pull/50583 ### What changes were proposed in this pull request? Bump Parquet 1.15.1. Backporting #50319 ### Why are the changes needed? Release Notes https://github.co

Re: [PR] [SPARK-51716][SQL] Support serializing Variant to XML [spark]

2025-04-14 Thread via GitHub
chenhao-db commented on code in PR #50560: URL: https://github.com/apache/spark/pull/50560#discussion_r2042707079 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlGenerator.scala: ## @@ -241,4 +251,168 @@ class StaxXmlGenerator( } } } + + /

Re: [PR] [SPARK-51792] Support `saveAsTable` and `insertInto` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun commented on PR #56: URL: https://github.com/apache/spark-connect-swift/pull/56#issuecomment-2800825230 Thank you again! Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] [SPARK-51792] Support `saveAsTable` and `insertInto` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun closed pull request #56: [SPARK-51792] Support `saveAsTable` and `insertInto` URL: https://github.com/apache/spark-connect-swift/pull/56 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-51792] Support `saveAsTable` and `insertInto` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun commented on PR #56: URL: https://github.com/apache/spark-connect-swift/pull/56#issuecomment-2800812157 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-04-14 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r2042486967 ## core/src/main/scala/org/apache/spark/scheduler/ResultStage.scala: ## @@ -64,5 +75,16 @@ private[spark] class ResultStage( (0 until job.numPartitions).filter(id

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-04-14 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r2042446623 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2171,21 +2189,41 @@ private[spark] class DAGScheduler( abortStage

Re: [PR] [SPARK-51796] [SQL] Disallow Sort order expressions under non-Sort operators [spark]

2025-04-14 Thread via GitHub
mihailoale-db commented on PR #50582: URL: https://github.com/apache/spark/pull/50582#issuecomment-2802213273 @cloud-fan should we do it like this, or should we move the `checkExpressionUnderUnsupportedOperator` check to the beginning of the operator match? Thanks -- This is an automated messa

[PR] [SPARK-51796] [SQL] Disallow Sort order expressions under non-Sort operators [spark]

2025-04-14 Thread via GitHub
mihailoale-db opened a new pull request, #50582: URL: https://github.com/apache/spark/pull/50582 ### What changes were proposed in this pull request? I propose that we disallow `SortOrder` expressions under non-`Sort` operators. ### Why are the changes needed? Following Datafram

Re: [PR] [SPARK-51747][SQL][FOLLOW-UP] Data source cached plan conf and migration guide [spark]

2025-04-14 Thread via GitHub
gengliangwang commented on code in PR #50571: URL: https://github.com/apache/spark/pull/50571#discussion_r2042531784 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala: ## @@ -1376,32 +1398,67 @@ abstract class DDLSuite extends QueryTest with DDLSu

Re: [PR] [SPARK-51638][CORE] Fix fetching the remote disk stored RDD blocks via the external shuffle service [spark]

2025-04-14 Thread via GitHub
attilapiros commented on code in PR #50439: URL: https://github.com/apache/spark/pull/50439#discussion_r2042748146 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -863,36 +863,36 @@ class BlockManagerMasterEndpoint( blockId: BlockId

Re: [PR] [SPARK-51758][SS] Apply late record filtering based on watermark only if timeMode is passed as EventTime to the transformWithState operator [spark]

2025-04-14 Thread via GitHub
HeartSaVioR commented on code in PR #50550: URL: https://github.com/apache/spark/pull/50550#discussion_r2042211677 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala: ## @@ -2358,4 +2382,50 @@ class TransformWithStateValidationSuite extends

Re: [PR] [SPARK-51775][SQL] Normalize LogicalRelation and HiveTableRelation by NormalizePlan [spark]

2025-04-14 Thread via GitHub
cloud-fan closed pull request #50563: [SPARK-51775][SQL] Normalize LogicalRelation and HiveTableRelation by NormalizePlan URL: https://github.com/apache/spark/pull/50563 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-51797][SQL] Add table name to TableConstraint and remove parser dependencies [spark]

2025-04-14 Thread via GitHub
gengliangwang commented on PR #50584: URL: https://github.com/apache/spark/pull/50584#issuecomment-2802559008 cc @aokolnychyi @szehon-ho -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] [SPARK-51762][SQL] Fix Resolution of Views in Single-Pass Analyzer When Bridging is Enabled [spark]

2025-04-14 Thread via GitHub
mihailom-db commented on code in PR #50555: URL: https://github.com/apache/spark/pull/50555#discussion_r2042226396 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/ViewResolver.scala: ## @@ -101,14 +108,16 @@ class ViewResolver(resolver: Resolver, c

Re: [PR] [SPARK-51553][SQL] Modify EXTRACT to support TIME data type [spark]

2025-04-14 Thread via GitHub
MaxGekk commented on code in PR #50558: URL: https://github.com/apache/spark/pull/50558#discussion_r2042301330 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -650,6 +650,7 @@ object FunctionRegistry { expression[Now]("now

Re: [PR] [SPARK-51747][SQL][FOLLOW-UP] Data source cached plan conf and migration guide [spark]

2025-04-14 Thread via GitHub
gengliangwang commented on code in PR #50571: URL: https://github.com/apache/spark/pull/50571#discussion_r2042515840 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5260,6 +5260,15 @@ object SQLConf { .booleanConf .createWithDefault(f

Re: [PR] [SPARK-51747][SQL][FOLLOW-UP] Data source cached plan conf and migration guide [spark]

2025-04-14 Thread via GitHub
szehon-ho commented on code in PR #50571: URL: https://github.com/apache/spark/pull/50571#discussion_r2042631436 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala: ## @@ -1376,32 +1398,67 @@ abstract class DDLSuite extends QueryTest with DDLSuiteB

Re: [PR] [SPARK-51758][SS] Apply late record filtering based on watermark only if timeMode is passed as EventTime to the transformWithState operator [spark]

2025-04-14 Thread via GitHub
anishshri-db commented on code in PR #50550: URL: https://github.com/apache/spark/pull/50550#discussion_r2042791759 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala: ## @@ -2358,4 +2382,50 @@ class TransformWithStateValidationSuite extends

Re: [PR] [SPARK-50992][SQL] OOMs and performance issues with AQE in large plans [spark]

2025-04-14 Thread via GitHub
SauronShepherd commented on PR #49724: URL: https://github.com/apache/spark/pull/49724#issuecomment-2800695732 I forgot to update Spark Connect and tests with the changes proposed here. Besides, I'm going to keep the changes simpler and without user-facing changes. I'll open a new PR in sho

[PR] Test no SPARK_LOCAL_IP [spark]

2025-04-14 Thread via GitHub
LuciferYang opened a new pull request, #50581: URL: https://github.com/apache/spark/pull/50581 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [SPARK-51793] Support `ddlParse` and `jsonToDdl` in `SparkConnectClient` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun opened a new pull request, #57: URL: https://github.com/apache/spark-connect-swift/pull/57 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51778][SQL][TESTS] Close SQL test gaps discovered during single-pass Analyzer implementation [spark]

2025-04-14 Thread via GitHub
cloud-fan commented on PR #50568: URL: https://github.com/apache/spark/pull/50568#issuecomment-2801726494 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] [SPARK-51778][SQL][TESTS] Close SQL test gaps discovered during single-pass Analyzer implementation [spark]

2025-04-14 Thread via GitHub
cloud-fan closed pull request #50568: [SPARK-51778][SQL][TESTS] Close SQL test gaps discovered during single-pass Analyzer implementation URL: https://github.com/apache/spark/pull/50568

Re: [PR] [SPARK-51792] Support `saveAsTable` and `insertInto` [spark-connect-swift]

2025-04-14 Thread via GitHub
viirya commented on PR #56: URL: https://github.com/apache/spark-connect-swift/pull/56#issuecomment-2800812923 The failed test looks like memory issue? ``` freed pointer was not the last allocation *** Signal 6: Backtracing from 0x7f9cdb69eb2c... done *** *** Program cras

Re: [PR] [SPARK-51792] Support `saveAsTable` and `insertInto` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun commented on PR #56: URL: https://github.com/apache/spark-connect-swift/pull/56#issuecomment-2800816040 It's `Swift` `Linux` implementation issue~

Re: [PR] Enable -Xsource:3 compiler flag [spark]

2025-04-14 Thread via GitHub
zhengruifeng commented on PR #50474: URL: https://github.com/apache/spark/pull/50474#issuecomment-2800690377 also cc @rednaxelafx

Re: [PR] [SPARK-51762][SQL] Fix Resolution of Views in Single-Pass Analyzer When Bridging is Enabled [spark]

2025-04-14 Thread via GitHub
vladimirg-db commented on code in PR #50555: URL: https://github.com/apache/spark/pull/50555#discussion_r2041688571 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/ViewResolver.scala: ## @@ -101,14 +108,16 @@ class ViewResolver(resolver: Resolver,

Re: [PR] [SPARK-43782][CORE] Support log level configuration with static Spark conf [spark]

2025-04-14 Thread via GitHub
mehulbatra-d11 commented on PR #41302: URL: https://github.com/apache/spark/pull/41302#issuecomment-2801850393 @dongjoon-hyun Currently we are facing trouble while syncing events logs to s3 using Spark Connect, Is it expected that with Spark 3.5 it's currently not possible to move events lo

Re: [PR] [SPARK-46640][FOLLOW-UP] Consider the whole expression tree when excluding subquery references [spark]

2025-04-14 Thread via GitHub
cloud-fan commented on PR #50570: URL: https://github.com/apache/spark/pull/50570#issuecomment-2801855621 Can we re-trigger the Github Action jobs?

Re: [PR] [SPARK-47081][CONNECT] Support Query Execution Progress [spark]

2025-04-14 Thread via GitHub
grundprinzip commented on PR #45150: URL: https://github.com/apache/spark/pull/45150#issuecomment-2800697846 @virrrat this is not planned and given the Spark 4 release soon, not practical.

Re: [PR] [SPARK-51775][SQL] Normalize LogicalRelation and HiveTableRelation by NormalizePlan [spark]

2025-04-14 Thread via GitHub
vladimirg-db commented on code in PR #50563: URL: https://github.com/apache/spark/pull/50563#discussion_r2041986430 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NormalizeableRelation.scala: ## @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-51792] Support `saveAsTable` and `insertInto` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun commented on PR #56: URL: https://github.com/apache/spark-connect-swift/pull/56#issuecomment-2800819128 It doesn't happen on MacOS local and `integration-test-mac` and happens in linux environment randomly. I'm still looking at those linux issues.

[PR] [SPARK-51792] Support `saveAsTable` and `insertInto` [spark-connect-swift]

2025-04-14 Thread via GitHub
dongjoon-hyun opened a new pull request, #56: URL: https://github.com/apache/spark-connect-swift/pull/56 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[PR] [SPARK-51633][CORE][TESTS] Test [spark]

2025-04-14 Thread via GitHub
LuciferYang opened a new pull request, #50580: URL: https://github.com/apache/spark/pull/50580 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] Test no SPARK_LOCAL_IP [spark]

2025-04-14 Thread via GitHub
LuciferYang closed pull request #50581: Test no SPARK_LOCAL_IP URL: https://github.com/apache/spark/pull/50581

Re: [PR] [SPARK-51633][CORE][TESTS] Reset `Utils#customHostname` in the `finally` block of `ExecutorSuite#withExecutor` [spark]

2025-04-14 Thread via GitHub
LuciferYang commented on PR #50580: URL: https://github.com/apache/spark/pull/50580#issuecomment-2803822335 Merged into master and branch-4.0. Thanks @HyukjinKwon and @dongjoon-hyun

Re: [PR] [SPARK-51688][PYTHON] Use Unix Domain Socket between Python and JVM communication [spark]

2025-04-14 Thread via GitHub
HyukjinKwon commented on code in PR #50466: URL: https://github.com/apache/spark/pull/50466#discussion_r2043135746 ## python/pyspark/ml/deepspeed/tests/test_deepspeed_distributor.py: ## @@ -227,7 +227,7 @@ def setUpClass(cls) -> None: conf = conf.set(k, v)

[PR] [SPARK-51801] Upgrade ORC Format to 1.1.0 [spark]

2025-04-14 Thread via GitHub
dongjoon-hyun opened a new pull request, #50588: URL: https://github.com/apache/spark/pull/50588 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-51800][INFRA] Set up the CI for UDS in PySpark [spark]

2025-04-14 Thread via GitHub
HyukjinKwon commented on PR #50585: URL: https://github.com/apache/spark/pull/50585#issuecomment-2803746192 Merged to master. Thanks!!

Re: [PR] [SPARK-51800][INFRA] Set up the CI for UDS in PySpark [spark]

2025-04-14 Thread via GitHub
HyukjinKwon closed pull request #50585: [SPARK-51800][INFRA] Set up the CI for UDS in PySpark URL: https://github.com/apache/spark/pull/50585

Re: [PR] [SPARK-51638][CORE] Fix fetching the remote disk stored RDD blocks via the external shuffle service [spark]

2025-04-14 Thread via GitHub
attilapiros commented on PR #50439: URL: https://github.com/apache/spark/pull/50439#issuecomment-2803863110 @mridulm I have addressed the review comments and added some extra tests

[PR] [SPARK-51802] OSS PySpark User Guide Docs [spark]

2025-04-14 Thread via GitHub
asl3 opened a new pull request, #50589: URL: https://github.com/apache/spark/pull/50589 ### What changes were proposed in this pull request? Add PySpark User Guide to OSS Docs ### Why are the changes needed? Currently, there is a lack of up-to-date PySpark

Re: [PR] [SPARK-51794][INFRA] Install arm64 Python for MacOS daily test [spark]

2025-04-14 Thread via GitHub
LuciferYang commented on PR #50579: URL: https://github.com/apache/spark/pull/50579#issuecomment-2803907259 all test passed
