Re: [PR] Revert "[SPARK-47895][SQL] group by alias should be idempotent" [spark]

2025-04-13 Thread via GitHub
dongjoon-hyun commented on PR #50567: URL: https://github.com/apache/spark/pull/50567#issuecomment-2799894516 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-51691][CORE][TESTS] SerializationDebugger should swallow exception when try to find the reason of serialization problem [spark]

2025-04-13 Thread via GitHub
LuciferYang commented on code in PR #50489: URL: https://github.com/apache/spark/pull/50489#discussion_r2041103661 ## core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala: ## @@ -110,8 +112,13 @@ private[spark] object SerializationDebugger extends Logging

[PR] [SPARK-51785] Support `addTag/removeTag/getTags/clearTags` in `SparkSession` [spark-connect-swift]

2025-04-13 Thread via GitHub
dongjoon-hyun opened a new pull request, #54: URL: https://github.com/apache/spark-connect-swift/pull/54 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51752][SQL] Enable rCTE referencing from within a CTE [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on PR #50546: URL: https://github.com/apache/spark/pull/50546#issuecomment-2799941944 thanks, merging to master!

Re: [PR] [SPARK-51752][SQL] Enable rCTE referencing from within a CTE [spark]

2025-04-13 Thread via GitHub
cloud-fan closed pull request #50546: [SPARK-51752][SQL] Enable rCTE referencing from within a CTE URL: https://github.com/apache/spark/pull/50546

Re: [PR] [SPARK-51691][CORE][TESTS] SerializationDebugger should swallow exception when try to find the reason of serialization problem [spark]

2025-04-13 Thread via GitHub
summaryzb commented on code in PR #50489: URL: https://github.com/apache/spark/pull/50489#discussion_r2041120864 ## core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala: ## @@ -110,8 +112,13 @@ private[spark] object SerializationDebugger extends Logging {

Re: [PR] [SPARK-51691][CORE][TESTS] SerializationDebugger should swallow exception when try to find the reason of serialization problem [spark]

2025-04-13 Thread via GitHub
summaryzb commented on code in PR #50489: URL: https://github.com/apache/spark/pull/50489#discussion_r2041128570 ## core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala: ## @@ -110,8 +112,13 @@ private[spark] object SerializationDebugger extends Logging {

[PR] [SPARK-51691][CORE][FOLLOWUP] Make SerializationDebugger work in non-… [spark]

2025-04-13 Thread via GitHub
summaryzb opened a new pull request, #50573: URL: https://github.com/apache/spark/pull/50573 ### What changes were proposed in this pull request? This is a followup PR of [SPARK-51691](https://github.com/apache/spark/pull/51691) ### Why are the changes needed? This Followup P

Re: [PR] [SPARK-51771][SQL] Add DSv2 APIs for ALTER TABLE ADD/DROP CONSTRAINT [spark]

2025-04-13 Thread via GitHub
Udaundo1973 commented on PR #50561: URL: https://github.com/apache/spark/pull/50561#issuecomment-2800236960 James Channel

Re: [PR] [SPARK-51701][PYTHON][TESTS] Move test objects to a separate file [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on PR #50503: URL: https://github.com/apache/spark/pull/50503#issuecomment-2800242002 Let me backport this down to branch-3.5 as well (to recover the test in https://github.com/apache/spark/actions/runs/14252718918/job/39949079960).

Re: [PR] [SPARK-51688][PYTHON] Use Unix Domain Socket between Python and JVM communication [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on PR #50466: URL: https://github.com/apache/spark/pull/50466#issuecomment-2800220500 All tests passed with the conf enabled: https://github.com/HyukjinKwon/spark/actions/runs/14434095374

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-04-13 Thread via GitHub
attilapiros commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2800042901 > For the https://github.com/apache/spark/pull/50033#discussion_r2040777376, map stage is determinate - so reexecution will not change input data for 'reducer' (though can change orde

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-04-13 Thread via GitHub
mridulm commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2800300563 > Even if the map stage is determinate, a fetch failure will lead to executor loss, which can remove map output of the indeterminate stage, and when the result stage is resubmitted the indeterm

[PR] [SPARK-51787] Remove `sessionID` parameter from `getExecutePlanRequest` [spark-connect-swift]

2025-04-13 Thread via GitHub
dongjoon-hyun opened a new pull request, #55: URL: https://github.com/apache/spark-connect-swift/pull/55 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-04-13 Thread via GitHub
mridulm commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2800308552 Specifically about JDBC - assuming it is not due to the case we discussed above - I am not entirely sure :-) If the commit protocol has been correctly implemented, we will need to unde

Re: [PR] [SPARK-51774][CONNECT] Add GRPC Status code to Python Connect GRPC Exception [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on PR #50564: URL: https://github.com/apache/spark/pull/50564#issuecomment-2800158156 Merged to master.

Re: [PR] [SPARK-51774][CONNECT] Add GRPC Status code to Python Connect GRPC Exception [spark]

2025-04-13 Thread via GitHub
HyukjinKwon closed pull request #50564: [SPARK-51774][CONNECT] Add GRPC Status code to Python Connect GRPC Exception URL: https://github.com/apache/spark/pull/50564

Re: [PR] [SPARK-48728][SQL] Support ignoreNulls for collect_list and collect_set [spark]

2025-04-13 Thread via GitHub
github-actions[bot] commented on PR #47149: URL: https://github.com/apache/spark/pull/47149#issuecomment-2800211562 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-51788][INFRA] Add a PySpark test that runs every 3 days using the Ubuntu Arm Runner [spark]

2025-04-13 Thread via GitHub
LuciferYang commented on PR #50574: URL: https://github.com/apache/spark/pull/50574#issuecomment-2800484251 Thanks @zhengruifeng and @HyukjinKwon

[PR] Test [spark]

2025-04-13 Thread via GitHub
LuciferYang opened a new pull request, #50579: URL: https://github.com/apache/spark/pull/50579 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51688][PYTHON] Use Unix Domain Socket between Python and JVM communication [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on PR #50466: URL: https://github.com/apache/spark/pull/50466#issuecomment-2800522804 This should be ready for a look.

Re: [PR] [SPARK-51646][SQL] Fix propagating collation in views with default collation [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on code in PR #50436: URL: https://github.com/apache/spark/pull/50436#discussion_r2041392876 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala: ## @@ -79,6 +83,8 @@ object ResolveDDLCommandStringTypes ext

Re: [PR] [SPARK-51739][PYTHON] Validate Arrow schema from mapInArrow & mapInPandas & DataSource [spark]

2025-04-13 Thread via GitHub
HyukjinKwon closed pull request #50531: [SPARK-51739][PYTHON] Validate Arrow schema from mapInArrow & mapInPandas & DataSource URL: https://github.com/apache/spark/pull/50531

Re: [PR] [SPARK-51688][PYTHON] Use Unix Domain Socket between Python and JVM communication [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on code in PR #50466: URL: https://github.com/apache/spark/pull/50466#discussion_r2041234281 ## core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala: ## @@ -401,33 +415,35 @@ private[spark] abstract class BasePythonRunner[IN, OUT](

Re: [PR] Revert "[SPARK-47895][SQL] group by alias should be idempotent" [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on code in PR #50567: URL: https://github.com/apache/spark/pull/50567#discussion_r2041385665 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala: ## @@ -116,19 +116,7 @@ class ResolveReferencesInAggregate(v

Re: [PR] [SPARK-51788][INFRA] Add a PySpark test that runs every 3 days using the Ubuntu Arm Runner [spark]

2025-04-13 Thread via GitHub
zhengruifeng closed pull request #50574: [SPARK-51788][INFRA] Add a PySpark test that runs every 3 days using the Ubuntu Arm Runner URL: https://github.com/apache/spark/pull/50574

Re: [PR] [SPARK-51788][INFRA] Add a PySpark test that runs every 3 days using the Ubuntu Arm Runner [spark]

2025-04-13 Thread via GitHub
zhengruifeng commented on PR #50574: URL: https://github.com/apache/spark/pull/50574#issuecomment-2800429861 merged to master

Re: [PR] Revert "[SPARK-47895][SQL] group by alias should be idempotent" [spark]

2025-04-13 Thread via GitHub
cloud-fan closed pull request #50567: Revert "[SPARK-47895][SQL] group by alias should be idempotent" URL: https://github.com/apache/spark/pull/50567

Re: [PR] [SPARK-51788][INFRA] Add a PySpark daily test using the Ubuntu Arm Runner [spark]

2025-04-13 Thread via GitHub
LuciferYang commented on code in PR #50574: URL: https://github.com/apache/spark/pull/50574#discussion_r2041359369 ## .github/workflows/build_python_3.11_arm.yml: ## @@ -0,0 +1,35 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] [SPARK-46640][FOLLOW-UP] Consider the whole expression tree when excluding subquery references [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on code in PR #50570: URL: https://github.com/apache/spark/pull/50570#discussion_r2041394407 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -671,9 +671,11 @@ object RemoveRedundantAliases extends Rule[LogicalPlan

Re: [PR] [SPARK-51182][SQL] DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on PR #49928: URL: https://github.com/apache/spark/pull/49928#issuecomment-2800444566 To put it simply, the Java test suites should be created with all basic functionalities at the beginning, and we only add new tests to them if they are Java-specific. It may stil

Re: [PR] Revert "[SPARK-47895][SQL] group by alias should be idempotent" [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on PR #50567: URL: https://github.com/apache/spark/pull/50567#issuecomment-2800431254 thanks, merging to master/4.0!

Re: [PR] [SPARK-51790][SQL] Register UTF8String to KryoSerializer [spark]

2025-04-13 Thread via GitHub
yaooqinn commented on PR #50576: URL: https://github.com/apache/spark/pull/50576#issuecomment-2800543692 > Unfortunately looks pretty [commonly used](https://github.com/search?q=org.apache.spark.unsafe.types.UTF8String+language%3AScala&type=code&l=Scala) - though not sure how many are relyi

Re: [PR] [SPARK-47081][CONNECT] Support Query Execution Progress [spark]

2025-04-13 Thread via GitHub
virrrat commented on PR #45150: URL: https://github.com/apache/spark/pull/45150#issuecomment-2800565929 Is there a plan to backport this feature to Spark 3.5?

Re: [PR] [SPARK-51739][PYTHON] Validate Arrow schema from mapInArrow & mapInPandas & DataSource [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on PR #50531: URL: https://github.com/apache/spark/pull/50531#issuecomment-2800163747 Merged to master.

[PR] [WIP][ML] `ImputerModel` stores coefficients with arrays instead of dataframe [spark]

2025-04-13 Thread via GitHub
zhengruifeng opened a new pull request, #50578: URL: https://github.com/apache/spark/pull/50578 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### Ho

Re: [PR] Enable -Xsource:3 compiler flag [spark]

2025-04-13 Thread via GitHub
joan38 commented on code in PR #50474: URL: https://github.com/apache/spark/pull/50474#discussion_r2041360418 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -1014,39 +1014,44 @@ final class ShuffleBlockFetcherIterator( // a Su

Re: [PR] [SPARK-51785] Support `addTag/removeTag/getTags/clearTags` in `SparkSession` [spark-connect-swift]

2025-04-13 Thread via GitHub
dongjoon-hyun commented on PR #54: URL: https://github.com/apache/spark-connect-swift/pull/54#issuecomment-2800189364 Thank you so much always, @peter-toth and @viirya ! Merged to main.

Re: [PR] [SPARK-51785] Support `addTag/removeTag/getTags/clearTags` in `SparkSession` [spark-connect-swift]

2025-04-13 Thread via GitHub
dongjoon-hyun closed pull request #54: [SPARK-51785] Support `addTag/removeTag/getTags/clearTags` in `SparkSession` URL: https://github.com/apache/spark-connect-swift/pull/54

Re: [PR] [SPARK-51790][SQL] Register UTF8String to KryoSerializer [spark]

2025-04-13 Thread via GitHub
mridulm commented on PR #50576: URL: https://github.com/apache/spark/pull/50576#issuecomment-2800381674 Why are we doing this? Given `UTF8String` gets used by a lot of external libraries, dropping `KryoSerializable` from its interfaces can be breaking for users.

Re: [PR] [SPARK-51513][SQL] Fix RewriteMergeIntoTable rule produces unresolved plan [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on PR #50281: URL: https://github.com/apache/spark/pull/50281#issuecomment-2800387621 Merged to master.

Re: [PR] Enable -Xsource:3 compiler flag [spark]

2025-04-13 Thread via GitHub
LuciferYang commented on code in PR #50474: URL: https://github.com/apache/spark/pull/50474#discussion_r2041365069 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -1014,39 +1014,44 @@ final class ShuffleBlockFetcherIterator( //

Re: [PR] [SPARK-51513][SQL] Fix RewriteMergeIntoTable rule produces unresolved plan [spark]

2025-04-13 Thread via GitHub
HyukjinKwon closed pull request #50281: [SPARK-51513][SQL] Fix RewriteMergeIntoTable rule produces unresolved plan URL: https://github.com/apache/spark/pull/50281

Re: [PR] [SPARK-51790][SQL] Register UTF8String to KryoSerializer [spark]

2025-04-13 Thread via GitHub
yaooqinn commented on PR #50576: URL: https://github.com/apache/spark/pull/50576#issuecomment-2800397089 > Why are we doing this ? This is a bugfix: cache queries with string cols fail when using spark.kryo.registrationRequired=true & spark.serializer=org.apache.spark.serializer.KryoSe

Re: [PR] [SPARK-51790][SQL] Register UTF8String to KryoSerializer [spark]

2025-04-13 Thread via GitHub
mridulm commented on PR #50576: URL: https://github.com/apache/spark/pull/50576#issuecomment-2800404298 Unfortunately looks pretty [commonly used](https://github.com/search?q=org.apache.spark.unsafe.types.UTF8String+language%3AScala&type=code&l=Scala) I do see this fails with regist

Re: [PR] [SPARK-51789][CORE] Respect spark.api.mode and spark.remote properly when parsing arguments in Spark Submission [spark]

2025-04-13 Thread via GitHub
HyukjinKwon closed pull request #50575: [SPARK-51789][CORE] Respect spark.api.mode and spark.remote properly when parsing arguments in Spark Submission URL: https://github.com/apache/spark/pull/50575

Re: [PR] [SPARK-51780][SQL] Implement Describe Procedure [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on code in PR #50569: URL: https://github.com/apache/spark/pull/50569#discussion_r2041376469 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -313,6 +313,7 @@ statement | SHOW CURRENT namespace

Re: [PR] [SPARK-51395][SQL][TESTS][FOLLOW-UP] Explicitly sets failOnError in Abs at tests [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on PR #50577: URL: https://github.com/apache/spark/pull/50577#issuecomment-2800408779 cc @cloud-fan @aokolnychyi fyi

[PR] [SPARK-51395][SQL][TESTS][FOLLOW-UP] Explicitly sets failOnError in Abs at tests [spark]

2025-04-13 Thread via GitHub
HyukjinKwon opened a new pull request, #50577: URL: https://github.com/apache/spark/pull/50577 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/50197 that fixes the tests to pass when ANSI is off. ### Why are the

Re: [PR] [SPARK-51780][SQL] Implement Describe Procedure [spark]

2025-04-13 Thread via GitHub
cloud-fan closed pull request #50569: [SPARK-51780][SQL] Implement Describe Procedure URL: https://github.com/apache/spark/pull/50569

Re: [PR] [SPARK-51789][CORE] Respect spark.api.mode and spark.remote properly when parsing arguments in Spark Submission [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on PR #50575: URL: https://github.com/apache/spark/pull/50575#issuecomment-2800405948 Merged to master and branch-4.0.

Re: [PR] [SPARK-51773][SQL] Add hashCode and equals to file formats [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on code in PR #50562: URL: https://github.com/apache/spark/pull/50562#discussion_r2041377513 ## mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala: ## @@ -49,6 +49,10 @@ private[image] class ImageFileFormat extends FileFormat with

Re: [PR] [SPARK-51787] Remove `sessionID` parameter from `getExecutePlanRequest` [spark-connect-swift]

2025-04-13 Thread via GitHub
dongjoon-hyun commented on PR #55: URL: https://github.com/apache/spark-connect-swift/pull/55#issuecomment-2800322265 Oh, thank you, @viirya ! Merged to main.

Re: [PR] [SPARK-51691][CORE][FOLLOWUP] Make SerializationDebugger work in non-… [spark]

2025-04-13 Thread via GitHub
LuciferYang commented on code in PR #50573: URL: https://github.com/apache/spark/pull/50573#discussion_r2041306562 ## core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala: ## @@ -112,11 +112,12 @@ private[spark] object SerializationDebugger extends Loggin

Re: [PR] [SPARK-51787] Remove `sessionID` parameter from `getExecutePlanRequest` [spark-connect-swift]

2025-04-13 Thread via GitHub
dongjoon-hyun closed pull request #55: [SPARK-51787] Remove `sessionID` parameter from `getExecutePlanRequest` URL: https://github.com/apache/spark-connect-swift/pull/55

[PR] [SPARK-51788][INFRA] Add a PySpark daily test using the Ubuntu Arm Runner [spark]

2025-04-13 Thread via GitHub
LuciferYang opened a new pull request, #50574: URL: https://github.com/apache/spark/pull/50574 ### What changes were proposed in this pull request? Similar to SPARK-51761, this pr adds a PySpark daily test using the Ubuntu Arm Runner. ### Why are the changes needed? Check the av

Re: [PR] [SPARK-51788][INFRA] Add a PySpark daily test using the Ubuntu Arm Runner [spark]

2025-04-13 Thread via GitHub
LuciferYang commented on code in PR #50574: URL: https://github.com/apache/spark/pull/50574#discussion_r2041319923 ## .github/workflows/python_hosted_runner_test.yml: ## @@ -152,12 +167,12 @@ jobs: if: always() uses: actions/upload-artifact@v4 with: -

Re: [PR] [SPARK-51788][INFRA] Add a PySpark daily test using the Ubuntu Arm Runner [spark]

2025-04-13 Thread via GitHub
LuciferYang commented on code in PR #50574: URL: https://github.com/apache/spark/pull/50574#discussion_r2041320759 ## .github/workflows/build_python_3.11_arm.yml: ## @@ -0,0 +1,35 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] [SPARK-51773][SQL] Add hashCode and equals to file formats [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on PR #50562: URL: https://github.com/apache/spark/pull/50562#issuecomment-2800333645 shall we simply turn them into case class?

Re: [PR] [SPARK-51773][SQL] Add hashCode and equals to file formats [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on code in PR #50562: URL: https://github.com/apache/spark/pull/50562#discussion_r2041321974 ## mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala: ## @@ -49,6 +49,10 @@ private[image] class ImageFileFormat extends FileFormat with Da

Re: [PR] [SPARK-51775][SQL] Normalize LogicalRelation and HiveTableRelation by NormalizePlan [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on code in PR #50563: URL: https://github.com/apache/spark/pull/50563#discussion_r2041328635 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NormalizeableRelation.scala: ## @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] [SPARK-51776][SQL] Fix logging in single-pass Analyzer [spark]

2025-04-13 Thread via GitHub
cloud-fan closed pull request #50565: [SPARK-51776][SQL] Fix logging in single-pass Analyzer URL: https://github.com/apache/spark/pull/50565

Re: [PR] [SPARK-51773][SQL] Add hashCode and equals to file formats [spark]

2025-04-13 Thread via GitHub
HyukjinKwon commented on code in PR #50562: URL: https://github.com/apache/spark/pull/50562#discussion_r2041241468 ## mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala: ## @@ -49,6 +49,10 @@ private[image] class ImageFileFormat extends FileFormat with

[PR] [SPARK-51789][CORE] Respect spark.api.mode and spark.remote properly when parsing arguments in Spark Submission [spark]

2025-04-13 Thread via GitHub
HyukjinKwon opened a new pull request, #50575: URL: https://github.com/apache/spark/pull/50575 ### What changes were proposed in this pull request? This PR proposes to respect `spark.api.mode` and `spark.remote` properly when parsing arguments in Spark Submission. Currently, the `isRe

Re: [PR] [SPARK-51776][SQL] Fix logging in single-pass Analyzer [spark]

2025-04-13 Thread via GitHub
cloud-fan commented on PR #50565: URL: https://github.com/apache/spark/pull/50565#issuecomment-2800339676 thanks, merging to master!

Re: [PR] [SPARK-51788][INFRA] Add a PySpark daily test using the Ubuntu Arm Runner [spark]

2025-04-13 Thread via GitHub
zhengruifeng commented on code in PR #50574: URL: https://github.com/apache/spark/pull/50574#discussion_r2041344090 ## .github/workflows/build_python_3.11_arm.yml: ## @@ -0,0 +1,35 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] [SPARK-51780][SQL] Implement Describe Procedure [spark]

2025-04-13 Thread via GitHub
zhengruifeng commented on code in PR #50569: URL: https://github.com/apache/spark/pull/50569#discussion_r2041358258 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -313,6 +313,7 @@ statement | SHOW CURRENT namespace

Re: [PR] [SPARK-51752][SQL] Enable rCTE referencing from within a CTE [spark]

2025-04-13 Thread via GitHub
Pajaraja commented on code in PR #50546: URL: https://github.com/apache/spark/pull/50546#discussion_r2041085999 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala: ## @@ -358,7 +385,7 @@ object CTESubstitution extends Rule[LogicalPlan] {

Re: [PR] [Core]Convert shuffleWriteTime from Nanoseconds to Milliseconds for Consistency with Other Metrics [spark]

2025-04-13 Thread via GitHub
mridulm commented on PR #50418: URL: https://github.com/apache/spark/pull/50418#issuecomment-2799832404 -1 This is a breaking change and will cause incompatibility for all existing usages of this metric. In addition, ms granularity is insufficient to capture the cost for local writes.
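
The granularity concern in the comment above can be illustrated with a minimal sketch. The durations are made-up values for small local writes, and `ns_to_ms` is a hypothetical helper, not Spark code; it assumes the conversion truncates to whole milliseconds.

```python
# Hypothetical durations of four small local writes, in nanoseconds.
write_times_ns = [120_000, 450_000, 980_000, 2_300_000]


def ns_to_ms(ns: int) -> int:
    """Convert nanoseconds to whole milliseconds, truncating the remainder."""
    return ns // 1_000_000


# Converting each write separately drops every sub-millisecond duration to 0.
per_write_ms = [ns_to_ms(t) for t in write_times_ns]
total_from_ms = sum(per_write_ms)

# Accumulating in nanoseconds first keeps the sub-millisecond contributions.
total_ns = sum(write_times_ns)

print(per_write_ms)        # per-write values after truncation
print(total_from_ms)       # total when accumulated in milliseconds
print(ns_to_ms(total_ns))  # total when accumulated in nanoseconds, then converted
```

Under these assumptions, three of the four writes round down to zero, so a metric accumulated in milliseconds under-reports the time actually spent.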