Re: [PR] [SPARK-51443][SS] Fix singleVariantColumn in DSv2 and readStream. [spark]

2025-03-13 Thread via GitHub
cloud-fan closed pull request #50217: [SPARK-51443][SS] Fix singleVariantColumn in DSv2 and readStream. URL: https://github.com/apache/spark/pull/50217 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] [SPARK-51402][SQL][TESTS] Test TimeType in UDF [spark]

2025-03-13 Thread via GitHub
MaxGekk commented on PR #50194: URL: https://github.com/apache/spark/pull/50194#issuecomment-2720623643 +1, LGTM. Merging to master. Thank you, @calilisantos. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993729490 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Constraint.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on PR #50137: URL: https://github.com/apache/spark/pull/50137#issuecomment-2721388429 OK so the requirement is: - adding a new item to table metadata should not require a new overload of `def createTable` - the existing `def createTable` implementation should fail i
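The requirement above points at a builder-style entry point. A minimal sketch of the idea, assuming hypothetical names (`TableBuilder`, `buildTable`) rather than the actual SPARK-51372 API:

```scala
// Illustrative sketch only: shows how a builder can absorb new table-metadata
// items without adding createTable overloads. Trait and method names are
// assumptions, not the API added by the PR.
import org.apache.spark.sql.connector.catalog.{Identifier, Table}
import org.apache.spark.sql.types.StructType

trait TableBuilder {
  def withSchema(schema: StructType): TableBuilder
  def withProperty(key: String, value: String): TableBuilder
  // A new metadata item becomes a new builder method with a default
  // implementation, so existing catalog implementations keep compiling.
  def withComment(comment: String): TableBuilder = this
  def create(): Table
}

trait TableCatalogWithBuilder {
  // One entry point instead of a growing family of createTable overloads;
  // a catalog can still choose to fail in create() for items it cannot honor.
  def buildTable(ident: Identifier): TableBuilder
}
```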

Re: [PR] [SPARK-51438][SQL] Make CatalystDataToProtobuf and ProtobufDataToCatalyst properly comparable and hashable [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on PR #50212: URL: https://github.com/apache/spark/pull/50212#issuecomment-2721282652 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2721458327 To @zhengruifeng , could you confirm this part https://github.com/apache/spark/pull/50255#pullrequestreview-2679800649 ? > this PR seems to affect branch-4.0 Daily CI. Could you c

Re: [PR] [SPARK-51438][SQL] Make CatalystDataToProtobuf and ProtobufDataToCatalyst properly comparable and hashable [spark]

2025-03-13 Thread via GitHub
cloud-fan closed pull request #50212: [SPARK-51438][SQL] Make CatalystDataToProtobuf and ProtobufDataToCatalyst properly comparable and hashable URL: https://github.com/apache/spark/pull/50212 -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] [SPARK-51338][INFRA] Add automated CI build for `connect-examples` [spark]

2025-03-13 Thread via GitHub
LuciferYang commented on code in PR #50187: URL: https://github.com/apache/spark/pull/50187#discussion_r1991901327 ## connect-examples/server-library-example/pom.xml: ## @@ -36,7 +36,8 @@ UTF-8 2.13 2.13.15 -3.25.4 -4.0.0-preview2 +4.29.3 +4.1.0-SN

Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993722344 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/BaseConstraint.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files [spark]

2025-03-13 Thread via GitHub
ganeshashree commented on PR #50215: URL: https://github.com/apache/spark/pull/50215#issuecomment-2721580059 > int96 support in parquet apache/iceberg#1138 @HyukjinKwon Yes, this will be a breaking change for applications that depend on INT96 and use future versions of Spark. We are c
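For context on the INT96 dependency mentioned above, the physical timestamp type Spark writes to Parquet is controlled by an existing option; a minimal sketch of pinning it explicitly (config name and values as in current Spark, worth re-checking per version; paths and app name are illustrative):

```scala
// Sketch: keep writing legacy INT96 timestamps, or opt in to INT64 ahead of
// a default change.
import org.apache.spark.sql.SparkSession

object ParquetTimestampSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parquet-ts").master("local[*]").getOrCreate()
    spark.conf.set("spark.sql.parquet.outputTimestampType", "INT96")              // legacy physical type
    // spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS") // INT64-based
    spark.range(1).selectExpr("current_timestamp() AS ts")
      .write.mode("overwrite").parquet("/tmp/ts_parquet")
    spark.stop()
  }
}
```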

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on PR #50232: URL: https://github.com/apache/spark/pull/50232#issuecomment-2721567771 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993729490 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Constraint.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993730085 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ProcedureCatalog.java: ## @@ -34,4 +34,9 @@ public interface ProcedureCatalog extends CatalogPlug

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993728678 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ProcedureCatalog.java: ## @@ -34,4 +34,9 @@ public interface ProcedureCatalog extends CatalogPlug

Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993731273 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Constraint.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993729490 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Constraint.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993733934 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -268,6 +268,11 @@ class InMemoryTableCatalog extends BasicInM

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993734437 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -268,6 +268,11 @@ class InMemoryTableCatalog extends BasicInM

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993737628 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ShowProceduresCommand.scala: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993740184 ## sql/core/src/test/scala/org/apache/spark/sql/connector/ProcedureSuite.scala: ## @@ -40,15 +40,23 @@ class ProcedureSuite extends QueryTest with SharedSparkSession

Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993729490 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Constraint.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-48922][SQL] Avoid redundant array transform of identical expression for map type [spark]

2025-03-13 Thread via GitHub
wForget commented on PR #50245: URL: https://github.com/apache/spark/pull/50245#issuecomment-2719721278 > @wForget Could you create backport PR for branch-3.5 Sure, I will create later -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] [SPARK-51495] Add `Integration Test` GitHub Action job [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #15: URL: https://github.com/apache/spark-connect-swift/pull/15#discussion_r1992468710 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -81,6 +81,7 @@ struct DataFrameTests { await spark.stop() } +#if !os(Linux) Review Comm

Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]

2025-03-13 Thread via GitHub
chris-twiner commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1993925656 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -235,6 +256,53 @@ trait HigherOrderFunction extends Expr

Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]

2025-03-13 Thread via GitHub
szehon-ho commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993927792 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -268,6 +268,11 @@ class InMemoryTableCatalog extends BasicInM

Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#issuecomment-272244 Could you review this PR when you have some time, @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
viirya commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994364613 ## Sources/SparkConnect/DataFrame.swift: ## @@ -192,4 +192,38 @@ public actor DataFrame: Sendable { print(table.render()) } } + + /// Projects a

Re: [PR] [SPARK-44856][PYTHON] Improve Python UDTF arrow serializer performance [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on PR #50099: URL: https://github.com/apache/spark/pull/50099#issuecomment-2721318301 Could you fix the remaining failures? ``` pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_ARROW_TYPE_CAST_ERROR] Cannot convert the output value of the column 'x' wi

Re: [PR] [SPARK-43221][CORE][3.5] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun closed pull request #50260: [SPARK-43221][CORE][3.5] Host local block fetching should use a block status of a block stored on disk URL: https://github.com/apache/spark/pull/50260 -- This is an automated message from the Apache Git Service. To respond to the message, please log o

Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]

2025-03-13 Thread via GitHub
beliefer commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2721323484 @jjayadeep06 @srowen @cnauroth Thank you ! Merged into branch-3.5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]

2025-03-13 Thread via GitHub
beliefer closed pull request #50223: [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode URL: https://github.com/apache/spark/pull/50223 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] [SPARK-43221][CORE][3.5] Host local block fetching should use a block status of a block stored on disk [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on PR #50260: URL: https://github.com/apache/spark/pull/50260#issuecomment-2721308974 Merged to branch-3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-13 Thread via GitHub
zhengruifeng commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2721794721 > To @zhengruifeng , could you confirm this part [#50255 (review)](https://github.com/apache/spark/pull/50255#pullrequestreview-2679800649) ? > > > this PR seems to affect bra

Re: [PR] [SPARK-23890][SQL] Support DDL for adding nested columns to struct types [spark]

2025-03-13 Thread via GitHub
ottomata commented on PR #21012: URL: https://github.com/apache/spark/pull/21012#issuecomment-2722408376 > Would one need to use CHANGE|ALTER COLUMN syntax for this? [TIL](https://phabricator.wikimedia.org/T209453#10632894) that Iceberg supports this with .value column name referencin
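As a hedged illustration of the DDL being discussed (table and field names are made up, and whether the statement is accepted depends on the catalog; the linked comment notes Iceberg supports nested additions, addressing map values via a `.value` path):

```scala
// Sketch: adding a field inside a struct column through a DSv2 catalog that
// supports nested schema evolution. Requires such a catalog to be configured.
import org.apache.spark.sql.SparkSession

object NestedColumnDdlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nested-ddl").master("local[*]").getOrCreate()
    // Add a new field to a struct column.
    spark.sql("ALTER TABLE demo.db.events ADD COLUMNS (payload.trace_id STRING)")
    // For a map-of-structs column, connectors like Iceberg address the value struct as <col>.value.<field>.
    spark.sql("ALTER TABLE demo.db.events ADD COLUMNS (points.value.z DOUBLE)")
    spark.stop()
  }
}
```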

Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
viirya commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994366213 ## Sources/SparkConnect/DataFrame.swift: ## @@ -192,4 +192,38 @@ public actor DataFrame: Sendable { print(table.render()) } } + + /// Projects a

Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
viirya commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994368629 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -68,6 +68,76 @@ struct DataFrameTests { await spark.stop() } + @Test + func selectNone() asyn

Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
viirya commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994367984 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -68,6 +68,76 @@ struct DataFrameTests { await spark.stop() } + @Test + func selectNone() asyn

Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
viirya commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994370384 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -68,6 +68,76 @@ struct DataFrameTests { await spark.stop() } + @Test + func selectNone() asyn

Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994369781 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -68,6 +68,76 @@ struct DataFrameTests { await spark.stop() } + @Test + func selectNone

Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#issuecomment-2722813101 Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]

2025-03-13 Thread via GitHub
jingz-db commented on PR #50272: URL: https://github.com/apache/spark/pull/50272#issuecomment-2722832096 There seem to be more places that could remove the unnecessary `close()` impl in test file: e.g. https://github.com/apache/spark/blob/ccfc0a9dcea8fa9d6aab4dbb233f8135a3947441/python/pysp

[PR] [SPARK-51501][SQL] Disable ObjectHashAggregate for group by on collated columns [spark]

2025-03-13 Thread via GitHub
stefankandic opened a new pull request, #50267: URL: https://github.com/apache/spark/pull/50267 ### What changes were proposed in this pull request? Disabling `ObjectHashAggregate` when grouping on columns with collations. ### Why are the changes needed? https://github.com/ap
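A small sketch of the query shape affected by the change above: grouping on a collated string column. The PR only changes which physical aggregate Spark picks for such plans; the table name and collation below are illustrative.

```scala
// Sketch: a group-by on a case-insensitive collated column, where 'a' and 'A'
// land in the same group.
import org.apache.spark.sql.SparkSession

object CollatedGroupBySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("collated-groupby").master("local[*]").getOrCreate()
    spark.sql("CREATE TABLE t (name STRING COLLATE UTF8_LCASE, v INT) USING parquet")
    spark.sql("INSERT INTO t VALUES ('a', 1), ('A', 2)")
    spark.sql("SELECT name, SUM(v) AS total FROM t GROUP BY name").show()
    spark.stop()
  }
}
```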

[PR] [SPARK-51502][SQL] Move collations test to collations package [spark]

2025-03-13 Thread via GitHub
stefankandic opened a new pull request, #50268: URL: https://github.com/apache/spark/pull/50268 ### What changes were proposed in this pull request? Move collations test into collations package where most collation test suites already are located. ### Why are the changes ne

Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]

2025-03-13 Thread via GitHub
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1993969482 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -235,6 +256,53 @@ trait HigherOrderFunction extends Expres

[PR] [SPARK-51270] Support nanosecond precision timestamp in Variant [spark]

2025-03-13 Thread via GitHub
cashmand opened a new pull request, #50270: URL: https://github.com/apache/spark/pull/50270 ### What changes were proposed in this pull request? Adds Variant support for the nanosecond precision timestamp types in https://github.com/apache/parquet-format/commit/25f05e73d8cd7f5

[PR] [SPARK-51414][SQL] Add the make_time() function [spark]

2025-03-13 Thread via GitHub
robreeves opened a new pull request, #50269: URL: https://github.com/apache/spark/pull/50269 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How w

Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993739512 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/ForeignKey.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]

2025-03-13 Thread via GitHub
Kimahriman commented on PR #34558: URL: https://github.com/apache/spark/pull/34558#issuecomment-2722033268 > > @Kimahriman just out of curiosity, how much did the performance improve? > > I just wanted to add to the above response that I've implemented a compilation scheme [here](htt

Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]

2025-03-13 Thread via GitHub
chris-twiner commented on PR #34558: URL: https://github.com/apache/spark/pull/34558#issuecomment-2722268943 > > > @Kimahriman just out of curiosity, how much did the performance improve? > > > > > > I just wanted to add to the above response that I've implemented a compilation s

Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]

2025-03-13 Thread via GitHub
tedyu commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1993452621 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -134,11 +138,8 @@ private[spark] class BarrierCoordinator( // Cancel the current active Tim

[PR] [SPARK-51504] Support `DataFrame.select` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun opened a new pull request, #16: URL: https://github.com/apache/spark-connect-swift/pull/16 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51489][SQL] Represent SQL Script in Spark UI [spark]

2025-03-13 Thread via GitHub
dusantism-db closed pull request #50256: [SPARK-51489][SQL] Represent SQL Script in Spark UI URL: https://github.com/apache/spark/pull/50256 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]

2025-03-13 Thread via GitHub
chris-twiner commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1994100057 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -235,6 +256,53 @@ trait HigherOrderFunction extends Expr

Re: [PR] [SPARK-51505][SQL] Log empty partition number metrics in AQE coalesce [spark]

2025-03-13 Thread via GitHub
liuzqt commented on PR #50273: URL: https://github.com/apache/spark/pull/50273#issuecomment-2722992293 @maryannxue @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] [WIP][SPARK-xxxx][Collation] Guard collation regex expressions behind a flag [spark]

2025-03-13 Thread via GitHub
github-actions[bot] closed pull request #49026: [WIP][SPARK-xxxx][Collation] Guard collation regex expressions behind a flag URL: https://github.com/apache/spark/pull/49026 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-50404][PYTHON] PySpark DataFrame Pipe Method [spark]

2025-03-13 Thread via GitHub
github-actions[bot] commented on PR #48947: URL: https://github.com/apache/spark/pull/48947#issuecomment-2722995058 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-50424][INFRA] Extract the common content of `Dockerfile` from `Docs`, `Linter`, and `SparkR` test images [spark]

2025-03-13 Thread via GitHub
github-actions[bot] closed pull request #48967: [SPARK-50424][INFRA] Extract the common content of `Dockerfile` from `Docs`, `Linter`, and `SparkR` test images URL: https://github.com/apache/spark/pull/48967 -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] [SPARK-XXXX][Avro] Fix avro deserialization breaking for UnionType[null, Record] [spark]

2025-03-13 Thread via GitHub
github-actions[bot] closed pull request #49019: [SPARK-XXXX][Avro] Fix avro deserialization breaking for UnionType[null, Record] URL: https://github.com/apache/spark/pull/49019 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]

2025-03-13 Thread via GitHub
HeartSaVioR commented on PR #50272: URL: https://github.com/apache/spark/pull/50272#issuecomment-2723004044 It's better to retain both types of implementations since we want to "optionally" have close() so we should have covered both cases on testing. -- This is an automated message from

[PR] [SPARK-51505][SQL] Log empty partition number metrics in AQE coalesce [spark]

2025-03-13 Thread via GitHub
liuzqt opened a new pull request, #50273: URL: https://github.com/apache/spark/pull/50273 ### What changes were proposed in this pull request? Log empty partition number metrics in AQE coalesce ### Why are the changes needed? There're cases where shuffle is highly ske
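The coalesce behaviour that the new metric describes is driven by the existing AQE options; a minimal sketch of a skewed shuffle where most of the default shuffle partitions end up empty (config names are current AQE settings, the PR itself only adds logging):

```scala
// Sketch: only three distinct keys feed the default 200-partition shuffle, so
// AQE's coalesce step sees mostly empty partitions -- the case the new log line reports.
import org.apache.spark.sql.SparkSession

object AqeCoalesceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("aqe-coalesce")
      .master("local[*]")
      .config("spark.sql.adaptive.enabled", "true")
      .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
      .getOrCreate()
    spark.range(0, 1000000).selectExpr("id % 3 AS k").groupBy("k").count().show()
    spark.stop()
  }
}
```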

Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]

2025-03-13 Thread via GitHub
HeartSaVioR commented on PR #50272: URL: https://github.com/apache/spark/pull/50272#issuecomment-2723005968 Thanks! Merging to master/4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]

2025-03-13 Thread via GitHub
HeartSaVioR commented on PR #50272: URL: https://github.com/apache/spark/pull/50272#issuecomment-2723008273 Forgot to tell, I have got approval on merging this bugfix from @cloud-fan beforehand. -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]

2025-03-13 Thread via GitHub
HeartSaVioR closed pull request #50272: [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas URL: https://github.com/apache/spark/pull/50272 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-13 Thread via GitHub
attilapiros commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994459647 ## core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala: ## @@ -90,8 +90,11 @@ private[spark] class ShuffleMapStage( /** Returns the sequence o

Re: [PR] [SPARK-51497][SQL] Add the default time formatter [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun closed pull request #50266: [SPARK-51497][SQL] Add the default time formatter URL: https://github.com/apache/spark/pull/50266 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]

2025-03-13 Thread via GitHub
bogao007 commented on PR #50272: URL: https://github.com/apache/spark/pull/50272#issuecomment-2722873069 > nits: There seem to be more places that could remove the unnecessary `close()` impl in test file: e.g. > > https://github.com/apache/spark/blob/ccfc0a9dcea8fa9d6aab4dbb233f8135a

Re: [PR] [SPARK-51491][PYTHON] Simplify boxplot with subquery APIs [spark]

2025-03-13 Thread via GitHub
zhengruifeng closed pull request #50258: [SPARK-51491][PYTHON] Simplify boxplot with subquery APIs URL: https://github.com/apache/spark/pull/50258 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-51191][SQL] Validate default values handling in DELETE, UPDATE, MERGE [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #50271: URL: https://github.com/apache/spark/pull/50271#discussion_r1994422636 ## sql/core/src/test/scala/org/apache/spark/sql/connector/MergeIntoTableSuiteBase.scala: ## @@ -32,6 +32,58 @@ abstract class MergeIntoTableSuiteBase extends Row

Re: [PR] [SPARK-51191][SQL] Validate default values handling in DELETE, UPDATE, MERGE [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on PR #50271: URL: https://github.com/apache/spark/pull/50271#issuecomment-2722908307 cc @huaxingao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-51491][PYTHON] Simplify boxplot with subquery APIs [spark]

2025-03-13 Thread via GitHub
zhengruifeng commented on PR #50258: URL: https://github.com/apache/spark/pull/50258#issuecomment-2722879036 merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-13 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994488768 ## scalastyle-config.xml: ## @@ -94,7 +94,7 @@ This file is divided into 3 sections: - + Review Comment: Ok. will do that. -- This is an auto

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-13 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994488494 ## core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala: ## @@ -90,8 +90,11 @@ private[spark] class ShuffleMapStage( /** Returns the sequence of p

[PR] [SPARK-51191][SQL] Validate default values handling in DELETE, UPDATE, MERGE [spark]

2025-03-13 Thread via GitHub
aokolnychyi opened a new pull request, #50271: URL: https://github.com/apache/spark/pull/50271 ### What changes were proposed in this pull request? This PR adds tests for default values handling in DELETE, UPDATE, MERGE. ### Why are the changes needed? The
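A hedged sketch of the DML shapes such tests exercise, assuming a DSv2 connector that supports column defaults (catalog, table, and column names are illustrative, not taken from the PR):

```scala
// Sketch: DEFAULT used in UPDATE and MERGE assignments against a table whose
// column declares a default value. Requires a configured DSv2 catalog.
import org.apache.spark.sql.SparkSession

object DefaultValuesDmlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("default-dml").master("local[*]").getOrCreate()
    spark.sql("CREATE TABLE cat.db.target (id INT, status STRING DEFAULT 'new') USING iceberg")
    spark.sql("UPDATE cat.db.target SET status = DEFAULT WHERE id = 1")
    spark.sql(
      """MERGE INTO cat.db.target t USING cat.db.source s ON t.id = s.id
        |WHEN MATCHED THEN UPDATE SET status = DEFAULT
        |WHEN NOT MATCHED THEN INSERT (id, status) VALUES (s.id, DEFAULT)""".stripMargin)
    spark.stop()
  }
}
```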

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-13 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994537557 ## core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala: ## @@ -90,8 +90,11 @@ private[spark] class ShuffleMapStage( /** Returns the sequence of p

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-13 Thread via GitHub
beliefer commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1994546972 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableBuilderImpl.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF

Re: [PR] [SPARK-50806][SQL] Support InputRDDCodegen interruption on task cancellation [spark]

2025-03-13 Thread via GitHub
Ngone51 commented on PR #49501: URL: https://github.com/apache/spark/pull/49501#issuecomment-2723111433 @cloud-fan @dongjoon-hyun I have updated the PR. Could you take another look? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log

[PR] [SPARK-51507] Support creating config map that can be mounted by Spark pods for apps [spark-kubernetes-operator]

2025-03-13 Thread via GitHub
jiangzho opened a new pull request, #166: URL: https://github.com/apache/spark-kubernetes-operator/pull/166 ### What changes were proposed in this pull request? This PR introduces a new field `configMapSpecs`, which enables user to create configmap(s) which can be later mo

[PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun opened a new pull request, #17: URL: https://github.com/apache/spark-connect-swift/pull/17 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#issuecomment-2722806877 Thank you so much, @viirya ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #17: URL: https://github.com/apache/spark-connect-swift/pull/17#discussion_r1994560977 ## Sources/SparkConnect/DataFrame.swift: ## @@ -58,7 +58,7 @@ public actor DataFrame: Sendable { /// Add `Apache Arrow`'s `RecordBatch`s to the inte

Re: [PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #17: URL: https://github.com/apache/spark-connect-swift/pull/17#discussion_r1994561915 ## Sources/SparkConnect/SparkSession.swift: ## @@ -45,12 +45,10 @@ public actor SparkSession { /// - userID: an optional user ID. If absent, `SPARK_

Re: [PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #17: URL: https://github.com/apache/spark-connect-swift/pull/17#discussion_r1994561481 ## Sources/SparkConnect/SparkConnectClient.swift: ## @@ -275,9 +275,11 @@ public actor SparkConnectClient { let expressions: [Spark_Connect_Expressi

Re: [PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #17: URL: https://github.com/apache/spark-connect-swift/pull/17#discussion_r1994569044 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -125,19 +125,25 @@ struct DataFrameTests { await spark.stop() } +#if !os(Linux) Review

Re: [PR] [SPARK-51443][SS] Fix singleVariantColumn in DSv2 and readStream. [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on PR #50217: URL: https://github.com/apache/spark/pull/50217#issuecomment-2721276493 thanks, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2722887996 Given the assessment, I'm okay to merge this. Feel free to merge, @zhengruifeng . cc @cloud-fan as the release manager of Apache Spark 4.0.0. -- This is an automated messag

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-13 Thread via GitHub
attilapiros commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994377379 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala: ## @@ -167,7 +167,10 @@ abstract class BaseYarnClusterSuite extend

[PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]

2025-03-13 Thread via GitHub
bogao007 opened a new pull request, #50272: URL: https://github.com/apache/spark/pull/50272 ### What changes were proposed in this pull request? Do not enforce users to implement `close()` in TransformWithStateInPandas since `close()` is an optional function to implement.

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-13 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994490858 ## resource-managers/yarn/pom.xml: ## @@ -37,6 +37,11 @@ spark-core_${scala.binary.version} ${project.version} + + org.apache.spark +

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-13 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994509791 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala: ## @@ -167,7 +167,10 @@ abstract class BaseYarnClusterSuite extends S

Re: [PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun commented on PR #17: URL: https://github.com/apache/spark-connect-swift/pull/17#issuecomment-2723173515 Could you review this, @viirya ? I added the first implementation for `collect()` API for users and for easy testing method. -- This is an automated message from the Apac

Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]

2025-03-13 Thread via GitHub
dongjoon-hyun closed pull request #16: [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` URL: https://github.com/apache/spark-connect-swift/pull/16 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-13 Thread via GitHub
cloud-fan commented on PR #49471: URL: https://github.com/apache/spark/pull/49471#issuecomment-2723198248 thanks, merging to master/4.0! (This is the last piece of the SQL UDF feature) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]

2025-03-13 Thread via GitHub
cloud-fan closed pull request #49471: [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions URL: https://github.com/apache/spark/pull/49471 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]

2025-03-13 Thread via GitHub
pan3793 commented on PR #50232: URL: https://github.com/apache/spark/pull/50232#issuecomment-2720217305 > Hey, why are any manual tests listed in the PR desc not included in UTs? Given that we have excluded `hive-llap-*` deps from STS modules, the existing STS SQL tests should cover a

Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]

2025-03-13 Thread via GitHub
beliefer commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2720399615 At that time, https://issues.apache.org/jira/browse/SPARK-46895 is an improvement, not a bug fix. You means JVM will not exit even if the branch-3.5 uses `Timer`? -- This is an auto

Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]

2025-03-13 Thread via GitHub
beliefer commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2720555409 cc @srowen -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]

2025-03-13 Thread via GitHub
jjayadeep06 commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2720538216 > At that time, https://issues.apache.org/jira/browse/SPARK-46895 is an improvement, not a bug fix. You means JVM will not exit even if the branch-3.5 uses `Timer`? yes, and if
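The JVM behaviour behind this exchange can be shown in isolation: a `java.util.Timer` backed by a non-daemon thread keeps the process alive after `main` returns unless it is cancelled. A self-contained sketch (not Spark code):

```scala
// Sketch: with isDaemon = false the timer thread is non-daemon, so the JVM
// exits promptly only because cancel() is called; a daemon timer (true) would
// not block shutdown in the first place.
import java.util.{Timer, TimerTask}

object TimerShutdownSketch {
  def main(args: Array[String]): Unit = {
    val timer = new Timer("barrier-epoch-timer", /* isDaemon = */ false)
    timer.schedule(new TimerTask { override def run(): Unit = () }, 60000L)
    // Comment this out and the process stays alive at least until the scheduled
    // task has run, which is the kind of hang reported in standalone mode.
    timer.cancel()
  }
}
```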

[PR] [WIP][SQL] Add the default time formatter [spark]

2025-03-13 Thread via GitHub
MaxGekk opened a new pull request, #50266: URL: https://github.com/apache/spark/pull/50266 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-51402][SQL][TESTS] Test TimeType in UDF [spark]

2025-03-13 Thread via GitHub
MaxGekk closed pull request #50194: [SPARK-51402][SQL][TESTS] Test TimeType in UDF URL: https://github.com/apache/spark/pull/50194 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] [SPARK-51402][SQL][TESTS] Test TimeType in UDF [spark]

2025-03-13 Thread via GitHub
MaxGekk commented on PR #50194: URL: https://github.com/apache/spark/pull/50194#issuecomment-2720630196 @calilisantos Congratulations with your first contribution to Apache Spark! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu
