Re: [PR] [MINOR][DOCS] Fix some typos in `LZFBenchmark` [spark]

2024-07-22 Thread via GitHub
HyukjinKwon closed pull request #47435: [MINOR][DOCS] Fix some typos in `LZFBenchmark` URL: https://github.com/apache/spark/pull/47435 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [MINOR][DOCS] Fix some typos in `LZFBenchmark` [spark]

2024-07-22 Thread via GitHub
HyukjinKwon commented on PR #47435: URL: https://github.com/apache/spark/pull/47435#issuecomment-2242240353 Merged to master.

Re: [PR] [SPARK-48752][FOLLOWUP][PYTHON][DOCS] Use explicit name for line number in log [spark]

2024-07-22 Thread via GitHub
HyukjinKwon commented on code in PR #47437: URL: https://github.com/apache/spark/pull/47437#discussion_r1686068685 ## python/pyspark/errors/exceptions/base.py: ## @@ -137,11 +137,11 @@ def _log_exception(self) -> None: if query_context.contextType().name == "DataFra

Re: [PR] [SPARK-48752][FOLLOWUP][PYTHON][DOCS] Use explicit name for line number in log [spark]

2024-07-22 Thread via GitHub
HyukjinKwon commented on code in PR #47437: URL: https://github.com/apache/spark/pull/47437#discussion_r1686067684 ## python/pyspark/errors/exceptions/base.py: ## @@ -137,11 +137,11 @@ def _log_exception(self) -> None: if query_context.contextType().name == "DataFra

Re: [PR] [SPARK-48962][INFRA] Make the input parameters of `workflows/benchmark` selectable [spark]

2024-07-22 Thread via GitHub
panbingkun commented on code in PR #47438: URL: https://github.com/apache/spark/pull/47438#discussion_r1686077120 ## .github/workflows/benchmark.yml: ## @@ -50,7 +58,7 @@ jobs: outputs: matrix: ${{ steps.set-matrix.outputs.matrix }} env: - SPARK_BENCHMARK_N

Re: [PR] [SPARK-48962][INFRA] Make the input parameters of `workflows/benchmark` selectable [spark]

2024-07-22 Thread via GitHub
panbingkun commented on PR #47438: URL: https://github.com/apache/spark/pull/47438#issuecomment-2242265722 Using the modified `workflows/benchmark` above, a benchmark(`org.apache.spark.io.ZStandardBenchmark`) was triggered as follows: https://github.com/panbingkun/spark/actions/runs/1003643

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-22 Thread via GitHub
mihailom-db commented on PR #47364: URL: https://github.com/apache/spark/pull/47364#issuecomment-2242283304 Hi @panbingkun, thanks for taking initiative to push this work forward. The design of the table was discussed previously and the structure that was agreed upon should take a slightly

Re: [PR] [SPARK-47172][CORE] Add support for AES-GCM for RPC encryption [spark]

2024-07-22 Thread via GitHub
LuciferYang commented on code in PR #46515: URL: https://github.com/apache/spark/pull/46515#discussion_r1686131240 ## common/network-common/src/main/java/org/apache/spark/network/crypto/CtrTransportCipher.java: ## @@ -0,0 +1,381 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-22 Thread via GitHub
mihailom-db commented on code in PR #47364: URL: https://github.com/apache/spark/pull/47364#discussion_r1686116407 ## docs/sql-ref-ansi-compliance.md: ## @@ -442,6 +442,7 @@ Below is a list of all the keywords in Spark SQL. |CODEGEN|non-reserved|non-reserved|non-reserved| |COL

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-22 Thread via GitHub
panbingkun commented on PR #47364: URL: https://github.com/apache/spark/pull/47364#issuecomment-2242344407 > Apart from SQL API, we need to support other APIs as well, which should be used by calling `Session.catalog.collation`. Because of this, your approach might need to be reworked a bit

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-22 Thread via GitHub
mihailom-db commented on PR #47364: URL: https://github.com/apache/spark/pull/47364#issuecomment-2242350155 I believe for now we agreed to have only `SHOW COLLATION(S)` as a command, and then add support for both LIKE and ILIKE operators for searching. But it is enough to have LIKE as a sta

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-22 Thread via GitHub
panbingkun commented on PR #47364: URL: https://github.com/apache/spark/pull/47364#issuecomment-2242353399 > I believe for now we agreed to have only `SHOW COLLATION(S)` as a command, and then add support for both LIKE and ILIKE operators for searching. But it is enough to have LIKE as a st

Re: [PR] [SPARK-48943][TESTS] Upgrade `h2` to 2.3.230 and enhance the test coverage of behavior changes of `asin` and `acos` complying Standard SQL [spark]

2024-07-22 Thread via GitHub
LuciferYang commented on code in PR #47414: URL: https://github.com/apache/spark/pull/47414#discussion_r1686150662 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -1275,11 +1275,32 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-22 Thread via GitHub
panbingkun commented on code in PR #47364: URL: https://github.com/apache/spark/pull/47364#discussion_r1686161417 ## docs/sql-ref-ansi-compliance.md: ## @@ -442,6 +442,7 @@ Below is a list of all the keywords in Spark SQL. |CODEGEN|non-reserved|non-reserved|non-reserved| |COLL

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-22 Thread via GitHub
mihailom-db commented on code in PR #47364: URL: https://github.com/apache/spark/pull/47364#discussion_r1686171509 ## docs/sql-ref-ansi-compliance.md: ## @@ -442,6 +442,7 @@ Below is a list of all the keywords in Spark SQL. |CODEGEN|non-reserved|non-reserved|non-reserved| |COL

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-22 Thread via GitHub
panbingkun commented on code in PR #47364: URL: https://github.com/apache/spark/pull/47364#discussion_r1686179885 ## docs/sql-ref-ansi-compliance.md: ## @@ -442,6 +442,7 @@ Below is a list of all the keywords in Spark SQL. |CODEGEN|non-reserved|non-reserved|non-reserved| |COLL

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-22 Thread via GitHub
panbingkun commented on code in PR #47364: URL: https://github.com/apache/spark/pull/47364#discussion_r1686181436 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -918,4 +967,8 @@ public static String getClosestSuggestionsOnInvalidNa

Re: [PR] [SPARK-48906][SQL] Introduce `SHOW COLLATIONS LIKE ...` syntax to show all collations [spark]

2024-07-22 Thread via GitHub
panbingkun commented on code in PR #47364: URL: https://github.com/apache/spark/pull/47364#discussion_r1686181924 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -91,7 +91,7 @@ public Optional getVersion() { /** * Entry encap

Re: [PR] [SPARK-43301][CORE][SHUFFLE] BlockStoreClient getHostLocalDirs RPC supports IOException retry [spark]

2024-07-22 Thread via GitHub
cxzl25 commented on code in PR #46805: URL: https://github.com/apache/spark/pull/46805#discussion_r1686216130 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java: ## @@ -46,6 +53,10 @@ public abstract class BlockStoreClient implements

[PR] [SPARK-44884][Core] Create _SUCCESS marker file for spark write with … [spark]

2024-07-22 Thread via GitHub
anikakelhanka opened a new pull request, #47439: URL: https://github.com/apache/spark/pull/47439 …partitionOveriteMode=dynamic ### What changes were proposed in this pull request? **Issue:** In the Spark versions post v3.0.2, the SUCCESS Marker file is missing on the roo
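For context on why the marker matters: downstream jobs often treat the presence of a `_SUCCESS` file in the output root as the signal that a write has finished. A minimal sketch of such a gate, assuming a local filesystem path rather than HDFS or object storage; the helper name `is_write_complete` is hypothetical and not part of Spark:

```python
from pathlib import Path


def is_write_complete(output_root: str) -> bool:
    """Return True if a `_SUCCESS` marker file exists directly under output_root.

    Hypothetical helper for illustration only; it checks a local directory.
    """
    return (Path(output_root) / "_SUCCESS").is_file()
```

A consumer that polls with a check like this would wait indefinitely if the writer never creates the marker in the root directory, which is the kind of breakage the PR description refers to.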

Re: [PR] [SPARK-48946][SQL] NPE in redact method when session is null [spark]

2024-07-22 Thread via GitHub
mikoszilard commented on PR #47419: URL: https://github.com/apache/spark/pull/47419#issuecomment-2242518732 Thank you very much @dongjoon-hyun. I'm really happy that I could start contributing.

Re: [PR] [SPARK-48962][INFRA] Make the input parameters of `workflows/benchmark` selectable [spark]

2024-07-22 Thread via GitHub
HyukjinKwon commented on PR #47438: URL: https://github.com/apache/spark/pull/47438#issuecomment-2242520759 Merged to master.

Re: [PR] [SPARK-48962][INFRA] Make the input parameters of `workflows/benchmark` selectable [spark]

2024-07-22 Thread via GitHub
HyukjinKwon closed pull request #47438: [SPARK-48962][INFRA] Make the input parameters of `workflows/benchmark` selectable URL: https://github.com/apache/spark/pull/47438

Re: [PR] [SPARK-48835] Introduce versoning to jdbc connectors [spark]

2024-07-22 Thread via GitHub
yaooqinn commented on PR #47181: URL: https://github.com/apache/spark/pull/47181#issuecomment-2242591010 The use of version-numbered configurations in Spark can be unfriendly for users compared to legacy configurations. Various version policies in Spark, such as Thrift Server Versions, API

Re: [PR] [SPARK-43301][CORE][SHUFFLE] BlockStoreClient getHostLocalDirs RPC supports IOException retry [spark]

2024-07-22 Thread via GitHub
cxzl25 commented on code in PR #46805: URL: https://github.com/apache/spark/pull/46805#discussion_r1686322586 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java: ## @@ -161,6 +172,22 @@ public void getHostLocalDirs( String[] exe

Re: [PR] [SPARK-48943][TESTS] Upgrade `h2` to 2.3.230 and enhance the test coverage of behavior changes of `asin` and `acos` complying Standard SQL [spark]

2024-07-22 Thread via GitHub
wayneguow commented on code in PR #47414: URL: https://github.com/apache/spark/pull/47414#discussion_r1686324246 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -1275,11 +1275,32 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with E

[PR] [SPARK-48963][INFRA] Support JIRA_ACCESS_TOKEN in translate-contributors.py [spark]

2024-07-22 Thread via GitHub
yaooqinn opened a new pull request, #47440: URL: https://github.com/apache/spark/pull/47440 ### What changes were proposed in this pull request? Support JIRA_ACCESS_TOKEN in translate-contributors.py ### Why are the changes needed? Remove plaintext password in
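The idea described above (prefer an access token from the environment over a plaintext password) can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the function name `resolve_jira_auth` is hypothetical, the fallback variables `JIRA_USERNAME` and `JIRA_PASSWORD` are assumed, and the returned keys are only meant to mirror the `token_auth` / `basic_auth` style of options that JIRA client libraries typically accept.

```python
import os
from typing import Mapping, Optional


def resolve_jira_auth(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick JIRA credentials from the environment, preferring a token.

    Hypothetical helper: variable names other than JIRA_ACCESS_TOKEN
    are assumptions made for this sketch.
    """
    env = os.environ if env is None else env

    # Prefer the access token so no plaintext password is needed.
    token = env.get("JIRA_ACCESS_TOKEN")
    if token:
        return {"token_auth": token}

    # Fall back to username/password only when both are present.
    user = env.get("JIRA_USERNAME")
    password = env.get("JIRA_PASSWORD")
    if user and password:
        return {"basic_auth": (user, password)}

    raise RuntimeError(
        "Set JIRA_ACCESS_TOKEN, or both JIRA_USERNAME and JIRA_PASSWORD"
    )
```

Passing the environment in as a mapping keeps the helper easy to test without touching the real process environment.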

Re: [PR] [SPARK-48963][INFRA] Support JIRA_ACCESS_TOKEN in translate-contributors.py [spark]

2024-07-22 Thread via GitHub
yaooqinn commented on PR #47440: URL: https://github.com/apache/spark/pull/47440#issuecomment-2242624638 cc @dongjoon-hyun @HyukjinKwon @cloud-fan thanks

[PR] [MINOR] Add more know translations for contributors [spark]

2024-07-22 Thread via GitHub
yaooqinn opened a new pull request, #47441: URL: https://github.com/apache/spark/pull/47441 ### What changes were proposed in this pull request? Recognized these contributor translations ```diff +Yikf - Kaifei Yi +jackylee-ch - Junqing Li +liujiayi771 - Jiayi Liu +mahesh

Re: [PR] [SPARK-47764][FOLLOW-UP] Change to use ShuffleDriverComponents.removeShuffle to remove shuffle properly [spark]

2024-07-22 Thread via GitHub
bozhang2820 commented on code in PR #46302: URL: https://github.com/apache/spark/pull/46302#discussion_r1686367506 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -176,7 +176,13 @@ object SQLExecution extends Logging { shuffleIds.

[PR] [WIP][SPARK-48346][SQL] Support for IF ELSE statements in SQL scripts [spark]

2024-07-22 Thread via GitHub
davidm-db opened a new pull request, #47442: URL: https://github.com/apache/spark/pull/47442 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How w

Re: [PR] [SPARK-48911][SQL][TESTS] Improve collation support testing for various expressions [spark]

2024-07-22 Thread via GitHub
stevomitric commented on code in PR #47372: URL: https://github.com/apache/spark/pull/47372#discussion_r1686423780 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -2295,6 +2295,827 @@ class CollationSQLExpressionsSuite assert(typeEx

Re: [PR] [SPARK-48881][SQL] Some dynamic partitions can be compensated to specific partition values [spark]

2024-07-22 Thread via GitHub
LuciferYang commented on PR #47418: URL: https://github.com/apache/spark/pull/47418#issuecomment-2242847274 I haven't reviewed the code changes in the pr yet, but: 1. The PR title should reflect the work done in the current PR as much as possible, the current title looks more like a Jira

Re: [PR] [SPARK-48835] Introduce versoning to jdbc connectors [spark]

2024-07-22 Thread via GitHub
yaooqinn commented on PR #47181: URL: https://github.com/apache/spark/pull/47181#issuecomment-2242860051 Also, any number of features in a version can lead to backporting issues.

Re: [PR] [SPARK-48881][SQL] Some dynamic partitions can be compensated to specific partition values [spark]

2024-07-22 Thread via GitHub
LuciferYang commented on code in PR #47418: URL: https://github.com/apache/spark/pull/47418#discussion_r1686486058 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala: ## @@ -43,7 +43,7 @@ trait V1WriteCommand extends DataWritingCommand { /**

[PR] Add interface which prevents DataSourceV2Strategy to do planning for scan nodes [spark]

2024-07-22 Thread via GitHub
urosstan-db opened a new pull request, #47443: URL: https://github.com/apache/spark/pull/47443 ### What changes were proposed in this pull request? Add new interface (`ExternallyPlannedV1Scan`) for `V1Scan` which will prevent `DataSourceV2Strategy` to do planning of optimized scan node.

Re: [PR] Add interface which prevents DataSourceV2Strategy to do planning for scan nodes [spark]

2024-07-22 Thread via GitHub
urosstan-db commented on code in PR #47443: URL: https://github.com/apache/spark/pull/47443#discussion_r1686489425 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala: ## @@ -108,6 +108,9 @@ class DataSourceV2Strategy(session: Spar

Re: [PR] [SPARK-48941][PYTHON][ML] Replace RDD read / write API invocation with Dataframe read / write API [spark]

2024-07-22 Thread via GitHub
WeichenXu123 closed pull request #47411: [SPARK-48941][PYTHON][ML] Replace RDD read / write API invocation with Dataframe read / write API URL: https://github.com/apache/spark/pull/47411

Re: [PR] [SPARK-48941][PYTHON][ML] Replace RDD read / write API invocation with Dataframe read / write API [spark]

2024-07-22 Thread via GitHub
WeichenXu123 commented on PR #47411: URL: https://github.com/apache/spark/pull/47411#issuecomment-2242944242 Merged to master.

Re: [PR] [WIP][SPARK-48346][SQL] Support for IF ELSE statements in SQL scripts [spark]

2024-07-22 Thread via GitHub
miland-db commented on code in PR #47442: URL: https://github.com/apache/spark/pull/47442#discussion_r1686546091 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -175,10 +175,22 @@ class AstBuilder extends DataTypeAstBuilder with SQLCo

Re: [PR] [SPARK-48344][SQL] SQL API change to support execution of compound statements [spark]

2024-07-22 Thread via GitHub
davidm-db commented on code in PR #47403: URL: https://github.com/apache/spark/pull/47403#discussion_r1686579965 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -650,14 +657,27 @@ class SparkSession private( private[sql] def sql(sqlText: String, args

Re: [PR] [SPARK-48959][SQL] Make `NoSuchNamespaceException` extend `NoSuchDatabaseException` to restore the exception handling [spark]

2024-07-22 Thread via GitHub
cloud-fan commented on PR #47433: URL: https://github.com/apache/spark/pull/47433#issuecomment-2243070022 thanks, merging to master!

Re: [PR] [SPARK-48959][SQL] Make `NoSuchNamespaceException` extend `NoSuchDatabaseException` to restore the exception handling [spark]

2024-07-22 Thread via GitHub
cloud-fan closed pull request #47433: [SPARK-48959][SQL] Make `NoSuchNamespaceException` extend `NoSuchDatabaseException` to restore the exception handling URL: https://github.com/apache/spark/pull/47433

[PR] [WIP][SQL] Optimize string searching under UTF8_LCASE collation [spark]

2024-07-22 Thread via GitHub
uros-db opened a new pull request, #47444: URL: https://github.com/apache/spark/pull/47444 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-43242][CORE] Fix throw 'Unexpected type of BlockId' in shuffle corruption diagnose [spark]

2024-07-22 Thread via GitHub
CavemanIV commented on PR #40921: URL: https://github.com/apache/spark/pull/40921#issuecomment-2243095339 many thanks for @cxzl25 adding UT

Re: [PR] [SPARK-48344][SQL] SQL API change to support execution of compound statements [spark]

2024-07-22 Thread via GitHub
davidm-db commented on code in PR #47403: URL: https://github.com/apache/spark/pull/47403#discussion_r1686757282 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -71,14 +85,14 @@ trait NonLeafStatementExec extends CompoundStatement

Re: [PR] [SPARK-48344][SQL] SQL API change to support execution of compound statements [spark]

2024-07-22 Thread via GitHub
davidm-db commented on code in PR #47403: URL: https://github.com/apache/spark/pull/47403#discussion_r1686758572 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -91,21 +105,41 @@ class SingleStatementExec( } override def r

Re: [PR] [SPARK-48344][SQL] SQL API change to support execution of compound statements [spark]

2024-07-22 Thread via GitHub
davidm-db commented on code in PR #47403: URL: https://github.com/apache/spark/pull/47403#discussion_r1686780223 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -91,21 +105,41 @@ class SingleStatementExec( } override def r

Re: [PR] [SPARK-48761][SQL] Introduce clusterBy DataFrameWriter API for Scala [spark]

2024-07-22 Thread via GitHub
zedtang commented on code in PR #47301: URL: https://github.com/apache/spark/pull/47301#discussion_r1686736484 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala: ## @@ -201,6 +201,22 @@ final class DataFrameWriter[T] private[sql] (ds: Dat

Re: [PR] [SPARK-48963][INFRA] Support `JIRA_ACCESS_TOKEN` in translate-contributors.py [spark]

2024-07-22 Thread via GitHub
dongjoon-hyun closed pull request #47440: [SPARK-48963][INFRA] Support `JIRA_ACCESS_TOKEN` in translate-contributors.py URL: https://github.com/apache/spark/pull/47440

[PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
ericm-db opened a new pull request, #47445: URL: https://github.com/apache/spark/pull/47445 ### What changes were proposed in this pull request? Introducing the OperatorStateMetadataV2 format that integrates with the TransformWithStateExec operator. This is used to keep inform

Re: [PR] [SPARK-48963][INFRA] Support `JIRA_ACCESS_TOKEN` in translate-contributors.py [spark]

2024-07-22 Thread via GitHub
dongjoon-hyun commented on PR #47440: URL: https://github.com/apache/spark/pull/47440#issuecomment-2243304347 Merged to master.

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
ericm-db closed pull request #47273: [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator URL: https://github.com/apache/spark/pull/47273

Re: [PR] [SPARK-48958][BUILD] Upgrade `zstd-jni` to 1.5.6-4 [spark]

2024-07-22 Thread via GitHub
dongjoon-hyun closed pull request #47432: [SPARK-48958][BUILD] Upgrade `zstd-jni` to 1.5.6-4 URL: https://github.com/apache/spark/pull/47432

Re: [PR] [SPARK-48835] Introduce versoning to jdbc connectors [spark]

2024-07-22 Thread via GitHub
milastdbx commented on PR #47181: URL: https://github.com/apache/spark/pull/47181#issuecomment-2243402879 @yaooqinn > However, the current system is not working in that way. What do you mean? > The rules for data type mapping cannot be determined by dialect versions alone. Vari

Re: [PR] [SPARK-48929] Fix view internal error and clean up parser exception context [spark]

2024-07-22 Thread via GitHub
srielau commented on code in PR #47405: URL: https://github.com/apache/spark/pull/47405#discussion_r1686852508 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/parsers.scala: ## @@ -249,20 +249,24 @@ class ParseException private( override def getMessage: String

Re: [PR] [SPARK-20144] Allow reading files in order with spark.sql.files.allowReordering=false [spark]

2024-07-22 Thread via GitHub
tonyye commented on PR #22673: URL: https://github.com/apache/spark/pull/22673#issuecomment-2243404837 I know this is an old issue but does anyone know if this has changed in more recent versions of Spark? Is reading sorted data read in by spark in the same order? @darabos or @dgrnbrg do ei

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
jingz-db commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1686896779 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala: ## @@ -66,7 +67,7 @@ class StateSchemaCompatibilityChe

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
ericm-db commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1686910533 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala: ## @@ -66,7 +67,7 @@ class StateSchemaCompatibilityChe

Re: [PR] [SPARK-48914][SQL][TESTS] Add OFFSET operator as an option in the subquery generator [spark]

2024-07-22 Thread via GitHub
andylam-db commented on PR #47375: URL: https://github.com/apache/spark/pull/47375#issuecomment-2243487503 bumping @cloud-fan @HyukjinKwon for merge

[PR] [wip]Metadata vcf [spark]

2024-07-22 Thread via GitHub
ericm-db opened a new pull request, #47446: URL: https://github.com/apache/spark/pull/47446 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

Re: [PR] [SPARK-48929] Fix view internal error and clean up parser exception context [spark]

2024-07-22 Thread via GitHub
gengliangwang closed pull request #47405: [SPARK-48929] Fix view internal error and clean up parser exception context URL: https://github.com/apache/spark/pull/47405

Re: [PR] [SPARK-48929] Fix view internal error and clean up parser exception context [spark]

2024-07-22 Thread via GitHub
gengliangwang commented on PR #47405: URL: https://github.com/apache/spark/pull/47405#issuecomment-2243557854 Thanks, merging to master

Re: [PR] [SPARK-48957][SS] Return sub-classified error class on state store load for hdfs and rocksdb provider [spark]

2024-07-22 Thread via GitHub
anishshri-db commented on PR #47431: URL: https://github.com/apache/spark/pull/47431#issuecomment-2243616217 Tests are all green. Link here - https://github.com/anishshri-db/spark/actions/runs/10035345281/job/27763418833

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
anishshri-db commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1687021568 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -208,14 +208,25 @@ class IncrementalExecution( }

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
anishshri-db commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1687022712 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/metadata/StateMetadataSource.scala: ## @@ -188,29 +191,56 @@ class StateMetadataPart

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
anishshri-db commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1687023328 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/metadata/StateMetadataSource.scala: ## @@ -188,29 +191,56 @@ class StateMetadataPart

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
anishshri-db commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1687023864 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala: ## @@ -325,6 +340,19 @@ class HDFSMetadataLog[T <: AnyRef : ClassTag](s

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
anishshri-db commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1687024442 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OperatorStateMetadataLog.scala: ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-48966][SQL] Improve error message with invalid unresolved column reference in UDTF call [spark]

2024-07-22 Thread via GitHub
dtenedor commented on PR #47447: URL: https://github.com/apache/spark/pull/47447#issuecomment-2243631525 cc @allisonwang-db @ueshin

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
anishshri-db commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1687025381 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -382,12 +397,45 @@ case class TransformWithStateExec(

[PR] [SPARK-48966][SQL] Improve error message with invalid unresolved column reference in UDTF call [spark]

2024-07-22 Thread via GitHub
dtenedor opened a new pull request, #47447: URL: https://github.com/apache/spark/pull/47447 ### What changes were proposed in this pull request? This bug covers improving an error message in the event of invalid UDTF calls. For example: ``` select * from udtf( observed

Re: [PR] [WIP] State data integration [spark]

2024-07-22 Thread via GitHub
jingz-db closed pull request #47306: [WIP] State data integration URL: https://github.com/apache/spark/pull/47306

[PR] State source value [spark]

2024-07-22 Thread via GitHub
jingz-db opened a new pull request, #47448: URL: https://github.com/apache/spark/pull/47448 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

Re: [PR] State source value [spark]

2024-07-22 Thread via GitHub
jingz-db closed pull request #47448: State source value URL: https://github.com/apache/spark/pull/47448

[PR] New value state [spark]

2024-07-22 Thread via GitHub
jingz-db opened a new pull request, #47449: URL: https://github.com/apache/spark/pull/47449 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
anishshri-db commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1687042904 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/metadata/StateMetadataSource.scala: ## @@ -188,29 +191,56 @@ class StateMetadataPart

Re: [PR] [MINOR][INFRA] Add more know translations for contributors [spark]

2024-07-22 Thread via GitHub
dongjoon-hyun commented on PR #47441: URL: https://github.com/apache/spark/pull/47441#issuecomment-2243661735 Merged to master.

Re: [PR] [MINOR][INFRA] Add more know translations for contributors [spark]

2024-07-22 Thread via GitHub
dongjoon-hyun closed pull request #47441: [MINOR][INFRA] Add more know translations for contributors URL: https://github.com/apache/spark/pull/47441

Re: [PR] New value state [spark]

2024-07-22 Thread via GitHub
jingz-db closed pull request #47449: New value state URL: https://github.com/apache/spark/pull/47449

[PR] State data integration [spark]

2024-07-22 Thread via GitHub
jingz-db opened a new pull request, #47450: URL: https://github.com/apache/spark/pull/47450 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

Re: [PR] [SPARK-48849][SS]Create OperatorStateMetadataV2 for the TransformWithStateExec operator [spark]

2024-07-22 Thread via GitHub
anishshri-db commented on code in PR #47445: URL: https://github.com/apache/spark/pull/47445#discussion_r1687054341 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala: ## @@ -219,7 +222,8 @@ object StateSchemaCompatibi

[PR] [SPARK-45787][SQL] Support Catalog.listColumns for clustering columns [spark]

2024-07-22 Thread via GitHub
zedtang opened a new pull request, #47451: URL: https://github.com/apache/spark/pull/47451 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-22 Thread via GitHub
chaoqin-li1123 commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1687060460 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -426,7 +441,29 @@ class RocksDBFileManager( * - SST

Re: [PR] [SPARK-45787][SQL] Support Catalog.listColumns for clustering columns [spark]

2024-07-22 Thread via GitHub
zedtang commented on PR #47451: URL: https://github.com/apache/spark/pull/47451#issuecomment-2243690425 This PR depends on https://github.com/apache/spark/pull/47301

Re: [PR] [SPARK-48762] Introduce clusterBy DataFrameWriter API for Python [spark]

2024-07-22 Thread via GitHub
zedtang commented on PR #47452: URL: https://github.com/apache/spark/pull/47452#issuecomment-2243708336 This PR depends on https://github.com/apache/spark/pull/47451

[PR] [SPARK-48762] Introduce clusterBy DataFrameWriter API for Python [spark]

2024-07-22 Thread via GitHub
zedtang opened a new pull request, #47452: URL: https://github.com/apache/spark/pull/47452 ### What changes were proposed in this pull request? Introduce clusterBy DataFrameWriter API for Python. ### Why are the changes needed? Introduce more ways for users to int

Re: [PR] [SPARK-48901][SPARK-48916][STREAMING][PYTHON] Introduce clusterBy DataStreamWriter API [spark]

2024-07-22 Thread via GitHub
zedtang commented on code in PR #47376: URL: https://github.com/apache/spark/pull/47376#discussion_r1687098487 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala: ## @@ -303,7 +331,7 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {

[PR] [SPARK-48968] Avoid unnecessary task configuration in `spark-operator-api` [spark-kubernetes-operator]

2024-07-22 Thread via GitHub
dongjoon-hyun opened a new pull request, #20: URL: https://github.com/apache/spark-kubernetes-operator/pull/20 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing chang

Re: [PR] [SPARK-48968] Avoid unnecessary task configuration in `spark-operator-api` [spark-kubernetes-operator]

2024-07-22 Thread via GitHub
dongjoon-hyun commented on PR #20: URL: https://github.com/apache/spark-kubernetes-operator/pull/20#issuecomment-2243900381 Thank you, @huaxingao !

Re: [PR] [SPARK-48968] Avoid unnecessary task configuration in `spark-operator-api` [spark-kubernetes-operator]

2024-07-22 Thread via GitHub
dongjoon-hyun closed pull request #20: [SPARK-48968] Avoid unnecessary task configuration in `spark-operator-api` URL: https://github.com/apache/spark-kubernetes-operator/pull/20

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-22 Thread via GitHub
anishshri-db commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1687194253 ## python/pyspark/sql/pandas/group_ops.py: ## @@ -358,6 +362,120 @@ def applyInPandasWithState( ) return DataFrame(jdf, self.session) + +de

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-22 Thread via GitHub
bogao007 commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1687203660 ## python/pyspark/sql/streaming/state_api_client.py: ## @@ -0,0 +1,162 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor licen

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-22 Thread via GitHub
chaoqin-li1123 commented on PR #47393: URL: https://github.com/apache/spark/pull/47393#issuecomment-2243936187 Can you also update the pr description about the new conf?

Re: [PR] [SPARK-48703][SQL][TESTS] Upgrade `mssql-jdbc` to 12.6.3.jre11 [spark]

2024-07-22 Thread via GitHub
dongjoon-hyun commented on PR #47075: URL: https://github.com/apache/spark/pull/47075#issuecomment-2243948941 Merged to master for Apache Spark 4.0.0-preview2.

Re: [PR] [SPARK-48703][SQL][TESTS] Upgrade `mssql-jdbc` to 12.6.3.jre11 [spark]

2024-07-22 Thread via GitHub
dongjoon-hyun closed pull request #47075: [SPARK-48703][SQL][TESTS] Upgrade `mssql-jdbc` to 12.6.3.jre11 URL: https://github.com/apache/spark/pull/47075

[PR] [SPARK-48969] Fix `spark-operator` module to define test framework explicitly [spark-kubernetes-operator]

2024-07-22 Thread via GitHub
dongjoon-hyun opened a new pull request, #21: URL: https://github.com/apache/spark-kubernetes-operator/pull/21 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing chang

Re: [PR] [SPARK-48400][CORE]Promote `PrometheusServlet` to `DeveloperApi` [spark]

2024-07-22 Thread via GitHub
jiangzho commented on code in PR #46716: URL: https://github.com/apache/spark/pull/46716#discussion_r1687244271 ## core/src/main/scala/org/apache/spark/metrics/sink/PrometheusServlet.scala: ## @@ -24,15 +24,21 @@ import jakarta.servlet.http.HttpServletRequest import org.eclipse

Re: [PR] [SPARK-48969] Fix `spark-operator` module to define test framework explicitly [spark-kubernetes-operator]

2024-07-22 Thread via GitHub
dongjoon-hyun commented on PR #21: URL: https://github.com/apache/spark-kubernetes-operator/pull/21#issuecomment-2243985331 Oh, you are online. I hope you are getting better and better. Thank you so much, @viirya !

Re: [PR] [SPARK-48969] Fix `spark-operator` module to define test framework explicitly [spark-kubernetes-operator]

2024-07-22 Thread via GitHub
dongjoon-hyun closed pull request #21: [SPARK-48969] Fix `spark-operator` module to define test framework explicitly URL: https://github.com/apache/spark-kubernetes-operator/pull/21
