Re: [PR] [SPARK-50005][SQL] Enhance method verifyNotReadPath to identify subqueries hidden in the filter conditions. [spark]

2024-10-28 Thread via GitHub
zeal-thinker commented on code in PR #48554: URL: https://github.com/apache/spark/pull/48554#discussion_r1820088697 ## sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala: ## @@ -2731,6 +2731,27 @@ class InsertSuite extends DataSourceTest with SharedSparkSess

Re: [PR] [SPARK-47094][SQL][TEST][FOLLOWUP] SPJ : fix bucket reducer function [spark]

2024-10-28 Thread via GitHub
huaxingao commented on code in PR #47126: URL: https://github.com/apache/spark/pull/47126#discussion_r1820162439 ## sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala: ## @@ -1809,6 +1809,56 @@ class KeyGroupedPartitioningSuite extends Dist

Re: [PR] [SPARK-50005][SQL] Enhance method verifyNotReadPath to identify subqueries hidden in the filter conditions. [spark]

2024-10-28 Thread via GitHub
xunxunmimi5577 commented on code in PR #48554: URL: https://github.com/apache/spark/pull/48554#discussion_r1820141576 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -1074,11 +1070,18 @@ object DDLUtils extends Logging { query: LogicalPl

Re: [PR] [SPARK-50117][BUILD][SS] Change to using `maven/sbt` plugin to generate `StateMessage.java` [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48654: URL: https://github.com/apache/spark/pull/48654#issuecomment-2443146258 Thank you for the sync-up with them, @HeartSaVioR . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47094][SQL][TEST][FOLLOWUP] SPJ : fix bucket reducer function [spark]

2024-10-28 Thread via GitHub
viirya commented on PR #47126: URL: https://github.com/apache/spark/pull/47126#issuecomment-2443241928 cc @huaxingao

Re: [PR] [MINOR][BUILD] Skip `deepspeed` in requirements on MacOS [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #42411: URL: https://github.com/apache/spark/pull/42411#issuecomment-2443213834 Hi, @zhengruifeng , @yaooqinn , @HyukjinKwon . Let me backport this minor commit to `branch-3.5`. :)

Re: [PR] [SPARK-50152][SS] Support handleInitialState with state data source reader [spark]

2024-10-28 Thread via GitHub
anishshri-db commented on code in PR #48686: URL: https://github.com/apache/spark/pull/48686#discussion_r1820081521 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## @@ -83,6 +98,59 @@ abstract class StatefulProcessorWithInit

[PR] [SPARK-50154][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_0043`: `INVALID_RESET_COMMAND_FORMAT` [spark]

2024-10-28 Thread via GitHub
itholic opened a new pull request, #48689: URL: https://github.com/apache/spark/pull/48689 ### What changes were proposed in this pull request? This PR proposes to assign proper error condition & sqlstate for `_LEGACY_ERROR_TEMP_0043`: `INVALID_RESET_COMMAND_FORMAT`

Re: [PR] [SPARK-50113][CONNECT][PYTHON][TESTS] Compatibility check should respect `ONLY_SUPPORTED_WITH_SPARK_CONNECT` [spark]

2024-10-28 Thread via GitHub
itholic commented on code in PR #48651: URL: https://github.com/apache/spark/pull/48651#discussion_r1820066066 ## python/pyspark/sql/tests/test_connect_compatibility.py: ## @@ -60,16 +60,23 @@ from pyspark.sql.connect.streaming.readwriter import DataStreamReader as Connect

Re: [PR] [SPARK-49563][SQL] Add SQL pipe syntax for the WINDOW operator [spark]

2024-10-28 Thread via GitHub
Angryrou commented on PR #48649: URL: https://github.com/apache/spark/pull/48649#issuecomment-2443234073 Thank you @dtenedor for the consideration suggestions! I have updated the code accordingly and added more test cases.

Re: [PR] [SPARK-50016][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2067`: `UNSUPPORTED_PARTITION_TRANSFORM` [spark]

2024-10-28 Thread via GitHub
itholic commented on code in PR #48655: URL: https://github.com/apache/spark/pull/48655#discussion_r1820087080 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -868,8 +868,8 @@ private[sql] object QueryExecutionErrors extends QueryErr

Re: [PR] [SPARK-50005][SQL] Enhance method verifyNotReadPath to identify subqueries hidden in the filter conditions. [spark]

2024-10-28 Thread via GitHub
zeal-thinker commented on code in PR #48554: URL: https://github.com/apache/spark/pull/48554#discussion_r1820083908 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -1008,13 +1008,9 @@ object DDLUtils extends Logging { if (!catalog.isTempVi

Re: [PR] [SPARK-50016][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2067`: `UNSUPPORTED_PARTITION_TRANSFORM` [spark]

2024-10-28 Thread via GitHub
itholic commented on code in PR #48655: URL: https://github.com/apache/spark/pull/48655#discussion_r1820089089 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -5398,6 +5398,12 @@ }, "sqlState" : "42902" }, + "UNSUPPORTED_PARTITION_TRANSFORM" :

Re: [PR] [SPARK-50152][SS] Support handleInitialState with state data source reader [spark]

2024-10-28 Thread via GitHub
anishshri-db commented on code in PR #48686: URL: https://github.com/apache/spark/pull/48686#discussion_r1820083680 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## @@ -497,4 +573,126 @@ class TransformWithStateInitialStateS

Re: [PR] [SPARK-50096][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2150`: `TUPLE_SIZE_EXCEEDS_LIMIT` [spark]

2024-10-28 Thread via GitHub
itholic commented on code in PR #48631: URL: https://github.com/apache/spark/pull/48631#discussion_r1820081434 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -4628,6 +4628,12 @@ ], "sqlState" : "42K09" }, + "TUPLE_SIZE_EXCEEDS_LIMIT" : { +

Re: [PR] [SPARK-50005][SQL] Enhance method verifyNotReadPath to identify subqueries hidden in the filter conditions. [spark]

2024-10-28 Thread via GitHub
zeal-thinker commented on code in PR #48554: URL: https://github.com/apache/spark/pull/48554#discussion_r1820084448 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -1074,11 +1070,18 @@ object DDLUtils extends Logging { query: LogicalPlan

Re: [PR] [SPARK-50152][SS] Support handleInitialState with state data source reader [spark]

2024-10-28 Thread via GitHub
anishshri-db commented on PR #48686: URL: https://github.com/apache/spark/pull/48686#issuecomment-2443207720 @jingz-db - can u check if the lint failure is related ?

Re: [PR] [SPARK-49297][CORE][TESTS] Fix race condition in BlockManagerDecommissionIntegrationSuite [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun closed pull request #48683: [SPARK-49297][CORE][TESTS] Fix race condition in BlockManagerDecommissionIntegrationSuite URL: https://github.com/apache/spark/pull/48683

Re: [PR] [SPARK-49297][CORE][TESTS] Fix race condition in BlockManagerDecommissionIntegrationSuite [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48683: URL: https://github.com/apache/spark/pull/48683#issuecomment-2443153867 After a minor import statement merging, I verified this manually.
```scala
- import java.nio.file.Files
- import java.nio.file.Paths
+ import java.nio.file.{Files, Paths}
```

Re: [PR] [SPARK-49563][SQL] Add SQL pipe syntax for the WINDOW operator [spark]

2024-10-28 Thread via GitHub
Angryrou commented on code in PR #48649: URL: https://github.com/apache/spark/pull/48649#discussion_r1820017432 ## sql/core/src/test/resources/sql-tests/inputs/pipe-operators.sql: ## @@ -665,15 +665,28 @@ table t table t |> order by x sort by x; --- The WINDOW clause is not

[PR] [SPARK-50153][SQL] Add `name` to `RuleExecutor` to make printing `QueryExecutionMetrics`'s logs clearer [spark]

2024-10-28 Thread via GitHub
panbingkun opened a new pull request, #48688: URL: https://github.com/apache/spark/pull/48688 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? Yes, ### How was

Re: [PR] [SPARK-50117][BUILD][SS] Change to using `maven/sbt` plugin to generate `StateMessage.java` [spark]

2024-10-28 Thread via GitHub
LuciferYang commented on PR #48654: URL: https://github.com/apache/spark/pull/48654#issuecomment-2443051272 > Please allow me to wait for a day to sync with folks working on this (TWS in PySpark). I've asked them to comment and they will come in US tz. If there is no comment in tomorrow US

Re: [PR] [SPARK-50092][SQL] Fix PostgreSQL connector behaviour for multidimensional arrays [spark]

2024-10-28 Thread via GitHub
yaooqinn commented on PR #48625: URL: https://github.com/apache/spark/pull/48625#issuecomment-2443044261 Does this approach still work when the resultset given by limit 1 is empty?

[PR] [SPARK-50102][SQL][CONNECT] Add shims need for missing public SQL methods. [spark]

2024-10-28 Thread via GitHub
hvanhovell opened a new pull request, #48687: URL: https://github.com/apache/spark/pull/48687 ### What changes were proposed in this pull request? This PR makes the following changes: - It adds shims for a couple of a number of classes exposed in the (classic) SQL interface: `BaseRela

Re: [PR] [SPARK-49899][PYTHON][SS] Support deleteIfExists for TransformWithStateInPandas [spark]

2024-10-28 Thread via GitHub
HeartSaVioR commented on code in PR #48373: URL: https://github.com/apache/spark/pull/48373#discussion_r1820001517 ## python/pyspark/sql/pandas/group_ops.py: ## @@ -503,7 +503,11 @@ def transformWithStateUDF( statefulProcessorApiClient.set_implicit_key(key)

Re: [PR] [SPARK-50092][SQL] Fix PostgreSQL connector behaviour for multidimensional arrays [spark]

2024-10-28 Thread via GitHub
yaooqinn commented on code in PR #48625: URL: https://github.com/apache/spark/pull/48625#discussion_r1819994352 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala: ## @@ -302,4 +317,22 @@ class PostgresIntegrationSuit

Re: [PR] [SPARK-50096][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2150`: `TUPLE_SIZE_EXCEEDS_LIMIT` [spark]

2024-10-28 Thread via GitHub
itholic commented on code in PR #48631: URL: https://github.com/apache/spark/pull/48631#discussion_r1819986087 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -4628,6 +4628,12 @@ ], "sqlState" : "42K09" }, + "TUPLE_SIZE_EXCEEDS_LIMIT" : { +

Re: [PR] [SPARK-50117][BUILD][SS] Change to using `maven/sbt` plugin to generate `StateMessage.java` [spark]

2024-10-28 Thread via GitHub
HeartSaVioR commented on PR #48654: URL: https://github.com/apache/spark/pull/48654#issuecomment-2443028368 Please allow me to wait for a day to sync with folks working on this (TWS in PySpark). I've asked them to comment and they will come in US tz. If there is no comment in tomorrow US tz

Re: [PR] [SPARK-50083][SQL] Integrate `_LEGACY_ERROR_TEMP_1231` into `PARTITIONS_NOT_FOUND` [spark]

2024-10-28 Thread via GitHub
itholic commented on code in PR #48614: URL: https://github.com/apache/spark/pull/48614#discussion_r1819978681 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -611,7 +611,7 @@ class HiveDDLSuite ) } - test("add/drop partitions

Re: [PR] [SPARK-50117][BUILD][SS] Change to using `maven/sbt` plugin to generate `StateMessage.java` [spark]

2024-10-28 Thread via GitHub
HeartSaVioR commented on PR #48654: URL: https://github.com/apache/spark/pull/48654#issuecomment-2443021130 Ah OK, so the generated file is placed in target, not source directory. Sounds good.

Re: [PR] [SPARK-49513][SS] Add Support for timer in transformWithStateInPandas API [spark]

2024-10-28 Thread via GitHub
jingz-db commented on PR #47878: URL: https://github.com/apache/spark/pull/47878#issuecomment-2442995781 Hey @HyukjinKwon, do we have any place that could manually escape the python style check for certain files? Currently the linter check is only failing on auto-generated file created by `

Re: [PR] [SPARK-50117][BUILD][SS] Change to using `maven/sbt` plugin to generate `StateMessage.java` [spark]

2024-10-28 Thread via GitHub
LuciferYang commented on PR #48654: URL: https://github.com/apache/spark/pull/48654#issuecomment-2442942415 > Would we want to also add StateMessage.java to gitignore so that we don't try to add the file again? double checked: - Maven ``` build/mvn clean install -

Re: [PR] [SPARK-49899][PYTHON][SS] Support deleteIfExists for TransformWithStateInPandas [spark]

2024-10-28 Thread via GitHub
bogao007 commented on code in PR #48373: URL: https://github.com/apache/spark/pull/48373#discussion_r1819945337 ## python/pyspark/sql/pandas/group_ops.py: ## @@ -503,7 +503,11 @@ def transformWithStateUDF( statefulProcessorApiClient.set_implicit_key(key)

Re: [PR] [SPARK-49899][PYTHON][SS] Support deleteIfExists for TransformWithStateInPandas [spark]

2024-10-28 Thread via GitHub
bogao007 commented on code in PR #48373: URL: https://github.com/apache/spark/pull/48373#discussion_r1819943103 ## python/pyspark/sql/pandas/group_ops.py: ## @@ -503,7 +503,11 @@ def transformWithStateUDF( statefulProcessorApiClient.set_implicit_key(key)

Re: [PR] [SPARK-50124][SQL] LIMIT/OFFSET should preserve data ordering [spark]

2024-10-28 Thread via GitHub
JoshRosen commented on code in PR #48661: URL: https://github.com/apache/spark/pull/48661#discussion_r1819930126 ## sql/core/src/main/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffset.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] [SPARK-48879][SQL] Expand the charset list with Chinese Standard Charsets [spark]

2024-10-28 Thread via GitHub
github-actions[bot] closed pull request #47320: [SPARK-48879][SQL] Expand the charset list with Chinese Standard Charsets URL: https://github.com/apache/spark/pull/47320

Re: [PR] [SPARK-49563][SQL] Add SQL pipe syntax for the WINDOW operator [spark]

2024-10-28 Thread via GitHub
dtenedor commented on code in PR #48649: URL: https://github.com/apache/spark/pull/48649#discussion_r1819915404 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -1037,6 +1037,11 @@ class AstBuilder extends DataTypeAstBuilder th

Re: [PR] [SPARK-49707][BUILD] Upgrade rocksdbjni to 9.7.3 [spark]

2024-10-28 Thread via GitHub
panbingkun commented on PR #48155: URL: https://github.com/apache/spark/pull/48155#issuecomment-2442920460 Thanks all @LuciferYang @dongjoon-hyun ❤️

Re: [PR] [SPARK-50140][BUILD] Upgrade `zstd-jni` to 1.5.6-7 [spark]

2024-10-28 Thread via GitHub
panbingkun commented on PR #48671: URL: https://github.com/apache/spark/pull/48671#issuecomment-2442916619 > Let's wait for a few days until the announcement comes out. Don't hurry, let's wait for it 😄

Re: [PR] [SPARK-50152][SS] Support handleInitialState with state data source reader [spark]

2024-10-28 Thread via GitHub
anishshri-db commented on code in PR #48686: URL: https://github.com/apache/spark/pull/48686#discussion_r1819916471 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## @@ -497,4 +537,85 @@ class TransformWithStateInitialStateSu

Re: [PR] [SPARK-50152][SS] Support handleInitialState with state data source reader [spark]

2024-10-28 Thread via GitHub
anishshri-db commented on code in PR #48686: URL: https://github.com/apache/spark/pull/48686#discussion_r1819916257 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## @@ -497,4 +537,85 @@ class TransformWithStateInitialStateSu

Re: [PR] [SPARK-50150][BUILD][3.5] Upgrade Jetty to 9.4.56.v20240826 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun closed pull request #48684: [SPARK-50150][BUILD][3.5] Upgrade Jetty to 9.4.56.v20240826 URL: https://github.com/apache/spark/pull/48684

Re: [PR] [SPARK-50149][INFRA] Update INFRA docker image to use `jammy-20240911.1` [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun closed pull request #48682: [SPARK-50149][INFRA] Update INFRA docker image to use `jammy-20240911.1` URL: https://github.com/apache/spark/pull/48682

Re: [PR] [SPARK-49563][SQL] Add SQL pipe syntax for the WINDOW operator [spark]

2024-10-28 Thread via GitHub
dtenedor commented on PR #48649: URL: https://github.com/apache/spark/pull/48649#issuecomment-2442897812 cc @cloud-fan @gengliangwang :)

Re: [PR] [SPARK-50149][INFRA] Update INFRA docker image to use `jammy-20240911.1` [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48682: URL: https://github.com/apache/spark/pull/48682#issuecomment-2442897772 Thank you, @viirya ! Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-50150][BUILD][3.5] Upgrade Jetty to 9.4.56.v20240826 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48684: URL: https://github.com/apache/spark/pull/48684#issuecomment-2442897315 Thank you, @viirya , too.

Re: [PR] [SPARK-50124][SQL] LIMIT/OFFSET should preserve data ordering [spark]

2024-10-28 Thread via GitHub
JoshRosen commented on code in PR #48661: URL: https://github.com/apache/spark/pull/48661#discussion_r1819908756 ## sql/core/src/main/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffset.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] [SPARK-50150][BUILD][3.5] Upgrade Jetty to 9.4.56.v20240826 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48684: URL: https://github.com/apache/spark/pull/48684#issuecomment-2442893861 Merged to branch-3.5.

Re: [PR] [SPARK-50150][BUILD][3.5] Upgrade Jetty to 9.4.56.v20240826 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48684: URL: https://github.com/apache/spark/pull/48684#issuecomment-2442893713 Thank you, @attilapiros !

Re: [PR] [SPARK-50124][SQL] LIMIT/OFFSET should preserve data ordering [spark]

2024-10-28 Thread via GitHub
JoshRosen commented on code in PR #48661: URL: https://github.com/apache/spark/pull/48661#discussion_r1819902550 ## sql/core/src/main/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffset.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (A

[PR] Initial state reader integration [spark]

2024-10-28 Thread via GitHub
jingz-db opened a new pull request, #48686: URL: https://github.com/apache/spark/pull/48686 ### What changes were proposed in this pull request? +---+-+-+--++ |groupingKey|value|listValue|userMapKey|userMapValue| +---+--

Re: [PR] [SPARK-50147][PYTHON][DOCS][TESTS] Refine docstring for trigonometric functions [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48678: URL: https://github.com/apache/spark/pull/48678#issuecomment-2442873708 Thank you, @zhengruifeng !

Re: [PR] [SPARK-50147][PYTHON][DOCS][TESTS] Refine docstring for trigonometric functions [spark]

2024-10-28 Thread via GitHub
zhengruifeng commented on PR #48678: URL: https://github.com/apache/spark/pull/48678#issuecomment-2442872614 @dongjoon-hyun thanks for the review, let me add recent PRs to [SPARK-44728](https://issues.apache.org/jira/browse/SPARK-44728)

Re: [PR] [SPARK-50149][INFRA] Update INFRA docker image to use `jammy-20240911.1` [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48682: URL: https://github.com/apache/spark/pull/48682#issuecomment-2442871970 Could you review this PR, @viirya ?

Re: [PR] [SPARK-50150][BUILD][3.5] Upgrade Jetty to 9.4.56.v20240826 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48684: URL: https://github.com/apache/spark/pull/48684#issuecomment-2442872264 Could you review this PR, @viirya ?

Re: [PR] [SPARK-49562][SQL] Add SQL pipe syntax for aggregation [spark]

2024-10-28 Thread via GitHub
dtenedor commented on PR #48529: URL: https://github.com/apache/spark/pull/48529#issuecomment-2442816057 friendly ping @cloud-fan @gengliangwang :)

Re: [PR] [SPARK-50117][BUILD][SS] Change to using `maven/sbt` plugin to generate `StateMessage.java` [spark]

2024-10-28 Thread via GitHub
HeartSaVioR commented on PR #48654: URL: https://github.com/apache/spark/pull/48654#issuecomment-2442803955 Would we want to also add StateMessage.java to gitignore so that we don't try to add the file?

Re: [PR] [SPARK-49513][SS] Add Support for timer in transformWithStateInPandas API [spark]

2024-10-28 Thread via GitHub
jingz-db commented on PR #47878: URL: https://github.com/apache/spark/pull/47878#issuecomment-2442791970 > @jingz-db Looks like linter is still failing - have you ensured that the master branch for your repo is up-to-date with the master branch for Apache repo? I do, I rebased on the

Re: [PR] [SPARK-50149][INFRA] Update INFRA docker image to use `jammy-20240911.1` [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48682: URL: https://github.com/apache/spark/pull/48682#issuecomment-2442683316 Could you review this PR, @attilapiros ?

Re: [PR] [SPARK-49513][SS] Add Support for timer in transformWithStateInPandas API [spark]

2024-10-28 Thread via GitHub
HeartSaVioR commented on PR #47878: URL: https://github.com/apache/spark/pull/47878#issuecomment-2442699945 @jingz-db Looks like linter is still failing - have you ensured that the master branch for your repo is up-to-date with the master branch for Apache repo?

Re: [PR] [SPARK-37178][ML] Add Target Encoding to ml.feature [spark]

2024-10-28 Thread via GitHub
rebo16v commented on PR #48347: URL: https://github.com/apache/spark/pull/48347#issuecomment-2442691411 @zhengruifeng

Re: [PR] [SPARK-37178][ML] Add Target Encoding to ml.feature [spark]

2024-10-28 Thread via GitHub
rebo16v commented on code in PR #48347: URL: https://github.com/apache/spark/pull/48347#discussion_r1819796924 ## mllib/src/test/scala/org/apache/spark/ml/feature/TargetEncoderSuite.scala: ## @@ -0,0 +1,538 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] [SPARK-37178][ML] Add Target Encoding to ml.feature [spark]

2024-10-28 Thread via GitHub
rebo16v commented on code in PR #48347: URL: https://github.com/apache/spark/pull/48347#discussion_r1819795209 ## mllib/src/main/scala/org/apache/spark/ml/feature/TargetEncoder.scala: ## @@ -0,0 +1,460 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] [SPARK-50150][BUILD][3.5] Upgrade Jetty to 9.4.56.v20240826 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48684: URL: https://github.com/apache/spark/pull/48684#issuecomment-2442683757 Could you review this PR, @attilapiros ?

Re: [PR] [SPARK-50124][SQL] LIMIT/OFFSET should preserve data ordering [spark]

2024-10-28 Thread via GitHub
bersprockets commented on code in PR #48661: URL: https://github.com/apache/spark/pull/48661#discussion_r1819799634 ## sql/core/src/test/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffsetSuite.scala: ## @@ -0,0 +1,120 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] [SPARK-49297][CORE][TESTS] Fix race condition in BlockManagerDecommissionIntegrationSuite [spark]

2024-10-28 Thread via GitHub
attilapiros commented on code in PR #48683: URL: https://github.com/apache/spark/pull/48683#discussion_r1819712363 ## core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala: ## @@ -377,20 +379,22 @@ class BlockManagerDecommissionIntegrationSu

Re: [PR] [SPARK-49676][SS][PYTHON] Add Support for Chaining of Operators in transformWithStateInPandas API [spark]

2024-10-28 Thread via GitHub
jingz-db commented on code in PR #48124: URL: https://github.com/apache/spark/pull/48124#discussion_r1819740895 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -425,6 +425,29 @@ class IncrementalExecution( even

Re: [PR] [SPARK-49676][SS][PYTHON] Add Support for Chaining of Operators in transformWithStateInPandas API [spark]

2024-10-28 Thread via GitHub
jingz-db commented on code in PR #48124: URL: https://github.com/apache/spark/pull/48124#discussion_r1819737427 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/TransformWithStateInPandasExec.scala: ## @@ -106,6 +107,37 @@ case class TransformWithStateInPandasExe

[PR] [SPARK-50151][SS][RocksDB Hardening] - Fix new file mapping version advancement and ineffective file reuse bug [spark]

2024-10-28 Thread via GitHub
micheal-o opened a new pull request, #48685: URL: https://github.com/apache/spark/pull/48685 ### What changes were proposed in this pull request? There are 2 bugs in the recently added new approach for RocksDB SST file mapping in this PR: https://github.com/apache/spark/pull/47875

[PR] [SPARK-49297][CORE][TESTS] Fix race condition in BlockManagerDecommissionIntegrationSuite [spark]

2024-10-28 Thread via GitHub
attilapiros opened a new pull request, #48683: URL: https://github.com/apache/spark/pull/48683 ### What changes were proposed in this pull request? Fixing race condition in the test "SPARK-46957: Migrated shuffle files should be able to cleanup from executor" of `BlockManagerDeco

Re: [PR] [SPARK-49297][CORE][TESTS] Fix race condition in BlockManagerDecommissionIntegrationSuite [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on code in PR #48683: URL: https://github.com/apache/spark/pull/48683#discussion_r1819597117 ## core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala: ## @@ -377,20 +379,22 @@ class BlockManagerDecommissionIntegration

[PR] [SPARK-50150][BUILD][3.5] Upgrade Jetty to 9.4.56.v20240826 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun opened a new pull request, #48684: URL: https://github.com/apache/spark/pull/48684 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-49562][SQL] Add SQL pipe syntax for aggregation [spark]

2024-10-28 Thread via GitHub
dtenedor commented on code in PR #48529: URL: https://github.com/apache/spark/pull/48529#discussion_r1819581672 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -765,7 +765,7 @@ temporalClause aggregationClause : GROUP BY groupingExpr

[PR] [SPARK-50149][INFRA] Update INFRA docker image to use `jammy-20240911.1` [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun opened a new pull request, #48682: URL: https://github.com/apache/spark/pull/48682 ### What changes were proposed in this pull request? This PR aims to update `infra` docker image to use `jammy-20240911.1` instead of `jammy-20240227`. ### Why are the changes neede

Re: [PR] [SPARK-49899][PYTHON][SS] Support deleteIfExists for TransformWithStateInPandas [spark]

2024-10-28 Thread via GitHub
jingz-db commented on code in PR #48373: URL: https://github.com/apache/spark/pull/48373#discussion_r1819559837 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/TransformWithStateInPandasExec.scala: ## @@ -106,6 +106,30 @@ case class TransformWithStateInPandasExe

Re: [PR] [SPARK-49899][PYTHON][SS] Support deleteIfExists for TransformWithStateInPandas [spark]

2024-10-28 Thread via GitHub
jingz-db commented on code in PR #48373: URL: https://github.com/apache/spark/pull/48373#discussion_r1819550379 ## python/pyspark/sql/pandas/group_ops.py: ## @@ -503,7 +503,11 @@ def transformWithStateUDF( statefulProcessorApiClient.set_implicit_key(key)

[PR] [WIP] Added Logistic Matrix Factorization(LMF) and Item2Vec models [spark]

2024-10-28 Thread via GitHub
ezamyatin opened a new pull request, #48681: URL: https://github.com/apache/spark/pull/48681 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How w

Re: [PR] [SPARK-50140][BUILD] Upgrade `zstd-jni` to 1.5.6-7 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48671: URL: https://github.com/apache/spark/pull/48671#issuecomment-2442249830 Let's wait for a few days until the announcement comes out. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-50147][PYTHON][DOCS][TESTS] Refine docstring for trigonometric functions [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48678: URL: https://github.com/apache/spark/pull/48678#issuecomment-2442247011 BTW, is this a subtask of SPARK-44728 , @zhengruifeng ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-50147][PYTHON][DOCS][TESTS] Refine docstring for trigonometric functions [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun closed pull request #48678: [SPARK-50147][PYTHON][DOCS][TESTS] Refine docstring for trigonometric functions URL: https://github.com/apache/spark/pull/48678 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-49707][BUILD] Upgrade rocksdbjni to 9.7.3 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48155: URL: https://github.com/apache/spark/pull/48155#issuecomment-2442143126 +1, LGTM. Thank you again, @panbingkun and @LuciferYang . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[PR] Bump rexml from 3.2.6 to 3.3.9 in /docs [spark]

2024-10-28 Thread via GitHub
dependabot[bot] opened a new pull request, #48680: URL: https://github.com/apache/spark/pull/48680 Bumps [rexml](https://github.com/ruby/rexml) from 3.2.6 to 3.3.9. Release notes Sourced from rexml's releases (https://github.com/ruby/rexml/releases). REXML 3.3.9 - 2024-10-24

Re: [PR] [SPARK-49637] Changed error message for INVALID_FRACTION_OF_SECOND [spark]

2024-10-28 Thread via GitHub
markonik-db commented on PR #48656: URL: https://github.com/apache/spark/pull/48656#issuecomment-2442011609 If CI passes, hopefully it can be merged today -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] [SPARK-50143][BUILD] Fix `protobuf` module Maven compilation [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48673: URL: https://github.com/apache/spark/pull/48673#issuecomment-2442008904 For the record, Maven CI became healthy again after this PR. - https://github.com/apache/spark/actions/workflows/build_maven.yml - https://github.com/apache/spark/actions/runs

Re: [PR] Bump rexml from 3.2.6 to 3.3.9 in /docs [spark]

2024-10-28 Thread via GitHub
dependabot[bot] commented on PR #48680: URL: https://github.com/apache/spark/pull/48680#issuecomment-2442001515 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let

Re: [PR] Bump rexml from 3.2.6 to 3.3.9 in /docs [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun closed pull request #48680: Bump rexml from 3.2.6 to 3.3.9 in /docs URL: https://github.com/apache/spark/pull/48680 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] [SPARK-50145][BUILD] Upgrade Jetty to 11.0.24 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun closed pull request #48674: [SPARK-50145][BUILD] Upgrade Jetty to 11.0.24 URL: https://github.com/apache/spark/pull/48674 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] [SPARK-50145][BUILD] Upgrade Jetty to 11.0.24 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48674: URL: https://github.com/apache/spark/pull/48674#issuecomment-2441985708 All tests passed. Merged to master for Apache Spark 4.0.0 in February 2025. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-50092][SQL] Fix PostgreSQL connector behaviour for multidimensional arrays [spark]

2024-10-28 Thread via GitHub
PetarVasiljevic-DB commented on code in PR #48625: URL: https://github.com/apache/spark/pull/48625#discussion_r1819282320 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4421,6 +4421,16 @@ object SQLConf { .booleanConf .createWith

Re: [PR] [SPARK-49993][SQL] Improve error messages for Sum and Average [spark]

2024-10-28 Thread via GitHub
MaxGekk commented on PR #48499: URL: https://github.com/apache/spark/pull/48499#issuecomment-2441867297 +1, LGTM. Merging to master. Thank you, @mihailom-db and @srielau @HyukjinKwon for review. -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [PR] [SPARK-49993][SQL] Improve error messages for Sum and Average [spark]

2024-10-28 Thread via GitHub
MaxGekk closed pull request #48499: [SPARK-49993][SQL] Improve error messages for Sum and Average URL: https://github.com/apache/spark/pull/48499 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-48139][CONNECT][TESTS] Try stabilising multi-thread tests in CI [spark]

2024-10-28 Thread via GitHub
xupefei commented on PR #48622: URL: https://github.com/apache/spark/pull/48622#issuecomment-2441785807 An irrelevant test is failing: ``` TypeError: Channel.unary_stream() got an unexpected keyword argument '_registered_method' [info] - python listener process: process terminat

Re: [PR] [SPARK-50143][BUILD] Fix `protobuf` module Maven compilation [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48673: URL: https://github.com/apache/spark/pull/48673#issuecomment-2441830338 Thank you, @panbingkun and @viirya . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-50145][BUILD] Upgrade Jetty to 11.0.24 [spark]

2024-10-28 Thread via GitHub
dongjoon-hyun commented on PR #48674: URL: https://github.com/apache/spark/pull/48674#issuecomment-2441828746 Thank you, @yaooqinn and @LuciferYang . To @LuciferYang , sure. I re-triggered it. -- This is an automated message from the Apache Git Service. To respond to the message, pl

Re: [PR] [SPARK-37178][ML] Add Target Encoding to ml.feature [spark]

2024-10-28 Thread via GitHub
rebo16v commented on code in PR #48347: URL: https://github.com/apache/spark/pull/48347#discussion_r1819002703 ## mllib/src/test/scala/org/apache/spark/ml/feature/TargetEncoderSuite.scala: ## @@ -0,0 +1,538 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] [SPARK-50071][SQL][PYTHON] Add try_make_timestamp(_ltz and _ntz) and related tests [spark]

2024-10-28 Thread via GitHub
markonik-db commented on code in PR #48624: URL: https://github.com/apache/spark/pull/48624#discussion_r1819177879 ## python/docs/source/reference/pyspark.sql/functions.rst: ## @@ -279,6 +279,9 @@ Date and Timestamp Functions make_timestamp make_timestamp_ltz make
