Re: [PR] [SPARK-48865][SQL] Add try_url_decode function [spark]

2024-07-17 Thread via GitHub
wForget commented on code in PR #47294: URL: https://github.com/apache/spark/pull/47294#discussion_r1682257760 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -106,6 +106,37 @@ case class UrlDecode(child: Expression) overri
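The snippet above is cut off, so the PR's actual implementation is not visible here. As a rough illustration of the `try_` idea in the PR title (decode a URL-encoded string, and yield a null value instead of an error on malformed input), here is a minimal Python sketch. The helper name `try_url_decode` mirrors the PR title but the body is an assumption written for this note, not code from the PR.

```python
import re
from typing import Optional
from urllib.parse import unquote_plus

# A '%' that is not followed by exactly two hex digits is a malformed escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def try_url_decode(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: decode `value`, returning None instead of raising."""
    if value is None:
        return None
    if _BAD_ESCAPE.search(value):
        # Malformed percent escape such as "%zz" or a trailing "%".
        return None
    try:
        # errors="strict" makes invalid UTF-8 byte sequences raise.
        return unquote_plus(value, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

The explicit escape check is there because Python's `unquote_plus` silently leaves malformed escapes in place, whereas the point of a `try_` variant is to signal failure with a null result.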

Re: [PR] [SPARK-48893][SQL][DOCS] Add some examples for `linearRegression` built-in functions [spark]

2024-07-17 Thread via GitHub
LuciferYang commented on PR #47343: URL: https://github.com/apache/spark/pull/47343#issuecomment-2235705909 Fine with me, but if you can help refine the Python doc for the corresponding function in builtin.py, that would be even better. -- This is an automated message from the Apache Git Se

Re: [PR] [ONLY TEST][SQL] Improve TPCDSCollationQueryTestSuite [spark]

2024-07-17 Thread via GitHub
LuciferYang commented on PR #47369: URL: https://github.com/apache/spark/pull/47369#issuecomment-2235697156 > > from the log we can see > > ``` > > Warning: [1888.608s][warning][gc,alloc] broadcast-exchange-907: Retried waiting for GCLocker too often allocating 33554434 words > > ``

Re: [PR] [ONLY TEST][SQL] Improve TPCDSCollationQueryTestSuite [spark]

2024-07-17 Thread via GitHub
panbingkun commented on PR #47369: URL: https://github.com/apache/spark/pull/47369#issuecomment-2235684142 I will continue to observe and consider aligning them if I encounter them again in the future.

Re: [PR] [ONLY TEST][SQL] Improve TPCDSCollationQueryTestSuite [spark]

2024-07-17 Thread via GitHub
panbingkun commented on PR #47369: URL: https://github.com/apache/spark/pull/47369#issuecomment-2235681312 > from the log we can see > > ``` > Warning: [1888.608s][warning][gc,alloc] broadcast-exchange-907: Retried waiting for GCLocker too often allocating 33554434 words > ```

Re: [PR] [SPARK-48821][SQL] Support Update in DataFrameWriterV2 [spark]

2024-07-17 Thread via GitHub
huaxingao commented on PR #47233: URL: https://github.com/apache/spark/pull/47233#issuecomment-2235545571 cc @cloud-fan

Re: [PR] [SPARK-48821][SQL] Support Update in DataFrameWriterV2 [spark]

2024-07-17 Thread via GitHub
huaxingao commented on PR #47233: URL: https://github.com/apache/spark/pull/47233#issuecomment-2235529448 I thought this over; maybe we should support write options. We can have something like this: ``` SQL: UPDATE table SET column_name = value WITH (key = value)

Re: [PR] [ONLY TEST][SQL] Improve TPCDSCollationQueryTestSuite [spark]

2024-07-17 Thread via GitHub
LuciferYang commented on PR #47369: URL: https://github.com/apache/spark/pull/47369#issuecomment-2235456348 from the log we can see ``` Warning: [1888.608s][warning][gc,alloc] broadcast-exchange-907: Retried waiting for GCLocker too often allocating 33554434 words ``` If

Re: [PR] [SPARK-48758][Core] Race condition between executor registration and heartbeat [spark]

2024-07-17 Thread via GitHub
LuciferYang commented on PR #47395: URL: https://github.com/apache/spark/pull/47395#issuecomment-2235355579 Can you update the PR title? The current title looks more like a Jira ticket name. For a PR title, it's better to clearly reflect what this PR does.

Re: [PR] [SPARK-48890][CORE][SS] Add Structured Streaming related fields to log4j ThreadContext [spark]

2024-07-17 Thread via GitHub
WweiL commented on code in PR #47340: URL: https://github.com/apache/spark/pull/47340#discussion_r1682123610 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala: ## @@ -405,6 +413,8 @@ abstract class StreamExecution( new QueryTerm

Re: [PR] [SPARK-48922][SQL] Optimize nested data type insertion performance [spark]

2024-07-17 Thread via GitHub
wForget commented on PR #47381: URL: https://github.com/apache/spark/pull/47381#issuecomment-2235349742 The previous behavior was introduced in #33728, @cloud-fan could you please take a look?

Re: [PR] [SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark [spark]

2024-07-17 Thread via GitHub
HyukjinKwon closed pull request #47145: [SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark URL: https://github.com/apache/spark/pull/47145

Re: [PR] [SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on PR #47145: URL: https://github.com/apache/spark/pull/47145#issuecomment-2235339596 Merged to master.

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
viirya commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1682117782 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -354,6 +351,14 @@ class Analyzer(override val catalogManager: CatalogManager

[PR] [MINOR][SQL][TESTS] Enable test case `testOrcAPI` in `JavaDataFrameReaderWriterSuite` [spark]

2024-07-17 Thread via GitHub
LuciferYang opened a new pull request, #47400: URL: https://github.com/apache/spark/pull/47400 ### What changes were proposed in this pull request? This PR enables test case `testOrcAPI` in `JavaDataFrameReaderWriterSuite`. Because this test no longer depends on Hive classes, we can test i

Re: [PR] [SPARK-48915][SQL][TESTS][FOLLOWUP] Add some uncovered predicates(!=, <, <=, >, >=) for correlation in `GeneratedSubquerySuite` [spark]

2024-07-17 Thread via GitHub
wayneguow commented on PR #47399: URL: https://github.com/apache/spark/pull/47399#issuecomment-2235331344 cc @cloud-fan @andylam-db. I have some concerns that adding the predicate axis for correlation will significantly increase the test time of `GeneratedSubquerySuite`.

Re: [PR] [SPARK-48934][SS] Python datetime types converted incorrectly for setting timeout in applyInPandasWithState [spark]

2024-07-17 Thread via GitHub
HeartSaVioR commented on PR #47398: URL: https://github.com/apache/spark/pull/47398#issuecomment-2235319405 ``` starting flake8 test... flake8 checks failed: ./python/pyspark/sql/streaming/state.py:21:1: F401 'pyspark.sql.types.DateType' imported but unused from pyspark.sql.types

Re: [PR] [SPARK-48890][CORE][SS] Add Structured Streaming related fields to log4j ThreadContext [spark]

2024-07-17 Thread via GitHub
gengliangwang commented on code in PR #47340: URL: https://github.com/apache/spark/pull/47340#discussion_r1682095372 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala: ## @@ -405,6 +413,8 @@ abstract class StreamExecution( new Q

Re: [PR] [SPARK-48890][CORE][SS] Add Structured Streaming related fields to log4j ThreadContext [spark]

2024-07-17 Thread via GitHub
gengliangwang commented on PR #47340: URL: https://github.com/apache/spark/pull/47340#issuecomment-2235298861 Thanks, merging to master

Re: [PR] [SPARK-48623][CORE] Migrate FileAppender logs to structured logging [spark]

2024-07-17 Thread via GitHub
gengliangwang closed pull request #47394: [SPARK-48623][CORE] Migrate FileAppender logs to structured logging URL: https://github.com/apache/spark/pull/47394

[PR] [SPARK-48915][SQL][TESTS][FOLLOWUP] Add some uncovered predicates(!=, <, <=, >, >=) for correlation in `GeneratedSubquerySuite` [spark]

2024-07-17 Thread via GitHub
wayneguow opened a new pull request, #47399: URL: https://github.com/apache/spark/pull/47399 ### What changes were proposed in this pull request? In PR #47386, we improved coverage of predicate types of scalar subquery in the WHERE clause. As a follow-up, this PR aims to add

Re: [PR] [SPARK-48623][CORE] Migrate FileAppender logs to structured logging [spark]

2024-07-17 Thread via GitHub
gengliangwang commented on PR #47394: URL: https://github.com/apache/spark/pull/47394#issuecomment-2235291750 Thanks, merging to master

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1682050844 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -354,6 +351,14 @@ class Analyzer(override val catalogManager: Catalog

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1682050254 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -354,6 +351,10 @@ class Analyzer(override val catalogManager: Catalog

Re: [PR] [SPARK-36680][SQL][FOLLOWUP] Files with options should be put into resolveDataSource function [spark]

2024-07-17 Thread via GitHub
logze commented on PR #47370: URL: https://github.com/apache/spark/pull/47370#issuecomment-2235185933 @cloud-fan Thank you very much. I modified the code according to your suggestion; can you take a look at it again?

[PR] [SPARK-48934][SS] Python datetime types converted incorrectly for setting timeout in applyInPandasWithState [spark]

2024-07-17 Thread via GitHub
siying opened a new pull request, #47398: URL: https://github.com/apache/spark/pull/47398 ### What changes were proposed in this pull request? Fix the way applyInPandasWithState's setTimeoutTimestamp() handles datetime arguments. ### Why are the changes needed? In applyInPa
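The description above is truncated, so the exact bug and fix are not visible here. As a general illustration of the kind of conversion involved when a timeout timestamp may be given either as an integer or as a Python `datetime`, here is a small sketch. The helper name `to_epoch_millis` and the choice to treat naive datetimes as UTC are assumptions made for this example, not details taken from the PR.

```python
import datetime
from typing import Union

_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_epoch_millis(ts: Union[int, datetime.datetime]) -> int:
    """Hypothetical helper: normalize a timeout timestamp to epoch milliseconds."""
    if isinstance(ts, datetime.datetime):
        if ts.tzinfo is None:
            # Assumption for this sketch: naive datetimes are treated as UTC.
            ts = ts.replace(tzinfo=datetime.timezone.utc)
        # Integer division of timedeltas avoids floating-point rounding.
        return (ts - _EPOCH) // datetime.timedelta(milliseconds=1)
    if isinstance(ts, bool) or not isinstance(ts, int):
        raise TypeError(f"expected int or datetime, got {type(ts).__name__}")
    return ts
```

Working with `timedelta` arithmetic rather than `timestamp() * 1000` keeps the result an exact integer, which matters when the value is compared against millisecond timestamps elsewhere.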

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
viirya commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1682011249 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -354,6 +351,10 @@ class Analyzer(override val catalogManager: CatalogManager

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
cloud-fan commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1682009509 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -354,6 +351,10 @@ class Analyzer(override val catalogManager: CatalogMana

Re: [PR] [SPARK-48900] Add `reason` field for `cancelJobGroup` and `cancelJobsWithTag` [spark]

2024-07-17 Thread via GitHub
cloud-fan closed pull request #47361: [SPARK-48900] Add `reason` field for `cancelJobGroup` and `cancelJobsWithTag` URL: https://github.com/apache/spark/pull/47361

Re: [PR] [SPARK-48900] Add `reason` field for `cancelJobGroup` and `cancelJobsWithTag` [spark]

2024-07-17 Thread via GitHub
cloud-fan commented on PR #47361: URL: https://github.com/apache/spark/pull/47361#issuecomment-2235170659 thanks, merging to master!

Re: [PR] [SPARK-48388][SQL] Fix SET statement behavior for SQL Scripts [spark]

2024-07-17 Thread via GitHub
cloud-fan commented on code in PR #47272: URL: https://github.com/apache/spark/pull/47272#discussion_r1682004920 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -251,26 +258,29 @@ statement | (MSCK)? REPAIR TABLE identifierReference

Re: [PR] [SPARK-48915][SQL][TESTS] Add some uncovered predicates(!=, <=, >, >=) in test cases of `GeneratedSubquerySuite` [spark]

2024-07-17 Thread via GitHub
cloud-fan closed pull request #47386: [SPARK-48915][SQL][TESTS] Add some uncovered predicates(!=, <=, >, >=) in test cases of `GeneratedSubquerySuite` URL: https://github.com/apache/spark/pull/47386

Re: [PR] [SPARK-48915][SQL][TESTS] Add some uncovered predicates(!=, <=, >, >=) in test cases of `GeneratedSubquerySuite` [spark]

2024-07-17 Thread via GitHub
cloud-fan commented on PR #47386: URL: https://github.com/apache/spark/pull/47386#issuecomment-2235161996 thanks, merging to master!

Re: [PR] [SPARK-48510][CONNECT][FOLLOW-UP-MK2] Fix for UDAF `toColumn` API when running tests in Maven [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on PR #47387: URL: https://github.com/apache/spark/pull/47387#issuecomment-2235159963 oops

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
viirya commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1681999792 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -327,7 +327,6 @@ class Analyzer(override val catalogManager: CatalogManager)

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
cloud-fan commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1681992831 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -354,6 +353,8 @@ class Analyzer(override val catalogManager: CatalogManage

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
cloud-fan commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1681992188 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -327,7 +327,6 @@ class Analyzer(override val catalogManager: CatalogManage

Re: [PR] [SPARK-48510][CONNECT][FOLLOW-UP-MK2] Fix for UDAF `toColumn` API when running tests in Maven [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun commented on PR #47387: URL: https://github.com/apache/spark/pull/47387#issuecomment-2235144480 Ur, it seems that `-MK2` is merged as a commit, @HyukjinKwon . :)

Re: [PR] [SPARK-48932][BUILD] Upgrade `commons-lang3` to 3.15.0 [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun commented on PR #47396: URL: https://github.com/apache/spark/pull/47396#issuecomment-2235132492 Merged to master for Apache Spark 4.0.0-preview2.

Re: [PR] [SPARK-48932][BUILD] Upgrade `commons-lang3` to 3.15.0 [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun closed pull request #47396: [SPARK-48932][BUILD] Upgrade `commons-lang3` to 3.15.0 URL: https://github.com/apache/spark/pull/47396

Re: [PR] [SPARK-48932][BUILD] Upgrade `commons-lang3` to 3.15.0 [spark]

2024-07-17 Thread via GitHub
panbingkun commented on PR #47396: URL: https://github.com/apache/spark/pull/47396#issuecomment-2235120208 cc @dongjoon-hyun @LuciferYang

Re: [PR] [SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark [spark]

2024-07-17 Thread via GitHub
itholic commented on PR #47145: URL: https://github.com/apache/spark/pull/47145#issuecomment-2235085479 Thanks @HyukjinKwon just applied the comments

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
viirya commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1681968217 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -352,8 +353,6 @@ class Analyzer(override val catalogManager: CatalogManager)

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
cloud-fan commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1681966247 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -352,8 +353,6 @@ class Analyzer(override val catalogManager: CatalogManage

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
cloud-fan commented on code in PR #47380: URL: https://github.com/apache/spark/pull/47380#discussion_r1681965976 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -352,8 +353,6 @@ class Analyzer(override val catalogManager: CatalogManage

Re: [PR] [SPARK-48821][SQL] Support Update in DataFrameWriterV2 [spark]

2024-07-17 Thread via GitHub
szehon-ho commented on PR #47233: URL: https://github.com/apache/spark/pull/47233#issuecomment-2235056181 OK, thanks for checking. I had that originally; I can change it back. I guess we won't support something like this then (write options)? ``` spark.table("source") .upd

[PR] [SPARK-48933][BUILD] Upgrade `protobuf-java` to `3.25.3` [spark]

2024-07-17 Thread via GitHub
panbingkun opened a new pull request, #47397: URL: https://github.com/apache/spark/pull/47397 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48921][SQL] ScalaUDF encoders in subquery should be resolved for MergeInto [spark]

2024-07-17 Thread via GitHub
viirya commented on PR #47380: URL: https://github.com/apache/spark/pull/47380#issuecomment-223554 @cloud-fan I changed the rule order of `ResolveEncodersInUDF`. The unit test is updated too.

Re: [PR] [SPARK-48623][CORE] Migrate FileAppender logs to structured logging [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on PR #47394: URL: https://github.com/apache/spark/pull/47394#issuecomment-2234948418 cc @gengliangwang

Re: [PR] [SPARK-48926][SQL][TESTS] Use `checkError` method to optimize exception check logic related to `UNRESOLVED_COLUMN` error classes [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on PR #47389: URL: https://github.com/apache/spark/pull/47389#issuecomment-2234946401 Merged to master.

Re: [PR] [SPARK-48926][SQL][TESTS] Use `checkError` method to optimize exception check logic related to `UNRESOLVED_COLUMN` error classes [spark]

2024-07-17 Thread via GitHub
HyukjinKwon closed pull request #47389: [SPARK-48926][SQL][TESTS] Use `checkError` method to optimize exception check logic related to `UNRESOLVED_COLUMN` error classes URL: https://github.com/apache/spark/pull/47389

Re: [PR] [SPARK-48628][CORE] Add task peak on/off heap memory metrics [spark]

2024-07-17 Thread via GitHub
liuzqt commented on code in PR #47192: URL: https://github.com/apache/spark/pull/47192#discussion_r1681934753 ## core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java: ## @@ -202,6 +226,18 @@ public long acquireExecutionMemory(long required, MemoryConsumer requestin

Re: [PR] [SPARK-48510][CONNECT][FOLLOW-UP-MK2] Fix for UDAF `toColumn` API when running tests in Maven [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on PR #47387: URL: https://github.com/apache/spark/pull/47387#issuecomment-2234945852 Merged to master.

Re: [PR] [SPARK-48510][CONNECT][FOLLOW-UP-MK2] Fix for UDAF `toColumn` API when running tests in Maven [spark]

2024-07-17 Thread via GitHub
HyukjinKwon closed pull request #47387: [SPARK-48510][CONNECT][FOLLOW-UP-MK2] Fix for UDAF `toColumn` API when running tests in Maven URL: https://github.com/apache/spark/pull/47387

Re: [PR] [SPARK-48910][SQL] Use HashSet/HashMap to avoid linear searches in PreprocessTableCreation [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on PR #47384: URL: https://github.com/apache/spark/pull/47384#issuecomment-2234944051 Can you update the `master` branch in your fork to the latest? It seems not related to your PR.

Re: [PR] [SPARK-48924][PS] Add a pandas-like `make_interval` helper function [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on PR #47385: URL: https://github.com/apache/spark/pull/47385#issuecomment-2234942872 Merged to master.

Re: [PR] [SPARK-48924][PS] Add a pandas-like `make_interval` helper function [spark]

2024-07-17 Thread via GitHub
HyukjinKwon closed pull request #47385: [SPARK-48924][PS] Add a pandas-like `make_interval` helper function URL: https://github.com/apache/spark/pull/47385

Re: [PR] [SPARK-48875][COMMON] Variant support quick exit in getFieldByKey [spark]

2024-07-17 Thread via GitHub
xuzifu666 commented on PR #47314: URL: https://github.com/apache/spark/pull/47314#issuecomment-2234942195 OK close it first. @cloud-fan @yaooqinn @chenhao-db @gene-db

Re: [PR] [SPARK-48875][COMMON] Variant support quick exit in getFieldByKey [spark]

2024-07-17 Thread via GitHub
xuzifu666 closed pull request #47314: [SPARK-48875][COMMON] Variant support quick exit in getFieldByKey URL: https://github.com/apache/spark/pull/47314

Re: [PR] [SPARK-48926][SQL][TESTS] Use `checkError` method to optimize exception check logic related to `UNRESOLVED_COLUMN` error classes [spark]

2024-07-17 Thread via GitHub
wayneguow commented on PR #47389: URL: https://github.com/apache/spark/pull/47389#issuecomment-2234918570 @allisonwang-db Thank you, I changed it to `TESTS`.

Re: [PR] [SPARK-48924][PS] Add a pandas-like `make_interval` helper function [spark]

2024-07-17 Thread via GitHub
zhengruifeng commented on code in PR #47385: URL: https://github.com/apache/spark/pull/47385#discussion_r1681920226 ## python/pyspark/pandas/resample.py: ## @@ -130,19 +130,6 @@ def _resamplekey_type(self) -> DataType: def _agg_columns_scols(self) -> List[Column]:

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-17 Thread via GitHub
chaoqin-li1123 commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1681920047 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2129,6 +2129,13 @@ object SQLConf { .intConf .createWithDefault(100)

Re: [PR] [SPARK-47649][SQL] Make the parameter `inputs` of the function `[csv|parquet|orc|json|text|xml](paths: String*)` non empty [spark]

2024-07-17 Thread via GitHub
github-actions[bot] closed pull request #45776: [SPARK-47649][SQL] Make the parameter `inputs` of the function `[csv|parquet|orc|json|text|xml](paths: String*)` non empty URL: https://github.com/apache/spark/pull/45776

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-17 Thread via GitHub
chaoqin-li1123 commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1681918789 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreConf.scala: ## @@ -41,6 +41,13 @@ class StateStoreConf( /** Minimum versio

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1681915902 ## python/pyspark/sql/streaming/state_api_client.py: ## @@ -0,0 +1,142 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor li

Re: [PR] [SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark [spark]

2024-07-17 Thread via GitHub
itholic commented on code in PR #47145: URL: https://github.com/apache/spark/pull/47145#discussion_r1681915720 ## python/pyspark/logger/tests/test_logger.py: ## @@ -0,0 +1,128 @@ +# -*- encoding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1681915435 ## python/pyspark/sql/pandas/group_ops.py: ## @@ -33,6 +36,7 @@ PandasCogroupedMapFunction, ArrowGroupedMapFunction, ArrowCogroupedMapFun

[PR] [SPARK-48932][BUILD] Upgrade `commons-lang3` to 3.15.0 [spark]

2024-07-17 Thread via GitHub
panbingkun opened a new pull request, #47396: URL: https://github.com/apache/spark/pull/47396 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on code in PR #47145: URL: https://github.com/apache/spark/pull/47145#discussion_r1681906748 ## python/pyspark/logger/logger.py: ## @@ -0,0 +1,156 @@ +# -*- encoding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# cont

Re: [PR] [SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on code in PR #47145: URL: https://github.com/apache/spark/pull/47145#discussion_r1681906498 ## python/pyspark/logger/logger.py: ## @@ -0,0 +1,156 @@ +# -*- encoding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# cont

Re: [PR] [SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on code in PR #47145: URL: https://github.com/apache/spark/pull/47145#discussion_r1681906279 ## python/pyspark/logger/tests/test_logger.py: ## @@ -0,0 +1,128 @@ +# -*- encoding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] [SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on code in PR #47145: URL: https://github.com/apache/spark/pull/47145#discussion_r1681903197 ## python/pyspark/logger/tests/connect/__init__.py: ## @@ -0,0 +1,16 @@ +# Review Comment: this should also be added into `setup.py` in connect so the tests can

Re: [PR] [SPARK-48752][PYTHON][CONNECT][DOCS] Introduce `pyspark.logger` for improved structured logging for PySpark [spark]

2024-07-17 Thread via GitHub
HyukjinKwon commented on code in PR #47145: URL: https://github.com/apache/spark/pull/47145#discussion_r1681902745 ## python/pyspark/logger/__init__.py: ## @@ -0,0 +1,27 @@ +# Review Comment: Add it to `setup.py` in both connect and classic

Re: [PR] [SPARK-48510][CONNECT][FOLLOW-UP-MK2] Fix for UDAF `toColumn` API when running tests in Maven [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun commented on PR #47387: URL: https://github.com/apache/spark/pull/47387#issuecomment-2234622712 To @xupefei , please revise the PR title like the following. ``` - [SPARK-48510][CONNECT][FOLLOW-UP-MK2] Fix for UDAF toColumn API when running tests in Maven + [SPARK-48510

[PR] [SPARK-48758][Core] Race condition between executor registration and heartbeat [spark]

2024-07-17 Thread via GitHub
miaoever opened a new pull request, #47395: URL: https://github.com/apache/spark/pull/47395 ### What changes were proposed in this pull request? We found a race condition in our prod jobs: the executor finishes registration, but when it starts to heartbeat, the driver tells the executo

Re: [PR] [SPARK-48510][CONNECT][FOLLOW-UP] Fix for UDAF `toColumn` API when running tests in Maven [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun commented on PR #47368: URL: https://github.com/apache/spark/pull/47368#issuecomment-2234618393 Oh, is this still failing with the different error, @xupefei ? ``` - UDAF custom Aggregator - toColumn *** FAILED *** java.lang.NullPointerException: Cannot invoke "java.l

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-17 Thread via GitHub
Kimahriman commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1681869279 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2129,6 +2129,13 @@ object SQLConf { .intConf .createWithDefault(100) +

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-17 Thread via GitHub
chaoqin-li1123 commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1681866435 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -185,6 +185,8 @@ class RocksDB( val latestSnapshotVersion

Re: [PR] [SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun commented on PR #47392: URL: https://github.com/apache/spark/pull/47392#issuecomment-2234556165 Merged to all live release branches.

Re: [PR] [SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun closed pull request #47392: [SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern URL: https://github.com/apache/spark/pull/47392

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-17 Thread via GitHub
riyaverm-db commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1681860880 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -185,6 +185,8 @@ class RocksDB( val latestSnapshotVersion =

Re: [PR] [SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern [spark]

2024-07-17 Thread via GitHub
viirya commented on PR #47392: URL: https://github.com/apache/spark/pull/47392#issuecomment-2234544717 Thanks @dongjoon-hyun

Re: [PR] [SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun commented on PR #47392: URL: https://github.com/apache/spark/pull/47392#issuecomment-2234542866 Thank you, @viirya!

Re: [PR] [SPARK-48930][CORE] Redact `awsAccessKeyId` by including `accesskey` pattern [spark]

2024-07-17 Thread via GitHub
dongjoon-hyun commented on PR #47392: URL: https://github.com/apache/spark/pull/47392#issuecomment-2234533027 Could you review this PR, @viirya?

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-17 Thread via GitHub
chaoqin-li1123 commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1681857191 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2129,6 +2129,13 @@ object SQLConf { .intConf .createWithDefault(100)

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-17 Thread via GitHub
Kimahriman commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1681854029 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2129,6 +2129,13 @@ object SQLConf { .intConf .createWithDefault(100) +

[PR] migrate file appender [spark]

2024-07-17 Thread via GitHub
asl3 opened a new pull request, #47394: URL: https://github.com/apache/spark/pull/47394 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was th

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-17 Thread via GitHub
chaoqin-li1123 commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1681834923 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreConf.scala: ## @@ -41,6 +41,9 @@ class StateStoreConf( /** Minimum version

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-17 Thread via GitHub
chaoqin-li1123 commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1681834138 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala: ## @@ -451,6 +492,8 @@ class RocksDBFileManager( .filt

Re: [PR] [SPARK-48931][SS] Reduce Cloud Store List API cost for state store maintenance task [spark]

2024-07-17 Thread via GitHub
chaoqin-li1123 commented on code in PR #47393: URL: https://github.com/apache/spark/pull/47393#discussion_r1681833016 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -185,6 +185,8 @@ class RocksDB( val latestSnapshotVersion

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-17 Thread via GitHub
jiangzho commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1681826494 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/SparkOperator.java: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-17 Thread via GitHub
jiangzho commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1681824226 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/utils/StatusRecorder.java: ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-17 Thread via GitHub
jiangzho commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1681816265 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsSystem.java: ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software F

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-17 Thread via GitHub
jiangzho commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1681815897 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/metrics/MetricsService.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software F

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-17 Thread via GitHub
jiangzho commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1681814845 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-17 Thread via GitHub
jiangzho commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1681813251 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-17 Thread via GitHub
jiangzho commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1681812834 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-17 Thread via GitHub
jiangzho commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1681811959 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-17 Thread via GitHub
jiangzho commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1681811410 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] [SPARK-48382]Add controller / reconciler module to operator [spark-kubernetes-operator]

2024-07-17 Thread via GitHub
jiangzho commented on code in PR #12: URL: https://github.com/apache/spark-kubernetes-operator/pull/12#discussion_r1681808777 ## spark-operator/src/main/java/org/apache/spark/k8s/operator/config/SparkOperatorConf.java: ## @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache Softwar
