Re: [PR] [SPARK-50792][SQL] Format binary data as a binary literal in JDBC. [spark]

2025-01-13 Thread via GitHub
beliefer commented on code in PR #49452: URL: https://github.com/apache/spark/pull/49452#discussion_r1913047022 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -986,4 +986,18 @@ private[v2] trait V2JDBCTest extends Shared

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, SMALLINT, INTEGER, BIGINT and BLOB types [spark]

2025-01-13 Thread via GitHub
sunxiaoguang commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1912728676 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -241,6 +241,37 @@ class MySQLIntegrationSuite

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, SMALLINT, INTEGER, BIGINT and BLOB types [spark]

2025-01-13 Thread via GitHub
sunxiaoguang commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1912727149 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala: ## @@ -259,6 +272,8 @@ private case class MySQLDialect() extends JdbcDialect with SQLConf

Re: [PR] [SPARK-50792][SQL] Format binary data as a binary literal in JDBC. [spark]

2025-01-13 Thread via GitHub
beliefer commented on code in PR #49452: URL: https://github.com/apache/spark/pull/49452#discussion_r1913055894 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -986,4 +986,18 @@ private[v2] trait V2JDBCTest extends Shared

Re: [PR] [SPARK-50792][SQL] Format binary data as a binary literal in JDBC. [spark]

2025-01-13 Thread via GitHub
beliefer commented on code in PR #49452: URL: https://github.com/apache/spark/pull/49452#discussion_r1913058001 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ## @@ -61,6 +61,34 @@ private case class OracleDialect() extends JdbcDialect with SQLConfHel

Re: [PR] [SPARK-50792][SQL] Format binary data as a binary literal in JDBC. [spark]

2025-01-13 Thread via GitHub
beliefer commented on code in PR #49452: URL: https://github.com/apache/spark/pull/49452#discussion_r1912899051 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -986,4 +986,23 @@ private[v2] trait V2JDBCTest extends Shared

Re: [PR] [SPARK-50792][SQL] Format binary data as a binary literal in JDBC. [spark]

2025-01-13 Thread via GitHub
sunxiaoguang commented on code in PR #49452: URL: https://github.com/apache/spark/pull/49452#discussion_r1913063805 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ## @@ -61,6 +61,34 @@ private case class OracleDialect() extends JdbcDialect with SQLCon

Re: [PR] [SPARK-50792][SQL] Format binary data as a binary literal in JDBC. [spark]

2025-01-13 Thread via GitHub
sunxiaoguang commented on code in PR #49452: URL: https://github.com/apache/spark/pull/49452#discussion_r1913065220 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ## @@ -61,6 +61,34 @@ private case class OracleDialect() extends JdbcDialect with SQLCon

Re: [PR] [SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests [spark]

2025-01-13 Thread via GitHub
changgyoopark-db commented on code in PR #49370: URL: https://github.com/apache/spark/pull/49370#discussion_r1913157644 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala: ## @@ -319,7 +325,14 @@ private[connect] class Exe

Re: [PR] [SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests [spark]

2025-01-13 Thread via GitHub
juliuszsompolski commented on code in PR #49370: URL: https://github.com/apache/spark/pull/49370#discussion_r1913334809 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala: ## @@ -319,7 +325,14 @@ private[connect] class Exe

Re: [PR] [SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests [spark]

2025-01-13 Thread via GitHub
changgyoopark-db commented on code in PR #49370: URL: https://github.com/apache/spark/pull/49370#discussion_r1913342621 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala: ## @@ -319,7 +325,14 @@ private[connect] class Exe

Re: [PR] [SPARK-50790][PYTHON] Implement parse json in pyspark [spark]

2025-01-13 Thread via GitHub
gene-db commented on code in PR #49450: URL: https://github.com/apache/spark/pull/49450#discussion_r1913982782 ## python/pyspark/sql/types.py: ## @@ -1770,6 +1770,15 @@ def toJson(self, zone_id: str = "UTC") -> str: """ return VariantUtils.to_json(self.value, s

Re: [PR] [SPARK-50800][PYTHON][TESTS] Upgrade python to 3.11 in Python Packaging test [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on PR #49464: URL: https://github.com/apache/spark/pull/49464#issuecomment-2588538702 According to the Apache Spark release process, `Python 3.9` packaging should be used always and we can test it against all supported Python versions. It's because we don't exp

Re: [PR] [SPARK-50800][PYTHON][TESTS] Upgrade python to 3.11 in Python Packaging test [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on PR #49464: URL: https://github.com/apache/spark/pull/49464#issuecomment-2588537494 We actually have scheduled builds for all supported versions ... If we need to test one specific thing with the one specific Python version, then we should think about which ver

Re: [PR] [SPARK-50392][PYTHON][FOLLOWUP] Move `import`s into methods to fix `connect-only` builds [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun closed pull request #49472: [SPARK-50392][PYTHON][FOLLOWUP] Move `import`s into methods to fix `connect-only` builds URL: https://github.com/apache/spark/pull/49472 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [PR] [SPARK-48745][INFRA][PYTHON][TESTS][FOLLOWUP] use `conda-incubator/setup-miniconda` action [spark]

2025-01-13 Thread via GitHub
HyukjinKwon closed pull request #49465: [SPARK-48745][INFRA][PYTHON][TESTS][FOLLOWUP] use `conda-incubator/setup-miniconda` action URL: https://github.com/apache/spark/pull/49465

Re: [PR] [SPARK-48745][INFRA][PYTHON][TESTS][FOLLOWUP] use `conda-incubator/setup-miniconda` action [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on PR #49465: URL: https://github.com/apache/spark/pull/49465#issuecomment-2588533292 Merged to master.

Re: [PR] [SPARK-50790][PYTHON] Implement parse json in pyspark [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on code in PR #49450: URL: https://github.com/apache/spark/pull/49450#discussion_r1913984184 ## python/pyspark/sql/types.py: ## @@ -1770,6 +1770,15 @@ def toJson(self, zone_id: str = "UTC") -> str: """ return VariantUtils.to_json(self.valu

Re: [PR] [SPARK-50800][PYTHON][TESTS] Upgrade python to 3.11 in Python Packaging test [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on PR #49464: URL: https://github.com/apache/spark/pull/49464#issuecomment-2588497845 I think we should spend some time to pick which versions to use in the CI.

Re: [PR] [SPARK-50392][PYTHON][FOLLOWUP] Move `import`s into methods to fix `connect-only` builds [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on PR #49472: URL: https://github.com/apache/spark/pull/49472#issuecomment-2588506306 Merged to master. Thank you!

Re: [PR] [SPARK-50800][PYTHON][TESTS] Upgrade python to 3.11 in Python Packaging test [spark]

2025-01-13 Thread via GitHub
zhengruifeng commented on PR #49464: URL: https://github.com/apache/spark/pull/49464#issuecomment-2588535715 @dongjoon-hyun @HyukjinKwon It is a good question, I think probably we should test it against all supported versions, WDYT?

[PR] [SPARK-47081][CONNECT][FOLLOW-UP] Respect `spark.connect.progress.reportInterval` over timeout [spark]

2025-01-13 Thread via GitHub
HyukjinKwon opened a new pull request, #49474: URL: https://github.com/apache/spark/pull/49474 ### What changes were proposed in this pull request? This PR is a followup that addresses https://github.com/apache/spark/pull/45150#discussion_r1913310090 ### Why are the changes nee

Re: [PR] [SPARK-48745][INFRA][PYTHON][TESTS][FOLLOWUP] use `conda-incubator/setup-miniconda` action [spark]

2025-01-13 Thread via GitHub
panbingkun commented on PR #49465: URL: https://github.com/apache/spark/pull/49465#issuecomment-2588543652 Thanks all.

Re: [PR] [SPARK-50799][PYTHON] Refine the docstring of rlike, length, octet_length, bit_length, and transform [spark]

2025-01-13 Thread via GitHub
drexler-sky commented on code in PR #49463: URL: https://github.com/apache/spark/pull/49463#discussion_r1913998280 ## python/pyspark/sql/functions/builtin.py: ## @@ -15177,9 +15177,9 @@ def rlike(str: "ColumnOrName", regexp: "ColumnOrName") -> Column: Parameters ---

Re: [PR] [SPARK-50601][SQL] Support withColumns / withColumnsRenamed in subqueries [spark]

2025-01-13 Thread via GitHub
ueshin closed pull request #49386: [SPARK-50601][SQL] Support withColumns / withColumnsRenamed in subqueries URL: https://github.com/apache/spark/pull/49386

Re: [PR] [SPARK-47081][CONNECT][FOLLOW-UP] Respect `spark.connect.progress.reportInterval` over timeout [spark]

2025-01-13 Thread via GitHub
HyukjinKwon closed pull request #49474: [SPARK-47081][CONNECT][FOLLOW-UP] Respect `spark.connect.progress.reportInterval` over timeout URL: https://github.com/apache/spark/pull/49474

Re: [PR] [SPARK-47081][CONNECT][FOLLOW-UP] Respect `spark.connect.progress.reportInterval` over timeout [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on PR #49474: URL: https://github.com/apache/spark/pull/49474#issuecomment-2588713451 Merged to master.

Re: [PR] [SPARK-50633][FOLLOWUP] Set `CODECOV_TOKEN` with environment variables [spark]

2025-01-13 Thread via GitHub
panbingkun commented on PR #49477: URL: https://github.com/apache/spark/pull/49477#issuecomment-2588811574 @zhengruifeng @LuciferYang @dongjoon-hyun @HyukjinKwon Could you take a quick look? I want to merge it quickly and verify that this approach is feasible. Thanks!

Re: [PR] [SPARK-50633][FOLLOWUP] Set `CODECOV_TOKEN` with environment variables [spark]

2025-01-13 Thread via GitHub
panbingkun commented on PR #49477: URL: https://github.com/apache/spark/pull/49477#issuecomment-2588818307 Merging to master! Let me trigger a new job now.

Re: [PR] [SPARK-50633][FOLLOWUP] Set `CODECOV_TOKEN` with environment variables [spark]

2025-01-13 Thread via GitHub
panbingkun commented on PR #49477: URL: https://github.com/apache/spark/pull/49477#issuecomment-2588813894 > +1 for the try, @panbingkun . Thanks! Let me try it out.

Re: [PR] [SPARK-50633][FOLLOWUP] Set `CODECOV_TOKEN` with environment variables [spark]

2025-01-13 Thread via GitHub
panbingkun closed pull request #49477: [SPARK-50633][FOLLOWUP] Set `CODECOV_TOKEN` with environment variables URL: https://github.com/apache/spark/pull/49477

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2025-01-13 Thread via GitHub
raman-sauko commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-2588250750 We were so close :)

[PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table UDFs [spark]

2025-01-13 Thread via GitHub
allisonwang-db opened a new pull request, #49471: URL: https://github.com/apache/spark/pull/49471 ### What changes were proposed in this pull request? This PR adds an Analyzer rule to resolve SQL user-defined table functions. ### Why are the changes needed?

Re: [PR] [SPARK-50773][Core] Disable structured logging by default [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on PR #49421: URL: https://github.com/apache/spark/pull/49421#issuecomment-2588517924 For the record, the vote passed already. - https://lists.apache.org/thread/jhxrkqs3d1f32d6jyw86kd9qomjdjtzv

Re: [PR] [SPARK-50773][Core] Disable structured logging by default [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun closed pull request #49421: [SPARK-50773][Core] Disable structured logging by default URL: https://github.com/apache/spark/pull/49421

Re: [PR] [SPARK-50773][Core] Disable structured logging by default [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on PR #49421: URL: https://github.com/apache/spark/pull/49421#issuecomment-2588516389 The last failure seems to be outdated result after merging #49369 . https://github.com/user-attachments/assets/d66b5436-948e-4e70-9613-593bc7fbfc4a Let me merge t

Re: [PR] [SPARK-48745][INFRA][PYTHON][TESTS][FOLLOWUP] use `conda-incubator/setup-miniconda` action [spark]

2025-01-13 Thread via GitHub
panbingkun commented on code in PR #49465: URL: https://github.com/apache/spark/pull/49465#discussion_r1913990691 ## .github/workflows/build_and_test.yml: ## @@ -600,19 +600,16 @@ jobs: done - name: Install Conda for pip packaging test if: contains(matrix.mo

Re: [PR] [SPARK-47081][CONNECT] Support Query Execution Progress [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on code in PR #45150: URL: https://github.com/apache/spark/pull/45150#discussion_r1913991119 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala: ## @@ -201,9 +237,18 @@ private[connect] class Ex

Re: [PR] [SPARK-50800][PYTHON][TESTS] Upgrade python to 3.11 in Python Packaging test [spark]

2025-01-13 Thread via GitHub
zhengruifeng commented on PR #49464: URL: https://github.com/apache/spark/pull/49464#issuecomment-2588770478 Thanks, let me investigate further; I am currently confused about the Python versions in the test.

[PR] [SPARK-50805][CORE] Move method `nameForAppAndAttempt` to `o.a.s.u.Utils` [spark]

2025-01-13 Thread via GitHub
pan3793 opened a new pull request, #49476: URL: https://github.com/apache/spark/pull/49476 ### What changes were proposed in this pull request? Pure refactor, move method `nameForAppAndAttempt` from `EventLogFileWriter` to `o.a.s.u.Utils`. ### Why are the changes needed

Re: [PR] [SPARK-50793][SQL] Fix MySQL cast function for DOUBLE, LONGTEXT, SMALLINT, INTEGER, BIGINT and BLOB types [spark]

2025-01-13 Thread via GitHub
yaooqinn commented on code in PR #49453: URL: https://github.com/apache/spark/pull/49453#discussion_r1914140218 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala: ## @@ -112,6 +112,19 @@ private case class MySQLDialect() extends JdbcDialect with SQLConfHel

Re: [PR] [SPARK-50805][CORE] Move method `nameForAppAndAttempt` to `o.a.s.u.Utils` [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun closed pull request #49476: [SPARK-50805][CORE] Move method `nameForAppAndAttempt` to `o.a.s.u.Utils` URL: https://github.com/apache/spark/pull/49476

Re: [PR] [SPARK-50805][CORE] Move method `nameForAppAndAttempt` to `o.a.s.u.Utils` [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on PR #49476: URL: https://github.com/apache/spark/pull/49476#issuecomment-2588795124 Merged to master for Apache Spark 4.0.0. I manually tested. ``` $ build/sbt "core/testOnly org.apache.spark.deploy.history.*" ... [info] RollingEventLogFilesWriter

Re: [PR] [SPARK-50783][CORE] Canonicalize JVM profiler results file name and layout on DFS [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on PR #49440: URL: https://github.com/apache/spark/pull/49440#issuecomment-2588796164 I merged the spun-off PR, @pan3793 . Could you rebase this onto master? - https://github.com/apache/spark/pull/49476

[PR] [SPARK-50633][FOLLOWUP] Set `CODECOV_TOKEN` with environment variables [spark]

2025-01-13 Thread via GitHub
panbingkun opened a new pull request, #49477: URL: https://github.com/apache/spark/pull/49477 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [MINOR][DOCS] Fix the examples of createDataFrame [spark]

2025-01-13 Thread via GitHub
zhengruifeng opened a new pull request, #49475: URL: https://github.com/apache/spark/pull/49475 ### What changes were proposed in this pull request? Fix the examples of createDataFrame ### Why are the changes needed? `collect` -> `show` ### Does this PR introdu

Re: [PR] [SPARK-50601][SQL] Support withColumns / withColumnsRenamed in subqueries [spark]

2025-01-13 Thread via GitHub
ueshin commented on PR #49386: URL: https://github.com/apache/spark/pull/49386#issuecomment-2588705330 The remaining test failures are not related to this PR.

Re: [PR] [SPARK-50601][SQL] Support withColumns / withColumnsRenamed in subqueries [spark]

2025-01-13 Thread via GitHub
ueshin commented on PR #49386: URL: https://github.com/apache/spark/pull/49386#issuecomment-2588706930 Thanks! merging to master.

Re: [PR] [SPARK-50800][PYTHON][TESTS] Upgrade python to 3.11 in Python Packaging test [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on PR #49464: URL: https://github.com/apache/spark/pull/49464#issuecomment-2588538012 For type hints, we're using Python 3.9 IIRC. We should probably investigate, list them out, and pick which version to test ..

Re: [PR] [SPARK-50795][SQL] Display all DESCRIBE AS JSON dates in ISO-8601 format [spark]

2025-01-13 Thread via GitHub
cloud-fan commented on code in PR #49455: URL: https://github.com/apache/spark/pull/49455#discussion_r1914010021 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala: ## @@ -91,6 +90,14 @@ trait MetadataMapSupport { } map } + + val t

Re: [PR] [SPARK-50783] Canonicalize JVM profiler results file name and layout on DFS [spark]

2025-01-13 Thread via GitHub
pan3793 commented on code in PR #49440: URL: https://github.com/apache/spark/pull/49440#discussion_r1914173761 ## connector/profiler/src/main/scala/org/apache/spark/executor/profiler/ExecutorJVMProfiler.scala: ## @@ -89,28 +100,34 @@ private[spark] class ExecutorJVMProfiler(conf

Re: [PR] [SPARK-50783] Canonicalize JVM profiler results file name and layout on DFS [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun closed pull request #49440: [SPARK-50783] Canonicalize JVM profiler results file name and layout on DFS URL: https://github.com/apache/spark/pull/49440

Re: [PR] [SPARK-50774][SQL] Centralize collation names in one place [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on PR #49425: URL: https://github.com/apache/spark/pull/49425#issuecomment-2588846732 Merged to master.

Re: [PR] [SPARK-50774][SQL] Centralize collation names in one place [spark]

2025-01-13 Thread via GitHub
HyukjinKwon closed pull request #49425: [SPARK-50774][SQL] Centralize collation names in one place URL: https://github.com/apache/spark/pull/49425

Re: [PR] [SPARK-49907][ML][CONNECT] Support spark.ml on Connect [spark]

2025-01-13 Thread via GitHub
zhengruifeng commented on PR #48791: URL: https://github.com/apache/spark/pull/48791#issuecomment-2588867463 Had an offline discussion with @wbo4958 : separate messages for vector/matrix are compact, so they are more suitable for large params like initial model weights.

Re: [PR] [SPARK-48809][PYTHON][DOCS] Reimplemented `spark version drop down` of the `PySpark doc site` and fix bug [spark]

2025-01-13 Thread via GitHub
HyukjinKwon closed pull request #47214: [SPARK-48809][PYTHON][DOCS] Reimplemented `spark version drop down` of the `PySpark doc site` and fix bug URL: https://github.com/apache/spark/pull/47214

Re: [PR] [SPARK-48809][PYTHON][DOCS] Reimplemented `spark version drop down` of the `PySpark doc site` and fix bug [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on PR #47214: URL: https://github.com/apache/spark/pull/47214#issuecomment-25 Merged to master.

Re: [PR] [SPARK-50790][PYTHON] Implement parse json in pyspark [spark]

2025-01-13 Thread via GitHub
gene-db commented on code in PR #49450: URL: https://github.com/apache/spark/pull/49450#discussion_r1914193399 ## python/pyspark/sql/types.py: ## @@ -1770,6 +1770,15 @@ def toJson(self, zone_id: str = "UTC") -> str: """ return VariantUtils.to_json(self.value, s

Re: [PR] [SPARK-50783] Canonicalize JVM profiler results file name and layout on DFS [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on PR #49440: URL: https://github.com/apache/spark/pull/49440#issuecomment-2588827882 Thank you!

Re: [PR] [SPARK-50804][SQL] to_protobuf() should not throw MatchError [spark]

2025-01-13 Thread via GitHub
LuciferYang commented on code in PR #49473: URL: https://github.com/apache/spark/pull/49473#discussion_r1914176360 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3783,6 +3783,14 @@ private[sql] object QueryCompilationErrors extend

Re: [PR] [SPARK-50804][SQL] to_protobuf() should not throw MatchError [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on code in PR #49473: URL: https://github.com/apache/spark/pull/49473#discussion_r1914176714 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufSerializer.scala: ## @@ -52,6 +52,10 @@ private[sql] class ProtobufSerializer( r

Re: [PR] [SPARK-50783] Canonicalize JVM profiler results file name and layout on DFS [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on PR #49440: URL: https://github.com/apache/spark/pull/49440#issuecomment-2588833656 Since this is a subset of previous status, I manually tested the compilation. Merged to master for Apache Spark 4.0.0. Thank you, @pan3793 and @parthchandra .

Re: [PR] [SPARK-50783] Canonicalize JVM profiler results file name and layout on DFS [spark]

2025-01-13 Thread via GitHub
pan3793 commented on PR #49440: URL: https://github.com/apache/spark/pull/49440#issuecomment-2588827053 @dongjoon-hyun thanks, rebased

Re: [PR] [MINOR][DOCS] Fix the examples of createDataFrame [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on PR #49475: URL: https://github.com/apache/spark/pull/49475#issuecomment-2588836578 `sql` module passed. Merged to master.

Re: [PR] [MINOR][DOCS] Fix the examples of createDataFrame [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun closed pull request #49475: [MINOR][DOCS] Fix the examples of createDataFrame URL: https://github.com/apache/spark/pull/49475

Re: [PR] [SPARK-50804][SQL] to_protobuf() should not throw MatchError [spark]

2025-01-13 Thread via GitHub
siying commented on code in PR #49473: URL: https://github.com/apache/spark/pull/49473#discussion_r1914207839 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufSerializer.scala: ## @@ -52,6 +52,10 @@ private[sql] class ProtobufSerializer( rootCa

Re: [PR] [SPARK-50790][PYTHON] Implement parse json in pyspark [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on PR #49450: URL: https://github.com/apache/spark/pull/49450#issuecomment-2588894625 Merged to master.

Re: [PR] [SPARK-49907][ML][CONNECT] Support spark.ml on Connect [spark]

2025-01-13 Thread via GitHub
zhengruifeng commented on code in PR #48791: URL: https://github.com/apache/spark/pull/48791#discussion_r1914210183 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -0,0 +1,353 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] [SPARK-50804][SQL] to_protobuf() should not throw MatchError [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on code in PR #49473: URL: https://github.com/apache/spark/pull/49473#discussion_r1914211537 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufSerializer.scala: ## @@ -52,6 +52,10 @@ private[sql] class ProtobufSerializer( r

Re: [PR] [SPARK-50804][SQL] to_protobuf() should not throw MatchError [spark]

2025-01-13 Thread via GitHub
HyukjinKwon commented on code in PR #49473: URL: https://github.com/apache/spark/pull/49473#discussion_r1914211727 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufSerializer.scala: ## @@ -52,6 +52,10 @@ private[sql] class ProtobufSerializer( r

Re: [PR] [SPARK-50790][PYTHON] Implement parse json in pyspark [spark]

2025-01-13 Thread via GitHub
HyukjinKwon closed pull request #49450: [SPARK-50790][PYTHON] Implement parse json in pyspark URL: https://github.com/apache/spark/pull/49450

[PR] [SPARK-50807][BUILD] Upgrade Scala to 2.13.16 [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun opened a new pull request, #49478: URL: https://github.com/apache/spark/pull/49478 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-50807][BUILD] Upgrade Scala to 2.13.16 [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on PR #49478: URL: https://github.com/apache/spark/pull/49478#issuecomment-2588902018 cc @LuciferYang and @panbingkun

Re: [PR] [SPARK-50790][PYTHON] Implement parse json in pyspark [spark]

2025-01-13 Thread via GitHub
cloud-fan commented on code in PR #49450: URL: https://github.com/apache/spark/pull/49450#discussion_r1912799260 ## python/pyspark/sql/variant_utils.py: ## @@ -140,6 +157,15 @@ def to_python(cls, value: bytes, metadata: bytes) -> str: """ return cls._to_python(

Re: [PR] [SPARK-50624][SQL] Add TimestampNTZType to ColumnarRow/MutableColumnarRow [spark]

2025-01-13 Thread via GitHub
cloud-fan commented on PR #49437: URL: https://github.com/apache/spark/pull/49437#issuecomment-2586446314 thanks, merging to master/3.5!

Re: [PR] [SPARK-50624][SQL] Add TimestampNTZType to ColumnarRow/MutableColumnarRow [spark]

2025-01-13 Thread via GitHub
cloud-fan closed pull request #49437: [SPARK-50624][SQL] Add TimestampNTZType to ColumnarRow/MutableColumnarRow URL: https://github.com/apache/spark/pull/49437

Re: [PR] [SPARK-50612][SQL][FOLLOWUP] Put normalization of inner project lists under a flag [spark]

2025-01-13 Thread via GitHub
mihailotim-db commented on PR #49285: URL: https://github.com/apache/spark/pull/49285#issuecomment-2586457454 @MaxGekk This change is not required after #49319 and #49334. Underlying change will be removed here #49460. Closing this PR

Re: [PR] [SPARK-50612][SQL][FOLLOWUP] Put normalization of inner project lists under a flag [spark]

2025-01-13 Thread via GitHub
mihailotim-db closed pull request #49285: [SPARK-50612][SQL][FOLLOWUP] Put normalization of inner project lists under a flag URL: https://github.com/apache/spark/pull/49285

Re: [PR] [SPARK-50525][SQL][TESTS][FOLLOWUP] Fix `DataFrameSuite.repartition by MapType` test assumption [spark]

2025-01-13 Thread via GitHub
ostronaut commented on PR #49457: URL: https://github.com/apache/spark/pull/49457#issuecomment-2586524808 Thank you @dongjoon-hyun for fixing it! I was not aware that `SPARK_ANSI_SQL_MODE` might cause some issues.

Re: [PR] [SPARK-50792][SQL] Format binary data as a binary literal in JDBC. [spark]

2025-01-13 Thread via GitHub
sunxiaoguang commented on code in PR #49452: URL: https://github.com/apache/spark/pull/49452#discussion_r1913063805 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ## @@ -61,6 +61,34 @@ private case class OracleDialect() extends JdbcDialect with SQLCon

Re: [I] Support Struct Conversion when reading Arrow data [spark-connect-go]

2025-01-13 Thread via GitHub
grundprinzip closed issue #114: Support Struct Conversion when reading Arrow data URL: https://github.com/apache/spark-connect-go/issues/114

Re: [PR] Add Support for Struct Conversion when reading Arrow data [spark-connect-go]

2025-01-13 Thread via GitHub
grundprinzip closed pull request #115: Add Support for Struct Conversion when reading Arrow data URL: https://github.com/apache/spark-connect-go/pull/115

Re: [PR] [SPARK-50792][SQL] Format binary data as a binary literal in JDBC. [spark]

2025-01-13 Thread via GitHub
sunxiaoguang commented on code in PR #49452: URL: https://github.com/apache/spark/pull/49452#discussion_r1912932203 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -986,4 +986,23 @@ private[v2] trait V2JDBCTest extends Sh

Re: [PR] [SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests [spark]

2025-01-13 Thread via GitHub
juliuszsompolski commented on code in PR #49370: URL: https://github.com/apache/spark/pull/49370#discussion_r1913280117 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala: ## @@ -63,6 +63,16 @@ private[connect] class ExecuteThre

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-13 Thread via GitHub
davidm-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1913293308 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala: ## @@ -73,28 +93,39 @@ class ResolveCatalogs(val catalogManager: CatalogM

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-13 Thread via GitHub
srielau commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1913294290 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -266,22 +268,40 @@ trait ColumnResolutionHelper extends Loggin

Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]

2025-01-13 Thread via GitHub
srielau commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1913307900 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala: ## @@ -73,28 +93,39 @@ class ResolveCatalogs(val catalogManager: CatalogMan

Re: [PR] [SPARK-47081][CONNECT] Support Query Execution Progress [spark]

2025-01-13 Thread via GitHub
juliuszsompolski commented on code in PR #45150: URL: https://github.com/apache/spark/pull/45150#discussion_r1913310090 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala: ## @@ -201,9 +237,18 @@ private[connect] cla

Re: [PR] [SPARK-50790][PYTHON] Implement parse json in pyspark [spark]

2025-01-13 Thread via GitHub
gene-db commented on code in PR #49450: URL: https://github.com/apache/spark/pull/49450#discussion_r1913484164 ## python/pyspark/sql/variant_utils.py: ## @@ -140,6 +157,15 @@ def to_python(cls, value: bytes, metadata: bytes) -> str: """ return cls._to_python(va

Re: [PR] [SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests [spark]

2025-01-13 Thread via GitHub
changgyoopark-db commented on code in PR #49370: URL: https://github.com/apache/spark/pull/49370#discussion_r1913527486 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala: ## @@ -319,7 +325,14 @@ private[connect] class Exe

[PR] [SPARK-50802] Remove ApplyCharTypePadding rule [spark]

2025-01-13 Thread via GitHub
jovanm-db opened a new pull request, #49470: URL: https://github.com/apache/spark/pull/49470 ### What changes were proposed in this pull request? Removal of `ApplyCharTypePadding` rule. ### Why are the changes needed? ### Does this PR introduce _any_ u

Re: [PR] [SPARK-48745][INFRA][PYTHON][TESTS][FOLLOWUP] use `conda-incubator/setup-miniconda` action [spark]

2025-01-13 Thread via GitHub
dongjoon-hyun commented on code in PR #49465: URL: https://github.com/apache/spark/pull/49465#discussion_r1913539463 ## .github/workflows/build_and_test.yml: ## @@ -600,19 +600,16 @@ jobs: done - name: Install Conda for pip packaging test if: contains(matrix

Re: [PR] [SPARK-50762][SQL] Add Analyzer rule for resolving SQL scalar UDFs [spark]

2025-01-13 Thread via GitHub
allisonwang-db commented on code in PR #49414: URL: https://github.com/apache/spark/pull/49414#discussion_r1913629542 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2363,6 +2364,278 @@ class Analyzer(override val catalogManager: Cat

Re: [PR] [SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests [spark]

2025-01-13 Thread via GitHub
juliuszsompolski commented on code in PR #49370: URL: https://github.com/apache/spark/pull/49370#discussion_r1913629950 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecuteHolder.scala: ## @@ -126,6 +126,18 @@ private[connect] class ExecuteHolder(

Re: [PR] [SPARK-50382][CONNECT] Add documentation for general information on application development with/extending Spark Connect [spark]

2025-01-13 Thread via GitHub
vicennial commented on PR #48922: URL: https://github.com/apache/spark/pull/48922#issuecomment-2588090753 @nchammas Not completely. There is more information in https://github.com/apache/spark/pull/45340 that makes quite a lot of sense to include, especially about the bits that explain the

Re: [PR] [SPARK-50791][SQL] Fix NPE in State Store error handling [spark]

2025-01-13 Thread via GitHub
liviazhu-db commented on code in PR #49451: URL: https://github.com/apache/spark/pull/49451#discussion_r1913750631 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -291,7 +291,8 @@ private[sql] class HDFSBackedSt
