Re: [PR] [SPARK-51855] Support `Spark SQL REPL` [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun commented on PR #77: URL: https://github.com/apache/spark-connect-swift/pull/77#issuecomment-2817785855 Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-49008][PYTHON] Use `ParamSpec` to propagate `func` signature in `transform` [spark]

2025-04-21 Thread via GitHub
zhengruifeng commented on PR #47493: URL: https://github.com/apache/spark/pull/47493#issuecomment-2817856072 > Python 3.9 is being EOL soon. Can we drop Python 3.9 in the master branch, and land this change? I think we need separate PRs to drop python 3.9 first.

[PR] [SPARK-51856] Add API for estimate saved model size [spark]

2025-04-21 Thread via GitHub
WeichenXu123 opened a new pull request, #50652: URL: https://github.com/apache/spark/pull/50652 ### What changes were proposed in this pull request? Add API for estimate saved model size ### Why are the changes needed? For Spark server ML cache management.

Re: [PR] [SPARK-51663][SQL][FOLLOWUP] change buildLeft and buildRight to function [spark]

2025-04-21 Thread via GitHub
beliefer closed pull request #50636: [SPARK-51663][SQL][FOLLOWUP] change buildLeft and buildRight to function URL: https://github.com/apache/spark/pull/50636

Re: [PR] [SPARK-51663][SQL][FOLLOWUP] change buildLeft and buildRight to function [spark]

2025-04-21 Thread via GitHub
beliefer commented on PR #50636: URL: https://github.com/apache/spark/pull/50636#issuecomment-2817974167 @cloud-fan @yaooqinn @LuciferYang Thank you all.

Re: [PR] [SPARK-51814][SS][PYTHON][FOLLLOW-UP] Use RecordBatch.schema.names instead of column_names for old pyarrow compatibility [spark]

2025-04-21 Thread via GitHub
HyukjinKwon commented on PR #50658: URL: https://github.com/apache/spark/pull/50658#issuecomment-2820016412 cc @HeartSaVioR

[PR] [SPARK-51774][CONNECT][FOLLOW-UP][TESTS] Skip ConnectErrorsTest if grpc is not available [spark]

2025-04-21 Thread via GitHub
HyukjinKwon opened a new pull request, #50659: URL: https://github.com/apache/spark/pull/50659 ### What changes were proposed in this pull request? This PR proposes to skip `ConnectErrorsTest` if `grpc` is not available. ### Why are the changes needed? To recover the sche
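Conditionally skipping a suite when an optional dependency is missing is a standard unittest pattern; a minimal standalone sketch of the idea (the test method and message below are illustrative, not the PR's actual code):

```python
import unittest
from importlib.util import find_spec

# Probe for the optional dependency without importing it.
have_grpc = find_spec("grpc") is not None

@unittest.skipIf(not have_grpc, "grpcio is not installed")
class ConnectErrorsTest(unittest.TestCase):
    def test_error_mapping(self) -> None:
        import grpc  # only reached when grpc is installed
        self.assertTrue(hasattr(grpc, "StatusCode"))
```

Using `find_spec` rather than a bare `import grpc` at module level keeps collection of the test file itself from failing on environments without the package.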

Re: [PR] [SPARK-51688][PYTHON][FOLLOW-UP] Use `socketPath.get` for logging instead of `socketHost.get` [spark]

2025-04-21 Thread via GitHub
HyukjinKwon closed pull request #50657: [SPARK-51688][PYTHON][FOLLOW-UP] Use `socketPath.get` for logging instead of `socketHost.get` URL: https://github.com/apache/spark/pull/50657

[PR] [SPARK-51688][PYTHON][FOLLOW-UP] Use `socketPath.get` for logging instead of `socketHost.get` [spark]

2025-04-21 Thread via GitHub
HyukjinKwon opened a new pull request, #50657: URL: https://github.com/apache/spark/pull/50657 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/50648 that uses `socketPath.get` for logging instead of `socketHost.get`

[PR] [SPARK-51814][SS][PYTHON][FOLLLOW-UP] Use RecordBatch.schema.names instead of column_names for old pyarrow compatibility [spark]

2025-04-21 Thread via GitHub
HyukjinKwon opened a new pull request, #50658: URL: https://github.com/apache/spark/pull/50658 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/50600 that proposes to use `RecordBatch.schema.names` instead of `column_names` for old pyarrow compatibility.
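The underlying compatibility point is that `schema.names` is available on old and new pyarrow `RecordBatch` objects alike, while the `column_names` shortcut is a newer addition. A standalone sketch of the accessor, using a stand-in object so the example carries no pyarrow dependency:

```python
from types import SimpleNamespace

def batch_column_names(batch):
    # batch.schema.names works across pyarrow versions;
    # batch.column_names does not exist on older releases.
    return batch.schema.names

# Stand-in mimicking a RecordBatch's schema attribute.
fake_batch = SimpleNamespace(schema=SimpleNamespace(names=["id", "value"]))
print(batch_column_names(fake_batch))  # ['id', 'value']
```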

Re: [PR] [SPARK-51860][CONNECT] Disable `spark.connect.grpc.debug.enabled` by default [spark]

2025-04-21 Thread via GitHub
dongjoon-hyun commented on code in PR #50655: URL: https://github.com/apache/spark/pull/50655#discussion_r2053232077 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -363,7 +363,7 @@ object SparkConnectService extends Log

[PR] [SPARK-51860][CONNECT] Disable `spark.connect.grpc.debug.enabled` by default [spark]

2025-04-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #50655: URL: https://github.com/apache/spark/pull/50655 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[PR] [SPARK-51861][SQL][UI] Remove duplicated/unnecessary info InMemoryRelation Plan Detail [spark]

2025-04-21 Thread via GitHub
yaooqinn opened a new pull request, #50656: URL: https://github.com/apache/spark/pull/50656 ### What changes were proposed in this pull request? Remove duplicated info in InMemoryRelation Plan Detail - serializer, a dev-only api - cachedPlan, duplicated - logicalPlan, not

Re: [PR] [SPARK-51861][SQL][UI] Remove duplicated/unnecessary info InMemoryRelation Plan Detail [spark]

2025-04-21 Thread via GitHub
yaooqinn commented on PR #50656: URL: https://github.com/apache/spark/pull/50656#issuecomment-2819953020 This is also related to a change in 4.0.0 (https://issues.apache.org/jira/browse/SPARK-49982), which makes the UI more verbose. Thus, if appropriate, please consider including this in the n

Re: [PR] [SPARK-51814][SS][PYTHON][FOLLLOW-UP] Use RecordBatch.schema.names instead of column_names for old pyarrow compatibility [spark]

2025-04-21 Thread via GitHub
HyukjinKwon commented on PR #50658: URL: https://github.com/apache/spark/pull/50658#issuecomment-2820025285 It's a post-merge CI, so we can't capture it in the PR builder 😢

Re: [PR] [SPARK-51814][SS][PYTHON][FOLLLOW-UP] Use RecordBatch.schema.names instead of column_names for old pyarrow compatibility [spark]

2025-04-21 Thread via GitHub
HyukjinKwon commented on PR #50658: URL: https://github.com/apache/spark/pull/50658#issuecomment-2820025419 Merged to master.

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-21 Thread via GitHub
HeartSaVioR commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2820026062 @LuciferYang Sorry I just saw this. I've asked @anishshri-db to take a look. Btw do we only see the failure from that setup, e.g. only Java 21?

Re: [PR] [SPARK-51814][SS][PYTHON][FOLLLOW-UP] Use RecordBatch.schema.names instead of column_names for old pyarrow compatibility [spark]

2025-04-21 Thread via GitHub
HyukjinKwon closed pull request #50658: [SPARK-51814][SS][PYTHON][FOLLLOW-UP] Use RecordBatch.schema.names instead of column_names for old pyarrow compatibility URL: https://github.com/apache/spark/pull/50658

Re: [PR] [SPARK-51774][CONNECT][FOLLOW-UP][TESTS] Skip ConnectErrorsTest if grpc is not available [spark]

2025-04-21 Thread via GitHub
HyukjinKwon closed pull request #50659: [SPARK-51774][CONNECT][FOLLOW-UP][TESTS] Skip ConnectErrorsTest if grpc is not available URL: https://github.com/apache/spark/pull/50659

Re: [PR] [SPARK-51774][CONNECT][FOLLOW-UP][TESTS] Skip ConnectErrorsTest if grpc is not available [spark]

2025-04-21 Thread via GitHub
HyukjinKwon commented on PR #50659: URL: https://github.com/apache/spark/pull/50659#issuecomment-2820028727 Merged to master.

Re: [PR] [SPARK-49488][SQL][FOLLOWUP] Do not push down extract expression if extracted field is second [spark]

2025-04-21 Thread via GitHub
beliefer commented on PR #50637: URL: https://github.com/apache/spark/pull/50637#issuecomment-2819895990 @cloud-fan Thank you!

Re: [PR] [SPARK-51838][PYTHON][TESTS][FOLLWO-UP] Ignore typing module in `test_wildcard_import` [spark]

2025-04-21 Thread via GitHub
zhengruifeng closed pull request #50653: [SPARK-51838][PYTHON][TESTS][FOLLWO-UP] Ignore typing module in `test_wildcard_import` URL: https://github.com/apache/spark/pull/50653

Re: [PR] [SPARK-51838][PYTHON][TESTS][FOLLWO-UP] Ignore typing module in `test_wildcard_import` [spark]

2025-04-21 Thread via GitHub
zhengruifeng commented on PR #50653: URL: https://github.com/apache/spark/pull/50653#issuecomment-2819858279 merged to master

Re: [PR] [SPARK-51779] [SS] Use virtual column families for stream-stream joins [spark]

2025-04-21 Thread via GitHub
HeartSaVioR commented on PR #50572: URL: https://github.com/apache/spark/pull/50572#issuecomment-2819858167 Thanks! Merging to master.

Re: [PR] [SPARK-51860][CONNECT] Disable `spark.connect.grpc.debug.enabled` by default [spark]

2025-04-21 Thread via GitHub
dongjoon-hyun commented on PR #50655: URL: https://github.com/apache/spark/pull/50655#issuecomment-2819895648 WDYT, @martin-g, @hvanhovell, @cloud-fan, @HyukjinKwon, @zhengruifeng? For now, I created SPARK-51860 as a subtask of SPARK-44111. If it's too late for Apache Spark 4.0.0

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-21 Thread via GitHub
HeartSaVioR commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2820180097 Uh wait, do you know which classes are printed as DEBUG? I don't think we expect this during testing.

Re: [PR] [SPARK-51860][CONNECT] Disable `spark.connect.grpc.debug.enabled` by default [spark]

2025-04-21 Thread via GitHub
dongjoon-hyun commented on PR #50655: URL: https://github.com/apache/spark/pull/50655#issuecomment-2820073407 Thank you, @HyukjinKwon and @yaooqinn !

[PR] [SPARK-51862][ML][CONNECT][TESTS] Clean up ml cache before ReusedConnectTestCase and ReusedMixedTestCase [spark]

2025-04-21 Thread via GitHub
zhengruifeng opened a new pull request, #50660: URL: https://github.com/apache/spark/pull/50660 ### What changes were proposed in this pull request? Clean up ml cache before ReusedConnectTestCase and ReusedMixedTestCase ### Why are the changes needed? to make sure the

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-21 Thread via GitHub
LuciferYang commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2820071296 > @LuciferYang - is it possible to increase disk space for this runner ? There should be no way to increase it, but there is a script for cleaning up unnecessary packages.

Re: [PR] [SPARK-51779] [SS] Use virtual column families for stream-stream joins [spark]

2025-04-21 Thread via GitHub
HeartSaVioR closed pull request #50572: [SPARK-51779] [SS] Use virtual column families for stream-stream joins URL: https://github.com/apache/spark/pull/50572

[PR] [SPARK-51863] Support `join` and `crossJoin` in `DataFrame` [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #80: URL: https://github.com/apache/spark-connect-swift/pull/80 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [MINOR] rename the spark-connect-enabled tarball [spark]

2025-04-21 Thread via GitHub
yaooqinn closed pull request #50654: [MINOR] rename the spark-connect-enabled tarball URL: https://github.com/apache/spark/pull/50654

Re: [PR] [MINOR] rename the spark-connect-enabled tarball [spark]

2025-04-21 Thread via GitHub
yaooqinn commented on PR #50654: URL: https://github.com/apache/spark/pull/50654#issuecomment-2819988406 Merged to master and 4.0.0, thank you @cloud-fan @HyukjinKwon @dongjoon-hyun

Re: [PR] [SPARK-51860][CONNECT] Disable `spark.connect.grpc.debug.enabled` by default [spark]

2025-04-21 Thread via GitHub
yaooqinn closed pull request #50655: [SPARK-51860][CONNECT] Disable `spark.connect.grpc.debug.enabled` by default URL: https://github.com/apache/spark/pull/50655

Re: [PR] [SPARK-51860][CONNECT] Disable `spark.connect.grpc.debug.enabled` by default [spark]

2025-04-21 Thread via GitHub
yaooqinn commented on PR #50655: URL: https://github.com/apache/spark/pull/50655#issuecomment-2819991324 Thank you @dongjoon-hyun @HyukjinKwon Merged to master and 4.0.0

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-21 Thread via GitHub
LuciferYang commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2820226551 It seems that all classes are printing debug information, including some third-party packages. This change likely started occurring 4 days ago, as the compressed log files that failed

Re: [PR] [SPARK-51861][SQL][UI] Remove duplicated/unnecessary info of InMemoryRelation Plan Detail [spark]

2025-04-21 Thread via GitHub
yaooqinn closed pull request #50656: [SPARK-51861][SQL][UI] Remove duplicated/unnecessary info of InMemoryRelation Plan Detail URL: https://github.com/apache/spark/pull/50656

Re: [PR] [SPARK-51861][SQL][UI] Remove duplicated/unnecessary info of InMemoryRelation Plan Detail [spark]

2025-04-21 Thread via GitHub
yaooqinn commented on PR #50656: URL: https://github.com/apache/spark/pull/50656#issuecomment-2820117953 Merged to master and 4.0.0, thank you @dongjoon-hyun

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-21 Thread via GitHub
anishshri-db commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2820143197 @LuciferYang - I think the tests being run are not the same across these nightly runs (also seems very unlikely that it's related to this change - since I just basically added a Pytho

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-21 Thread via GitHub
LuciferYang commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2820143914 @HyukjinKwon @HeartSaVioR @zhengruifeng Do you know which log4j configuration file is used by the corresponding Java process during the PySpark test? I found that although the
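On the log4j question above: with log4j2, the effective verbosity is governed by the rootLogger level in whichever `log4j2.properties` the JVM loads from the classpath. A minimal sketch of a config that pins test output to INFO (the file path is illustrative; which file Spark's Python test harness actually picks up is exactly what is being asked here):

```properties
# illustrative: conf/log4j2.properties on the test classpath
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = console

appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```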

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-21 Thread via GitHub
anishshri-db commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2820051466 @HeartSaVioR - yes will take a look and check further. I do see disk space issues but not sure if it's related to the failure: `2025-04-19T06:49:46.7849441Z ##[warnin`

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-21 Thread via GitHub
anishshri-db commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2820052221 @LuciferYang - is it possible to increase disk space for this runner?

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-21 Thread via GitHub
LuciferYang commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2820059032 > @LuciferYang Sorry I just saw this. I've asked @anishshri-db to take a look. Btw do we only see the failure from that setup, e.g. only Java 21? @HeartSaVioR Yes, currently I h

[PR] [SPARK-51858] Support `SPARK_REMOTE` [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #79: URL: https://github.com/apache/spark-connect-swift/pull/79 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-51779] [SS] Use virtual column families for stream-stream joins [spark]

2025-04-21 Thread via GitHub
zecookiez commented on code in PR #50572: URL: https://github.com/apache/spark/pull/50572#discussion_r2052737949 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadata.scala: ## @@ -464,3 +465,11 @@ class OperatorStateMetadataV2FileManage

Re: [PR] [SPARK-51779] [SS] Use virtual column families for stream-stream joins [spark]

2025-04-21 Thread via GitHub
zecookiez commented on code in PR #50572: URL: https://github.com/apache/spark/pull/50572#discussion_r2052735734 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala: ## @@ -307,19 +349,23 @@ case class StreamingSymmetricHashJo

Re: [PR] [SPARK-51779] [SS] Use virtual column families for stream-stream joins [spark]

2025-04-21 Thread via GitHub
zecookiez commented on code in PR #50572: URL: https://github.com/apache/spark/pull/50572#discussion_r2052760583 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2569,10 +2569,11 @@ object SQLConf { .internal() .doc("State format v

Re: [PR] [SPARK-51243][CORE][ML] Configurable allow native BLAS [spark]

2025-04-21 Thread via GitHub
pan3793 commented on PR #49986: URL: https://github.com/apache/spark/pull/49986#issuecomment-2817990487 Kindly ping @WeichenXu123

Re: [PR] [SPARK-51845][ML][CONNECT] Add proto messages `CleanCache` and `GetCacheInfo` [spark]

2025-04-21 Thread via GitHub
zhengruifeng closed pull request #50643: [SPARK-51845][ML][CONNECT] Add proto messages `CleanCache` and `GetCacheInfo` URL: https://github.com/apache/spark/pull/50643

Re: [PR] [SPARK-51845][ML][CONNECT] Add proto messages `CleanCache` and `GetCacheInfo` [spark]

2025-04-21 Thread via GitHub
zhengruifeng commented on PR #50643: URL: https://github.com/apache/spark/pull/50643#issuecomment-2818317759 merged to master, thanks all

[PR] [MINOR] rename the spark-connect-enabled tarball [spark]

2025-04-21 Thread via GitHub
cloud-fan opened a new pull request, #50654: URL: https://github.com/apache/spark/pull/50654 ### What changes were proposed in this pull request? Looking at the 4.0 release tarballs, we have `pyspark_connect-4.0.0.tar.gz` and `spark-4.0.0-bin-hadoop3-spark-connect.tgz`. This P

Re: [PR] [SPARK-51857] Support `token/userId/userAgent` parameters in `SparkConnectClient` [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun commented on PR #78: URL: https://github.com/apache/spark-connect-swift/pull/78#issuecomment-2818531765 Merged to main~

Re: [PR] [SPARK-51857] Support `token/userId/userAgent` parameters in `SparkConnectClient` [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun closed pull request #78: [SPARK-51857] Support `token/userId/userAgent` parameters in `SparkConnectClient` URL: https://github.com/apache/spark-connect-swift/pull/78

[PR] Bump google.golang.org/grpc from 1.71.0 to 1.72.0 [spark-connect-go]

2025-04-21 Thread via GitHub
dependabot[bot] opened a new pull request, #137: URL: https://github.com/apache/spark-connect-go/pull/137 Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.71.0 to 1.72.0. Release notes sourced from https://github.com/grpc/grpc-go/releases

Re: [PR] Bump google.golang.org/grpc from 1.71.0 to 1.71.1 [spark-connect-go]

2025-04-21 Thread via GitHub
dependabot[bot] commented on PR #135: URL: https://github.com/apache/spark-connect-go/pull/135#issuecomment-2818557352 Superseded by #137.

Re: [PR] Bump google.golang.org/grpc from 1.71.0 to 1.71.1 [spark-connect-go]

2025-04-21 Thread via GitHub
dependabot[bot] closed pull request #135: Bump google.golang.org/grpc from 1.71.0 to 1.71.1 URL: https://github.com/apache/spark-connect-go/pull/135

Re: [PR] [SPARK-51779] [SS] Use virtual column families for stream-stream joins [spark]

2025-04-21 Thread via GitHub
zecookiez commented on code in PR #50572: URL: https://github.com/apache/spark/pull/50572#discussion_r2052775692 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala: ## @@ -1581,7 +1844,8 @@ class StreamingOuterJoinSuite extends StreamingJoinSuite

Re: [PR] [SPARK-50983][SQL]Part 1.a Add nested outer attributes for SubqueryExpression [spark]

2025-04-21 Thread via GitHub
AveryQi115 commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2052940646 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -67,6 +67,8 @@ abstract class PlanExpression[T <: QueryPlan[_]] extend

Re: [PR] [SPARK-51779] [SS] Use virtual column families for stream-stream joins [spark]

2025-04-21 Thread via GitHub
HeartSaVioR commented on code in PR #50572: URL: https://github.com/apache/spark/pull/50572#discussion_r2052936433 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala: ## @@ -1097,11 +1098,8 @@ class StreamingInnerJoinSuite extends StreamingJoinSui

Re: [PR] [SPARK-50983][SQL]Part 1.a Add nested outer attributes for SubqueryExpression [spark]

2025-04-21 Thread via GitHub
AveryQi115 commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2052977353 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala: ## @@ -86,11 +88,20 @@ case class FunctionTab

Re: [PR] [SPARK-29397][core] Extend plugin interface to include the driver. [spark]

2025-04-21 Thread via GitHub
Madhukar525722 commented on PR #26170: URL: https://github.com/apache/spark/pull/26170#issuecomment-2819455084 Hi @vanzin @HyukjinKwon There is a statement: "When a Spark plugin provides an executor plugin, this method will be called during the initialization of the executor process." It

Re: [PR] [SPARK-51758][SS] Fix test case related to extra batch causing empty df due to watermark [spark]

2025-04-21 Thread via GitHub
LuciferYang commented on PR #50626: URL: https://github.com/apache/spark/pull/50626#issuecomment-2818025375 @anishshri-db @HeartSaVioR @zhengruifeng Could you please take a moment to help verify whether the failures in the daily test pipeline for `branch-4.0, Scala 2.13, Hadoop 3, JDK 21` o

Re: [PR] [SPARK-51782] Add `build-ubuntu-arm` test pipeline [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun commented on PR #52: URL: https://github.com/apache/spark-connect-swift/pull/52#issuecomment-2818091638 Let me merge this because this is just an additional test pipeline.

Re: [PR] [SPARK-51782] Add `build-ubuntu-arm` test pipeline [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun closed pull request #52: [SPARK-51782] Add `build-ubuntu-arm` test pipeline URL: https://github.com/apache/spark-connect-swift/pull/52

Re: [PR] [MINOR][DOCS] Minor update to example [spark]

2025-04-21 Thread via GitHub
dongjoon-hyun commented on PR #50647: URL: https://github.com/apache/spark/pull/50647#issuecomment-2818112663 Just a question for my understanding. - Is this based on the official Apache Parquet community CVE announcement? - If so, could you provide the Apache Parquet Security website

Re: [PR] [SPARK-48585][SQL] Make `built-in` JdbcDialect's method `classifyException` throw out the `original` exception [spark]

2025-04-21 Thread via GitHub
cloud-fan commented on code in PR #46937: URL: https://github.com/apache/spark/pull/46937#discussion_r2052401492 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -841,6 +841,23 @@ abstract class JdbcDialect extends Serializable with Logging {

Re: [PR] [SPARK-51857] Support `token/userId/userAgent` parameters in `SparkConnectClient` [spark-connect-swift]

2025-04-21 Thread via GitHub
viirya commented on code in PR #78: URL: https://github.com/apache/spark-connect-swift/pull/78#discussion_r2052408643 ## Sources/SparkConnect/SparkSession.swift: ## @@ -39,15 +38,8 @@ public actor SparkSession { /// Create a session that uses the specified connection string

[PR] [SPARK-51838][PYTHON][TESTS][FOLLWO-UP] Skip typing module in `test_wildcard_import` [spark]

2025-04-21 Thread via GitHub
zhengruifeng opened a new pull request, #50653: URL: https://github.com/apache/spark/pull/50653 ### What changes were proposed in this pull request? Skip typing module in `test_wildcard_import` ### Why are the changes needed? to improve test coverage ### Does t
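The kind of check `test_wildcard_import` performs can be sketched generically: every name a module exports via `__all__` should actually resolve on the module, with designated modules excluded from the check. The function and parameter names below are illustrative, not PySpark's actual test code:

```python
import importlib

def missing_wildcard_names(module_name, ignore=("typing",)):
    # Names listed in __all__ drive `from module import *`;
    # any entry that does not resolve would break wildcard imports.
    m = importlib.import_module(module_name)
    exported = getattr(m, "__all__", [])
    return [n for n in exported
            if n not in ignore and not hasattr(m, n)]

print(missing_wildcard_names("json"))  # []
```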

Re: [PR] [SPARK-51857] Support `token/userId/userAgent` parameters in `SparkConnectClient` [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun commented on code in PR #78: URL: https://github.com/apache/spark-connect-swift/pull/78#discussion_r2052459910 ## Sources/SparkConnect/SparkSession.swift: ## @@ -39,15 +38,8 @@ public actor SparkSession { /// Create a session that uses the specified connection

Re: [PR] [SPARK-51858] Support `SPARK_REMOTE` [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun commented on PR #79: URL: https://github.com/apache/spark-connect-swift/pull/79#issuecomment-2818653122 Thank you, @viirya !

Re: [PR] [SPARK-51858] Support `SPARK_REMOTE` [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun closed pull request #79: [SPARK-51858] Support `SPARK_REMOTE` URL: https://github.com/apache/spark-connect-swift/pull/79

Re: [PR] [SPARK-51858] Support `SPARK_REMOTE` [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun commented on PR #79: URL: https://github.com/apache/spark-connect-swift/pull/79#issuecomment-2818677048 Merged to main.

Re: [PR] [SPARK-50983][SQL]Part 1.a Add nested outer attributes for SubqueryExpression [spark]

2025-04-21 Thread via GitHub
agubichev commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2053056229 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -67,6 +67,8 @@ abstract class PlanExpression[T <: QueryPlan[_]] extends

Re: [PR] [SPARK-49488][SQL][FOLLOWUP] Do not push down extract expression if extracted field is second [spark]

2025-04-21 Thread via GitHub
cloud-fan commented on PR #50637: URL: https://github.com/apache/spark/pull/50637#issuecomment-2819746887 thanks, merging to master/4.0!

Re: [PR] [SPARK-49488][SQL][FOLLOWUP] Do not push down extract expression if extracted field is second [spark]

2025-04-21 Thread via GitHub
cloud-fan closed pull request #50637: [SPARK-49488][SQL][FOLLOWUP] Do not push down extract expression if extracted field is second URL: https://github.com/apache/spark/pull/50637

Re: [PR] Add New Parameter `options` to Logical Plan `View`. [spark]

2025-04-21 Thread via GitHub
HyukjinKwon commented on PR #50645: URL: https://github.com/apache/spark/pull/50645#issuecomment-2819754111 Mind adding a test and filing a JIRA please?

Re: [PR] Logistic Matrix Factorization(LMF) and Item2Vec models [spark]

2025-04-21 Thread via GitHub
github-actions[bot] commented on PR #48681: URL: https://github.com/apache/spark/pull/48681#issuecomment-2819766282 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[PR] [SPARK-51857] Support `token/userId/userAgent` parameters in `SparkConnectClient` [spark-connect-swift]

2025-04-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #78: URL: https://github.com/apache/spark-connect-swift/pull/78 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?