[PR] [SPARK-51282][SPARK-51422][ML][FOLLOW-UP] Replace UDF with builtin functions [spark]

2025-03-18 Thread via GitHub
zhengruifeng opened a new pull request, #50321: URL: https://github.com/apache/spark/pull/50321 ### What changes were proposed in this pull request? Make scala side changes corresponding to https://github.com/apache/spark/pull/50041 and https://github.com/apache/spark/pull/50184

Re: [PR] [SPARK-51549][BUILD] Bump Parquet 1.15.1 [spark]

2025-03-18 Thread via GitHub
LuciferYang commented on PR #50319: URL: https://github.com/apache/spark/pull/50319#issuecomment-2735427062 The new version seems to include two bug fixes: - https://github.com/apache/parquet-java/issues/3172 - https://github.com/apache/parquet-java/issues/3133 and I think w

[PR] Test auto set OBJC_DISABLE_INITIALIZE_FORK_SAFETY for macos [spark]

2025-03-18 Thread via GitHub
LuciferYang opened a new pull request, #50320: URL: https://github.com/apache/spark/pull/50320 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51549][BUILD] Bump Parquet 1.15.1 [spark]

2025-03-18 Thread via GitHub
pan3793 commented on PR #50319: URL: https://github.com/apache/spark/pull/50319#issuecomment-2735410874 Compared to 1.15.0, only a few lines changed in this patch version, might be good to go branch-4.0? cc @cloud-fan @LuciferYang -- This is an automated message from the Apache Git Servi

[PR] [SPARK-51549][BUILD] Bump Parquet 1.15.1 [spark]

2025-03-18 Thread via GitHub
pan3793 opened a new pull request, #50319: URL: https://github.com/apache/spark/pull/50319 ### What changes were proposed in this pull request? Bump Parquet 1.15.1. ### Why are the changes needed? Release Notes https://parquet.apache.org/blog/2025/03/16/1.15.1/

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub
LuciferYang commented on PR #50305: URL: https://github.com/apache/spark/pull/50305#issuecomment-2735385482 Merged into master. Thanks @yaooqinn and @dongjoon-hyun

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub
yaooqinn commented on PR #50305: URL: https://github.com/apache/spark/pull/50305#issuecomment-2735385594 Thank you @LuciferYang and @dongjoon-hyun

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub
LuciferYang closed pull request #50305: [SPARK-51542][UI] Add a scroll-button for addressing top and bottom URL: https://github.com/apache/spark/pull/50305

Re: [PR] Bump net.snowflake:snowflake-jdbc from 3.22.0 to 3.23.1 [spark]

2025-03-18 Thread via GitHub
dependabot[bot] commented on PR #50317: URL: https://github.com/apache/spark/pull/50317#issuecomment-2735376933 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let

Re: [PR] Bump net.snowflake:snowflake-jdbc from 3.22.0 to 3.23.1 [spark]

2025-03-18 Thread via GitHub
yaooqinn closed pull request #50317: Bump net.snowflake:snowflake-jdbc from 3.22.0 to 3.23.1 URL: https://github.com/apache/spark/pull/50317

Re: [PR] [SPARK-51547][SQL] Assign name to the error condition: _LEGACY_ERROR_TEMP_2130 [spark]

2025-03-18 Thread via GitHub
MaxGekk commented on PR #50307: URL: https://github.com/apache/spark/pull/50307#issuecomment-2735355428 Merging to master. Thank you, @dongjoon-hyun @vrozov for review.

Re: [PR] [MINIOR][PYTHON] Specifying udf type in SCHEMA_MISMATCH_FOR_PANDAS_UDF error message [spark]

2025-03-18 Thread via GitHub
viirya commented on PR #50312: URL: https://github.com/apache/spark/pull/50312#issuecomment-2735357355 Thank you @zhengruifeng

Re: [PR] [SPARK-51547][SQL] Assign name to the error condition: _LEGACY_ERROR_TEMP_2130 [spark]

2025-03-18 Thread via GitHub
MaxGekk closed pull request #50307: [SPARK-51547][SQL] Assign name to the error condition: _LEGACY_ERROR_TEMP_2130 URL: https://github.com/apache/spark/pull/50307

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-03-18 Thread via GitHub
beliefer commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2732010166 @adrians Could you provide the micro benchmarks for this optimization?

Re: [PR] [SPARK-51187][SQL][SS] Introduce the migration logic of config removal from SPARK-49699 [spark]

2025-03-18 Thread via GitHub
HeartSaVioR commented on PR #50314: URL: https://github.com/apache/spark/pull/50314#issuecomment-2735299721 https://github.com/HeartSaVioR/spark/actions/runs/13936938059/job/39006581717 > SPARK-47148: AQE should avoid to submit shuffle job on cancellation *** FAILED *** (6 seconds, 93

Re: [PR] [SPARK-51430][PYTHON] Stop PySpark context logger from propagating logs to stdout [spark]

2025-03-18 Thread via GitHub
allisonwang-db commented on PR #50198: URL: https://github.com/apache/spark/pull/50198#issuecomment-2735007337 @dongjoon-hyun you still see the analysis exception ``` AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `id2` cannot

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2001302195 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4537,6 +4537,22 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.ma

[PR] [MINOR][TESTS] Ignore messages below error level from `HadoopFSUtils` for sql module tests [spark]

2025-03-18 Thread via GitHub
LuciferYang opened a new pull request, #50316: URL: https://github.com/apache/spark/pull/50316 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub
yaooqinn commented on PR #50305: URL: https://github.com/apache/spark/pull/50305#issuecomment-2735203999 Hi @LuciferYang, I de-jQuery'd, PTAL, again. Thank you @dongjoon-hyun, looking forward to your feedback.

[PR] [WIP] Fix add metadata columns [spark]

2025-03-18 Thread via GitHub
mihailotim-db opened a new pull request, #50304: URL: https://github.com/apache/spark/pull/50304 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-03-18 Thread via GitHub
adrians commented on code in PR #50170: URL: https://github.com/apache/spark/pull/50170#discussion_r2000685345 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -298,6 +299,24 @@ object ReorderAssociativeOperator extends Rule[Logica

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-18 Thread via GitHub
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r2001966138 ## resource-managers/yarn/pom.xml: ## @@ -37,6 +37,11 @@ spark-core_${scala.binary.version} ${project.version} + + org.apache.spark +

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000869374 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4537,6 +4537,22 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.m

[PR] Bump net.snowflake:snowflake-jdbc from 3.22.0 to 3.23.1 [spark]

2025-03-18 Thread via GitHub
dependabot[bot] opened a new pull request, #50317: URL: https://github.com/apache/spark/pull/50317 Bumps [net.snowflake:snowflake-jdbc](https://github.com/snowflakedb/snowflake-jdbc) from 3.22.0 to 3.23.1. Release notes Sourced from https://github.com/snowflakedb/snowflake-jdbc/re

Re: [PR] [SPARK-51546][TESTS] Fix npm vulnerabilities by `npm audit fix` [spark]

2025-03-18 Thread via GitHub
LuciferYang commented on PR #50309: URL: https://github.com/apache/spark/pull/50309#issuecomment-2735207146 Merged into master. Thanks @dongjoon-hyun and @yaooqinn

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub
yaooqinn commented on code in PR #50305: URL: https://github.com/apache/spark/pull/50305#discussion_r2002335113 ## core/src/main/scala/org/apache/spark/ui/UIUtils.scala: ## @@ -222,6 +222,7 @@ private[spark] object UIUtils extends Logging { + setUIRoot

Re: [PR] [SPARK-51543][BUILD] Upgrade jersey to 3.0.17 [spark]

2025-03-18 Thread via GitHub
LuciferYang commented on PR #50303: URL: https://github.com/apache/spark/pull/50303#issuecomment-2735208601 Merged into master. Thanks @dongjoon-hyun

Re: [PR] [SPARK-51543][BUILD] Upgrade jersey to 3.0.17 [spark]

2025-03-18 Thread via GitHub
LuciferYang closed pull request #50303: [SPARK-51543][BUILD] Upgrade jersey to 3.0.17 URL: https://github.com/apache/spark/pull/50303

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-18 Thread via GitHub
wengh commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r2002218436 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonDataSourceSuite.scala: ## @@ -213,6 +220,66 @@ class PythonDataSourceSuite extends PythonDataSourc

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-03-18 Thread via GitHub
beliefer commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2735189189 I have a question: is this replacement good for any data source?

[PR] [SPARK-51187][SQL][SS] Introduce the migration logic of config removal from SPARK-49699 [spark]

2025-03-18 Thread via GitHub
HeartSaVioR opened a new pull request, #50314: URL: https://github.com/apache/spark/pull/50314 ### What changes were proposed in this pull request? This PR proposes to implement the graceful deprecation of incorrect config introduced in SPARK-49699. SPARK-49699 was included in

Re: [PR] [SPARK-49082][SQL] Support widening Date to TimestampNTZ in Avro reader [spark]

2025-03-18 Thread via GitHub
aldenlau-db closed pull request #50315: [SPARK-49082][SQL] Support widening Date to TimestampNTZ in Avro reader URL: https://github.com/apache/spark/pull/50315

Re: [PR] [SPARK-51187][SQL][SS][4.0] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 [spark]

2025-03-18 Thread via GitHub
HeartSaVioR commented on PR #49984: URL: https://github.com/apache/spark/pull/49984#issuecomment-2735114537 @cloud-fan I will rebase this PR, but probably this PR https://github.com/apache/spark/pull/50314 would be more proper to address the community's concern. The VOTE has passed, but M

[PR] Initial commit [spark]

2025-03-18 Thread via GitHub
aldenlau-db opened a new pull request, #50315: URL: https://github.com/apache/spark/pull/50315 ### What changes were proposed in this pull request? This change adds support for widening type promotions from `Date` to `TimestampNTZ` in `AvroDeserializer`. ### Why are the

Re: [PR] [SPARK-51541][SQL] Support the `TIME` data type in `Literal` methods [spark]

2025-03-18 Thread via GitHub
MaxGekk closed pull request #50299: [SPARK-51541][SQL] Support the `TIME` data type in `Literal` methods URL: https://github.com/apache/spark/pull/50299

Re: [PR] [SPARK-51536][ML][PYTHON][CONNECT] Add missing whitelist for feature transformers / models [spark]

2025-03-18 Thread via GitHub
WeichenXu123 closed pull request #50306: [SPARK-51536][ML][PYTHON][CONNECT] Add missing whitelist for feature transformers / models URL: https://github.com/apache/spark/pull/50306

Re: [PR] [SPARK-51340][ML][CONNECT] Model size estimation [spark]

2025-03-18 Thread via GitHub
zhengruifeng commented on code in PR #50278: URL: https://github.com/apache/spark/pull/50278#discussion_r2002206038 ## mllib/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -81,4 +81,25 @@ abstract class Estimator[M <: Model[M]] extends PipelineStage { } overr

Re: [PR] [SPARK-44856][PYTHON] Improve Python UDTF arrow serializer performance [spark]

2025-03-18 Thread via GitHub
ueshin commented on code in PR #50099: URL: https://github.com/apache/spark/pull/50099#discussion_r2002073300 ## python/pyspark/worker.py: ## @@ -1417,6 +1434,153 @@ def mapper(_, it): return mapper, None, ser, ser +elif eval_type == PythonEvalType.SQL_ARROW_TAB

Re: [PR] [MINIOR][PYTHON] Specifying udf type in SCHEMA_MISMATCH_FOR_PANDAS_UDF error message [spark]

2025-03-18 Thread via GitHub
viirya commented on PR #50312: URL: https://github.com/apache/spark/pull/50312#issuecomment-2734977835 Thank you @dongjoon-hyun

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-03-18 Thread via GitHub
zhengruifeng closed pull request #50013: [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache URL: https://github.com/apache/spark/pull/50013

Re: [PR] [SPARK-51536][ML][PYTHON][CONNECT] Add missing whitelist for feature transformers / models [spark]

2025-03-18 Thread via GitHub
zhengruifeng commented on code in PR #50306: URL: https://github.com/apache/spark/pull/50306#discussion_r2002178074 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -642,8 +642,26 @@ private[ml] object MLUtils { // Association Rules

[PR] [WIP] Python UDF traceback improvement [spark]

2025-03-18 Thread via GitHub
wengh opened a new pull request, #50313: URL: https://github.com/apache/spark/pull/50313 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was t

Re: [PR] Add Sample functionality in DataFrame. [spark-connect-go]

2025-03-18 Thread via GitHub
imvtsl commented on code in PR #84: URL: https://github.com/apache/spark-connect-go/pull/84#discussion_r2002169879 ## spark/sql/dataframe.go: ## @@ -148,6 +148,14 @@ type DataFrame interface { Rollup(ctx context.Context, cols ...column.Convertible) *GroupedData /

Re: [PR] Add Sample functionality in DataFrame. [spark-connect-go]

2025-03-18 Thread via GitHub
imvtsl commented on PR #84: URL: https://github.com/apache/spark-connect-go/pull/84#issuecomment-2734966459 Late here, but I wanted to put forth my point: > the parameter withReplacement in the other cases already contains the "with" part in the name. I believe the original nam

Re: [PR] [SPARK-51505][SQL] Log empty partition number metrics in AQE coalesce [spark]

2025-03-18 Thread via GitHub
liuzqt commented on PR #50273: URL: https://github.com/apache/spark/pull/50273#issuecomment-2734945883 > can you provide some screenshots before and after this change? Let's see all the metrics available today and how this new metric adds value. Added Spark UI screenshot

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-18 Thread via GitHub
allisonwang-db commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r2002148510 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonScan.scala: ## @@ -16,26 +16,43 @@ */ package org.apache.spark.sql.execu

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-18 Thread via GitHub
allisonwang-db commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r2002140192 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonDataSourceSuite.scala: ## @@ -213,6 +220,66 @@ class PythonDataSourceSuite extends Python

Re: [PR] [SPARK-51420][SQL] Get minutes of TIME datatype [spark]

2025-03-18 Thread via GitHub
the-sakthi commented on code in PR #50296: URL: https://github.com/apache/spark/pull/50296#discussion_r2001066777 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala: ## @@ -120,3 +121,53 @@ case class ToTimeParser(fmt: Option[String])

Re: [PR] [SPARK-51518][SQL] Support | as an alternative to |> for the SQL pipe operator token [spark]

2025-03-18 Thread via GitHub
dongjoon-hyun commented on PR #50284: URL: https://github.com/apache/spark/pull/50284#issuecomment-2734747783 Yes, right. For the record, I also read that paper, of course, while I reviewed your original SQL Pipe Syntax PR. Thank you for considering adding them as test cases. > Making a

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-03-18 Thread via GitHub
adrians commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2734739308 I've added a benchmark testcase in `FilterPushdownBenchmark.scala`, by mostly copy-pasting the existing `InSet` testcase. One run *without* the ArrayContains-to-InSet rule ([full l

[PR] [MINIOR][PYTHON] Specifying udf type in SCHEMA_MISMATCH_FOR_PANDAS_UDF error message [spark]

2025-03-18 Thread via GitHub
viirya opened a new pull request, #50312: URL: https://github.com/apache/spark/pull/50312 ### What changes were proposed in this pull request? This minor patch adds `udf_type` parameter to `SCHEMA_MISMATCH_FOR_PANDAS_UDF` error message. ### Why are the changes neede

Re: [PR] [SPARK-XXXXX][SQL] Add maxRecordsPerOutputBatch to limit the number of record of Arrow output batch [spark]

2025-03-18 Thread via GitHub
viirya commented on code in PR #50301: URL: https://github.com/apache/spark/pull/50301#discussion_r2001863178 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonArrowOutput.scala: ## @@ -83,17 +89,37 @@ private[python] trait PythonArrowOutput[OUT <: AnyRef] {

Re: [PR] [SPARK-51518][SQL] Support | as an alternative to |> for the SQL pipe operator token [spark]

2025-03-18 Thread via GitHub
dtenedor commented on PR #50284: URL: https://github.com/apache/spark/pull/50284#issuecomment-2734474463 I found a parsing ambiguity with this syntax that the research paper mentions: ![image](https://github.com/user-attachments/assets/2f336202-23ab-4121-ab00-96b5af6d1ad2) Mak

Re: [PR] [SPARK-51414][SQL] Add the make_time() function [spark]

2025-03-18 Thread via GitHub
robreeves commented on code in PR #50269: URL: https://github.com/apache/spark/pull/50269#discussion_r2001809416 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala: ## @@ -2556,6 +2556,91 @@ case class MakeDate( copy(year = ne

Re: [PR] [SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh [spark]

2025-03-18 Thread via GitHub
LuciferYang commented on PR #48937: URL: https://github.com/apache/spark/pull/48937#issuecomment-2734317403 > I re-tested the modified line in `MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys` and the new version appears to be functionally identical, and the

Re: [PR] [SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh [spark]

2025-03-18 Thread via GitHub
philwalk commented on PR #48937: URL: https://github.com/apache/spark/pull/48937#issuecomment-2734301465 > @dongjoon-hyun @HyukjinKwon @pan3793 Do you need to take another look? I re-tested the modified line in `MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Ms

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-18 Thread via GitHub
ericm-db commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r2001497587 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2236,31 +2236,35 @@ object SQLConf { .booleanConf .createWithDefault(t

Re: [PR] [SPARK-51414][SQL] Add the make_time() function [spark]

2025-03-18 Thread via GitHub
MaxGekk commented on code in PR #50269: URL: https://github.com/apache/spark/pull/50269#discussion_r2001596554 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala: ## @@ -2150,4 +2150,66 @@ class DateExpressionsSuite extends SparkF

[PR] Revert #48786 - "Revert "[SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier"" [spark]

2025-03-18 Thread via GitHub
dusantism-db opened a new pull request, #50311: URL: https://github.com/apache/spark/pull/50311 ### What changes were proposed in this pull request? This PR reverts https://github.com/apache/spark/pull/48786 ### Why are the changes needed? Custom rules in the early an

Re: [PR] [SPARK-51414][SQL] Add the make_time() function [spark]

2025-03-18 Thread via GitHub
robreeves commented on code in PR #50269: URL: https://github.com/apache/spark/pull/50269#discussion_r2001535420 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala: ## @@ -2150,4 +2150,66 @@ class DateExpressionsSuite extends Spar

Re: [PR] [SPARK-51547][SQL] Assign name to the error condition: _LEGACY_ERROR_TEMP_2130 [spark]

2025-03-18 Thread via GitHub
vrozov commented on PR #50307: URL: https://github.com/apache/spark/pull/50307#issuecomment-2733987765 LGTM

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-18 Thread via GitHub
anoopj commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r2001406072 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -311,4 +311,15 @@ default boolean purgeTable(Identifier ident) throws Unsu

[PR] [SPARK-47241][SQL][FOLLOWUP] Fix issue when laterally referencing a `Generator` [spark]

2025-03-18 Thread via GitHub
mihailotim-db opened a new pull request, #50310: URL: https://github.com/apache/spark/pull/50310 ### What changes were proposed in this pull request? Fix issue when laterally referencing a `Generator`. ### Why are the changes needed? Fix the following query patter

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2001137848 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4537,6 +4537,22 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.m

[PR] [SPARK-51546] Fix npm vulnerabilities by `npm audit fix` [spark]

2025-03-18 Thread via GitHub
LuciferYang opened a new pull request, #50309: URL: https://github.com/apache/spark/pull/50309 ### What changes were proposed in this pull request? This PR fixes the following npm vulnerabilities by `npm audit fix`: ``` # npm audit report @babel/helpers <7.26.10 Severity: mo

Re: [PR] [SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh [spark]

2025-03-18 Thread via GitHub
LuciferYang commented on PR #48937: URL: https://github.com/apache/spark/pull/48937#issuecomment-2733569642 @dongjoon-hyun @HyukjinKwon @pan3793 Do you need to take another look?

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub
LuciferYang commented on code in PR #50305: URL: https://github.com/apache/spark/pull/50305#discussion_r2001229805 ## core/src/main/resources/org/apache/spark/ui/static/scroll-button.js: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[PR] [WIP][SQL] Assign name to the error condition: _LEGACY_ERROR_TEMP_2130 [spark]

2025-03-18 Thread via GitHub
MaxGekk opened a new pull request, #50307: URL: https://github.com/apache/spark/pull/50307 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[PR] [SPARK-50838][SQL] Add Cross Join as legal in recursion of Recursive CTE [spark]

2025-03-18 Thread via GitHub
Pajaraja opened a new pull request, #50308: URL: https://github.com/apache/spark/pull/50308 ### What changes were proposed in this pull request? Add Cross Joins as legal in recursion in Recursive CTEs. ### Why are the changes needed? Cross join is allowed in the recursion

Re: [PR] Revert "[SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan" [spark]

2025-03-18 Thread via GitHub
cloud-fan closed pull request #50291: Revert "[SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan" URL: https://github.com/apache/spark/pull/50291

Re: [PR] Revert "[SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan" [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on PR #50291: URL: https://github.com/apache/spark/pull/50291#issuecomment-2733387141 closing since the community has reached a consensus about the migration story.

Re: [PR] [SPARK-51187][SQL][SS][4.0] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on PR #49984: URL: https://github.com/apache/spark/pull/49984#issuecomment-2733376577 @HeartSaVioR can you rebase this PR to trigger the tests? I think it's time to merge it now and unblock 4.0

[PR] [SPARK-51536][ML][PYTHON][CONNECT] Add missing whitelist for feature transformers / models [spark]

2025-03-18 Thread via GitHub
WeichenXu123 opened a new pull request, #50306: URL: https://github.com/apache/spark/pull/50306 ### What changes were proposed in this pull request? Add missing whitelist for feature transformers / models ### Why are the changes needed? Fix these feature models /

Re: [PR] [SPARK-51420][SQL] Get minutes of TIME datatype [spark]

2025-03-18 Thread via GitHub
the-sakthi commented on PR #50296: URL: https://github.com/apache/spark/pull/50296#issuecomment-2733280959 Let me know if this updated PR aligns better with your suggestions, @MaxGekk. Also, I'm looking for guidance, if possible, on fixing the codegen issue which I pointed out above.

Re: [PR] [BUILD] Upgrade jersey to 3.0.17 [spark]

2025-03-18 Thread via GitHub
LuciferYang commented on PR #50303: URL: https://github.com/apache/spark/pull/50303#issuecomment-2732374344 Test first

Re: [PR] [SPARK-51420][SQL] Get minutes of TIME datatype [spark]

2025-03-18 Thread via GitHub
the-sakthi commented on code in PR #50296: URL: https://github.com/apache/spark/pull/50296#discussion_r2001062893 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala: ## @@ -120,3 +121,53 @@ case class ToTimeParser(fmt: Option[String])

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000754019 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-51332][SQL] DS V2 supports push down BIT_AND, BIT_OR, BIT_XOR, BIT_COUNT and BIT_GET [spark]

2025-03-18 Thread via GitHub
beliefer commented on PR #50097: URL: https://github.com/apache/spark/pull/50097#issuecomment-2733158729 > Can you remind me how this is done in the current JDBC v2 framework? Thanks! Since https://github.com/apache/spark/pull/50143

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000868775 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4537,6 +4537,22 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.m

[PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub
yaooqinn opened a new pull request, #50305: URL: https://github.com/apache/spark/pull/50305 ### What changes were proposed in this pull request? This PR adds buttons for addressing the top and bottom of every page ### Why are the changes needed? This is a UI UX improvement.

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000927060 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000923539 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000904956 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4537,6 +4537,22 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.ma

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000889519 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000882944 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000882198 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000884957 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000881100 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000878188 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub
yaooqinn commented on code in PR #50305: URL: https://github.com/apache/spark/pull/50305#discussion_r2000871416 ## core/src/main/resources/org/apache/spark/ui/static/scroll-button.js: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] [SPARK-51414][SQL] Add the make_time() function [spark]

2025-03-18 Thread via GitHub
MaxGekk commented on code in PR #50269: URL: https://github.com/apache/spark/pull/50269#discussion_r2000708917 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala: ## @@ -2556,6 +2556,91 @@ case class MakeDate( copy(year = newF

Re: [PR] [SPARK-51332][SQL] DS V2 supports push down BIT_AND, BIT_OR, BIT_XOR, BIT_COUNT and BIT_GET [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on PR #50097: URL: https://github.com/apache/spark/pull/50097#issuecomment-2732752199 > We do not push down these functions to all JDBC dialects in default. Can you remind me how this is done in the current JDBC v2 framework? Thanks!

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-03-18 Thread via GitHub
adrians commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2732565425 @beliefer - Sure, I'll add those later this week.

[PR] [BUILD] Upgrade jersey to 3.0.17 [spark]

2025-03-18 Thread via GitHub
LuciferYang opened a new pull request, #50303: URL: https://github.com/apache/spark/pull/50303 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51451][SQL] Fix ExtractGenerator to wait for UnresolvedStarWithColumns to be resolved [spark]

2025-03-18 Thread via GitHub
cloud-fan closed pull request #50286: [SPARK-51451][SQL] Fix ExtractGenerator to wait for UnresolvedStarWithColumns to be resolved URL: https://github.com/apache/spark/pull/50286

Re: [PR] [SPARK-51541][SQL] Support the `TIME` data type in `Literal` methods [spark]

2025-03-18 Thread via GitHub
MaxGekk commented on PR #50299: URL: https://github.com/apache/spark/pull/50299#issuecomment-2732033074 Merging to master. Thank you, @yaooqinn @LuciferYang for review.

Re: [PR] [SPARK-51505][SQL] Log empty partition number metrics in AQE coalesce [spark]

2025-03-18 Thread via GitHub
cloud-fan commented on PR #50273: URL: https://github.com/apache/spark/pull/50273#issuecomment-2731840772 can you provide some screenshots before and after this change? Let's see all the metrics available today and how this new metric adds value.
