date:20250318

[PR] [SPARK-51282][SPARK-51422][ML][FOLLOW-UP] Replace UDF with builtin functions [spark]

2025-03-18 Thread via GitHub

zhengruifeng opened a new pull request, #50321: URL: https://github.com/apache/spark/pull/50321 ### What changes were proposed in this pull request? Make scala side changes corresponding to https://github.com/apache/spark/pull/50041 and https://github.com/apache/spark/pull/50184

Re: [PR] [SPARK-51549][BUILD] Bump Parquet 1.15.1 [spark]

2025-03-18 Thread via GitHub

LuciferYang commented on PR #50319: URL: https://github.com/apache/spark/pull/50319#issuecomment-2735427062 The new version seems to include two bug fixes: - https://github.com/apache/parquet-java/issues/3172 - https://github.com/apache/parquet-java/issues/3133 and I think w

[PR] Test auto set OBJC_DISABLE_INITIALIZE_FORK_SAFETY for macos [spark]

2025-03-18 Thread via GitHub

LuciferYang opened a new pull request, #50320: URL: https://github.com/apache/spark/pull/50320 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51549][BUILD] Bump Parquet 1.15.1 [spark]

2025-03-18 Thread via GitHub

pan3793 commented on PR #50319: URL: https://github.com/apache/spark/pull/50319#issuecomment-2735410874 Compared to 1.15.0, only a few lines changed in this patch version, might be good to go branch-4.0? cc @cloud-fan @LuciferYang -- This is an automated message from the Apache Git Servi

[PR] [SPARK-51549][BUILD] Bump Parquet 1.15.1 [spark]

2025-03-18 Thread via GitHub

pan3793 opened a new pull request, #50319: URL: https://github.com/apache/spark/pull/50319 ### What changes were proposed in this pull request? Bump Parquet 1.15.1. ### Why are the changes needed? Release Notes https://parquet.apache.org/blog/2025/03/16/1.15.1/

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub

LuciferYang commented on PR #50305: URL: https://github.com/apache/spark/pull/50305#issuecomment-2735385482 Merged into master. Thanks @yaooqinn and @dongjoon-hyun

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub

yaooqinn commented on PR #50305: URL: https://github.com/apache/spark/pull/50305#issuecomment-2735385594 Thank you @LuciferYang and @dongjoon-hyun

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub

LuciferYang closed pull request #50305: [SPARK-51542][UI] Add a scroll-button for addressing top and bottom URL: https://github.com/apache/spark/pull/50305

Re: [PR] Bump net.snowflake:snowflake-jdbc from 3.22.0 to 3.23.1 [spark]

2025-03-18 Thread via GitHub

dependabot[bot] commented on PR #50317: URL: https://github.com/apache/spark/pull/50317#issuecomment-2735376933 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let

Re: [PR] Bump net.snowflake:snowflake-jdbc from 3.22.0 to 3.23.1 [spark]

2025-03-18 Thread via GitHub

yaooqinn closed pull request #50317: Bump net.snowflake:snowflake-jdbc from 3.22.0 to 3.23.1 URL: https://github.com/apache/spark/pull/50317

Re: [PR] [SPARK-51547][SQL] Assign name to the error condition: _LEGACY_ERROR_TEMP_2130 [spark]

2025-03-18 Thread via GitHub

MaxGekk commented on PR #50307: URL: https://github.com/apache/spark/pull/50307#issuecomment-2735355428 Merging to master. Thank you, @dongjoon-hyun @vrozov for review.

Re: [PR] [MINIOR][PYTHON] Specifying udf type in SCHEMA_MISMATCH_FOR_PANDAS_UDF error message [spark]

2025-03-18 Thread via GitHub

viirya commented on PR #50312: URL: https://github.com/apache/spark/pull/50312#issuecomment-2735357355 Thank you @zhengruifeng

Re: [PR] [SPARK-51547][SQL] Assign name to the error condition: _LEGACY_ERROR_TEMP_2130 [spark]

2025-03-18 Thread via GitHub

MaxGekk closed pull request #50307: [SPARK-51547][SQL] Assign name to the error condition: _LEGACY_ERROR_TEMP_2130 URL: https://github.com/apache/spark/pull/50307

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-03-18 Thread via GitHub

beliefer commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2732010166 @adrians Could you provide the micro benchmarks for this optimization ?

Re: [PR] [SPARK-51187][SQL][SS] Introduce the migration logic of config removal from SPARK-49699 [spark]

2025-03-18 Thread via GitHub

HeartSaVioR commented on PR #50314: URL: https://github.com/apache/spark/pull/50314#issuecomment-2735299721 https://github.com/HeartSaVioR/spark/actions/runs/13936938059/job/39006581717 > SPARK-47148: AQE should avoid to submit shuffle job on cancellation *** FAILED *** (6 seconds, 93

Re: [PR] [SPARK-51430][PYTHON] Stop PySpark context logger from propagating logs to stdout [spark]

2025-03-18 Thread via GitHub

allisonwang-db commented on PR #50198: URL: https://github.com/apache/spark/pull/50198#issuecomment-2735007337 @dongjoon-hyun you still see the analysis exception ``` AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `id2` cannot

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2001302195 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4537,6 +4537,22 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.ma

[PR] [MINOR][TESTS] Ignore messages below error level from `HadoopFSUtils` for sql module tests [spark]

2025-03-18 Thread via GitHub

LuciferYang opened a new pull request, #50316: URL: https://github.com/apache/spark/pull/50316 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub

yaooqinn commented on PR #50305: URL: https://github.com/apache/spark/pull/50305#issuecomment-2735203999 Hi @LuciferYang, I de-jQuery'd, PTAL, again. Thank you @dongjoon-hyun, looking forward to your feedback.

[PR] [WIP] Fix add metadata columns [spark]

2025-03-18 Thread via GitHub

mihailotim-db opened a new pull request, #50304: URL: https://github.com/apache/spark/pull/50304 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-03-18 Thread via GitHub

adrians commented on code in PR #50170: URL: https://github.com/apache/spark/pull/50170#discussion_r2000685345 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -298,6 +299,24 @@ object ReorderAssociativeOperator extends Rule[Logica

Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]

2025-03-18 Thread via GitHub

ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r2001966138 ## resource-managers/yarn/pom.xml: ## @@ -37,6 +37,11 @@ spark-core_${scala.binary.version} ${project.version} + + org.apache.spark +

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000869374 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4537,6 +4537,22 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.m

[PR] Bump net.snowflake:snowflake-jdbc from 3.22.0 to 3.23.1 [spark]

2025-03-18 Thread via GitHub

dependabot[bot] opened a new pull request, #50317: URL: https://github.com/apache/spark/pull/50317 Bumps [net.snowflake:snowflake-jdbc](https://github.com/snowflakedb/snowflake-jdbc) from 3.22.0 to 3.23.1. Release notes Sourced from https://github.com/snowflakedb/snowflake-jdbc/re

Re: [PR] [SPARK-51546][TESTS] Fix npm vulnerabilities by `npm audit fix` [spark]

2025-03-18 Thread via GitHub

LuciferYang commented on PR #50309: URL: https://github.com/apache/spark/pull/50309#issuecomment-2735207146 Merged into master. Thanks @dongjoon-hyun and @yaooqinn

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub

yaooqinn commented on code in PR #50305: URL: https://github.com/apache/spark/pull/50305#discussion_r2002335113 ## core/src/main/scala/org/apache/spark/ui/UIUtils.scala: ## @@ -222,6 +222,7 @@ private[spark] object UIUtils extends Logging { + setUIRoot

Re: [PR] [SPARK-51543][BUILD] Upgrade jersey to 3.0.17 [spark]

2025-03-18 Thread via GitHub

LuciferYang commented on PR #50303: URL: https://github.com/apache/spark/pull/50303#issuecomment-2735208601 Merged into master. Thanks @dongjoon-hyun

Re: [PR] [SPARK-51543][BUILD] Upgrade jersey to 3.0.17 [spark]

2025-03-18 Thread via GitHub

LuciferYang closed pull request #50303: [SPARK-51543][BUILD] Upgrade jersey to 3.0.17 URL: https://github.com/apache/spark/pull/50303

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-18 Thread via GitHub

wengh commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r2002218436 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonDataSourceSuite.scala: ## @@ -213,6 +220,66 @@ class PythonDataSourceSuite extends PythonDataSourc

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-03-18 Thread via GitHub

beliefer commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2735189189 I have a question: is this replacement good to any data sources ?

[PR] [SPARK-51187][SQL][SS] Introduce the migration logic of config removal from SPARK-49699 [spark]

2025-03-18 Thread via GitHub

HeartSaVioR opened a new pull request, #50314: URL: https://github.com/apache/spark/pull/50314 ### What changes were proposed in this pull request? This PR proposes to implement the graceful deprecation of incorrect config introduced in SPARK-49699. SPARK-49699 was included in

Re: [PR] [SPARK-49082][SQL] Support widening Date to TimestampNTZ in Avro reader [spark]

2025-03-18 Thread via GitHub

aldenlau-db closed pull request #50315: [SPARK-49082][SQL] Support widening Date to TimestampNTZ in Avro reader URL: https://github.com/apache/spark/pull/50315

Re: [PR] [SPARK-51187][SQL][SS][4.0] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 [spark]

2025-03-18 Thread via GitHub

HeartSaVioR commented on PR #49984: URL: https://github.com/apache/spark/pull/49984#issuecomment-2735114537 @cloud-fan I will rebase this PR, but probably this PR https://github.com/apache/spark/pull/50314 would more properly address the community's concern. The VOTE has passed, but M

[PR] Initial commit [spark]

2025-03-18 Thread via GitHub

aldenlau-db opened a new pull request, #50315: URL: https://github.com/apache/spark/pull/50315 ### What changes were proposed in this pull request? This change adds support for widening type promotions from `Date` to `TimestampNTZ` in `AvroDeserializer`. ### Why are the

Re: [PR] [SPARK-51541][SQL] Support the `TIME` data type in `Literal` methods [spark]

2025-03-18 Thread via GitHub

MaxGekk closed pull request #50299: [SPARK-51541][SQL] Support the `TIME` data type in `Literal` methods URL: https://github.com/apache/spark/pull/50299

Re: [PR] [SPARK-51536][ML][PYTHON][CONNECT] Add missing whitelist for feature transformers / models [spark]

2025-03-18 Thread via GitHub

WeichenXu123 closed pull request #50306: [SPARK-51536][ML][PYTHON][CONNECT] Add missing whitelist for feature transformers / models URL: https://github.com/apache/spark/pull/50306

Re: [PR] [SPARK-51340][ML][CONNECT] Model size estimation [spark]

2025-03-18 Thread via GitHub

zhengruifeng commented on code in PR #50278: URL: https://github.com/apache/spark/pull/50278#discussion_r2002206038 ## mllib/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -81,4 +81,25 @@ abstract class Estimator[M <: Model[M]] extends PipelineStage { } overr

Re: [PR] [SPARK-44856][PYTHON] Improve Python UDTF arrow serializer performance [spark]

2025-03-18 Thread via GitHub

ueshin commented on code in PR #50099: URL: https://github.com/apache/spark/pull/50099#discussion_r2002073300 ## python/pyspark/worker.py: ## @@ -1417,6 +1434,153 @@ def mapper(_, it): return mapper, None, ser, ser +elif eval_type == PythonEvalType.SQL_ARROW_TAB

Re: [PR] [MINIOR][PYTHON] Specifying udf type in SCHEMA_MISMATCH_FOR_PANDAS_UDF error message [spark]

2025-03-18 Thread via GitHub

viirya commented on PR #50312: URL: https://github.com/apache/spark/pull/50312#issuecomment-2734977835 Thank you @dongjoon-hyun

Re: [PR] [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache [spark]

2025-03-18 Thread via GitHub

zhengruifeng closed pull request #50013: [SPARK-51261][ML][PYTHON][CONNECT] Introduce model size estimation to control ml cache URL: https://github.com/apache/spark/pull/50013

Re: [PR] [SPARK-51536][ML][PYTHON][CONNECT] Add missing whitelist for feature transformers / models [spark]

2025-03-18 Thread via GitHub

zhengruifeng commented on code in PR #50306: URL: https://github.com/apache/spark/pull/50306#discussion_r2002178074 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -642,8 +642,26 @@ private[ml] object MLUtils { // Association Rules

[PR] [WIP] Python UDF traceback improvement [spark]

2025-03-18 Thread via GitHub

wengh opened a new pull request, #50313: URL: https://github.com/apache/spark/pull/50313 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was t

Re: [PR] Add Sample functionality in DataFrame. [spark-connect-go]

2025-03-18 Thread via GitHub

imvtsl commented on code in PR #84: URL: https://github.com/apache/spark-connect-go/pull/84#discussion_r2002169879 ## spark/sql/dataframe.go: ## @@ -148,6 +148,14 @@ type DataFrame interface { Rollup(ctx context.Context, cols ...column.Convertible) *GroupedData /

Re: [PR] Add Sample functionality in DataFrame. [spark-connect-go]

2025-03-18 Thread via GitHub

imvtsl commented on PR #84: URL: https://github.com/apache/spark-connect-go/pull/84#issuecomment-2734966459 Late here, but I wanted to put forth my point: > the parameter withReplacement in the other cases already contains the "with" part in the name. I believe the original nam

Re: [PR] [SPARK-51505][SQL] Log empty partition number metrics in AQE coalesce [spark]

2025-03-18 Thread via GitHub

liuzqt commented on PR #50273: URL: https://github.com/apache/spark/pull/50273#issuecomment-2734945883 > can you provide some screenshots before and after this change? Let's see all the metrics available today and how this new metric adds value. Added Spark UI screenshot

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-18 Thread via GitHub

allisonwang-db commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r2002148510 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonScan.scala: ## @@ -16,26 +16,43 @@ */ package org.apache.spark.sql.execu

Re: [PR] [SPARK-51271][PYTHON] Add filter pushdown API to Python Data Sources [spark]

2025-03-18 Thread via GitHub

allisonwang-db commented on code in PR #49961: URL: https://github.com/apache/spark/pull/49961#discussion_r2002140192 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonDataSourceSuite.scala: ## @@ -213,6 +220,66 @@ class PythonDataSourceSuite extends Python

Re: [PR] [SPARK-51420][SQL] Get minutes of TIME datatype [spark]

2025-03-18 Thread via GitHub

the-sakthi commented on code in PR #50296: URL: https://github.com/apache/spark/pull/50296#discussion_r2001066777 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala: ## @@ -120,3 +121,53 @@ case class ToTimeParser(fmt: Option[String])

Re: [PR] [SPARK-51518][SQL] Support | as an alternative to |> for the SQL pipe operator token [spark]

2025-03-18 Thread via GitHub

dongjoon-hyun commented on PR #50284: URL: https://github.com/apache/spark/pull/50284#issuecomment-2734747783 Yes, right. For the record, I also read that paper, of course, while I reviewed your original SQL Pipe Syntax PR. Thank you for considering adding them as test cases. > Making a

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-03-18 Thread via GitHub

adrians commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2734739308 I've added a benchmark testcase in `FilterPushdownBenchmark.scala`, by mostly copy-pasting the existing `InSet` testcase. One run *without* the ArrayContains-to-InSet rule ([full l

[PR] [MINIOR][PYTHON] Specifying udf type in SCHEMA_MISMATCH_FOR_PANDAS_UDF error message [spark]

2025-03-18 Thread via GitHub

viirya opened a new pull request, #50312: URL: https://github.com/apache/spark/pull/50312 ### What changes were proposed in this pull request? This minor patch adds `udf_type` parameter to `SCHEMA_MISMATCH_FOR_PANDAS_UDF` error message. ### Why are the changes neede

Re: [PR] [SPARK-XXXXX][SQL] Add maxRecordsPerOutputBatch to limit the number of record of Arrow output batch [spark]

2025-03-18 Thread via GitHub

viirya commented on code in PR #50301: URL: https://github.com/apache/spark/pull/50301#discussion_r2001863178 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonArrowOutput.scala: ## @@ -83,17 +89,37 @@ private[python] trait PythonArrowOutput[OUT <: AnyRef] {

Re: [PR] [SPARK-51518][SQL] Support | as an alternative to |> for the SQL pipe operator token [spark]

2025-03-18 Thread via GitHub

dtenedor commented on PR #50284: URL: https://github.com/apache/spark/pull/50284#issuecomment-2734474463 I found a parsing ambiguity with this syntax that the research paper mentions: ![image](https://github.com/user-attachments/assets/2f336202-23ab-4121-ab00-96b5af6d1ad2) Mak

Re: [PR] [SPARK-51414][SQL] Add the make_time() function [spark]

2025-03-18 Thread via GitHub

robreeves commented on code in PR #50269: URL: https://github.com/apache/spark/pull/50269#discussion_r2001809416 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala: ## @@ -2556,6 +2556,91 @@ case class MakeDate( copy(year = ne

Re: [PR] [SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh [spark]

2025-03-18 Thread via GitHub

LuciferYang commented on PR #48937: URL: https://github.com/apache/spark/pull/48937#issuecomment-2734317403 > I re-tested the modified line in `MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Msys` and the new version appears to be functionally identical, and the

Re: [PR] [SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh [spark]

2025-03-18 Thread via GitHub

philwalk commented on PR #48937: URL: https://github.com/apache/spark/pull/48937#issuecomment-2734301465 > @dongjoon-hyun @HyukjinKwon @pan3793 Do you need to take another look? I re-tested the modified line in `MINGW64_NT-10.0-22631 d5 3.5.4-0bc1222b.x86_64 2024-09-04 18:28 UTC x86_64 Ms

Re: [PR] [SPARK-51358] [SS] Introduce snapshot upload lag detection through StateStoreCoordinator [spark]

2025-03-18 Thread via GitHub

ericm-db commented on code in PR #50123: URL: https://github.com/apache/spark/pull/50123#discussion_r2001497587 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2236,31 +2236,35 @@ object SQLConf { .booleanConf .createWithDefault(t

Re: [PR] [SPARK-51414][SQL] Add the make_time() function [spark]

2025-03-18 Thread via GitHub

MaxGekk commented on code in PR #50269: URL: https://github.com/apache/spark/pull/50269#discussion_r2001596554 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala: ## @@ -2150,4 +2150,66 @@ class DateExpressionsSuite extends SparkF

[PR] Revert #48786 - "Revert "[SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier"" [spark]

2025-03-18 Thread via GitHub

dusantism-db opened a new pull request, #50311: URL: https://github.com/apache/spark/pull/50311 ### What changes were proposed in this pull request? This PR reverts https://github.com/apache/spark/pull/48786 ### Why are the changes needed? Custom rules in the early an

Re: [PR] [SPARK-51414][SQL] Add the make_time() function [spark]

2025-03-18 Thread via GitHub

robreeves commented on code in PR #50269: URL: https://github.com/apache/spark/pull/50269#discussion_r2001535420 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala: ## @@ -2150,4 +2150,66 @@ class DateExpressionsSuite extends Spar

Re: [PR] [SPARK-51547][SQL] Assign name to the error condition: _LEGACY_ERROR_TEMP_2130 [spark]

2025-03-18 Thread via GitHub

vrozov commented on PR #50307: URL: https://github.com/apache/spark/pull/50307#issuecomment-2733987765 LGTM

Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]

2025-03-18 Thread via GitHub

anoopj commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r2001406072 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -311,4 +311,15 @@ default boolean purgeTable(Identifier ident) throws Unsu

[PR] [SPARK-47241][SQL][FOLLOWUP] Fix issue when laterally referencing a `Generator` [spark]

2025-03-18 Thread via GitHub

mihailotim-db opened a new pull request, #50310: URL: https://github.com/apache/spark/pull/50310 ### What changes were proposed in this pull request? Fix issue when laterally referencing a `Generator`. ### Why are the changes needed? Fix the following query patter

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2001137848 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4537,6 +4537,22 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.m

[PR] [SPARK-51546] Fix npm vulnerabilities by `npm audit fix` [spark]

2025-03-18 Thread via GitHub

LuciferYang opened a new pull request, #50309: URL: https://github.com/apache/spark/pull/50309 ### What changes were proposed in this pull request? This PR fixes the following npm vulnerabilities by `npm audit fix`: ``` # npm audit report @babel/helpers <7.26.10 Severity: mo

Re: [PR] [SPARK-50416][CORE] A more portable terminal / pipe test needed for bin/load-spark-env.sh [spark]

2025-03-18 Thread via GitHub

LuciferYang commented on PR #48937: URL: https://github.com/apache/spark/pull/48937#issuecomment-2733569642 @dongjoon-hyun @HyukjinKwon @pan3793 Do you need to take another look?

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub

LuciferYang commented on code in PR #50305: URL: https://github.com/apache/spark/pull/50305#discussion_r2001229805 ## core/src/main/resources/org/apache/spark/ui/static/scroll-button.js: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[PR] [WIP][SQL] Assign name to the error condition: _LEGACY_ERROR_TEMP_2130 [spark]

2025-03-18 Thread via GitHub

MaxGekk opened a new pull request, #50307: URL: https://github.com/apache/spark/pull/50307 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[PR] [SPARK-50838][SQL] Add Cross Join as legal in recursion of Recursive CTE [spark]

2025-03-18 Thread via GitHub

Pajaraja opened a new pull request, #50308: URL: https://github.com/apache/spark/pull/50308 ### What changes were proposed in this pull request? Add Cross Joins as legal in recursion in Recursive CTEs. ### Why are the changes needed? Cross join is allowed in the recursion

Re: [PR] Revert "[SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan" [spark]

2025-03-18 Thread via GitHub

cloud-fan closed pull request #50291: Revert "[SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan" URL: https://github.com/apache/spark/pull/50291

Re: [PR] Revert "[SPARK-51172][SS] Rename to spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan" [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on PR #50291: URL: https://github.com/apache/spark/pull/50291#issuecomment-2733387141 closing since the community has reached a consensus about the migration story.

Re: [PR] [SPARK-51187][SQL][SS][4.0] Implement the graceful deprecation of incorrect config introduced in SPARK-49699 [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on PR #49984: URL: https://github.com/apache/spark/pull/49984#issuecomment-2733376577 @HeartSaVioR can you rebase this PR to trigger the tests? I think it's time to merge it now and unblock 4.0

[PR] [SPARK-51536][ML][PYTHON][CONNECT] Add missing whitelist for feature transformers / models [spark]

2025-03-18 Thread via GitHub

WeichenXu123 opened a new pull request, #50306: URL: https://github.com/apache/spark/pull/50306 ### What changes were proposed in this pull request? Add missing whitelist for feature transformers / models ### Why are the changes needed? Fix these feature models /

Re: [PR] [SPARK-51420][SQL] Get minutes of TIME datatype [spark]

2025-03-18 Thread via GitHub

the-sakthi commented on PR #50296: URL: https://github.com/apache/spark/pull/50296#issuecomment-2733280959 Let me know if this updated PR aligns better with your suggestions, @MaxGekk Also, I'm looking for guidance, if possible, on fixing the codegen issue which I pointed above.

Re: [PR] [BUILD] Upgrade jersey to 3.0.17 [spark]

2025-03-18 Thread via GitHub

LuciferYang commented on PR #50303: URL: https://github.com/apache/spark/pull/50303#issuecomment-2732374344 Test first

Re: [PR] [SPARK-51420][SQL] Get minutes of TIME datatype [spark]

2025-03-18 Thread via GitHub

the-sakthi commented on code in PR #50296: URL: https://github.com/apache/spark/pull/50296#discussion_r2001062893 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala: ## @@ -120,3 +121,53 @@ case class ToTimeParser(fmt: Option[String])

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000754019 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-51332][SQL] DS V2 supports push down BIT_AND, BIT_OR, BIT_XOR, BIT_COUNT and BIT_GET [spark]

2025-03-18 Thread via GitHub

beliefer commented on PR #50097: URL: https://github.com/apache/spark/pull/50097#issuecomment-2733158729 > Can you remind me how this is done in the current JDBC v2 framework? Thanks! Since https://github.com/apache/spark/pull/50143

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000868775 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4537,6 +4537,22 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.m

[PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub

yaooqinn opened a new pull request, #50305: URL: https://github.com/apache/spark/pull/50305 ### What changes were proposed in this pull request? This PR adds buttons for addressing the top and bottom of every page ### Why are the changes needed? This is a UI/UX improvement.

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000927060 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000923539 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

Pajaraja commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000904956 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4537,6 +4537,22 @@ object SQLConf { .checkValues(LegacyBehaviorPolicy.values.ma

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000889519 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000882944 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000882198 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000884957 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000881100 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-50892][SQL]Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on code in PR #49955: URL: https://github.com/apache/spark/pull/49955#discussion_r2000878188 ## sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [SPARK-51542][UI] Add a scroll-button for addressing top and bottom [spark]

2025-03-18 Thread via GitHub

yaooqinn commented on code in PR #50305: URL: https://github.com/apache/spark/pull/50305#discussion_r2000871416 ## core/src/main/resources/org/apache/spark/ui/static/scroll-button.js: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] [SPARK-51414][SQL] Add the make_time() function [spark]

2025-03-18 Thread via GitHub

MaxGekk commented on code in PR #50269: URL: https://github.com/apache/spark/pull/50269#discussion_r2000708917 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala: ## @@ -2556,6 +2556,91 @@ case class MakeDate( copy(year = newF

Re: [PR] [SPARK-51332][SQL] DS V2 supports push down BIT_AND, BIT_OR, BIT_XOR, BIT_COUNT and BIT_GET [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on PR #50097: URL: https://github.com/apache/spark/pull/50097#issuecomment-2732752199 > We do not push down these functions to all JDBC dialects in default. Can you remind me how this is done in the current JDBC v2 framework? Thanks! -- This is an automated messa

Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

2025-03-18 Thread via GitHub

adrians commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2732565425 @beliefer - Sure, I'll add those later this week. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[PR] [BUILD] Upgrade jersey to 3.0.17 [spark]

2025-03-18 Thread via GitHub

LuciferYang opened a new pull request, #50303: URL: https://github.com/apache/spark/pull/50303 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-51451][SQL] Fix ExtractGenerator to wait for UnresolvedStarWithColumns to be resolved [spark]

2025-03-18 Thread via GitHub

cloud-fan closed pull request #50286: [SPARK-51451][SQL] Fix ExtractGenerator to wait for UnresolvedStarWithColumns to be resolved URL: https://github.com/apache/spark/pull/50286 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [PR] [SPARK-51541][SQL] Support the `TIME` data type in `Literal` methods [spark]

2025-03-18 Thread via GitHub

MaxGekk commented on PR #50299: URL: https://github.com/apache/spark/pull/50299#issuecomment-2732033074 Merging to master. Thank you, @yaooqinn @LuciferYang for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] [SPARK-51505][SQL] Log empty partition number metrics in AQE coalesce [spark]

2025-03-18 Thread via GitHub

cloud-fan commented on PR #50273: URL: https://github.com/apache/spark/pull/50273#issuecomment-2731840772 can you provide some screenshots before and after this change? Let's see all the metrics available today and how this new metric adds value. -- This is an automated message from the A
