[PR] [SPARK-50325][SQL] Factor out alias resolution to be reused in the single-pass Analyzer [spark]

2024-11-15 Thread via GitHub
vladimirg-db opened a new pull request, #48857: URL: https://github.com/apache/spark/pull/48857 ### What changes were proposed in this pull request? Factor out alias resolution code to the `AliasResolution` object. ### Why are the changes needed? Some Analyzer code will b

Re: [PR] [SPARK-50313][SQL][TESTS] Enable ANSI in SQL *SQLQueryTestSuite by default [spark]

2024-11-15 Thread via GitHub
yaooqinn commented on code in PR #48842: URL: https://github.com/apache/spark/pull/48842#discussion_r1843367043 ## sql/core/src/test/resources/sql-tests/inputs/decimalArithmeticOperations.sql: ## @@ -88,7 +88,7 @@ SELECT CAST(10 AS DECIMAL(10, 2)) div CAST(3 AS DECIMAL(5, 1));

Re: [PR] [SPARK-50313][SQL][TESTS] Enable ANSI in SQL *SQLQueryTestSuite by default [spark]

2024-11-15 Thread via GitHub
yaooqinn commented on code in PR #48842: URL: https://github.com/apache/spark/pull/48842#discussion_r1843366680 ## sql/core/src/test/resources/sql-tests/inputs/udf/udf-union.sql: ## @@ -11,7 +11,7 @@ FROM (SELECT udf(c1) as c1, udf(c2) as c2 FROM t1 -- Type Coerced Union S

[PR] [SPARK-50327][SQL] Factor out function resolution to be reused in the single-pass Analyzer [spark]

2024-11-15 Thread via GitHub
vladimirg-db opened a new pull request, #48858: URL: https://github.com/apache/spark/pull/48858 ### What changes were proposed in this pull request? Factor out function resolution code to the `FunctionResolution` object. ### Why are the changes needed? Some Analyzer code

Re: [PR] [SPARK-50309][SQL] Add documentation for SQL pipe syntax [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48852: URL: https://github.com/apache/spark/pull/48852#discussion_r1843507956 ## docs/sql-pipe-syntax.md: ## @@ -0,0 +1,540 @@ +--- +layout: global +title: SQL Pipe Syntax +displayTitle: SQL Pipe Syntax +license: | + Licensed to the Apache Sof

Re: [PR] [SPARK-48195][CORE] Save and reuse RDD/Broadcast created by SparkPlan [spark]

2024-11-15 Thread via GitHub
yaooqinn commented on PR #48037: URL: https://github.com/apache/spark/pull/48037#issuecomment-2478403269 It looks like the number of suppressed exceptions can be calculated by `2 * calls of Utils.doTryWithCallerStacktrace` -- This is an automated message from the Apache Git Service. To respond to th
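
A minimal, hypothetical Scala sketch (this is not Spark's actual Utils.doTryWithCallerStacktrace; the wrapper name and behavior here are assumptions) of how each wrapping call can attach one more suppressed exception to the error it rethrows, which is the kind of multiplier the comment above is estimating:

    object SuppressedCountDemo {
      // Hypothetical wrapper: on failure, record the caller's position as a
      // suppressed exception and rethrow the original error.
      def wrapWithCallerStacktrace[T](body: => T): T =
        try body
        catch {
          case e: Throwable =>
            e.addSuppressed(new Exception("caller stacktrace"))
            throw e
        }

      def main(args: Array[String]): Unit = {
        try {
          wrapWithCallerStacktrace {
            wrapWithCallerStacktrace {
              throw new IllegalStateException("boom")
            }
          }
        } catch {
          // Two nested wrapper calls => two suppressed entries on the same exception.
          case e: Throwable => println(s"suppressed: ${e.getSuppressed.length}")
        }
      }
    }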

Re: [PR] [SPARK-50313][SQL][TESTS] Enable ANSI in SQL *SQLQueryTestSuite by default [spark]

2024-11-15 Thread via GitHub
yaooqinn commented on PR #48842: URL: https://github.com/apache/spark/pull/48842#issuecomment-2478422326 cc @dongjoon-hyun as the initiator of SPARK-4, also cc @cloud-fan, thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

Re: [PR] [SPARK-50322][SQL] Fix parameterized identifier in a sub-query [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48847: URL: https://github.com/apache/spark/pull/48847#discussion_r1843517881 ## sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala: ## @@ -741,4 +741,21 @@ class ParametersSuite extends QueryTest with SharedSparkSession with P

Re: [PR] [SPARK-50322][SQL] Fix parameterized identifier in a sub-query [spark]

2024-11-15 Thread via GitHub
MaxGekk commented on code in PR #48847: URL: https://github.com/apache/spark/pull/48847#discussion_r1843594293 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -189,7 +189,8 @@ object BindParameters extends ParameterizedQueryProcessor

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843594523 ## sql/core/src/test/resources/sql-tests/results/listagg.sql.out: ## @@ -0,0 +1,436 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +CREATE TEMP VIEW d

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843597510 ## sql/core/src/test/resources/sql-tests/results/listagg.sql.out: ## @@ -0,0 +1,436 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +CREATE TEMP VIEW d

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843595865 ## sql/core/src/test/resources/sql-tests/results/listagg.sql.out: ## @@ -0,0 +1,436 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +CREATE TEMP VIEW d

Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-15 Thread via GitHub
stevomitric commented on code in PR #48501: URL: https://github.com/apache/spark/pull/48501#discussion_r1843598536 ## sql/core/benchmarks/CollationBenchmark-jdk21-results.txt: ## @@ -1,54 +1,88 @@ -OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure -AMD EPYC 7763 6

[PR] [SPARK-50328][INFRA] Add a separate docker file for SparkR [spark]

2024-11-15 Thread via GitHub
zhengruifeng opened a new pull request, #48859: URL: https://github.com/apache/spark/pull/48859 ### What changes were proposed in this pull request? Add a separate docker file for SparkR ### Why are the changes needed? For env isolation ### Does this PR introduce _any_

Re: [PR] [SPARK-50322][SQL] Fix parameterized identifier in a sub-query [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48847: URL: https://github.com/apache/spark/pull/48847#discussion_r1843600908 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -189,7 +189,8 @@ object BindParameters extends ParameterizedQueryProcess

Re: [PR] [SPARK-50130][SQL][FOLLOWUP] Make Encoder generation lazy [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48829: URL: https://github.com/apache/spark/pull/48829#discussion_r1843600376 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -95,13 +95,12 @@ private[sql] object Dataset { def ofRows(sparkSession: SparkSession, logicalP

Re: [PR] [MINOR] Fix code style for if/for/while statements [spark]

2024-11-15 Thread via GitHub
MaxGekk closed pull request #48425: [MINOR] Fix code style for if/for/while statements URL: https://github.com/apache/spark/pull/48425 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843593480 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala: ## @@ -265,3 +271,257 @@ private[aggregate] object CollectTopK {

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843593193 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala: ## @@ -265,3 +271,257 @@ private[aggregate] object CollectTopK {

Re: [PR] [SPARK-50327][SQL] Factor out function resolution to be reused in the single-pass Analyzer [spark]

2024-11-15 Thread via GitHub
MaxGekk closed pull request #48858: [SPARK-50327][SQL] Factor out function resolution to be reused in the single-pass Analyzer URL: https://github.com/apache/spark/pull/48858 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] [SPARK-50327][SQL] Factor out function resolution to be reused in the single-pass Analyzer [spark]

2024-11-15 Thread via GitHub
MaxGekk commented on code in PR #48858: URL: https://github.com/apache/spark/pull/48858#discussion_r1843958488 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala: ## @@ -0,0 +1,354 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-50327][SQL] Factor out function resolution to be reused in the single-pass Analyzer [spark]

2024-11-15 Thread via GitHub
MaxGekk commented on PR #48858: URL: https://github.com/apache/spark/pull/48858#issuecomment-2479096472 +1, LGTM. Merging to master. Thank you, @vladimirg-db. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-50320][CORE] Make `--remote` an official option by removing `experimental` warning [spark]

2024-11-15 Thread via GitHub
dongjoon-hyun closed pull request #48850: [SPARK-50320][CORE] Make `--remote` an official option by removing `experimental` warning URL: https://github.com/apache/spark/pull/48850 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] [SPARK-50320][CORE] Make `--remote` an official option by removing `experimental` warning [spark]

2024-11-15 Thread via GitHub
dongjoon-hyun commented on PR #48850: URL: https://github.com/apache/spark/pull/48850#issuecomment-2479114901 Thank you, @yaooqinn ! Merged to master for Apache Spark 4.0.0 in February 2025. -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [PR] [WIP] Subquery only Spark Connect. [spark]

2024-11-15 Thread via GitHub
ueshin closed pull request #48863: [WIP] Subquery only Spark Connect. URL: https://github.com/apache/spark/pull/48863 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

[PR] [WIP] Subquery only Spark Connect. [spark]

2024-11-15 Thread via GitHub
ueshin opened a new pull request, #48864: URL: https://github.com/apache/spark/pull/48864 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-49789][SQL] Handling of generic parameter with bounds while creating encoders [spark]

2024-11-15 Thread via GitHub
ahshahid commented on code in PR #48252: URL: https://github.com/apache/spark/pull/48252#discussion_r1844712588 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -137,8 +137,16 @@ class SparkSession private[sql] ( /** @inheritdoc

Re: [PR] [SPARK-49789][SQL] Handling of generic parameter with bounds while creating encoders [spark]

2024-11-15 Thread via GitHub
ahshahid commented on code in PR #48252: URL: https://github.com/apache/spark/pull/48252#discussion_r1844719865 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala: ## @@ -148,34 +163,180 @@ object JavaTypeInference { // TODO: we should only co

Re: [PR] [SPARK-49789][SQL] Handling of generic parameter with bounds while creating encoders [spark]

2024-11-15 Thread via GitHub
ahshahid commented on code in PR #48252: URL: https://github.com/apache/spark/pull/48252#discussion_r1844722744 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala: ## @@ -148,34 +163,180 @@ object JavaTypeInference { // TODO: we should only co

Re: [PR] [SPARK-49789][SQL] Handling of generic parameter with bounds while creating encoders [spark]

2024-11-15 Thread via GitHub
ahshahid commented on code in PR #48252: URL: https://github.com/apache/spark/pull/48252#discussion_r1844722897 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2802,6 +2821,79 @@ class DatasetSuite extends QueryTest } } } + + test("SPAR

Re: [PR] [MINOR] Use LinkedHashSet for ResolveLateralColumnAliasReference to generate stable hash for the plan [spark]

2024-11-15 Thread via GitHub
github-actions[bot] commented on PR #47571: URL: https://github.com/apache/spark/pull/47571#issuecomment-2480202800 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] Enhance the metrics in SparkUI with logical plan stats [spark]

2024-11-15 Thread via GitHub
github-actions[bot] closed pull request #47534: Enhance the metrics in SparkUI with logical plan stats URL: https://github.com/apache/spark/pull/47534 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Fixed comma splice in cluster-overview.md [spark]

2024-11-15 Thread via GitHub
github-actions[bot] commented on PR #47615: URL: https://github.com/apache/spark/pull/47615#issuecomment-2480202795 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-49789][SQL] Handling of generic parameter with bounds while creating encoders [spark]

2024-11-15 Thread via GitHub
ahshahid commented on code in PR #48252: URL: https://github.com/apache/spark/pull/48252#discussion_r1844724224 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala: ## @@ -148,34 +163,180 @@ object JavaTypeInference { // TODO: we should only co

Re: [PR] [SPARK-48835] Introduce versioning to JDBC connectors [spark]

2024-11-15 Thread via GitHub
github-actions[bot] commented on PR #47181: URL: https://github.com/apache/spark/pull/47181#issuecomment-2480202821 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-50194][SS][PYTHON] Integration of New Timer API and Initial State API with Timer [spark]

2024-11-15 Thread via GitHub
bogao007 commented on code in PR #48838: URL: https://github.com/apache/spark/pull/48838#discussion_r1844724391 ## python/pyspark/sql/streaming/stateful_processor.py: ## @@ -420,10 +411,27 @@ def handleInputRows( timer_values: TimerValues Timer va

Re: [PR] [SPARK-49789][SQL] Handling of generic parameter with bounds while creating encoders [spark]

2024-11-15 Thread via GitHub
ahshahid commented on code in PR #48252: URL: https://github.com/apache/spark/pull/48252#discussion_r1844723692 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2909,6 +3016,7 @@ object KryoData { /** Used to test Java encoder. */ class JavaData(val

Re: [PR] [SPARK-49676][SS][PYTHON] Add Support for Chaining of Operators in transformWithStateInPandas API [spark]

2024-11-15 Thread via GitHub
jingz-db commented on PR #48124: URL: https://github.com/apache/spark/pull/48124#issuecomment-2480233401 Hey @HeartSaVioR, could you take another look? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] [SPARK-48769][SQL] Support constant folding for ScalaUDF [spark]

2024-11-15 Thread via GitHub
github-actions[bot] closed pull request #47164: [SPARK-48769][SQL] Support constant folding for ScalaUDF URL: https://github.com/apache/spark/pull/47164 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-49789][SQL] Handling of generic parameter with bounds while creating encoders [spark]

2024-11-15 Thread via GitHub
ahshahid commented on code in PR #48252: URL: https://github.com/apache/spark/pull/48252#discussion_r1844713507 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala: ## @@ -148,34 +163,180 @@ object JavaTypeInference { // TODO: we should only co

Re: [PR] [SPARK-50324][PYTHON][CONNECT] Make `createDataFrame` trigger `Config` RPC at most once [spark]

2024-11-15 Thread via GitHub
HyukjinKwon commented on code in PR #48856: URL: https://github.com/apache/spark/pull/48856#discussion_r1844745641 ## python/pyspark/sql/connect/session.py: ## @@ -706,9 +724,9 @@ def createDataFrame( else: local_relation = LocalRelation(_table) -

Re: [PR] [SPARK-50324][PYTHON][CONNECT] Make `createDataFrame` trigger `Config` RPC at most once [spark]

2024-11-15 Thread via GitHub
zhengruifeng commented on code in PR #48856: URL: https://github.com/apache/spark/pull/48856#discussion_r1844867144 ## python/pyspark/sql/connect/session.py: ## @@ -706,9 +724,9 @@ def createDataFrame( else: local_relation = LocalRelation(_table) -

Re: [PR] [SPARK-50130][SQL][FOLLOWUP] Make Encoder generation lazy [spark]

2024-11-15 Thread via GitHub
ueshin commented on code in PR #48829: URL: https://github.com/apache/spark/pull/48829#discussion_r1844320464 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -95,13 +95,12 @@ private[sql] object Dataset { def ofRows(sparkSession: SparkSession, logicalPlan

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-15 Thread via GitHub
brkyvz commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1844378796 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -343,7 +497,8 @@ class StatefulProcessorHandleImpl( * actu

Re: [PR] [SPARK-49789][SQL] Handling of generic parameter with bounds while creating encoders [spark]

2024-11-15 Thread via GitHub
sririshindra commented on code in PR #48252: URL: https://github.com/apache/spark/pull/48252#discussion_r1844241460 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala: ## @@ -148,34 +163,180 @@ object JavaTypeInference { // TODO: we should onl

Re: [PR] [WIP] Subquery only Spark Connect. [spark]

2024-11-15 Thread via GitHub
ueshin closed pull request #48864: [WIP] Subquery only Spark Connect. URL: https://github.com/apache/spark/pull/48864 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

[PR] [SPARK-50329][SQL] fix InSet$toString [spark]

2024-11-15 Thread via GitHub
averyqi-db opened a new pull request, #48865: URL: https://github.com/apache/spark/pull/48865 ### What changes were proposed in this pull request? Fix InSet$toString for unresolved plan node ### Why are the changes needed? InSet$toString should always work eve
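
The description is truncated above; the gist is that the string form of an InSet node should be printable even before analysis has run. A rough, self-contained sketch of that general idea, using invented stand-in classes (InSetLike, UnresolvedAttr) rather than Spark's real Expression/InSet, and not the actual patch:

    // Hypothetical stand-ins, not Spark's catalyst classes.
    sealed trait Expr { def resolved: Boolean }

    case class UnresolvedAttr(name: String) extends Expr {
      override def resolved: Boolean = false
      override def toString: String = s"'$name"
    }

    case class InSetLike(child: Expr, hset: Set[Any]) extends Expr {
      override def resolved: Boolean = child.resolved
      // Render from the raw value set and the child's own string form only,
      // so printing never depends on the node being resolved.
      override def toString: String = s"$child INSET ${hset.mkString(", ")}"
    }

    object InSetToStringDemo {
      def main(args: Array[String]): Unit =
        println(InSetLike(UnresolvedAttr("a"), Set(1, 2, 3)))  // 'a INSET 1, 2, 3
    }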

Re: [PR] [SPARK-50329][SQL] fix InSet$toString [spark]

2024-11-15 Thread via GitHub
HyukjinKwon commented on code in PR #48865: URL: https://github.com/apache/spark/pull/48865#discussion_r1844738816 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -609,6 +609,9 @@ case class InSet(child: Expression, hset: Set[Any]

Re: [PR] [SPARK-50309][DOCS] Document `SQL Pipe` Syntax [spark]

2024-11-15 Thread via GitHub
gengliangwang closed pull request #48852: [SPARK-50309][DOCS] Document `SQL Pipe` Syntax URL: https://github.com/apache/spark/pull/48852 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[PR] Subquery only Spark Connect. [spark]

2024-11-15 Thread via GitHub
ueshin opened a new pull request, #48863: URL: https://github.com/apache/spark/pull/48863 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-49787][SQL] Cast between UDT and other types [spark]

2024-11-15 Thread via GitHub
HyukjinKwon commented on PR #48251: URL: https://github.com/apache/spark/pull/48251#issuecomment-2480360796 Actually I wanted to make a fix like this a long time ago, and gave up after reading ANSI spec because UDT cannot be casted to any type according to it IIRC. -- This is an automat

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843587760 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2219,21 +2219,24 @@ class Analyzer(override val catalogManager: CatalogM

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843590480 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/SupportsOrderingWithinGroup.scala: ## @@ -20,9 +20,28 @@ package org.apache.spark.s

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843591341 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/SupportsOrderingWithinGroup.scala: ## @@ -20,9 +20,28 @@ package org.apache.spark.s

Re: [PR] [SPARK-50327][SQL] Factor out function resolution to be reused in the single-pass Analyzer [spark]

2024-11-15 Thread via GitHub
vladimirg-db commented on code in PR #48858: URL: https://github.com/apache/spark/pull/48858#discussion_r1843505454 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala: ## @@ -0,0 +1,354 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] [MINOR] Fix code style for if/for/while statements [spark]

2024-11-15 Thread via GitHub
MaxGekk commented on PR #48425: URL: https://github.com/apache/spark/pull/48425#issuecomment-2478527669 +1, LGTM. Merging to master. Thank you, @exmy. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
mikhailnik-db commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843628058 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2219,21 +2219,24 @@ class Analyzer(override val catalogManager: Cata

Re: [PR] [SPARK-50295][INFRA] Add a script to build docs with image [spark]

2024-11-15 Thread via GitHub
pan3793 commented on code in PR #48860: URL: https://github.com/apache/spark/pull/48860#discussion_r1843718476 ## dev/spark-test-image/docs/build-docs-on-local: ## @@ -0,0 +1,68 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

Re: [PR] [SPARK-50322][SQL] Fix parameterized identifier in a sub-query [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48847: URL: https://github.com/apache/spark/pull/48847#discussion_r1843582476 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -189,7 +189,8 @@ object BindParameters extends ParameterizedQueryProcess

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
mikhailnik-db commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843674767 ## sql/core/src/test/resources/sql-tests/results/listagg.sql.out: ## @@ -0,0 +1,436 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +CREATE TEMP VI

Re: [PR] [SPARK-50325][SQL] Factor out alias resolution to be reused in the single-pass Analyzer [spark]

2024-11-15 Thread via GitHub
MaxGekk commented on PR #48857: URL: https://github.com/apache/spark/pull/48857#issuecomment-2478677345 +1, LGTM. Merging to master. Thank you, @vladimirg-db. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-50325][SQL] Factor out alias resolution to be reused in the single-pass Analyzer [spark]

2024-11-15 Thread via GitHub
MaxGekk closed pull request #48857: [SPARK-50325][SQL] Factor out alias resolution to be reused in the single-pass Analyzer URL: https://github.com/apache/spark/pull/48857 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
mikhailnik-db commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843673496 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/SupportsOrderingWithinGroup.scala: ## @@ -20,9 +20,28 @@ package org.apache.spa

Re: [PR] [SPARK-50322][SQL] Fix parameterized identifier in a sub-query [spark]

2024-11-15 Thread via GitHub
MaxGekk closed pull request #48847: [SPARK-50322][SQL] Fix parameterized identifier in a sub-query URL: https://github.com/apache/spark/pull/48847 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-50322][SQL] Fix parameterized identifier in a sub-query [spark]

2024-11-15 Thread via GitHub
MaxGekk commented on PR #48847: URL: https://github.com/apache/spark/pull/48847#issuecomment-2478680192 Merging to master. Thank you, @srielau and @cloud-fan for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-50049][SQL] Support custom driver metrics in writing to v2 table [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on PR #48573: URL: https://github.com/apache/spark/pull/48573#issuecomment-2478369932 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] [SPARK-50322][SQL] Fix parameterized identifier in a sub-query [spark]

2024-11-15 Thread via GitHub
MaxGekk commented on code in PR #48847: URL: https://github.com/apache/spark/pull/48847#discussion_r1843530688 ## sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala: ## @@ -741,4 +741,21 @@ class ParametersSuite extends QueryTest with SharedSparkSession with Pla

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
mikhailnik-db commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843656733 ## sql/core/src/test/resources/sql-tests/results/listagg.sql.out: ## @@ -0,0 +1,436 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +CREATE TEMP VI

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
mikhailnik-db commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843655281 ## sql/core/src/test/resources/sql-tests/results/listagg.sql.out: ## @@ -0,0 +1,436 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +CREATE TEMP VI

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
mikhailnik-db commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843664606 ## sql/core/src/test/resources/sql-tests/results/listagg.sql.out: ## @@ -0,0 +1,436 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +CREATE TEMP VI

Re: [PR] [SPARK-42746][SQL] Implement LISTAGG function [spark]

2024-11-15 Thread via GitHub
mikhailnik-db commented on code in PR #48748: URL: https://github.com/apache/spark/pull/48748#discussion_r1843670210 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/SupportsOrderingWithinGroup.scala: ## @@ -20,9 +20,28 @@ package org.apache.spa

Re: [PR] [SPARK-50081][SQL] Codegen Support for `XPath*`(by Invoke & RuntimeReplaceable) [spark]

2024-11-15 Thread via GitHub
panbingkun commented on PR #48610: URL: https://github.com/apache/spark/pull/48610#issuecomment-2479038190 > @panbingkun Could you resolve conflicts, please. Updated, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [PR] [SPARK-50322][SQL] Fix parameterized identifier in a sub-query [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48847: URL: https://github.com/apache/spark/pull/48847#discussion_r1843518188 ## sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala: ## @@ -741,4 +741,21 @@ class ParametersSuite extends QueryTest with SharedSparkSession with P

Re: [PR] [SPARK-48195][CORE] Save and reuse RDD/Broadcast created by SparkPlan [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on PR #48037: URL: https://github.com/apache/spark/pull/48037#issuecomment-2478376297 hmmm there are 4 suppressed exceptions for this simple query? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] [SPARK-50309][SQL] Add documentation for SQL pipe syntax [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on code in PR #48852: URL: https://github.com/apache/spark/pull/48852#discussion_r1843508769 ## docs/sql-pipe-syntax.md: ## @@ -0,0 +1,540 @@ +--- +layout: global +title: SQL Pipe Syntax +displayTitle: SQL Pipe Syntax +license: | + Licensed to the Apache Sof

[PR] [SPARK-50295][INFRA] Add a script to build docs with image [spark]

2024-11-15 Thread via GitHub
panbingkun opened a new pull request, #48860: URL: https://github.com/apache/spark/pull/48860 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-15 Thread via GitHub
gotocoding-DB commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1842229690 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala: ## @@ -568,17 +572,28 @@ case class MakeYMInterval(years: Expr

Re: [PR] [SPARK-50049][SQL] Support custom driver metrics in writing to v2 table [spark]

2024-11-15 Thread via GitHub
cloud-fan closed pull request #48573: [SPARK-50049][SQL] Support custom driver metrics in writing to v2 table URL: https://github.com/apache/spark/pull/48573 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] [SPARK-50237][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2138-9`: `CIRCULAR_CLASS_REFERENCE` [spark]

2024-11-15 Thread via GitHub
MaxGekk commented on PR #48769: URL: https://github.com/apache/spark/pull/48769#issuecomment-2478475873 +1, LGTM. Merging to master. Thank you, @itholic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-50237][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2138-9`: `CIRCULAR_CLASS_REFERENCE` [spark]

2024-11-15 Thread via GitHub
MaxGekk closed pull request #48769: [SPARK-50237][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2138-9`: `CIRCULAR_CLASS_REFERENCE` URL: https://github.com/apache/spark/pull/48769 -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] [SPARK-50091][SQL] Handle case of aggregates in left-hand operand of IN-subquery [spark]

2024-11-15 Thread via GitHub
bersprockets commented on PR #48627: URL: https://github.com/apache/spark/pull/48627#issuecomment-2479232015 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-50322][SQL] Fix parameterized identifier in a sub-query [spark]

2024-11-15 Thread via GitHub
MaxGekk commented on PR #48847: URL: https://github.com/apache/spark/pull/48847#issuecomment-2479259749 We don't need to backport this fix to `branch-3.5` because it doesn't have the changes https://github.com/apache/spark/pull/47180. -- This is an automated message from the Apache Git Se

Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-15 Thread via GitHub
mrk-andreev commented on code in PR #48501: URL: https://github.com/apache/spark/pull/48501#discussion_r1844407932 ## sql/core/benchmarks/CollationBenchmark-jdk21-results.txt: ## @@ -1,54 +1,88 @@ -OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure -AMD EPYC 7763 6

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-15 Thread via GitHub
ericm-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1844412927 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -115,7 +119,7 @@ case class TransformWithStateExec( * Fetch

Re: [PR] [SPARK-50236][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_1156`: `COLUMN_NOT_DEFINED_IN_TABLE ` [spark]

2024-11-15 Thread via GitHub
MaxGekk closed pull request #48768: [SPARK-50236][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_1156`: `COLUMN_NOT_DEFINED_IN_TABLE ` URL: https://github.com/apache/spark/pull/48768 -- This is an automated message from the Apache Git Service. To respond to the message, pleas

Re: [PR] [SPARK-50236][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_1156`: `COLUMN_NOT_DEFINED_IN_TABLE ` [spark]

2024-11-15 Thread via GitHub
MaxGekk commented on PR #48768: URL: https://github.com/apache/spark/pull/48768#issuecomment-2479344627 +1, LGTM. Merging to master. Thank you, @itholic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-15 Thread via GitHub
brkyvz commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1844340161 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2204,6 +2204,16 @@ object SQLConf { .intConf .createWithDefault(3) +

[PR] [DRAFT] Two string types [spark]

2024-11-15 Thread via GitHub
stefankandic opened a new pull request, #48861: URL: https://github.com/apache/spark/pull/48861 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### Ho

Re: [PR] [SPARK-49490][SQL] Add benchmarks for initCap [spark]

2024-11-15 Thread via GitHub
mrk-andreev commented on code in PR #48501: URL: https://github.com/apache/spark/pull/48501#discussion_r1844404602 ## sql/core/benchmarks/CollationBenchmark-jdk21-results.txt: ## @@ -1,54 +1,88 @@ -OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.5.0-1025-azure -AMD EPYC 7763 6

[PR] [SPARK-50301][SS] Make TransformWithState metrics reflect their intuitive meanings [spark]

2024-11-15 Thread via GitHub
neilramaswamy opened a new pull request, #48862: URL: https://github.com/apache/spark/pull/48862 ### What changes were proposed in this pull request? This PR makes the following changes to metrics in TWS: - `allUpdatesTimeMs` now captures the time it takes to process all th

Re: [PR] [SPARK-45592][SPARK-45282][SQL] Correctness issue in AQE with InMemoryTableScanExec [spark]

2024-11-15 Thread via GitHub
Tom-Newton commented on PR #43760: URL: https://github.com/apache/spark/pull/43760#issuecomment-2479390482 > The fix addresses the issue by disabling coalescing in InMemoryTableScan for shuffles in the final stage. This PR seems to indicate that there will be a correctness bug if we a

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-15 Thread via GitHub
ericm-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1844389128 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala: ## @@ -24,6 +24,7 @@ import org.apache.avro.generic.GenericDatumReader import org

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-15 Thread via GitHub
ericm-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1844390849 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -115,7 +119,7 @@ case class TransformWithStateExec( * Fetch

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-15 Thread via GitHub
ericm-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1844389823 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -259,6 +259,24 @@ class IncrementalExecution( } } + /*

Re: [PR] [SPARK-50309][DOCS] Document `SQL Pipe` Syntax [spark]

2024-11-15 Thread via GitHub
dtenedor commented on code in PR #48852: URL: https://github.com/apache/spark/pull/48852#discussion_r1844576182 ## docs/sql-pipe-syntax.md: ## @@ -0,0 +1,540 @@ +--- +layout: global +title: SQL Pipe Syntax +displayTitle: SQL Pipe Syntax +license: | + Licensed to the Apache Soft

Re: [PR] [SPARK-49787][SQL] Cast between UDT and other types [spark]

2024-11-15 Thread via GitHub
cloud-fan commented on PR #48251: URL: https://github.com/apache/spark/pull/48251#issuecomment-2480296031 I don't think ANSI SQL allows casting between UDT and builtin types. Shall we use the `UnwrapUDT` to unwrap expressions that return UDT, in the store assignment rule `ResolveOutputRelat

Re: [PR] [SPARK-49787][SQL] Cast between UDT and other types [spark]

2024-11-15 Thread via GitHub
viirya commented on PR #48251: URL: https://github.com/apache/spark/pull/48251#issuecomment-2480395470 > I don't think ANSI SQL allows casting between UDT and builtin types. Shall we use the `UnwrapUDT` expression to wrap expressions that return UDT, in the store assignment rule `ResolveOut
