Re: [PR] [SPARK-50124][SQL][FOLLOWUP] InsertSortForLimitAndOffset should propagate missing ordering columns [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49416: URL: https://github.com/apache/spark/pull/49416#discussion_r1908316564 ## sql/core/src/main/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffset.scala: ## @@ -43,33 +42,61 @@ object InsertSortForLimitAndOffset extends Rule[Sp

Re: [PR] [SPARK-50764][PYTHON] Refine the docstring of xpath related methods [spark]

2025-01-09 Thread via GitHub
drexler-sky commented on PR #49422: URL: https://github.com/apache/spark/pull/49422#issuecomment-2579399374 Thanks @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] [SPARK-50762][SQL] Add Analyzer rule for resolving SQL scalar UDFs [spark]

2025-01-09 Thread via GitHub
allisonwang-db commented on code in PR #49414: URL: https://github.com/apache/spark/pull/49414#discussion_r1908320718 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2363,6 +2364,285 @@ class Analyzer(override val catalogManager: Cat

Re: [PR] [SPARK-50124][SQL][FOLLOWUP] InsertSortForLimitAndOffset should propagate missing ordering columns [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49416: URL: https://github.com/apache/spark/pull/49416#discussion_r1908323371 ## sql/core/src/main/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffset.scala: ## @@ -43,33 +42,61 @@ object InsertSortForLimitAndOffset extends Rule[Sp

Re: [PR] [SPARK-50469][SQL] V1Writes should respect the output ordering [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49027: URL: https://github.com/apache/spark/pull/49027#discussion_r1908329899 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala: ## @@ -98,16 +98,27 @@ object V1Writes extends Rule[LogicalPlan] with SQLConfHe

Re: [PR] [SPARK-50124][SQL][FOLLOWUP] InsertSortForLimitAndOffset should propagate missing ordering columns [spark]

2025-01-09 Thread via GitHub
viirya commented on code in PR #49416: URL: https://github.com/apache/spark/pull/49416#discussion_r1908336236 ## sql/core/src/main/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffset.scala: ## @@ -43,33 +42,61 @@ object InsertSortForLimitAndOffset extends Rule[Spark

Re: [PR] [SPARK-50762][SQL] Add Analyzer rule for resolving SQL scalar UDFs [spark]

2025-01-09 Thread via GitHub
allisonwang-db commented on code in PR #49414: URL: https://github.com/apache/spark/pull/49414#discussion_r1908342775 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2363,6 +2364,285 @@ class Analyzer(override val catalogManager: Cat

Re: [PR] [SPARK-50767][SQL] Remove codegen of `from_json` [spark]

2025-01-09 Thread via GitHub
panbingkun commented on PR #49411: URL: https://github.com/apache/spark/pull/49411#issuecomment-2579439319 In the `withFilter` scenario of `SubExprEliminationBenchmark`, the root cause is as follows: ```scala val df = spark.read .text(path.getAbsolutePath)

Re: [PR] [SPARK-50767][SQL] Remove codegen of `from_json` [spark]

2025-01-09 Thread via GitHub
panbingkun commented on PR #49411: URL: https://github.com/apache/spark/pull/49411#issuecomment-2579445696 If we can implement subexpression elimination in `FilterExec.doConsume`, as in `ProjectExec.doConsume`, that would be great. cc @cloud-fan

[PR] [SPARK-50624][SQL] Add TimestampNTZType to ColumnarRow [spark]

2025-01-09 Thread via GitHub
nastra opened a new pull request, #49437: URL: https://github.com/apache/spark/pull/49437 ### What changes were proposed in this pull request? Noticed that this was missing when using this in Iceberg. See additional details in https://github.com/apache/iceberg/pull/11815#d

Re: [PR] [SPARK-50624][SQL] Add TimestampNTZType to ColumnarRow [spark]

2025-01-09 Thread via GitHub
nastra commented on PR #49437: URL: https://github.com/apache/spark/pull/49437#issuecomment-2581996699 @cloud-fan could you take a look please?

Re: [PR] [SPARK-50633][INFRA] Fix `codecov/codecov-action@v4` daily scheduling failure [spark]

2025-01-09 Thread via GitHub
panbingkun commented on PR #49251: URL: https://github.com/apache/spark/pull/49251#issuecomment-2581813853 Considering that it has already been added, I will merge this PR and observe it later. https://github.com/user-attachments/assets/8d270ad3-2885-4bb6-a181-e0b89b7486c7

[PR] Use `overrideStdFeatures` instead of `setFeatureMask` [spark]

2025-01-09 Thread via GitHub
LuciferYang opened a new pull request, #49434: URL: https://github.com/apache/spark/pull/49434 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-50633][INFRA] Fix `codecov/codecov-action@v4` daily scheduling failure [spark]

2025-01-09 Thread via GitHub
panbingkun commented on PR #49251: URL: https://github.com/apache/spark/pull/49251#issuecomment-2581818346 Thanks for the review @HyukjinKwon @dongjoon-hyun , merging to master! I will continue to observe it.

Re: [PR] [SPARK-50633][INFRA] Fix `codecov/codecov-action@v4` daily scheduling failure [spark]

2025-01-09 Thread via GitHub
panbingkun closed pull request #49251: [SPARK-50633][INFRA] Fix `codecov/codecov-action@v4` daily scheduling failure URL: https://github.com/apache/spark/pull/49251

[PR] [SPARK-50781][SQL] Cache `QueryPlan.expressions` [spark]

2025-01-09 Thread via GitHub
zhengruifeng opened a new pull request, #49435: URL: https://github.com/apache/spark/pull/49435 ### What changes were proposed in this pull request? Cache `QueryPlan.expressions` ### Why are the changes needed? We observed that we were spending a significant amount of time r

Re: [PR] [SPARK-50780][SQL] Use `overrideStdFeatures` instead of `setFeatureMask` [spark]

2025-01-09 Thread via GitHub
LuciferYang commented on code in PR #49434: URL: https://github.com/apache/spark/pull/49434#discussion_r1909867794 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ## @@ -338,8 +339,11 @@ class JacksonParser( UTF8String.fromBy

[PR] [SPARK-50782][SQL] Replace the use of reflection in `CodeGenerator.updateAndGetCompilationStats` with direct calls to `ClassFile.CodeAttribute#code` [spark]

2025-01-09 Thread via GitHub
LuciferYang opened a new pull request, #49436: URL: https://github.com/apache/spark/pull/49436 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-50779][SQL] Adding feature flag for object level collations [spark]

2025-01-09 Thread via GitHub
dejankrak-db commented on PR #49431: URL: https://github.com/apache/spark/pull/49431#issuecomment-2581550993 @cloud-fan , please take a look and help merge this PR, to add a feature flag for disabling object-level collations while the feature is still in development, thanks! CC @stefanka

Re: [PR] [SPARK-50707][SQL] Enable casting to/from char/varchar [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49340: URL: https://github.com/apache/spark/pull/49340#discussion_r1909822717 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -318,9 +316,8 @@ object Cast extends QueryErrorsBase { case _ if from

Re: [PR] [SPARK-50779][SQL] Adding feature flag for object level collations [spark]

2025-01-09 Thread via GitHub
cloud-fan closed pull request #49431: [SPARK-50779][SQL] Adding feature flag for object level collations URL: https://github.com/apache/spark/pull/49431

Re: [PR] [SPARK-50779][SQL] Adding feature flag for object level collations [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on PR #49431: URL: https://github.com/apache/spark/pull/49431#issuecomment-2581800349 thanks, merging to master!

Re: [PR] [SPARK-50707][SQL] Enable casting to/from char/varchar [spark]

2025-01-09 Thread via GitHub
mihailom-db commented on code in PR #49340: URL: https://github.com/apache/spark/pull/49340#discussion_r1909909244 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala: ## @@ -1428,4 +1424,43 @@ abstract class CastSuiteBase extends SparkFu

Re: [PR] [SPARK-50634][INFRA] Upgrade `codecov/codecov-action` to v5 [spark]

2025-01-09 Thread via GitHub
panbingkun closed pull request #49228: [SPARK-50634][INFRA] Upgrade `codecov/codecov-action` to v5 URL: https://github.com/apache/spark/pull/49228

Re: [PR] [SPARK-48344][SQL] Enhance SQL Script Execution: Replace NOOP with COLLECT for Result DataFrames [spark]

2025-01-09 Thread via GitHub
cloud-fan closed pull request #49372: [SPARK-48344][SQL] Enhance SQL Script Execution: Replace NOOP with COLLECT for Result DataFrames URL: https://github.com/apache/spark/pull/49372

Re: [PR] [SPARK-48344][SQL] Enhance SQL Script Execution: Replace NOOP with COLLECT for Result DataFrames [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on PR #49372: URL: https://github.com/apache/spark/pull/49372#issuecomment-2581534689 thanks, merging to master!

Re: [PR] Add Support for Struct Conversion when reading Arrow data [spark-connect-go]

2025-01-09 Thread via GitHub
haoxins commented on code in PR #115: URL: https://github.com/apache/spark-connect-go/pull/115#discussion_r1909779187 ## spark/sql/types/arrow.go: ## @@ -260,6 +260,27 @@ func readArrayData(t arrow.Type, data arrow.ArrayData) ([]any, error) { }

Re: [PR] [SPARK-50694][SQL] Support withColumns / withColumnsRenamed in subqueries [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49386: URL: https://github.com/apache/spark/pull/49386#discussion_r1909794832 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala: ## @@ -712,6 +717,109 @@ case class UnresolvedStarExceptOrReplace( } } +

Re: [PR] [SPARK-49883][SS][TESTS][FOLLOWUP] RocksDB Fault Tolerance Test [spark]

2025-01-09 Thread via GitHub
HeartSaVioR commented on PR #49175: URL: https://github.com/apache/spark/pull/49175#issuecomment-2581357885 I'm merging on behalf of @brkyvz . Thanks! Merging to master.

Re: [PR] [SPARK-49883][SS][TESTS][FOLLOWUP] RocksDB Fault Tolerance Test [spark]

2025-01-09 Thread via GitHub
HeartSaVioR closed pull request #49175: [SPARK-49883][SS][TESTS][FOLLOWUP] RocksDB Fault Tolerance Test URL: https://github.com/apache/spark/pull/49175

[PR] [SPARK-50778][PYTHON] Add metadataColumn to PySpark DataFrame [spark]

2025-01-09 Thread via GitHub
ueshin opened a new pull request, #49430: URL: https://github.com/apache/spark/pull/49430 ### What changes were proposed in this pull request? Add `metadataColumn` to PySpark DataFrame. ### Why are the changes needed? Feature parity: The API is missing in PySpark.

Re: [PR] [SPARK-50694][SQL] Support withColumns / withColumnsRenamed in subqueries [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49386: URL: https://github.com/apache/spark/pull/49386#discussion_r1909795652 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala: ## @@ -712,6 +717,109 @@ case class UnresolvedStarExceptOrReplace( } } +

Re: [PR] [SPARK-50694][SQL] Support withColumns / withColumnsRenamed in subqueries [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49386: URL: https://github.com/apache/spark/pull/49386#discussion_r1909797114 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala: ## @@ -712,6 +717,109 @@ case class UnresolvedStarExceptOrReplace( } } +

Re: [PR] [SPARK-50600][CONNECT][SQL] Set analyzed on analysis failure [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on PR #49383: URL: https://github.com/apache/spark/pull/49383#issuecomment-2581710657 thanks, merging to master!

Re: [PR] [SPARK-50600][CONNECT][SQL] Set analyzed on analysis failure [spark]

2025-01-09 Thread via GitHub
cloud-fan closed pull request #49383: [SPARK-50600][CONNECT][SQL] Set analyzed on analysis failure URL: https://github.com/apache/spark/pull/49383

Re: [PR] [SPARK-50525][SQL] Define InsertMapSortInRepartitionExpressions Optimizer Rule [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on PR #49144: URL: https://github.com/apache/spark/pull/49144#issuecomment-2581711847 The Spark Connect failure is unrelated, thanks, merging to master!

Re: [PR] [SPARK-50525][SQL] Define InsertMapSortInRepartitionExpressions Optimizer Rule [spark]

2025-01-09 Thread via GitHub
cloud-fan closed pull request #49144: [SPARK-50525][SQL] Define InsertMapSortInRepartitionExpressions Optimizer Rule URL: https://github.com/apache/spark/pull/49144

Re: [PR] [SPARK-50719][PYTHON] Support `interruptOperation` for PySpark [spark]

2025-01-09 Thread via GitHub
zhengruifeng commented on PR #49423: URL: https://github.com/apache/spark/pull/49423#issuecomment-2581714805 merged to master

Re: [PR] [SPARK-50694][SQL] Support withColumns / withColumnsRenamed in subqueries [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49386: URL: https://github.com/apache/spark/pull/49386#discussion_r1909799001 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala: ## @@ -712,6 +717,109 @@ case class UnresolvedStarExceptOrReplace( } } +

Re: [PR] [SPARK-50694][SQL] Support withColumns / withColumnsRenamed in subqueries [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49386: URL: https://github.com/apache/spark/pull/49386#discussion_r1909798456 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -1275,29 +1275,15 @@ class Dataset[T] private[sql]( require(colNames.size == cols.size, Rev

Re: [PR] [SPARK-50719][PYTHON] Support `interruptOperation` for PySpark [spark]

2025-01-09 Thread via GitHub
zhengruifeng closed pull request #49423: [SPARK-50719][PYTHON] Support `interruptOperation` for PySpark URL: https://github.com/apache/spark/pull/49423

[PR] [MINOR][SQL] Remove `removeWhitespace helper` for DESCRIBE TABLE [spark]

2025-01-09 Thread via GitHub
asl3 opened a new pull request, #49433: URL: https://github.com/apache/spark/pull/49433 ### What changes were proposed in this pull request? When converting from `toJsonLinkedHashMap` result to `DESCRIBE TABLE`, `removeWhitespace` helper is unnecessary. This PR removes the

Re: [PR] [SPARK-50758][K8S]Mounts the krb5 config map on the executor pod [spark]

2025-01-09 Thread via GitHub
maomaodev closed pull request #49426: [SPARK-50758][K8S]Mounts the krb5 config map on the executor pod URL: https://github.com/apache/spark/pull/49426

Re: [PR] [SPARK-50707][SQL] Enable casting to/from char/varchar [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49340: URL: https://github.com/apache/spark/pull/49340#discussion_r1909817639 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -284,6 +280,8 @@ object Cast extends QueryErrorsBase { */ def needsT

Re: [PR] [SPARK-50707][SQL] Enable casting to/from char/varchar [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49340: URL: https://github.com/apache/spark/pull/49340#discussion_r1909820879 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -1138,6 +1135,8 @@ case class Cast( to match { case dt if d

Re: [PR] [SPARK-50707][SQL] Enable casting to/from char/varchar [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49340: URL: https://github.com/apache/spark/pull/49340#discussion_r1909819515 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -361,10 +358,10 @@ object Cast extends QueryErrorsBase { case (_, _) i

Re: [PR] [SPARK-50707][SQL] Enable casting to/from char/varchar [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49340: URL: https://github.com/apache/spark/pull/49340#discussion_r1909819745 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -361,10 +358,10 @@ object Cast extends QueryErrorsBase { case (_, _) i

Re: [PR] [SPARK-50707][SQL] Enable casting to/from char/varchar [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49340: URL: https://github.com/apache/spark/pull/49340#discussion_r1909818843 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -318,9 +316,8 @@ object Cast extends QueryErrorsBase { case _ if from

Re: [PR] [SPARK-50707][SQL] Enable casting to/from char/varchar [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49340: URL: https://github.com/apache/spark/pull/49340#discussion_r1909821997 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala: ## @@ -1428,4 +1424,43 @@ abstract class CastSuiteBase extends SparkFunS

[PR] Add Support for Struct Conversion when reading Arrow data [spark-connect-go]

2025-01-09 Thread via GitHub
kronsbein opened a new pull request, #115: URL: https://github.com/apache/spark-connect-go/pull/115 ### What changes were proposed in this pull request? This PR adds support for struct conversion when reading Arrow data ### Why are the changes needed? Resolves #114

Re: [PR] [SPARK-50654][SS] CommitMetadata should set stateUniqueIds to None in V1 [spark]

2025-01-09 Thread via GitHub
WweiL commented on code in PR #49278: URL: https://github.com/apache/spark/pull/49278#discussion_r1909605585 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CommitLog.scala: ## @@ -51,6 +51,8 @@ class CommitLog(sparkSession: SparkSession, path: String) i

[PR] [SPARK-50779][SQL] Adding feature flag for object level collations [spark]

2025-01-09 Thread via GitHub
dejankrak-db opened a new pull request, #49431: URL: https://github.com/apache/spark/pull/49431 ### What changes were proposed in this pull request? As a follow up from https://github.com/apache/spark/pull/49084 and associated JIRA issue https://issues.apache.org/jira/browse/SPARK-506

Re: [PR] [SPARK-50777][CORE] Remove redundant no-op `init/destroy` methods from `Filter` classes [spark]

2025-01-09 Thread via GitHub
dongjoon-hyun commented on PR #49429: URL: https://github.com/apache/spark/pull/49429#issuecomment-2581654921 Thank you always, @LuciferYang !

Re: [PR] [SPARK-49565][SQL] Improve auto-generated expression aliases with pipe SQL operators [spark]

2025-01-09 Thread via GitHub
cloud-fan closed pull request #49245: [SPARK-49565][SQL] Improve auto-generated expression aliases with pipe SQL operators URL: https://github.com/apache/spark/pull/49245

Re: [PR] [SPARK-49565][SQL] Improve auto-generated expression aliases with pipe SQL operators [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on PR #49245: URL: https://github.com/apache/spark/pull/49245#issuecomment-2581664281 thanks, merging to master!

[PR] [SPARK-50777][CORE] Remove redundant no-op `init/destroy` methods from `Filter` classes [spark]

2025-01-09 Thread via GitHub
dongjoon-hyun opened a new pull request, #49429: URL: https://github.com/apache/spark/pull/49429 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-50560][SQL] Improve type coercion and boundary checking for RANDSTR SQL function [spark]

2025-01-09 Thread via GitHub
dtenedor commented on code in PR #49210: URL: https://github.com/apache/spark/pull/49210#discussion_r1909618156 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionHelper.scala: ## @@ -400,6 +401,9 @@ abstract class TypeCoercionHelper { NaN

[PR] Conf plugin connect ml [spark]

2025-01-09 Thread via GitHub
wbo4958 opened a new pull request, #49432: URL: https://github.com/apache/spark/pull/49432 Users can specify the ml overrides by `spark.connect.extensions.ml.overrides="org.apache.spark.ml.classification.LogisticRegression=org.apache.spark.sql.connect.ml.MyLogisticRegression"`

Re: [PR] Conf plugin connect ml [spark]

2025-01-09 Thread via GitHub
wbo4958 closed pull request #49432: Conf plugin connect ml URL: https://github.com/apache/spark/pull/49432

[PR] [SPARK-50776][KUBERNETES][TESTS] Invalid test assertions on executor Kubernetes service account [spark]

2025-01-09 Thread via GitHub
cnauroth opened a new pull request, #49428: URL: https://github.com/apache/spark/pull/49428 ### What changes were proposed in this pull request? `ExecutorKubernetesCredentialsFeatureStepSuite` tests that Spark sets the correct service account on executor pods for various configuration

Re: [PR] [SPARK-49907][ML][CONNECT] Support spark.ml on Connect [spark]

2025-01-09 Thread via GitHub
wbo4958 commented on PR #48791: URL: https://github.com/apache/spark/pull/48791#issuecomment-2581450570 Hi @zhengruifeng, > I also think we can optimize out the message Param: option 1: support Vector and Matrix in message Literal; option 2 (Preferred): Using message Expression

Re: [PR] [SPARK-50777][CORE] Remove redundant no-op `init/destroy` methods from `Filter` classes [spark]

2025-01-09 Thread via GitHub
LuciferYang closed pull request #49429: [SPARK-50777][CORE] Remove redundant no-op `init/destroy` methods from `Filter` classes URL: https://github.com/apache/spark/pull/49429

Re: [PR] [SPARK-50777][CORE] Remove redundant no-op `init/destroy` methods from `Filter` classes [spark]

2025-01-09 Thread via GitHub
LuciferYang commented on PR #49429: URL: https://github.com/apache/spark/pull/49429#issuecomment-2581626510 Merged into master for Spark 4.0. Thanks @dongjoon-hyun

Re: [PR] [SPARK-50778][PYTHON] Add metadataColumn to PySpark DataFrame [spark]

2025-01-09 Thread via GitHub
zhengruifeng closed pull request #49430: [SPARK-50778][PYTHON] Add metadataColumn to PySpark DataFrame URL: https://github.com/apache/spark/pull/49430

Re: [PR] [SPARK-50778][PYTHON] Add metadataColumn to PySpark DataFrame [spark]

2025-01-09 Thread via GitHub
zhengruifeng commented on PR #49430: URL: https://github.com/apache/spark/pull/49430#issuecomment-2581634062 the test failure should be unrelated. merged to master

Re: [PR] [SPARK-49565][SQL] Improve auto-generated expression aliases with pipe SQL operators [spark]

2025-01-09 Thread via GitHub
dtenedor commented on code in PR #49245: URL: https://github.com/apache/spark/pull/49245#discussion_r1909562687 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -503,7 +504,8 @@ class Analyzer(override val catalogManager: CatalogManager

Re: [PR] [SPARK-49565][SQL] Improve auto-generated expression aliases with pipe SQL operators [spark]

2025-01-09 Thread via GitHub
dtenedor commented on code in PR #49245: URL: https://github.com/apache/spark/pull/49245#discussion_r1909562558 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -538,6 +540,13 @@ class Analyzer(override val catalogManager: CatalogManag

Re: [PR] [SPARK-50767][SQL] Remove codegen of `from_json` [spark]

2025-01-09 Thread via GitHub
LuciferYang commented on PR #49411: URL: https://github.com/apache/spark/pull/49411#issuecomment-2579831462 > @panbingkun great investigation! +1 to implement subexpression elimination for `FilterExec` So this feature doesn't need to be reverted, right?

Re: [PR] [SPARK-49907][ML][CONNECT] Support spark.ml on Connect [spark]

2025-01-09 Thread via GitHub
grundprinzip commented on code in PR #48791: URL: https://github.com/apache/spark/pull/48791#discussion_r1908550792 ## sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -0,0 +1,293 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] [SPARK-50774][SQL] Make collation names public in CollationFactory [spark]

2025-01-09 Thread via GitHub
dongjoon-hyun commented on PR #49425: URL: https://github.com/apache/spark/pull/49425#issuecomment-2580896568 Could you run `dev/lint-java` and fix all `LineLength` errors? Otherwise, the CI will fail due to this. ``` $ dev/lint-java Using `mvn` from path: /Users/dongjoon/APACH

Re: [PR] [SPARK-50774][SQL] Make collation names public in CollationFactory [spark]

2025-01-09 Thread via GitHub
dongjoon-hyun commented on code in PR #49425: URL: https://github.com/apache/spark/pull/49425#discussion_r1909253274 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -19,6 +19,10 @@ import org.apache.spark.SparkException; import or

Re: [PR] [SPARK-50704][SQL] Support more pushdown functions for MySQL connector [spark]

2025-01-09 Thread via GitHub
sunxiaoguang commented on code in PR #49335: URL: https://github.com/apache/spark/pull/49335#discussion_r1908881150 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -406,7 +412,7 @@ abstract class JdbcDialect extends Serializable with Logging {

Re: [PR] [SPARK-50774][SQL] Make collation names public in CollationFactory [spark]

2025-01-09 Thread via GitHub
stefankandic commented on code in PR #49425: URL: https://github.com/apache/spark/pull/49425#discussion_r1909278319 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -19,6 +19,10 @@ import org.apache.spark.SparkException; import org

[PR] [SPARK-48353][SQL] Introduction of Error Handling mechanism GRAMMAR in SQL Scripting [spark]

2025-01-09 Thread via GitHub
miland-db opened a new pull request, #49427: URL: https://github.com/apache/spark/pull/49427 ### What changes were proposed in this pull request? This pull request introduces the logic of error handling inside SQL Scripting language. Now, it is possible to: - declare conditions for spe

Re: [PR] [SPARK-50707][SQL] Enable casting to/from char/varchar [spark]

2025-01-09 Thread via GitHub
mihailom-db commented on code in PR #49340: URL: https://github.com/apache/spark/pull/49340#discussion_r1909911927 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala: ## @@ -1428,4 +1424,43 @@ abstract class CastSuiteBase extends SparkFu

Re: [PR] [SPARK-50707][SQL] Enable casting to/from char/varchar [spark]

2025-01-09 Thread via GitHub
mihailom-db commented on code in PR #49340: URL: https://github.com/apache/spark/pull/49340#discussion_r1909912150 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -1138,6 +1135,8 @@ case class Cast( to match { case dt if

Re: [PR] [SPARK-50773][Core] Disable structured logging by default [spark]

2025-01-09 Thread via GitHub
dongjoon-hyun commented on PR #49421: URL: https://github.com/apache/spark/pull/49421#issuecomment-2580408134 Since the vote has started, could you make the CI pass, @gengliangwang ?

Re: [PR] [SPARK-50720][CORE] Support external shuffle service enablement in local-cluster mode [spark]

2025-01-09 Thread via GitHub
Ngone51 commented on PR #49350: URL: https://github.com/apache/spark/pull/49350#issuecomment-2580433290 > Is it our goal to allow ESS + local-cluster mode only if Utils.isTesting or do we intend support it outside of that mode? @JoshRosen Thanks for the review. And, good question! I

Re: [PR] [SPARK-50124][SQL][FOLLOWUP] InsertSortForLimitAndOffset should propagate missing ordering columns [spark]

2025-01-09 Thread via GitHub
dongjoon-hyun commented on code in PR #49416: URL: https://github.com/apache/spark/pull/49416#discussion_r1907543378 ## sql/core/src/test/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffsetSuite.scala: ## @@ -110,11 +111,21 @@ class InsertSortForLimitAndOffsetSuite

Re: [PR] [SPARK-50767][SQL] Remove codegen of `from_json` [spark]

2025-01-09 Thread via GitHub
panbingkun commented on PR #49411: URL: https://github.com/apache/spark/pull/49411#issuecomment-2579458063 > Does it mean any time we add codegen support for some functions, there is a risk of perf regression? Are we sure `from_json` is the only one or it's simply because `SubExprEliminatio

Re: [PR] [SPARK-50762][SQL] Add Analyzer rule for resolving SQL scalar UDFs [spark]

2025-01-09 Thread via GitHub
allisonwang-db commented on code in PR #49414: URL: https://github.com/apache/spark/pull/49414#discussion_r1908337016 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -1561,6 +1561,121 @@ class SessionCatalog( } } + /**

Re: [PR] [SPARK-48809][PYTHON][DOCS] Reimplemented `spark version drop down` of the `PySpark doc site` and fix bug [spark]

2025-01-09 Thread via GitHub
panbingkun commented on PR #47214: URL: https://github.com/apache/spark/pull/47214#issuecomment-2579810938 I have verified again and there is no problem at all. https://github.com/user-attachments/assets/c54ea1ed-ad32-4f77-860d-6859e8098b15 -- This is an automated message from

Re: [PR] [SPARK-48344][SQL] Enhance SQL Script Execution: Replace NOOP with COLLECT for Result DataFrames [spark]

2025-01-09 Thread via GitHub
miland-db commented on code in PR #49372: URL: https://github.com/apache/spark/pull/49372#discussion_r1908757408 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecution.scala: ## @@ -42,32 +45,53 @@ class SqlScriptingExecution( val ctx = new SqlScrip

Re: [PR] [SPARK-50767][SQL] Remove codegen of `from_json` [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on PR #49411: URL: https://github.com/apache/spark/pull/49411#issuecomment-2579593488 @panbingkun great investigation! +1 to implement subexpression elimination for `FilterExec` -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [PR] [SPARK-50124][SQL][FOLLOWUP] InsertSortForLimitAndOffset should propagate missing ordering columns [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49416: URL: https://github.com/apache/spark/pull/49416#discussion_r190809 ## sql/core/src/main/scala/org/apache/spark/sql/execution/InsertSortForLimitAndOffset.scala: ## @@ -43,33 +42,61 @@ object InsertSortForLimitAndOffset extends Rule[Sp

Re: [PR] [SPARK-49907][ML][CONNECT] Support spark.ml on Connect [spark]

2025-01-09 Thread via GitHub
zhengruifeng commented on code in PR #48791: URL: https://github.com/apache/spark/pull/48791#discussion_r1908683079 ## sql/connect/common/src/main/protobuf/spark/connect/ml_common.proto: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] [SPARK-50762][SQL] Add Analyzer rule for resolving SQL scalar UDFs [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49414: URL: https://github.com/apache/spark/pull/49414#discussion_r1908653022 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2363,6 +2364,278 @@ class Analyzer(override val catalogManager: CatalogM

Re: [PR] [SPARK-50762][SQL] Add Analyzer rule for resolving SQL scalar UDFs [spark]

2025-01-09 Thread via GitHub
cloud-fan commented on code in PR #49414: URL: https://github.com/apache/spark/pull/49414#discussion_r1908658817 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2363,6 +2364,278 @@ class Analyzer(override val catalogManager: CatalogM

[PR] [SPARK-50774] Make collation names public in CollationFactory [spark]

2025-01-09 Thread via GitHub
stefankandic opened a new pull request, #49425: URL: https://github.com/apache/spark/pull/49425 ### What changes were proposed in this pull request? Making collation names public. ### Why are the changes needed? To be able to have this centralized and not have to create new strin

[PR] [SPARK-50758][K8S]Mounts the krb5 config map on the executor pod [spark]

2025-01-09 Thread via GitHub
maomaodev opened a new pull request, #49426: URL: https://github.com/apache/spark/pull/49426 ### What changes were proposed in this pull request? In this PR, for Spark on K8s, the krb5.conf config map is mounted on the executor side as well. Before, the krb5.conf config map is

Re: [PR] [SPARK-50774][SQL] Make collation names public in CollationFactory [spark]

2025-01-09 Thread via GitHub
dongjoon-hyun commented on code in PR #49425: URL: https://github.com/apache/spark/pull/49425#discussion_r1909116747 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5773,7 +5773,8 @@ class SQLConf extends Serializable with Logging with SqlApiC

Re: [PR] [SPARK-50774][SQL] Make collation names public in CollationFactory [spark]

2025-01-09 Thread via GitHub
stefankandic commented on PR #49425: URL: https://github.com/apache/spark/pull/49425#issuecomment-2580797873 > +1, the proposal sounds reasonable and this PR handles all instances in non-test code. > > > To be able to have this centralized and not have to create new string literals w

Re: [PR] [SPARK-50774][SQL] Make collation names public in CollationFactory [spark]

2025-01-09 Thread via GitHub
stefankandic commented on code in PR #49425: URL: https://github.com/apache/spark/pull/49425#discussion_r1909149706 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -5773,7 +5773,8 @@ class SQLConf extends Serializable with Logging with SqlApiCo

Re: [PR] [SPARK-50774][SQL] Make collation names public in CollationFactory [spark]

2025-01-09 Thread via GitHub
dongjoon-hyun commented on PR #49425: URL: https://github.com/apache/spark/pull/49425#issuecomment-2580813643 Oh, if that is only a single file, could you include `CollationSupportSuite` into this PR? -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] [SPARK-50774][SQL] Make collation names public in CollationFactory [spark]

2025-01-09 Thread via GitHub
stefankandic commented on PR #49425: URL: https://github.com/apache/spark/pull/49425#issuecomment-2580829914 > Oh, if that is only a single file (containing most of them), could you include `CollationSupportSuite` into this PR? Done -- This is an automated message from the Apache G

Re: [PR] [SPARK-50704][SQL] Support more pushdown functions for MySQL connector [spark]

2025-01-09 Thread via GitHub
sunxiaoguang commented on code in PR #49335: URL: https://github.com/apache/spark/pull/49335#discussion_r1909080004 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -406,7 +412,7 @@ abstract class JdbcDialect extends Serializable with Logging {
