Re: [PR] [SPARK-51443][SS] Fix singleVariantColumn in DSv2 and readStream. [spark]
cloud-fan closed pull request #50217: [SPARK-51443][SS] Fix singleVariantColumn in DSv2 and readStream. URL: https://github.com/apache/spark/pull/50217
Re: [PR] [SPARK-51402][SQL][TESTS] Test TimeType in UDF [spark]
MaxGekk commented on PR #50194: URL: https://github.com/apache/spark/pull/50194#issuecomment-2720623643 +1, LGTM. Merging to master. Thank you, @calilisantos.
Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993729490 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Constraint.java: ## @@ -0,0 +1,128 @@ +package org.apache.spark.sql.connector.catalog.constraints; + +import org.apache.spark.annotation.Evolving; +import org.apache.spark.sql.connector.catalog.Identifier; +import org.apache.spark.sql.connector.expressions.NamedReference; +import org.apache.spark.sql.connector.expressions.filter.Predicate; + +/** + * A constraint that defines valid states of data in a table. + * + * @since 4.1.0 + */ +@Evolving +public interface Constraint { + /** + * Returns the name of this constraint. + */ + String name(); + + /** + * Returns the definition of this constraint in DDL format. + */ + String toDDL(); + + /** + * Returns the state of this constraint. + */ + ConstraintState state(); + + /** + * Creates a CHECK constraint with a search condition defined in SQL (Spark SQL dialect). + * + * @param name the constraint name + * @param sql the SQL representation of the search condition (Spark SQL dialect) + * @param state the constraint state + * @return a CHECK constraint with the provided configuration + */ + static Check check(String name, String sql, ConstraintState state) { +return new Check(name, sql, null /* no predicate */, state); Review Comment: Shall we use Java `Optional.empty()` instead of `null`? IIRC, this `null` was not a part of the SPIP, was it? Please correct me if there is a reason to have `null`; I might have missed that detail.
Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]
cloud-fan commented on PR #50137: URL: https://github.com/apache/spark/pull/50137#issuecomment-2721388429

OK, so the requirement is:
- adding a new item to table metadata should not require a new overload of `def createTable`
- the existing `def createTable` implementation should fail if a new table feature is added, but should still work if the new feature is not used

I think the builder should not be an interface, but a class that can have member variables and methods:
```
class CreateTableBuilder {
  protected Column[] columns;
  ...
  CreateTableBuilder withColumns ...
  ...
  Table create...
}
```
Then we add a new method in `TableCatalog` to create a builder, with the default implementation returning the built-in builder. The workflow is: Spark gets the builder from `TableCatalog`, calls `withXXX` methods to set up fields, and calls `create` at the end. When we add a new item (e.g. constraints), we add a new `withXXX` method and a `supportsXXX` method in the builder. By default `supportsXXX` returns false, and `withXXX` throws an exception if `supportsXXX` is false. Users need to explicitly override `supportsXXX` to return true to adopt the new feature.
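A minimal Java sketch of the builder shape proposed above, assuming hypothetical names (`CreateTableBuilder`, `withConstraints`, `supportsConstraints`) and a placeholder type for the not-yet-merged constraint API; the final interface may differ:

```java
import org.apache.spark.sql.connector.catalog.Column;
import org.apache.spark.sql.connector.catalog.Table;

// Illustrative sketch only; method names and types are not the final API.
public abstract class CreateTableBuilder {
  protected Column[] columns;

  public CreateTableBuilder withColumns(Column[] columns) {
    this.columns = columns;
    return this;
  }

  // Added together with a new table feature (constraints used as the example here).
  // Catalogs must explicitly override supportsConstraints() to opt in.
  public boolean supportsConstraints() {
    return false;
  }

  public CreateTableBuilder withConstraints(Object[] constraints) {
    // Object[] is a placeholder; the proposed Constraint API is not merged yet.
    if (!supportsConstraints()) {
      throw new UnsupportedOperationException("This catalog does not support constraints");
    }
    // a real builder would store the constraints for use in create()
    return this;
  }

  // Implemented by the catalog (or by the built-in default builder) to create the table
  // from the collected fields.
  public abstract Table create();
}
```

Because `supportsConstraints()` defaults to false, an existing catalog implementation keeps working until a query actually tries to use the new feature, and then it fails fast instead of silently dropping the new metadata.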
Re: [PR] [SPARK-51438][SQL] Make CatalystDataToProtobuf and ProtobufDataToCatalyst properly comparable and hashable [spark]
cloud-fan commented on PR #50212: URL: https://github.com/apache/spark/pull/50212#issuecomment-2721282652 thanks, merging to master!
Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]
dongjoon-hyun commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2721458327 To @zhengruifeng, could you confirm this part https://github.com/apache/spark/pull/50255#pullrequestreview-2679800649 ? > this PR seems to affect branch-4.0 Daily CI. Could you confirm that? I'm wondering if this PR is targeting 4.0.0 although we don't change spark-rm yet.
Re: [PR] [SPARK-51438][SQL] Make CatalystDataToProtobuf and ProtobufDataToCatalyst properly comparable and hashable [spark]
cloud-fan closed pull request #50212: [SPARK-51438][SQL] Make CatalystDataToProtobuf and ProtobufDataToCatalyst properly comparable and hashable URL: https://github.com/apache/spark/pull/50212
Re: [PR] [SPARK-51338][INFRA] Add automated CI build for `connect-examples` [spark]
LuciferYang commented on code in PR #50187: URL: https://github.com/apache/spark/pull/50187#discussion_r1991901327 ## connect-examples/server-library-example/pom.xml: ## @@ -36,7 +36,8 @@ UTF-8 2.13 2.13.15 -3.25.4 -4.0.0-preview2 +4.29.3 +4.1.0-SNAPSHOT +33.4.0-jre Review Comment: If feasible, it's certainly ok. However, I have a few questions regarding this:
1. How should the versions of other dependencies be updated? Do they need to be consistent with Spark? For instance, the current Spark uses Scala 2.13.16, but this project is still using 2.13.15.
2. During the release process, after changing the Spark version (e.g., from 4.0.0-SNAPSHOT to 4.0.0), is it necessary to check the `build` of this project?
3. Since it aims to be an independent project, why don't we choose to maintain this examples project in a separate branch (with no Spark code whatsoever), or even create a separate repository like `spark-connect-examples`? If it is an independent repository, would it be more convenient to also include examples for clients in other programming languages, such as Go or Swift?
Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993722344 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/BaseConstraint.java: ## @@ -0,0 +1,71 @@ +package org.apache.spark.sql.connector.catalog.constraints; + +import org.apache.spark.sql.connector.expressions.NamedReference; + +import java.util.StringJoiner; Review Comment: Could you place this before `org.apache.spark.` import statements?
Re: [PR] [SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files [spark]
ganeshashree commented on PR #50215: URL: https://github.com/apache/spark/pull/50215#issuecomment-2721580059 > int96 support in parquet apache/iceberg#1138 @HyukjinKwon Yes, this will be a breaking change for applications that depend on INT96 and use future versions of Spark. We are changing the default timestamp type for Parquet files in a future version of Spark. Applications using future versions of Spark to read or write Parquet files will need to configure Spark to handle INT96, as it will no longer be the default. We will need to document this change and highlight it in our release notes.
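For reference, a minimal sketch of how an application could pin the Parquet timestamp type explicitly through the existing `spark.sql.parquet.outputTimestampType` configuration instead of relying on the default (the class name and output path below are illustrative):

```java
import org.apache.spark.sql.SparkSession;

// Sketch: an application that still depends on INT96 timestamps can set the writer type
// explicitly rather than relying on the default, which this PR proposes to change to INT64.
public class Int96CompatExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("int96-compat").getOrCreate();
    // Existing SQL config controlling the physical Parquet timestamp type;
    // valid values are INT96, TIMESTAMP_MICROS, and TIMESTAMP_MILLIS.
    spark.conf().set("spark.sql.parquet.outputTimestampType", "INT96");
    spark.range(1).selectExpr("current_timestamp() AS ts")
        .write().parquet("/tmp/ts_int96.parquet");
    spark.stop();
  }
}
```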
Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]
cloud-fan commented on PR #50232: URL: https://github.com/apache/spark/pull/50232#issuecomment-2721567771 late LGTM
Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993730085 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ProcedureCatalog.java: ## @@ -34,4 +34,9 @@ public interface ProcedureCatalog extends CatalogPlugin { * @return the loaded unbound procedure */ UnboundProcedure loadProcedure(Identifier ident); + + /** + * List all procedures in the specified database. + */ + Identifier[] listProcedures(String[] namespace); Review Comment: I think for methods that need to be implemented by users, `String[]` is better.
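A short sketch of how a connector might implement this method with the plain `String[]` namespace parameter; the catalog class and its internal registry below are hypothetical, not part of the PR:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.connector.catalog.Identifier;

// Hypothetical connector-side sketch: the catalog keeps its own registry of procedure
// identifiers and filters them by the requested namespace, which arrives as a plain String[].
public class ProcedureListingSketch {
  private final List<Identifier> registered = new ArrayList<>();

  public Identifier[] listProcedures(String[] namespace) {
    List<Identifier> result = new ArrayList<>();
    for (Identifier ident : registered) {
      if (Arrays.equals(ident.namespace(), namespace)) {
        result.add(ident);
      }
    }
    return result.toArray(new Identifier[0]);
  }
}
```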
Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993728678 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ProcedureCatalog.java: ## @@ -34,4 +34,9 @@ public interface ProcedureCatalog extends CatalogPlugin { * @return the loaded unbound procedure */ UnboundProcedure loadProcedure(Identifier ident); + + /** + * List all procedures in the specified database. Review Comment: +1
Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993731273 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/Constraint.java: ## @@ -0,0 +1,128 @@ +package org.apache.spark.sql.connector.catalog.constraints; + +import org.apache.spark.annotation.Evolving; +import org.apache.spark.sql.connector.catalog.Identifier; +import org.apache.spark.sql.connector.expressions.NamedReference; +import org.apache.spark.sql.connector.expressions.filter.Predicate; + +/** + * A constraint that defines valid states of data in a table. + * + * @since 4.1.0 + */ +@Evolving +public interface Constraint { + /** + * Returns the name of this constraint. + */ + String name(); + + /** + * Returns the definition of this constraint in DDL format. + */ + String toDDL(); + + /** + * Returns the state of this constraint. + */ + ConstraintState state(); + + /** + * Creates a CHECK constraint with a search condition defined in SQL (Spark SQL dialect). + * + * @param name the constraint name + * @param sql the SQL representation of the search condition (Spark SQL dialect) + * @param state the constraint state + * @return a CHECK constraint with the provided configuration + */ + static Check check(String name, String sql, ConstraintState state) { +return new Check(name, sql, null /* no predicate */, state); + } + + /** + * Creates a CHECK constraint with a search condition defined by a {@link Predicate predicate}. + * + * @param name the constraint name + * @param predicate the search condition + * @param state the constraint state + * @return a CHECK constraint with the provided configuration + */ + static Check check(String name, Predicate predicate, ConstraintState state) { +return new Check(name, null /* no SQL */, predicate, state); Review Comment: ditto. `Optional`?
Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993733934 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -268,6 +268,11 @@ class InMemoryTableCatalog extends BasicInMemoryTableCatalog with SupportsNamesp procedure } + override def listProcedures(namespace: Array[String]): Array[Identifier] = { +procedures.asScala.filter{case (_, p) => !p.name().equals("dummy_increment")} Review Comment: oh nvm, I misread the code. BTW it's Scala, so we can do `case (_, p) if p.name != "..."`
Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993734437 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -268,6 +268,11 @@ class InMemoryTableCatalog extends BasicInMemoryTableCatalog with SupportsNamesp procedure } + override def listProcedures(namespace: Array[String]): Array[Identifier] = { Review Comment: the `namespace` parameter is not used?
Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993737628 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ShowProceduresCommand.scala: ## @@ -0,0 +1,54 @@ +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis.ResolvedNamespace +import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._ +import org.apache.spark.sql.types.StringType + +/** + * A command for users to get procedures. + * If a namespace is not given, the current namespace will be used. + * The syntax of using this command in SQL is: + * {{{ + * SHOW TABLES [(IN|FROM) namespace]] + * }}} + */ +case class ShowProceduresCommand( +child: LogicalPlan, +override val output: Seq[Attribute] = Seq( + AttributeReference("namespace", StringType, nullable = false)(), Review Comment: Shall we follow SHOW FUNCTIONS and only output qualified procedure names?
Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]
cloud-fan commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993740184 ## sql/core/src/test/scala/org/apache/spark/sql/connector/ProcedureSuite.scala: ## @@ -40,15 +40,23 @@ class ProcedureSuite extends QueryTest with SharedSparkSession with BeforeAndAft before { spark.conf.set(s"spark.sql.catalog.cat", classOf[InMemoryCatalog].getName) +spark.conf.set(s"spark.sql.catalog.cat2", classOf[InMemoryCatalog].getName) + +// needed for switching back and forth between catalogs +sql("create database cat.default") Review Comment: The default namespace for these catalogs should be `[]`.
Re: [PR] [SPARK-48922][SQL] Avoid redundant array transform of identical expression for map type [spark]
wForget commented on PR #50245: URL: https://github.com/apache/spark/pull/50245#issuecomment-2719721278 > @wForget Could you create backport PR for branch-3.5 Sure, I will create it later.
Re: [PR] [SPARK-51495] Add `Integration Test` GitHub Action job [spark-connect-swift]
dongjoon-hyun commented on code in PR #15: URL: https://github.com/apache/spark-connect-swift/pull/15#discussion_r1992468710 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -81,6 +81,7 @@ struct DataFrameTests { await spark.stop() } +#if !os(Linux) Review Comment: This is ignored in the Linux environment due to a Swift library difference. It will be re-enabled later.
Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]
chris-twiner commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1993925656 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -235,6 +256,53 @@ trait HigherOrderFunction extends Expression with ExpectsInputTypes { val canonicalizedChildren = cleaned.children.map(_.canonicalized) withNewChildren(canonicalizedChildren) } + + + protected def assignAtomic(atomicRef: String, value: String, isNull: String = FalseLiteral, + nullable: Boolean = false) = { +if (nullable) { + s""" +if ($isNull) { + $atomicRef.set(null); +} else { + $atomicRef.set($value); +} + """ +} else { + s"$atomicRef.set($value);" +} + } + + protected def assignArrayElement(ctx: CodegenContext, arrayName: String, elementCode: ExprCode, + elementVar: NamedLambdaVariable, index: String): String = { +val elementType = elementVar.dataType +val elementAtomic = ctx.addReferenceObj(elementVar.name, elementVar.value) +val extractElement = CodeGenerator.getValue(arrayName, elementType, index) Review Comment: > Finally got back around to this, added tests showing why the atomic reference is needed in case of `CodegenFallback` inside a lambda expression FYI - In [my testing](https://github.com/sparkutils/quality/blob/temp/fabric_0.1.3.1_rc6_map_with_issues/src/main/scala/org/apache/spark/sql/qualityFunctions/LambdaCompilation.scala#L21) you can seemingly, at codegen time, swap out the AtomicReference/NamedLambdaVariable in all cases to a simple set/get wrapper around a field for the eval CodegenFallback case. Per [the original NamedLambdaVariable PR](https://github.com/apache/spark/pull/21954#discussion_r207162167) the AtomicReference only seems to be needed for proper tree transformations, so you can swap out at compilation time to squeeze a little bit [more perf out](https://brooker.co.za/blog/2012/09/10/volatile.html) (lower memory usage, no need for volatile read/memory barriers on write etc. this only really makes a difference if the code is HOF heavy of course, like Quality tends to be).
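To illustrate the trade-off described above (the holder classes below are illustrative only, not Spark's actual implementation): an `AtomicReference` carries volatile read/write semantics on every access, while a plain field avoids those memory barriers when single-threaded evaluation is guaranteed.

```java
import java.util.concurrent.atomic.AtomicReference;

// Comparable to how NamedLambdaVariable holds its current value: every access goes
// through an AtomicReference, which implies volatile read/write semantics.
final class AtomicHolder<T> {
  private final AtomicReference<T> ref = new AtomicReference<>();
  T get() { return ref.get(); }      // volatile read
  void set(T v) { ref.set(v); }      // volatile write
}

// A plain field holder avoids those barriers; it is only safe when the lambda variable
// is read and written from a single thread, as in per-partition expression evaluation.
final class PlainHolder<T> {
  private T value;
  T get() { return value; }          // ordinary field read, no barrier
  void set(T v) { value = v; }       // ordinary field write, no barrier
}
```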
Re: [PR] [SPARK-51350][SQL] Implement Show Procedures [spark]
szehon-ho commented on code in PR #50109: URL: https://github.com/apache/spark/pull/50109#discussion_r1993927792 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -268,6 +268,11 @@ class InMemoryTableCatalog extends BasicInMemoryTableCatalog with SupportsNamesp procedure } + override def listProcedures(namespace: Array[String]): Array[Identifier] = { +procedures.asScala.filter{case (_, p) => !p.name().equals("dummy_increment")} Review Comment: this is a filter, so it has to return a predicate. Do you mean ```.filter{case (_, p) if !p.name().equals("dummy_increment") => true}```? It looks a bit weird.
Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]
dongjoon-hyun commented on PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#issuecomment-272244 Could you review this PR when you have some time, @viirya ?
Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]
viirya commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994364613 ## Sources/SparkConnect/DataFrame.swift: ## @@ -192,4 +192,38 @@ public actor DataFrame: Sendable { print(table.render()) } } + + /// Projects a set of expressions and returns a new ``DataFrame``. + /// - Parameter cols: Column names + /// - Returns: A ``DataFrame`` with subset of columns. + public func select(_ cols: String...) -> DataFrame { +return DataFrame(spark: self.spark, plan: SparkConnectClient.getProject(self.plan.root, cols)) + } + + /// Return a new ``DataFrame`` sorted by the specified column(s). + /// - Parameter cols: Column names. + /// - Returns: A sorted ``DataFrame`` + public func sort(_ cols: String...) -> DataFrame { +return DataFrame(spark: self.spark, plan: SparkConnectClient.getSort(self.plan.root, cols)) + } + + /// <#Description#> + /// - Parameter cols: <#cols description#> + /// - Returns: <#description#> Review Comment: The API doc looks incorrect?
Re: [PR] [SPARK-44856][PYTHON] Improve Python UDTF arrow serializer performance [spark]
dongjoon-hyun commented on PR #50099: URL: https://github.com/apache/spark/pull/50099#issuecomment-2721318301 Could you fix the remaining failures? ``` pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_ARROW_TYPE_CAST_ERROR] Cannot convert the output value of the column 'x' with type 'object' to the specified return type of the column: 'list'. Please check if the data types match and try again. ```
Re: [PR] [SPARK-43221][CORE][3.5] Host local block fetching should use a block status of a block stored on disk [spark]
dongjoon-hyun closed pull request #50260: [SPARK-43221][CORE][3.5] Host local block fetching should use a block status of a block stored on disk URL: https://github.com/apache/spark/pull/50260
Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]
beliefer commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2721323484 @jjayadeep06 @srowen @cnauroth Thank you! Merged into branch-3.5
Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]
beliefer closed pull request #50223: [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode URL: https://github.com/apache/spark/pull/50223
Re: [PR] [SPARK-43221][CORE][3.5] Host local block fetching should use a block status of a block stored on disk [spark]
dongjoon-hyun commented on PR #50260: URL: https://github.com/apache/spark/pull/50260#issuecomment-2721308974 Merged to branch-3.5.
Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]
zhengruifeng commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2721794721
> To @zhengruifeng , could you confirm this part [#50255 (review)](https://github.com/apache/spark/pull/50255#pullrequestreview-2679800649) ?
> > this PR seems to affect branch-4.0 Daily CI. Could you confirm that? I'm wondering if this PR is targeting 4.0.0 although we don't change spark-rm yet.

For the branch-4.0 Daily CI, I checked it with https://github.com/apache/spark/pull/50262, and I think branch-4.0 is compatible with pyarrow 19. Also, the docker image tags are different for branch-4.0 and master.

Build / Python-only (branch-4.0):
> /usr/bin/docker --config /home/runner/work/_temp/.docker_b5b08f1c-8088-4c21-8300-3944aeeddde3 pull ghcr.io/apache/apache-spark-ci-image-pyspark-python-311:branch-4.0-13834271006

Build / Coverage (master, Scala 2.13, Hadoop 3, JDK 17):
> /usr/bin/docker --config /home/runner/work/_temp/.docker_5b8ff392-d367-4a9b-92c4-3f068331fdee pull ghcr.io/apache/apache-spark-ci-image-pyspark-python-311:master-13831993775

So I suspect that if we only merge this PR to master, it won't affect branch-4.0. Regarding the other daily builds, can we verify them later? Otherwise, we would need to change the env of the PR builder to each of these jobs and test them one by one.
Re: [PR] [SPARK-23890][SQL] Support DDL for adding nested columns to struct types [spark]
ottomata commented on PR #21012: URL: https://github.com/apache/spark/pull/21012#issuecomment-2722408376
> Would one need to use CHANGE|ALTER COLUMN syntax for this?

[TIL](https://phabricator.wikimedia.org/T209453#10632894) that Iceberg supports this with .value column name referencing! https://iceberg.apache.org/docs/nightly/spark-ddl/#alter-table-add-column
```sql
-- create a map column of struct key and struct value
ALTER TABLE prod.db.sample
ADD COLUMN points map<struct<x: int>, struct<a: int>>;

-- add a field to the value struct in a map. Using keyword 'value' to access the map's value column.
ALTER TABLE prod.db.sample
ADD COLUMN points.value.b int;
```
And that works via Spark! I guess there is a bug though: https://github.com/apache/iceberg/issues/2962
Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]
viirya commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994366213 ## Sources/SparkConnect/DataFrame.swift: ## @@ -192,4 +192,38 @@ public actor DataFrame: Sendable { print(table.render()) } } + + /// Projects a set of expressions and returns a new ``DataFrame``. + /// - Parameter cols: Column names + /// - Returns: A ``DataFrame`` with subset of columns. + public func select(_ cols: String...) -> DataFrame { +return DataFrame(spark: self.spark, plan: SparkConnectClient.getProject(self.plan.root, cols)) + } + + /// Return a new ``DataFrame`` sorted by the specified column(s). + /// - Parameter cols: Column names. + /// - Returns: A sorted ``DataFrame`` + public func sort(_ cols: String...) -> DataFrame { +return DataFrame(spark: self.spark, plan: SparkConnectClient.getSort(self.plan.root, cols)) + } + + /// <#Description#> + /// - Parameter cols: <#cols description#> + /// - Returns: <#description#> + public func orderBy(_ cols: String...) -> DataFrame { +return DataFrame(spark: self.spark, plan: SparkConnectClient.getSort(self.plan.root, cols)) + } + + /// Limits the result count to the number specified. + /// - Parameter n: Number of records to return. Will return this number of records or all records if the ``DataFrame`` contains less than this number of records. + /// - Returns: A subset of the records + public func limit(_ n: Int32) -> DataFrame { +return DataFrame(spark: self.spark, plan: SparkConnectClient.getLimit(self.plan.root, n)) + } + + /// Chec if the ``DataFrame`` is empty and returns a boolean value. Review Comment: ```suggestion /// Checks if the ``DataFrame`` is empty and returns a boolean value. ```
Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]
viirya commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994368629 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -68,6 +68,76 @@ struct DataFrameTests { await spark.stop() } + @Test + func selectNone() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let emptySchema = try await spark.range(1).select().schema() +#expect(emptySchema == #"{"struct":{}}"#) +await spark.stop() + } + + @Test + func select() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let schema = try await spark.range(1).select("id").schema() +#expect( + schema +== #"{"struct":{"fields":[{"name":"id","dataType":{"long":{}}}]}}"# +) +await spark.stop() + } + + @Test + func selectMultipleColumns() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let schema = try await spark.sql("SELECT * FROM VALUES (1, 2)").select("col2", "col1").schema() +#expect( + schema +== #"{"struct":{"fields":[{"name":"col2","dataType":{"integer":{}}},{"name":"col1","dataType":{"integer":{}}}]}}"# +) +await spark.stop() + } + + @Test + func selectInvalidColumn() async throws { +let spark = try await SparkSession.builder.getOrCreate() +try await #require(throws: Error.self) { + let _ = try await spark.range(1).select("invalid").schema() +} +await spark.stop() + } + + @Test + func limit() async throws { +let spark = try await SparkSession.builder.getOrCreate() +#expect(try await spark.range(10).limit(0).count() == 0) +#expect(try await spark.range(10).limit(1).count() == 1) +#expect(try await spark.range(10).limit(2).count() == 2) +await spark.stop() + } + + @Test + func isEmpty() async throws { +let spark = try await SparkSession.builder.getOrCreate() +#expect(try await spark.range(0).isEmpty()) +#expect(!(try await spark.range(1).isEmpty())) +await spark.stop() + } + + @Test + func sort() async throws { +let spark = try await SparkSession.builder.getOrCreate() +#expect(try await spark.range(10).sort("id").count() == 10) Review Comment: Do we need to check if the return data is actually sorted?
Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]
viirya commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994367984 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -68,6 +68,76 @@ struct DataFrameTests { await spark.stop() } + @Test + func selectNone() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let emptySchema = try await spark.range(1).select().schema() +#expect(emptySchema == #"{"struct":{}}"#) +await spark.stop() + } + + @Test + func select() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let schema = try await spark.range(1).select("id").schema() +#expect( + schema +== #"{"struct":{"fields":[{"name":"id","dataType":{"long":{}}}]}}"# +) +await spark.stop() + } + + @Test + func selectMultipleColumns() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let schema = try await spark.sql("SELECT * FROM VALUES (1, 2)").select("col2", "col1").schema() +#expect( + schema +== #"{"struct":{"fields":[{"name":"col2","dataType":{"integer":{}}},{"name":"col1","dataType":{"integer":{}}}]}}"# +) +await spark.stop() + } + + @Test + func selectInvalidColumn() async throws { +let spark = try await SparkSession.builder.getOrCreate() +try await #require(throws: Error.self) { + let _ = try await spark.range(1).select("invalid").schema() +} +await spark.stop() + } + + @Test + func limit() async throws { +let spark = try await SparkSession.builder.getOrCreate() +#expect(try await spark.range(10).limit(0).count() == 0) +#expect(try await spark.range(10).limit(1).count() == 1) +#expect(try await spark.range(10).limit(2).count() == 2) Review Comment: ```suggestion #expect(try await spark.range(10).limit(2).count() == 2) #expect(try await spark.range(10).limit(15).count() == 10) ``` Review Comment: Maybe add a case exceeding the limit number?
Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]
viirya commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994370384 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -68,6 +68,76 @@ struct DataFrameTests { await spark.stop() } + @Test + func selectNone() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let emptySchema = try await spark.range(1).select().schema() +#expect(emptySchema == #"{"struct":{}}"#) +await spark.stop() + } + + @Test + func select() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let schema = try await spark.range(1).select("id").schema() +#expect( + schema +== #"{"struct":{"fields":[{"name":"id","dataType":{"long":{}}}]}}"# +) +await spark.stop() + } + + @Test + func selectMultipleColumns() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let schema = try await spark.sql("SELECT * FROM VALUES (1, 2)").select("col2", "col1").schema() +#expect( + schema +== #"{"struct":{"fields":[{"name":"col2","dataType":{"integer":{}}},{"name":"col1","dataType":{"integer":{}}}]}}"# +) +await spark.stop() + } + + @Test + func selectInvalidColumn() async throws { +let spark = try await SparkSession.builder.getOrCreate() +try await #require(throws: Error.self) { + let _ = try await spark.range(1).select("invalid").schema() +} +await spark.stop() + } + + @Test + func limit() async throws { +let spark = try await SparkSession.builder.getOrCreate() +#expect(try await spark.range(10).limit(0).count() == 0) +#expect(try await spark.range(10).limit(1).count() == 1) +#expect(try await spark.range(10).limit(2).count() == 2) +await spark.stop() + } + + @Test + func isEmpty() async throws { +let spark = try await SparkSession.builder.getOrCreate() +#expect(try await spark.range(0).isEmpty()) +#expect(!(try await spark.range(1).isEmpty())) +await spark.stop() + } + + @Test + func sort() async throws { +let spark = try await SparkSession.builder.getOrCreate() +#expect(try await spark.range(10).sort("id").count() == 10) Review Comment: Okay, I see. Maybe we can return to this after `collect` is implemented.
Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]
dongjoon-hyun commented on code in PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#discussion_r1994369781 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -68,6 +68,76 @@ struct DataFrameTests { await spark.stop() } + @Test + func selectNone() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let emptySchema = try await spark.range(1).select().schema() +#expect(emptySchema == #"{"struct":{}}"#) +await spark.stop() + } + + @Test + func select() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let schema = try await spark.range(1).select("id").schema() +#expect( + schema +== #"{"struct":{"fields":[{"name":"id","dataType":{"long":{}}}]}}"# +) +await spark.stop() + } + + @Test + func selectMultipleColumns() async throws { +let spark = try await SparkSession.builder.getOrCreate() +let schema = try await spark.sql("SELECT * FROM VALUES (1, 2)").select("col2", "col1").schema() +#expect( + schema +== #"{"struct":{"fields":[{"name":"col2","dataType":{"integer":{}}},{"name":"col1","dataType":{"integer":{}}}]}}"# +) +await spark.stop() + } + + @Test + func selectInvalidColumn() async throws { +let spark = try await SparkSession.builder.getOrCreate() +try await #require(throws: Error.self) { + let _ = try await spark.range(1).select("invalid").schema() +} +await spark.stop() + } + + @Test + func limit() async throws { +let spark = try await SparkSession.builder.getOrCreate() +#expect(try await spark.range(10).limit(0).count() == 0) +#expect(try await spark.range(10).limit(1).count() == 1) +#expect(try await spark.range(10).limit(2).count() == 2) +await spark.stop() + } + + @Test + func isEmpty() async throws { +let spark = try await SparkSession.builder.getOrCreate() +#expect(try await spark.range(0).isEmpty()) +#expect(!(try await spark.range(1).isEmpty())) +await spark.stop() + } + + @Test + func sort() async throws { +let spark = try await SparkSession.builder.getOrCreate() +#expect(try await spark.range(10).sort("id").count() == 10) Review Comment: So, I only manually verified the result so far. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]
dongjoon-hyun commented on PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#issuecomment-2722813101 Merged to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]
jingz-db commented on PR #50272: URL: https://github.com/apache/spark/pull/50272#issuecomment-2722832096 There seem to be more places that could remove the unnecessary `close()` impl in test file: e.g. https://github.com/apache/spark/blob/ccfc0a9dcea8fa9d6aab4dbb233f8135a3947441/python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py#L1479 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51501][SQL] Disable ObjectHashAggregate for group by on collated columns [spark]
stefankandic opened a new pull request, #50267: URL: https://github.com/apache/spark/pull/50267 ### What changes were proposed in this pull request? Disabling `ObjectHashAggregate` when grouping on columns with collations. ### Why are the changes needed? https://github.com/apache/spark/pull/45290 added support for sort-based aggregation on collated columns and explicitly forbade the use of hash aggregate for collated columns. However, it did not consider the third type of aggregate, the object hash aggregate, which is only used when there are also TypedImperativeAggregate expressions present ([source](https://github.com/apache/spark/blob/f3b081066393e1568c364b6d3bc0bceabd1e7e9f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L1204)). That means that if we group by a collated column and also have a TypedImperativeAggregate, we will end up using the object hash aggregate, which can lead to incorrect results like in the example below: ```code CREATE TABLE tbl(c1 STRING COLLATE UTF8_LCASE, c2 INT) USING PARQUET; INSERT INTO tbl VALUES ('HELLO', 1), ('hello', 2), ('HeLlO', 3); SELECT COLLECT_LIST(c2) as list FROM tbl GROUP BY c1; ``` where the result would have three rows with values [1], [2] and [3] instead of one row with value [1, 2, 3]. For this reason we should do the same thing we did for the regular hash aggregate: make it so that it doesn't support grouping expressions on collated columns. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
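The expected behaviour can be sketched end-to-end as follows. This is only an illustration of the SQL shown above, assuming a running `SparkSession` named `spark` on a build that includes the fix and a catalog where `tbl` does not yet exist.

```scala
// Reproduction sketch of the collation grouping scenario described above.
spark.sql("CREATE TABLE tbl(c1 STRING COLLATE UTF8_LCASE, c2 INT) USING PARQUET")
spark.sql("INSERT INTO tbl VALUES ('HELLO', 1), ('hello', 2), ('HeLlO', 3)")

// 'HELLO', 'hello' and 'HeLlO' compare equal under UTF8_LCASE, so the correct
// result is a single group whose collected list holds 1, 2 and 3 (in some order).
spark.sql("SELECT collect_list(c2) AS list FROM tbl GROUP BY c1").show(truncate = false)
```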
[PR] [SPARK-51502][SQL] Move collations test to collations package [spark]
stefankandic opened a new pull request, #50268: URL: https://github.com/apache/spark/pull/50268 ### What changes were proposed in this pull request? Move collations test into collations package where most collation test suites already are located. ### Why are the changes needed? For consistency purposes, also not to bloat the sql folder with too many suites. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests should suffice. ### Was this patch authored or co-authored using generative AI tooling? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]
Kimahriman commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1993969482 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -235,6 +256,53 @@ trait HigherOrderFunction extends Expression with ExpectsInputTypes { val canonicalizedChildren = cleaned.children.map(_.canonicalized) withNewChildren(canonicalizedChildren) } + + + protected def assignAtomic(atomicRef: String, value: String, isNull: String = FalseLiteral, + nullable: Boolean = false) = { +if (nullable) { + s""" +if ($isNull) { + $atomicRef.set(null); +} else { + $atomicRef.set($value); +} + """ +} else { + s"$atomicRef.set($value);" +} + } + + protected def assignArrayElement(ctx: CodegenContext, arrayName: String, elementCode: ExprCode, + elementVar: NamedLambdaVariable, index: String): String = { +val elementType = elementVar.dataType +val elementAtomic = ctx.addReferenceObj(elementVar.name, elementVar.value) +val extractElement = CodeGenerator.getValue(arrayName, elementType, index) Review Comment: Do you just mean the built-in concurrency of `AtomicReference` isn't needed except for something related to the `transform` operations, and otherwise you could just use a regular variable to store the result? I agree that might be preferable, but it's probably out of scope for this initial codegen implementation since that would change how the interpreted path works as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
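To make the point about the holder concrete, here is a small standalone Scala sketch (illustrative only, not code from the PR): within a single-threaded task, a value handed to a lambda behaves the same whether it is stored behind an `AtomicReference` or in a plain mutable field; the atomic only matters if the holder were shared across threads.

```scala
import java.util.concurrent.atomic.AtomicReference

// Two equivalent ways to hold the "current element" seen by a lambda body
// during single-threaded evaluation; only the first is safe under concurrency.
final class AtomicHolder[T](initial: T) {
  private val ref = new AtomicReference[T](initial)
  def set(v: T): Unit = ref.set(v)
  def get: T = ref.get()
}

final class PlainHolder[T](initial: T) {
  private var value: T = initial
  def set(v: T): Unit = { value = v }
  def get: T = value
}

object HolderDemo extends App {
  val a = new AtomicHolder(0); a.set(42); println(a.get)  // 42
  val p = new PlainHolder(0); p.set(42); println(p.get)   // 42
}
```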
[PR] [SPARK-51270] Support nanosecond precision timestamp in Variant [spark]
cashmand opened a new pull request, #50270: URL: https://github.com/apache/spark/pull/50270 ### What changes were proposed in this pull request? Adds Variant support for the nanosecond precision timestamp types in https://github.com/apache/parquet-format/commit/25f05e73d8cd7f5c83532ce51cb4f4de8ba5f2a2, similar to the UUID support added in https://github.com/apache/spark/pull/50025. Cast support: the PR allows casting to any type that the corresponding microsecond value supports. For strings, we retain the nanosecond precision, and for all other casts, we truncate to a microsecond timestamp and then cast. An alternative is to round to the nearest microsecond, but truncation is consistent with our current timestamp-to-integer cast, which truncates to the nearest second. We could also consider disabling some of these casts, but I didn't see a strong argument to. SchemaOfVariant: The types are reported as `timestamp_nanos` and `timestamp_nanos_ntz`. If Spark eventually adds native support for these types, we would either need to use these names, or change the behaviour of SchemaOfVariant. ### Why are the changes needed? Support Variant values written by other tools. ### Does this PR introduce _any_ user-facing change? Yes, operations on Variant binaries with these types would have previously failed, and will now succeed. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
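A small illustration of the truncation semantics described above (not code from the PR): converting a nanosecond timestamp to microseconds simply drops the sub-microsecond digits instead of rounding. `Math.floorDiv` is used here only as an assumption about how negative (pre-epoch) values would be handled.

```scala
// Illustrative only: truncating nanoseconds-since-epoch to microseconds.
object NanosTruncationDemo extends App {
  val nanos: Long = 1700000000123456789L           // a hypothetical timestamp_nanos value
  val micros: Long = Math.floorDiv(nanos, 1000L)   // 1700000000123456, the trailing 789 ns are dropped
  println(micros)
}
```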
[PR] [SPARK-51414][SQL] Add the make_time() function [spark]
robreeves opened a new pull request, #50269: URL: https://github.com/apache/spark/pull/50269 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51441][SQL] Add DSv2 APIs for constraints [spark]
dongjoon-hyun commented on code in PR #50253: URL: https://github.com/apache/spark/pull/50253#discussion_r1993739512 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/ForeignKey.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.connector.catalog.constraints; + +import org.apache.spark.sql.connector.catalog.Identifier; +import org.apache.spark.sql.connector.expressions.NamedReference; + +import java.util.Arrays; +import java.util.Objects; Review Comment: ditto. `import` ordering. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]
Kimahriman commented on PR #34558: URL: https://github.com/apache/spark/pull/34558#issuecomment-2722033268 > > @Kimahriman just out of curiosity, how much did the performance improve? > > I just wanted to add to the above response that I've implemented a compilation scheme [here](https://sparkutils.github.io/quality/latest/advanced/userFunctions/#controlling-compilation-tweaking-the-quality-optimisations), as part of Quality, and we saw perf boosts of up to 40%, after that adding further lambdas triggered the cost of code generation being higher than the saving. It's definitely usage dependant though, the more work done in the function the higher the cost (and therefore potential saving by compilation), a small boost is noticeable on removal of the atomic under similar ideal circumstances. > > edit - [the source](https://github.com/sparkutils/quality/blob/main/src/main/scala/org/apache/spark/sql/qualityFunctions/LambdaCompilation.scala) Not sure how I missed this comment. We haven't done extensive performance comparisons with and without this; we've just been using it for a few years now. It's hard to quantify the impact since it's completely dependent on the expressions run inside the functions. But that's also the whole point: by enabling codegen for HOFs you enable codegen for expressions inside the lambda functions, which are assumed to be more performant since that's the whole point of codegen. Additionally, this enables a follow-on I'm currently working on, which is enabling subexpression elimination inside lambda functions; we've recently identified this as a major performance killer for us, as it's very easy to generate a lot of duplicate expression evaluations in certain cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
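As a hypothetical illustration of the duplicate evaluation mentioned above (assuming a `SparkSession` named `spark`): the same sub-expression appears twice inside the lambda below, so without sub-expression elimination it is computed twice per array element.

```scala
// Hypothetical example only: `length(upper(x))` occurs twice in the lambda, so
// each element pays its cost twice unless common sub-expressions are eliminated.
val df = spark.sql(
  """SELECT transform(xs, x -> CASE WHEN length(upper(x)) > 3
    |                               THEN length(upper(x)) ELSE 0 END) AS lens
    |FROM VALUES (array('spark', 'sql')) AS t(xs)""".stripMargin)
df.show(truncate = false)  // [5, 0]
```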
Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]
chris-twiner commented on PR #34558: URL: https://github.com/apache/spark/pull/34558#issuecomment-2722268943 > > > @Kimahriman just out of curiosity, how much did the performance improve? > > > > > > I just wanted to add to the above response that I've implemented a compilation scheme [here](https://sparkutils.github.io/quality/latest/advanced/userFunctions/#controlling-compilation-tweaking-the-quality-optimisations), as part of Quality, and we saw perf boosts of up to 40%, after that adding further lambdas triggered the cost of code generation being higher than the saving. It's definitely usage dependant though, the more work done in the function the higher the cost (and therefore potential saving by compilation), a small boost is noticeable on removal of the atomic under similar ideal circumstances. > > edit - [the source](https://github.com/sparkutils/quality/blob/main/src/main/scala/org/apache/spark/sql/qualityFunctions/LambdaCompilation.scala) > > Not sure how I missed this comment. we haven't done extensive performance comparisons with and without this, we've just been using it for a few years now. It's hard to quantify the impact since it's completely dependent on the expressions run inside the functions. But that's also the whole point, by enabling codegen for HOFs you enable codegen for expressions inside the lambda functions, which are assumed to be more performant since that's the whole point of codegen. > > Additionally this enables a follow on I'm currently working on which is enabling subexpression elimination inside of lambda functions, which we've recently identified as a major performance killer for us, as it's very easy to generate a lot of duplicate expression evaluations in certain cases The subexpression elimination option is huge! Very exciting -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator [spark]
tedyu commented on code in PR #50020: URL: https://github.com/apache/spark/pull/50020#discussion_r1993452621 ## core/src/main/scala/org/apache/spark/BarrierCoordinator.scala: ## @@ -134,11 +138,8 @@ private[spark] class BarrierCoordinator( // Cancel the current active TimerTask and release resources. private def cancelTimerTask(): Unit = { - if (timerTask != null) { -timerTask.cancel() -timer.purge() -timerTask = null - } + timerFutures.asScala.foreach(_.cancel(false)) Review Comment: Thanks for your comment. I think merging the removal of futures with cancellation would make the code amenable to future code changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
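For context, the pattern under discussion (collecting scheduled tasks so they can be cancelled and dropped together) can be sketched standalone as follows; this is illustrative only and not the actual `BarrierCoordinator` code from the PR.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, Executors, ScheduledFuture, TimeUnit}
import scala.jdk.CollectionConverters._

object TimerCancellationSketch extends App {
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  // Keep every scheduled future so they can all be cancelled in one place.
  val timerFutures = new ConcurrentLinkedQueue[ScheduledFuture[_]]()

  timerFutures.add(scheduler.schedule(
    new Runnable { override def run(): Unit = println("timed out") },
    10, TimeUnit.SECONDS))

  // Cancel without interrupting a task that may already be running, then clear
  // the queue so stale futures are not touched again by a later cancellation.
  timerFutures.asScala.foreach(_.cancel(false))
  timerFutures.clear()
  scheduler.shutdown()
}
```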
[PR] [SPARK-51504] Support `DataFrame.select` [spark-connect-swift]
dongjoon-hyun opened a new pull request, #16: URL: https://github.com/apache/spark-connect-swift/pull/16 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51489][SQL] Represent SQL Script in Spark UI [spark]
dusantism-db closed pull request #50256: [SPARK-51489][SQL] Represent SQL Script in Spark UI URL: https://github.com/apache/spark/pull/50256 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-37019][SQL] Add codegen support to array higher-order functions [spark]
chris-twiner commented on code in PR #34558: URL: https://github.com/apache/spark/pull/34558#discussion_r1994100057 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala: ## @@ -235,6 +256,53 @@ trait HigherOrderFunction extends Expression with ExpectsInputTypes { val canonicalizedChildren = cleaned.children.map(_.canonicalized) withNewChildren(canonicalizedChildren) } + + + protected def assignAtomic(atomicRef: String, value: String, isNull: String = FalseLiteral, + nullable: Boolean = false) = { +if (nullable) { + s""" +if ($isNull) { + $atomicRef.set(null); +} else { + $atomicRef.set($value); +} + """ +} else { + s"$atomicRef.set($value);" +} + } + + protected def assignArrayElement(ctx: CodegenContext, arrayName: String, elementCode: ExprCode, + elementVar: NamedLambdaVariable, index: String): String = { +val elementType = elementVar.dataType +val elementAtomic = ctx.addReferenceObj(elementVar.name, elementVar.value) +val extractElement = CodeGenerator.getValue(arrayName, elementType, index) Review Comment: Yes, that's definitely all it is, I may spend time to find a reproducible case of this but it's not in the hof test suite. It makes sense to keep the two things separate. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51505][SQL] Log empty partition number metrics in AQE coalesce [spark]
liuzqt commented on PR #50273: URL: https://github.com/apache/spark/pull/50273#issuecomment-2722992293 @maryannxue @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [WIP][SPARK-xxxx][Collation] Guard collation regex expressions behind a flag [spark]
github-actions[bot] closed pull request #49026: [WIP][SPARK-xxxx][Collation] Guard collation regex expressions behind a flag URL: https://github.com/apache/spark/pull/49026 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50404][PYTHON] PySpark DataFrame Pipe Method [spark]
github-actions[bot] commented on PR #48947: URL: https://github.com/apache/spark/pull/48947#issuecomment-2722995058 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50424][INFRA] Extract the common content of `Dockerfile` from `Docs`, `Linter`, and `SparkR` test images [spark]
github-actions[bot] closed pull request #48967: [SPARK-50424][INFRA] Extract the common content of `Dockerfile` from `Docs`, `Linter`, and `SparkR` test images URL: https://github.com/apache/spark/pull/48967 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-XXXX][Avro] Fix avro deserialization breaking for UnionType[null, Record] [spark]
github-actions[bot] closed pull request #49019: [SPARK-XXXX][Avro] Fix avro deserialization breaking for UnionType[null, Record] URL: https://github.com/apache/spark/pull/49019 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]
HeartSaVioR commented on PR #50272: URL: https://github.com/apache/spark/pull/50272#issuecomment-2723004044 It's better to retain both types of implementations: since we want close() to be "optional", we should cover both cases in testing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51505][SQL] Log empty partition number metrics in AQE coalesce [spark]
liuzqt opened a new pull request, #50273: URL: https://github.com/apache/spark/pull/50273 ### What changes were proposed in this pull request? Log empty partition number metrics in AQE coalesce ### Why are the changes needed? There are cases where the shuffle is highly skewed and many partitions are empty (probably due to a small NDV), while only a few partitions contain data and are very large. AQE coalesce here basically does nothing but eliminate the empty partitions. The AQE coalesce metrics might look confusing, and users might think it wrongly coalesced into large partitions. We'd better log the number of empty partitions in the metrics. ### Does this PR introduce _any_ user-facing change? Added `numEmptyPartitions` metrics to AQEShuffleReadExec when there is coalescing. ### How was this patch tested? New test case. ### Was this patch authored or co-authored using generative AI tooling? NO -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
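A hypothetical illustration of the skew described above (assuming a `SparkSession` named `spark` with AQE enabled): grouping a large dataset by a key with only two distinct values leaves most of the default 200 shuffle partitions empty, which is exactly what the proposed `numEmptyPartitions` metric is meant to surface.

```scala
// Hypothetical illustration only: a low-NDV grouping key leaves most
// post-shuffle partitions empty, so AQE "coalescing" mostly removes empty
// partitions rather than merging small non-empty ones.
import spark.implicits._

val skewed = spark.range(1000000).withColumn("k", ($"id" % 2).cast("int"))
skewed.groupBy("k").count().collect()
// With spark.sql.adaptive.enabled=true and 200 default shuffle partitions,
// at most two partitions carry data; the rest are empty and get folded away.
```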
Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]
HeartSaVioR commented on PR #50272: URL: https://github.com/apache/spark/pull/50272#issuecomment-2723005968 Thanks! Merging to master/4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]
HeartSaVioR commented on PR #50272: URL: https://github.com/apache/spark/pull/50272#issuecomment-2723008273 Forgot to mention: I got approval from @cloud-fan beforehand on merging this bugfix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]
HeartSaVioR closed pull request #50272: [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas URL: https://github.com/apache/spark/pull/50272 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]
attilapiros commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994459647 ## core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala: ## @@ -90,8 +90,11 @@ private[spark] class ShuffleMapStage( /** Returns the sequence of partition ids that are missing (i.e. needs to be computed). */ override def findMissingPartitions(): Seq[Int] = { -mapOutputTrackerMaster - .findMissingPartitions(shuffleDep.shuffleId) - .getOrElse(0 until numPartitions) +if (this.areAllPartitionsMissing(this.latestInfo.attemptNumber())) { Review Comment: I think it would be much better to unregister all the map and merge output at the rollback of the stages (for each rolled back stage) as it is done currently for the barrier RDDs: https://github.com/apache/spark/blob/1014c6babf57d08ba66bba3706bd350f4e9277dd/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2116 That way `areAllPartitionsMissing` and `attemptIdAllPartitionsMissing` won't be needed as `mapOutputTrackerMaster.findMissingPartitions` will give back all the partitions. And for the `ResultStage` I wonder whether an `abortStage` would be sufficient. ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala: ## @@ -167,7 +167,10 @@ abstract class BaseYarnClusterSuite extends SparkFunSuite with Matchers { extraJars: Seq[String] = Nil, extraConf: Map[String, String] = Map(), extraEnv: Map[String, String] = Map(), - outFile: Option[File] = None): SparkAppHandle.State = { + outFile: Option[File] = None, + testTimeOut: Int = 3, // minutes + timeOutIntervalCheck: Int = 1 // seconds Review Comment: Why `Int` what about `Duration`? ## resource-managers/yarn/pom.xml: ## @@ -37,6 +37,11 @@ spark-core_${scala.binary.version} ${project.version} + + org.apache.spark + spark-sql_${scala.binary.version} + ${project.version} + Review Comment: Please do not depend on `spark-sql` in the `spark-yarn`. Especially not in compile scope. I would prefer to have the test chnaged to use only RDDs. ## scalastyle-config.xml: ## @@ -94,7 +94,7 @@ This file is divided into 3 sections: - + Review Comment: Please revert this as this decision has effect to all the future source codes which requires a bigger discussion. I suggest to use: ``` // scalastyle:off argcount // scalastyle:on ``` Or case class containing both values as they are related. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51497][SQL] Add the default time formatter [spark]
dongjoon-hyun closed pull request #50266: [SPARK-51497][SQL] Add the default time formatter URL: https://github.com/apache/spark/pull/50266 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]
bogao007 commented on PR #50272: URL: https://github.com/apache/spark/pull/50272#issuecomment-2722873069 > nits: There seem to be more places that could remove the unnecessary `close()` impl in test file: e.g. > > https://github.com/apache/spark/blob/ccfc0a9dcea8fa9d6aab4dbb233f8135a3947441/python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py#L1479 Yeah, I tend to keep both in the test. Removing some of the `close()` implementations should be enough to verify the change. Do you think we should remove all of the `close()` implementations instead? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51491][PYTHON] Simplify boxplot with subquery APIs [spark]
zhengruifeng closed pull request #50258: [SPARK-51491][PYTHON] Simplify boxplot with subquery APIs URL: https://github.com/apache/spark/pull/50258 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51191][SQL] Validate default values handling in DELETE, UPDATE, MERGE [spark]
dongjoon-hyun commented on code in PR #50271: URL: https://github.com/apache/spark/pull/50271#discussion_r1994422636 ## sql/core/src/test/scala/org/apache/spark/sql/connector/MergeIntoTableSuiteBase.scala: ## @@ -32,6 +32,58 @@ abstract class MergeIntoTableSuiteBase extends RowLevelOperationSuiteBase { import testImplicits._ + test("merge into table containing added column with default value") { +withTempView("source") { + sql( +s"""CREATE TABLE $tableNameAsString ( + | pk INT NOT NULL, + | salary INT NOT NULL DEFAULT -1, + | dep STRING) + |PARTITIONED BY (dep) + |""".stripMargin) + + append("pk INT NOT NULL, dep STRING", +"""{ "pk": 1, "dep": "hr" } + |{ "pk": 2, "dep": "hr" } + |{ "pk": 3, "dep": "hr" } + |""".stripMargin) + + sql(s"ALTER TABLE $tableNameAsString ADD COLUMN txt STRING DEFAULT 'initial-text'") + + checkAnswer( +sql(s"SELECT * FROM $tableNameAsString"), +Seq( + Row(1, -1, "hr", "initial-text"), + Row(2, -1, "hr", "initial-text"), + Row(3, -1, "hr", "initial-text"))) + + val sourceRows = Seq( +(1, 100, "hr"), +(4, 400, "hr")) + sourceRows.toDF("pk", "salary", "dep").createOrReplaceTempView("source") + + sql( +s"""MERGE INTO $tableNameAsString t + |USING source s + |ON t.pk = s.pk + |WHEN MATCHED THEN + | UPDATE SET t.salary = s.salary, t.txt = DEFAULT Review Comment: Can we remove `t.txt = DEFAULT` part in this line? Or, can we have a different value instead of `DEFAULT`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51191][SQL] Validate default values handling in DELETE, UPDATE, MERGE [spark]
dongjoon-hyun commented on PR #50271: URL: https://github.com/apache/spark/pull/50271#issuecomment-2722908307 cc @huaxingao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51491][PYTHON] Simplify boxplot with subquery APIs [spark]
zhengruifeng commented on PR #50258: URL: https://github.com/apache/spark/pull/50258#issuecomment-2722879036 merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994488768 ## scalastyle-config.xml: ## @@ -94,7 +94,7 @@ This file is divided into 3 sections: - + Review Comment: Ok. will do that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994488494 ## core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala: ## @@ -90,8 +90,11 @@ private[spark] class ShuffleMapStage( /** Returns the sequence of partition ids that are missing (i.e. needs to be computed). */ override def findMissingPartitions(): Seq[Int] = { -mapOutputTrackerMaster - .findMissingPartitions(shuffleDep.shuffleId) - .getOrElse(0 until numPartitions) +if (this.areAllPartitionsMissing(this.latestInfo.attemptNumber())) { Review Comment: For the map stage, I think what you are saying sounds great. I don't know much about the barrier RDD logic, but unregistering the map outputs makes sense. For the ResultStage, I'm not sure what you mean by abort stage. That would throw an exception, right? But that is not what we should do if the FetchFailure is happening for the first partition of the ResultStage. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51191][SQL] Validate default values handling in DELETE, UPDATE, MERGE [spark]
aokolnychyi opened a new pull request, #50271: URL: https://github.com/apache/spark/pull/50271 ### What changes were proposed in this pull request? This PR adds tests for default values handling in DELETE, UPDATE, MERGE. ### Why are the changes needed? These tests are needed to ensure default values are handled correctly in all DML operations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This PR consists of tests. ### Was this patch authored or co-authored using generative AI tooling? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994537557 ## core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala: ## @@ -90,8 +90,11 @@ private[spark] class ShuffleMapStage( /** Returns the sequence of partition ids that are missing (i.e. needs to be computed). */ override def findMissingPartitions(): Seq[Int] = { -mapOutputTrackerMaster - .findMissingPartitions(shuffleDep.shuffleId) - .getOrElse(0 until numPartitions) +if (this.areAllPartitionsMissing(this.latestInfo.attemptNumber())) { Review Comment: Are you sure that would work even for a ShuffleMapStage? Because even if we unregister, what happens if a successful partition completion task arrives before the attempt ID is increased? Will it get ignored because of the unregistering? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51372][SQL] Introduce a builder pattern in TableCatalog [spark]
beliefer commented on code in PR #50137: URL: https://github.com/apache/spark/pull/50137#discussion_r1994546972 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableBuilderImpl.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.connector.catalog; + +import com.google.common.collect.Maps; +import org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException; +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException; +import org.apache.spark.sql.connector.expressions.Transform; + +import java.util.Map; + +/** + * Default implementation of {@link TableCatalog.TableBuilder}. + */ +public class TableBuilderImpl implements TableCatalog.TableBuilder { + /** Catalog where table needs to be created. */ + private final TableCatalog catalog; + /** Table identifier. */ + private final Identifier identifier; + /** Columns of the new table. */ + private final Column[] columns; + /** Table properties. */ + private final Map properties = Maps.newHashMap(); + /** Transforms to use for partitioning data in the table. */ + private Transform[] partitions = new Transform[0]; + + /** + * Constructor for TableBuilderImpl. + * + * @param catalog catalog where table needs to be created. + * @param identifier identifier for the table. + * @param columns the columns of the new table. + */ + public TableBuilderImpl(TableCatalog catalog, + Identifier identifier, + Column[] columns) { +this.catalog = catalog; +this.identifier = identifier; +this.columns = columns; + } + + @Override + public TableCatalog.TableBuilder withPartitions(Transform[] partitions) { +this.partitions = partitions; +return this; + } + + @Override + public TableCatalog.TableBuilder withProperties(Map properties) { +this.properties.putAll(properties); +return this; + } + + @Override + public Table create() throws TableAlreadyExistsException, NoSuchNamespaceException { +return catalog.createTable(identifier, columns, partitions, properties); Review Comment: I have a question. How to increase flexibility if users customize the `TableBuilder` and override the `create`. It seems there is no flexible way. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
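A hypothetical sketch of the flexibility question above (not part of the PR): a catalog could hand out its own builder subclass and override `create()`, but since the default builder keeps its state private, such an override can only wrap `super.create()` rather than drive table creation differently. The class and constructor used here are taken from the diff quoted above and may change before the PR merges.

```scala
import org.apache.spark.sql.connector.catalog.{Column, Identifier, Table, TableBuilderImpl, TableCatalog}

// Hypothetical subclass: add behaviour around creation while delegating the
// actual createTable() call to the default builder implementation.
class AuditingTableBuilder(catalog: TableCatalog, ident: Identifier, columns: Array[Column])
    extends TableBuilderImpl(catalog, ident, columns) {
  override def create(): Table = {
    println(s"creating table $ident")
    super.create()
  }
}
```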
Re: [PR] [SPARK-50806][SQL] Support InputRDDCodegen interruption on task cancellation [spark]
Ngone51 commented on PR #49501: URL: https://github.com/apache/spark/pull/49501#issuecomment-2723111433 @cloud-fan @dongjoon-hyun I have updated the PR. Could you take another look? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51507] Support creating config map that can be mounted by Spark pods for apps [spark-kubernetes-operator]
jiangzho opened a new pull request, #166: URL: https://github.com/apache/spark-kubernetes-operator/pull/166 ### What changes were proposed in this pull request? This PR introduces a new field `configMapSpecs`, which enables users to create ConfigMap(s) that can later be mounted on the driver and/or executors with values specified in SparkApplications. It also adds corresponding docs and unit tests. ### Why are the changes needed? There's a common user demand for creating lightweight value / file overrides using a ConfigMap. This new feature saves users the trouble of creating ConfigMap(s) beforehand - the ConfigMap shares the lifecycle of the corresponding Spark app. Users may mount the ConfigMap as needed in driver and executor pod templates - one example is provided in the doc. In the future we may further enhance this flow by mounting the created ConfigMap automatically to Spark pods. This can be achieved by supporting a ConfigMap volume type in Spark on Kubernetes, or by adding a mutating webhook and altering pod templates after creation. For now, this PR opens up the possibility of creating ConfigMaps and mounting them with one single spec. ### Does this PR introduce _any_ user-facing change? No - not yet released ### How was this patch tested? Pass the CIs ### Was this patch authored or co-authored using generative AI tooling? No -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]
dongjoon-hyun opened a new pull request, #17: URL: https://github.com/apache/spark-connect-swift/pull/17 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]
dongjoon-hyun commented on PR #16: URL: https://github.com/apache/spark-connect-swift/pull/16#issuecomment-2722806877 Thank you so much, @viirya ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]
dongjoon-hyun commented on code in PR #17: URL: https://github.com/apache/spark-connect-swift/pull/17#discussion_r1994560977 ## Sources/SparkConnect/DataFrame.swift: ## @@ -58,7 +58,7 @@ public actor DataFrame: Sendable { /// Add `Apache Arrow`'s `RecordBatch`s to the internal array. /// - Parameter batches: An array of ``RecordBatch``. - private func addBathes(_ batches: [RecordBatch]) { + private func addBatches(_ batches: [RecordBatch]) { Review Comment: This is a typo fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]
dongjoon-hyun commented on code in PR #17: URL: https://github.com/apache/spark-connect-swift/pull/17#discussion_r1994561915 ## Sources/SparkConnect/SparkSession.swift: ## @@ -45,12 +45,10 @@ public actor SparkSession { /// - userID: an optional user ID. If absent, `SPARK_USER` environment or ``ProcessInfo.processInfo.userName`` is used. init(_ connection: String, _ userID: String? = nil) { let processInfo = ProcessInfo.processInfo -#if os(iOS) || os(watchOS) || os(tvOS) -let userName = processInfo.environment["SPARK_USER"] ?? "" -#elseif os(macOS) || os(Linux) +#if os(macOS) || os(Linux) Review Comment: I simplified the implementation in a new way to cover `os(visionOS)` too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]
dongjoon-hyun commented on code in PR #17: URL: https://github.com/apache/spark-connect-swift/pull/17#discussion_r1994561481 ## Sources/SparkConnect/SparkConnectClient.swift: ## @@ -275,9 +275,11 @@ public actor SparkConnectClient { let expressions: [Spark_Connect_Expression.SortOrder] = cols.map { var expression = Spark_Connect_Expression.SortOrder() expression.child.exprType = .unresolvedAttribute($0.toUnresolvedAttribute) + expression.direction = .ascending return expression } sort.order = expressions +sort.isGlobal = true Review Comment: I piggybacked this fix while checking test coverage. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]
dongjoon-hyun commented on code in PR #17: URL: https://github.com/apache/spark-connect-swift/pull/17#discussion_r1994569044 ## Tests/SparkConnectTests/DataFrameTests.swift: ## @@ -125,19 +125,25 @@ struct DataFrameTests { await spark.stop() } +#if !os(Linux) Review Comment: - Like `show()`, `collect()` currently has a binary compatibility issue on `os(Linux)`. - On MacOS, all tests pass. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51443][SS] Fix singleVariantColumn in DSv2 and readStream. [spark]
cloud-fan commented on PR #50217: URL: https://github.com/apache/spark/pull/50217#issuecomment-2721276493 thanks, merging to master/4.0! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51487][PYTHON][INFRA] Refresh testing images for pyarrow 19 [spark]
dongjoon-hyun commented on PR #50255: URL: https://github.com/apache/spark/pull/50255#issuecomment-2722887996 Given the assessment, I'm okay to merge this. Feel free to merge, @zhengruifeng . cc @cloud-fan as the release manager of Apache Spark 4.0.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]
attilapiros commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994377379 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala: ## @@ -167,7 +167,10 @@ abstract class BaseYarnClusterSuite extends SparkFunSuite with Matchers { extraJars: Seq[String] = Nil, extraConf: Map[String, String] = Map(), extraEnv: Map[String, String] = Map(), - outFile: Option[File] = None): SparkAppHandle.State = { + outFile: Option[File] = None, + testTimeOut: Int = 3, // minutes + timeOutIntervalCheck: Int = 1 // seconds Review Comment: Why `Int` and not `Duration`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
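A possible shape for that suggestion, with hypothetical parameter names rather than the PR's actual code, carries the unit in the type instead of in a trailing comment:

    import java.io.File
    import scala.concurrent.duration._

    // Hypothetical sketch: durations instead of bare Ints, so callers write 3.minutes / 1.second.
    def runSpark(
        outFile: Option[File] = None,
        testTimeout: FiniteDuration = 3.minutes,
        checkInterval: FiniteDuration = 1.second): Boolean = {
      val deadline = testTimeout.fromNow
      var finished = false
      while (!finished && !deadline.isOverdue()) {
        // poll the launched application's state here and set `finished` when it completes
        Thread.sleep(checkInterval.toMillis)
      }
      finished
    }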
[PR] [SPARK-51506][PYTHON][SS] Do not enforce users to implement close() in TransformWithStateInPandas [spark]
bogao007 opened a new pull request, #50272: URL: https://github.com/apache/spark/pull/50272 ### What changes were proposed in this pull request? Do not require users to implement `close()` in TransformWithStateInPandas, since `close()` is an optional function. ### Why are the changes needed? An optional function should not be mandatory for users to implement. This also aligns with Scala TWS. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? Updated the existing unit tests by not implementing the `close()` function in several stateful processors. ### Was this patch authored or co-authored using generative AI tooling? No -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
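The pattern being aligned with can be sketched in Scala roughly as follows (an illustrative base class with simplified names, not the exact Spark API): required methods stay abstract while close() ships a no-op default, so a user processor compiles without mentioning it.

    // Illustrative sketch only; names are simplified and do not match Spark's API exactly.
    abstract class SketchStatefulProcessor[I, O] {
      def init(): Unit                                      // must be implemented
      def handleInputRows(rows: Iterator[I]): Iterator[O]   // must be implemented
      def close(): Unit = {}                                 // optional: default no-op
    }

    // A user processor that never overrides close() is still complete.
    class CountingProcessor extends SketchStatefulProcessor[String, (String, Int)] {
      private var count = 0
      override def init(): Unit = { count = 0 }
      override def handleInputRows(rows: Iterator[String]): Iterator[(String, Int)] =
        rows.map { r => count += 1; (r, count) }
    }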
Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994490858 ## resource-managers/yarn/pom.xml: ## @@ -37,6 +37,11 @@ <artifactId>spark-core_${scala.binary.version}</artifactId> <version>${project.version}</version> </dependency> + <dependency> + <groupId>org.apache.spark</groupId> + <artifactId>spark-sql_${scala.binary.version}</artifactId> + <version>${project.version}</version> + </dependency> Review Comment: For now I will add it to test scope. Then let me try to reproduce it using just RDDs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
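One way such an RDD-only reproduction could look (an assumption about the approach, not the PR's actual test) relies on the fact that repartition() uses round-robin partitioning, so its map output is indeterminate, which is the case the scheduler fix targets when a stage must be retried:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: build a result that depends on an indeterminate shuffle.
    val sc = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("indeterminate-sketch"))
    val counts = sc.parallelize(1 to 1000, 10)
      .map(i => (i % 7, i))
      .repartition(5)          // round-robin repartition => indeterminate map output
      .reduceByKey(_ + _)
    counts.collect()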
Re: [PR] [SPARK-51272][CORE]. Fix for the race condition in Scheduler causing failure in retrying all partitions in case of indeterministic shuffle keys [spark]
ahshahid commented on code in PR #50033: URL: https://github.com/apache/spark/pull/50033#discussion_r1994509791 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala: ## @@ -167,7 +167,10 @@ abstract class BaseYarnClusterSuite extends SparkFunSuite with Matchers { extraJars: Seq[String] = Nil, extraConf: Map[String, String] = Map(), extraEnv: Map[String, String] = Map(), - outFile: Option[File] = None): SparkAppHandle.State = { + outFile: Option[File] = None, + testTimeOut: Int = 3, // minutes + timeOutIntervalCheck: Int = 1 // seconds Review Comment: Did not think about it. Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51508] Support `collect(): [[String]]` for `DataFrame` [spark-connect-swift]
dongjoon-hyun commented on PR #17: URL: https://github.com/apache/spark-connect-swift/pull/17#issuecomment-2723173515 Could you review this, @viirya? I added the first implementation of the `collect()` API, both for users and as an easy testing method. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` [spark-connect-swift]
dongjoon-hyun closed pull request #16: [SPARK-51504] Support `select/limit/sort/orderBy/isEmpty` for `DataFrame` URL: https://github.com/apache/spark-connect-swift/pull/16 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]
cloud-fan commented on PR #49471: URL: https://github.com/apache/spark/pull/49471#issuecomment-2723198248 thanks, merging to master/4.0! (This is the last piece of the SQL UDF feature) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions [spark]
cloud-fan closed pull request #49471: [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions URL: https://github.com/apache/spark/pull/49471 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51466][SQL][HIVE] Eliminate Hive built-in UDFs initialization on Hive UDF evaluation [spark]
pan3793 commented on PR #50232: URL: https://github.com/apache/spark/pull/50232#issuecomment-2720217305 > Hey, why are any manual tests listed in the PR desc not included in UTs? Given that we have excluded `hive-llap-*` deps from STS modules, the existing STS SQL tests should cover all my manual test cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]
beliefer commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2720399615 At that time, https://issues.apache.org/jira/browse/SPARK-46895 was an improvement, not a bug fix. Do you mean the JVM will not exit even if branch-3.5 uses `Timer`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]
beliefer commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2720555409 cc @srowen -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51450][CORE] BarrierCoordinator thread not exiting in Spark standalone mode [spark]
jjayadeep06 commented on PR #50223: URL: https://github.com/apache/spark/pull/50223#issuecomment-2720538216 > At that time, https://issues.apache.org/jira/browse/SPARK-46895 was an improvement, not a bug fix. Do you mean the JVM will not exit even if branch-3.5 uses `Timer`? Yes, and if you look at the changes, it is fixing the existing `Timer` implementation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
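The behavior under discussion can be illustrated with a small Scala sketch (assumed names, not the PR's code): java.util.Timer() starts a non-daemon worker thread by default, so the JVM cannot exit until cancel() is called, whereas a scheduler backed by a daemon thread does not block shutdown:

    import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}

    object TimerShutdownSketch {
      def main(args: Array[String]): Unit = {
        // A plain `new java.util.Timer()` would keep the JVM alive here unless cancel()
        // is called, because its worker thread is non-daemon by default.
        val scheduler: ScheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
          new ThreadFactory {
            override def newThread(r: Runnable): Thread = {
              val t = new Thread(r, "barrier-coordinator-timer")
              t.setDaemon(true) // daemon thread: does not prevent JVM exit
              t
            }
          })
        scheduler.schedule(new Runnable { override def run(): Unit = println("tick") }, 1, TimeUnit.SECONDS)
        // main() returns; with a daemon thread the JVM can exit even though a task is still scheduled.
      }
    }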
[PR] [WIP][SQL] Add the default time formatter [spark]
MaxGekk opened a new pull request, #50266: URL: https://github.com/apache/spark/pull/50266 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51402][SQL][TESTS] Test TimeType in UDF [spark]
MaxGekk closed pull request #50194: [SPARK-51402][SQL][TESTS] Test TimeType in UDF URL: https://github.com/apache/spark/pull/50194 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51402][SQL][TESTS] Test TimeType in UDF [spark]
MaxGekk commented on PR #50194: URL: https://github.com/apache/spark/pull/50194#issuecomment-2720630196 @calilisantos Congratulations on your first contribution to Apache Spark! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org