szehon-ho commented on code in PR #50137:
URL: https://github.com/apache/spark/pull/50137#discussion_r1978090554


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateTableExec.scala:
##########
@@ -43,7 +43,10 @@ case class CreateTableExec(
   override protected def run(): Seq[InternalRow] = {
     if (!catalog.tableExists(identifier)) {
       try {
-        catalog.createTable(identifier, columns, partitioning.toArray, tableProperties.asJava)
+        catalog.buildTable(identifier, columns)

Review Comment:
   I think Spark uses 2-space indents.



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala:
##########
@@ -294,6 +295,51 @@ class InMemoryTableCatalog extends BasicInMemoryTableCatalog with SupportsNamesp
   }
 
   case class Result(readSchema: StructType, rows: Array[InternalRow]) extends LocalScan
+
+  override def buildTable(ident: Identifier, columns: Array[Column]): TableBuilder = {
+    new BasicInMemoryTableBuilder(ident, columns, namespaces, tables)
+  }
+
+  private class BasicInMemoryTableBuilder(
+      val ident: Identifier,

Review Comment:
   We can remove 'val' from these constructor parameters.



##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##########
@@ -311,4 +311,49 @@ default boolean purgeTable(Identifier ident) throws UnsupportedOperationExceptio
    */
   void renameTable(Identifier oldIdent, Identifier newIdent)
       throws NoSuchTableException, TableAlreadyExistsException;
+
+  /**
+   * Instantiate a builder to create a table in the catalog.
+   *
+   * @param ident  a table identifier.
+   * @param columns the columns of the new table.
+   * @return the TableBuilder to create a table.
+   */
+  default TableBuilder buildTable(Identifier ident, Column[] columns) {
+    return new TableBuilderImpl(this, ident, columns);
+  }
+
+  /**
+   * Builder used to create tables.
+   *
+   * <p>Call {@link #buildTable(Identifier, Column[])} to create a new builder.
+   */
+  interface TableBuilder {
+    /**
+     * Sets the partitions for the table.
+     *
+     * @param partitions Partitions for the table.
+     * @return this for method chaining
+     */
+    TableBuilder withPartitions(Transform[] partitions);
+
+    /**
+     * Adds key/value properties to the table.
+     *
+     * @param properties key/value properties
+     * @return this for method chaining
+     */
+    TableBuilder withProperties(Map<String, String> properties);

Review Comment:
   Based on how the other method is defined, 'withProperties' sounds like it should set/replace the existing map rather than add to it.
   
   Elsewhere in Spark, builder methods for conf take a single key/value pair (e.g. SparkSession.Builder.config(key, value)). If the goal is an additive API, I wonder if we should follow that pattern; otherwise it seems wasteful, since callers have to build a temporary Java Map just to call this.
   
   Maybe we can support both APIs?
   
   Just my observation.
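   
   A sketch of the caller's view under both variants ('withProperty' is a hypothetical name here, and 'catalog'/'ident'/'columns' are assumed to be in scope):
   
   ```scala
   // Bulk-only: a throwaway map is needed even for a single property.
   catalog.buildTable(ident, columns)
     .withProperties(java.util.Collections.singletonMap("provider", "parquet"))
   
   // Additive single-pair variant, mirroring SparkSession.Builder.config(key, value).
   catalog.buildTable(ident, columns)
     .withProperty("provider", "parquet")
   ```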



##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##########
@@ -311,4 +311,49 @@ default boolean purgeTable(Identifier ident) throws UnsupportedOperationExceptio
    */
   void renameTable(Identifier oldIdent, Identifier newIdent)
       throws NoSuchTableException, TableAlreadyExistsException;
+
+  /**
+   * Instantiate a builder to create a table in the catalog.
+   *
+   * @param ident  a table identifier.

Review Comment:
   Nit: extra space



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala:
##########
@@ -294,6 +295,51 @@ class InMemoryTableCatalog extends BasicInMemoryTableCatalog with SupportsNamesp
   }
 
   case class Result(readSchema: StructType, rows: Array[InternalRow]) extends LocalScan
+
+  override def buildTable(ident: Identifier, columns: Array[Column]): TableBuilder = {
+    new BasicInMemoryTableBuilder(ident, columns, namespaces, tables)
+  }
+
+  private class BasicInMemoryTableBuilder(
+      val ident: Identifier,
+      val columns: Array[Column],
+      val namespaces: util.Map[List[String], Map[String, String]],
+      val tables: util.Map[Identifier, Table])

Review Comment:
   I think we don't need this?
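   
   For example, since BasicInMemoryTableBuilder is an inner class, it can presumably reach the enclosing catalog's 'namespaces' and 'tables' via InMemoryTableCatalog.this, so the call could shrink to something like (just a sketch):
   
   ```scala
   override def buildTable(ident: Identifier, columns: Array[Column]): TableBuilder = {
     // No need to hand the catalog's own state to an inner class.
     new BasicInMemoryTableBuilder(ident, columns)
   }
   ```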



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala:
##########
@@ -294,6 +295,51 @@ class InMemoryTableCatalog extends BasicInMemoryTableCatalog with SupportsNamesp
   }
 
   case class Result(readSchema: StructType, rows: Array[InternalRow]) extends LocalScan
+
+  override def buildTable(ident: Identifier, columns: Array[Column]): TableBuilder = {
+    new BasicInMemoryTableBuilder(ident, columns, namespaces, tables)
+  }
+
+  private class BasicInMemoryTableBuilder(
+      val ident: Identifier,
+      val columns: Array[Column],
+      val namespaces: util.Map[List[String], Map[String, String]],
+      val tables: util.Map[Identifier, Table])
+    extends TableBuilder {
+    import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+
+    private val properties: util.Map[String, String] = new util.HashMap();

Review Comment:
   I think we can just define these defaults in the class definition.
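   
   For example, something like (a sketch of one reading of this; the 'partitions' field is an assumption based on 'withPartitions' above):
   
   ```scala
   // Inside the class body: declare the mutable state with its defaults inline,
   // so the with* setters only need to overwrite these and return `this`.
   private var partitions: Array[Transform] = Array.empty
   private val properties: util.Map[String, String] = new util.HashMap()
   ```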



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

