Wenchen, what I'm suggesting is a bit of both of your proposals. I think that USING should be optional, like your first option. USING (or format(...) on the DataFrame side) should configure the source or implementation, while the catalog should be part of the table identifier. They serve two different purposes: configuring the storage within the catalog, and choosing which catalog create and other calls are passed to. I think that's pretty much what you suggest in #1. The USING syntax would continue to be used to configure storage within a catalog.
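To make the separation concrete, here's a minimal sketch of the two concerns (the class and function names are mine, not Spark's actual internals): the table identifier picks the catalog, while the USING/format(...) provider only configures storage within it.

```python
# Hypothetical sketch, not Spark's real classes: identifier -> catalog,
# USING clause -> storage provider within that catalog.
from typing import NamedTuple, Optional

class TableIdentifier(NamedTuple):
    catalog: Optional[str]  # explicit catalog, or None for the default
    database: str
    table: str

def resolve_catalog(ident: TableIdentifier, default_catalog: str) -> str:
    """Pick the catalog that create/drop/alter calls are routed to."""
    return ident.catalog if ident.catalog is not None else default_catalog

# e.g. CREATE TABLE prod.db.t (...) USING parquet
ident = TableIdentifier("prod", "db", "t")
provider = "parquet"  # optional USING clause: storage within the catalog

print(resolve_catalog(ident, "spark_catalog"))  # prod
print(resolve_catalog(TableIdentifier(None, "db", "t"), "spark_catalog"))  # spark_catalog
```

Note that `provider` plays no part in choosing the catalog; it is only passed along for the chosen catalog to interpret.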
(Side note: I don't think this needs to be tied to a particular implementation. We currently use 'parquet' to tell the Spark catalog to use the Parquet source, but another catalog could also use 'parquet' to store data in Parquet format without using Spark's built-in source.)

The second option suggests separating the catalog API from the data source API. In #21306 <https://github.com/apache/spark/pull/21306>, I added the proposed catalog API and a reflection-based loader like the one the v1 sources use (and v2 sources have used so far). I think it makes much more sense to start with a catalog and then get the data source for operations like CTAS. This is compatible with the behavior from your point #1: the catalog chooses the source implementation, and USING is optional.

The reason we considered an API to get a catalog from the source is that we defined the source API first, but it doesn't make sense to get a catalog from the data source. Catalogs can share data sources (e.g. prod and test environments). Plus, it makes more sense to determine the catalog first and then have it return the source implementation, because it may require a specific one, as JDBC or Iceberg would. With standard logical plans we always know the catalog when creating the plan: either the table identifier includes an explicit one, or the default catalog is used.

In the PR I mentioned above, the catalog implementation's class is determined by Spark config properties, so there's no need to use ServiceLoader, and we can use the same implementation class for multiple catalogs with different configs (e.g. prod and test environments).

Your last point about path-based tables deserves some attention, and we also need to define their behavior. Part of what we want to preserve is flexibility, like how you don't need to alter the schema of JSON tables: you just write different data. For the path-based syntax, I suggest looking up the source first and using it if there is one. If not, then look up the catalog. That way existing tables work, but we can migrate to catalogs with names that don't conflict.

rb
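The config-driven loading described above could be sketched roughly like this (property names and classes are illustrative, loosely modeled on the proposal, not Spark's real configuration): each spark.sql.catalog.<name> entry names an implementation class, and nested properties become that catalog's options, so one class can back both a prod and a test catalog.

```python
# Hypothetical sketch of config-driven catalog loading. The property
# layout, class names, and registry below are illustrative only.
class JdbcCatalog:
    def __init__(self, name, options):
        self.name = name
        self.url = options.get("url")

# Stand-in for loading the class by reflection from its name.
CATALOG_IMPLS = {"com.example.JdbcCatalog": JdbcCatalog}

def load_catalogs(conf: dict) -> dict:
    """Build one catalog instance per spark.sql.catalog.<name> entry."""
    catalogs = {}
    for key, cls_name in conf.items():
        parts = key.split(".")
        if key.startswith("spark.sql.catalog.") and len(parts) == 4:
            name = parts[3]
            # Properties nested under the catalog key become its options.
            options = {
                k[len(key) + 1:]: v
                for k, v in conf.items() if k.startswith(key + ".")
            }
            catalogs[name] = CATALOG_IMPLS[cls_name](name, options)
    return catalogs

conf = {
    "spark.sql.catalog.prod": "com.example.JdbcCatalog",
    "spark.sql.catalog.prod.url": "jdbc:postgresql://prod/db",
    "spark.sql.catalog.test": "com.example.JdbcCatalog",
    "spark.sql.catalog.test.url": "jdbc:postgresql://test/db",
}
catalogs = load_catalogs(conf)
```

Because the class comes from configuration rather than ServiceLoader, the same JdbcCatalog class serves both catalogs with different connection options.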
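The suggested resolution order for the path-based syntax could be sketched as follows (the registries and names here are stand-ins, not real Spark lookups): try a registered source first so existing tables keep working, and only fall back to a catalog when no source matches.

```python
# Hypothetical sketch of source-first resolution for path-based tables.
SOURCES = {"parquet": "ParquetSource", "json": "JsonSource"}
CATALOGS = {"iceberg": "IcebergCatalog"}

def resolve(name: str):
    if name in SOURCES:      # existing behavior: the source wins
        return ("source", SOURCES[name])
    if name in CATALOGS:     # new: catalogs with non-conflicting names
        return ("catalog", CATALOGS[name])
    raise ValueError(f"unknown source or catalog: {name}")

print(resolve("json"))     # existing path-based tables keep working
print(resolve("iceberg"))  # a new catalog name that doesn't conflict
```

Since sources shadow catalogs under this order, a catalog only becomes reachable through the path-based syntax when its name doesn't collide with an existing source, which is exactly the migration property described above.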