[
https://issues.apache.org/jira/browse/SPARK-15777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543329#comment-15543329
]
Yan commented on SPARK-15777:
-----------------------------
1) Currently the rules are applied on a per-session basis. You are right that,
ideally, they should be applied on a per-query basis, and we can modify the
design/implementation in that direction. Regarding evaluation ordering, item 5)
of "Scopes, Limitations and Open Questions" covers this topic. In short, there
is an ordering between the built-in rules and the custom rules, but not among
the custom rules. The plugin mechanism is meant for cooperative behavior, so
plugged-in rules are expected to apply only to the parts of a plan that involve
their own data sources, probably after some plan rewriting. Once the overall
ideas are accepted by the community, we will flesh out the design doc and post
the implementation in a WIP fashion.
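To illustrate the ordering point, here is a minimal sketch in plain Python, not
the actual Spark implementation. The names Scan, for_source and optimize are
hypothetical, and running the built-in rules first is an assumption made only
for the example; the design only says that an ordering exists between the two
groups. Each custom rule is guarded so that it touches only the plan nodes of
its own data source, which is why the relative order of custom rules should
not matter.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Scan:
    """Hypothetical leaf plan node: a scan of one table from one data source."""
    source: str
    table: str
    pushed_by: str = ""


Rule = Callable[[Scan], Scan]


def builtin_noop(node: Scan) -> Scan:
    """Stand-in for a built-in rule; it leaves the node unchanged."""
    return node


def for_source(source: str, rule: Rule) -> Rule:
    """Wrap a custom rule so it only rewrites nodes of its own data source."""
    def guarded(node: Scan) -> Scan:
        return rule(node) if node.source == source else node
    return guarded


def optimize(plan: List[Scan], builtin: List[Rule], custom: List[Rule]) -> List[Scan]:
    # Assumption for this sketch: all built-in rules run before any custom rule.
    for rule in list(builtin) + list(custom):
        plan = [rule(node) for node in plan]
    return plan


# Two hypothetical plug-ins, each marking the scans it has handled.
hbase_rule = for_source("hbase", lambda n: replace(n, pushed_by="hbase-plugin"))
jdbc_rule = for_source("jdbc", lambda n: replace(n, pushed_by="jdbc-plugin"))

plan = [Scan("hbase", "t1"), Scan("jdbc", "t2")]
first = optimize(plan, [builtin_noop], [hbase_rule, jdbc_rule])
second = optimize(plan, [builtin_noop], [jdbc_rule, hbase_rule])
```

In this sketch, first and second are equal because the guarded rules never
rewrite each other's nodes. A rule that does not restrict itself to its own
data source would lose that property.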
2) As mentioned in the doc, this is not a complete design. Hopefully it lays
down some basic concepts and principles so that future work can build on top of
it. For instance, a persistent catalog could itself be another major feature,
but it is left out of the scope of this design for now; leaving it out does not
affect the primary functionality.
3) The 3-level table identifier currently serves as a namespace. Yes, join
queries against two tables that share the same database and table names but
have different catalog names work well. Arbitrary levels of namespaces are not
supported yet.
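As a rough illustration of how a 3-level identifier keeps such tables apart,
here is a small sketch in plain Python rather than Spark code. TableIdentifier
and parse_identifier are hypothetical names, and the fallback values
"default_catalog" and "default" for omitted parts are assumptions made only
for this example.

```python
from typing import NamedTuple


class TableIdentifier(NamedTuple):
    """Hypothetical fully qualified table name: catalog.database.table."""
    catalog: str
    database: str
    table: str


def parse_identifier(
    name: str,
    current_catalog: str = "default_catalog",  # assumed session default
    current_database: str = "default",         # assumed session default
) -> TableIdentifier:
    parts = name.split(".")
    # Only one to three levels are accepted; deeper namespaces are rejected.
    if not 1 <= len(parts) <= 3 or any(not p for p in parts):
        raise ValueError(f"expected [catalog.][database.]table, got {name!r}")
    # Fill in the missing leading parts from the session defaults.
    defaults = [current_catalog, current_database]
    return TableIdentifier(*(defaults[: 3 - len(parts)] + parts))


# Same database and table names, different catalogs: two distinct tables.
left = parse_identifier("hive1.sales.orders")
right = parse_identifier("hive2.sales.orders")
```

Here left and right compare as different identifiers, so a join between them
refers to two separate tables, while a name with four or more parts raises
ValueError.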
Thanks for your comments.
> Catalog federation
> ------------------
>
> Key: SPARK-15777
> URL: https://issues.apache.org/jira/browse/SPARK-15777
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Reynold Xin
> Attachments: SparkFederationDesign.pdf
>
>
> This is a ticket to track progress to support federating multiple external
> catalogs. This would require establishing an API (similar to the current
> ExternalCatalog API) for getting information about external catalogs, and
> ability to convert a table into a data source table.
> As part of this, we would also need to be able to support more than a
> two-level table identifier (database.table). At the very least we would need
> a three-level identifier for tables (catalog.database.table). A possible
> direction is to support arbitrary level hierarchical namespaces similar to
> file systems.
> Once we have this implemented, we can convert the current Hive catalog
> implementation into an external catalog that is "mounted" into an internal
> catalog.