Thanks Attila for starting the hive-iceberg branch/tag discussion.
In HIVE-27233, we first introduced the branch/tag syntax in Hive, modeled on the Spark-Iceberg branch/tag syntax. Spark-Iceberg uses ALTER syntax to express branch/tag operations, and I think most users and engines are used to this syntax. So following the Spark-Iceberg syntax is important for users who work with multiple engines (Spark & Hive, or others).

But what you said is also reasonable. Sometimes CREATE & DROP syntax is more straightforward for new users. I have also seen the Dremio-Iceberg doc, which shows that Dremio uses CREATE & DROP syntax instead of ALTER. For example:

    CREATE BRANCH [ IF NOT EXISTS ] <branch_name> [ { FROM | AT } { REF[ERENCE] | BRANCH | TAG | COMMIT } <reference_name> ] [ IN <catalog_name> ]

But this Dremio CREATE syntax reads more like a dialect than the Spark-Iceberg ALTER syntax, since we subconsciously treat the Spark-Iceberg syntax as the correct, official one.

IMO, I am not against implementing the new branch/tag syntax (CREATE & DROP), as long as there is strong demand from community users. But the new syntax would be a Hive-specific dialect, which other engines (Spark, Trino, etc.) will not accept.

I would like to hear opinions from other folks. :)

Thanks,
Butao Zhang

---- Replied Message ----
From: Attila Turoczy<aturo...@cloudera.com.INVALID>
Date: 11/6/2024 21:51
To: dev<dev@hive.apache.org>
Subject: Iceberg branching tagging syntax

Dear Hive community,

I would like to hear your feedback about some syntax sugar for Iceberg branching. If you are not aware of this cool feature, please read the following blog post.

Currently, Hive implements branching as per the official recommendation, which is fine, but something about the syntax feels out of place in modern SQL. Today, any new functionality tends to be added under the ALTER TABLE umbrella. However, ALTER TABLE is increasingly overloaded with diverse functionality across different engines, which makes it less intuitive.
(To me, ALTER TABLE is kind of the "etc." of SQL.) Many DBAs I've worked with are comfortable with commands like CREATE, SELECT, and INSERT, but when it comes to ALTER, things often get more complex and everybody starts googling. This is particularly true now with the introduction of Iceberg branching and tagging, which are some of the most exciting developments since somebody invented Spotify! :) But from a usability perspective, this syntax is hard to remember and use. In customer demos, I've been asked why the syntax is so complicated. In my view, these key features deserve dedicated verbs that make them distinct and straightforward.

As a proposal, I'd suggest introducing new syntax specifically for branching and tagging. This wouldn't replace the current approach but would be an alternative that improves clarity and ease of use.

Create branch:
    CREATE BRANCH audit_branch FROM audit;

Create branch from a snapshot:
    CREATE BRANCH audit_branch FROM audit AS OF VERSION 1234;
    (Maybe the FROM here could be AT <CommitID>.)

Create tag:
    CREATE TAG historical_tag FROM audit;
    (AS OF works the same way as for branches.)

Drop branch:
    DROP BRANCH audit_branch;

Drop tag:
    DROP TAG audit_branch;

Your opinion is very important to us. It will help determine whether this is primarily a usability concern for a handful of EU customers, or whether it would be better overall to stick with the classic ALTER approach.

-Attila