Hi,

A friendly Trino maintainer shared an informative survey about branching with me. I'm sharing it here, as it should be helpful when we add or update branching syntax.
https://docs.google.com/document/d/1jEF4IkWu-2Gzk5ii2Nb0exuEnAUeo98UbiM3i0xtgWQ/edit?tab=t.0
https://github.com/trinodb/trino/pull/25152

Best,
Okumin

On Mon, Nov 25, 2024 at 11:27 PM Okumin <m...@okumin.com> wrote:
> Hi Attila,
>
> Adding new commands could be an option. Honestly, I don't often use BRANCH
> or TAG, and I don't have a strong opinion on either approach.
>
> Off topic: if I understand correctly, Dremio's branching semantics are
> different from what Hive provides. Dremio supports versioning not per table
> but per entire namespace. I wonder if Apache Hive has a plan to support
> those semantics.
>
> Regards,
> Okumin
>
> On Mon, Nov 11, 2024 at 4:11 PM Butao Zhang <butaozha...@163.com> wrote:
>> Thanks, Attila, for starting the Hive-Iceberg branch/tag discussion.
>>
>> In HIVE-27233 <https://issues.apache.org/jira/browse/HIVE-27233>, we
>> first introduced the branch/tag syntax in Hive, modeled on the
>> Spark-Iceberg branch/tag syntax. Spark-Iceberg uses the ALTER
>> <https://iceberg.apache.org/docs/1.7.0/branching/#historical-tags> syntax
>> to express branch/tag operations, and I think most users and engines are
>> used to this syntax. So following the Spark-Iceberg syntax is important for
>> users who work with multiple engines (Spark and Hive, or others).
>>
>> But what you said is also reasonable. Sometimes CREATE and DROP syntax is
>> more straightforward for new users. I have also seen the Dremio-Iceberg
>> doc <https://docs.dremio.com/cloud/reference/sql/commands/create-branch>,
>> which shows that Dremio uses CREATE and DROP syntax instead of ALTER. For
>> example:
>>
>>   CREATE BRANCH [ IF NOT EXISTS ] <branch_name>
>>     [ { FROM | AT } { REF[ERENCE] | BRANCH | TAG | COMMIT } <reference_name> ]
>>     [ IN <catalog_name> ]
>>
>> But this Dremio CREATE syntax reads more like a dialect than the
>> Spark-Iceberg ALTER syntax, as we subconsciously think of the Spark-Iceberg
>> syntax as the right, official one.
>>
>> IMO, I am not against implementing the new branch/tag syntax
>> (CREATE and DROP), as long as there is strong demand from community
>> users. But the new syntax will be a Hive-style dialect, which other
>> engines (Spark, Trino, etc.) will not accept.
>>
>> I would like to hear opinions from other folks. :)
>>
>> Thanks,
>> Butao Zhang
>> ---- Replied Message ----
>> From: Attila Turoczy <aturo...@cloudera.com.INVALID>
>> Date: 11/6/2024 21:51
>> To: dev <dev@hive.apache.org>
>> Subject: Iceberg branching tagging syntax
>> Dear Hive community,
>>
>> I would like to hear your feedback about some syntax sugar for Iceberg
>> branching. If you are not aware of this cool feature, please read the
>> following blog post
>> <https://medium.com/@ayushtkn/apache-hive-4-x-with-iceberg-branches-tags-3d52293ac0bf>.
>>
>> Currently, Hive implements branching as per the official
>> <https://iceberg.apache.org/docs/1.6.1/branching/> recommendation, which
>> is fine, but there's something about the syntax that feels out of place in
>> modern SQL. Today, almost any new functionality tends to be added
>> under the ALTER TABLE umbrella. However, ALTER TABLE is increasingly
>> overloaded with diverse functionality across different engines, making it
>> less intuitive. (To me, ALTER TABLE is kind of the "etc." of SQL.)
>>
>> Many DBAs I've worked with are comfortable with commands like CREATE,
>> SELECT, and INSERT, but when it comes to ALTER, things often get more
>> complex and everybody starts googling.
>>
>> This is particularly true now with the introduction of the Iceberg branching
>> and tagging features, which are some of the most exciting developments
>> since somebody invented Spotify! :) But from a usability perspective, this
>> syntax is hard to remember and use.
>>
>> In customer demos, I've been asked why the syntax is so complicated.
>> In my view, these key features deserve dedicated verbs, making them
>> distinct and straightforward.
>>
>> As a proposal, I'd suggest introducing new syntax options specifically
>> for branching and tagging. *This wouldn't replace the current approach
>> but could be an alternative that enhances clarity and ease of use.*
>>
>> Create branch:
>>
>>   CREATE BRANCH audit_branch FROM audit;
>>
>> From snapshot:
>>
>>   CREATE BRANCH audit_branch FROM audit AS OF VERSION 1234; **
>>
>> ** Maybe the FROM here could be *AT <CommitID>*
>>
>> Create tag:
>>
>>   CREATE TAG historical_tag FROM audit;
>>
>> The same AS OF form applies here as well.
>>
>> Drop branch:
>>
>>   DROP BRANCH audit_branch;
>>
>> Drop tag:
>>
>>   DROP TAG historical_tag;
>>
>> Your opinion is very important to us, as it helps determine whether this
>> is primarily a usability concern for a handful of EU customers, or whether
>> it would be better overall to stick with the classic ALTER approach.
>> -Attila
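For comparison, here is a sketch of the Spark-Iceberg ALTER TABLE statements that roughly correspond to each command in the proposal above. It reuses the proposal's names (`audit`, `audit_branch`, `historical_tag`, snapshot `1234`) as placeholders, assumes Spark with the Iceberg SQL extensions enabled against an Iceberg catalog, and omits the optional RETAIN clauses, so default retention applies. The two branch-creation statements are alternatives, not steps to run in sequence.

```sql
-- Sketch: Spark-Iceberg equivalents of the proposed CREATE/DROP commands.
-- Assumes Iceberg Spark SQL extensions and an Iceberg table named `audit`.

-- Proposed: CREATE BRANCH audit_branch FROM audit;
ALTER TABLE audit CREATE BRANCH audit_branch;

-- Proposed: CREATE BRANCH audit_branch FROM audit AS OF VERSION 1234;
-- (alternative to the statement above; 1234 is a placeholder snapshot ID)
-- ALTER TABLE audit CREATE BRANCH audit_branch AS OF VERSION 1234;

-- Proposed: CREATE TAG historical_tag FROM audit;
ALTER TABLE audit CREATE TAG historical_tag;

-- Proposed: DROP BRANCH audit_branch;
ALTER TABLE audit DROP BRANCH audit_branch;

-- Proposed: DROP TAG historical_tag;
ALTER TABLE audit DROP TAG historical_tag;
```

Placed side by side, the main difference is where the table name appears: the ALTER form scopes the ref to the table up front, while the proposed form names the ref first and the table after FROM.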