Hi,

A friendly Trino maintainer shared an informative survey about branching
with me. I'm sharing it here, as it should be helpful when we add or update
branching syntax.

https://docs.google.com/document/d/1jEF4IkWu-2Gzk5ii2Nb0exuEnAUeo98UbiM3i0xtgWQ/edit?tab=t.0
https://github.com/trinodb/trino/pull/25152

Best,
Okumin

On Mon, Nov 25, 2024 at 11:27 PM Okumin <m...@okumin.com> wrote:

> Hi Attila,
>
> Adding new commands could be an option. Honestly, I don't often use BRANCH
> or TAG, and I don't have a strong opinion on either approach.
>
> Off topic. If I understand correctly, Dremio's branching semantics are
> different from what Hive provides. Dremio supports versioning not per table
> but per entire namespace. I wonder if Apache Hive has a plan to support
> those semantics.
>
> Regards,
> Okumin
>
> On Mon, Nov 11, 2024 at 4:11 PM Butao Zhang <butaozha...@163.com> wrote:
>
>> Thanks Attila for starting the hive-iceberg branch/tag discussion.
>>
>> In HIVE-27233 <https://issues.apache.org/jira/browse/HIVE-27233>, we
>> first introduced the branch/tag syntax in Hive, modeled on the
>> Spark-Iceberg branch/tag syntax. Spark-Iceberg uses the ALTER
>> <https://iceberg.apache.org/docs/1.7.0/branching/#historical-tags> syntax
>> to express branch/tag operations, and I think most users and engines are
>> used to this syntax. So following the Spark-Iceberg syntax is important for
>> users who use multiple engines (Spark & Hive, or others).
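>>
>> For reference, the Spark-Iceberg ALTER form looks roughly like the sketch
>> below. The table name prod.db.sample, the ref names, and the snapshot ID
>> 1234 are placeholders, and the statements assume a Spark session with the
>> Iceberg SQL extensions enabled:
>>
>> ```sql
>> -- Create a branch at the table's current snapshot
>> ALTER TABLE prod.db.sample CREATE BRANCH `audit-branch`;
>>
>> -- Create a tag at a specific snapshot (1234 is a placeholder snapshot ID)
>> ALTER TABLE prod.db.sample CREATE TAG `historical-tag` AS OF VERSION 1234;
>>
>> -- Drop the branch and the tag again
>> ALTER TABLE prod.db.sample DROP BRANCH `audit-branch`;
>> ALTER TABLE prod.db.sample DROP TAG `historical-tag`;
>> ```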
>>
>> But what you said is also reasonable. Sometimes, CREATE & DROP syntax is
>> more straightforward for new users. I have also seen the Dremio-Iceberg
>> doc <https://docs.dremio.com/cloud/reference/sql/commands/create-branch>,
>> which shows that Dremio uses CREATE & DROP syntax instead of ALTER. For
>> example:
>>
>> CREATE BRANCH [ IF NOT EXISTS ] <branch_name>
>>    [ { FROM | AT } { REF[ERENCE] | BRANCH | TAG | COMMIT } <reference_name> ]
>>    [ IN <catalog_name> ]
>>
>> But this Dremio CREATE syntax is more of a dialect than the Spark-Iceberg
>> ALTER syntax, as we subconsciously treat the Spark-Iceberg syntax as the
>> right and official one.
>>
>> IMO, I am not against implementing the new branch/tag syntax
>> (CREATE & DROP), as long as there is strong demand from community
>> users. But the new syntax will be a Hive-style dialect, which other
>> engines (Spark, Trino, etc.) will not accept.
>>
>> I would like to hear opinions from other folks. :)
>>
>>
>> Thanks,
>> Butao Zhang
>> ---- Replied Message ----
>> From Attila Turoczy <aturo...@cloudera.com.INVALID>
>> Date 11/6/2024 21:51
>> To dev <dev@hive.apache.org>
>> Subject Iceberg branching tagging syntax
>> Dear Hive community,
>>
>> I would like to hear your feedback about some syntactic sugar for Iceberg
>> branching. If you are not aware of this cool feature, please read
>> the following blog post:
>> <https://medium.com/@ayushtkn/apache-hive-4-x-with-iceberg-branches-tags-3d52293ac0bf>
>>
>> Currently, Hive implements branching as per the official
>> <https://iceberg.apache.org/docs/1.6.1/branching/> recommendation, which
>> is fine, but something about the syntax feels out of place in modern SQL.
>> Today, any new functionality tends to be added under the ALTER TABLE
>> umbrella. However, ALTER TABLE is increasingly overloaded with diverse
>> functionality across different engines, making it less intuitive. (To me,
>> ALTER TABLE is kind of the "etc." of SQL.)
>>
>> Many DBAs I’ve worked with are comfortable with commands like CREATE,
>> SELECT, and INSERT, but when it comes to ALTER, things often get more
>> complex and everybody starts to google it.
>>
>> This is particularly true now with the introduction of Iceberg branching
>> and tagging features, which are some of the most exciting developments
>> since somebody invented Spotify! :) But from a usability perspective, this
>> syntax is challenging to remember and use.
>>
>> In customer demos, I've been asked why the syntax is so complicated. In
>> my view, these key features deserve dedicated verbs, making them distinct
>> and straightforward.
>>
>> As a proposal, I’d suggest introducing new syntax options specifically
>> for branching and tagging. *This wouldn’t replace the current approach
>> but could be an alternative that enhances clarity and ease of use.*
>>
>> Create branch:
>>
>> CREATE BRANCH audit_branch FROM audit;
>>
>> From snapshot:
>>
>> CREATE BRANCH audit_branch FROM audit AS OF VERSION 1234; **
>>
>> ** Maybe the form here could be *AT <CommitID>*
>>
>> Create tag:
>>
>> CREATE TAG historical_tag FROM audit;
>>
>> The same AS OF form applies here.
>>
>> Drop branch:
>>
>> DROP BRANCH audit_branch;
>>
>> Drop Tag:
>>
>> DROP TAG historical_tag;
>>
>> Your opinion is very important to us, as it helps determine whether this
>> is primarily a usability concern for a handful of EU customers, or if it
>> might be better overall to stick with the classic ALTER approach.
>> -Attila
>>
>>
