Hi Folks,

Resurfacing this thread again. I had a chat with a few folks earlier this
week and learned that there’s been some recurring customer feedback around
this topic. While it's not exactly a blocker, it does introduce a bit of
friction for users.

There's an example shared earlier in the thread that closely resembles
what’s being requested here. We may not need to follow it exactly, but
doing something similar within our Hive flavor could be valuable.
If adding some syntactic sugar can improve adoption and enhance the user
experience, I think it’s worth exploring.

I plan to file a ticket or two around this sometime next week. Please feel
free to share if you have any strong objections or thoughts.

-Ayush

On Wed, 16 Apr 2025 at 14:59, Shohei Okumiya <oku...@apache.org> wrote:

> Hi,
>
> A friendly Trino maintainer shared an informative survey about branching
> with me. I'm sharing it here, as it would be helpful when we add or update
> branching syntaxes.
>
>
> https://docs.google.com/document/d/1jEF4IkWu-2Gzk5ii2Nb0exuEnAUeo98UbiM3i0xtgWQ/edit?tab=t.0
> https://github.com/trinodb/trino/pull/25152
>
> Best,
> Okumin
>
> On Mon, Nov 25, 2024 at 11:27 PM Okumin <m...@okumin.com> wrote:
>
>> Hi Attila,
>>
>> Adding new commands could be an option. Honestly, I don't often use
>> BRANCH or TAG, and I don't have a strong opinion on either approach.
>>
>> Off topic: if I understand correctly, Dremio's branching semantics are
>> different from what Hive provides. Dremio supports versioning not per table
>> but per entire namespace. I wonder if Apache Hive has a plan to support
>> those semantics.
>>
>> Regards,
>> Okumin
>>
>> On Mon, Nov 11, 2024 at 4:11 PM Butao Zhang <butaozha...@163.com> wrote:
>>
>>> Thanks Attila for starting the hive-iceberg branch/tag discussion.
>>>
>>> In HIVE-27233 <https://issues.apache.org/jira/browse/HIVE-27233> , we
>>> first introduced the branch/tag syntax in Hive by referring to
>>> Spark-Iceberg branch/tag syntax. Spark-Iceberg uses the ALTER
>>> <https://iceberg.apache.org/docs/1.7.0/branching/#historical-tags> syntax
>>> to express branch/tag operations, and I think most users and engines are
>>> used to this syntax. So following the Spark-Iceberg syntax is important for
>>> users who work with multiple engines (Spark & Hive, or others).
>>>
>>> But what you said is also reasonable. Sometimes CREATE & DROP syntax
>>> is more straightforward for new users. I have also seen the Dremio-Iceberg
>>> doc <https://docs.dremio.com/cloud/reference/sql/commands/create-branch>,
>>> which shows that Dremio uses CREATE & DROP syntax instead of ALTER. For
>>> example:
>>>
>>> CREATE BRANCH [ IF NOT EXISTS ] <branch_name>
>>>    [ { FROM | AT } { REF[ERENCE] | BRANCH | TAG | COMMIT }
>>> <reference_name> ]
>>>    [ IN <catalog_name> ]
>>>
>>> But this Dremio CREATE syntax feels more like a dialect than the
>>> Spark-Iceberg ALTER syntax, since we subconsciously regard the
>>> Spark-Iceberg syntax as the right, official one.
>>>
>>> IMO, I am not against implementing the new branch/tag syntax
>>> (CREATE & DROP), as long as there is strong demand from community
>>> users. But the new syntax will be a Hive-style dialect, which other
>>> engines (Spark, Trino, etc.) will not accept.
>>>
>>> I would like to hear opinions from other folks. :)
>>>
>>>
>>> Thanks,
>>> Butao Zhang
>>> ---- Replied Message ----
>>> From Attila Turoczy <aturo...@cloudera.com.INVALID>
>>> Date 11/6/2024 21:51
>>> To dev <dev@hive.apache.org>
>>> Subject Iceberg branching tagging syntax
>>> Dear Hive community,
>>>
>>> I would like to hear your feedback about some syntactic sugar for Iceberg
>>> branching. If you are not yet aware of this cool feature, please read
>>> the following blog post
>>> <https://medium.com/@ayushtkn/apache-hive-4-x-with-iceberg-branches-tags-3d52293ac0bf>.
>>>
>>> Currently, Hive is implementing branching as per the official
>>> <https://iceberg.apache.org/docs/1.6.1/branching/> recommendation,
>>> which is fine, but there's something about the syntax that feels out of
>>> place in modern SQL linguistics. Today, any new functionality tends to be
>>> added under the ALTER TABLE umbrella. However, ALTER TABLE is
>>> increasingly overloaded with diverse functionalities across different
>>> engines, making it less intuitive. (To me, ALTER TABLE is kind of the
>>> "etc." bucket of SQL.)
>>>
>>> Many DBAs I’ve worked with are comfortable with commands like CREATE,
>>> SELECT, and INSERT, but when it comes to ALTER, things often get more
>>> complex and everybody starts to google it.
>>>
>>> This is particularly true now with the introduction of iceberg branching
>>> and tagging features, which are some of the most exciting developments
>>> since somebody invented Spotify! :) But from a usability perspective, this
>>> syntax is challenging to remember and use.
>>>
>>> In customer demos, I've been asked why the syntax is so complicated. In
>>> my view, these key features deserve dedicated verbs, making them distinct
>>> and straightforward.
>>>
>>> As a proposal, I’d suggest introducing new syntax options specifically
>>> for branching and tagging. *This wouldn’t replace the current approach
>>> but could be an alternative that enhances clarity and ease of use.*
>>>
>>> Create branch:
>>>
>>> CREATE BRANCH audit_branch FROM audit;
>>>
>>> From snapshot:
>>>
>>> CREATE BRANCH audit_branch FROM audit AS OF VERSION 1234; **
>>>
>>> ** Maybe the form here could be *AT <CommitID>*
>>>
>>> Create tag:
>>>
>>> CREATE TAG historical_tag FROM audit;
>>>
>>> The same AS OF form applies here.
>>>
>>> Drop branch:
>>>
>>> DROP BRANCH audit_branch;
>>>
>>> Drop Tag:
>>>
>>> DROP TAG historical_tag;
>>>
>>> Your opinion is very important to us, as it helps determine whether this
>>> is primarily a usability concern for a handful of EU customers, or if it
>>> might be better overall to stick with the classic ALTER approach.
>>> -Attila
>>>
>>>
