Hi Attila,

Adding new commands could be an option. Honestly, I don't often use BRANCH
or TAG, and I don't have a strong opinion on either approach.

Off topic. If I understand correctly, Dremio's branching semantics are
different from what Hive provides. Dremio supports versioning not per table
but per entire namespace. I wonder if Apache Hive has a plan to support
those semantics.

Regards,
Okumin

On Mon, Nov 11, 2024 at 4:11 PM Butao Zhang <butaozha...@163.com> wrote:

> Thanks Attila for starting the hive-iceberg branch/tag discussion.
>
> In HIVE-27233 <https://issues.apache.org/jira/browse/HIVE-27233> , we
> first introduced the branch/tag syntax in Hive by referring to
> Spark-Iceberg branch/tag syntax. Spark-Iceberg uses the ALTER
> <https://iceberg.apache.org/docs/1.7.0/branching/#historical-tags> syntax
> to express the branch/tag operation, and I think most users or engines are
> used to this syntax. So following Spark-Iceberg syntax is important for
> users who use multiple engines (Spark & Hive, or others).
>
> But what you said is also reasonable. Sometimes CREATE & DROP syntax is
> more straightforward for new users. I have also seen the Dremio-Iceberg
> doc <https://docs.dremio.com/cloud/reference/sql/commands/create-branch>,
> which shows that Dremio uses CREATE & DROP syntax instead of ALTER. For
> example:
>
> CREATE BRANCH [ IF NOT EXISTS ] <branch_name>
>    [ { FROM | AT } { REF[ERENCE] | BRANCH | TAG | COMMIT } <reference_name> ]
>    [ IN <catalog_name> ]
>
> But this Dremio CREATE syntax is more of a dialect than the Spark-Iceberg
> ALTER syntax, as we subconsciously treat the Spark-Iceberg syntax as the
> correct, official one.
>
> IMO, I am not against implementing the new branch/tag
> syntax (CREATE & DROP), as long as there is strong demand from community
> users. But the new syntax will be a Hive-style dialect, which other
> engines (Spark, Trino, etc.) will not accept.
>
> I would like to hear opinions from other folks. :)
>
>
> Thanks,
> Butao Zhang
> ---- Replied Message ----
> From Attila Turoczy <aturo...@cloudera.com.INVALID>
> Date 11/6/2024 21:51
> To dev <dev@hive.apache.org>
> Subject Iceberg branching tagging syntax
> Dear Hive community,
>
> I would like to hear your feedback about some syntactic sugar for Iceberg
> branching. If you are not aware of this cool feature, please read
> the following blog post
> <https://medium.com/@ayushtkn/apache-hive-4-x-with-iceberg-branches-tags-3d52293ac0bf>.
>
> Currently, Hive is implementing branching as per the official
> <https://iceberg.apache.org/docs/1.6.1/branching/> recommendation, which
> is fine, but there's something about the syntax that feels out of place in
> modern SQL linguistics. Today, any new functionality tends to be added
> under the ALTER TABLE umbrella. However, ALTER TABLE is increasingly
> overloaded with diverse functionalities across different engines, making it
> less intuitive. (To me, ALTER TABLE is kind of the "etc." of SQL.)
>
> Many DBAs I’ve worked with are comfortable with commands like CREATE,
> SELECT, and INSERT, but when it comes to ALTER, things often get more
> complex and everybody starts to google it.
>
> This is particularly true now with the introduction of Iceberg branching
> and tagging features, which are some of the most exciting developments
> since somebody invented Spotify! :) But from a usability perspective, this
> syntax is challenging to remember and use.
>
> In customer demos, I've been asked why the syntax is so complicated. In my
> view, these key features deserve dedicated verbs, making them distinct and
> straightforward.
>
> As a proposal, I’d suggest introducing new syntax options specifically for
> branching and tagging. *This wouldn’t replace the current approach but
> could be an alternative that enhances clarity and ease of use.*
> Create branch:
>
> CREATE BRANCH audit_branch FROM audit;
>
> From snapshot:
>
> CREATE BRANCH audit_branch FROM audit AS OF VERSION 1234; **
>
> ** Maybe the FORM here could be AT <CommitID>
>
> Create tag:
>
> CREATE TAG historical_tag FROM audit;
>
> The same AS OF form would apply to tags.
>
> Drop branch:
>
> DROP BRANCH audit_branch;
>
> Drop Tag:
>
> DROP TAG historical_tag;
>
> Your opinion is very important to us, as it helps determine whether this
> is primarily a usability concern for a handful of EU customers, or if it
> might be better overall to stick with the classic ALTER approach.
> -Attila
>
>
