Re: Migrating legacy snapshot daily Hive table concept to Iceberg

Edgar Rodriguez Mon, 15 Mar 2021 08:05:04 -0700

Hi Ryan,

On Tue, Mar 9, 2021 at 5:54 AM Ryan Murray <rym...@gmail.com> wrote:


> Hey Edgar, Cheng Pan,
>
> I am not sure if you are aware of project nessie
> <https://projectnessie.org>? It _may_ suit your needs. Nessie applies
> git-like functionality to iceberg tables (in this case most useful are
> branches and tags).
>

Thanks for the suggestion. I did look at nessie and it looks really cool.


>
> In effect you would be pivoting the snapshot partition into the table
> itself and using nessie tags to represent the previous table snapshots. You
> could create a tag for each database snapshot with the date the snapshot
> was taken and the `main` branch would then receive your half hour updates. I
> think the major issue is that you would lose the `ds` partition column and
> have to use the `select * from tablename@tagname` syntax that nessie
> supports to query a specific `ds`, however it would provide you with the
> `snapshot-tag` concept you suggested above. A potential extra benefit is
> that all tables would be under the same tag so you would in effect have the
> same tag for the set of tables rather than an iceberg snapshot id per table.
>

Yeah, I think the main issue with this workflow is that we would either have
to keep the `ds` way of querying the tables, even though in Iceberg we try to
hide partitioning from the user, or find a migration path that is not too
disruptive for the user. For instance, if Iceberg supported snapshot-tags,
we could use the Spark procedure to set the current snapshot using a tag,
which may be easier to use than the snapshot-id that it expects right now.

I think in general it's a bit hard to track snapshot-ids in Iceberg,
especially when it comes to referring to them in an easy way. Not all
snapshot-ids may be relevant to users, and an external mapping would be
needed to track the ones at specific points of interest. I think that, while
not having the full features of nessie, snapshot-tags by themselves would
still be useful for folks using their own catalog/tools or vanilla Iceberg.
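
To illustrate the external mapping mentioned above, here is a minimal sketch
of how a tag could be resolved to a snapshot-id before calling the existing
Spark procedure `set_current_snapshot`. The mapping, the tag names, the
snapshot-id values, the catalog name `my_catalog`, the table `db.events` and
the helper function are all hypothetical and only there for illustration;
this is not something Iceberg provides today.

```python
from typing import Dict

# Hypothetical external mapping: tag name -> Iceberg snapshot-id.
# The ids below are placeholders, not real snapshot-ids.
SNAPSHOT_TAGS: Dict[str, int] = {
    "2021-03-13": 1001,
    "2021-03-14": 1002,
}


def set_current_snapshot_sql(
    catalog: str, table: str, tag: str, tags: Dict[str, int]
) -> str:
    """Build the Spark SQL CALL statement that sets the current snapshot
    of `table` to the snapshot-id registered under `tag`."""
    if tag not in tags:
        raise KeyError(f"unknown snapshot tag: {tag}")
    snapshot_id = tags[tag]
    # set_current_snapshot takes the table identifier and a snapshot-id.
    return (
        f"CALL {catalog}.system.set_current_snapshot("
        f"'{table}', {snapshot_id})"
    )


if __name__ == "__main__":
    # The resulting string could be passed to spark.sql(...) in a session
    # that has the Iceberg catalog configured.
    print(set_current_snapshot_sql(
        "my_catalog", "db.events", "2021-03-14", SNAPSHOT_TAGS))
```

With native snapshot-tags the lookup step would go away, and the user would
only need to remember the tag.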

Thanks,
-- 
Edgar R
