Hey Jason, thanks for updating the chart. I like it a lot. However, there are a lot of boxes and new terms. What do you think of keeping both files, and indicating that the old one applies to V1 tables and the new one to V2 tables?

Kind regards,
Fokko

On Fri, Nov 3, 2023 at 14:37, Aaron Niskode-Dossett <aniskodedoss...@etsy.com.invalid> wrote:

> An update would be greatly appreciated, thank you!
>
> On Thu, Nov 2, 2023 at 12:42 PM Jason Hughes <ja...@dremio.com.invalid> wrote:
>
>> Hey all,
>>
>> The current architecture diagram
>> <https://iceberg.apache.org/img/iceberg-metadata.png> for an iceberg
>> table hasn't been updated in over 3 years, and there are some aspects of
>> the architecture of an iceberg table that have changed, most notably
>> delete files and puffin files. since this diagram gets a lot of use in
>> enablement content around the community and isn't totally accurate
>> anymore, @Ajantha Bhat U <ajantha.bh...@dremio.com> and I discussed
>> updating it to be more accurate
>>
>> here's an updated version of the diagram
>> <https://docs.google.com/drawings/d/1m_iiJIJjiymadFIsCYnuUS6BvFo0MYDPCx0kKhZgIx4/edit>
>> we put together
>>
>> a few points for discussion that we're interested in others' thoughts on:
>>
>>    1. the diagram is obviously somewhat more visually complicated than
>>    the current one, but IMO the benefit of being more accurate for people
>>    learning iceberg outweighs the additional complexity
>>    2. since the partition stats spec PR
>>    <https://github.com/apache/iceberg/pull/7105> just got merged, we
>>    thought it'd be good to include that too while we're updating it, and
>>    combine puffin files with partition stats files into one category of
>>    files in the diagram labeled "statistics files". we combined them in
>>    the diagram, rather than splitting them up, because 1. it provides a
>>    simpler diagram, 2. it gets the primary point across, and 3. they both
>>    serve the purpose of providing statistics for tools to leverage
>>    (albeit for different use cases)
>>    3. we put statistics files in place in the diagram for both s0 and
>>    s1, though we could only have statistics files for s1, which would
>>    1. make the diagram simpler, and 2. show a simple example of the use
>>    case of not needing stats files initially, but then as data grows
>>    and/or query patterns change, now stats files are needed
>>
>> if folks are on board with updating the diagram, and after we come to a
>> conclusion on the above discussion points and any others that come up, I
>> can export it to a png and create a PR to update the arch diagram image
>> on the site
>>
>> thanks!
>>
>> Jason Hughes
>> Dremio | Director of Technical Advocacy
>
> --
> Aaron Niskode-Dossett, Data Engineering -- Etsy