Hi,

I think we have to keep it as "clear and simple" as possible. I would prefer to have one diagram per spec version (to be clear about the scope).
So, I would rather keep the current diagram (which works for v1) and add a new, v2-centric one. It would be great to have a "side by side" presentation, something like:

v1 | v2

When v3 is out, we can add a new v3-centric diagram.

Regards
JB

On Wed, Nov 8, 2023 at 6:33 PM Jason Hughes <ja...@dremio.com.invalid> wrote:

> since v2 has been out for a while and most tools that support iceberg
> support v2 (not to mention some only support v2), I think having a single
> diagram and using dotted lines for the delete manifests and delete files
> will cause more confusion than benefit. also because of the support and
> adoption of v2, personally I'm in favor of replacing the arch diagram with
> this one that's for v2. that said, if folks are in favor of it, I can also
> edit the v1 table diagram to include stats files too and have them coexist
> on the spec page, noting which is v1 and which is v2
>
> what does everyone think?
>
> Jason Hughes
> Dremio | Director of Technical Advocacy
>
> On Mon, Nov 6, 2023 at 12:47 AM Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
>> However, there are a lot of boxes and new terms. What do you think of
>>> keeping both files, and indicating that the old applies to V1 tables, and
>>> the new one to V2 tables.
>>
>> Statistics are common to both V1 and V2. So, we can't say the old one
>> applies to V1 and the new one applies to V2.
>> For deletes, we are using existing boxes.
>> So, I think we can keep only one image, with dotted delete manifests and
>> delete files, mentioning that they are specific to the V2 merge-on-read
>> condition.
>>
>> Suggestions are welcome.
>>
>> On Mon, Nov 6, 2023 at 1:54 PM Eduard Tudenhoefner <
>> etudenhoef...@apache.org> wrote:
>>
>>> Thanks for updating the diagram and +1 to Fokko's suggestion.
>>>
>>> On Fri, Nov 3, 2023 at 3:43 PM Fokko Driesprong <fo...@apache.org>
>>> wrote:
>>>
>>>> Hey Jason, thanks for updating the chart.
>>>>
>>>> I like it a lot. However, there are a lot of boxes and new terms. What
>>>> do you think of keeping both files, and indicating that the old applies to
>>>> V1 tables, and the new one to V2 tables.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Fri, Nov 3, 2023 at 14:37, Aaron Niskode-Dossett
>>>> <aniskodedoss...@etsy.com.invalid> wrote:
>>>>
>>>>> An update would be greatly appreciated, thank you!
>>>>>
>>>>> On Thu, Nov 2, 2023 at 12:42 PM Jason Hughes <ja...@dremio.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> The current architecture diagram
>>>>>> <https://iceberg.apache.org/img/iceberg-metadata.png> for an iceberg
>>>>>> table hasn't been updated in over 3 years, and there are some aspects
>>>>>> of the architecture of an iceberg table that have changed, most notably
>>>>>> delete files and puffin files. since this diagram gets a lot of use in
>>>>>> enablement content around the community and isn't totally accurate
>>>>>> anymore, @Ajantha Bhat U <ajantha.bh...@dremio.com> and I discussed
>>>>>> updating it to be more accurate
>>>>>>
>>>>>> here's an updated version of the diagram
>>>>>> <https://docs.google.com/drawings/d/1m_iiJIJjiymadFIsCYnuUS6BvFo0MYDPCx0kKhZgIx4/edit>
>>>>>> we put together
>>>>>>
>>>>>> a few points for discussion that we're interested in others' thoughts
>>>>>> on:
>>>>>>
>>>>>> 1. the diagram is obviously somewhat more visually complicated
>>>>>> than the current one, but IMO the benefit of being more accurate for
>>>>>> people learning iceberg outweighs the additional complexity
>>>>>> 2. since the partition stats spec PR
>>>>>> <https://github.com/apache/iceberg/pull/7105> just got merged, we
>>>>>> thought it'd be good to include that too while we're updating it, and
>>>>>> combine puffin files with partition stats files into one category of
>>>>>> files in the diagram labeled "statistics files". we combined them in
>>>>>> the diagram, rather than splitting them up, because (a) it provides a
>>>>>> simpler diagram, (b) it gets the primary point across, and (c) they
>>>>>> both serve the purpose of providing statistics for tools to leverage
>>>>>> (albeit for different use cases)
>>>>>> 3. we put statistics files in place in the diagram for both s0
>>>>>> and s1, though we could have statistics files only for s1, which
>>>>>> would (a) make the diagram simpler, and (b) show a simple example of
>>>>>> the use case of not needing stats files initially, but then needing
>>>>>> them as data grows and/or query patterns change
>>>>>>
>>>>>> if folks are on board with updating the diagram, and after we come to
>>>>>> a conclusion on the above discussion points and any others that come
>>>>>> up, I can export it to a png and create a PR to update the arch diagram
>>>>>> image on the site
>>>>>>
>>>>>> thanks!
>>>>>>
>>>>>> Jason Hughes
>>>>>> Dremio | Director of Technical Advocacy
>>>>>>
>>>>>
>>>>> --
>>>>> Aaron Niskode-Dossett, Data Engineering -- Etsy
>>>>>