since v2 has been out for a while and most tools that support iceberg
support v2 (not to mention some only support v2), I think having a single
diagram and using dotted lines for the delete manifests and delete files
will cause more confusion than benefit. also because of the support and
adoption of v2, personally I'm in favor of replacing the arch diagram with
this one that's for v2. that said, if folks are in favor of it, I can also
edit the v1 table diagram to include stats files too and have them coexist
on the spec page, noting which is v1 and which is v2

what does everyone think?


Jason Hughes
Dremio | Director of Technical Advocacy

On Mon, Nov 6, 2023 at 12:47 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

>> However, there are a lot of boxes and new terms. What do you think of
>> keeping both files, and indicating that the old applies to V1 tables, and
>> the new one to V2 tables?
>
>
> Statistics are common for both V1 and V2. So, we can't say old applies to
> V1 and new applies to V2.
> For deletes, we are using existing boxes.
> So, I think we can keep only one image, with the delete manifests and
> delete files drawn with dotted lines and a note that they are specific to
> the V2 merge-on-read case.
>
> Suggestions are welcome.
>
> On Mon, Nov 6, 2023 at 1:54 PM Eduard Tudenhoefner <
> etudenhoef...@apache.org> wrote:
>
>> Thanks for updating the diagram and +1 to Fokko's suggestion.
>>
>> On Fri, Nov 3, 2023 at 3:43 PM Fokko Driesprong <fo...@apache.org> wrote:
>>
>>> Hey Jason, thanks for updating the chart.
>>>
>>> I like it a lot. However, there are a lot of boxes and new terms. What
>>> do you think of keeping both files, and indicating that the old applies to
>>> V1 tables, and the new one to V2 tables?
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Fri, Nov 3, 2023 at 2:37 PM Aaron Niskode-Dossett
>>> <aniskodedoss...@etsy.com.invalid> wrote:
>>>
>>>> An update would be greatly appreciated, thank you!
>>>>
>>>> On Thu, Nov 2, 2023 at 12:42 PM Jason Hughes <ja...@dremio.com.invalid>
>>>> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> The current architecture diagram
>>>>> <https://iceberg.apache.org/img/iceberg-metadata.png> for an iceberg
>>>>> table hasn't been updated in over 3 years, and there are some aspects of
>>>>> the architecture of an iceberg table that have changed, most notably
>>>>> delete files and puffin files. since this diagram gets a lot of use in
>>>>> enablement content around the community and isn't totally accurate
>>>>> anymore, @Ajantha Bhat U <ajantha.bh...@dremio.com> and I discussed
>>>>> updating it to be more accurate
>>>>>
>>>>> here's an updated version of the diagram
>>>>> <https://docs.google.com/drawings/d/1m_iiJIJjiymadFIsCYnuUS6BvFo0MYDPCx0kKhZgIx4/edit>
>>>>> we put together
>>>>>
>>>>> a few points for discussion that we're interested in others' thoughts
>>>>> on:
>>>>>
>>>>>    1. the diagram is obviously somewhat more visually complicated
>>>>>    than the current one, but IMO the benefit of being more accurate for
>>>>>    people learning iceberg outweighs the additional complexity
>>>>>    2. since the partition stats spec PR
>>>>>    <https://github.com/apache/iceberg/pull/7105> just got merged, we
>>>>>    thought it'd be good to include that too while we're updating it, and
>>>>>    combine puffin files with partition stats files into one category of
>>>>>    files in the diagram labeled "statistics files". we combined them in
>>>>>    the diagram, rather than splitting them up, because (a) it provides a
>>>>>    simpler diagram, (b) it gets the primary point across, and (c) they
>>>>>    both serve the purpose of providing statistics for tools to leverage
>>>>>    (albeit for different use cases)
>>>>>    3. we put statistics files in place in the diagram for both s0 and
>>>>>    s1, though we could instead show statistics files only for s1, which
>>>>>    would (a) make the diagram simpler, and (b) show a simple example of
>>>>>    the use case of not needing stats files initially, but then as data
>>>>>    grows and/or query patterns change, now stats files are needed
>>>>>
>>>>> if folks are on board with updating the diagram, and after we come to
>>>>> a conclusion on the above discussion points and any others that come up, I
>>>>> can export it to a png and create a PR to update the arch diagram image on
>>>>> the site
>>>>>
>>>>> thanks!
>>>>>
>>>>>
>>>>> Jason Hughes
>>>>> Dremio | Director of Technical Advocacy
>>>>>
>>>>
>>>> --
>>>> Aaron Niskode-Dossett, Data Engineering -- Etsy
>>>>
>>>
