Hi

I think we have to keep it as "clear and simple" as possible.
I would prefer to have one diagram per spec version (to be clear about the
scope).

So, I would rather keep the current diagram (working for v1) and add a new
v2-centric one.
It would be great to have a "side by side" presentation, something like:

   v1     |    v2

When v3 is out, we can add a new v3-centric diagram.

Regards
JB

On Wed, Nov 8, 2023 at 6:33 PM Jason Hughes <ja...@dremio.com.invalid>
wrote:

> Since v2 has been out for a while and most tools that support Iceberg
> support v2 (not to mention some only support v2), I think having a single
> diagram and using dotted lines for the delete manifests and delete files
> will cause more confusion than benefit. Also, because of the support and
> adoption of v2, I'm personally in favor of replacing the arch diagram with
> this one, which is for v2. That said, if folks are in favor of it, I can also
> edit the v1 table diagram to include stats files and have both coexist
> on the spec page, noting which is v1 and which is v2.
>
> What does everyone think?
>
>
> Jason Hughes
>
>
> Dremio | Director of Technical Advocacy
>
> On Mon, Nov 6, 2023 at 12:47 AM Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
>>> However, there are a lot of boxes and new terms. What do you think of
>>> keeping both files, and indicating that the old one applies to V1 tables
>>> and the new one to V2 tables?
>>
>>
>> Statistics are common to both V1 and V2, so we can't say the old one
>> applies to V1 and the new one to V2.
>> For deletes, we are reusing the existing boxes.
>> So, I think we can keep only one image, with the delete manifests and
>> delete files drawn as dotted boxes and a note that they are specific to
>> V2 merge-on-read.
>>
>> Suggestions are welcome.
>>
>> On Mon, Nov 6, 2023 at 1:54 PM Eduard Tudenhoefner <
>> etudenhoef...@apache.org> wrote:
>>
>>> Thanks for updating the diagram and +1 to Fokko's suggestion.
>>>
>>> On Fri, Nov 3, 2023 at 3:43 PM Fokko Driesprong <fo...@apache.org>
>>> wrote:
>>>
>>>> Hey Jason, thanks for updating the chart.
>>>>
>>>> I like it a lot. However, there are a lot of boxes and new terms. What
>>>> do you think of keeping both files, and indicating that the old one
>>>> applies to V1 tables and the new one to V2 tables?
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Fri, Nov 3, 2023 at 2:37 PM Aaron Niskode-Dossett
>>>> <aniskodedoss...@etsy.com.invalid> wrote:
>>>>
>>>>> An update would be greatly appreciated, thank you!
>>>>>
>>>>> On Thu, Nov 2, 2023 at 12:42 PM Jason Hughes <ja...@dremio.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> The current architecture diagram
>>>>>> <https://iceberg.apache.org/img/iceberg-metadata.png> for an Iceberg
>>>>>> table hasn't been updated in over 3 years, and some aspects of the
>>>>>> architecture of an Iceberg table have changed since then, most notably
>>>>>> delete files and Puffin files. Since this diagram gets a lot of use in
>>>>>> enablement content around the community and isn't totally accurate
>>>>>> anymore, @Ajantha Bhat U <ajantha.bh...@dremio.com> and I discussed
>>>>>> updating it to be more accurate.
>>>>>>
>>>>>> Here's an updated version of the diagram
>>>>>> <https://docs.google.com/drawings/d/1m_iiJIJjiymadFIsCYnuUS6BvFo0MYDPCx0kKhZgIx4/edit>
>>>>>> that we put together.
>>>>>>
>>>>>> Here are a few discussion points we'd like to hear others' thoughts
>>>>>> on:
>>>>>>
>>>>>>    1. The diagram is obviously somewhat more visually complicated
>>>>>>    than the current one, but IMO the benefit of being more accurate for
>>>>>>    people learning Iceberg outweighs the additional complexity.
>>>>>>    2. Since the partition stats spec PR
>>>>>>    <https://github.com/apache/iceberg/pull/7105> just got merged, we
>>>>>>    thought it'd be good to include that too while we're updating it, and
>>>>>>    to combine Puffin files and partition stats files into one category
>>>>>>    of files in the diagram labeled "statistics files". We combined them
>>>>>>    rather than splitting them up because (a) it keeps the diagram
>>>>>>    simpler, (b) it still gets the primary point across, and (c) both
>>>>>>    serve the purpose of providing statistics for tools to leverage
>>>>>>    (albeit for different use cases).
>>>>>>    3. We put statistics files in the diagram for both s0 and s1, though
>>>>>>    we could show statistics files only for s1, which would (a) make the
>>>>>>    diagram simpler and (b) illustrate a simple use case: stats files
>>>>>>    aren't needed initially, but as data grows and/or query patterns
>>>>>>    change, they become needed.
>>>>>>
>>>>>> If folks are on board with updating the diagram, then once we come to
>>>>>> a conclusion on the discussion points above and any others that come
>>>>>> up, I can export it to a PNG and create a PR to update the arch diagram
>>>>>> image on the site.
>>>>>>
>>>>>> thanks!
>>>>>>
>>>>>>
>>>>>> Jason Hughes
>>>>>>
>>>>>>
>>>>>> Dremio | Director of Technical Advocacy
>>>>>>
>>>>>
>>>>> --
>>>>> Aaron Niskode-Dossett, Data Engineering -- Etsy
>>>>>
>>>>
