Re: Need clarification on max number of columns an Iceberg table can have

Szehon Ho Thu, 31 Oct 2024 18:06:55 -0700

Yes, that is correct!

Thanks
Szehon


On Thu, Oct 31, 2024 at 5:58 PM Pucheng Yang <py...@pinterest.com.invalid>
wrote:

> Hi Szehon,
>
> Thanks for getting back to me so quickly!
>
> No, I haven't seen anything failing. Mine is more of a general question
> after browsing all the issues. So from what you said, it seems that
> Iceberg can, in theory, support a very large number of columns (say 100K)
> without hitting any hard limit imposed by Iceberg's own implementation
> (setting aside the performance penalty from very large metadata). There
> might still be some compatibility issues, but they are generally fixable.
>
> Is my understanding correct?
>
> Thanks again for your quick response.
>
> On Thu, Oct 31, 2024 at 5:50 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Hi Pucheng
>>
>> Some parts of the implementation had column field ids colliding with
>> partition field ids.
>> https://github.com/apache/iceberg/pull/10020 introduced mechanisms for
>> affected code to get unique ids, and the known places have been fixed
>> (particularly the Spark procedure rewrite_position_deletes and selecting
>> the _partition metadata column). It is possible there are other places not
>> fixed yet. Do you have concrete examples of broken functionality on the
>> latest code?
>>
>> Thanks
>> Szehon
>>
>> On Thu, Oct 31, 2024 at 5:16 PM Pucheng Yang <py...@pinterest.com.invalid>
>> wrote:
>>
>>> Hey community,
>>>
>>> I was following https://github.com/apache/iceberg/issues/9220 (Max
>>> number of columns) down the rabbit hole and found a lot of discussion
>>> about issues with tables having more than 1k columns. However, even
>>> after reviewing those discussions, it is still a little confusing to me,
>>> so I would like to check here:
>>>
>>> My questions are:
>>> - Setting aside any performance penalty, is there a hard limit on how
>>> many columns an Iceberg table can have, beyond which we will start to
>>> see failures?
>>> - Is it reasonable to expect some other failures (theoretically fixable
>>> ones) once a table has more than 1k columns?
>>>
>>> I would appreciate your answers, and I would love to help document
>>> them.
>>>
>>> Thanks!
>>>
>>
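
The id collision described in the thread can be illustrated with a small sketch. This is not Iceberg's code and not necessarily what PR 10020 does; every function name below is hypothetical. It assumes column field ids are assigned sequentially from 1 and that partition field ids start at 1000, which would explain why tables with more than 1k columns are the ones affected. The last helper shows one possible way to hand out non-overlapping ids when both kinds of field end up in the same structure.

```python
# Illustrative sketch only; all names are hypothetical, not Iceberg APIs.

# Assumption: partition field ids are assigned starting at 1000.
PARTITION_ID_START = 1000


def column_ids(num_columns):
    """Column field ids, assumed to be assigned sequentially from 1."""
    return list(range(1, num_columns + 1))


def partition_ids(num_partition_fields, start=PARTITION_ID_START):
    """Partition field ids, assumed to be assigned sequentially from `start`."""
    return list(range(start, start + num_partition_fields))


def collisions(col_ids, part_ids):
    """Ids that appear both as a column id and as a partition field id."""
    return sorted(set(col_ids) & set(part_ids))


def reassign_unique(col_ids, part_ids):
    """Map each partition field id to a fresh id above every id in use.

    One possible strategy for code that must combine column fields and
    partition fields in a single structure without duplicate ids.
    """
    next_id = max(col_ids + part_ids, default=0) + 1
    mapping = {}
    for pid in part_ids:
        mapping[pid] = next_id
        next_id += 1
    return mapping


if __name__ == "__main__":
    narrow = column_ids(50)
    wide = column_ids(1500)
    parts = partition_ids(2)

    print(collisions(narrow, parts))     # no overlap for a narrow table
    print(collisions(wide, parts))       # overlap once column ids reach 1000
    print(reassign_unique(wide, parts))  # fresh ids above the largest id in use
```

Under these assumptions, a 50-column table has no overlapping ids, while a 1500-column table with two partition fields overlaps on ids 1000 and 1001. This matches the thread's point that the problem is a fixable id clash in specific code paths rather than a hard cap on column count.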
