Thank you very much! I will try to document this on the website.

On Thu, Oct 31, 2024 at 6:06 PM Szehon Ho <szehon.apa...@gmail.com> wrote:

> Yes, that is correct!
>
> Thanks
> Szehon
>
> On Thu, Oct 31, 2024 at 5:58 PM Pucheng Yang <py...@pinterest.com.invalid>
> wrote:
>
>> Hi Szehon,
>>
>> Thanks for getting back to me so quickly!
>>
>> No, I don't see anything failing. My question is more of a general one
>> after browsing all the issues. So from what you said, it seems Iceberg
>> can in theory support a very large number of columns (say 100K) without
>> hitting any hard limit from Iceberg's own implementation (setting aside
>> the performance penalty of very large metadata). There might still be
>> some incompatibility issues, but they are generally fixable.
>>
>> Is my understanding correct?
>>
>> Thanks again for your quick response.
>>
>> On Thu, Oct 31, 2024 at 5:50 PM Szehon Ho <szehon.apa...@gmail.com>
>> wrote:
>>
>>> Hi Pucheng
>>>
>>> There were some parts of the implementation where column field ids
>>> collided with partition field ids.
>>> https://github.com/apache/iceberg/pull/10020 introduced mechanisms for
>>> affected code to get unique ids, and known places have been fixed
>>> (particularly the Spark procedure rewrite_position_deletes and selecting
>>> the _partition metadata column).  It is possible there are other places
>>> not fixed yet.  Do you have concrete examples of broken functionality on
>>> the latest code?
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Thu, Oct 31, 2024 at 5:16 PM Pucheng Yang <py...@pinterest.com.invalid>
>>> wrote:
>>>
>>>> Hey community,
>>>>
>>>> I was following https://github.com/apache/iceberg/issues/9220 (Max
>>>> number of columns) down the rabbit hole and found a lot of
>>>> discussions about issues with tables having more than 1k columns.  However,
>>>> after reviewing the discussions, it is still a little confusing to me. So I
>>>> would like to check here:
>>>>
>>>> My questions are:
>>>> - Setting aside any performance penalty, is there a hard limit on how
>>>> many columns an Iceberg table can have, beyond which we will start to
>>>> see failures?
>>>> - Is it reasonable to say we should still expect some other failures
>>>> (theoretically fixable ones) once a table has more than 1k columns?
>>>>
>>>> I would appreciate your answers, and I would love to help document
>>>> them.
>>>>
>>>> Thanks!
>>>>
>>>
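For readers of this thread, the id collision Szehon describes can be sketched in a few lines. This is only an illustration and is not the mechanism from PR 10020: the function names are hypothetical, and the starting value of 1000 for partition field ids is an assumption based on Iceberg's usual default, which would explain why the reports cluster around tables with more than 1k columns.

```python
# Illustrative sketch only; not Iceberg code.
# Assumption: partition field ids are assigned starting at 1000,
# while column field ids are assigned starting at 1.
PARTITION_FIELD_ID_START = 1000


def find_id_collisions(column_ids, partition_field_ids):
    """Return the ids used both as a column id and as a partition field id."""
    return sorted(set(column_ids) & set(partition_field_ids))


def reassign_partition_ids(column_ids, partition_field_ids):
    """Map each partition field id to an id that no column uses.

    Non-colliding ids are kept; colliding ids are moved past the
    highest id in use. (Hypothetical helper for illustration.)
    """
    used = set(column_ids)
    next_id = max(used | set(partition_field_ids), default=0) + 1
    mapping = {}
    for pid in partition_field_ids:
        if pid in used:
            mapping[pid] = next_id
            used.add(next_id)
            next_id += 1
        else:
            mapping[pid] = pid
            used.add(pid)
    return mapping


# A table with 1200 columns and two partition fields.
wide_columns = range(1, 1201)
partition_ids = [PARTITION_FIELD_ID_START, PARTITION_FIELD_ID_START + 1]

collisions = find_id_collisions(wide_columns, partition_ids)
remapped = reassign_partition_ids(wide_columns, partition_ids)
```

With 1200 columns, both partition field ids overlap with column ids, so any code that merges the two id spaces into one schema needs to remap them first. With fewer than 1000 columns the two ranges never meet, which is consistent with the problem only showing up on very wide tables.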
