Hi Pucheng,

There were some places in the implementation where column field ids
collided with partition field ids: partition field ids are assigned
starting at 1000 by default, so tables with 1000 or more columns can
produce column ids in the same range.
https://github.com/apache/iceberg/pull/10020 introduced mechanisms for
the affected code to get unique ids, and the known places have been fixed
(particularly the Spark procedure rewrite_position_deletes and selecting
the _partition metadata column). It is possible there are other places
not yet fixed. Do you have concrete examples of broken functionality on
the latest code?

Thanks
Szehon

On Thu, Oct 31, 2024 at 5:16 PM Pucheng Yang <py...@pinterest.com.invalid>
wrote:

> Hey community,
>
> I was following https://github.com/apache/iceberg/issues/9220 (Max number
> of columns), went down the rabbit hole, and found a lot of discussion about
> issues with tables having more than 1k columns. However, after reviewing
> the discussions, it is still a little confusing to me, so I would like to
> check here:
>
> My questions are:
> - Without considering the performance penalty, do we have a hard limit on
> how many columns an Iceberg table can have, and once we exceed it, will we
> start to see failures?
> - Is it reasonable to say we should still expect some other failures
> (theoretically fixable ones) once a table has more than 1k columns?
>
> I would appreciate your answers, and I would love to help document them.
>
> Thanks!
>
