Thank you very much! I will try to document this on the website.

On Thu, Oct 31, 2024 at 6:06 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
> Yes, that is correct!
>
> Thanks
> Szehon
>
> On Thu, Oct 31, 2024 at 5:58 PM Pucheng Yang <py...@pinterest.com.invalid>
> wrote:
>
>> Hi Szehon,
>>
>> Thanks for getting back to me so quickly!
>>
>> No, I don't see anywhere that is failing. My question is more of a
>> general question after browsing all the issues. So from what you said, it
>> seems Iceberg in theory can support a very large number of columns (say
>> 100K) w/o hitting any hard limit due to Iceberg's own implementation (not
>> considering metadata too big hence some performance penalty). And there
>> might still be some incompatible issues but they are generally fixable.
>>
>> Is my understanding correct?
>>
>> Thanks again for your quick response.
>>
>> On Thu, Oct 31, 2024 at 5:50 PM Szehon Ho <szehon.apa...@gmail.com>
>> wrote:
>>
>>> Hi Pucheng
>>>
>>> There were some parts in the implementation where column field ids
>>> collided with partition field ids.
>>> https://github.com/apache/iceberg/pull/10020 introduced mechanisms for
>>> affected code to get unique ids, and known places have been fixed.
>>> (Particularly the Spark procedure rewrite_position_deletes and selecting
>>> _partition metadata column). It is possible there's other places not fixed
>>> yet. Do you have concrete examples of broken functionality on the latest
>>> code?
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Thu, Oct 31, 2024 at 5:16 PM Pucheng Yang <py...@pinterest.com.invalid>
>>> wrote:
>>>
>>>> Hey community,
>>>>
>>>> I was following https://github.com/apache/iceberg/issues/9220 (Max
>>>> number of columns) and down the rabbit hole and I found there are a lot of
>>>> discussions about issues with tables having more than 1k columns. However,
>>>> after reviewing discussions, it is still a little confusing to me. So I
>>>> would like to check here:
>>>>
>>>> My questions are:
>>>> - Without considering performance penalty, do we have a hard limit on
>>>> how many columns an Iceberg table can have, and once we exceed that, we
>>>> will start to see failures?
>>>> - Is it reasonable to say we should still expect some other failures to
>>>> happen (but theoretically fixable) after a table has more than 1k columns?
>>>>
>>>> I would appreciate your answer and I would love to help to document the
>>>> answer to my above questions.
>>>>
>>>> Thanks!
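
For anyone documenting this, the id collision Szehon describes can be sketched roughly as below. This is a toy model in Python, not Iceberg code: every function and constant name here is hypothetical, and it assumes partition field ids are handed out sequentially starting at 1000 (the behaviour the Iceberg spec describes for the reference implementation) while column field ids in a flat schema run from 1 upward. The reassignment function is only one possible way to obtain unique ids; the thread does not describe how PR 10020 actually does it.

```python
# Toy model only: none of these names are Iceberg APIs.

# Assumption: partition field ids are assigned sequentially from 1000.
PARTITION_FIELD_ID_START = 1000


def column_field_ids(num_columns):
    """Ids 1..num_columns, as a simple flat schema would assign them."""
    return set(range(1, num_columns + 1))


def partition_field_ids(num_partition_fields, start=PARTITION_FIELD_ID_START):
    """Ids start..start+n-1 for the partition fields."""
    return set(range(start, start + num_partition_fields))


def colliding_ids(num_columns, num_partition_fields):
    """Ids used both by a column and by a partition field."""
    cols = column_field_ids(num_columns)
    parts = partition_field_ids(num_partition_fields)
    return sorted(cols & parts)


def reassign_partition_ids(num_columns, num_partition_fields):
    """One hypothetical fix: start partition ids above every column id."""
    start = max(PARTITION_FIELD_ID_START, num_columns + 1)
    return list(range(start, start + num_partition_fields))


if __name__ == "__main__":
    print(colliding_ids(999, 2))            # no overlap below 1000 columns
    print(colliding_ids(1001, 2))           # overlap once columns reach 1000
    print(reassign_partition_ids(1001, 2))  # ids moved past the last column
```

Under these assumptions nothing overlaps until the schema reaches 1000 columns, which would be consistent with the reports about tables with more than 1k columns, and there is no hard ceiling as long as code that mixes the two id spaces picks unique ids.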