Yes, that is correct!

Thanks
Szehon

On Thu, Oct 31, 2024 at 5:58 PM Pucheng Yang <py...@pinterest.com.invalid> wrote:

> Hi Szehon,
>
> Thanks for getting back to me so quickly!
>
> No, I don't see anywhere that is failing. My question is more of a general
> question after browsing all the issues. So from what you said, it seems
> Iceberg in theory can support a very large number of columns (say 100K) w/o
> hitting any hard limit due to Iceberg's own implementation (not considering
> metadata too big hence some performance penalty). And there might still be
> some incompatible issues but they are generally fixable.
>
> Is my understanding correct?
>
> Thanks again for your quick response.
>
> On Thu, Oct 31, 2024 at 5:50 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Hi Pucheng
>>
>> There were some parts in the implementation where column field ids
>> collided with partition field ids.
>> https://github.com/apache/iceberg/pull/10020 introduced mechanisms for
>> affected code to get unique ids, and known places have been fixed.
>> (Particularly the Spark procedure rewrite_position_deletes and selecting
>> _partition metadata column). It is possible there's other places not fixed
>> yet. Do you have concrete examples of broken functionality on the latest
>> code?
>>
>> Thanks
>> Szehon
>>
>> On Thu, Oct 31, 2024 at 5:16 PM Pucheng Yang <py...@pinterest.com.invalid>
>> wrote:
>>
>>> Hey community,
>>>
>>> I was following https://github.com/apache/iceberg/issues/9220 (Max
>>> number of columns) and down the rabbit hole and I found there are a lot of
>>> discussions about issues with tables having more than 1k columns. However,
>>> after reviewing discussions, it is still a little confusing to me. So I
>>> would like to check here:
>>>
>>> My questions are:
>>> - Without considering performance penalty, do we have a hard limit on
>>> how many columns an Iceberg table can have, and once we exceed that, we
>>> will start to see failures?
>>> - Is it reasonable to say we should still expect some other failures to
>>> happen (but theoretically fixable) after a table has more than 1k columns?
>>>
>>> I would appreciate your answer and I would love to help to document the
>>> answer to my above questions.
>>>
>>> Thanks!
>>>
>>
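
For readers of this thread, here is a rough sketch of why the 1k-column mark comes up. It assumes partition field ids start at 1000 (my understanding of Iceberg's default, not something stated above), so a table with more than about 1000 columns can end up with column field ids that overlap partition field ids. The helper names below are hypothetical and this is not Iceberg's actual code; it only illustrates detecting the overlap and handing out ids that are unique across both sets, which is the general idea described for the fix.

```python
# Illustrative sketch only; not Iceberg's implementation.
# Assumption: partition field ids are assigned starting at 1000.
PARTITION_FIELD_ID_START = 1000


def find_id_collisions(column_ids, partition_field_ids):
    """Return the sorted ids used by both a column and a partition field."""
    return sorted(set(column_ids) & set(partition_field_ids))


def reassign_unique_ids(column_ids, partition_field_ids):
    """Map each partition field id to a fresh id above every id in use.

    Hypothetical helper: picks ids greater than the current maximum so the
    combined set of column and partition ids has no duplicates.
    """
    next_id = max(list(column_ids) + list(partition_field_ids)) + 1
    mapping = {}
    for pid in partition_field_ids:
        mapping[pid] = next_id
        next_id += 1
    return mapping


# A table with 1200 columns (ids 1..1200) and two partition fields.
column_ids = list(range(1, 1201))
partition_field_ids = [PARTITION_FIELD_ID_START, PARTITION_FIELD_ID_START + 1]

collisions = find_id_collisions(column_ids, partition_field_ids)
remapped = reassign_unique_ids(column_ids, partition_field_ids)
```

With fewer than 1000 columns the two id ranges never meet under this assumption, which would explain why the reported problems cluster around tables with more than 1k columns.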