On Sat, Jul 26, 2025 at 6:09 PM Kevin Liu <kevinjq...@apache.org> wrote:
>
> My initial idea was to disallow the use of UnknownType as the element
> in ListType and not allow the UnknownType as either a Key or Value of a
> MapType. Any thoughts or concerns?
>
> That makes sense. I would include `StructType` here too. `StructType`
> is another "complex type" (extends NestedType
> <https://github.com/apache/iceberg/blob/360f87326d4ccf67512a0240e529035801d9db2b/api/src/main/java/org/apache/iceberg/types/Types.java#L1001>)
> just like `ListType` and `MapType`.
> This will make `unknown` the first primitive type to not be allowed as
> part of another complex type.
>

Do you mean to forbid `UnknownType` inside `StructType`? I'm afraid that
would undermine the orthogonality of the system. A common use of StructType
is to store entire rows. If StructType cannot contain elements that are
UnknownType but top-level rows can, then you can no longer store an
arbitrary top-level row inside a StructType.

Unfortunately UnknownType in struct does have some issues. In particular,
if it's not stored, then IIUC you can have issues with structs containing
only UnknownType fields -- they will look empty to Parquet, and my
understanding is that that isn't allowed.

For orthogonality it would have been better to actually store the unknown
type, even if it's just as a series of "this is NULL" bits. Omitting these
fields in storage seems like a convenient hack that leads to all sorts of
surprising corner cases...

On Sat, Jul 26, 2025 at 5:43 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hi everyone,
>>
>> Recently I took a stab at implementing reading UnknownType
>> <https://github.com/apache/iceberg/pull/13445> in the Java
>> implementation. I thought it would make sense to add this to the reference
>> implementation first. However, I ran into a limitation with the current
>> definition in the spec:
>>
>> Must be optional with null defaults; not stored in data files
>>
>>
>> One obvious limitation is that it cannot be the key of a MapType, as keys
>> have to be non-null. It can't be stored either as the value of a MapType
>> since there is no easy way to store just the key without doing awkward
>> things, such as writing just the keys as a list.
>>
>> My initial idea was to disallow the use of UnknownType as the element in
>> ListType and not allow the UnknownType as either a Key or Value of a
>> MapType. Any thoughts or concerns?
>>
>> Kind regards from Belgium,
>> Fokko
>>
>
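
For illustration only, the restriction proposed in this thread and the struct corner case raised in the reply can be sketched with a toy type model. This is not the Iceberg API: `Primitive`, `ListT`, `MapT`, `StructT`, `violations` and `stored_fields` are hypothetical names invented for this sketch, and the rule encoded is just the one discussed here (unknown rejected as list element, map key, or map value, but still allowed as a struct field).

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Toy stand-ins for schema types; these are NOT the Iceberg classes.
@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "int", "string", "unknown"


@dataclass(frozen=True)
class ListT:
    element: "Type"


@dataclass(frozen=True)
class MapT:
    key: "Type"
    value: "Type"


@dataclass(frozen=True)
class StructT:
    fields: Tuple[Tuple[str, "Type"], ...]  # (field name, field type) pairs


Type = Union[Primitive, ListT, MapT, StructT]
UNKNOWN = Primitive("unknown")


def violations(t, path="root"):
    """Return the paths where unknown is a list element, map key, or map value."""
    found = []
    if isinstance(t, ListT):
        if t.element == UNKNOWN:
            found.append(f"{path}.element")
        found += violations(t.element, f"{path}.element")
    elif isinstance(t, MapT):
        for role, child in (("key", t.key), ("value", t.value)):
            if child == UNKNOWN:
                found.append(f"{path}.{role}")
            found += violations(child, f"{path}.{role}")
    elif isinstance(t, StructT):
        # Unknown is still accepted as a struct field; only recurse.
        for name, child in t.fields:
            found += violations(child, f"{path}.{name}")
    return found


def stored_fields(s):
    """Field names left in a struct once unknown fields are omitted from storage."""
    return [name for name, child in s.fields if child != UNKNOWN]


row = StructT((
    ("id", Primitive("int")),
    ("tags", ListT(UNKNOWN)),
    ("attrs", MapT(Primitive("string"), UNKNOWN)),
    ("note", UNKNOWN),
))
print(violations(row))  # ['root.tags.element', 'root.attrs.value']

only_unknown = StructT((("a", UNKNOWN), ("b", UNKNOWN)))
print(stored_fields(only_unknown))  # [] -> nothing left to write for this struct
```

The second call shows the corner case from the reply: if unknown fields are simply dropped from storage, a struct whose fields are all unknown has nothing left to write.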