I think it's reasonable to fail in cases where the underlying format can't represent a type, like the element of a list. We can go back and fix this by adding support for using Parquet's UNKNOWN type annotation <https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#unknown-always-null>, but my original concern about it was that it introduces the need to choose an underlying physical type and I didn't want that to cause future problems. Maybe we should just standardize that as fixed[0].
On Mon, Jul 28, 2025 at 1:13 AM Bart Samwel <b...@databricks.com.invalid> wrote: > On Sat, Jul 26, 2025 at 6:09 PM Kevin Liu <kevinjq...@apache.org> wrote: > >> > My initial idea was to disallow the use of UnknownType as the element >> in ListType and not allow the UnknownType as either a Key or Value of a >> MapType. Any thoughts or concerns? >> >> That makes sense. I would also include `StructType` here too. >> `StructType` is another "complex type" (extends NestedType >> <https://github.com/apache/iceberg/blob/360f87326d4ccf67512a0240e529035801d9db2b/api/src/main/java/org/apache/iceberg/types/Types.java#L1001>) >> just like `ListType` and `MapType`. >> This will make `unknown` the first primitive type to not be allowed as >> part of another complex type. >> > > Do you mean to forbid `UnknownType` inside `StructType`? I'm afraid that > would undermine the orthogonality of the system. A common use of StructType > is to store entire rows. If StructType cannot contain elements that are > UnknownType but top-level rows can, then you can no longer store an > arbitrary top-level row inside a StructType. > > Unfortunately UnknownType in struct does have some issues. In particular, > if it's not stored, then IIUIC you can have issues with structs containing > only UnknownType fields -- they will look empty to Parquet, and my > understanding is that that isn't allowed. For orthogonality it would have > been better to actually store the unknown type, even if it's just as a > series of "this is NULL" bits. Omitting these fields in storage seems like > a convenient hack that leads to all sorts of surprising corner cases... > > > On Sat, Jul 26, 2025 at 5:43 AM Fokko Driesprong <fo...@apache.org> wrote: >> >>> Hi everyone, >>> >>> Recently I took a stab at implementing reading UknownType >>> <https://github.com/apache/iceberg/pull/13445> in the Java >>> implementation. I thought it would make sense to add this to the reference >>> implementation first. However, I ran into a limitation with the current >>> definition in the spec: >>> >>> Must be optional with null defaults; not stored in data files >>> >>> >>> One obvious limitation is that it cannot be the key of a MapType, as it >>> has to be not-null. It can't be stored either as the value of a MapType >>> since there is no easy way to store just the key without doing awkward >>> things, such as writing just the keys as a list. >>> >>> My initial idea was to disallow the use of UnknownType as the element >>> in ListType and not allow the UnknownType as either a Key or Value of a >>> MapType. Any thoughts or concerns? >>> >>> Kind regards from Belgium, >>> Fokko >>> >>