I think it's reasonable to fail in cases where the underlying format can't
represent a type, like the element of a list. We can go back and fix this
by adding support for using Parquet's UNKNOWN type annotation
<https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#unknown-always-null>,
but my original concern about it was that it introduces the need to choose
an underlying physical type and I didn't want that to cause future
problems. Maybe we should just standardize that as fixed[0].
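
As a rough illustration of the restriction being discussed in the quoted
thread, here is a minimal Python sketch of a schema check. It rejects
`unknown` as a list element or as a map key or value, and it flags a struct
whose fields are all `unknown` (the Parquet concern raised below). The type
classes and the `violations` helper are hypothetical and written only for
this example; they are not Iceberg's API, and whether the struct case should
be rejected at all is still an open question here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical, minimal type model for illustration only; these are not
# Iceberg's actual classes.
@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "int", "string", "unknown"


@dataclass(frozen=True)
class ListT:
    element: "Type"


@dataclass(frozen=True)
class MapT:
    key: "Type"
    value: "Type"


@dataclass(frozen=True)
class StructT:
    fields: Tuple[Tuple[str, "Type"], ...]


Type = Union[Primitive, ListT, MapT, StructT]
UNKNOWN = Primitive("unknown")


def violations(t, path="root"):
    """Return paths where `unknown` sits in a position that cannot simply be omitted."""
    found = []
    if isinstance(t, ListT):
        # A list with no stored element column has nothing to write.
        if t.element == UNKNOWN:
            found.append(f"{path}.element")
        found += violations(t.element, f"{path}.element")
    elif isinstance(t, MapT):
        # Keys must be non-null; values cannot be dropped without the keys.
        for role, child in (("key", t.key), ("value", t.value)):
            if child == UNKNOWN:
                found.append(f"{path}.{role}")
            found += violations(child, f"{path}.{role}")
    elif isinstance(t, StructT):
        # Assumption from the thread: a struct with only unknown fields
        # would look empty to Parquet once those fields are omitted.
        if t.fields and all(ft == UNKNOWN for _, ft in t.fields):
            found.append(f"{path} (struct with only unknown fields)")
        for name, ft in t.fields:
            found += violations(ft, f"{path}.{name}")
    return found


schema = StructT((
    ("id", Primitive("int")),
    ("note", UNKNOWN),                                # allowed: optional struct field
    ("tags", ListT(UNKNOWN)),                         # flagged: list element
    ("attrs", MapT(Primitive("string"), UNKNOWN)),    # flagged: map value
))
print(violations(schema))
```

An `unknown` field directly inside a struct that also has stored fields
passes the check, which matches the orthogonality argument in the quoted
reply.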

On Mon, Jul 28, 2025 at 1:13 AM Bart Samwel <b...@databricks.com.invalid>
wrote:

> On Sat, Jul 26, 2025 at 6:09 PM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> > My initial idea was to disallow the use of UnknownType as the element
>> in ListType and not allow the UnknownType as either a Key or Value of a
>> MapType. Any thoughts or concerns?
>>
>> That makes sense. I would also include `StructType` here too.
>> `StructType` is another "complex type" (extends NestedType
>> <https://github.com/apache/iceberg/blob/360f87326d4ccf67512a0240e529035801d9db2b/api/src/main/java/org/apache/iceberg/types/Types.java#L1001>)
>> just like `ListType` and `MapType`.
>> This will make `unknown` the first primitive type to not be allowed as
>> part of another complex type.
>>
>
> Do you mean to forbid `UnknownType` inside `StructType`? I'm afraid that
> would undermine the orthogonality of the system. A common use of StructType
> is to store entire rows. If StructType cannot contain elements that are
> UnknownType but top-level rows can, then you can no longer store an
> arbitrary top-level row inside a StructType.
>
> Unfortunately UnknownType in struct does have some issues. In particular,
> if it's not stored, then IIUC you can have issues with structs containing
> only UnknownType fields -- they will look empty to Parquet, and my
> understanding is that that isn't allowed. For orthogonality it would have
> been better to actually store the unknown type, even if it's just as a
> series of "this is NULL" bits. Omitting these fields in storage seems like
> a convenient hack that leads to all sorts of surprising corner cases...
>
>
> On Sat, Jul 26, 2025 at 5:43 AM Fokko Driesprong <fo...@apache.org> wrote:
>>
>>> Hi everyone,
>>>
>>> Recently I took a stab at implementing reading UnknownType
>>> <https://github.com/apache/iceberg/pull/13445> in the Java
>>> implementation. I thought it would make sense to add this to the reference
>>> implementation first. However, I ran into a limitation with the current
>>> definition in the spec:
>>>
>>> Must be optional with null defaults; not stored in data files
>>>
>>>
>>> One obvious limitation is that it cannot be the key of a MapType, since
>>> map keys must be non-null. It also can't be stored as the value of a
>>> MapType, since there is no easy way to store just the keys without doing
>>> awkward things, such as writing the keys as a list.
>>>
>>> My initial idea was to disallow the use of UnknownType as the element
>>> in ListType and not allow the UnknownType as either a Key or Value of a
>>> MapType. Any thoughts or concerns?
>>>
>>> Kind regards from Belgium,
>>> Fokko
>>>
>>
