Re: [DISCUSS] How does Unknown type behave?

Micah Kornfield Wed, 26 Nov 2025 20:47:49 -0800

Hi Joana,
Here are my thoughts, which are by no means the definitive answer here.

> 1. Given that variant can store any data type (both structured and
> primitive), I'm unclear when unknown would be preferred as similar
> behavior could be achieved by adding nullable variant columns? It seems
> like variant could handle most schema evolution scenarios. Are there
> specific situations where unknown is the better choice?

I think the point of the type is to avoid forcing a system to use a
nullable variant column when it can't infer the type. The variant type has
more overhead and can't easily be narrowed to other types solely through a
metadata operation (whereas a NullType can easily be widened to any type as
a metadata operation).

The null type is generally meant for moving from schema-less systems to
ones with a schema, e.g. a CSV file that has an empty value for every
field in a particular column. I think Parquet's description of its
analogous type [1] is a good illustration:

"Sometimes when discovering the schema of existing data, values are always
null and the physical type can't be determined. This annotation signals the
case where the physical type was guessed from all null values."

That being said, I don't think it is necessarily a bad idea if a system
wants to use nullable variants for this use case.
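
To make the CSV case concrete, here is a minimal sketch of schema inference that falls back to an unknown type when a column has no non-empty values. The function names (`infer_column_type`, `infer_schema`) and the type names it returns are illustrative assumptions, not an Iceberg or Parquet API.

```python
import csv
import io


def infer_column_type(values):
    """Guess a simple type name from a list of raw CSV strings (hypothetical helper)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        # Every value is empty, so there is nothing to infer a type from.
        return "unknown"
    # Try the narrowest parser first; fall through to "string" if none fits.
    for name, parser in (("long", int), ("double", float)):
        try:
            for v in non_empty:
                parser(v)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Map each CSV column name to an inferred type name (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {col: infer_column_type([row[col] for row in rows]) for col in rows[0]}


# The "notes" column is empty in every row, so its type cannot be determined.
sample = "id,score,notes\n1,0.5,\n2,1.25,\n"
schema = infer_schema(sample)
```

With this sample, `notes` is the only column reported as unknown; a system could later widen it once real values arrive.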

> 2. Also, is unknown intended for explicit use in DDL? Meaning, should users
> write DDL like:

In general, I don't think there is much of a use case for allowing users to
set this through DDL, other than perhaps cloning it from an existing table.
As you pointed out, someone wishing to keep their options open is likely
better off using variant, or a type that can be widened later.

There are probably multiple ways of handling evolution, but here are two
workable alternatives (I don't think these belong in the Iceberg spec):
1.  Automatically evolve the schema based on the first inserted non-null
value for the column.
2.  Block insertions that try to insert a non-null value in the column
until the user explicitly alters the column to a specific type.
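
The two alternatives above can be sketched as a toy in-memory table. Everything here (the `Table` class, the `policy` values, `UnknownColumnError`, and the `type_of` mapping) is a hypothetical illustration of engine-side behavior, not something defined by the Iceberg spec or any engine.

```python
from dataclasses import dataclass, field


class UnknownColumnError(Exception):
    """Raised when a non-null value targets a column whose type is still unknown."""


def type_of(value):
    """Map a Python value to an illustrative type name (assumed mapping)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


@dataclass
class Table:
    schema: dict                  # column name -> type name
    policy: str = "auto_evolve"   # "auto_evolve" (option 1) or "block" (option 2)
    rows: list = field(default_factory=list)

    def alter_column_type(self, name, new_type):
        # Metadata-only change: no stored rows are rewritten, because every
        # existing value in an unknown column is null.
        if self.schema[name] != "unknown":
            raise ValueError(f"column {name!r} already has type {self.schema[name]}")
        self.schema[name] = new_type

    def insert(self, row):
        # Only unknown columns are handled; other columns are not type-checked here.
        for name, value in row.items():
            if self.schema[name] == "unknown" and value is not None:
                if self.policy == "block":
                    raise UnknownColumnError(f"alter column {name!r} to a concrete type first")
                self.alter_column_type(name, type_of(value))
        self.rows.append(row)


auto = Table({"id": "long", "notes": "unknown"})
auto.insert({"id": 1, "notes": None})   # nulls never force a decision
auto.insert({"id": 2, "notes": "hi"})   # first non-null value widens the column

strict = Table({"notes": "unknown"}, policy="block")
strict.alter_column_type("notes", "string")  # explicit ALTER by the user
strict.insert({"notes": "hi"})
```

In both policies, null inserts succeed without touching the schema; the policies differ only in who decides the concrete type.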

Cheers,
Micah

[1]
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L330

On Tue, Nov 18, 2025 at 4:45 AM Joana Hrotkó
<[email protected]> wrote:

> Hi Iceberg Community,
>
> I'm working with Iceberg v3 and trying to understand the practical use
> cases for the unknown type, especially in relation to the variant type.
>
> The variant type handles both semi-structured data (JSON, nested
> objects/arrays) and primitive types (strings, integers, booleans, dates,
> timestamps, etc.) with efficient binary encoding. It supports schema
> evolution and provides good query performance.
>
> The unknown type is described as being for "evolving schemas without
> forcing immediate resolution" and must always default to null.
>
> 1. Given that variant can store any data type (both structured and
> primitive), I'm unclear when unknown would be preferred as similar
> behavior could be achieved by adding nullable variant columns? It seems
> like variant could handle most schema evolution scenarios. Are there
> specific situations where unknown is the better choice?
>
> 2. Also, is unknown intended for explicit use in DDL? Meaning, should
> users write DDL like:
>
> CREATE TABLE foo (col1 unknown)
> ALTER TABLE foo ADD COLUMN col2 unknown
>
> Or is unknown an internal type that engines use automatically during
> schema evolution?
>
> Cheers,
>
> Joana Hrotkó
>
