Good afternoon,

I would like to propose two changes to the Iceberg spec:

1) *Primitive types time, timestamp, and timestamptz gain a "precision"
property*, with three possible values: millis, micros, nanos (borrowing the
list from Parquet
<https://github.com/apache/parquet-format/blob/apache-parquet-format-2.9.0/LogicalTypes.md#timestamp>).
The stringified type names would be extended to time[nanos],
timestamp[millis], timestamptz[micros], allowing for easy fallback to
micros whenever the suffix is not present.
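
To illustrate the fallback, here is a minimal sketch in Python of how a reader might parse the extended type names. The function name parse_temporal_type and the regular expression are hypothetical and are not taken from the Iceberg codebase or from the linked diff; they only show that a name without a suffix can be read as micros.

```python
import re
from typing import Tuple

# Hypothetical pattern for the proposed stringified names, e.g. "timestamp[millis]".
# The bracketed suffix is optional so that existing names keep working.
_TEMPORAL_TYPE = re.compile(
    r"^(time|timestamp|timestamptz)(?:\[(millis|micros|nanos)\])?$"
)


def parse_temporal_type(name: str) -> Tuple[str, str]:
    """Split a type name into (base type, precision), defaulting to micros."""
    match = _TEMPORAL_TYPE.match(name)
    if match is None:
        raise ValueError(f"not a recognized temporal type name: {name!r}")
    base, precision = match.group(1), match.group(2)
    # Fallback: a missing suffix means the precision used today, micros.
    return base, precision or "micros"
```

With this sketch, parse_temporal_type("timestamp") and parse_temporal_type("timestamp[micros]") give the same result, so existing metadata would not need to be rewritten.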

For this proposal, here is a diff
<https://github.com/apache/iceberg/compare/master...jacobmarble:apache-iceberg:jgm-time-units>
demonstrating the idea just a bit.

2) *Identifier fields are allowed to be optional.* The spec says "it is the
responsibility of processing engines or data providers to enforce", which
means that any such provider could limit the use of optional identifiers,
just as they may limit particular data types or file formats.

To be clear, the spec currently reads "Float, double, and optional fields
cannot be used as identifier fields and a nested field cannot be used as an
identifier field if it is nested in an optional struct, to avoid null
values in identifiers." and I propose "Float and double fields cannot be
used as identifier fields."
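
As a rough sketch of the difference between the two wordings, the following Python compares the current rule with the proposed one for top-level fields. The Field class and both function names are hypothetical, made up for this illustration only, and the nested-struct part of the current rule is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified stand-in for a top-level schema field."""
    name: str
    type: str       # primitive type name, e.g. "long" or "double"
    required: bool  # False means the field is optional


FLOATING_POINT = {"float", "double"}


def allowed_by_current_rule(field: Field) -> bool:
    # Current wording: float, double, and optional fields are rejected.
    return field.required and field.type not in FLOATING_POINT


def allowed_by_proposed_rule(field: Field) -> bool:
    # Proposed wording: only float and double fields are rejected.
    return field.type not in FLOATING_POINT
```

Under these assumptions, an optional long field is rejected by the current rule and accepted by the proposed one, while a double field is rejected by both.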

- What do people think of these two proposed changes?
- What can I do next?

The spec mentions v3
<https://github.com/apache/iceberg/blob/9df8ddb05428cf3d7145bc5cf4a130de36dbb96a/format/spec.md#version-3>;
is there a plan for a v3 release yet? I saw a conversation about enabling
v2 by default, so I assume v3 is a ways off yet.

-- 
Jacob Marble
🇺🇸 🇺🇦
