+1 for 1). For 2), I don’t think allowing optional fields as identifier fields is a good idea. If I understand correctly, identifier fields are quite similar to a primary key in a relational database. In standard SQL, NULL is not equal to NULL (the comparison NULL = NULL evaluates to unknown rather than true). If optional fields were allowed, then two rows (1, NULL) and (1, NULL) would hold exactly the same values yet not compare as equal. The reason why float and double can’t be part of a primary key is similar.
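To make the comparison point concrete, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in engine here, not something Iceberg prescribes, and the NaN check at the end is my assumption about what the "similar" reason for float and double refers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In SQL, NULL = NULL evaluates to NULL (unknown), not true.
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Column-wise comparison of the rows (1, NULL) and (1, NULL):
# the first column matches, but the second comparison is unknown,
# so the rows cannot be shown to be equal.
row_eq = conn.execute("SELECT (1 = 1) AND (NULL = NULL)").fetchone()[0]

conn.close()

# Assumed analogue for float/double: NaN never compares equal to itself.
nan = float("nan")
nan_eq = nan == nan

print(null_eq)  # None (SQL NULL)
print(row_eq)   # None (SQL NULL)
print(nan_eq)   # False
```

With an optional identifier field, an engine that matches rows by identifier equality would have no way to decide that the two (1, NULL) rows refer to the same record.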
From: Jacob Marble <jacobmar...@influxdata.com>
Date: Thursday, August 24, 2023 at 04:18
To: dev@iceberg.apache.org <dev@iceberg.apache.org>
Subject: two proposed spec changes

Good afternoon,

I would like to propose two changes to the Iceberg spec:

1) Primitive types time, timestamp, timestamptz gain property "precision", with three possible values: millis, micros, nanos (borrowing the list from Parquet<https://github.com/apache/parquet-format/blob/apache-parquet-format-2.9.0/LogicalTypes.md#timestamp>). The stringified type names would be extended to time[nanos], timestamp[millis], timestamptz[micros], allowing for easy fallback to micros whenever the suffix is not present. For this proposal, here is a diff<https://github.com/apache/iceberg/compare/master...jacobmarble:apache-iceberg:jgm-time-units> demonstrating the idea just a bit.

2) Identifier fields allowed to be optional. From the spec "it is the responsibility of processing engines or data providers to enforce" which means that any such provider could limit the use of optional identifiers, just as they may limit particular data types or file formats. To be clear, the spec currently reads "Float, double, and optional fields cannot be used as identifier fields and a nested field cannot be used as an identifier field if it is nested in an optional struct, to avoid null values in identifiers." and I propose "Float and double fields cannot be used as identifier fields."

- What do people think of these two proposed changes?
- What can I do next?

The spec mentions v3<https://github.com/apache/iceberg/blob/9df8ddb05428cf3d7145bc5cf4a130de36dbb96a/format/spec.md#version-3>; is there a plan for a v3 release yet? I saw a conversation about enabling v2 by default, so I assume v3 is a ways off yet.

--
Jacob Marble 🇺🇸 🇺🇦