+1 for 1).

For 2), I don’t think allowing optional fields as identifier fields would be a 
good idea. If I understand correctly, identifier fields are quite similar to a 
primary key in a relational database. In standard SQL, NULL is never equal to 
NULL (the comparison evaluates to unknown rather than true). If optional fields 
are allowed, then two rows (1, NULL) and (1, NULL) hold exactly the same values 
yet do not compare as equal. The reason why float and double can’t be used as 
identifier fields is similar.
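
To make the equality argument above concrete, here is a minimal Python sketch 
of SQL-style three-valued comparison. The helper names (sql_equals, 
rows_equal) are hypothetical and are not part of Iceberg or any SQL engine; 
None stands in for NULL, and the NaN case is my assumption about why floats 
behave similarly.

```python
def sql_equals(a, b):
    """SQL-style equality: None (UNKNOWN) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def rows_equal(row_1, row_2):
    """Compare two rows column by column using three-valued logic.

    Returns False if any column definitely differs, None (UNKNOWN) if no
    column differs but at least one comparison involves NULL, and True
    only when every column compares equal.
    """
    results = [sql_equals(x, y) for x, y in zip(row_1, row_2)]
    if any(r is False for r in results):
        return False
    if any(r is None for r in results):
        return None
    return True


# Two rows with identical contents, one column being NULL.
null_result = rows_equal((1, None), (1, None))

# Assumed analogue for floats: NaN never compares equal to NaN.
nan_result = rows_equal((1, float("nan")), (1, float("nan")))

# Ordinary required, non-float columns compare as expected.
int_result = rows_equal((1, 2), (1, 2))
```

With these definitions, the NULL rows yield unknown (None), the NaN rows 
yield False, and only the plain integer rows are definitely equal, so neither 
of the first two pairs can be treated as the same identifier.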

From: Jacob Marble <jacobmar...@influxdata.com>
Date: Thursday, August 24, 2023 at 04:18
To: dev@iceberg.apache.org <dev@iceberg.apache.org>
Subject: two proposed spec changes

Good afternoon,

I would like to propose two changes to the Iceberg spec:

1) Primitive types time, timestamp, timestamptz gain property "precision", with 
three possible values: millis, micros, nanos (borrowing the list from 
Parquet<https://github.com/apache/parquet-format/blob/apache-parquet-format-2.9.0/LogicalTypes.md#timestamp>).
The stringified type names would be extended to time[nanos], 
timestamp[millis], timestamptz[micros], allowing for easy fallback to micros 
whenever the suffix is not present.
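
As a rough illustration of the fallback rule described above, the following 
Python sketch parses the proposed type strings. The function name 
parse_time_type and the regular expression are hypothetical and are not taken 
from the Iceberg codebase or from the linked diff; the sketch only assumes the 
three base types and three precision values listed in this proposal.

```python
import re

# Hypothetical pattern: a base type with an optional [precision] suffix.
_TYPE_PATTERN = re.compile(
    r"^(time|timestamp|timestamptz)(?:\[(millis|micros|nanos)\])?$"
)

# Precision assumed when the suffix is absent, per the proposal.
DEFAULT_PRECISION = "micros"


def parse_time_type(type_string):
    """Split a type string such as 'timestamp[millis]' into (base, precision).

    Falls back to micros when no suffix is present, and raises ValueError
    for anything that does not match the proposed syntax.
    """
    match = _TYPE_PATTERN.match(type_string)
    if match is None:
        raise ValueError(f"unrecognized time type: {type_string!r}")
    base, precision = match.groups()
    return base, precision or DEFAULT_PRECISION


explicit = parse_time_type("time[nanos]")
fallback = parse_time_type("timestamp")
```

Here "time[nanos]" keeps its explicit precision, while a bare "timestamp" 
resolves to micros, so existing type strings would keep their current meaning.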

For this proposal, here is a 
diff<https://github.com/apache/iceberg/compare/master...jacobmarble:apache-iceberg:jgm-time-units>
demonstrating the idea just a bit.

2) Identifier fields allowed to be optional. From the spec "it is the 
responsibility of processing engines or data providers to enforce" which means 
that any such provider could limit the use of optional identifiers, just as 
they may limit particular data types or file formats.

To be clear, the spec currently reads "Float, double, and optional fields 
cannot be used as identifier fields and a nested field cannot be used as an 
identifier field if it is nested in an optional struct, to avoid null values in 
identifiers." and I propose "Float and double fields cannot be used as 
identifier fields."

- What do people think of these two proposed changes?
- What can I do next?

The spec mentions 
v3<https://github.com/apache/iceberg/blob/9df8ddb05428cf3d7145bc5cf4a130de36dbb96a/format/spec.md#version-3>;
is there a plan for a v3 release yet? I saw a conversation about enabling v2 
by default, so I assume v3 is a ways off yet.

--
Jacob Marble
🇺🇸 🇺🇦
