I don't think that we should introduce nanosecond precision types without at least supporting both timestamp and timestamptz. I'm not sure whether nanosecond time should be supported.
For the primary keys, what is the use case you're trying to solve? Do your tables allow null values in primary keys? If so, what is the purpose of it? On Thu, Aug 24, 2023 at 5:40 PM Jacob Marble <jacobmar...@influxdata.com> wrote: > On Thu, Aug 24, 2023 at 5:28 PM Ryan Blue <b...@tabular.io> wrote: > >> I think it's a good idea to start adding timestamp types with nanosecond >> precision. I've heard this a few times lately and having a column of nanos >> as a work-around isn't a great solution. I'm much more skeptical that we >> should allow millis though. That just introduces more for engines to >> implement and there isn't significant value compared to using micros. That >> said, it isn't that much larger of a change to support another so if there >> is strong support I probably wouldn't oppose it. >> > > InfluxDB needs nanosecond timestamp; we don't need millis, and we don't > need types time or timestamptz. However, it seemed not much effort to fill > the blanks that others might need later. > > A funny wrinkle I learned: Parquet stores TIME(MILLIS) as int32, and all > other TIME and TIMESTAMP values as int64. > > >> For primary keys, I agree with Renjie. The standard NULL != NULL behavior >> makes it difficult to have a key that contains NULL and to have >> expectations of uniform behavior across engines. I don't see how the >> work-arounds are relevant to whether optional fields are allowed in a key. >> Most big data engines don't enforce primary key constraints because a >> uniqueness guarantee would be expensive and confusing (e.g. why is INSERT >> running a self-join?), regardless of whether the values can be optional. I >> don't think it's because they'd need to choose whether to use SQL 3-valued >> boolean logic or implicit null-safe equality. >> > > I simply meant to show that there isn't much of a standard in how primary > keys are handled in big data engines. I'll work with my team to consider > work-arounds. > > On Thu, Aug 24, 2023 at 9:29 AM Jacob Marble <jacobmar...@influxdata.com> >> wrote: >> >>> wrt 2) Agreed, NULL != NULL is standard. The human interpretation is >>> NULL = "unknown". However, exceptions are not uncommon. >>> >>> - MySQL docs state "The NULL value means “no data.” >>> <https://dev.mysql.com/doc/refman/8.1/en/null-values.html> ". >>> - SQL Server accommodates (NULL = NULL) == TRUE via SET ANSI_NULLS OFF >>> <https://learn.microsoft.com/en-us/sql/t-sql/statements/set-ansi-nulls-transact-sql?view=sql-server-ver16> >>> . >>> - Snowflake's external table Iceberg feature does not complain when I >>> have created schemas with optional identifier fields. This makes sense >>> because Snowflake doesn't enforce primary key constraints >>> <https://docs.snowflake.com/en/sql-reference/constraints-overview#supported-constraint-types>, >>> even though it enforces NOT NULL constraints. >>> - Databricks doesn't enforce primary key constraints >>> <https://docs.databricks.com/en/tables/constraints.html#declare-primary-key-and-foreign-key-relationships> >>> . >>> - InfluxDB allows tags (identifier columns) to be NULL or "missing". (I >>> am an employee of InfluxData.) >>> >>> I might argue that, with analytical / data warehouse use cases, >>> identifying columns do not have a common interpretation. Indeed, a >>> query/compute engine may have its own reasons for handling primary >>> >>> On Wed, Aug 23, 2023 at 7:57 PM Renjie Liu <liurenjie2...@gmail.com> >>> wrote: >>> >>>> +1 for 1). >>>> >>>> >>>> >>>> For 2), I don’t think allowing optional field in identifier field would >>>> be a good idea. If I understand correctly, identifier fields is quite >>>> similar to primary key in relation database. In standard sql standard, NULL >>>> != NULL. If optional field is allowed, then two rows (1, NULL), (1, NULL) >>>> have exactly same value while they are not equal. The reason why float, >>>> double can’t be contained in primary key is similar. >>>> >>>> >>>> >>>> *From: *Jacob Marble <jacobmar...@influxdata.com> >>>> *Date: *Thursday, August 24, 2023 at 04:18 >>>> *To: *dev@iceberg.apache.org <dev@iceberg.apache.org> >>>> *Subject: *two proposed spec changes >>>> >>>> Good afternoon, >>>> >>>> >>>> >>>> I would like to propose two changes to the Iceberg spec: >>>> >>>> >>>> >>>> 1) *Primitive types time, timestamp, timestamptz gain property >>>> "precision",* with three possible values: millis, micros, nanos >>>> (borrowing the list from Parquet >>>> <https://github.com/apache/parquet-format/blob/apache-parquet-format-2.9.0/LogicalTypes.md#timestamp>). >>>> The stringified type names would be extended to time[nanos], >>>> timestamp[millis], timestamptz[micros], allowing for easy fallback to >>>> micros whenever the suffix is not present. >>>> >>>> >>>> >>>> For this proposal, here is a diff >>>> <https://github.com/apache/iceberg/compare/master...jacobmarble:apache-iceberg:jgm-time-units> >>>> demonstrating the idea just a bit. >>>> >>>> >>>> >>>> 2) *Identifier fields allowed to be optional.* From the spec "it is >>>> the responsibility of processing engines or data providers to enforce" >>>> which means that any such provider could limit the use of optional >>>> identifiers, just as they may limit particular data types or file formats. >>>> >>>> To be clear, the spec currently reads "Float, double, and optional >>>> fields cannot be used as identifier fields and a nested field cannot be >>>> used as an identifier field if it is nested in an optional struct, to avoid >>>> null values in identifiers." and I propose "Float and double fields cannot >>>> be used as identifier fields." >>>> >>>> >>>> >>>> - What do people think of these two proposed changes? >>>> >>>> - What can I do next? >>>> >>>> >>>> >>>> The spec mentions v3 >>>> <https://github.com/apache/iceberg/blob/9df8ddb05428cf3d7145bc5cf4a130de36dbb96a/format/spec.md#version-3>; >>>> is there a plan for a v3 release yet? I saw a conversation about enabling >>>> v2 by default, so I assume v3 is a ways off yet. >>>> >>>> -- >>>> >>>> Jacob Marble >>>> >>>> 🇺🇸 🇺🇦 >>>> >>> >>> >>> -- >>> Jacob Marble >>> 🇺🇸 🇺🇦 >>> >> >> >> -- >> Ryan Blue >> Tabular >> > > > -- > Jacob Marble > 🇺🇸 🇺🇦 > -- Ryan Blue Tabular