There are a couple of problems with default values. First, they are part of v3 and haven’t been implemented yet. But the second, larger issue is that null is a value: a default doesn’t replace a null that was written in the data. So I don’t think default values would help here.
What I meant by derived field was to add a field to the Iceberg table that is a list of your tags concatenated together, like "a,,c". That would always be non-null and would function the same way.

On Tue, Aug 29, 2023 at 10:50 AM Jacob Marble <jacobmar...@influxdata.com> wrote:

> Please define "derived field"?
>
> We don't allow empty string as a tag value, so that sentinel value is
> available. However, there are some second-order effects that need to be
> considered.
>
> Just thinking out loud, I haven't explored using default values for tags
> in our Iceberg export code; certainly need to.
> https://iceberg.apache.org/spec/#default-values
>
> On Tue, Aug 29, 2023 at 9:52 AM Ryan Blue <b...@tabular.io> wrote:
>
>> Jacob, could you model this with a derived field? Or could you
>> require the tags and use a "unknown" value?
>>
>> On Mon, Aug 28, 2023 at 11:18 AM Jacob Marble <jacobmar...@influxdata.com>
>> wrote:
>>
>>> On Fri, Aug 25, 2023 at 3:23 PM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> I don't think that we should introduce nanosecond precision types
>>>> without at least supporting both timestamp and timestamptz. I'm not sure
>>>> whether nanosecond time should be supported.
>>>>
>>>
>>> SGTM; this seems to be the most agreeable part of the proposal.
>>>
>>>> For the primary keys, what is the use case you're trying to solve? Do
>>>> your tables allow null values in primary keys? If so, what is the purpose
>>>> of it?
>>>>
>>>
>>> InfluxDB is a schema-on-write database; tables and columns are created
>>> by writing to them. Constraints:
>>> - Every table has exactly one timestamp[nanos] column, and is required.
>>> - "Field" columns are typed (int, uint, float, string, bool). These are
>>> the time series data that vary with time. At least one field value is
>>> required, per row.
>>> - "Tag" columns are only strings. These are identifying data - used for
>>> grouping, filtering. Tag values are not required, whether tag columns are
>>> present or not.
>>>
>>> Primary keys are composed of **non-null tags**, plus timestamp. For
>>> example, these rows:
>>>
>>> timestamp | tag A | tag B | field(int) F
>>> 09:25     | null  | null  | 1
>>> 09:25     | "foo" | null  | 1
>>> 09:25     | "foo" | "bar" | 1
>>> 10:25     | "foo" | "bar" | 1
>>>
>>> have these primary keys:
>>>
>>> (09:25)
>>> (09:25,A="foo")
>>> (09:25,A="foo",B="bar")
>>> (10:25,A="foo",B="bar")
>>>
>>> InfluxDB uses these primary keys in two contexts:
>>> - deduplication in query pipelines
>>> - compaction (mitigate performance impact of query-time deduplication)
>>>
>>> --
>>> Jacob Marble
>>> 🇺🇸 🇺🇦
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
>
> --
> Jacob Marble
> 🇺🇸 🇺🇦
>

--
Ryan Blue
Tabular
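
P.S. A rough sketch of the derived field described at the top of this message, in case it helps. This is only an illustration, not Iceberg or InfluxDB code: the function name `derive_tag_key`, the `,` separator, and the sample rows (copied from the example quoted above) are assumptions. It also assumes tag values never contain the separator; a real export would need escaping for that.

```python
from typing import Optional, Sequence


def derive_tag_key(tags: Sequence[Optional[str]]) -> str:
    """Concatenate tag values in column order; a null tag becomes an empty slot.

    Assumption: tag values are never the empty string (as stated in the
    thread) and never contain the "," separator, so the result is unambiguous.
    """
    return ",".join("" if tag is None else tag for tag in tags)


# Rows from the quoted example: (timestamp, tag A, tag B, field F)
rows = [
    ("09:25", None, None, 1),
    ("09:25", "foo", None, 1),
    ("09:25", "foo", "bar", 1),
    ("10:25", "foo", "bar", 1),
]

# The derived field is always non-null, so (timestamp, derived key) could
# serve as a required key even when individual tags are null.
keys = [(ts, derive_tag_key([tag_a, tag_b])) for ts, tag_a, tag_b, _ in rows]

for key in keys:
    print(key)
```

Each of the four rows gets a distinct, non-null key here, which is the property the derived field is meant to provide.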