There are a couple of problems with default values. First, they are part of
v3 and haven’t been implemented yet. But the second, larger issue is that
null is a value: a default doesn’t replace a null that was explicitly written
in the data. So I don’t think default values would help here.

What I meant by derived field was to add a field to the Iceberg table that
is a list of your tags concatenated together, like "a,,c" for the tags
("a", null, "c"). That field would always be non-null and would function the
same way.
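
As a rough sketch of what I mean (plain Python for illustration, not an
Iceberg API; the function name and the "," separator are assumptions), the
derived value could be computed like this. It relies on empty string not
being a valid tag value, and it assumes tag values never contain the
separator:

```python
from typing import Optional, Sequence


def derived_tag_key(tags: Sequence[Optional[str]], sep: str = ",") -> str:
    """Join tag values in a fixed column order; a null tag becomes an empty slot.

    Hypothetical helper: assumes no tag value is empty or contains `sep`.
    """
    return sep.join("" if tag is None else tag for tag in tags)


# Tag columns (A, B) taken from the example rows quoted below.
rows = [(None, None), ("foo", None), ("foo", "bar")]
keys = [derived_tag_key(row) for row in rows]
```

Each distinct combination of null and non-null tags maps to a distinct
non-null string, so the derived field plus the timestamp could stand in for
the nullable tag columns.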

On Tue, Aug 29, 2023 at 10:50 AM Jacob Marble <jacobmar...@influxdata.com>
wrote:

> Please define "derived field"?
>
> We don't allow empty string as a tag value, so that sentinel value is
> available. However, there are some second-order effects that need to be
> considered.
>
> Just thinking out loud, I haven't explored using default values for tags
> in our Iceberg export code; certainly need to.
> https://iceberg.apache.org/spec/#default-values
>
> On Tue, Aug 29, 2023 at 9:52 AM Ryan Blue <b...@tabular.io> wrote:
>
>> Jacob, could you model this with a derived field? Or could you
>> require the tags and use an "unknown" value?
>>
>> On Mon, Aug 28, 2023 at 11:18 AM Jacob Marble <jacobmar...@influxdata.com>
>> wrote:
>>
>>> On Fri, Aug 25, 2023 at 3:23 PM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> I don't think that we should introduce nanosecond precision types
>>>> without at least supporting both timestamp and timestamptz. I'm not sure
>>>> whether nanosecond time should be supported.
>>>>
>>>
>>> SGTM; this seems to be the most agreeable part of the proposal.
>>>
>>>> For the primary keys, what is the use case you're trying to solve? Do
>>>> your tables allow null values in primary keys? If so, what is the purpose
>>>> of it?
>>>>
>>>
>>> InfluxDB is a schema-on-write database; tables and columns are created
>>> by writing to them. Constraints:
>>> - Every table has exactly one timestamp[nanos] column, and it is required.
>>> - "Field" columns are typed (int, uint, float, string, bool). These are
>>> the time series data that vary with time. At least one field value is
>>> required, per row.
>>> - "Tag" columns are only strings. These are identifying data - used for
>>> grouping, filtering. Tag values are not required, whether tag columns are
>>> present or not.
>>>
>>> Primary keys are composed of **non-null tags**, plus timestamp. For
>>> example, these rows:
>>>
>>> timestamp | tag A | tag B | field(int) F
>>> 09:25 | null | null | 1
>>> 09:25 | "foo" | null | 1
>>> 09:25 | "foo" | "bar" | 1
>>> 10:25 | "foo" | "bar" | 1
>>>
>>> have these primary keys:
>>>
>>> (09:25)
>>> (09:25,A="foo")
>>> (09:25,A="foo",B="bar")
>>> (10:25,A="foo",B="bar")
>>>
>>> InfluxDB uses these primary keys in two contexts:
>>> - deduplication in query pipelines
>>> - compaction (mitigate performance impact of query-time deduplication)
>>>
>>> --
>>> Jacob Marble
>>> 🇺🇸 🇺🇦
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
>
> --
> Jacob Marble
> 🇺🇸 🇺🇦
>


-- 
Ryan Blue
Tabular
