> On 11 Mar 2024, at 20:56, Jelte Fennema-Nio <postg...@jeltef.nl> wrote:
>
> Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)
Thanks!
> Now with the added comments, one thing pops out to me: The comments
> mention that we use "Monotonic Random", but when I read the spec that
> explicitly recommends against using an increment of 1 when using
> monotonic random. I feel like if we use an increment of 1, we're
> better off going for the "Fixed-Length Dedicated Counter Bits" method
> (i.e. change the code to start the counter at 0). See patch 0005 for
> an example of that change.
>
> I'm also wondering if we really want to use the extra rand_b bits for
> this. The spec says we MAY, but it does remove the amount of
> randomness in our UUIDs.
Method 1 is a just a Method 2 with specifically picked constants.
But I'll have to use some hand-wavy wordings...
UUID consists of these 128 bits:
a. Mandatory 2 var and 4 ver bits.
b. Flexible but strongly recommended 48 bits unix_ts_ms. These bits contribute
to global sortability of values generated at frequency less than 1KHz.
c. Counter bits:
c1. Initialised with 0 on any time tick.
c2. Initialised with randomness.
c3*. bit width of a counter step (*not counted in 128 bit capacity, can be
non-integral)
d. Randomness bits.
Method 1 is when c2=0. My implementation of method 2 uses c1=1, c2=17
Consider all UUIDs generated at any given milliseconds. Probability of a
collision of two UUIDs generated at frequency less than 1KHz is p = 2^-(c2+d)
Capacity of a counter has expected value of c = 2^(c1)*2^(c2-1)/2^c3
To guess next UUID you can correctly pick one of u = 2^(d+c3)
First, observe that c3 contributes unguessability at exactly same scale as
decreases counter capacity. There is no difference between using bits in d
directly, or in c3. There is no point in non-zero c3. Every bit that could be
given to c3 can equally be given to d.
Second, observe that c2 bits contribute to both collision protection and
counter capacity! And when the time ticks, c2 also contribute to
unguessability! So, technically, we should consider using all available bits as
c2 bits.
How many c1 bits do we need? I've chosen one - to prevent occasional counter
capacity reduction.
If c1 = 1, we can distribute 73 bits between c2 and d. I've chosen c2 = 17 and
d = 56 as an arbitrary compromise between capacity of one backend per ms and
prevention of global collision.
This compromise is mostly dictated by maximum frequency of UUID generation by
one backend, I've chosen 200MHz as a sane value.
This compromise is much easier when you do not have 74 spare bits, this crazy
amount of information forgives almost any mistake. Imagine you have to
distribute 10 bits between c2 and d. And you try to prevent collision between
10 independent devices which need capacity to generate IDs with frequency of
10KHz each and keep sortability. You would have something like c1=1, c2=3,d=6.
Sorry for this long and vague explanation, if it still seems too uncertain we
can have a chat or something like that. I don't think this number picking stuff
deserve to be commented, because it still is quite close to random. RFC gives
us too much freedom of choice.
Thanks!
Best regards, Andrey Borodin.