Hi
On 7/8/24 06:38, Willy Tarreau wrote:
> Well, I must confess that while I understand what you tried to do, it's
> less clear what is the problem you wanted to address. I think that your
> concern is the possible lack of monotonicity, is that it ?
An example is probably best to explain the requirements. When I do the
following:
http-after-response set-header test1 %[uuid(7)]
http-after-response set-header test2 %[uuid(7)]
http-after-response set-header test3 %[uuid(7)]
http-after-response set-header test4 %[uuid(7)]
http-after-response set-header test5 %[uuid(7)]
http-after-response set-header test6 %[uuid(7)]
http-after-response set-header test7 %[uuid(7)]
http-after-response set-header test8 %[uuid(7)]
then the returned UUIDs need to be "ascending" as per RFC 9562#6.2. The
current implementation does not guarantee that, because the embedded
timestamp only has millisecond resolution and the rest is completely
random. That means that the test6 UUID might be lower than the test2 UUID.
RFC 9562 provides several options in section 6.2: either (1) the bits
following the millisecond part are a simple counter that resets every
millisecond, (2) they are used to store additional clock precision (e.g.
micro- or nanoseconds), or (3) the random part is not actually random and
instead starts at a random value and is incremented by a random amount
every time.
My patch chooses option (2): storing additional clock precision.
> In any case there are much less risks of collisions by filling bits
> with random than with clock bits. Here if I read correctly, you're
> only keeping 64 bits of random, which means that on average you'll
> experience a collision every 2^32 generations. At 50k/s, it happens
> once a day. With the 96 bits used previously, it will only happen
> every 2^48, which is once every 178 years (which obviously reduces
> in distributed systems).
My patch does the following:
48 bits: Unix timestamp with milliseconds.
4 bits: UUID version.
12 bits: Number of 256 ns periods elapsed since the start of the current millisecond.
2 bits: UUID variant.
62 bits: Random.
Indeed a collision with 64 bits of randomness is expected every 2^32
values, but you need to keep in mind that a collision will only be a
collision if it happens in the same millisecond + nanosecond offset. So
it actually is a comparison between:
1 ms resolution + 74 bits of randomness
256 ns resolution + 62 bits of randomness
> Finally, the purpose of having the timestamp in UUIDv7 (like ULID)
> is to group simultaneous operations. They keep caches hot, they allow
> events to be found grouped in logs, and offer easier purge mechanisms
> for recycling entries (no longer need to look up, just apply a time
> offset and eliminate all those below). In that sense there's no quest
> for extreme precision (which cannot work on distributed systems anyway),
> rather for a reasonably good and reasonably monotonic ordering that
> still prevents collisions across multiple systems.
RFC 9562 section 6.2 specifies that:

    Take care to ensure UUIDs generated in batches are also monotonic. That
    is, if one thousand UUIDs are generated for the same timestamp, there
    should be sufficient logic for organizing the creation order of those
    one thousand UUIDs.

and that is what this patch attempts to do.
> So if that's also what you're looking for, I'm proposing that we
> instead change the use of the unreliable date variable with a call
> to now_mono_date_ns() that we can then use as the date source, and
> keep the rest of the randoms.
As explained above, your suggested patch does not do the right thing: as long as the bits after the timestamp remain fully random, UUIDs generated within the same millisecond are still not monotonic, whatever the clock source.
Best regards
Tim Düsterhus