Hi
On 7/8/24 06:38, Willy Tarreau wrote:
> Well, I must confess that while I understand what you tried to do, it's
> less clear what is the problem you wanted to address. I think that your
> concern is the possible lack of monotonicity, is that it ?
An example is probably best to explain the requirements. When I do the
following:
http-after-response set-header test1 %[uuid(7)]
http-after-response set-header test2 %[uuid(7)]
http-after-response set-header test3 %[uuid(7)]
http-after-response set-header test4 %[uuid(7)]
http-after-response set-header test5 %[uuid(7)]
http-after-response set-header test6 %[uuid(7)]
http-after-response set-header test7 %[uuid(7)]
http-after-response set-header test8 %[uuid(7)]
then the returned UUIDs need to be "ascending" as per RFC 9562#6.2. The
current implementation does not guarantee that, because the embedded
timestamp only has millisecond resolution and the rest is completely
random. That means that the test6 UUID might be lower than the test2 UUID.
RFC 9562 provides several options in section 6.2: either (1) the bits
following the millisecond part are a simple counter that resets every
millisecond, (2) they are used to store additional clock precision (e.g.
micro- or nanoseconds), or (3) the random part is not actually random and
instead starts at a random value and is incremented by a random amount
every time.
My patch chooses option (2): storing additional clock precision.
> In any case there are much less risks of collisions by filling bits
> with random than with clock bits. Here if I read correctly, you're
> only keeping 64 bits of random, which means that on average you'll
> experience a collision every 2^32 generations. At 50k/s, it happens
> once a day. With the 96 bits used previously, it will only happen
> every 2^48, which is once every 178 years (which obviously reduces
> in distributed systems).
My patch does the following:
48 bits: Unix timestamp with milliseconds.
4 bits: UUID version.
12 bits: Number of 256 ns periods elapsed since the start of the current millisecond.
2 bits: UUID variant.
62 bits: Random.
Indeed a collision with 64 bits of randomness is expected every 2^32
values, but you need to keep in mind that a collision will only be a
collision if it happens in the same millisecond + nanosecond offset. So
it actually is a comparison between:
1 ms resolution + 74 bits of randomness
256 ns resolution + 62 bits of randomness
> Finally, the purpose of having the timestamp in UUIDv7 (like ULID)
> is to group simultaneous operations. They keep caches hot, they allow
> events to be found grouped in logs, and offer easier purge mechanisms
> for recycling entries (no longer need to look up, just apply a time
> offset and eliminate all those below). In that sense there's no quest
> for extreme precision (which cannot work on distributed systems anyway),
> rather for a reasonably good and reasonably monotonic ordering that
> still prevents collisions across multiple systems.
RFC 9562 section 6.2 specifies that:

    Take care to ensure UUIDs generated in batches are also monotonic. That
    is, if one thousand UUIDs are generated for the same timestamp, there
    should be sufficient logic for organizing the creation order of those
    one thousand UUIDs.

and that is what this patch attempts to do.
> So if that's also what you're looking for, I'm proposing that we
> instead change the use of the unreliable date variable with a call
> to now_mono_date_ns() that we can then use as the date source, and
> keep the rest of the randoms.
As explained above, your suggested patch does not do the right thing: as long as the bits after the timestamp remain fully random, UUIDs generated within the same millisecond are still not monotonic, whatever the clock source.
Best regards
Tim Düsterhus