Re: Sequence vs UUID

veem v Thu, 02 Feb 2023 11:47:53 -0800

Tested the UUIDv7 generator for postgres as below.

With regards to performance , It's still way behind the sequence. I was
expecting the insert performance of UUID v7 to be closer to the sequence ,
but it doesn't seem so, as it's 500ms vs 3000ms. And the generation takes a
lot longer time as compared to sequence too i.e. 59ms vs 1700ms. Read time
or the index scan looks close i.e. 2.3ms vs 2.6ms.


*************** UUID7 Alternate way ***********************
create or replace function uuid_generate_v7()
returns uuid
as $$
declare
  unix_ts_ms bytea;
  uuid_bytes bytea;
begin
  unix_ts_ms = substring(int8send(floor(extract(epoch from
clock_timestamp()) * 1000)::bigint) from 3);

  -- use random v4 uuid as starting point (which has the same variant we
need)
  uuid_bytes = uuid_send(gen_random_uuid());

  -- overlay timestamp
  uuid_bytes = overlay(uuid_bytes placing unix_ts_ms from 1 for 6);

  -- set version 7
  uuid_bytes = set_byte(uuid_bytes, 6, (b'0111' || get_byte(uuid_bytes,
6)::bit(4))::bit(8)::int);

  return encode(uuid_bytes, 'hex')::uuid;
end
$$
language plpgsql
volatile;


*************** Postgress *****************
CREATE UNLOGGED TABLE test_bigint (  id bigint  PRIMARY KEY);
 CREATE UNLOGGED TABLE test_uuid (   id uuid  PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid7 (   id uuid  PRIMARY KEY);
create sequence myseq cache 32767;

************* sequence generation vs UUID generation Test**************
explain analyze  select count(nextval('myseq') ) from
generate_series(1,100000);

QUERY PLAN
Aggregate  (cost=1500.00..1500.01 rows=1 width=8) (actual
time=59.037..59.039 rows=1 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=100000
width=0) (actual time=18.541..34.200 rows=100000 loops=1)
Planning Time: 0.099 ms
Execution Time: 59.687 ms

explain analyze  select count(gen_random_uuid()) from
generate_series(1,100000);

QUERY PLAN
Aggregate  (cost=1500.00..1500.01 rows=1 width=8) (actual
time=900.633..900.634 rows=1 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=100000
width=0) (actual time=12.893..65.820 rows=100000 loops=1)
Planning Time: 0.051 ms
Execution Time: 904.868 ms

explain analyze  select count(uuid_generate_v7()) from
generate_series(1,100000);

QUERY PLAN
Aggregate  (cost=26250.00..26250.01 rows=1 width=8) (actual
time=1710.609..1710.610 rows=1 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=100000
width=0) (actual time=21.807..69.168 rows=100000 loops=1)
Planning Time: 0.048 ms
Execution Time: 1711.187 ms

*************** Insert Test***************

explain analyze INSERT INTO test_bigint select nextval('myseq')  from
generate_series(1,100000);

QUERY PLAN
Insert on test_bigint  (cost=0.00..2250.00 rows=0 width=0) (actual
time=551.707..551.708 rows=0 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1250.00 rows=100000
width=8) (actual time=12.438..91.337 rows=100000 loops=1)
Planning Time: 0.053 ms
Execution Time: 552.240 ms

explain analyze  INSERT INTO test_uuid  select gen_random_uuid()  from
generate_series(1,100000);

QUERY PLAN
Insert on test_uuid  (cost=0.00..2250.00 rows=0 width=0) (actual
time=2040.743..2040.744 rows=0 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..1250.00 rows=100000
width=16) (actual time=12.829..982.446 rows=100000 loops=1)
Planning Time: 0.051 ms
Execution Time: 2041.242 ms

explain analyze  INSERT INTO test_uuid7  select uuid_generate_v7()  from
generate_series(1,100000);

QUERY PLAN
Insert on test_uuid7  (cost=0.00..27000.00 rows=0 width=0) (actual
time=3234.456..3234.457 rows=0 loops=1)
  ->  Function Scan on generate_series  (cost=0.00..26000.00 rows=100000
width=16) (actual time=12.453..1933.217 rows=100000 loops=1)
Planning Time: 0.051 ms
Execution Time: 3234.971 ms

************ Index performance********

Explain analyze select * from test_bigint where id in (select id from
test_bigint limit 1000);

EXPLAIN
QUERY PLAN
Nested Loop  (cost=27.22..1322.70 rows=1000 width=8) (actual
time=0.414..2.307 rows=1000 loops=1)
  ->  HashAggregate  (cost=26.92..28.92 rows=200 width=8) (actual
time=0.370..0.520 rows=1000 loops=1)
        Group Key: test_bigint_1.id
        Batches: 1  Memory Usage: 145kB
        ->  Limit  (cost=0.00..14.42 rows=1000 width=8) (actual
time=0.012..0.163 rows=1000 loops=1)
              ->  Seq Scan on test_bigint test_bigint_1
 (cost=0.00..1444.18 rows=100118 width=8) (actual time=0.011..0.093
rows=1000 loops=1)
  ->  Index Only Scan using test_bigint_pkey on test_bigint
 (cost=0.29..6.53 rows=1 width=8) (actual time=0.001..0.001 rows=1
loops=1000)
        Index Cond: (id = test_bigint_1.id)
        Heap Fetches: 1000
Planning Time: 0.395 ms
Execution Time: 2.395 ms


Explain analyze  select * from test_uuid where  id in (select id from
test_uuid limit 1000);

QUERY PLAN
Nested Loop  (cost=28.32..1459.58 rows=1000 width=16) (actual
time=0.431..3.355 rows=1000 loops=1)
  ->  HashAggregate  (cost=27.91..29.91 rows=200 width=16) (actual
time=0.399..0.556 rows=1000 loops=1)
        Group Key: test_uuid_1.id
        Batches: 1  Memory Usage: 145kB
        ->  Limit  (cost=0.00..15.41 rows=1000 width=16) (actual
time=0.011..0.185 rows=1000 loops=1)
              ->  Seq Scan on test_uuid test_uuid_1  (cost=0.00..1541.85
rows=100085 width=16) (actual time=0.010..0.093 rows=1000 loops=1)
  ->  Index Only Scan using test_uuid_pkey on test_uuid  (cost=0.42..7.21
rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=1000)
        Index Cond: (id = test_uuid_1.id)
        Heap Fetches: 1000
Planning Time: 0.234 ms
Execution Time: 3.419 ms

Explain analyze  select * from test_uuid7 where  id in (select id from
test_uuid7 limit 1000);

QUERY PLAN
Nested Loop  (cost=28.32..1416.01 rows=1000 width=16) (actual
time=0.403..2.586 rows=1000 loops=1)
  ->  HashAggregate  (cost=27.91..29.91 rows=200 width=16) (actual
time=0.371..0.546 rows=1000 loops=1)
        Group Key: test_uuid7_1.id
        Batches: 1  Memory Usage: 145kB
        ->  Limit  (cost=0.00..15.41 rows=1000 width=16) (actual
time=0.011..0.161 rows=1000 loops=1)
              ->  Seq Scan on test_uuid7 test_uuid7_1  (cost=0.00..1541.85
rows=100085 width=16) (actual time=0.010..0.091 rows=1000 loops=1)
  ->  Index Only Scan using test_uuid7_pkey on test_uuid7  (cost=0.42..6.99
rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=1000)
        Index Cond: (id = test_uuid7_1.id)
        Heap Fetches: 1000
Planning Time: 0.101 ms
Execution Time: 2.661 ms

On Thu, 2 Feb 2023 at 20:52, Benedict Holland <[email protected]>
wrote:

> Well... until two processes generate an identical UUID. That happened to
> me several times. It's rare but when that happens, oh boy that is a mess to
> figure out.
>
> Thanks,
> Ben
>
> On Thu, Feb 2, 2023, 10:17 AM Miles Elam <[email protected]>
> wrote:
>
>> On Wed, Feb 1, 2023 at 10:48 AM Kirk Wolak <[email protected]> wrote:
>>
>>>
>>>
>>> On Wed, Feb 1, 2023 at 1:34 PM veem v <[email protected]> wrote:
>>>
>>>>
>>>> 1) sequence generation vs UUID generation, execution time increased
>>>> from ~291ms to 5655ms.
>>>> 2) Insert performance of "sequence" vs "UUID"  execution time increased
>>>> from ~2031ms to 10599ms.
>>>> 3) Index performance for sequence vs UUID,  execution time increased
>>>> from ~.3ms to .5ms.
>>>>
>>>>
>>> Yes, assuming that UUIDs would be efficient as keys when they are
>>> randomly generated, versus sequences (which tend to expand in one
>>> direction, and have been relatively optimized for years).
>>>
>>> This article explains in detail what is going on.  If I were doing this,
>>> I would strongly consider a ULID because of getting the best of both worlds.
>>> https://blog.daveallie.com/ulid-primary-keys
>>>
>>> Of course, YMMV...  And since ULIDs are not native to PG, there is
>>> overhead, but it is far more performant, IMO...
>>>
>>
>> Biased comparison. ULIDs have a timestamp component. The closest UUID
>> equivalent in Postgres is UUIDv1 from the uuid-ossp extension, not v4.
>> Another difference not mentioned in the blog article is that UUID is
>> versioned, meaning you can figure out what kind of data is in the UUID,
>> whereas ULIDs are a "one size fits all" solution.
>>
>> There is an implementation of sequential UUIDs for Postgres I posted
>> earlier in this thread. In addition, here is an implementation of UUIDv7
>> for Postgres:
>>
>> https://gist.github.com/kjmph/5bd772b2c2df145aa645b837da7eca74
>>
>> I would suggest running your tests against v1, v7, and sequential UUID
>> before jumping on ULID, which has no native type/indexing in Postgres.
>>
>> It should also be noted that apps cannot provide a bigint ID due to
>> collisions, but an app can generate UUIDs and ULIDs without fear,
>> essentially shifting the generation time metric in UUID/ULID's favor over a
>> bigserial.
>>
>> - Miles
>>
>>
>>

Re: Sequence vs UUID

Reply via email to