I was trying to benchmark simple read/write performance for bigint vs UUID. What we see is roughly a 3x degradation when joining on UUID columns compared with bigint columns. Even just generating UUIDs is ~3x slower than generating values from a sequence. Insert performance on an otherwise identical table also suffers: loading ~10 million rows takes ~1 min 39 sec for bigint vs ~3 min 11 sec for UUID. Is this extent of degradation expected for UUID?
CREATE TABLE test1_UUID (id bigint, source_id varchar(36) PRIMARY KEY, Name varchar(20));
CREATE TABLE test2_UUID (id bigint, source_id varchar(36) PRIMARY KEY, Name varchar(20));
CREATE TABLE test1_bigint (id bigint PRIMARY KEY, source_id varchar(36), Name varchar(20));
CREATE TABLE test2_bigint (id bigint PRIMARY KEY, source_id varchar(36), Name varchar(20));

Loaded the same 10 million rows into each.

explain analyze select * from test1_bigint a, test2_bigint b where a.id = b.id;

Merge Join  (cost=12.31..875534.52 rows=10000021 width=100) (actual time=0.042..6974.575 rows=10000000 loops=1)
  Merge Cond: (a.id = b.id)
  ->  Index Scan using test1_bigint_pkey on test1_bigint a  (cost=0.43..362780.75 rows=10000021 width=50) (actual time=0.020..2070.079 rows=10000000 loops=1)
  ->  Index Scan using test2_bigint_2_pkey on test2_bigint b  (cost=0.43..362780.75 rows=10000021 width=50) (actual time=0.019..2131.086 rows=10000000 loops=1)
Planning Time: 0.207 ms
Execution Time: 7311.210 ms

set enable_seqscan=off;
explain analyze select * from test1_UUID a, test2_UUID b where a.source_id = b.source_id;

Merge Join  (cost=2.75..2022857.05 rows=10000021 width=100) (actual time=0.043..21954.213 rows=10000000 loops=1)
  Merge Cond: ((a.source_id)::text = (b.source_id)::text)
  ->  Index Scan using test1_uuid_pkey on test1_UUID a  (cost=0.56..936420.18 rows=10000021 width=50) (actual time=0.022..7854.143 rows=10000000 loops=1)
  ->  Index Scan using test2_uuid_2_pkey on test2_UUID b  (cost=0.56..936437.90 rows=10000021 width=50) (actual time=0.017..7971.187 rows=10000000 loops=1)
Planning Time: 0.516 ms
Execution Time: 22292.801 ms

**********

create sequence myseq cache 32767;

select count(nextval('myseq')) from generate_series(1,10000000);
1 row retrieved starting from 1 in 4 s 521 ms (execution: 4 s 502 ms, fetching: 19 ms)

select count(gen_random_uuid()) from generate_series(1,10000000);
1 row retrieved starting from 1 in 11 s 145 ms (execution: 11 s 128 ms, fetching: 17 ms)

On Mon, 30 Jan, 2023, 4:59 pm veem
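One caveat worth noting about the schema above: the UUID tables store their keys as varchar(36), so every index comparison goes through text collation (visible as the ::text casts in the merge condition). A sketch, assuming one wanted to re-run the benchmark against PostgreSQL's native 16-byte uuid type instead (the table name test1_uuid_native is mine, for illustration):

```sql
-- Native uuid is 16 bytes on disk vs 36+ for varchar(36), and compares
-- as raw bytes rather than via text collation.
CREATE TABLE test1_uuid_native (
    id        bigint,
    source_id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    Name      varchar(20)
);
```

This would not eliminate the gap vs bigint (16 bytes is still double 8), but it separates the cost of UUIDs themselves from the cost of storing them as text.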
v, <veema0...@gmail.com> wrote:

> I have a question. As I understand it, usage-wise there are multiple
> benefits of UUIDs over sequences: for example, in a distributed app
> where we cannot rely on a single point of generation like a sequence,
> in a multi-master architecture, or with sharding.
>
> Purely in terms of performance, the key advantage of a sequence is its
> smaller storage size: more index rows fit in cache, which benefits read
> queries and joins. Fetching a value from a sequence is also cheaper than
> computing a UUID. The downside is on writes: under concurrent data load
> the sequence-keyed index can become a point of contention, since every
> incoming request tries to modify the same table/index page/block. As
> mentioned in this blog
> (https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/),
> UUIDs can be made sequential too, e.g. with a timestamp prefix. But
> isn't making UUIDs sequential actually a disadvantage again? It could
> become a contention point for the unique index, as each incoming write
> would now fight for the same block/page during concurrent data load,
> contending for the same table block or one side of the index
> branch/leaf blocks, whereas with random UUIDs the writes spread across
> many blocks, so there was no contention on any specific block. Please
> correct me if my understanding is wrong.
>
>
>
> On Sun, 29 Jan, 2023, 10:33 am Miles Elam, <miles.e...@productops.com>
> wrote:
>
>> On Sat, Jan 28, 2023 at 8:02 PM Ron <ronljohnso...@gmail.com> wrote:
>> >
>> > Then it's not a Type 4 UUID, which is perfectly fine; just not random.
>>
>> Yep, which is why it really should be re-versioned to UUIDv8 to be
>> pedantic. In everyday use, though, it almost certainly doesn't matter.
>>
>> > Also, should now() be replaced by clock_timestamp(), so that it can be
>> > called multiple times in the same transaction?
>>
>> Not necessary. Instead of 122 bits of entropy, you get 106 bits of
>> entropy and a new incremented prefix every minute. now() vs
>> clock_timestamp() wouldn't make a substantive difference. It should
>> still be reasonably safe against the birthday paradox for more than a
>> century when creating more than a million UUIDs per second.
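For anyone following along, the prefix-timestamp idea discussed upthread can be sketched in a few lines of Python. This is an illustration of the concept only, not the exact algorithm from the 2ndQuadrant post: it uses a minute-granularity rolling prefix (matching the "new incremented prefix every minute" description above) and 106 random bits for the remainder. The function name sequential_uuid is mine.

```python
import os
import time
import uuid

def sequential_uuid(now=None):
    """Illustrative time-prefixed UUID: the top 22 bits are a rolling
    minute counter, the remaining 106 bits are random (matching the
    entropy figure quoted upthread)."""
    minutes = int((now if now is not None else time.time()) // 60)
    prefix = minutes & ((1 << 22) - 1)                 # 22-bit rolling prefix
    rand = int.from_bytes(os.urandom(14), "big") & ((1 << 106) - 1)
    return uuid.UUID(int=(prefix << 106) | rand)

# UUIDs generated within the same minute share a prefix, so b-tree
# inserts cluster in one region of the index instead of scattering.
a = sequential_uuid(now=1_700_000_000)
b = sequential_uuid(now=1_700_000_000)
c = sequential_uuid(now=1_700_000_060)  # one minute later
assert a.int >> 106 == b.int >> 106
assert c.int >> 106 == (a.int >> 106) + 1
```

As the thread notes, this trades the write-spreading of fully random UUIDs for index cache locality: concurrent inserts within a minute do land near each other, which is exactly the contention-vs-locality question raised above.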