Dear PostgreSQL Community,
 
I have a rather large database with ~250 billion records in a partitioned 
table. The database has been running and ingesting data continuously for about 
3 years.
 
I had a "regular" BTree index on one of the fields (a unique bigint column) but 
it was getting too big for the disk it was on. The index was consuming 6.4 TB 
of disk space.
 
I created a new disk with double the size, dropped the original index, and 
started to generate a new index.
 
After doing some research I decided to try to create a hash index instead of a 
BTree. For my purposes, the index is only used to find specific numbers ("=" 
and "IN" queries). From what I read, the hash index should run a little faster 
than btree for my use case and should use less disk space.
 
After 115 hours, the hash index is still generating and is using significantly 
more disk space than the original BTree index (8.4 TB vs 6.4 TB). I don't know 
how to check the status of the index creation task so I can't really estimate 
how much longer it will take or how much disk space it will consume.
 
Questions:
 
(1) Why is the hash index consuming more disk space than the btree index? Is it 
because the hash of the bigint values larger than the storing the bigints in 
the btree?
(2) Are there any known issues having a hash index on this many records?
(3) Are there any known issues having a hash index on partitioned tables?
(4) Is there any way to estimate when the index process will complete?
 
Server info:
 - PostgreSQL 13
 - Ubuntu 20.04.6 LTS
 - 64 cores (only 1 is ever used during index creation)
 - Memory usage is steady at 58GB/188GB
 - All disks are high speed NVMe drives
 - 1,686 tables in the partition
 
Thanks in advance,
Peter

Reply via email to