Thanks to everyone for all the detailed responses so far.
The dataset we're working with, in its original form, is roughly a
terabyte; I would guess in the database it will come out to about 2TB-ish.
It will definitely not fit in RAM, unfortunately, and our access patterns
will be quite random.
I am using a Postgres instance in an HPC cluster, where they have
generously given me an entire node. This means I have 28 cores and 252GB
RAM. I have to assume that the very conservative default settings for
things like buffers and max working memory are too small here.
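For a dedicated node of that size, a common starting point looks something
like the sketch below. These values are illustrative assumptions based on
the usual rules of thumb (shared_buffers around 25% of RAM, effective_cache_size
around 75%), not tested recommendations for this workload:

```
# postgresql.conf sketch for a dedicated 28-core / 252GB node
# (illustrative starting values; tune against the actual workload)
shared_buffers = 64GB             # ~25% of RAM is the usual rule of thumb
effective_cache_size = 192GB      # planner hint only, allocates nothing
maintenance_work_mem = 4GB        # speeds up index builds and VACUUM
work_mem = 256MB                  # per sort/hash node per backend, so keep modest
max_parallel_workers_per_gather = 8
random_page_cost = 1.1            # assuming SSD/NVMe storage
```

Note that work_mem is allocated per sort/hash operation per backend, so with
many concurrent connections a large value can exhaust memory quickly.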
We have about 20 billion rows.