I don't have much experience with Postgres at this scale, so others may have better suggestions. However, at this scale you will end up with a single table of ~30TB+. Someone earlier in this thread suggested ~180 partitions per table as a reasonable upper limit; if that holds, each partition would be ~170GB. That looks bulky for a system where read response time is expected to be in milliseconds.
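For reference, the rough arithmetic I'm using (based on the ~300 bytes/row and ~90 billion row figures quoted below, ignoring index and TOAST overhead):

    90,000,000,000 rows x ~300 bytes/row  ~=  27 TB   (call it ~30 TB with overhead)
    ~30 TB / 180 partitions               ~=  ~170 GB per partition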
On Mon, 5 Feb 2024 at 16:51, Lok P <loknath...@gmail.com> wrote:
>
> On Mon, 5 Feb, 2024, 2:21 am Lok P, <loknath...@gmail.com> wrote:
>
> On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane <htamf...@gmail.com> wrote:
>
> Partitioning will depend a lot on how you access the data. Is it always using that transaction_date column in the where clause? Can you share the table definition via \d? We've talked about this table quite a bit, but not actually seen what it looks like. We are flying blind a little bit. You mentioned your queries are slow, but can you share an explain analyze on one of these slow queries?
>
> 45 billion rows is quite manageable. How many rows are in each day? You may want to do a larger bucket than 24 hours per partition.
>
> 1) Even after partitioning the target table, to speed up the data load on this table, is there an option to disable the primary and foreign keys and re-enable them after the data load finishes? Will that be a good idea?
>
> No.
>
> 3) As the size of the table or each partition is going to be very large and this will be a read-intensive application, compressing the historical partitions will help us save storage space and will also help read query performance.
>
> I am not sure what this means. If you are not going to need the data anymore, dump the data to deep storage and drop the partition.
>
> Cheers,
> Greg
>
> Thank you.
>
> The table has ~127 columns of different data types, combinations of numeric, varchar, date etc. It currently has a size of ~1TB, holding ~3 billion rows, with a row size of ~300 bytes.
>
> Currently the volume is lower, but in future the daily transactions inserted into this table will be at most ~500 million rows/day. The plan is to persist at least ~3 months of transaction data, which will be around 45 billion rows in total. And if all works well, we may need to persist ~6 months worth of data in this database in future, which will be ~90 billion rows.
>
> This table will always be queried with the transaction_date column as one of the filter criteria. But the querying/search criteria can span from a day to a month's worth of transaction_date data.
>
> When you said "*You may want to do a larger bucket than 24 hours per partition*", do you mean partition by week or so? Currently, as per the math (i.e. 1TB of storage for ~3 billion rows), the daily range partition size (to hold ~500 million transactions/day) will be around ~100-150GB. Won't that be too much data for a single partition to operate on, and won't making the granularity coarser (say weekly) make the partitions even bulkier?
>
> What I meant was, we will definitely need the data for querying purposes by the users, but just to keep the storage space in control (and to help the read queries), we were thinking of having the historical partitions compressed. For that, is there any compression strategy we should follow on Postgres?
>
> With regards to loading data into the table faster, I am wondering why you said '*NO*' to the approach of loading the data first and enabling/creating the primary key and foreign key constraints later. Because this table is a child table and the parent already has data in it, loading data into this table in the presence of the PK and FK makes it too slow, as it tries to validate those for each set of rows. So we were thinking that doing it at a later stage in one shot would be a faster approach. Please suggest.
>
> I will try to collect some SELECT queries and post the explain analyze. Currently we are trying to get rid of the duplicates.
>
> Regards
> Lok
>
> Any thoughts, based on the above usage pattern?
>
> While I did the math based on the current stats with limited data sets, the partition sizes come out to 100 to 150GB, as I explained above, if we keep daily range partitions. Should we make them hourly then?
>
> So I want some expert views on whether this is going to work fine for a read-latency-intensive application, or whether we should follow some different strategy?
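In case it helps the partition-sizing discussion above, here is a minimal sketch of what daily range partitioning on transaction_date could look like. The table and column names are placeholders, since we haven't seen the actual \d output:

    CREATE TABLE transaction_tab (
        transaction_id    bigint       NOT NULL,
        transaction_date  timestamptz  NOT NULL,
        -- ... the remaining ~125 columns ...
        PRIMARY KEY (transaction_id, transaction_date)  -- the PK must include the partition key
    ) PARTITION BY RANGE (transaction_date);

    -- one partition per day, created ahead of time (e.g. by a cron job or pg_partman)
    CREATE TABLE transaction_tab_2024_02_05
        PARTITION OF transaction_tab
        FOR VALUES FROM ('2024-02-05') TO ('2024-02-06');

    -- retention: detach the oldest partition, archive it (pg_dump), then drop it
    ALTER TABLE transaction_tab DETACH PARTITION transaction_tab_2023_11_05;
    DROP TABLE transaction_tab_2023_11_05;

    -- and the kind of plan Greg asked for, for one of the slow date-range queries:
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*)
      FROM transaction_tab
     WHERE transaction_date >= DATE '2024-02-01'
       AND transaction_date <  DATE '2024-02-08';

With the WHERE clause always on transaction_date, the planner should prune to just the partitions in range (seven daily partitions in the example), which is what keeps per-query latency roughly independent of the total table size rather than of the 30TB+ whole.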