I don't have much experience with Postgres at this scale, so others may have better suggestions. However, at this scale you will end up with a single table of ~30TB+. Someone earlier in this thread suggested ~180 partitions per table as a reasonable upper limit; if that holds, each partition would be ~170GB. That looks bulky for a system where read response time is expected to be in milliseconds.
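For reference, the rough arithmetic I'm using (based on the ~300 bytes/row and ~90 billion row figures quoted below, ignoring index and TOAST overhead):

    90,000,000,000 rows x ~300 bytes/row  ~=  27 TB   (call it ~30 TB with overhead)
    ~30 TB / 180 partitions               ~=  ~170 GB per partition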
On Mon, 5 Feb 2024 at 16:51, Lok P <loknath...@gmail.com> wrote:
>
> On Mon, 5 Feb, 2024, 2:21 am Lok P, <loknath...@gmail.com> wrote:
>
> On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane <htamf...@gmail.com> wrote:
>
> Partitioning will depend a lot on how you access the data. Is it always using that transaction_date column in the where clause? Can you share the table definition via \d? We've talked about this table quite a bit, but not actually seen what it looks like. We are flying blind a little bit. You mentioned your queries are slow, but can you share an explain analyze on one of these slow queries?
>
> 45 billion rows is quite manageable. How many rows are in each day? You may want to do a larger bucket than 24 hours per partition.
>
> 1) Even after partitioning the target table, to speed up the data load on this table, is there an option to disable the primary and foreign keys and re-enable them after the data load finishes? Will that be a good idea?
>
> No.
>
> 3) As the size of the table or each partition is going to be very large and this will be a read-intensive application, compressing the historical partitions will help us save storage space and will also help read query performance.
>
> I am not sure what this means. If you are not going to need the data anymore, dump the data to deep storage and drop the partition.
>
> Cheers,
> Greg
>
> Thank you.
>
> The table has ~127 columns of different data types, combinations of numeric, varchar, date etc. It currently has a size of ~1TB, holding ~3 billion rows, with a row size of ~300 bytes.
>
> Currently the volume is lower, but in future the daily transactions inserted into this table will be at most ~500 million rows/day. The plan is to persist at least ~3 months of transaction data, which will be around 45 billion rows in total. And if all works well, we may need to persist ~6 months worth of data in this database in future, which will be ~90 billion rows.
>
> This table will always be queried with the transaction_date column as one of the filter criteria. But the querying/search criteria can span from a day to a month's worth of transaction_date data.
>
> When you said "*You may want to do a larger bucket than 24 hours per partition*", do you mean partition by week or so? Currently, as per the math (i.e. 1TB of storage for ~3 billion rows), the daily range partition size (to hold ~500 million transactions/day) will be around ~100-150GB. Won't that be too much data for a single partition to operate on, and won't making the granularity coarser (say weekly) make the partitions even bulkier?
>
> What I meant was, we will definitely need the data for querying purposes by the users, but just to keep the storage space in control (and to help the read queries), we were thinking of having the historical partitions compressed. For that, is there any compression strategy we should follow on Postgres?
>
> With regards to loading data into the table faster, I am wondering why you said '*NO*' to the approach of loading the data first and enabling/creating the primary key and foreign key constraints later. Because this table is a child table and the parent already has data in it, loading data into this table in the presence of the PK and FK makes it too slow, as it tries to validate those for each set of rows. So we were thinking that doing it at a later stage in one shot would be a faster approach. Please suggest.
>
> I will try to collect some SELECT queries and post the explain analyze. Currently we are trying to get rid of the duplicates.
>
> Regards
> Lok
>
> Any thoughts, based on the above usage pattern?
>
> While I did the math based on the current stats with limited data sets, the partition sizes come out to 100 to 150GB, as I explained above, if we keep daily range partitions. Should we make them hourly then?
>
> So I want some expert views on whether this is going to work fine for a read-latency-intensive application, or whether we should follow some different strategy?
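In case it helps the partition-sizing discussion above, here is a minimal sketch of what daily range partitioning on transaction_date could look like. The table and column names are placeholders, since we haven't seen the actual \d output:

    CREATE TABLE transaction_tab (
        transaction_id    bigint       NOT NULL,
        transaction_date  timestamptz  NOT NULL,
        -- ... the remaining ~125 columns ...
        PRIMARY KEY (transaction_id, transaction_date)  -- the PK must include the partition key
    ) PARTITION BY RANGE (transaction_date);

    -- one partition per day, created ahead of time (e.g. by a cron job or pg_partman)
    CREATE TABLE transaction_tab_2024_02_05
        PARTITION OF transaction_tab
        FOR VALUES FROM ('2024-02-05') TO ('2024-02-06');

    -- retention: detach the oldest partition, archive it (pg_dump), then drop it
    ALTER TABLE transaction_tab DETACH PARTITION transaction_tab_2023_11_05;
    DROP TABLE transaction_tab_2023_11_05;

    -- and the kind of plan Greg asked for, for one of the slow date-range queries:
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*)
      FROM transaction_tab
     WHERE transaction_date >= DATE '2024-02-01'
       AND transaction_date <  DATE '2024-02-08';

With the WHERE clause always on transaction_date, the planner should prune to just the partitions in range (seven daily partitions in the example), which is what keeps per-query latency roughly independent of the total table size rather than of the 30TB+ whole.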