On Sat, Jun 8, 2024 at 7:03 PM veem v <veema0...@gmail.com> wrote: > Hi , > It's postgres version 15.4. A table is daily range partitioned on a column > transaction_timestamp. It has a unique identifier which is the ideal for > primary key (say transaction_id) , however as there is a limitation in > which we have to include the partition key as part of the primary key, so > it has to be a composite index. Either it has to be > (transaction_id,transaction_timestamp) or ( transaction_timestamp, > transaction_id). But which one should we go for, if both of the columns get > used in all the queries? > > We will always be using transaction_timestamp as mostly a range predicate > filter/join in the query and the transaction_id will be mostly used as a > join condition/direct filter in the queries. So we were wondering, which > column should we be using as a leading column in this index? > > There is a blog below (which is for oracle), showing how the index should > be chosen and it states , "*Stick the columns you do range scans on last > in the index, filters that get equality predicates should come first.* ", > and in that case we should have the PK created as in the order > (transaction_id,transaction_timestamp). It's because making the range > predicate as a leading column won't help use that as an access predicate > but as an filter predicate thus will read more blocks and thus more IO. > Does this hold true in postgres too? > > https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/ >
I believe the analogy holds true here in postgres too and the index in this case should be on (transaction_id, transaction_timestamp). > > > Additionally there is another scenario in which we have the requirement to > have another timestamp column (say create_timestamp) to be added as part of > the primary key along with transaction_id and we are going to query this > table frequently by the column create_timestamp as a range predicate. And > ofcourse we will also have the range predicate filter on partition key > "transaction_timestamp". But we may or may not have join/filter on column > transaction_id, so in this scenario we should go for > (create_timestamp,transaction_id,transaction_timestamp). because > "transaction_timestamp" is set as partition key , so putting it last > doesn't harm us. Will this be the correct order or any other index order is > appropriate? > > > In this case , the index should be on ( create_timestamp,transaction_id,transaction_timestamp), considering the fact that you will always have queries with "create_timestamp" as predicate and may not have transaction_id in the query predicate.