Partitioning options

2024-02-07 Thread sud
Hi,
We have a system that stores customers' transactions. There are currently
~100K customers, and the list will grow in future, though not drastically
(maybe ~50K more). The number of transactions per day is ~400 million, and
we want to persist them in our Postgres database for ~5 months.

The key transaction table is going to have ~450 million transactions per
day, and data querying/filtering will always happen based on the
"transaction date" column. Joins will mostly happen on the "CUSTOMER_ID"
column, along with filters on customer_id for some scenarios. One day's
worth of transactions consumes ~130GB of storage, as we verified using the
"pg_relation_size" function on a sample data set.

As mentioned, there will be ~100K distinct "customer_id" values, but the
data won't be equally distributed; it will be skewed. For example, a few of
the big customers will hold the majority of the transactions (say 20-30% of
the total), and the rest is spread across the other customers, again
unevenly.

So my question is: in the above scenario, should we go for a composite
partitioning strategy, i.e. range/hash (daily range partition by
transaction_date and hash subpartition by customer_id)?
OR
Should we go for simple daily range partitioning on the transaction_date
column?
OR
Range/list composite partitioning (range partition by transaction_date and
list subpartition by customer_id)?
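
For reference, in PostgreSQL syntax the two main candidates would look
roughly like the below (a minimal sketch with hypothetical table and column
names):

-- option A: simple daily range partitioning
CREATE TABLE transactions
(
    customer_id      varchar(36) NOT NULL,
    transaction_date timestamptz NOT NULL,
    amount           numeric
) PARTITION BY RANGE (transaction_date);

CREATE TABLE transactions_p2024_02_07
    PARTITION OF transactions
    FOR VALUES FROM ('2024-02-07') TO ('2024-02-08');

-- option B: composite range/hash, each daily partition hash-subpartitioned
CREATE TABLE transactions_hashed
(
    customer_id      varchar(36) NOT NULL,
    transaction_date timestamptz NOT NULL,
    amount           numeric
) PARTITION BY RANGE (transaction_date);

CREATE TABLE transactions_hashed_p2024_02_07
    PARTITION OF transactions_hashed
    FOR VALUES FROM ('2024-02-07') TO ('2024-02-08')
    PARTITION BY HASH (customer_id);

CREATE TABLE transactions_hashed_p2024_02_07_h0
    PARTITION OF transactions_hashed_p2024_02_07
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
-- ...plus seven more hash subpartitions, REMAINDER 1 through 7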

Thanks and Regards
Sud


Encryption Options

2024-02-15 Thread sud
Hello Friends,

We are newly moving to the Postgres database (yet to decide if it will be an
on-premise one or AWS Aurora Postgres). We want to understand what
encryption/decryption techniques are available in the Postgres database.

We may have some sensitive/"personal information" (like customer name,
account number etc.) stored in the database, and thus we may need "data at
rest" encryption. What are the options available here?

Along with that, we want to understand the options for storing a specific
"data attribute" itself in encrypted form in the database, so it won't be
visible in clear text to anybody, and decrypting it when needed. And what
would be the performance overhead of those options?

Regards
Sud


Re: Encryption Options

2024-02-16 Thread sud
On Fri, Feb 16, 2024 at 10:50 PM Greg Sabino Mullane 
wrote:

> You need to clearly define your threat model. What exactly are you
> defending against? What scenario do you want to avoid?
>
> Also, your decision of on-premise or Aurora is extremely relevant to your
> range of options.
>
>
Thank you.

Yes, these are account number/PCI data, and "data at rest" encryption is
something management is asking for, irrespective of whether we encrypt those
values before storing them in the database. And this system needs to adhere
to PCI DSS 4.0 standards, so it looks like we can't persist the PCI data
as-is in the database even if 'data at rest' encryption is there; it has to
be encrypted before storing into the database.
https://www.varonis.com/blog/pci-dss-requirements

Agreed. The on-premise vs Aurora choice will take a different approach for
catering to the above needs. We are currently evaluating what the possible
options are in each of these cases, and whether this would be a factor in
choosing on-premise Postgres vs Aurora Postgres.

On Sat, Feb 17, 2024 at 12:40 AM Ron Johnson 
wrote:

> The problem with encrypting "account number" is that you can't JOIN or
> WHERE on it. That's not always necessary, though.  The pgcrypto module does
> what it says, but requires application-level changes,
>
> Encryption at rest can be accomplished with filesystem-level encryption,
> and backup encryption.  (PgBackRest has that feature, using AES-256.  Don't
> know about BarMan.)
>
>
Will try to verify these options. Considering this system processes hundreds
of millions of transactions, will this encryption add significant overhead?
It would be great if you could suggest some docs to follow for implementing
these. Not sure if the same would work for Aurora too.
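
For anyone reading later: as far as I understand, the column-level option
Ron mentions is the pgcrypto extension. A minimal sketch, assuming the
extension can be installed, with a hypothetical table (the key would really
come from the application or a KMS, not a literal):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE customer_accounts
(
    customer_id        bigint PRIMARY KEY,
    account_number_enc bytea   -- ciphertext; cannot be used in JOIN/WHERE
);

-- encrypt on write
INSERT INTO customer_accounts
VALUES (1, pgp_sym_encrypt('1234567890', 'app-supplied-key'));

-- decrypt on read
SELECT pgp_sym_decrypt(account_number_enc, 'app-supplied-key')
FROM customer_accounts
WHERE customer_id = 1;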

Regards
Sud


Question on Table creation

2024-02-20 Thread sud
Hi,
We are newly creating tables in Postgres version 15.4, and we got the DDL
auto-generated from one of the tools; it looks something like below. (Note:
I have modified the exact names to dummy names.) These statements are all
failing because the schema which already exists in the database is named
'schema_name', all lower case. So I modified the script to remove the double
quotes from all the table/column/schema names, as Postgres makes identifiers
case sensitive if they are put in quotes.

But then I encountered the opposite problem: in some places the object was
created in the database with upper or mixed case, like the schema owner,
which shows as "Schema_Owner" in the information_schema.schemata data
dictionary view. And here the scripts fail if the quotes are removed from
the schema owner.

So, to standardize this, we have a few questions:

1) In this situation, do we have to drop the "Schema_Owner" role and
recreate it with all lower-case letters? And then create the schema with
lower-case letters again?

2) As keeping mixed case will be troublesome while accessing the objects at
a later stage, is it advisable not to use quotes while creating key database
objects like schemas/tables/views/columns/indexes/constraints in Postgres
(see the small case-folding example at the end of this mail)? Is there any
other standard we should follow in Postgres so as to not have such issues in
future?

3)"Comment" on table is not accepted in the same "create table" statement
but as a separate statement post table creation. Hope that is how it works
in postgres.

4) We have never created or maintained any partitioned table in Postgres.
Here we want to range partition both tables daily based on the column
"PARENT_CREATE_TIMESTAMP". Is it advisable to create the individual future
partitions post table creation manually, through some automated job, or
using the pg_partman extension? I do see a lot of docs around pg_partman.
Would you suggest any specific doc which guides us to do it in an easy way?

CREATE TABLE "Schema_Name"."PARENT"
(
"PARENT_IDENTIFIER" varchar(36)  NOT NULL ,
"PARENT_CREATE_TIMESTAMP" date  NOT NULL ,
CONSTRAINT "PARENT_PK" PRIMARY KEY
("PARENT_IDENTIFIER","PARENT_CREATE_TIMESTAMP")
);

ALTER TABLE "Schema_Name"."PARENT" OWNER TO "Scheme_Owner";
COMMENT ON TABLE "Schema_Name"."PARENT" IS 'table comment';
COMMENT ON COLUMN "Schema_Name"."PARENT"."PARENT_IDENTIFIER" IS 'Column
comment';

***

CREATE TABLE "Schema_Name"."CHILD"
(
"CHILD_IDENTIFIER" varchar(36)  NOT NULL ,
"CHILD_STATUS_CODE" varchar(9)  NOT NULL ,
"CHILD_EVENT_STATUS_TIMESTAMP" date  NOT NULL ,
"CHILD_CREATE_TIMESTAMP" date  NOT NULL
CONSTRAINT "XPKCHILD_STATUS" PRIMARY KEY
("CHILD_IDENTIFIER","CHILD_EVENT_STATUS_TIMESTAMP","CHILD_CREATE_TIMESTAMP"),
CONSTRAINT "R_12" FOREIGN KEY ("CHILD_IDENTIFIER","CHILD_CREATE_TIMESTAMP")
REFERENCES
"Schema_Name"."PARENT"("PARENT_IDENTIFIER","PARENT_CREATE_TIMESTAMP")
ON UPDATE RESTRICT
ON DELETE RESTRICT
);

CREATE INDEX "XIF1CHILD_STATUS" ON "Schema_Name"."CHILD_STATUS"
(
"CHILD_IDENTIFIER",
"CHILD_CREATE_TIMESTAMP"
);

ALTER TABLE "Schema_Name"."CHILD_STATUS" OWNER TO "Scheme_Owner";
COMMENT ON TABLE "Schema_Name"."CHILD_STATUS" IS 'table comment';
COMMENT ON COLUMN "Schema_Name"."CHILD_STATUS"."CHILD_IDENTIFIER" IS
'column comment';
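
To illustrate question 2 above, the case-folding behaviour we ran into, as a
small hypothetical example:

-- unquoted identifiers are folded to lower case
CREATE TABLE schema_name.t1 (id int);
SELECT * FROM SCHEMA_NAME.T1;      -- works: folds to schema_name.t1

-- quoted identifiers are stored case sensitively
CREATE TABLE schema_name."T1" (id int);
SELECT * FROM schema_name.T1;      -- folds to t1, NOT the quoted "T1"
SELECT * FROM schema_name."T1";    -- the quoted table must always be quoted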

Regards
Sud


Re: Question on Table creation

2024-02-22 Thread sud
Thank you so much. This really helped.

Regards
Sud

>


Re: Creating table and indexes for new application

2024-02-22 Thread sud
On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, 
wrote:

> On Fri, 2024-02-23 at 02:05 +0530, yudhi s
>
> > 2)Should we be creating composite indexes on each foreign key for table2
> and table3, because
> >   any update or delete on parent is going to take lock on all child
> tables?
>
> Every foreign key needs its own index.  A composite index is only
> appropriate if the foreign
> key spans multiple columns.
>
>

>From the DDL which OP posted it's using composite foreign key thus a
composite index would be needed.
However, if someone doesn't delete or update the parent table PK   , is it
still advisable to have all the FK indexed? Like in general I think
transaction id should not get updated in a normal scenario unless some
special case.


Re: Creating table and indexes for new application

2024-02-23 Thread sud
On Fri, 23 Feb, 2024, 1:28 pm yudhi s,  wrote:

>
>
> On Fri, 23 Feb, 2024, 1:20 pm sud,  wrote:
>
>>
>>
>> On Fri, 23 Feb, 2024, 12:41 pm Laurenz Albe, 
>> wrote:
>>
>>> On Fri, 2024-02-23 at 02:05 +0530, yudhi s
>>>
>>> > 2)Should we be creating composite indexes on each foreign key for
>>> table2 and table3, because
>>> >   any update or delete on parent is going to take lock on all child
>>> tables?
>>>
>>> Every foreign key needs its own index.  A composite index is only
>>> appropriate if the foreign
>>> key spans multiple columns.
>>>
>>>
>>
>> From the DDL which OP posted it's using composite foreign key thus a
>> composite index would be needed.
>> However, if someone doesn't delete or update the parent table PK   , is
>> it still advisable to have all the FK indexed? Like in general I think
>> transaction id should not get updated in a normal scenario unless some
>> special case.
>>
>>
>>
> Thank you. I can double check if we have confirmed use case of deleting
> the parent table or updating PK in the parent table. But anyway it can
> happen for data fix for sure in some scenario.
>
> But yes, we are certainly going to drop/purge partition from all the
> parent and child table after specific days. So isn't that need the FK to be
> indexed or else it will scan whole parent table partition?
>


I am not sure whether dropping a partition of the parent table will take a
lock or do a full scan on the child table during partition maintenance, in
the absence of an index on the foreign key. Others may comment here.

>
>>


Re: Question on Table creation

2024-02-27 Thread sud
On Wed, Feb 21, 2024 at 5:29 AM Adrian Klaver 
wrote:

> > 4)Never created or maintained any partition table in postgres. Here we
> > want to daily range partition the both tables based on column
> > "PARENT_CREATE_TIMESTAMP", so is it advisable to create the individual
> > future partitions post table creation manually or through some automated
> > job or we should do it using pg_partman extension? I do see a lot of
> > docs around pg_partman. Would you suggest any specific doc which guides
> > us to do it in an easy way.
>
> The 'easy' way is the one you understand and can maintain. pg_partman
> does a lot of the boiler plate for you so there is that. The other side
> is you need to read and understand:
>
> https://github.com/pgpartman/pg_partman/blob/master/doc/pg_partman.md
>
>
>
While testing the pg_partman extension, I see it by default creates a
"default partition" even if we provide the parameter p_start_partition as
the current date. But someone may purposely not want the default partition,
so as to avoid any unwanted data entering the table, which may be required
from a data quality perspective. Is there any option in the create_parent
function to achieve that? Or do we have to manually drop the default
partition after it gets created through the create_parent function?

I am not able to understand the significance of the "p_template_table"
parameter. When we create the partitions without passing this parameter, the
template table gets created automatically in the partman schema. Checking
the details of the template table, it doesn't show any indexes or
constraints present in it, but still the child partitions get created with
all the necessary indexes and constraints, as those are defined on the main
table. So I am wondering: in what real-life scenario do we really need the
template table to be defined, and how does it help when the partitions
differ from the main table structure?

Lastly, the template table is by default created in the partman schema, but
it also works without error if we ask for the template table to be created
in the application schema. So is there any downside of having the template
table reside in the application schema?

Thanks And Regards
Sud


Re: Question on Table creation

2024-02-29 Thread sud
Thank you so much.

I tested, and it worked as you mentioned, i.e. the template table helps add
extra indexes or constraints to the child partitions (by inheriting them
from the template table if defined) that are not defined on the main table.
Also, dropping the default partition does no harm to new partition creation
through the automatic maintenance job.
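
For reference, what we tested was essentially the below (reusing the
create_parent call from my earlier mail; pg_partman names the default
partition <parent>_default, as seen in our listings):

SELECT partman.create_parent(
   p_parent_table := 'public.test_timestamp',
   p_control := 'tstz',
   p_type := 'native',
   p_interval := '1 day',
   p_premake := 4
);

-- then drop the automatically created default partition
DROP TABLE public.test_timestamp_default;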

Though I am unable to visualize a situation in which we would want the child
partitions to have additional indexes or constraints compared to the main
table. But I also see that the template table will be phased out quickly, so
I believe it's better to just go with the default template table on the
current PG version 16.

https://github.com/pgpartman/pg_partman/blob/master/doc/pg_partman.md

IMPORTANT NOTES:

   - The template table feature is only a temporary solution to help speed
     up declarative partitioning adoption. As things are handled better in
     core, the use of the template table will be phased out quickly from
     pg_partman. If a feature that was managed by the template is supported
     in core in the future, it will eventually be removed from template
     management in pg_partman, so please plan ahead for that during major
     version upgrading if it applies to you.




On Thu, Feb 29, 2024 at 1:58 AM veem v  wrote:

>
> On Wed, 28 Feb 2024 at 01:24, sud  wrote:
>
>> While testing the pg_partman extension I see it by default creates a
>> "default partition" even if we provide the parameter p_start_partition as
>> "current date". But if someone purposely doesn't want to have the default
>> partitions , so as to avoid any unwanted data entering the table , which
>> may be required from a data quality perspective. Is there any option in the
>> create_parent function to achieve that? Or do we have to manually drop the
>> default partition after getting this created through create_parent function?
>>
>> I am not able to understand the significance of the "p_template_table"
>> parameter. When we create the partitions without passing this parameter the
>> template table gets created automatically in the partman schema. Checking
>> the details of the template table, it doesn't show any indexes or
>> constraints present in it, but still the child partitions get created with
>> all the necessary indexes and constraints as those are there defined for
>> the main table. So wondering , in what real life scenario do we really need
>> the template table to be defined and how will it help if the partitions are
>> different then from the main table structure?
>>
>> Lastly , the template table is by default created in the partman schema
>> but it also works without error,  if we pass the template table to be
>> created in the application schema. So is there any downside of having the
>> template table reside in the application schema?
>>
>>
>
> As per my understanding , you can't control the creation of the default
> partition. If you somehow don't want to keep it then, you can drop it post
> creation of the partition for the first time i.e after the create_parent
> function call.
>
> Template_table is necessary when someone needs the child partitions to
> have a difference in structure as compared to main table like difference in
> indexes or constraints. But I think the number and data types of columns
> should be the same in main table and template tables. I have not tried
> though.
>
> I think keeping a template table in an application schema is not a good
> idea considering the fact that it's not a business table but a technical
> one, and nobody should accidentally alter this.
>
>>
>>
>


Is partition pruning impacted by data type

2024-03-04 Thread sud
Hi,
We are designing an application which is currently restricted to users in
one time zone, but it may go global in future. Some of the transaction
tables are going to be daily range partitioned on the
transaction_create_date column. But the "date" data type has no time
component in it, so we are thinking of making it a timestamp data type
(timestamptz(6)), which will help us in two ways.

Firstly, in the current use cases the majority of the queries are going to
filter on a day or multiple days of transactions. But if we have any use
case which needs lower granularity, like an hourly duration, then having a
"timestamp" data type with an index created on it will help. And in future,
if we plan to partition on a lower granularity like hourly, that can be
accommodated easily with a "timestamp" data type.

However, the questions we have are:
1) Is there any downside of having the partition key with "timestamp with
time zone" type? Will it impact partition pruning of the queries by
appending any run-time "time zone" conversion function during the query
planning/execution phase?
2) As it will take the default server time, the server time will change
during daylight saving; can that cause any unforeseen issue?
3) Will this cause the data to be spread unevenly across partitions and make
the partitions unevenly sized? If we go for UTC/GMT as the DB time, a user's
one day of transactions might span two daily partitions.


Thanks and Regards
Sud


Re: Is partition pruning impacted by data type

2024-03-05 Thread sud
Thank you.

Yes, I tried creating a table manually with a timestamptz(6) column,
partitioned on that column, and then executed a select query with a filter
on that column, and I do see partition pruning happening. I am not able to
visualize any other issues, though some teammates say it may have a negative
impact on aggregation-type queries; not sure how, but I will try to test it.
Thanks again for the response.

On Wed, Mar 6, 2024 at 12:35 AM Lok P  wrote:

>
> On Tue, Mar 5, 2024 at 1:09 AM sud  wrote:
>
>>
>> However the question we have is ,
>> 1)If there is any downside of having the partition key with "timestamp
>> with timezone" type? Will it impact the partition pruning of the queries
>> anyway by appending any run time "time zone" conversion function during the
>> query planning/execution phase?
>> 2) As it will take the default server times , so during daylight saving
>> the server time will change, so in that case, can it cause any unforeseen
>> issue?
>> 3)Will this cause the data to be spread unevenly across partitions and
>> make the partitions unevenly sized? If will go for UTC/GMT as db time, the
>> user's one day transaction might span across two daily partitions.
>>
>>
> My 2 cents.
> We have cases which use the "timestamp with timezone" column as partition
> key  and the partition pruning happens for the read queries without any
> issue, so we don't see any conversion functions applied to the predicate as
> such which is partition key. I think if the users go global it's better to
> have the database time in UTC time zone. and it's obvious that, In case of
> global users the data ought to be span across multiple days as the days
> won't be as per the users time zone rather UTC.
>
>
>
>


Re: Is partition pruning impacted by data type

2024-03-07 Thread sud
join pg_catalog.pg_namespace n on c.relnamespace = n.oid
   left join pg_partitioned_table p on p.partrelid = c.oid
order by n.nspname, c.relname;

test_timestamp2_default DEFAULT
test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00-05') TO
('2024-03-08 00:00:00-05')
test_timestamp2_p2024_03_08 FOR VALUES FROM ('2024-03-08 00:00:00-05') TO
('2024-03-09 00:00:00-05')
test_timestamp2_p2024_03_09 FOR VALUES FROM ('2024-03-09 00:00:00-05') TO
('2024-03-10 00:00:00-05')
test_timestamp2_p2024_03_10 FOR VALUES FROM ('2024-03-10 00:00:00-05') TO
('2024-03-11 00:00:00-05')
test_timestamp2_p2024_03_11 FOR VALUES FROM ('2024-03-11 00:00:00-05') TO
('2024-03-12 00:00:00-05')


SET SESSION TIME ZONE 'UTC';

test_timestamp2_default DEFAULT
test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 05:00:00+00') TO
('2024-03-08 05:00:00+00')
test_timestamp2_p2024_03_08 FOR VALUES FROM ('2024-03-08 05:00:00+00') TO
('2024-03-09 05:00:00+00')
test_timestamp2_p2024_03_09 FOR VALUES FROM ('2024-03-09 05:00:00+00') TO
('2024-03-10 05:00:00+00')
test_timestamp2_p2024_03_10 FOR VALUES FROM ('2024-03-10 05:00:00+00') TO
('2024-03-11 05:00:00+00')
test_timestamp2_p2024_03_11 FOR VALUES FROM ('2024-03-11 05:00:00+00') TO
('2024-03-12 05:00:00+00')

Regards
Sud


Question related to partitioning with pg_partman

2024-03-08 Thread sud
Starting a new thread...

Something interesting here, and we are not sure if it's the expected
behaviour, as shown below. We are also a bit confused.

In the below example we created two partitioned tables on timestamptz
columns under different time zones, and the child partitions are created
appropriately, with boundaries from one midnight to the next midnight of a
day, and so on. But when we change the time zone and query the data
dictionary views again, it shows the start and end of the partition
boundaries not as midnights but as different times of the day.

So I was wondering: can this cause us any unforeseen issues in the long run
while creating the partitions through partman, persisting data into the
tables from the end users, then querying those tables and having the queries
properly partition pruned?
or
Should we always set the local time zone to UTC before running or calling
the pg_partman/pg_cron process which creates the partitions, mainly in a
database which serves global users sitting across multiple time zones? And
similarly, while inserting data into the table, should we use a UTC time
zone conversion function? Can you please confirm?

And while checking the time zone using "show timezone", it shows the local
time zone; is there any way to see the Postgres server's own time zone?
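
(What we have checked so far: "show timezone" returns the session value, and
the pg_settings view exposes both the session value and the value it would
reset to:)

SHOW timezone;

SELECT name, setting, reset_val, source
FROM pg_settings
WHERE name = 'TimeZone';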

***Example

SET SESSION TIME ZONE 'UTC';
CREATE TABLE test_timestamp (
ts TIMESTAMP,
tstz TIMESTAMPTZ) PARTITION BY RANGE (tstz);

SELECT partman.create_parent(
   p_parent_table := 'public.test_timestamp',
   p_control := 'tstz',
   p_type := 'native',
   p_interval := '1 day',
   p_premake := 4,
   p_start_partition => '2024-03-07 00:00:00'
);

UPDATE partman.part_config SET infinite_time_partitions = 'true' WHERE
parent_table = 'public.test_timestamp';

with recursive inh as (
   select i.inhrelid, null::text as parent
   from pg_catalog.pg_inherits i
 join pg_catalog.pg_class cl on i.inhparent = cl.oid
 join pg_catalog.pg_namespace nsp on cl.relnamespace = nsp.oid
   where nsp.nspname = 'public'
 and cl.relname = 'test_timestamp'
   union all
   select i.inhrelid, (i.inhparent::regclass)::text
   from inh
   join pg_catalog.pg_inherits i on (inh.inhrelid = i.inhparent)
)
select c.relname as partition_name,
pg_get_expr(c.relpartbound, c.oid, true) as partition_expression
from inh
   join pg_catalog.pg_class c on inh.inhrelid = c.oid
   join pg_catalog.pg_namespace n on c.relnamespace = n.oid
   left join pg_partitioned_table p on p.partrelid = c.oid
order by n.nspname, c.relname;

test_timestamp_default DEFAULT
test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00+00') TO
('2024-03-08 00:00:00+00')
test_timestamp_p2024_03_08 FOR VALUES FROM ('2024-03-08 00:00:00+00') TO
('2024-03-09 00:00:00+00')
test_timestamp_p2024_03_09 FOR VALUES FROM ('2024-03-09 00:00:00+00') TO
('2024-03-10 00:00:00+00')
test_timestamp_p2024_03_10 FOR VALUES FROM ('2024-03-10 00:00:00+00') TO
('2024-03-11 00:00:00+00')
test_timestamp_p2024_03_11 FOR VALUES FROM ('2024-03-11 00:00:00+00') TO
('2024-03-12 00:00:00+00')

SET SESSION TIME ZONE 'EST';

test_timestamp_default DEFAULT
test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-06 19:00:00-05') TO
('2024-03-07 19:00:00-05')
test_timestamp_p2024_03_08 FOR VALUES FROM ('2024-03-07 19:00:00-05') TO
('2024-03-08 19:00:00-05')
test_timestamp_p2024_03_09 FOR VALUES FROM ('2024-03-08 19:00:00-05') TO
('2024-03-09 19:00:00-05')
test_timestamp_p2024_03_10 FOR VALUES FROM ('2024-03-09 19:00:00-05') TO
('2024-03-10 19:00:00-05')
test_timestamp_p2024_03_11 FOR VALUES FROM ('2024-03-10 19:00:00-05') TO
('2024-03-11 19:00:00-05')

***

SET SESSION TIME ZONE 'EST';

CREATE TABLE test_timestamp2 (
ts TIMESTAMP,
tstz TIMESTAMPTZ) PARTITION BY RANGE (tstz);

SELECT partman.create_parent(
   p_parent_table := 'public.test_timestamp2',
   p_control := 'tstz',
   p_type := 'native',
   p_interval := '1 day',
   p_premake := 4,
   p_start_partition => '2024-03-07 00:00:00'
);

UPDATE partman.part_config SET infinite_time_partitions = 'true' WHERE
parent_table = 'public.test_timestamp2';

with recursive inh as (
   select i.inhrelid, null::text as parent
   from pg_catalog.pg_inherits i
 join pg_catalog.pg_class cl on i.inhparent = cl.oid
 join pg_catalog.pg_namespace nsp on cl.relnamespace = nsp.oid
   where nsp.nspname = 'public'
 and cl.relname = 'test_timestamp2'
   union all
   select i.inhrelid, (i.inhparent::regclass)::text
   from inh
   join pg_catalog.pg_inherits i on (inh.inhrelid = i.inhparent)
)
select c.relname as partition_name,
pg_get_expr(c.relpartbound, c.oid, true) as partition_expression
from inh
   join pg_catalog.pg_class c on inh.inhrelid = c.oid
   join pg_catalog.pg_namespace n on c.relnamespace = n.oid
   left join pg_partitioned_table p on p.partrelid = c.oid
order by n.nspname, c.relname;

test_timestamp2_default DEFAULT
test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00-05') TO
('2024-03-08 00:00:00-05')
test_timestamp2_p2024_03_08 FOR VALUES FROM ('2024-03-08 00:00:00-05') TO
('2024-03-09 00:00:00-05')
test_timestamp2_p2024_03_09 FOR VALUES FROM ('2024-03-09 00:00:00-05') TO
('2024-03-10 00:00:00-05')
test_timestamp2_p2024_03_10 FOR VALUES FROM ('2024-03-10 00:00:00-05') TO
('2024-03-11 00:00:00-05')
test_timestamp2_p2024_03_11 FOR VALUES FROM ('2024-03-11 00:00:00-05') TO
('2024-03-12 00:00:00-05')

Re: Question related to partitioning with pg_partman

2024-03-08 Thread sud
Can somebody help me to understand the behaviour?

>


Re: Question related to partitioning with pg_partman

2024-03-10 Thread sud
On Sat, Mar 9, 2024 at 3:41 AM Adrian Klaver 
wrote:

> On 3/8/24 00:23, sud wrote:
> >
> > Starting a new thread...
> >
> > Something interesting and not sure if its expected behaviour as below.
> > We are also confused a bit here.
> >
> > In the below example we created two partitioned tables on timestamptz
> > type columns with different time zones and the child partitions are
> > created appropriately with boundaries as one mid night to next mid night
> > of a day and so on. But when we change the time zone and query the  data
> > dictionary views again, it shows the start and end of the partition
> > boundary as not midnights but different times of the day's values.
> >
> > So I was wondering if this can cause us any unforeseen issues in the
> > long run while creating the partitions though partman and persisting the
> > data into the tables from the end users then querying those and having
> > queries properly partitioned pruned?
> > or
> > should we always set the local timezone as UTC always before running or
> > calling the pg_partman/pg_cron process which creates the partitions?
> > Mainly in a database which serves global users sitting across multiple
> > timezones. And same thing while inserting data into the table, we should
> > use UTC timezone conversion function. Can you please confirm.
>
> '2024-03-07 00:00:00+00' and '2024-03-06 19:00:00-05' are the same time
> as is '2024-03-07 00:00:00-05' and '2024-03-07 05:00:00+00'.
>
> Still I would think for sanity sake you would want to stick with UTC.
>
>

Thank you so much Adrian.

In my example in the first post, I see that if someone connects to an RDS
Postgres database and runs the create-partition command using pg_partman
with the time zone set to "UTC", the 7th March partition spans from "7th
March midnight" to "8th March midnight" when the partition_expression is
queried from the data dictionary view. Which is correct.

And the same information, if queried with the time zone set to EST, shows
the partition spanning from "6th March 7PM" to "7th March 7PM". This can
sometimes make a partition appear to shift to another day altogether.
Similar differences happen if the partitions are created using the EST time
zone initially and the data dictionary is then queried from the UTC time
zone.

So my question was: in these types of scenarios, should we follow a standard
approach of setting the time zone to UTC for such a global-user use case,
where the system can persist data from multiple users sitting across
different time zones, so that the boundaries (start and end times) of each
of the range partitions remain consistent in one time zone across all the
partitioned tables?

And even while inserting the data, should we first set the time zone to UTC
and then do the data load?

*** Partition created by pg_partman by setting timezone as UTC
***

UTC
Partition_name Partition_expression
test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00+00') TO
('2024-03-08 00:00:00+00')

when queried the partition_expression using EST ..

EST
Partition_name Partition_expression
test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-06 19:00:00-05') TO
('2024-03-07 19:00:00-05')


*** Partition created by pg_partman by setting timezone as EST
***

EST
Partition_name Partition_expression
test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00-05') TO
('2024-03-08 00:00:00-05')

when queried the partition_expression using UTC ..

UTC
Partition_name Partition_expression
test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 05:00:00+00') TO
('2024-03-08 05:00:00+00')

***

Also, I see both the "setting" and "reset_val" columns showing the local
time zone only. If we set the time zone to a different value than the local
time zone, then it gets updated in "setting".

Regards
Sud


Re: Question related to partitioning with pg_partman

2024-03-10 Thread sud
On Sun, Mar 10, 2024 at 10:32 PM Adrian Klaver 
wrote:

> On 3/10/24 05:12, sud wrote:
> >
> > In my example in the first post, I see, if someone connected to a RDS
> > Postgres database and run the create partition command using pg_partman
> > by setting the timezone as "UTC", the 7th march partition looks to be
> > spanned from "7th march midnight" to "8th march midnight", when queried
> > the partition_experession from the data dictionary view. Which is
> correct.
> >
> > And same information if someone querying by setting the timezone as EST
> > is showing spanning from "6th march 7PM" to "7th March 7PM". And this
> > can cause sometimes the partition may shift to other days all together.
> > Similar differences happen if creating the partitions using EST timezone
> > initially and then querying the data dictionary from UTC timezone.
>
> The above is at odds with your example below which has the correct values:
>
> 2024-03-07 00:00:00+00 = 2024-03-06 19:00:00-05
>
> >
> > So my question was, if in these types of scenarios, we should follow a
> > standard approach of setting the timezone as UTC in such a type of
> > global user use case, while the system can persist data from multiple
> > users sitting across different time zones? So that the boundary(start
> > and end time) of each of the range partitions will be set as consistent
> > in one timezone across all the partitioned tables?
>
> You need to first determine what your time frames are going to be?
>
> 1) Midnight to Midnight in UTC will be consistent when viewed in UTC. It
> will not be when viewed in other time zone +/- the offset from UTC.
>
> 2) Or Midnight to Midnight in the users time zone, in which case the UTC
> values will differ.
>
> You have to decide which of the above is your goal. The bottom line is
> by definition the local wall clock time will not equal UTC, GMT
> excepted. This comes down to what the purpose of the partitions are? In
> other words how do you want to organize the data?
>
> >
> > And even while inserting the data , should we set the timezone to first
> > UTC and do the data load ?
>
>
> >
> > *** Partition created by pg_partman by setting timezone as UTC
> > ***
> >
> > *UTC*
> > *Partition_name Partition_expression*
> > test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00+00') TO
> > ('2024-03-08 00:00:00+00')
> >
> > when queried the partition_expression using EST ..
> >
> > *EST*
> > *Partition_name Partition_expression*
> > test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-06 19:00:00-05') TO
> > ('2024-03-07 19:00:00-05')
> >
> >
> > *** Partition created by pg_partman by setting timezone as EST
> > ***
> >
> > *EST*
> > *Partition_name Partition_expression*
> > test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00-05')
> > TO ('2024-03-08 00:00:00-05')
> >
> > when queried the partition_expression using UTC ..
> >
> > *UTC*
> > *Partition_name Partition_expression*
> > test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 05:00:00+00')
> > TO ('2024-03-08 05:00:00+00')
> >
> > ***
> >
> > Also i see both the "setting" and "reset_val" is showing as local
> > timezone only. If we set the timezone to a different value than the
> > local timezone then it gets updated on the "setting".
>
>
Our requirement is to have the transaction table range partitioned daily on
the transaction_date column (i.e. one midnight to the next midnight's
transaction data in one partition). The transaction_date column will be of
timestamptz data type. And this application/database might be consuming data
from users across multiple time zones in future. These tables will be
queried based on a date range (the minimum being ~1 transaction day), and
partitions will also be purged one day at a time.

So, for the above, I understand it might not be possible to keep a user's
data restricted to one day's partition, considering the users will perform
transactions across multiple time zones; but we are thinking of restricting
the database to the UTC time zone irrespective of the users. Thus, while
creating the table partitions, we need to ensure the UTC time zone is set,
so that the upper and lower boundaries of the daily range partitions remain
consistent for all. Correct me if my understanding is wrong.
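
Concretely, the idea would be to pin the session time zone before the
partman call (same call as in my first post), e.g.:

SET SESSION TIME ZONE 'UTC';

SELECT partman.create_parent(
   p_parent_table := 'public.test_timestamp',
   p_control := 'tstz',
   p_type := 'native',
   p_interval := '1 day',
   p_premake := 4,
   p_start_partition => '2024-03-07 00:00:00'
);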


Re: Question related to partitioning with pg_partman

2024-03-10 Thread sud
On Sun, Mar 10, 2024 at 11:31 PM Adrian Klaver 
wrote:

> 1) The partition will be across one day(24 hours) it is just the times
> may confuse people. Per you example 2024-03-07 00:00:00+00  is the same
> time as 2024-03-06 19:00:00-05 for EST. The issue is that the +00 and
> -05 maybe ignored. Also it depends on the clients being consistent in
> using timestamptz.
>
> 2) You still have not answered what the datetime range(not date range)
> is that will be queried. If you have the partitions Midnight to Midnight
> UTC and the clients are querying Midnight to Midnight local time the
> query will not match the partitions.
>
>
My apologies if I was not able to state the details clearly. Actually, the
queries will always happen on a day basis, i.e. they can query from one to
15 days of transactions. But as you rightly pointed out, the partitions can
only span midnight to midnight in one time zone, and thus users who query
the data from another time zone will mostly scan two partitions (even if
they query just one day's transaction data in their own time zone). And I
don't see an easy solution which would let users across all time zones scan
only a single partition when they query data for a single transaction date.

And thus my question was: is it necessary that the creation of partitions
happens in the UTC time zone only, with whatever transaction data is
inserted by users from their respective time zones stored in the database
as-is, and queried based on the user's time zone (even though it may span
multiple partitions for a single user's transaction date)?


Is this a buggy behavior?

2024-03-24 Thread sud
Hello All,
I created a table with a composite primary key. But to my surprise, it
allowed the composite primary key to be created even though one of the
columns was defined as nullable. But then inserting a NULL into that column
errored out at the first record itself, stating the "not null constraint" is
violated.

CREATE TABLE test1
(
c1 varchar(36)   NULL ,
c2 varchar(36)  NOT NULL ,
CONSTRAINT test1_PK PRIMARY KEY (c1,c2)
) ;

-- Table created without any error even one of the columns in the PK was
defined as NULL.

insert into test1 values(null,'123');


ERROR:  null value in column "c1" of relation "test1" violates not-null
constraint
DETAIL:  Failing row contains (null, 123).

insert into test1 values('123','123');

--works fine as expected


Regards
Sud


Re: Is this a buggy behavior?

2024-03-24 Thread sud
On Sun, Mar 24, 2024 at 8:47 PM Tom Lane  wrote:

> Thiemo Kellner  writes:
> > Am 24.03.2024 um 15:54 schrieb Erik Wienhold:
> >> This is required by the SQL standard: columns of a primary key must be
> >> NOT NULL.  Postgres automatically adds the missing NOT NULL constraints
> >> when defining a primary key.  You can verify that with \d test1 in psql.
>
> > To me, this behaviour, while correct, is not too concise. I wished, that
> > PG issued a warning about a definition conflict. In PostgreSQL, a PK
> > must always be not nullable, so explicitly defining on of a PK's columns
> > as nullable is contradictory, one should get notified of.
>
> To do that, we'd have to remember that you'd said NULL, which we
> don't: the word is just discarded as a noise clause.  Considering
> that this usage of NULL isn't even permitted by the SQL standard,
> that seems like a bit too much work.
>

Do you specifically mean that the 'NULL' keyword just makes no sense here in
Postgres? But even if that is the case, I tried inserting nothing (hoping
"nothing" is "null" in the true sense), but that too failed on the first
insert statement, which is fine as per the PK.

But don't you think that, in the first place, it shouldn't have been allowed
to create the table with one of the composite PK columns defined as NULL?
And then, while inserting the null record, it should say that the PK
constraint is violated, not the "not null constraint".

CREATE TABLE test1
(
c1 numeric   NULL ,
c2 varchar(36)  NOT NULL ,
CONSTRAINT test1_PK PRIMARY KEY (c1,c2)
) ;

insert into test1(c2) values('123');

ERROR: null value in column "c1" of relation "test1" violates not-null
constraint
DETAIL: Failing row contains (null, 123).
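
(Checking the catalog per Erik's note does confirm Postgres silently added
NOT NULL to c1:)

SELECT attname, attnotnull
FROM pg_attribute
WHERE attrelid = 'test1'::regclass
  AND attnum > 0;

-- attname | attnotnull
-- c1      | t
-- c2      | t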


Grants and privileges issue

2024-03-28 Thread sud
Hi, It's postgres 15.4.

We want to give the required privileges to certain users or roles, and
ensure we do not grant any elevated privileges. I have the below questions.

1) I am seeing in many places that the "usage on schema" privilege is
granted along with "grant select on <table> to <user>" for the objects of
the schema (something as below). So I wanted to understand: what exact
privilege does "grant usage on schema <schema> to <user>" provide that the
"select on" privilege won't?

grant usage on schema <schema> to <user>;
grant select on schema1.tab1 to <user>;

2) Additionally, when we try to grant select privileges on the "cron" and
"partman" schemas to a role (something as below), so that anybody logging in
through that role would be able to see/fetch the data from the tables inside
the cron and partman schemas, it gives the message 'no privileges were
granted for cron/partman/part_config'. And when accessing those objects in
the cron/partman schemas through that role, it errors out with an access
denied message. So I wanted to understand the cause of this, and how we
should fix it so that anybody logging in through that role can see/fetch the
data from the cron and partman schema tables.

grant select on cron.job to <role>;
grant select on cron.job_run_details to <role>;
grant select on partman.part_config to <role>;
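
For reference, the fuller set of grants we are now trying, with a
placeholder role name (adding the schema usage grants per the docs), looks
like:

GRANT USAGE ON SCHEMA cron TO report_role;
GRANT SELECT ON cron.job, cron.job_run_details TO report_role;

GRANT USAGE ON SCHEMA partman TO report_role;
GRANT SELECT ON partman.part_config TO report_role;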

Regards
Sud


Re: Grants and privileges issue

2024-03-28 Thread sud
On Fri, Mar 29, 2024 at 2:43 AM Adrian Klaver 
wrote:

>
>
> On 3/28/24 2:10 PM, sud wrote:
> > Hi, It's postgres 15.4.
> >
> > We want to give required privilege to certain users or roles and ensure
> > to not to provide any elevated privilege. I have below questions,
>
> I would suggest spending some time here:
>
> https://www.postgresql.org/docs/current/ddl-priv.html
>
> It should answer many of your questions.
>
>
>
Thank you Adrian.

I think I got the answer to my first question, as the doc says below. So it
means the "grant usage on schema" is a must for the user to access objects
within the schema, along with the "select on table" privilege; with just
"select on table" we won't be able to access the objects inside the schema.

"For schemas, allows access to objects contained in the schema (assuming
that the objects' own privilege requirements are also met). Essentially
this allows the grantee to "look up" objects within the schema. Without
this permission, it is still possible to see the object names, e.g., by
querying system catalogs."

Regarding my second question, I am still unable to find out why we are
seeing the 'no privileges were granted for cron/partman/part_config' message
while adding the grants to the user.


Re: Not able to purge partition

2024-04-01 Thread sud
On Thu, Mar 21, 2024 at 6:18 PM Laurenz Albe 
wrote:

>
> > [create some partitions, then drop a partition of the referenced table]
> >
>
> > SQL Error [P0001]: ERROR: cannot drop table
> schema1.test_part_drop_parent_p2024_02_01 because other objects depend on it
> > CONTEXT: SQL statement "DROP TABLE
> schema1.test_part_drop_parent_p2024_02_01"
>
> That's normal.  If you create a foreign key constraint to a partitioned
> table, you
> can no longer drop a partition of the referenced table.
>
> What you *can* do is detach the partition and then drop it, but detatching
> will

be slow because PostgreSQL has to check for referencing rows.
>


*The best solution is to create the foreign key *not* between the
partitioned*
*tables, but between the individual table partitions.  *

Interesting. Even my thought was that detach+drop of a parent partition
would only look at the specific child partition, not the whole child table.

However, out of curiosity: does the default foreign key setup, i.e. foreign
keys between the partitioned tables (but not between the partitions), also
make data loads into the child partitions slower (as it must then look up
and validate the presence of the keys across all the partitions of the
parent table)?
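
If I understood the suggestion correctly, the partition-level foreign key
would be something like the below (hypothetical column names, reusing the
parent partition name from my test), so that each parent partition can still
be dropped once its matching child partition is gone:

ALTER TABLE schema1.test_part_drop_child_p2024_02_01
    ADD CONSTRAINT child_p2024_02_01_fk
    FOREIGN KEY (parent_id, parent_create_timestamp)
    REFERENCES schema1.test_part_drop_parent_p2024_02_01
        (parent_id, parent_create_timestamp);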


Timestamp conversion Error in dynamic sql script

2024-04-01 Thread sud
Hello,
I am trying to create a block which will create a few partitions dynamically
and also insert ~1 million rows into each of those partitions. I am not able
to figure out why it gives the below error during the timezone conversion
while defining the partitions, even though I used the typecast.

CREATE TABLE parent_table (
id Numeric,
col1 TEXT,
col2 TEXT,
partition_key TIMESTAMP,
primary key (partition_key, id)
)
PARTITION BY RANGE (partition_key);

**

DO $$
DECLARE
start_date TIMESTAMP := '2022-01-01';
begin
FOR i IN 0..10 LOOP

EXECUTE format('
CREATE TABLE parent_table_%s (
CHECK (partition_key >= DATE ''%s'' AND partition_key < DATE ''%s''
)
) INHERITS (parent_table);',
TO_CHAR(start_date + i, 'YYYY_MM_DD'),
TO_CHAR(start_date + i, 'YYYY-MM-DD')::timestamp ,
TO_CHAR(start_date + i + INTERVAL '1 day', 'YYYY-MM-DD')::timestamp
);
EXECUTE format('
ALTER TABLE parent_table ATTACH PARTITION parent_table_%s
FOR VALUES FROM (''%s'') TO (''%s'');',
TO_CHAR(start_date + i, 'YYYY_MM_DD'),
TO_CHAR(start_date + i, 'YYYY-MM-DD')::timestamp,
TO_CHAR(start_date + i + INTERVAL '1 day', 'YYYY-MM-DD') ::timestamp
);

END LOOP;

  FOR i IN 0..10 LOOP
EXECUTE format('
INSERT INTO parent_table_%s (id,col1, col2,  partition_key)
SELECT
generate_series(1, 100),
md5(random()::text),
md5(random()::text),
TIMESTAMP ''%s'' + INTERVAL ''%s days''
FROM generate_series(1, 100);',
TO_CHAR(start_date + i, 'YYYY_MM_DD'),
start_date,
i
);
END LOOP;
END $$;

***




SQL Error [42883]: ERROR: operator does not exist: timestamp without time
zone + integer
Hint: No operator matches the given name and argument types. You might need
to add explicit type casts.
Where: PL/pgSQL function inline_code_block line 7 at EXECUTE
Error position:


Re: Timestamp conversion Error in dynamic sql script

2024-04-02 Thread sud
On Tue, Apr 2, 2024 at 7:46 PM Greg Sabino Mullane 
wrote:

> 1. Declare start_date as DATE when you want to add days with date + int
>> 2. Keep TIMESTAMP and use start_date + make_interval(days => i)
>>
>
> Also
>
> 0. Use TIMESTAMPTZ not TIMESTAMP
>
>
> Thank you so much. That helped.

Now this block seems to be failing near the "LIKE" operator. Isn't it
allowed to add the check constraints along with the CREATE TABLE statement?


*SQL Error [42601]: ERROR: syntax error at or near "LIKE"Where: PL/pgSQL
function inline_code_block line 8 at EXECUTE*
*Error position*

CREATE TABLE parent_table (
id Numeric,
col1 TEXT,
col2 TEXT,
partition_key TIMESTAMP,
primary key (partition_key, id)
)
PARTITION BY RANGE (partition_key);

*

DO $$
DECLARE
start_date TIMESTAMPtz := '2022-01-01';
begin
FOR i IN 0..10 LOOP

EXECUTE format('
CREATE TABLE parent_table_%s (
CHECK (partition_key >=  ''%s'' AND partition_key <  ''%s'' )
) LIKE (parent_table including all);',
TO_CHAR(start_date + make_interval(days=>i),'YYYY_MM_DD'),
(start_date + make_interval(days=>i))::timestamptz ,
(start_date + make_interval(days=>i))::timestamptz
);

EXECUTE format('
ALTER TABLE parent_table ATTACH PARTITION parent_table_%s
FOR VALUES FROM (''%s'') TO (''%s'');',
TO_CHAR(start_date + make_interval(days=>i),'YYYY_MM_DD'),
(start_date + make_interval(days=>i))::timestamptz ,
(start_date + make_interval(days=>i))::timestamptz
);

END LOOP;

  FOR i IN 0..10 LOOP
EXECUTE format('
INSERT INTO parent_table_%s (id,col1, col2,  partition_key)
SELECT
generate_series(1, 100),
md5(random()::text),
md5(random()::text),
''%s''
FROM generate_series(1, 100);',
TO_CHAR(start_date + make_interval(days=>i),'YYYY_MM_DD'),

(start_date + make_interval(days=>i))::timestamptz);

END LOOP;

END $$;


SQL Error [42601]: ERROR: syntax error at or near "LIKE"
Where: PL/pgSQL function inline_code_block line 8 at EXECUTE
Error position:


Re: Timestamp conversion Error in dynamic sql script

2024-04-03 Thread sud
This one worked. Thank you so much.

On Wed, Apr 3, 2024 at 2:27 AM Erik Wienhold  wrote:

> On 2024-04-02 22:08 +0200, sud wrote:
> > On Tue, Apr 2, 2024 at 7:46 PM Greg Sabino Mullane 
> > wrote:
> >
> > Now this block seems to be failing near the "LIKE" operator. Isn't it
> > allowed to add the check constraints along with the CREATE TABLE
> statement?
> >
> > [...]
> >
> > EXECUTE format('
> > CREATE TABLE parent_table_%s (
> > CHECK (partition_key >=  ''%s'' AND partition_key <  ''%s'' )
> >* ) LIKE (parent_table including all);',*
> >TO_CHAR(start_date + make_interval(days=>i),'_MM_DD'),
> > (start_date + make_interval(days=>i))::timestamptz ,
> > (start_date + make_interval(days=>i))::timestamptz
> > );
> >
> > [...]
> >
> > *SQL Error [42601]: ERROR: syntax error at or near "LIKE"Where: PL/pgSQL
> > function inline_code_block line 8 at EXECUTE*
> > *Error position: *
>
> The LIKE clause goes inside the parenthesis along with the column and
> constraint definitions, i.e.:
>
> CREATE TABLE parent_table__mm_dd (
> LIKE parent_table,
> CHECK (...)
> );
>
> --
> Erik
>


Monitoring and debugging historical performance

2024-04-27 Thread sud
Hi All,
While looking for options to monitor database performance and debug
historical database performance issues (historical database wait events,
past CPU/memory utilization, past query execution times, execution paths at
any point in time in the past, etc.), it appears there are some options, as
I found on the internet, like pg_sentinel, pgsnapper, and pg_collector.
Wanted to check if anybody has used these utilities and would suggest any of
them for longer-term use.

Regards
Sud


Adding constraints faster

2024-05-14 Thread sud
Hi,
It's Postgres version 15.4. We want to create foreign keys on three
different partitioned tables which already have data in them in production.
They all refer to the same parent table, which is also partitioned. All the
tables (both parent and children) are ~2TB+ in size each, with ~100 million
rows in each partition. These are range partitioned on truncated date
columns, and each table has around 30 partitions.

It's easy to create the foreign key on the new partitions of the child
tables, which are blank and going to be filled with data in future; however,
adding the foreign key on the existing partitions with data is going to take
time. We tried one sample partition with existing data in it, and it took
~20 minutes. This way it's going to take a long time, and we may not be able
to have the application down for that long.

I have the following questions.
To make this activity faster, we were thinking of using the "NOT VALID"
option, i.e. create the foreign key constraints on the existing partitions
with the "NOT VALID" option and create the foreign key on the blank future
partitions as normal (validated) constraints. Is this okay? We also see some
documents stating that if a foreign key is in the "NOT VALID" state, the
optimizer won't use it for estimating row counts when making join
cardinality estimates, so we want to understand from the experts whether
this is fine, or whether we have to make the foreign key constraints "VALID"
anyway, even if that runs longer.

We also tried setting max_parallel_workers_per_gather to 8 and then running
the "VALIDATE CONSTRAINT" step, but it still runs in a single thread. So I
am wondering if we have any other options to make this foreign key addition
faster on the existing data.

**
ALTER TABLE ADD FOREIGN KEY ... NOT VALID.
ALTER TABLE ... VALIDATE CONSTRAINT;
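
i.e., per existing child partition, something along these lines (hypothetical
names and columns):

-- fast: only a metadata change, no scan of the existing rows
ALTER TABLE child_tab_p2024_05_01
    ADD CONSTRAINT child_tab_p2024_05_01_fk
    FOREIGN KEY (txn_id, txn_create_date)
    REFERENCES parent_tab (txn_id, txn_create_date)
    NOT VALID;

-- slow part, but can be run later, partition by partition
ALTER TABLE child_tab_p2024_05_01
    VALIDATE CONSTRAINT child_tab_p2024_05_01_fk;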

Regards
Sud


Re: Adding constraints faster

2024-05-14 Thread sud
On Wed, May 15, 2024 at 2:09 AM Ron Johnson  wrote:

> On Tue, May 14, 2024 at 3:59 PM sud  wrote:
>
>> *
>>
> ALTER TABLE ADD FOREIGN KEY ... NOT VALID.
>> ALTER TABLE ... VALIDATE CONSTRAINT;
>>
>>
> This is what we did, back in the PG 12.x period.  VALIDATE CONSTRAINT was
> almost instantaneous.  (Supporting indices existed, though.)
>
>
Thank you. Actually, we do have a composite index on the child table columns
which refer to the parent table, and in the parent table those columns are
the primary key. So even then it's taking ~20 minutes for each partition.

So is there some other way to make it run faster,
or
can we run it concurrently,
or
should we keep the existing constraints in the "not valid" state only?


Long running query causing XID limit breach

2024-05-22 Thread sud
Hello ,
It's RDS Postgres version 15.4. We suddenly saw
"MaximumUsedTransactionIDs" reach ~1.5 billion and got alerted by team
members, who mentioned the database will effectively shut down / hang if
this value reaches ~2 billion and won't be able to serve any incoming
transactions. It was a panic situation.

I had heard of this before: because of the way Postgres works, the XID,
being a 32-bit integer, can only represent (2^32)/2 = ~2 billion
transactions. However, as RDS performs autovacuum, we thought that we need
not worry about this issue. But it seems we were wrong. And we found one
ad hoc "SELECT" query had been running on the reader instance for the last
couple of days, and when that was killed, the max XID
(MaximumUsedTransactionIDs) dropped to 50 million immediately.

So I have few questions,

1) This system is going to be a 24/7 up-and-running system which will
process ~500 million business transactions/day in future, i.e. ~4-5 billion
rows/day inserted across multiple tables each day. And as I understand it,
each row will have an XID allocated. So in that case, does it mean that we
will need (5 billion/24) = ~200 million XIDs/hour, and thus, if any
legitimate application "SELECT" query keeps running for ~10 hours (and thus
keeps the historical XIDs alive), it can saturate
"MaximumUsedTransactionIDs" and bring the database to a standstill in
2 billion/200 million = ~10 hrs? Is this understanding correct? It seems we
are prone to hit this limit sooner going forward.

2) We have some legitimate cases where reporting queries can run for
5-6 hrs. So in such cases, if this SELECT query starts at the 100th XID on
table TAB1, then whatever transactions happen after that time, across all
other tables (table2, table3 etc.) in the database, won't get vacuumed until
that SELECT query finishes (as the database will try to keep that same 100th
XID snapshot), and the XIDs will just keep incrementing for new
transactions, eventually reaching the max limit. Is my understanding correct
here?

3) Although RDS does autovacuum by default, should we also consider doing
manual vacuums without impacting ongoing transactions? Something like the
below options:
vacuum freeze tab1;
vacuum freeze;
vacuum;
vacuum analyze tab1;
vacuum tab1;

4) I have worked in the past on Oracle databases, where the similar
transaction identifier is called the "system change number", but I never
encountered it being exhausted. Also, Oracle keeps UNDO records, and if a
SELECT query needs anything beyond a certain limit (the undo_retention
parameter), the select query fails with a "snapshot too old" error, without
impacting any write transactions. But in Postgres it seems nothing like that
happens, and every "SELECT query" will try to run to completion without any
such failure, until it gets killed by someone. Is my understanding correct?

And in that case, it seems we have to mandatorily set "statement_timeout" to
some value, e.g. 4 hrs, and also set "idle_in_transaction_session_timeout"
to 5 minutes, even on all the prod and non-prod databases, to restrict
long-running transactions/queries and avoid such issues in future (per-role
defaults, as sketched below, may cover the user-level case, so the timeout
need not apply to all application queries). Correct me if I'm wrong.
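
(My understanding is that ALTER ROLE ... SET allows per-role defaults, so
something like the below, with a hypothetical reporting role, would scope
the timeouts without touching the application users:)

ALTER ROLE reporting_user SET statement_timeout = '4h';
ALTER ROLE reporting_user SET idle_in_transaction_session_timeout = '5min';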

Regards
Sud


Re: Long running query causing XID limit breach

2024-05-22 Thread sud
On Thu, May 23, 2024 at 9:00 AM Muhammad Salahuddin Manzoor <
salahuddi...@bitnine.net> wrote:

> Greetings,
>
> In high-transaction environments like yours, it may be necessary to
> supplement this with manual vacuuming.
>
> Few Recommendations
>
> Monitor Long-Running Queries try to optimize.
> Optimize Autovacuum.
> Partitioning.
> Adopt Vacuum Strategy after peak hours.
>
> We have these big tables already partitioned. So does "vacuum table_name"
will endup scanning whole table or just the latest/live partition which is
getting loaded currently? and do you mean to say running command "vacuum
table_name;" frequently on selective tables that are experiencing heavy DML
? Hope this won't lock the table anyway because the data will be
written/read from these tables 24/7.

When you say "optimize autovacuum", does that mean setting higher values for
"autovacuum_max_workers" and "autovacuum_freeze_max_age"?

Considering we have ~4 billion rows inserted daily into the table, and there
is a limit of ~2 billion on "MaximumUsedTransactionIDs", what threshold
should we set for the alerting so as to have enough time at hand to fix this
issue?


Re: Long running query causing XID limit breach

2024-05-22 Thread sud
On Thu, May 23, 2024 at 10:42 AM Muhammad Salahuddin Manzoor <
salahuddi...@bitnine.net> wrote:

> Greetings,
>
> Running `VACUUM table_name;` on a partitioned table will vacuum each
> partition individually, not the whole table as a single unit.
>
> Yes, running `VACUUM table_name;` frequently on tables or partitions with
> heavy DML is recommended.
>
> Regular `VACUUM` does not lock the table for reads or writes, so it won't
> disrupt ongoing 24/7 data operations.
>
> "optimize autovacuum"
> Yes. Adjust following parameters as per your system/environment
> requirement
> autovacuum_max_workers,
> autovacuum_freeze_max_age ,
> autovacuum_vacuum_cost_delay
>
> Following need to be first tested thoroughly in a test environment.
> Recommended Alert Threshold
> Alert at 50% Usage: Set the alert threshold at 1 billion used XIDs. This
> provides a significant buffer, giving you ample time to take corrective
> action before reaching the critical limit.
>
> Calculation Rationale
> Daily XID Usage: Approximately 4 billion rows per day implies high XID
> consumption.
> Buffer Time: At 1 billion XIDs, you would still have 1 billion XIDs
> remaining, giving you roughly 12 hours to address the issue if your system
> consumes 200 million XIDs per hour.
>
>
>
Thank you so much. That helps.
So apart from setting these alerts on "MaximumUsedTransactionIDs" and
optimizing vacuum by tweaking the above parameters, should we also have
monitoring in place to ensure that vacuum is not taking longer than its
normal runtime, and to catch it getting blocked or failing because of
something? For example, in our case the select query was running long, so
vacuum could not have succeeded every time it attempted; is it really worth
having that level of alerting? And how can we get an idea of whether vacuum
is failing or not succeeding, so as to avoid such issues in future?
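
In case it helps, a sketch of the kind of monitoring queries meant here
(both views are in core Postgres):

-- vacuums currently running, with their progress
SELECT pid, relid::regclass AS table_name, phase,
       heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;

-- when each table was last (auto)vacuumed, and how much garbage is pending
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 20;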


Re: Long running query causing XID limit breach

2024-05-22 Thread sud
Also, if I am getting it correctly, it means we should not run any transaction
(even a legitimate one, e.g. a big reporting SELECT query) beyond ~10 hrs, as
at ~200 million XIDs consumed per hour that query would keep the historical
XIDs alive while 10 * 200 million = 2 billion XIDs get used up, saturating the
limit and thus causing system failure. Hope my understanding is correct here.

On Thu, May 23, 2024 at 11:41 AM sud  wrote:

>
> On Thu, May 23, 2024 at 10:42 AM Muhammad Salahuddin Manzoor <
> salahuddi...@bitnine.net> wrote:
>
>> Greetings,
>>
>> Running `VACUUM table_name;` on a partitioned table will vacuum each
>> partition individually, not the whole table as a single unit.
>>
>> Yes, running `VACUUM table_name;` frequently on tables or partitions with
>> heavy DML is recommended.
>>
>> Regular `VACUUM` does not lock the table for reads or writes, so it won't
>> disrupt ongoing 24/7 data operations.
>>
>> "optimize autovacuum"
>> Yes. Adjust following parameters as per your system/environment
>> requirement
>> autovacuum_max_workers,
>> autovacuum_freeze_max_age ,
>> autovacuum_vacuum_cost_delay
>>
>> The following needs to be tested thoroughly in a test environment first.
>> Recommended Alert Threshold
>> Alert at 50% Usage: Set the alert threshold at 1 billion used XIDs. This
>> provides a significant buffer, giving you ample time to take corrective
>> action before reaching the critical limit.
>>
>> Calculation Rationale
>> Daily XID Usage: Approximately 4 billion rows per day implies high XID
>> consumption.
>> Buffer Time: At 1 billion XIDs, you would still have 1 billion XIDs
>> remaining, giving you roughly 12 hours to address the issue if your system
>> consumes 200 million XIDs per hour.
>>
>>
>>
> Thank you so much. That helps.
> So apart from setting these alerts on "MaximumUsedTransactionIDs" and
> optimizing vacuum by tweaking the above parameters, should we also have
> monitoring in place to ensure that vacuum is not taking longer than its
> normal runtime, and to catch it getting blocked or failing because of
> something? For example, in our case the select query was running long, so
> vacuum could not have succeeded every time it attempted; is it really worth
> having that level of alerting? And how can we get an idea of whether vacuum
> is failing or not succeeding, so as to avoid such issues in future?
>
>


Re: Long running query causing XID limit breach

2024-05-23 Thread sud
On Thu, May 23, 2024 at 1:22 PM Laurenz Albe 
wrote:

> On Thu, 2024-05-23 at 02:46 +0530, sud wrote:
> > It's RDS postgres version 15.4. We suddenly saw
> > "MaximumUsedTransactionIDs" reach ~1.5 billion and got alerted by team
> > members, who mentioned the database is going to shut down/hang if this
> > value reaches ~2 billion, and won't be able to serve any incoming
> > transactions. It was a panic situation.
> >
> > I have heard of it before, because of the way postgres works: the XID,
> > being a 32-bit integer, can only represent (2^32)/2 = ~2 billion
> > transactions. However, as RDS performs the autovacuum, we thought that
> > we need not worry about this issue. But it seems we were wrong. And we
> > found one adhoc "SELECT '' query had been running on the reader
> > instance since the last couple of days, and when that was killed, the
> > max xid (MaximumUsedTransactionIDs) dropped to 50 million immediately.
>
> This has nothing to do with autovacuum running.
> PostgreSQL won't freeze any rows above the xmin horizon (see the
> "backend_xmin"
> column in "pg_stat_activity").
>
> > So I have few questions,
> >
> > 1) This system is going to be a 24/7 up-and-running system which will
> >    process ~500 million business transactions/day in future, i.e.
> >    ~4-5 billion rows/day inserted across multiple tables each day. And as
> >    I understand, each row will have an XID allocated. So in that case,
> >    does it mean that we will need (5 billion/24) = ~200 million XIDs/hour,
> >    and thus, if any such legitimate application "SELECT" query keeps
> >    running for ~10 hours (and thus keeps the historical XIDs alive), it
> >    can saturate "MaximumUsedTransactionIDs" and bring the database to a
> >    standstill in 2 billion/200 million = ~10 hrs? Is this understanding
> >    correct? Seems we are prone to hit this limit sooner going forward.
>
> Yes, that is correct.  You cannot run such long-running queries with a
> transaction rate like that.
>
>
When you say transaction, does it mean one commit? For example, if it's
inserting and committing ~1000 rows in one batch, then all 1000 rows will be
marked with one XID rather than 1000 different XIDs, and so we should look at
batch processing rather than row-by-row processing. Is that understanding
correct?
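
A quick way to see this (a sketch, using a hypothetical table t;
pg_current_xact_id() is the name since v13, txid_current() before that):

BEGIN;
INSERT INTO t(id) SELECT g FROM generate_series(1, 1000) AS g;  -- 1000 rows
SELECT pg_current_xact_id();  -- the single XID assigned to this whole batch
COMMIT;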


> One thing you could consider is running the long-running queries on a
> standby
> server.  Replication will get delayed, and you have to keep all the WAL
> around for the standby to catch up once the query is done, but it should
> work.
> You'd set "max_streaming_standby_delay" to -1 on the standby.
>
>
We have the "Select query" running on a reader instance , but still the
writer instance was showing up "MaximumUsedTransactionIDs" reaching
1.5billion, so it means both the instance as part of same cluster so
sharing same XIDs, and as per your suggestion we should run this in
separate standby cluster altogether which does not share same XID. Is this
understanding correct? or it can be handled even with another reader
instance by just tweaking some other parameter so that they won't share the
same XID?


Re: Long running query causing XID limit breach

2024-05-23 Thread sud
On Thu, May 23, 2024 at 1:45 PM Laurenz Albe 
wrote:

>
>
> If a long running query on the standby influences the primary, that means
> that
> you have "hot_standby_feedback" set to "on".  Set it to "off".
>
>
Will setting the "hot_standby_feedback" value to OFF cause the reader
instance to give incorrect query results or unexpected query failures, i.e.
a potential inconsistency between the writer and reader instances, because
those old row versions can be removed/cleaned by the writer node even while
they are being read by a query on the reader instance? And it can add more
replication lag. So I'm wondering if this setup is an advisable one.


Re: Long running query causing XID limit breach

2024-05-23 Thread sud
On Thu, May 23, 2024 at 8:11 PM Laurenz Albe 
wrote:

> On Thu, 2024-05-23 at 18:15 +0530, sud wrote:
> > On Thu, May 23, 2024 at 1:45 PM Laurenz Albe 
> wrote:
> > > If a long running query on the standby influences the primary, that
> means that
> > > you have "hot_standby_feedback" set to "on".  Set it to "off".
> >
> > Will setting the "hot_standby_feedback" value to OFF cause the reader
> > instance to give incorrect query results or unexpected query failures,
> > i.e. a potential inconsistency between the writer and reader instances,
> > because those old row versions can be removed/cleaned by the writer node
> > even while they are being read by a query on the reader instance? And it
> > can add more replication lag.
>
> There will never be incorrect query results.
>
> It can happen that a query on the standby gets canceled if you don't set
> "max_standby_streaming_delay" to -1, but that can happen even if
> "hot_standby_feedback" is "on".  It just happens less often.
>
> The effect of setting "max_standby_streaming_delay" to -1 will often be a
> replication delay if you run a long query.  That's what you have to
> accept if you want to execute long-running queries.
>
> You will never be able to have both of the following:
> - queries never get canceled
> - there is no replication delay
>
>
>
I am trying to understand these two parameters, and each time it looks a bit
confusing to me whether these two parameters complement or conflict with each
other.

Say for example, if we set hot_standby_feedback to ON (which is the default,
by the way), it will make the primary hold back vacuum until the query
completes on the standby, and can cause such a high bump in XIDs in scenarios
where a query on the standby runs for days (like in our current scenario,
which happens). So we were thinking of setting it to OFF, to avoid the
transaction ID wraparound issue.

But as you also mentioned, setting "max_standby_streaming_delay" to -1
(it is currently set to 14 seconds in our case) will make the standby wait
indefinitely until the query completes, and it won't apply WAL that would
overwrite the rows the standby query is reading. But won't this same
behaviour occur while we have hot_standby_feedback set to ON?

But again, for HA, in case the primary goes down we should not have a big lag
on the standby, so we want the standby to have minimal lag too. And as you
mentioned, there will never be incorrect results; at most there will be query
cancellations. So I was thinking, is it fine to just keep
"hot_standby_feedback" OFF and leave max_standby_streaming_delay as it is,
at 14 sec? Let me know your thoughts.

Basically, below are the combinations I am confused between:

hot_standby_feedback ON and max_standby_streaming_delay = -1
or
hot_standby_feedback OFF and max_standby_streaming_delay = -1
or
hot_standby_feedback ON and max_standby_streaming_delay = 14 sec
or
hot_standby_feedback OFF and max_standby_streaming_delay = 14 sec


Re: Long running query causing XID limit breach

2024-05-25 Thread sud
On Sun, May 26, 2024 at 2:24 AM yudhi s  wrote:

>
>
> *hot_standby_feedback ON and max_standby_streaming_delay = -1:*
> Ensures that long-running queries on the standby are not interrupted. The
> primary waits indefinitely to avoid vacuuming rows needed by standby
> queries.
> But it can lead to significant replication lag and increased XID consumption
> on the primary, potentially causing transaction ID wraparound issues.
>
>
> *hot_standby_feedback OFF and max_standby_streaming_delay = -1:*
> Ensures long-running queries on the standby are not interrupted. No
> feedback is sent to the primary, reducing the risk of XID wraparound.
> But the standby may fall significantly behind the primary, resulting in
> high replication lag.
>
>
>
> *hot_standby_feedback ON and max_standby_streaming_delay = 14 seconds:*
> The primary prevents vacuuming rows needed by standby queries, reducing
> query cancellations on the standby. The replication lag is limited to 14
> seconds.
> But long-running queries on the standby that exceed 14 seconds may be
> canceled, and the primary can still experience increased XID consumption.
>
>
> *hot_standby_feedback OFF and max_standby_streaming_delay = 14 seconds:*
>  Limits replication lag to 14 seconds and reduces XID consumption on the
> primary. Queries on the standby exceeding 14 seconds are canceled.
> But long-running queries on the standby are more likely to be canceled due
> to the lack of feedback to the primary.
>
>
>

Thank you so much.
Does it mean that we should go for the last one, i.e. (*hot_standby_feedback
OFF and max_standby_streaming_delay = 14 seconds*)? Because high availability
is also a key requirement in any production environment, keeping a 14-second
lag is kind of okay, and at the same time keeping hot_standby_feedback OFF
will make sure the transaction ID wraparound won't happen because of any
long-running query on the standby, as the primary won't wait for the
standby's feedback before vacuuming the tables.

But I have one question here: does max_standby_streaming_delay = 14 mean
that queries on the standby will get cancelled after 14 seconds?


Re: Long running query causing XID limit breach

2024-05-26 Thread sud
On Sun, May 26, 2024 at 1:43 PM Torsten Förtsch 
wrote:

> On Sat, May 25, 2024 at 11:00 PM sud  wrote:
>
>>
>> But i have one question here , does max_standby_streaming_delay = 14 ,
>> means the queries on the standby will get cancelled after 14 seconds?
>>
>
> No, your query gets cancelled when it stalls replication for >14 sec. If
> your master is idle and does not send any WAL and the replica has
> caught up, the query can take as long as it wants.
>

Thank you so much.
For example, in the below scenario:
If I have an insert query going on on the primary instance against the 25th
May partition of TABLE1, and at the same time we are selecting data from the
24th May partition, then with "max_standby_streaming_delay = 14" it just
allows the select query to run for any duration without any restriction, even
as WAL keeps getting applied on the standby. Also, the INSERT query on the
primary won't cause the standby SELECT queries to be cancelled, because the
WAL records of the INSERT on the primary are not in conflict with the exact
rows being read on the standby. Is my understanding correct here?

However, if I have an update/delete query going on on the primary instance
against the 25th May partition of TABLE1, touching the exact same set of rows
which are being read by the SELECT query on the standby instance, then the
application of those WAL records on the standby can wait at most 14 seconds,
and thus those select queries are prone to be cancelled after 14 seconds. Is
this understanding correct?

If the above is true then it doesn't look good, because in an OLTP system
there will be a lot of DML happening on the writer instance, and there may be
many queries running on the reader/standby instances which are meant to run
for hours. And if letting those SELECT queries run for hours means
compromising an hour of "high availability"/RPO, or a lag of an hour between
primary and standby, that doesn't look good either. Please correct me if I am
missing something here.


Re: Long running query causing XID limit breach

2024-05-26 Thread sud
On Sun, May 26, 2024 at 11:18 PM Torsten Förtsch 
wrote:

> Each query on the replica has a backend_xmin. You can see that in
> pg_stat_activity. From that backend's perspective, tuples marked as deleted
> by any transaction greater or equal to backend_xmin are still needed. This
> does not depend on the table.
>
> Now, vacuum writes to the WAL up to which point it has vacuumed on the
> master. In pg_waldump this looks like so:
>
> PRUNE snapshotConflictHorizon: 774, nredirected: 0, ndead: 5, nunused: 0,
> redirected: [], dead: [2, 4, 6, 8, 10], unused: [], blkref #0: rel
> 1663/5/16430 blk 0
>
> That snapshotConflictHorizon is also a transaction id. If the backend_xmin
> of all backends running transactions in the same database (the 5 in
> 1663/5/16430, as in the vacuum WAL record) is greater than vacuum's
> snapshotConflictHorizon, then there is no conflict. If any of the
> backend_xmin's is less, then there is a conflict.
>
> This type of conflict is determined by just 2 numbers, the conflict
> horizon sent by the master in the WAL, and the minimum of all
> backend_xmins. For your case this means a long running transaction querying
> table t1 might have a backend_xmin of 223. On the master update and delete
> operations happen on table T2. Since all the transactions on the master are
> fast, when vacuum hits T2, the minimum of all backend_xmins on the master
> might already be 425. So, garbage left over by all transactions up to 424
> can be cleaned up. Now that cleanup record reaches the replica. It compares
> 223>425 which is false. So, there is a conflict. Now the replica can wait
> until its own horizon reaches 425 or it can kill all backends with a lower
> backend_xmin.
>
> As I understand, hot_standby_feedback does not work for you. Not sure if
> you can run the query on the master? That would resolve the issues but
> might generate the same bloat on the master as hot_standby_feedback.
> Another option I can see is to run long running queries on a dedicated
> replica with max_standby_streaming_delay set to infinity or something large
> enough. If you go that way, you could also fetch the WAL from your
> WAL archive instead of replicating from the master. That way the replica
> has absolutely no chance to affect the master.
>
>
Thank you so much.

Would you agree that we should have two standbys: one with the default
max_standby_streaming_delay (say 10 sec), which will mainly be used for high
availability and thus will have minimal lag, and another standby with
max_standby_streaming_delay = -1, i.e. one that waits indefinitely for the
SELECT queries to finish without caring about the lag, which will be utilized
for the long-running SELECT queries?

And keep hot_standby_feedback ON for the first standby, which is used for
HA/high availability, and keep hot_standby_feedback OFF for the second
standby, which is utilized for long-running SELECT queries, so that the
primary won't wait for the response/feedback from that standby to vacuum its
old transactions. That will keep the transaction ID wraparound issue from
happening because of read/SELECT queries on either of the standbys.
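
And for spotting such long-running queries before they become a problem,
something like this on each instance could work (a sketch):

SELECT pid, usename, state, query_start,
       age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 5;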


Re: Long running query causing XID limit breach

2024-05-26 Thread sud
On Mon, May 27, 2024 at 12:55 AM Torsten Förtsch 
wrote:

> On Sun, May 26, 2024 at 8:46 PM sud  wrote:
>
>> Would you agree that we should have two standby, one with default
>> max_standby_streaming_delay (say 10 sec ) which will be mainly used as high
>> availability and thus will be having minimal lag. and another standby with
>> max_standby_streaming_delay as "-1" i.e. it will wait indefinitely for the
>> SELECT queries to finish without caring about the lag, which will be
>> utilized for the long running SELECT queries.
>>
>> And keep the hot_standby_feedback as ON for the first standby which is
>> used as HA/high availability. And keep the hot_standby_feedback as OFF for
>> the second standby which is utilized for long running SELECT queries, so
>> that primary won't be waiting for the response/feedback from this standby
>> to vacuum its old transactions and that will keep the transaction id wrap
>> around issue from not happening because of the Read/Select queries on any
>> of the standby.
>>
>
> Sure. That could work. Perhaps also set statement_timeout on the first
> replica, just in case.
>

Thank you so much. Yes, planning to set it like below. Hope I am doing it
correctly.

Parameter                            | Master/Primary | First Replica/Standby (HA) | Second Replica (Reporting)
hot_standby_feedback                 | ON             | ON                         | OFF
max_standby_streaming_delay          | 10 sec         | 10 sec                     | -1 (infinite)
statement_timeout                    | 2hrs           | 2hrs                       | none, i.e. infinite
idle_in_transaction_session_timeout  | 10 minutes     | 10 minutes                 | none, i.e. infinite
autovacuum_freeze_max_age            | 100M           | 100M                       | 100M
log_autovacuum_min_duration          | 0              | 0                          | 0
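
For the reporting standby, a minimal sketch of applying the two standby-side
settings on a self-managed instance (ALTER SYSTEM is not available on RDS,
where these would instead go into the instance's DB parameter group):

ALTER SYSTEM SET hot_standby_feedback = off;
ALTER SYSTEM SET max_standby_streaming_delay = -1;
SELECT pg_reload_conf();  -- both settings are reloadable without a restart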


Re: Long running query causing XID limit breach

2024-06-04 Thread sud
Hello Laurenz,

Thank you so much. This information was really helpful for us in
understanding the working of these parameters.

One follow-up question I have: as we are setting one of the
standbys/replicas with the value idle_in_transaction_session_timeout = -1,
which can cause the WAL to be heavily backlogged in a scenario where a query
runs for a very long time on that instance, will there be a chance of an
instance restart in that case, and can that be avoided in any way?

And the plan is to set these system parameters with different values on the
writer and the read replica. So in that case, if we apply the "alter system"
command on the primary, won't the WAL apply those same commands on the reader
instance, forcibly making it the same as the writer configuration (while we
want the reader replica configuration to be different from the writer)?

Appreciate your guidance.

On Wed, May 29, 2024 at 1:38 PM Laurenz Albe 
wrote:

> On Wed, 2024-05-29 at 01:34 +0530, yudhi s wrote:
> > > The only way you can have no delay in replication AND no canceled
> > > queries is if you use two different standby servers with different
> > > settings for "max_standby_streaming_delay". One of the servers is for
> > > HA, the other for your long-running queries.
> >
> > When you suggest having different max_standby_streaming_delay for the
> > first replica (say 10 sec for high availability) and the second replica
> > (say -1 for long-running queries), do you also suggest keeping
> > "hot_standby_feedback" as "OFF" for all three instances, i.e. the
> > master and both replicas?
>
> The parameter is ignored on the master.
> It needs to be off on the standby that is running long queries.
> For the other standby it probably doesn't matter if you are not running any
> queries on it.  I would leave "hot_standby_feedback = off" there as well.
>
> Actually, I would set "hot_standby = off" on the standby that is only used
> for HA.
>
>
> - I would leave "hot_standby_feedback" off everywhere.
> - "max_standby_streaming_delay" should be -1 on the reporting standby and
> very
>   low or 0 on the HA standby. It doesn't matter on the primary.
> - "statement_timeout" should be way lower on the first two nodes.
> - "idle_in_transaction_session_timeout" is good.
> - I would leave "autovacuum_freeze_max_age" at the default setting but 100
> million
>   is ok too.
>
> Yours,
> Laurenz Albe
>


Re: Long running query causing XID limit breach

2024-06-05 Thread sud
On Wed, 5 Jun, 2024, 12:39 pm Simon Elbaz,  wrote:

> Hi,
>
> I am following this very interesting thread.
>
> From the documentation
> https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT,
> the 0 value will disable the timeout (not -1).
>
>
>
> On Wed, Jun 5, 2024 at 8:25 AM sud  wrote:
>
>> Hello Laurenz,
>>
>> Thank you so much. This information was really helpful for us in
>> understanding the working of these parameters.
>>
>> One follow-up question I have: as we are setting one of the
>> standbys/replicas with the value idle_in_transaction_session_timeout = -1,
>> which can cause the WAL to be heavily backlogged in a scenario where a
>> query runs for a very long time on that instance, will there be a chance
>> of an instance restart in that case, and can that be avoided in any way?
>>
>> And the plan is to set these system parameters with different values on
>> the writer and the read replica. So in that case, if we apply the "alter
>> system" command on the primary, won't the WAL apply those same commands on
>> the reader instance, forcibly making it the same as the writer
>> configuration (while we want the reader replica configuration to be
>> different from the writer)?
>>
>> Appreciate your guidance.
>>
>>
>
My apologies, I meant to say setting "max_standby_streaming_delay" to -1,
which means unlimited lag.

>


Re: Long running query causing XID limit breach

2024-06-08 Thread sud
On Thu, Jun 6, 2024 at 12:52 AM yudhi s  wrote:

> On Wed, Jun 5, 2024 at 3:52 PM Laurenz Albe 
> wrote:
>
>>
>> There should never be a restart unless you perform one or the standby
>> crashes.
>> If you mean that you want to avoid a crash caused by a full disk on the
>> standby,
>> the answer is probably "no".  Make sure that you have enough disk space
>> and
>> use monitoring.
>>
>> Yours,
>> Laurenz Albe
>>
>
> Is this because the OP initially mentioned it's RDS postgres, and in that
> case there is a storage space restriction of 64TB (128TB in the case of
> aurora postgres)? I believe this storage space combines data + WAL, so in
> that case, as you mentioned, appropriate monitoring needs to be put in
> place. Or else, in the worst-case scenario, if the storage consumption hits
> that hard limit, will there be an instance restart or crash?
>

Thank you so much Laurenz and Yudhi.

Yes, it's RDS, and as you mentioned there does exist a space limitation of
~64TB. But as Laurenz mentioned, the only time the second standby may crash
would probably be because of storage space saturation, and thus we need to
have appropriate monitoring in place to catch this and get alerted
beforehand. And also monitoring to see how much WAL gets generated per
hour/day, to get an idea of the usage. I am not sure how to do that yet, but
will check on it.
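
One simple approach could be to sample the current WAL position periodically
and diff the samples (a sketch; the two literal LSNs are placeholders for
values captured, say, an hour apart):

-- sample now, and again later
SELECT now(), pg_current_wal_lsn();

-- WAL bytes generated between two sampled positions
SELECT pg_size_pretty(pg_wal_lsn_diff('2E/90000000', '2D/50000000'));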


Re: Creating big indexes

2024-06-08 Thread sud
On Sat, Jun 8, 2024 at 12:53 PM Lok P  wrote:

> Hello,
> We have a few tables having size ~5TB and are partitioned on a timestamp
> column. They have ~90 partitions in them and are storing 90 days of data.
> We want to create a couple of indexes on those tables. They are getting the
> incoming transactions(mainly inserts) 24/7 , which are mostly happening on
> the current day/live partition. Its RDS postgres version 15.4. So in this
> situation
>
> Should we go with below i.e one time create index command on the table..
>
> CREATE INDEX CONCURRENTLY idx1 ON tab(column_name);
> Or
> create index on individual partitions from different sessions, say for
> example create indexes on 30 partitions each from three different sessions
> so as to finish all the 90 partitions faster?
> CREATE INDEX CONCURRENTLY idx1 ON tab_part1(column_name);
> CREATE INDEX CONCURRENTLY idx1 ON tab_part2(column_name);
> .
> .
>
> Basically I have three questions:
> 1)If we can do this index creation activity online without impacting the
> incoming transactions or do we have to take down time for this activity?
> 2)If we can't do it online then , what is the fastest method to do this
> index creation activity ?
> 3)Should we change the DB parameters in a certain way to make the process
> faster? We have currently set below parameters
>
> max_parallel_workers-16
> max_parallel_maintenance_workers-2
> maintenance_work_mem- 4GB
>
>
>
 You can first create the index on the parent table using the "ON ONLY"
keyword, something as below.

CREATE INDEX idx ON ONLY tab(col1);

Then create matching indexes on each partition with CONCURRENTLY, from
multiple sessions in chunks, and attach each one to the parent index.

CREATE INDEX CONCURRENTLY idx_1 ON tab_part1(col1);
CREATE INDEX CONCURRENTLY idx_2 ON tab_part2(col1);
.
ALTER INDEX idx ATTACH PARTITION idx_1;
ALTER INDEX idx ATTACH PARTITION idx_2;
.

Once the indexes of all the partitions have been attached, the table-level
index created in the first step becomes valid automatically.


Re: Question on pg_cron

2024-06-08 Thread sud
On Sat, Jun 8, 2024 at 10:05 PM yudhi s  wrote:

>
>
> On Sat, 8 Jun, 2024, 9:53 pm Ron Johnson,  wrote:
>
>> On Sat, Jun 8, 2024 at 5:31 AM yudhi s 
>> wrote:
>>
>>> Hello All,
>>>
>>> We have around 10 different partition tables for which the partition
>>> maintenance is done using pg_partman extension. These tables have foreign
>>> key dependency between them.  We just called partman.run_maintanance_proc()
>>> through pg_cron without any parameters and it was working fine. So we can
>>> see only one entry in the cron.job table. And it runs daily once.
>>>
>>> It was all working fine and we were seeing the historical partition
>>> being dropped and new partitions being created without any issue. But
>>> suddenly we started seeing, its getting failed with error "ERROR: can not
>>> drop schema1.tab1_part_p2023_12_01 because other objects depend on it"
>>>
>>
>> Have you changed version lately of PG, pg_cron or pg_partman?  Or maybe
>> what pg_cron or pg_partman depends on?
>>
>
> No version change, but we updated the part_config to set premake from 30
> to 60 for all the tables. But not sure how that impacted this behavior.
>
> However, do you think, we should better control the order of execution
> rather letting postgres to decide it's own, considering there is no such
> parameters for this ordering in part_config? And in that case which
> approach should we use out of the two i mentioned. Or any other strategies,
> should we follow, please advise?
>

 I believe, You should log this as an issue in the pg_partman open source
project.


Re: Creating big indexes

2024-06-11 Thread sud
On Sun, Jun 9, 2024 at 1:40 PM Lok P  wrote:

> On Sun, Jun 9, 2024 at 10:39 AM Lok P  wrote:
>
>>
>>
>> On Sun, Jun 9, 2024 at 10:36 AM sud  wrote:
>>
>>>
>>>  You can first create the index on the parent table using the "ON ONLY"
>>> keyword, something as below.
>>>
>>> CREATE INDEX idx ON ONLY tab(col1);
>>>
>>> Then create matching indexes on each partition with CONCURRENTLY, from
>>> multiple sessions in chunks, and attach each one to the parent index.
>>>
>>> CREATE INDEX CONCURRENTLY idx_1 ON tab_part1(col1);
>>> CREATE INDEX CONCURRENTLY idx_2 ON tab_part2(col1);
>>> .
>>> ALTER INDEX idx ATTACH PARTITION idx_1;
>>> ALTER INDEX idx ATTACH PARTITION idx_2;
>>> .
>>>
>>> Once the indexes of all the partitions have been attached, the table-level
>>> index created in the first step becomes valid automatically.
>>>
>>>
>> Thank you so much.
>> Should we also tweak the parameters related to the parallelism and memory
>> as I mentioned in the first post?
>>
>
> Additionally ,is it also possible to drop the indexes also from the big
> partition table efficiently? To avoid the "transaction id wrap around" or
> "table bloat" when the index drop runs for longer duration?
>
>

I have never tried it, but I think you can run "drop index concurrently" from
multiple sessions at the same time, one for each of the partitions, to make
the index drop finish quicker, similar to the "create index" statements
mentioned above. Others may comment.


Design for dashboard query

2024-06-15 Thread sud
Hello All,

It's postgres version 15.4. We have a requirement in which aggregated
information for all the users has to be displayed on a UI screen. So
basically, it would be scanning the full table data, which is billions of
rows across many months, then joining with other master tables, aggregating
those, and then displaying the results based on an input "user id" filter.

In such a scenario we are thinking of using a materialized view on top of the
base tables, which will store the aggregated information and be refreshed
periodically to serve the data for the input user id. However, I see that
postgres does not support incremental refresh of materialized views, and a
full refresh can take long. So, do we have any other option available?
Additionally, the refresh should not impact or block the online users
querying the same materialized view while it is happening.
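
Lacking built-in incremental refresh, one common pattern is a full recompute
via REFRESH MATERIALIZED VIEW CONCURRENTLY, which does not block readers of
the view as long as the view has a unique index (a sketch with illustrative
table/column names):

CREATE MATERIALIZED VIEW user_daily_agg AS
SELECT customer_id,
       date_trunc('day', txn_ts) AS txn_day,
       count(*)    AS txn_count,
       sum(amount) AS total_amount
FROM transactions
GROUP BY 1, 2;

CREATE UNIQUE INDEX user_daily_agg_uq ON user_daily_agg (customer_id, txn_day);

-- periodic refresh: still a full recompute, but readers are not blocked
REFRESH MATERIALIZED VIEW CONCURRENTLY user_daily_agg;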


Load a csv or a avro?

2024-07-05 Thread sud
Hello all,

It's a postgres database. We have the option of getting files in csv and/or
avro format messages from another system, to load them into our postgres
database. The volume will be 300 million messages per day across many files,
in batches.

My question is: which format should we choose for faster data loading
performance? And should any aspects other than loading performance also be
considered?
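
Whichever format arrives, the fast server-side path into postgres for
delimited data is COPY rather than row-by-row INSERTs (a sketch; the path
and table name are illustrative, and avro would first have to be decoded
into rows by the loader before a COPY could apply):

COPY transactions FROM '/data/incoming/batch_0001.csv'
WITH (FORMAT csv, HEADER true);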


Re: Load a csv or a avro?

2024-07-06 Thread sud
On Fri, Jul 5, 2024 at 3:27 PM Kashif Zeeshan 
wrote:

> Hi
>
> There are different data formats available, following are few points for
> there performance implications
>
> 1. CSV : It's easy to use and widely supported but it can be slower due to
> parsing overload.
> 2. Binary : Its faster to load but not human understandable.
>
> Hope this helps.
>
> Regards
> Kashif Zeeshan
>
>>
>>
 My understanding was that it will be faster to load .csv, as it is already
mapped to table rows and columns, whereas in the case of .avro a mapping has
to be done so that the fields in the avro file can be matched to the columns
in the table appropriately, and that will carry additional overhead. Is my
understanding correct?


Re: Load a csv or a avro?

2024-07-06 Thread sud
On Fri, Jul 5, 2024 at 8:24 PM Adrian Klaver 
wrote:

> On 7/5/24 02:08, sud wrote:
> > Hello all,
> >
> > Its postgres database. We have option of getting files in csv and/or in
> > avro format messages from another system to load it into our postgres
> > database. The volume will be 300million messages per day across many
> > files in batches.
>
> Are dumping the entire contents of each file or are you pulling a
> portion of the data out?
>
>
>
Yes, all the fields in the files have to be loaded into the columns of the
tables in postgres. But how does that matter here for deciding whether we
should ask for the data in .csv or .avro format from the outside system, to
load into the postgres database in row-and-column form? Again, my
understanding was that irrespective of anything, the .csv file load will
always be faster, because the data is already laid out in rows and columns,
whereas for the .avro file the parser has to perform additional work to turn
it into row-and-column form and map it to the columns of the database table.
Is my understanding correct here?


Dropping column from big table

2024-07-10 Thread sud
Hi All,
It's postgres database version 15.4. We have a table which is partitioned
daily, is approx. ~2TB in size, and has a total of ~90 partitions. We have a
requirement to drop columns from and add new columns to this table.

I want to understand: can this be done online? What is the fastest way to
drop/add columns on such a big table, and what will be the consequences of
this with regard to vacuum after the activity? Are there any other issues we
may face afterwards?

Regards
Sud


Re: Dropping column from big table

2024-07-10 Thread sud
On Thu, Jul 11, 2024 at 2:52 AM Adrian Klaver 
wrote:

>
> https://www.postgresql.org/docs/current/sql-altertable.html
>
> "The DROP COLUMN form does not physically remove the column, but simply
> makes it invisible to SQL operations. Subsequent insert and update
> operations in the table will store a null value for the column. Thus,
> dropping a column is quick but it will not immediately reduce the
> on-disk size of your table, as the space occupied by the dropped column
> is not reclaimed. The space will be reclaimed over time as existing rows
> are updated.
>
> To force immediate reclamation of space occupied by a dropped column,
> you can execute one of the forms of ALTER TABLE that performs a rewrite
> of the whole table. This results in reconstructing each row with the
> dropped column replaced by a null value.
> "
>
>
Thank you so much. When you said *"you can execute one of the forms of
ALTER TABLE that performs a rewrite*
*of the whole table."* Does it mean that post "alter table drop column" the
vacuum is going to run longer as it will try to clean up all the rows and
recreate the new rows? But then how can this be avoidable or made better
without impacting the system performance and blocking others?


Re: Dropping column from big table

2024-07-11 Thread sud
On Thu, 11 Jul, 2024, 12:46 pm Ron Johnson,  wrote:

> On Wed, Jul 10, 2024 at 11:28 PM sud  wrote:
>
>>
>>
>>
>> Thank you so much. When you said *"you can execute one of the forms of
>> ALTER TABLE that performs a rewrite*
>> *of the whole table."* Does it mean that post "alter table drop column"
>> the vacuum is going to run longer as it will try to clean up all the rows
>> and recreate the new rows? But then how can this be avoidable or made
>> better without impacting the system performance
>>
>
> "Impact" is a non-specific word.  "How much impact" depends on how many
> autovacuum workers you've set it to use, and how many threads you set in
> vacuumdb.
>
>
>> and blocking others?
>>
>
> VACUUM never blocks.
>
> Anyway, DROP is the easy part; it's ADD COLUMN that can take a lot of time
> (depending on whether or not you populate the column with a default value).
>
> I'd detach all the partitions from the parent table, and then add the new
> column to the not-children in multiple threads, add the column to the
> parent and then reattach all of the children.  That's the fastest method,
> though takes some time to set up.
>


Thank you so much.

Dropping will take its own time for the post-drop vacuum; however, as you
rightly said, it won't be blocking, which should be fine.

In regards to adding the column: detaching all partitions, then adding the
column to the individual partitions in multiple sessions, and then
reattaching, looks like a really awesome idea to make it faster. However, one
doubt: will it create issues if a foreign key already exists on this
partitioned table, or, say, if it's the parent of other child
partitioned/non-partitioned tables?


Re: Dropping column from big table

2024-07-15 Thread sud
On Mon, Jul 15, 2024 at 7:58 PM Peter J. Holzer  wrote:

>
> > Hm, true.
> >
> > You can always do
> >
> >   UPDATE tab SET id = id;
> >
> > followed by
> >
> >   VACUUM (FULL) tab;
>
> Yes, that should work. It needs about twice the size of the table in
> temporary space, though.
>
> Since the OP wrote that the table is "daily ... and 90 partitions"
> (which understand that there is one partition per day and partitions are
> kept for 90 days) it might be better to just wait. After 90 days all the
> partitions with the obsolete column will be gone.
>
>
>
Thank You very much.

As I understand it, after dropping a column the table will still internally
hold the old non-null values of that column for all existing rows, even
though they are not visible outside.

So after the DROP COLUMN, if we force an update of any column as below, it
will create a new copy of each row even though the column is updated to the
same value, and the new copies will have the dropped column set to NULL. A
subsequent "VACUUM FULL" will then clean out all the old rows still carrying
non-null values for the dropped column, and thus reclaim the space.

But the only issue would be "VACUUM FULL" will take a table lock and also
it may take longer to run this vacuum on the full table considering the
size of the table in TB's. Thus, is it fine to just leave it post execution
of the "update" statement , so that the normal vacuum operation (which will
be online operation) and that will take care of the removal of old rows ?

 UPDATE tab SET id = id;
 VACUUM (FULL) tab;

 And also, as you mentioned, we may just leave it as is and wait for the
partitions to be dropped, so that the dropped column, whose non-null values
are still lying under the hood and occupying space, gets removed
automatically. But even then, does that dropped column keep lying in the rows
with null values throughout its lifecycle, as long as the table exists in the
database?

 It seems no other option exists to drop the column and reclaim its space
with immediate effect, other than those discussed above.


Re: Dropping column from big table

2024-07-15 Thread sud
On Tue, Jul 16, 2024 at 6:07 AM Peter J. Holzer  wrote:

>
> > But the only issue would be "VACUUM FULL" will take a table lock and
> also it
> > may take longer to run this vacuum on the full table considering the
> size of
> > the table in TB's. Thus, is it fine to just leave it post execution of
> the
> > "update" statement , so that the normal vacuum operation (which will be
> online
> > operation) and that will take care of the removal of old rows ?
>
> This is unlikely to recover the space.
>
> The UPDATE will duplicate all rows. Since - presumably - there isn't
> much free space within each partition the new rows will go at the end of
> each partition, effectively doubling its size.
>
> A regular VACUUM (whether autovacuum or invoked manually) will then
> remove the old rows. and make the space available for new data. But
> since that newly free space is at the beginning of each partition it
> can't be returned to the OS. It would be available for new data written
> to those partitions I guess not much is written to old partitons.
>
> You could, however, do this in small steps and vacuum after each.
> Something like this (in Python)
>
> conn = ...                # e.g. a psycopg2 connection
> conn.autocommit = True    # VACUUM cannot run inside a transaction block
> csr = conn.cursor()
> for i in range(100):
>     # "%%" escapes the modulo operator so the driver does not read it
>     # as a parameter placeholder
>     csr.execute("UPDATE tab SET id = id WHERE id %% 100 = %s", (i,))
>     csr.execute("VACUUM tab")
>
> That might just be able to squeeze the new rows in between the existing
> rows and not grow the table.
>
>
> >  And also, As you mentioned we may also leave it as is and wait for the
> > partition to be dropped, so that the dropped column with "not null"
> values
> > which are still lying under the hood and are occupying space will be
> removed
> > automatically. But even then, is that dropped column still lying in the
> rows
> > with null values in it throughout its lifecycle, till the table exists
> in the
> > database?
>
> Yes. But a nullable column with a null value takes only a single bit of
> storage, so that's negligible.
>
>
>

Thank you so much.

Normal vacuum marks the space occupied by dead tuples as free/reusable, while
vacuum full removes them completely. However, even with "vacuum full", though
the old rows are removed completely from storage, the new rows will still
carry the 'dropped' column under the hood as part of the table storage, just
holding "null" values; and as that is a single bit of storage, the overhead
is negligible. If we want to fully remove that column from the table, we may
have to create a new table, dump the data into it from the existing table,
and then rename it back to the old name. Is this understanding correct?


Re: Dropping column from big table

2024-07-15 Thread sud
On Tue, Jul 16, 2024 at 10:26 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

>
>
> On Monday, July 15, 2024, David G. Johnston 
> wrote:
>
>> On Monday, July 15, 2024, sud  wrote:
>>
>>>
>>> However even with "vacuum full", the old rows will be removed completely
>>> from the storage , but the new rows will always be there with the 'dropped'
>>> column still existing under the hood along with the table storage,  with
>>> just carrying "null" values in it. […] Is this understanding correct?
>>>
>>
>> No.  The table rewrite process involves creating new tuples that exactly
>> conform to the current row specification.  The potentially non-null data
>> present in live tuples for columns that have been dropped are not copied
>> into the newly constructed tuples.
>>
>>
>> https://github.com/postgres/postgres/blob/d2b74882cab84b9f4fdce0f2f32e892ba9164f5c/src/backend/access/heap/heapam_handler.c#L2499
>>
>>
> My bad, stopped at the code comment.  Apparently the data is just nulled,
> not removed, the current row descriptor contains those columns with “is
> dropped” and since this behavior doesn’t change the catalogs in this way
> the new ones must as well.  We just get the space back.
>
>
>
Thank you for the confirmation.
And if someone wants to fully remove that column from the table, then the
only option is to create a new table with exactly the set of active columns,
insert the data into it from the existing/old table, and then rename it back
to the old name. Is this a correct understanding?


Trigger usecase

2024-07-30 Thread sud
Hello,

We have a streaming application (using apache flink and kafka) which
populates data in the tables of a postgres database version 15.4.

Now, while loading transaction data, we also get some reference data from the
source (for example customer information), and for these we don't want to
modify or override the existing customer data; we want to keep the old record
with a flag marking it inactive, and insert the new record with the flag set
to active. So for such a use case, should we cater for this inside the apache
flink application code, or should we handle it using a trigger at the table
level which executes this logic on each INSERT?

I understand triggers are difficult to debug and monitor. But here, in this
case, teammates are saying we shouldn't put such logic into the streaming
application code, and should rather handle it through a trigger.

I understand it's technically possible both ways, but I want to understand
the experts' opinion on this, and the pros and cons.
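
For reference, a minimal sketch of the trigger route for the SCD-2 style flip
(all table/column names here are illustrative, not our actual schema):

CREATE OR REPLACE FUNCTION deactivate_old_customer() RETURNS trigger AS $$
BEGIN
    -- retire any currently active version of this customer
    UPDATE customers
       SET is_active = false
     WHERE customer_id = NEW.customer_id
       AND is_active;
    NEW.is_active := true;  -- the incoming row becomes the active version
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_customers_scd2
BEFORE INSERT ON customers
FOR EACH ROW EXECUTE FUNCTION deactivate_old_customer();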

Regards
Sud


Re: Trigger usecase

2024-07-30 Thread sud
On Tue, Jul 30, 2024 at 10:54 PM Laurenz Albe 
wrote:

>
> It is largely a matter of taste.
>
> The advantage of a trigger is that it works even if somebody bypasses the
> application
> to insert data.
>
> I think that triggers are easy to debug, but again, that's a matter of
> taste.
>
>
Thank you David and Laurenz.

Creating triggers to populate some audit table, or say to populate audit
columns (created_by, updated_by, created_date, updated_date), is fine I
believe; however, this use case is to load/persist data in a table in SCD-2
style, so is it a good idea to use a trigger for such a use case?

I am not sure of the exact pros and cons, but we were following certain
rules, like: if it's business logic which needs to be implemented in the
database, then it should not be done using triggers, but rather through
database procedures/functions. I hope this understanding is correct.

Regards
Sud


Re: Column type modification in big tables

2024-08-07 Thread sud
On Wed, Aug 7, 2024 at 4:39 PM Lok P  wrote:

> Hello all,
> We have a postgres table which is a range partitions on a timestamp column
> having total size ~3TB holding a total ~5billion rows spanning across ~150
> daily partitions and there are ~140+columns in the table. Also this table
> is a child to another partition table. And we have partition creation
> handled through pg_partman extension on this table.
>
> We have a requirement of modifying the existing column lengths as below .
> So doing it directly through a single alter command will probably scan and
> rewrite the whole table which may take hours of run time.
>
> So trying to understand from experts what is the best possible way to
> tackle such changes in postgres database? And if any side effects we may
> see considering this table being child to another one and also dependency
> with pg_partman extension.
>
> two of the columns from varchar(20) to numeric(3)
> one of the columns from varchar(20) to varchar(2)
> one of the columns from Number(10,2) to Numeric(8,2)
>
>
>

Others may correct me, but I think if you don't have FKs defined on these
columns, you can do the below.


--Alter table add columns, which will be very fast (within seconds) as it
--only touches the data dictionary.

ALTER TABLE tab1 ADD COLUMN new_column1 NUMERIC(3),
                 ADD COLUMN new_column2 VARCHAR(3);


*-- Back populate the data partition wise and commit, if it's really needed*

UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
       new_column2 = CAST(old_column2 AS VARCHAR(3));
commit;
UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
       new_column2 = CAST(old_column2 AS VARCHAR(3));
commit;
UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
       new_column2 = CAST(old_column2 AS VARCHAR(3));
commit;
.


*--Alter table drop old columns, which will be very fast (within seconds) as
it just drops them from the data dictionary.*
ALTER TABLE your_table DROP COLUMN old_column1, DROP COLUMN old_column2;


Re: Column type modification in big tables

2024-08-07 Thread sud
On Wed, Aug 7, 2024 at 5:00 PM Lok P  wrote:

>
>
> On Wed, Aug 7, 2024 at 4:51 PM sud  wrote:
>
>>
>>
>> Others may correct me, but I think if you don't have FKs defined on these
>> columns, you can do the below.
>>
>>
>> --Alter table add columns, which will be very fast (within seconds) as it
>> only touches the data dictionary.
>>
>> ALTER TABLE tab1 ADD COLUMN new_column1 NUMERIC(3),
>>                  ADD COLUMN new_column2 VARCHAR(3);
>>
>>
>> *-- Back populate the data partition wise and commit, if it's really
>> needed*
>>
>> UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
>>        new_column2 = CAST(old_column2 AS VARCHAR(3));
>> commit;
>> UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
>>        new_column2 = CAST(old_column2 AS VARCHAR(3));
>> commit;
>> UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
>>        new_column2 = CAST(old_column2 AS VARCHAR(3));
>> commit;
>> .
>>
>>
>> *--Alter table drop old columns, which will be very fast (within seconds)
>> as it just drops them from the data dictionary.*
>> ALTER TABLE your_table DROP COLUMN old_column1, DROP COLUMN old_column2;
>>
>
>
>
> Thank you so much.
>
> I understand this will be the fastest possible way to achieve the column
> modification.
>
> But talking about the dropped column which will be sitting in the table
> and consuming storage space, Is it fine to leave as is or auto vacuum will
> remove the column values behind the scene and also anyway , once those
> partitions will be purged they will be by default purged. Is this
> understanding correct?
>
>  And also will this have any impact on the partition maintenance which is
> currently done by pg_partman as because the template table is now different
> internally(not from outside though). Will it cause conflict because of
> those dropped columns from the main table?
>

I think leaving the table as is after dropping the columns will be fine for
you, because your regular partition maintenance/drops will slowly purge the
historical partitions and eventually they will be removed. But if you update
those new columns with the old column values, then autovacuum should also
take care of removing the rows with the older column values (which are
actually dead).

Not sure if pg_partman will cause any issue, given the table now has columns
with changed data types/lengths. Others may confirm.


Re: Column type modification in big tables

2024-08-10 Thread sud
On Sat, Aug 10, 2024 at 12:52 AM Lok P  wrote:

>
> On Fri, Aug 9, 2024 at 9:19 PM Greg Sabino Mullane 
> wrote:
>
>> On Fri, Aug 9, 2024 at 6:39 AM Lok P  wrote:
>>
>>> Thank you so much. Will definitely try to evaluate this approach. The
>>> Only concern I have is , as this data is moving downstream with exactly the
>>> same data type and length , so will it cause the downstream code to break
>>> while using this column in the join or filter criteria. Also I believe the
>>> optimizer won't be able to utilize this information while preparing the
>>> execution plan.
>>>
>>
>> Yes, this is not as ideal as rewriting the table, but you asked for
>> another approaches :) As to the impact of your downstream stuff, I think
>> you have to try and see. Not clear what you mean by the optimizer, it's not
>> going to really care about numeric(10) versus numeric(8) or varchar(20) vs
>> varchar(2). It's possible the varchar -> numeric could cause issues, but
>> without real-world queries and data we cannot say.
>>
>>
>>>  Another thing , correct me if wrong, My understanding is  , if we want
>>> to run the "validate constraint" command after running this "check
>>> constraint with not valid" command, this will do a full table scan across
>>> all the partitions , but it's still beneficial as compared to updating the
>>> columns values for each rows. Correct me if I'm wrong.
>>>
>>
>> Yes, it needs to scan the entire table, but it's a lightweight lock,
>> won't block concurrent access, will not need to detoast, and makes no table
>> or index updates. Versus an entire table rewrite which will do heavy
>> locking, take up tons of I/O, update all the indexes, and generate quite a
>> lot of WAL.
>>
>>
> Thank you so much Greg.
>
> Considering the option, if we are able to get large down time to get this
> activity done.
>
> Some teammates suggested altering the column with "USING" Clause. I am not
> really able to understand the difference,  also when i tested on a simple
> table, it seems the "USING" clause takes more time as compared to normal
> ALTER. But again I don't see any way to see the progress and estimated
> completion time. Can you share your thoughts on this?
>
> ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) USING
> mycol::NUMERIC(3);
> VS
> ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) ;
>
> Another thing also comes to my mind whether we should just create a new
> partition table(say new_part_table) from scratch from the DDL of the
> existing table(say old_part_table) and then load the data into it using
> command (insert into new_part_table.. select..from old_part_table). Then
> create indexes and constraints etc, something as below.
>
> Will this approach be faster/better as compared to the simple "alter table
> alter column approach" as above, considering we will have 4-6 hours of
> downtime for altering three different columns on this ~5TB table?
>
>
> *-- Steps*
> Create table exactly same as existing partition table but with the
> modified column types/lengths.
>
> drop indexes ;  (Except PK and FK indexes may be..)
> drop constraints;
>
> insert into new_part_table (...) select (...) from old_part_table;
>
> create indexes concurrently ;
> create constraints; (But this table is also a child table to another
> partition table, so creating the foreign key may be resource consuming here
> too).
>
> drop the old_part_table;
> rename the new_part_table to old_part_table;
> rename all the partitions;
>
> VACUUM  old_part_table  ;
> ANALYZE  old_part_table  ;
>


My 2 cents:
If you have enough time, then from a simplicity point of view your single
alter command may look good, but how are you going to see the amount of
progress it has made so far, and how much time it's going to take to finish?
And you've got ~6 hrs of downtime, but if it fails at the 5th hour then you
will be in a bad position.


Insert query performance

2024-08-19 Thread sud
Hello All,

In a version 15.4 postgres database, is it possible that, if we have two big
range-partitioned tables with a foreign key relationship between them,
inserts into the child table can be slow if we don't have an index on the
foreign key in the child table? Basically it needs to make sure the new row
being added to the child has a matching row in the parent partitioned table.


And is there any possible way (for example query tracing etc.) to get the
underlying system queries which get triggered as part of the main insert
query? For example, in the above scenario, postgres must be executing some
query to check whether the incoming row for the child table already exists
in the parent table or not.


Regards

Sud


Re: Insert query performance

2024-08-20 Thread sud
On Mon, Aug 19, 2024 at 4:33 PM David Rowley  wrote:

> On Mon, 19 Aug 2024 at 19:48, sud  wrote:
> > In a version 15.4 postgres database, Is it possible that , if we have
> two big range partition tables with foreign key relationships between them,
> insert into the child table can cause slowness if we don't have foreign key
> index present in the child table? Basically it need to make sure the new
> row already added to parent partition table or not.
>
> Having an index on the referencing columns is only useful for DELETEs
> and UPDATEs affecting the foreign key column(s).  For INSERTs to the
> referencing table, technically having indexes there would only slow
> down inserts due to the additional overhead of having to maintain the
> index, however, the overhead of having the index might be fairly
> minuscule when compared to performing a CASCADE UPDATE or DELETE to
> the referencing table when the DDL is performed on the referenced
> table.
>
> > And if there is any possible way(example query tracing etc) to get the
> underlying system queries which gets triggered as part of the main insert
> query? For example in above scenario, postgres must be executing some query
> to check if the incoming  row to the child table already exists in the
> parent table or not?
>
> EXPLAIN ANALYZE will list the time it took to execute the foreign key
> trigger in the "Trigger for constraint" section.
>
> David
>


Thank you so much David.

If I got it correctly, an index on the foreign key mainly helps improve
delete/update performance on the parent table, when the referenced column is
affected there. (This might be the reason why our detach partition on the
parent table runs long and never completes, as we have no foreign keys
indexed.)

However, my initial understanding of "*having the FK index will improve the
insert performance in the child table*" seems not to be accurate. Rather, as
you mentioned, it may negatively impact loading/insert performance, because
each insert now also has to maintain the additional index. For an insert
into the child table, to ensure the referenced row is already present in the
parent, postgres just looks up the parent by its primary key (which is
indexed by default), so an index on the child table's foreign key columns
won't make that constraint validation any faster. Please correct me if my
understanding is wrong here.

Additionally, as you mentioned, "explain analyze" shows a section with
the time the constraint validation really takes, and I can see that
section now. But it seems it requires the INSERT statement to actually be
executed, which we can't really do in production as that would physically
insert data into the table. So do you mean to run the "explain analyze"
for the INSERT query inside a transaction, capture the plan, and then
roll back? And in our case it's a row-by-row insert, so we will see if we
can sum that "constraint validation" time across a handful of inserts to
get a better idea of the percentage of time we really spend in constraint
validation.
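
Something like the below rough sketch is what I have in mind (table and
column names are made up):

BEGIN;
-- EXPLAIN ANALYZE actually executes the INSERT, so it is wrapped in a
-- transaction and rolled back to avoid persisting the row:
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO child_tab (child_id, parent_id, txn_date)
VALUES (1001, 42, current_date);
-- The output ends with a "Trigger for constraint ...: time=... calls=..."
-- line showing the time spent on the foreign key validation.
ROLLBACK;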


Re: Insert query performance

2024-08-20 Thread sud
On Mon, Aug 19, 2024 at 1:25 PM Muhammad Ikram  wrote:

> Hi Sud,
>
> Please make following change in your postgresql.conf file
>
> log_statement = 'all'
>
>
Will this put all the internal/recursive SQL queries somewhere we can
analyze them, such as the server log or the pg_stat_statements view? And
to debug issues in production, would it be a good idea to enable it for a
short while and then turn it off, or can it have significant performance
overhead?
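
From what I have read, one way to toggle this without a restart is
something like the below (a sketch, assuming self-managed postgres and
superuser rights; on RDS it would be a parameter-group change instead,
and the statements go to the server log, not pg_stat_statements):

ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();  -- takes effect without a restart
-- ... reproduce the issue, then revert to the default:
ALTER SYSTEM SET log_statement = 'none';
SELECT pg_reload_conf();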


How to do the indexing effectively in postgres in such cases

2024-09-07 Thread sud
Hello Experts,
We have a requirement in which queries will be formed as shown below. Two
partitioned tables will be joined, and there may be filters on columns of
both tables, or on just one of them.

These queries are used very frequently and are critical to customers, as
they back the search screens, so we want the indexing to satisfy them
effectively and return rows in no more than ~1 second. Both tables are
daily range-partitioned on the column "part_date", and the volume of data
per day/partition will be ~700 million rows in each table.

The customer can search over a duration from one day up to a maximum of
~1 month, i.e. part_date spanning ~1 month. The search should show the
latest transactions first, which is why the "order by .. limit" clause is
used. "Offset" is used because the customer can scroll to the next page
on the UI, where he can see the next 100 rows, and so on. On the first
screen the count of the results is also expected, so that the customer
gets an immediate idea of the total number of transactions matching his
search criteria.

So,
1)In the query below, if the optimizer chooses tab1 as the driving table,
is an index on just col1 enough, or should it be (col1, tab1_id)?
Similarly, if it chooses tab2 as the driving table, should the index be
(col2, tab2_id)? Or is indexing just the filtered columns enough, i.e.
individual indexes on COL1 and COL2 of tables tab1 and tab2 respectively?

2)In scenarios where the customer has a lot of matching transactions (say
in the millions) after all the filters are applied, and the customer only
needs to see the latest 100 rows of transaction data, the sorting will be
a bottleneck. So what can be done to make such queries return the latest
transactions quickly on the search screen?

3)As the count has to happen as a first step so that the customer knows
the total number of rows (which may be in the millions), what additional
index will support this requirement?

Or is there any other optimization strategy we can follow for such
queries? (One rough indexing sketch is included after the query below.)

select * from tab1, tab2
where tab1.part_date between '1-jan-2024' and '31-jan-2024'
and tab1.part_date=tab2.part_date
and tab1.tab1_id=tab2.tab2_id
and tab1.col1=<:input_col1>
and tab2.col2=<:input_col2>
order by tab1.create_timestamp desc
limit 100 offset 100;
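
For illustration, one possible indexing approach under the above
assumptions could be the below (column names taken from the query;
whether including create_timestamp actually helps depends on the plan
the optimizer chooses):

-- Indexes created on the partitioned parents cascade to every daily
-- partition as local indexes:
CREATE INDEX tab1_col1_idx ON tab1 (col1, tab1_id, create_timestamp DESC);
CREATE INDEX tab2_col2_idx ON tab2 (col2, tab2_id);

For the first-screen count, an exact count over millions of matching rows
has to visit them all; if an approximate figure is acceptable, the
planner's row estimate from a plain EXPLAIN is a common workaround.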

Regards
Sud


Question on indexes

2024-10-10 Thread sud
Hi,
I have never used a 'hash index', but I saw documents in the past
suggesting issues around hash indexes, e.g. that no WAL was generated for
them, meaning a hash index could not be recovered after a crash and was
not replicated to replicas (a limitation that applied to versions before
PostgreSQL 10). Also, these indexes cannot be used for range queries,
sorting, etc.

However, we are seeing that one of the databases has multiple hash
indexes created. So I wanted to understand from the experts here whether
they are advisable over B-tree in any specific scenarios despite such
downsides?
Note: it's a version 15.4 database.

Regards
Sud


Re: Question on indexes

2024-10-10 Thread sud
On Fri, Oct 11, 2024 at 12:51 AM Erik Wienhold  wrote:

> On 2024-10-10 20:49 +0200, sud wrote:
> > However, we are seeing that one of the databases has multiple hash
> indexes
> > created. So I wanted to understand from the experts here whether they
> > are advisable over B-tree in any specific scenarios despite such
> > downsides?
>
> Two things come to my mind:
>
> 1. Btree puts a limit on the size of indexed values, whereas hash
>indexes only store the 32-bit hash code.
>
> 2. Of the core index types, only btree supports unique indexes.
>
> Example of btree's size limit:
>
> CREATE TABLE b (s text);
> CREATE INDEX ON b USING btree (s);
> INSERT INTO b (s) VALUES (repeat('x', 1000000));
> ERROR:  index row requires 11464 bytes, maximum size is 8191
>
> The docs have more details:
> https://www.postgresql.org/docs/current/btree.html
> https://www.postgresql.org/docs/current/hash-index.html
>

Thank you.

It's not confirmed yet, but we have actually seen the DB crash repeatedly
a few times, and teammates suspect the cause was the extension of this
hash index. Have you experienced anything like that with hash indexes?
However, as you mentioned, if we have a column with large string/text
values that we want indexed, is there then no choice but to go for a hash
index? Please correct me if I'm wrong.
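
One alternative I have come across, if only equality lookups are needed,
is a btree expression index over a digest of the value. A rough sketch
(table and column names hypothetical):

-- Index a fixed-size digest instead of the raw long text value:
CREATE INDEX docs_body_md5_idx ON docs (md5(body));

-- Queries must use the same expression for the index to be usable:
SELECT * FROM docs WHERE md5(body) = md5('...long text value...');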


Re: Question on session timeout

2024-10-04 Thread sud
On Tue, Oct 1, 2024 at 5:45 PM Greg Sabino Mullane 
wrote:

> On Tue, Oct 1, 2024 at 1:57 AM sud  wrote:
>
>> *Where are you getting the ~2000  count from?*
>> Seeing this in the "performance insights" dashboard and also its matching
>> when I query the count of sessions from pg_stat_activity.
>>
>
> So I'm guessing this is perhaps RDS or Aurora? Stating that up front can
> be helpful.
>
>
>> As you described, a long time open transaction with a session state as
>> "idle" will be threatening as that will cause locking
>>
>
> No, idle is fine, "idle in transaction" is bad. :)
>
> Is it correct to assume the session in pg_stat_activity with very old
>> XACT_START are the one which are part of long running
>>
>
> 
>
> You need to look at the "state" column as your primary bit of information.
> Second most important is how long something has been in that state, which
> you can find with now() - state_change. The best way to learn all of this
> is to open a few concurrent sessions in psql and experiment.
>
>
>> We have max_connections set as 5000.
>>
>
> That's quite high. But if you never reach that high, it doesn't matter a
> whole lot.
>
> "Database connection" touching ~2000 then coming down till 200. And we see
>> uneven spikes in those, it seems to be matching with the pattern , when we
>> have some errors occurring during the insert queries which are submitted by
>> the Java application to insert the data into the tables.
>>
>
> (What sort of errors?) 2000 is high. Clearly, you are not pooling
> connections, or not pooling them well. If you are using plain Postgres,
> look into setting up pgbouncer. If using something managed (e.g. RDS) look
> into their particular pooling solution. Or fix your application-level
> pooling.
>
>
Thanks Greg.
It's a third-party app, and the application team confirmed they are using
connection pooling on their side. But as you mentioned, *"2000 is high"*
for the number of connections. Isn't it possible that they have a max
connection pool size limit set to ~2000, which is why we see that many
connections during the peak window? In that case, is it advisable to
reduce the number of max connections, given that this instance has 32
cores?

And yes, it's RDS. The errors we were seeing were related to the data,
but not related to connections.


Re: Question on session timeout

2024-09-30 Thread sud
On Tue, Oct 1, 2024 at 4:10 AM Adrian Klaver 
wrote:

> On 9/30/24 13:01, sud wrote:
> > Hello,
> > We are frequently seeing the total DB connection reaching ~2000+ whereas
>
> Where are you getting the ~2000  count from?
>
> > the total number of active sessions in pg_stat_activity staying <100 at
> > any point in time. And when we see the sessions from DB side they are
> > showing most of the sessions with state as 'idle' having
> > backend_start/xact_start showing date ~10days older. We do use
> > application level connection pooling, and we have ~120 sets as both the
>
> What do you mean by ~120 sets, in particular what is a set?
>
> > "max idle" and "max active" connection count and "maxage" as 7 days, so
> > does this suggest any issue at connection pool setup?
>
> Using what pooler?
>
> >
> > We do see keep alive queries in the DB (select 1), not sure if that is
> > making this scenario. When checking the
>
> How often do the keep alive queries run?
>
> > "idle_in_transaction_session_timeout" it is set as 24hours and
> > "idle_session_timeout" set as "0". So my question is , should we set the
> > parameter to a lesser value in DB cluster level like ~5minutes or so, so
> > as not to keep the idle sessions lying so long in the database and what
>
> '"idle_in_transaction_session_timeout" it is set as 24hours' is a foot
> gun as explained here:
>
> https://www.postgresql.org/docs/current/runtime-config-client.html
>
> idle_in_transaction_session_timeout (integer)
>
> [...]
>
> "This option can be used to ensure that idle sessions do not hold locks
> for an unreasonable amount of time. Even when no significant locks are
> held, an open transaction prevents vacuuming away recently-dead tuples
> that may be visible only to this transaction; so remaining idle for a
> long time can contribute to table bloat. See Section 24.1 for more
> details."
>
>
> With '"idle_session_timeout" set as "0"' a session without an open
> transaction is not going to timeout.
>
>
>

*Where are you getting the ~2000 count from?*
Seeing this in the "performance insights" dashboard, and it also matches
what I get when I query the count of sessions from pg_stat_activity.

*What do you mean by ~120 sets, in particular what is a set?*
These are the values set in the properties file which the application
team uses for connection pooling.


*Using what pooler?*
I need to check on this. As it's a Java application (JDBC driver for
connecting to the DB), I thought it must be using standard connection
pooling. Will double check.


*How often do the keep alive queries run?*
Need to check. But I am not sure, in general, whether these "keep alive"
queries are used for keeping a transaction alive or a session alive?

As you described, a long-open transaction with a session state of "idle"
is threatening, as it can cause locking and "transaction ID wraparound"
issues to surface, whereas "idle" sessions with no open transaction may
not cause any issue, as they do no harm. Does that mean we can have any
number of idle sessions, or should we also set some non-zero timeout for
the "idle_session_timeout" parameter (maybe ~1hr or so)?

Is it correct to assume that the sessions in pg_stat_activity with a very
old XACT_START are the ones that are part of long-running open
transactions (i.e. governed by idle_in_transaction_session_timeout),
whereas the ones with only an old BACKEND_START or QUERY_START are just
idle sessions (governed by idle_session_timeout) not tied to any open
transaction?
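
To check this myself, I am planning something like the below rough
sketch:

-- Sessions inside an open transaction have a non-null xact_start;
-- purely idle sessions show state = 'idle' with xact_start IS NULL:
SELECT pid,
       state,
       now() - xact_start    AS open_xact_age,
       now() - state_change  AS time_in_state,
       now() - backend_start AS session_age
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
ORDER BY xact_start NULLS LAST;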

A few observations:

I do see "MaximumUsedTransactionIDs" staying consistently at ~200M for a
long time and then coming down, and it matches "autovacuum_freeze_max_age"
which is set to 200M. I hope that's expected. We have max_connections set
to 5000.

"Database connection" touching ~2000 then coming down till 200. And we see
uneven spikes in those, it seems to be matching with the pattern , when we
have some errors occurring during the insert queries which are submitted by
the Java application to insert the data into the tables.


Question on session timeout

2024-09-30 Thread sud
Hello,
We are frequently seeing the total DB connections reach ~2000+, whereas
the total number of active sessions in pg_stat_activity stays <100 at any
point in time. And when we look at the sessions from the DB side, most of
them show state 'idle', with backend_start/xact_start dates ~10 days old.
We do use application-level connection pooling, and we have ~120 sets as
both the "max idle" and "max active" connection count and "maxage" as 7
days, so does this suggest any issue with the connection pool setup?

We do see keep alive queries in the DB (select 1), and I'm not sure if
those contribute to this scenario. Checking the settings,
"idle_in_transaction_session_timeout" is set to 24 hours and
"idle_session_timeout" to "0". So my question is: should we set these
parameters to lower values at the DB cluster level, like ~5 minutes or
so, so as not to keep idle sessions lying around so long in the database,
and what would be advisable values for these parameters?
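
For example, something like the below sketch (values purely illustrative;
on managed services like RDS this would be a parameter-group change
instead of ALTER SYSTEM):

ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
ALTER SYSTEM SET idle_session_timeout = '1h';
SELECT pg_reload_conf();  -- both settings apply without a restart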

Regards
Sud


Question on Alerts

2025-02-16 Thread sud
Hi,
We have been asked to add key monitoring and alerting to our postgres
database. I am thinking of metrics like blocked transactions, max used
transaction IDs, max active sessions threshold, deadlocks, long-running
queries, replica lag, buffer cache hit ratio, and read/write IOPS or
latency. I have the below questions:

1)Below are some alert queries I tried writing; can you please let me
know if these are accurate?
2)How should we write the alerting queries for deadlocks, max used
transaction IDs, and read/write IOPS and latency? (Two rough sketches are
included after the queries below.)
3)Are there any docs available with sample SQL queries on the pg_*
catalogs for these critical alerts, which we can easily configure through
any tool?
4)Any other alerts which we should really have?

***Blocking sessions
SELECT DISTINCT blocking_id
FROM (
    SELECT
        activity.pid,
        activity.usename,
        activity.query,
        blocking.pid AS blocking_id,
        blocking.query AS blocking_query
    FROM pg_stat_activity AS activity
    JOIN pg_stat_activity AS blocking
        ON blocking.pid = ANY(pg_blocking_pids(activity.pid))
) a;

***Long running beyond ~1 hour
SELECT
    query,
    datname,
    pid,
    now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state IN ('active', 'idle in transaction')
    AND pid <> pg_backend_pid()
    AND xact_start < now() - interval '1 hour'
ORDER BY age(backend_xmin) DESC NULLS LAST;

***No of active sessions
SELECT count(*) AS active_connections
FROM pg_stat_activity
WHERE state = 'active';

***Replica lag
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replica_lag
FROM pg_stat_replication;

***Buffer cache hit ratio
SELECT
    datname,
    (1 - (blks_read::float / NULLIF(blks_hit + blks_read, 0))) * 100
        AS buffer_cache_hit_ratio
FROM pg_stat_database;
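
And for question 2, the below are rough, unvalidated sketches for the
deadlock and transaction-ID metrics (read/write timings show up in
pg_stat_database as blk_read_time/blk_write_time only when
track_io_timing is enabled; on RDS, IOPS and latency come from CloudWatch
rather than the pg_* catalogs):

***Max used transaction IDs (wraparound risk)
SELECT datname,
       age(datfrozenxid) AS xid_age  -- compare to autovacuum_freeze_max_age
FROM pg_database
ORDER BY xid_age DESC;

***Deadlocks since the last stats reset
SELECT datname, deadlocks
FROM pg_stat_database
WHERE datname IS NOT NULL;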

Regards
Yudhi


Re: Question on Alerts

2025-02-16 Thread sud
On Sun, Feb 16, 2025 at 10:05 PM Guillaume Lelarge <
guillaume.lela...@dalibo.com> wrote:

>
> You should probably look at check_postgres and check_pgactivity. Their
> source code contain numerous SQL queries, that could help you write your
> own.
>
> Regards.
>
>
Thank you very much. I am a bit new to postgres. Can you please guide me
on where exactly I can find the source code for check_postgres and
check_pgactivity?