Backup certain months old data

2024-01-22 Thread Siraj G
Hello!

I would like to know how we can back up data older than a certain number of
months from PgSQL and then delete it. The intent is to back up everything that
is older than two quarters to blob storage and then delete it, to improve
performance and reduce billing.
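
To make the intent concrete, something along these lines is what I have in mind
(a sketch only: the table and column names are hypothetical, and it assumes
server-side COPY to a path the server can write to, from where the file would
then be pushed to blob storage):

-- export rows older than two quarters, then remove them
COPY (SELECT * FROM orders
      WHERE created_at < date_trunc('quarter', now()) - interval '6 months')
  TO '/var/lib/postgresql/archive/orders_old.csv' WITH (FORMAT csv, HEADER);
DELETE FROM orders
  WHERE created_at < date_trunc('quarter', now()) - interval '6 months';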

Regards
Siraj


Fwd: pgadmin not opening in concurrent sessions

2024-02-08 Thread Siraj G
Hello!

I installed pgAdmin 4 8.2 on a Windows platform, for all users. The issue we
are having is that once any user launches it, nobody else can launch it.
E.g., UserB logs in and launches pgAdmin, and it opens. If UserA (or any other
user) then tries to launch it, they get nothing: no event or response from
Windows or the tool.

Has anyone experienced this issue? If so, is there any workaround or fix
for this?

I tried to uninstall it and install version 7.8 instead, but to no avail.
Regards
Siraj


User roles for gathering performance metrics data

2024-03-26 Thread Siraj G
Greetings!

I am from an Oracle background. In Oracle, we grant SELECT_CATALOG_ROLE or
SELECT ANY DICTIONARY to users who want to study performance data. What are the
equivalent roles or privileges in PgSQL that such users should request in order
to investigate performance problems?
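
For context, the closest thing I have found so far is the predefined monitoring
roles; is something like this (a sketch, with a hypothetical user perf_analyst)
the recommended approach?

-- pg_monitor bundles pg_read_all_settings, pg_read_all_stats and pg_stat_scan_tables
GRANT pg_monitor TO perf_analyst;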

Thanks
Siraj


Problem with a Query

2024-08-12 Thread Siraj G
Hello!

We migrated a PgSQL database from Cloud SQL to a Compute Engine instance, and
since then we have observed one SQL statement taking a long time. After some
study, I found that the SQL uses a NESTED LOOP whose cost is far too high. I
tried VACUUM FULL and ANALYZE, but to no avail. Only when I disable nested
loops (enable_nestloop) does the query run normally.
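
For the record, the workaround I am using is along these lines (session level
only; I do not intend it as a permanent fix):

SET enable_nestloop = off;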

I checked that nested loops are enabled in Cloud SQL as well, so I am not sure
why the behaviour differs. Can anyone please assist?

Here is the execution plan (bad one):
--
 Nested Loop  (cost=120339.76..132772.17 rows=1 width=775) (actual
time=901159.223..901897.801 rows=12 loops=1)
   Join Filter: (marketing_app_homecounsellinglead.id = w0.cancel_event_id)
   Rows Removed by Join Filter: 28926
   Buffers: shared hit=502303510 read=299 dirtied=1
   ->  Unique  (cost=1.58..4283.42 rows=1 width=4) (actual
time=47.768..51.917 rows=13 loops=1)
 Buffers: shared hit=9680 read=19
 ->  Nested Loop Semi Join  (cost=1.58..4283.42 rows=1 width=4)
(actual time=47.767..51.900 rows=36 loops=1)
   Buffers: shared hit=9680 read=19
   ->  Nested Loop  (cost=1.00..4282.75 rows=1 width=8) (actual
time=46.703..51.596 rows=44 loops=1)
 Buffers: shared hit=9379 read=19
 ->  Index Scan using marketing_a_cancel__55_idx on
marketing_app_leadhistory w0  (cost=0.57..4274.30 rows=1 width=8) (actual
time=46.678..51.232 rows=44 loops=1)
   Index Cond: ((cancel_event_id IS NOT NULL) AND
(cancel_event_type = 1))
   Filter: ((status_id = 93) AND
((followup_date)::date >= '2024-08-01'::date) AND ((followup_date)::date <=
'2024-08-07'::date))
   Rows Removed by Filter: 22268
   Buffers: shared hit=9170 read=19
 ->  Index Scan using marketing_app_leadinfo_pkey on
marketing_app_leadinfo w1  (cost=0.43..8.45 rows=1 width=8) (actual
time=0.006..0.006 rows=1 loops=44)
   Index Cond: (id = w0.lead_id)
   Buffers: shared hit=209
   ->  Nested Loop Semi Join  (cost=0.58..0.66 rows=1 width=12)
(actual time=0.006..0.006 rows=1 loops=44)
 Join Filter: (v0_1.id = u0_2.id)
 Buffers: shared hit=301
 ->  Index Only Scan using branch_id_idx on branch v0_1
 (cost=0.14..0.16 rows=1 width=4) (actual time=0.001..0.002 rows=1 loops=44)
   Index Cond: (id = w1.branch_id)
   Heap Fetches: 44
   Buffers: shared hit=88
 ->  Nested Loop  (cost=0.43..0.49 rows=1 width=8)
(actual time=0.004..0.004 rows=1 loops=44)
   Join Filter: (u0_2.id = u1_2.branch_id)
   Buffers: shared hit=213
   ->  Index Only Scan using branch_id_idx on
branch u0_2  (cost=0.14..0.16 rows=1 width=4) (actual time=0.001..0.001
rows=1 loops=44)
 Index Cond: (id = w1.branch_id)
 Heap Fetches: 44
 Buffers: shared hit=88
   ->  Index Only Scan using
"Employee_brancanh_employee_id_branch_id_06fcf064_uniq" on
authentication_employee_branch u1_2  (cost=0.29..0.31 rows=1 width=4)
(actual time=0.002..0.002 rows=1 loops=44)
 Index Cond: ((employee_id = 43) AND
(branch_id = w1.branch_id))
 Heap Fetches: 36
 Buffers: shared hit=125
   ->  Hash Join  (cost=120338.17..128483.90 rows=387 width=775) (actual
time=69312.677..69372.647 rows=2226 loops=13)
 Hash Cond: (marketing_app_homecounsellinglead.lead_id =
marketing_app_leadinfo.id)
 Buffers: shared hit=502293830 read=280 dirtied=1
 ->  Seq Scan on marketing_app_homecounsellinglead
 (cost=0.00..7513.53 rows=167553 width=775) (actual time=0.003..19.903
rows=167589 loops=13)
   Buffers: shared hit=75894
   ->  Hash  (cost=119938.59..119938.59 rows=31967 width=4) (actual
time=901063.121..901063.129 rows=5138545 loops=1)
   Buckets: 8388608 (originally 32768)  Batches: 1 (originally
1)  Memory Usage: 246188kB
   Buffers: shared hit=502217936 read=280 dirtied=1
   ->  Nested Loop  (cost=56.29..119938.59 rows=31967 width=4)
(actual time=0.271..899599.420 rows=5138545 loops=1)
 Join Filter: (u0.id = marketing_app_leadinfo.branch_id)
 Rows Removed by Join Filter: 713188839
 Buffers: shared hit=502217936 read=280 dirtied=1
 ->  Nested 

Re: Problem with a Query

2024-08-26 Thread Siraj G
Thanks Tom. Collecting full statistics on the tables involved corrected the
execution plan.
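
For anyone who finds this thread later, that amounts to roughly the following
(the CREATE STATISTICS statement is the extended-statistics variant Tom
suggested; the statistics name is illustrative):

ANALYZE marketing_app_leadhistory;
-- tell the planner that cancel_event_id and cancel_event_type are correlated
CREATE STATISTICS leadhistory_cancel_stats (dependencies)
  ON cancel_event_id, cancel_event_type
  FROM marketing_app_leadhistory;
ANALYZE marketing_app_leadhistory;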

On Tue, Aug 13, 2024 at 9:57 AM Tom Lane  wrote:

> Siraj G  writes:
> > We migrated a PgSQL database from Cloud SQL to compute engine and since
> > then there is a SQL we observed taking a long time. After some study, I
> > found that the SQL is using NESTED LOOP where the cost is too high.
>
> The core of your problem seems to be here:
>
> >  ->  Index Scan using marketing_a_cancel__55_idx
> on
> > marketing_app_leadhistory w0  (cost=0.57..4274.30 rows=1 width=8) (actual
> > time=46.678..51.232 rows=44 loops=1)
> >Index Cond: ((cancel_event_id IS NOT NULL) AND
> > (cancel_event_type = 1))
> >Filter: ((status_id = 93) AND
> > ((followup_date)::date >= '2024-08-01'::date) AND ((followup_date)::date
> <=
> > '2024-08-07'::date))
> >Rows Removed by Filter: 22268
> >Buffers: shared hit=9170 read=19
>
> If the planner had estimated 40-some rows out of this step, rather
> than one, it would certainly not have chosen to use nestloop joins
> atop this.  So the big problem to focus on is making that estimate
> better.
>
> A secondary problem is that the choice of index seems poor: the
> index itself is selecting 44+22268 = 22312 rows and then the filter
> condition is throwing away 99.8% of those rows.  Probably, using
> an index on (status_id, followup_date) would have worked better.
>
> I suspect that both of these things are tied to the non-normalization
> of your "cancel" condition.  The planner probably believes that
> "cancel_event_id IS NOT NULL" is statistically independent of
> "cancel_event_type = 1"; but I'll bet it isn't, and thus the index
> condition selects many more rows than the planner guessed.  You might
> be able to improve that estimate by creating extended stats on both of
> those columns, but really a better idea would be to take a step back
> and figure out if those two columns can't be merged into one.
>
> regards, tom lane
>


Problem with a query

2024-08-26 Thread Siraj G
Hello!

We have a couple of queries that all of a sudden became rather slow. I took the
EXPLAIN ANALYZE output from one of the SQLs, shown below. Can you please check
and suggest whether anything can be done?

'-> Table scan on   (actual time=0.019..71.526 rows=38622
loops=1)\n
-> Aggregate using temporary table  (actual time=33891741.611..33891815.749
rows=38622 loops=1)\n
-> Nested loop inner join  (cost=155705096.99 rows=0) (actual
time=33872496.362..33891210.133 rows=38622 loops=1)\n
-> Filter: (SES.IS_SERVER_ID <> (-(1)))  (cost=371424.15
rows=1803523) (actual time=0.036..3921.142 rows=3651493 loops=1)\n

-> Index scan on SES using IS_SESSION_IDX4  (cost=371424.15 rows=3607044)
(actual time=0.034..3600.114 rows=3651493 loops=1)\n
-> Index lookup on DST using  (IS_SESSION_ID=SES.IS_SESSION_ID)
 (actual time=0.004..0.004 rows=0 loops=3651493)\n
-> Materialize  (cost=0.00..0.00 rows=0) (actual
time=33886497.639..33886608.008 rows=38622 loops=1)\n
-> Table scan on   (actual time=0.018..51.715 rows=38622
loops=1)\n
-> Aggregate using temporary table  (actual time=33872191.430..33872246.080
rows=38622 loops=1)\n
-> Nested loop left join  (cost=1025850971.42 rows=1242668643) (actual
time=910.618..33869299.956 rows=38622 loops=1)\n

-> Nested loop left join  (cost=124916205.44 rows=1242668643) (actual
time=910.606..33868869.982 rows=38622 loops=1)\n

-> Nested loop left join  (cost=649328.22 rows=949) (actual
time=21.155..4387.994 rows=38622 loops=1)\n

-> Nested loop left join  (cost=648884.46 rows=949) (actual
time=21.144..4189.892 rows=38622 loops=1)\n

-> Nested loop inner join  (cost=648552.34 rows=949) (actual
time=21.127..3847.460 rows=38622 loops=1)\n

-> Inner hash join (MD.REP_ID = P.IS_REPOSITORY_ID)  (cost=0.96 rows=0)
(actual time=0.045..0.054 rows=1 loops=1)\n

-> Table scan on MD  (cost=7.00 rows=2) (actual time=0.007..0.013 rows=2
loops=1)\n
-> Hash\n
-> Nested loop inner join  (cost=0.70 rows=0) (actual time=0.024..0.027
rows=1 loops=1)\n

-> Filter: (LU.IS_PROJ_ID is not null)  (cost=0.35 rows=1) (actual
time=0.010..0.012 rows=1 loops=1)\n

-> Table scan on LU  (cost=0.35 rows=1) (actual time=0.009..0.011 rows=1
loops=1)\n
-> Filter: (P.IS_PROJ_GUID = LU.IS_PROJ_GUID)  (cost=0.26 rows=0) (actual
time=0.014..0.014 rows=1 loops=1)\n

-> Single-row index lookup on P using PRIMARY (IS_PROJ_ID=LU.IS_PROJ_ID)
 (cost=0.26 rows=1) (actual time=0.012..0.012 rows=1 loops=1)\n

-> Filter: ((DS.REPOSITORYID = MD.REP_GUID) and (DS.PROJECTID =
LU.IS_PROJ_GUID) and (DS.RECORDTIME >= LU.IS_WIN_BEGIN) and (DS.RECORDTIME
< LU.IS_WIN_END))  (cost=11300581.57 rows=18978) (actual
time=21.080..3837.717 rows=38622 loops=1)\n

-> Index range scan on DS (re-planned for each iteration)
 (cost=11300581.57 rows=17084027) (actual time=21.071..3653.945 rows=39790
loops=1)\n
-> Filter: (MD.REP_ID = U.IS_REPOSITORY_ID)  (cost=0.25 rows=1) (actual
time=0.008..0.008 rows=1 loops=38622)\n

-> Single-row index lookup on U using EM_USER_PK (EM_USER_GUID=DS.USERID,
IS_REPOSITORY_ID=MD.REP_ID)  (cost=0.25 rows=1) (actual time=0.007..0.007
rows=1 loops=38622)\n
-> Single-row index lookup on D using IS_DOC_PK (IS_DOC_GUID=DS.DOCUMENTID,
IS_PROJ_ID=LU.IS_PROJ_ID)  (cost=0.37 rows=1) (actual time=0.004..0.004
rows=1 loops=38622)\n
-> Nested loop inner join  (cost=23163414.56 rows=1309557) (actual
time=714.186..876.817 rows=1 loops=38622)\n

-> Table scan on LU1  (cost=0.01 rows=1) (actual time=0.007..0.011 rows=1
loops=38622)\n
-> Filter: ((SS01.PROJECTID = LU1.IS_PROJ_GUID) and (SS01.SCHEDULEID =
DS.JOBID) and (SS01.SESSIONID = DS.SESSIONID) and (SS01.RECORDTIME >=
LU1.IS_WIN_BEGIN) and (SS01.RECORDTIME < LU1.IS_WIN_END))  (cost=24410.22
rows=1309557) (actual time=714.176..876.804 rows=1 loops=38622)\n

-> Index lookup on SS01 using IS_SCHEDULE_STATS_IDX1 (SCHEDULETYPE=1)
 (cost=24410.22 rows=1309557) (actual time=0.035..522.644 rows=1360349
loops=38622)\n
-> Index lookup on S using IS_SCHED_PK (IS_SCHED_GUID=SS01.TRIGGERID)
 (cost=0.63 rows=1) (actual time=0.009..0.010 rows=1 loops=38622)\n'

Regards
Siraj


Re: Help in dealing with OOM

2024-10-16 Thread Siraj G
Thanks Joe, I will set these kernel parameters.

I would also like to highlight that the issue happened on the SECONDARY, even
though the PRIMARY has less memory and compute than the SECONDARY. I am not
sure whether there is anything wrong in PgSQL itself.

PRIMARY: 48vCPUs & 48GB memory
SECONDARY: 64vCPUs & 64GB memory

I noticed a few things which do not look tidy:
1. The total number of DBs is 1860 (the environment serves a multi-tenant
product with around 1100 tenants, which means that many DBs are active).
   Is there any guideline on the optimal number of DBs per instance? I would
assume not (it should depend purely on the overall workload), but I am asking
out of curiosity.
2. max_connections is set to 1.
I tried to reduce it to 4000 (after first reducing max_connections on the
PRIMARY to 4000) but was unable to do so; roughly what I ran is sketched after
the error message below. This is the error:
FATAL:  hot standby is not possible because max_connections = 4000 is a
lower setting than on the master server (its value was 1)
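
For reference, this is roughly what I ran on the standby before restarting the
service (assuming ALTER SYSTEM is the right mechanism here; otherwise the same
value would go into postgresql.conf):

ALTER SYSTEM SET max_connections = 4000;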

If I am clubbing multiple things, sorry for the clutter.

Regards
Siraj

On Tue, Oct 15, 2024 at 12:39 AM Joe Conway  wrote:

> On 10/14/24 14:37, Siraj G wrote:
> > This is from the OS log (/var/log/kern.log):
> >
> > oom-
> >
> kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.sli
>  ce/system-postgresql.slice/postgresql@12-main.service
> ,task=postgres,pid=2334587,uid=114
> >494 Oct 14 09:58:10 gce-k12-prod-as1-erp-pg-secondary kernel:
> > [6905020.514569] Out of memory: Killed process 2334587 (postgres) total-
> > vm:26349584kB, anon-rss:3464kB, file-rss:0kB, shmem-rs
> >   s:21813032kB, UID:114 pgtables:49024kB oom_score_adj:0
>
>
> 1. Do you happen to have swap disabled? If so, don't do that.
>
> 2. Does the postgres cgroup have memory.limit (cgroup v1) or memory.max
> (cgroup v2) set?
>
> 3. If #2 answer is no, have you followed the documented guidance here
> (in particular vm.overcommit_memory=2):
>
>
>
> https://www.postgresql.org/docs/12/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
>
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


Help in dealing with OOM

2024-10-14 Thread Siraj G
Hello Experts!

My secondary instance has been unstable today. The service is crashing with
Out of Memory. Please see the error below (from
/var/log/postgresql/postgresql-2024-10-14.log):

10.2.52.50,2024-10-14 15:28:11 IST,686671,finance_revamp,finance_b2b,1,LOG:
 duration: 1148.527 ms  statement: SELECT "tripschedule"."id",
"tripschedule"."name", "tripschedule"."branch_id",
"tripschedule"."route_id", "tripschedule"."route_name",
"tripschedule"."bus_id", "tripschedule"."path",
"tripschedule"."path_timings", "tripschedule"."recurring_days",
"tripschedule"."start_time", "tripschedule"."end_time",
"tripschedule"."start_date", "tripschedule"."end_date",
"tripschedule"."created_at", "tripschedule"."created_by",
"tripschedule"."shift", "tripschedule"."is_deleted",
"tripschedule"."is_active", "tripschedule"."is_cancelled",
"tripschedule"."branch_latitude", "tripschedule"."branch_longitude",
"tripschedule"."polygon_id", "tripschedule"."is_after_noon_shift" FROM
"tripschedule" INNER JOIN "bus" ON ("tripschedule"."bus_id" = "bus"."id")
WHERE ("bus"."vehicle_no" = 'KA51AH1922' AND "tripschedule"."end_date" >=
'2024-10-14'::date AND "tripschedule"."is_active" AND
"tripschedule"."recurring_days" && ARRAY[1]::integer[] AND
"tripschedule"."start_date" <= '2024-10-14'::date AND
("tripschedule"."start_time" BETWEEN '14:57:57.654167'::time AND
'15:57:57.654167'::time OR "tripschedule"."end_time" BETWEEN
'14:57:57.654167'::time AND '15:57:57.654167'::time OR
("tripschedule"."start_time" <= '15:27:57.654167'::time AND
"tripschedule"."end_time" >= '15:27:57.654167'::time))) ORDER BY
"tripschedule"."id" DESC LIMIT 1
10.2.52.22,2024-10-14 15:28:11
IST,686748,orchids_letseduvate_db,autoscaling,1,LOG:  duration: 468.028 ms
 statement: SELECT (1) AS "a" FROM "test" INNER JOIN "test_section_mapping"
ON ("test"."id" = "test_section_mapping"."test_id") INNER JOIN
"test_subjects" ON ("test"."id" = "test_subjects"."test_id") INNER JOIN
"user_response" ON ("test"."id" = "user_response"."test_id") WHERE
(("test"."test_date")::date >= '2024-10-14'::date AND
("test"."test_date")::date <= '2024-10-17'::date AND NOT "test"."is_delete"
AND "test_section_mapping"."sectionmapping_id" IN (136364) AND
"test_subjects"."subject_id" = 16 AND NOT "user_response"."is_delete" AND
"user_response"."submitted_by_id" = 61725) LIMIT 1
,2024-10-14 15:28:11 IST,2334064,,,8,LOG:  checkpointer process (PID
2334587) was terminated by signal 9: Killed
,2024-10-14 15:28:11 IST,2334064,,,9,LOG:  terminating any other active
server processes
10.2.52.50,2024-10-14 15:28:11
IST,686752,mcollege_letseduvate_db,finance_b2b,1,WARNING:  terminating
connection because

This is from the OS log (/var/log/kern.log):

oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-postgresql.slice/postgresql@12-main.service,task=postgres,pid=2334587,uid=114
494 Oct 14 09:58:10 gce-k12-prod-as1-erp-pg-secondary kernel:
[6905020.514569] Out of memory: Killed process 2334587 (postgres)
total-vm:26349584kB, anon-rss:3464kB, file-rss:0kB, shmem-rss:21813032kB,
UID:114 pgtables:49024kB oom_score_adj:0

Regards
Siraj


Need assistance in converting subqueries to joins

2024-09-19 Thread Siraj G
Hello Tech gents!

I am sorry if I am asking the wrong question to this group, but I wanted
assistance in rewriting a query to replace its subqueries with joins.

Please find the query below (whose cost is very high); a rough sketch of the
direction I have been trying follows it:

select em_exists_id from IS_SEC_FILT WHERE (IS_SEC_FILT_GUID) NOT IN
(SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0 WHERE
T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_PROJ_ID =
IS_SEC_FILT.IS_PROJ_ID) AND (IS_PROJ_ID) IN (SELECT IS_PROJ_ID FROM IS_PROJ
P0, TMP_IS_SEC_FILT T0, EM_MD R0 WHERE T0.IS_REPOSITORY_GUID = R0.REP_GUID
AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_REPOSITORY_ID = R0.REP_ID);
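
The sketch below is the NOT EXISTS / EXISTS form I have been experimenting with
(note: it is only equivalent to the NOT IN form if IS_OBJ_GUID and
IS_SEC_FILT_GUID can never be NULL):

SELECT f.em_exists_id
FROM is_sec_filt f
WHERE NOT EXISTS (
        SELECT 1
        FROM tmp_is_sec_filt t0
        JOIN is_proj p0 ON t0.is_proj_guid = p0.is_proj_guid
        WHERE p0.is_proj_id = f.is_proj_id
          AND t0.is_obj_guid = f.is_sec_filt_guid)
  AND EXISTS (
        SELECT 1
        FROM is_proj p0
        JOIN tmp_is_sec_filt t0 ON t0.is_proj_guid = p0.is_proj_guid
        JOIN em_md r0 ON t0.is_repository_guid = r0.rep_guid
                     AND p0.is_repository_id = r0.rep_id
        WHERE p0.is_proj_id = f.is_proj_id);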

Regards
Siraj


Re: Need assistance in converting subqueries to joins

2024-09-19 Thread Siraj G
Hello Adrian!

Please find below the query in the format and its execution plan:

SELECT
    em_exists_id
FROM
    IS_SEC_FILT
WHERE (IS_SEC_FILT_GUID) NOT IN (
    SELECT
        IS_OBJ_GUID
    FROM
        TMP_IS_SEC_FILT T0,
        IS_PROJ P0
    WHERE
        T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
        AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
AND (IS_PROJ_ID) IN (
    SELECT
        IS_PROJ_ID
    FROM
        IS_PROJ P0,
        TMP_IS_SEC_FILT T0,
        EM_MD R0
    WHERE
        T0.IS_REPOSITORY_GUID = R0.REP_GUID
        AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
        AND P0.IS_REPOSITORY_ID = R0.REP_ID);


Query plan:

'-> Aggregate: count(0)  (cost=2284.32 rows=1988) (actual
time=22602.583..22602.584 rows=1 loops=1)\n
-> Remove duplicate (P0, IS_SEC_FILT) rows using temporary table
(weedout)  (cost=2085.53 rows=1988) (actual time=0.321..22600.652
rows=10298 loops=1)\n-> Filter:
(IS_SEC_FILT.IS_SEC_FILT_GUID,(select #2) is
false)  (cost=2085.53 rows=1988) (actual time=0.315..22433.412
rows=514900 loops=1)\n
-> Inner hash join (IS_SEC_FILT.IS_PROJ_ID = P0.IS_PROJ_ID)
(cost=2085.53 rows=1988) (actual time=0.188..96.362 rows=517350
loops=1)\n
-> Index scan on IS_SEC_FILT using IS_SEC_FILT_PK  (cost=28.84
rows=19879) (actual time=0.019..7.386 rows=20086 loops=1)\n
-> Hash\n
-> Nested loop inner join  (cost=8.05 rows=1) (actual
time=0.064..0.132 rows=50 loops=1)\n
-> Inner hash join (T0.IS_REPOSITORY_GUID = R0.REP_GUID)  (cost=1.70
rows=1) (actual time=0.047..0.094 rows=50 loops=1)\n
-> Filter: (T0.IS_PROJ_GUID is not null)  (cost=0.38 rows=5) (actual
time=0.010..0.041 rows=50 loops=1)\n
-> Table scan on T0  (cost=0.38 rows=50) (actual time=0.010..0.037
rows=50 loops=1)\n
-> Hash\n
-> Filter: (R0.REP_ID is not null)  (cost=0.45 rows=2) (actual
time=0.022..0.025 rows=2 loops=1)\n
-> Table scan on R0  (cost=0.45 rows=2) (actual time=0.021..0.023
rows=2 loops=1)\n
-> Filter: (P0.IS_REPOSITORY_ID = R0.REP_ID)  (cost=0.63 rows=1)
(actual time=0.001..0.001 rows=1 loops=50)\n
-> Single-row index lookup on P0 using IS_PROJ_PK
(IS_PROJ_GUID=T0.IS_PROJ_GUID, IS_REPOSITORY_ID=R0.REP_ID)  (cost=0.63
rows=1) (actual time=0.000..0.000 rows=1 loops=50)\n
-> Select #2 (subquery in condition; dependent)\n
-> Limit: 1 row(s)  (cost=5.98 rows=1) (actual time=0.043..0.043
rows=0 loops=517350)\n
-> Filter: (T0.IS_OBJ_GUID)  (cost=5.98 rows=1)
(actual time=0.043..0.043 rows=0 loops=517350)\n
-> Filter: (((IS_SEC_FILT.IS_SEC_FILT_GUID) = T0.IS_OBJ_GUID)
or (T0.IS_OBJ_GUID is null))  (cost=5.98 rows=1) (actual
time=0.042..0.042 rows=0 loops=517350)\n
-> Inner hash join (T0.IS_PROJ_GUID = P0.IS_PROJ_GUID)  (cost=5.98
rows=1) (actual time=0.004..0.038 rows=50 loops=517350)\n
  -> Table scan on T0  (cost=0.35 rows=50) (actual
time=0.001..0.022 rows=50 loops=517350)\n
  -> Hash\n-> Single-row index
lookup on P0 using PRIMARY (IS_PROJ_ID=IS_SEC_FILT.IS_PROJ_ID)
(cost=0.72 rows=1) (actual time=0.001..0.001 rows=1 loops=517350)\n'


On Fri, Sep 20, 2024 at 9:49 AM Adrian Klaver 
wrote:

> On 9/19/24 21:07, Siraj G wrote:
> > Hello Tech gents!
> >
> > I am sorry if I am asking the wrong question to this group, but wanted
> > assistance in converting a query replacing subqueries with joins.
> >
> > Please find the query below (whose cost is very high):
>
> Add the output of the EXPLAIN ANALYZE for the query.
>
> >
> > select em_exists_id from IS_SEC_FILT WHERE (IS_SEC_FILT_GUID) NOT IN
> > (SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0 WHERE
> > T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_PROJ_ID =
> > IS_SEC_FILT.IS_PROJ_ID) AND (IS_PROJ_ID) IN (SELECT IS_PROJ_ID FROM
> > IS_PROJ P0, TMP_IS_SEC_FILT T0, EM_MD R0 WHERE T0.IS_REPOSITORY_GUID =
> > R0.REP_GUID AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND
> > P0.IS_REPOSITORY_ID = R0.REP_ID);
>
> For future reference formatting the query here:
>
> https://sqlformat.darold.net/
>
> helps get it into a form that is easier to follow:
>
> SELECT
>  em_exists_id
> FROM
>  IS_SEC_FILT
> WHERE (IS_SEC_FILT_GUID)
> NOT IN (
>  SELECT
>  IS_OBJ_GUID
>  FROM
>  TMP_IS_SEC_FILT T0,
>  IS_PROJ P0
>  WHERE
>  T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
>  AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
> AND (IS_PROJ_ID) IN (
>  SELECT
>  IS_PROJ_ID
>  FROM
>  IS_PROJ P0,
>  TMP_IS_SEC_FILT T0,
>  EM_MD R0
>  WHERE
>  T0.IS_REPOSITORY_GUID = R0.REP_GUID
>  AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
>  AND P0.IS_REPOSITORY_ID = R0.REP_ID);
>
>
> >
> > Regards
> > Siraj
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Performance difference between Primary & Secondary in the query execution

2024-10-17 Thread Siraj G
Hello Experts!

We have a PgSQL instance running with HA (the secondary is kept in sync via
streaming replication). Both ends run the same version, but a few SQLs behave
badly on the secondary and we are not sure why:

Primary:
PostgreSQL 12.20 (Ubuntu 12.20-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit

OS: Ubuntu 20.04.6 LTS

Secondary:
PostgreSQL 12.20 (Ubuntu 12.20-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit

OS: Ubuntu 20.04.6 LTS

The application reads more data from the secondary, hence that server has extra
vCPUs.

Can you please advise what needs to be checked?
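
One thing I have started doing, in case it is useful context (a sketch; I am
assuming that comparing planner-related settings on both ends is a sensible
first step), is to run this on each side and diff the output:

SELECT name, setting
FROM pg_settings
WHERE category LIKE 'Query Tuning%'
ORDER BY name;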

FYI, I am attaching the query with the different execution plans.

Regards
Siraj


bad-explain-secondary
Description: Binary data


explain-primary
Description: Binary data


query
Description: Binary data


Re: Export operation efficiency in read replica

2025-03-20 Thread Siraj G
Hello Laurenz

As per my understanding, coming to a firm conclusion about the RPO of an export
operation is challenging. E.g., if the export starts at time x and ends at time
z, the effective timestamp differs across data sets. Moreover, I do not think
there is an incremental option for export, correct?

Please correct me if my understanding is wrong.
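
For what it is worth, the way I am currently gauging how far behind the replica
is (assuming pg_last_xact_replay_timestamp() on the standby is the right thing
to look at) is:

SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;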

Although I do agree that the primary choice of backup should be storage based,
we wanted to have export backups as a secondary option as well.

Concerning the impact of taking the export on the read-only replica: do you
think we might run into recovery issues on the replica side, or anything else
that would prevent the operation from completing successfully?

Regards
Siraj

On Thu, Mar 20, 2025 at 5:29 PM Laurenz Albe 
wrote:

> On Thu, 2025-03-20 at 17:22 +0530, Siraj G wrote:
> > I have a DB with 1TB in size serving needs of one of our critical
> > applications. I have a requirement to take export of the DB on a
> > daily basis, but want to carry out this operation in read replica.
> > The postgresql version is: 16.6
> >
> > What would be the RPO of such backup?
>
> Depends on the speed of disk and network and on what an RPO is.
>
> > What would be the impact on the READ REPLICA with a long running
> > export operation?
>
> Potentially severe.
>
> You could look into storage technologies that allow you to take
> a snapshot and clone it.
>
> Yours,
> Laurenz Albe
>


Export operation efficiency in read replica

2025-03-20 Thread Siraj G
Hello Experts!

I have a DB of 1TB in size serving the needs of one of our critical
applications. I have a requirement to take an export of the DB on a daily
basis, but I want to carry out this operation on a read replica. The PostgreSQL
version is 16.6.

What would be the RPO of such a backup?
What would be the impact on the READ REPLICA of a long-running export
operation?

Thank you!
Siraj


Re: Export operation efficiency in read replica

2025-03-21 Thread Siraj G
Thank you everyone!

On Fri, Mar 21, 2025 at 7:23 PM Guillaume Lelarge <
guillaume.lela...@dalibo.com> wrote:

> On 20/03/2025 15:04, Adrian Klaver wrote:
> > On 3/20/25 05:58, Siraj G wrote:
> >> Hello Laurenz
> >>
> >> As per my understanding coming to a proper conclusion wrt RPO
> >
> > You still have not defined what RPO is.
> >
>
> I guess the OP is talking about Recovery Point Objective, which is one
> of two important parameters WRT disaster recovery. It's the maximum
> data loss you agree on with your backup solution. That mostly depends on
> the database context, something we can't tell. And according to how much
> you agree to lose (one minute of activity? one hour?), you then can
> choose between pg_dump or PITR backups.
>
> But with a 1TB-database, I wouldn't dare using pg_dump. PITR backup is
> the only option.
>
>
> --
> Guillaume Lelarge
> Consultant
> https://dalibo.com
>
>
>


Re: size of attributes table is too big

2025-03-24 Thread Siraj G
Thank you!

I noticed over 99% free space. Now the challenge is running VACUUM FULL on a
table that is over 500GB in size; I presume it will take a couple of hours.

Also, I hope aggressive vacuuming will prevent this situation in the future.
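
For reference, this is roughly what I plan to run (the pgstattuple check
assumes that extension can be installed; note that VACUUM FULL takes an ACCESS
EXCLUSIVE lock on pg_attribute for the duration):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- confirm how much of the table is dead tuples / free space before rewriting it
SELECT * FROM pgstattuple('pg_catalog.pg_attribute');
-- rewrite the catalog table to reclaim the space
VACUUM FULL VERBOSE pg_catalog.pg_attribute;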

Regards
Siraj




On Wed, Mar 19, 2025 at 11:27 PM Ron Johnson 
wrote:

> On Wed, Mar 19, 2025 at 1:06 PM Siraj G  wrote:
>
>> Hello!
>>
>> I have a PG (v16) instance which is occupying around 1TB of storage. Out
>> of this, around 350GB is occupied by the table pg_catalog.pg_attribute.
>> Why is the catalog table's size so big?
>>
>> Here are the sizes:
>>
>> pg_attribute
>> 338 GB
>> pg_attribute_relid_attnam_index
>> 117 GB
>> pg_attribute_relid_attnum_index
>> 69 GB
>>
>> I think this table must have tons of dead tuples. Please suggest to me if
>> we can purge any data/shrink the size of this table.
>>
>>
> Run pgstattuple and pgstatindex on them.  They'll tell you how much bloat
> you have.
>
> And tune your autovacuum parameters to be more aggressive.  These, for
> example, are my settings:
> autovacuum_analyze_scale_factor = 0.015
> autovacuum_vacuum_scale_factor = 0.015
> autovacuum_vacuum_insert_scale_factor = 0.015
> autovacuum_vacuum_insert_threshold = 250
>
> --
>


Re: size of attributes table is too big

2025-03-19 Thread Siraj G
Hello Pavel

The SQL instance is a target of google DMS and it does have a physical
replica.

A couple of weeks back we did have a performance issue, and VACUUM was run at
that time to fix the problem. I presume we may run into the same problem again
soon.
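
In the meantime, this is what I plan to keep an eye on (a sketch; I am assuming
pg_stat_sys_tables is the right view for catalog tables):

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_sys_tables
WHERE relname = 'pg_attribute';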

Regards
Siraj

On Wed, Mar 19, 2025 at 10:47 PM Pavel Stehule 
wrote:

>
>
> st 19. 3. 2025 v 18:14 odesílatel Álvaro Herrera 
> napsal:
>
>> Hello
>>
>> On 2025-Mar-19, Siraj G wrote:
>>
>> > I have a PG (v16) instance which is occupying around 1TB of storage.
>> Out of
>> > this, around 350GB is occupied by the table pg_catalog.pg_attribute.
>> > Why is the catalog table's size so big?
>>
>> Heavy use of temp tables is a known cause of this.
>>
>> > I think this table must have tons of dead tuples. Please suggest to me
>> if
>> > we can purge any data/shrink the size of this table.
>>
>> Yeah, I'd also bet that there are tons of dead tuples, or just unused
>> free space.  To purge it you would use VACUUM FULL, though that would
>> need to lock all accesses to the table.
>>
>> Does your instance run with autovacuum disabled perchance?
>>
>
> or long unclosed transactions, maybe forgotten replication slots, ...
>
> It is very strange that it is still usable at this size. Today's hardware is
> unbelievably strong.
>
> Regards
>
> Pavel
>
>
>>
>> --
>> Álvaro Herrera PostgreSQL Developer  —
>> https://www.EnterpriseDB.com/
>> Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
>> stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
>> After collecting 500 such letters, he mused, a university somewhere in
>> Arizona would probably grant him a degree.  (Don Knuth)
>>
>>
>>


Re: size of attributes table is too big

2025-03-19 Thread Siraj G
Hi Adrian

I used this query to find the sizes:

SELECT relname AS object_name,
       relkind AS object_type,
       pg_size_pretty(pg_relation_size(oid)) AS object_size
FROM pg_class
WHERE relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog')
ORDER BY pg_relation_size(oid) DESC;

We have close to 2000 tables; I am not sure about the number of columns. Around
120-150 of the tables are active, receiving data from DMS.
We do not create/drop tables in our instance unless required.

This is a cloud-managed SQL instance and we do not have any custom settings for
vacuum.

On Wed, Mar 19, 2025 at 10:47 PM Adrian Klaver 
wrote:

> On 3/19/25 10:06, Siraj G wrote:
> > Hello!
> >
> > I have a PG (v16) instance which is occupying around 1TB of storage. Out
>
> Exact version of Postgres 16, include the x in 16.x.
>
> > of this, around 350GB is occupied by the table pg_catalog.pg_attribute.
> > Why is the catalog table's size so big?
> >
> > Here are the sizes:
>
> How did you measure the sizes?
>
> pg_attribute maintains information about table columns, how many table
> columns do you have?
>
> Are you creating/dropping tables on a regular basis?
>
> Is autovacuum running properly?
>
> Have you run VACUUM manually on pg_attribute?
>
> >
> > pg_attribute
> >
> > 338 GB
> > pg_attribute_relid_attnam_index
> >
> > 117 GB
> > pg_attribute_relid_attnum_index
> >
> > 69 GB
> >
> >
> > I think this table must have tons of dead tuples. Please suggest to me
> > if we can purge any data/shrink the size of this table.
> >
> > REgards
> > Siraj
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


size of attributes table is too big

2025-03-25 Thread Siraj G
Hello!

I have a PG (v16) instance which is occupying around 1TB of storage. Out of
this, around 350GB is occupied by the table pg_catalog.pg_attribute.
Why is the catalog table's size so big?

Here are the sizes:

pg_attribute                        338 GB
pg_attribute_relid_attnam_index     117 GB
pg_attribute_relid_attnum_index      69 GB

I think this table must have tons of dead tuples. Please suggest to me if
we can purge any data/shrink the size of this table.

Regards
Siraj


hide data from admins

2025-03-11 Thread Siraj G
Hello Experts!

What features are available in PostgreSQL to hide PII (personally identifiable
information) from the admin team? In Oracle we have Data Vault and Data
Redaction; I am looking for similar features in PostgreSQL. We do not want to
make code-level changes.

Regards
Siraj