dealing with dependencies

2021-07-16 Thread Markhof, Ingolf
Hi!

I recently switched from Oracle SQL to PostgreSQL.

In Oracle, I can easily delete a table or view that is used by existing
views. The system marks the affected views. I can then re-define the
deleted table or view and have all dependent views easily re-compiled. Done.

PostgreSQL instead prevents inconsistency: it simply refuses to drop a view
or table that is referenced by other views. Consequently, I need to drop all
dependent views first, re-define the one I want to change, and then re-create
all the dependent views I dropped before - which is much more difficult to
handle.

What I especially dislike is that you cannot even insert a column into an
existing view if that view is used by some other views. E.g.:

create table my_table (col1 text, col2 text);
create view my_view1 as select col1, col2 from my_table;
create view my_view2 as select col1, col2 from my_view1;
create or replace view my_view1 as
  select col1, col1||col2, col2 from my_table;
-- ERROR: cannot change name of view column "col2" to ...

The CREATE OR REPLACE of my_view1 fails. Granted, the manual says about
CREATE OR REPLACE VIEW: "the new query must generate the same columns that
were generated by the existing view query (that is, the same column names
in the same order and with the same data types), but it may add additional
columns to the end of the list." Obviously, the columns are internally
referenced by position, not by name. But if I want my new column between two
existing ones, I need to drop my_view2 first...
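
For the toy example above, the manual workaround amounts to something like
this (a sketch; wrapped in a transaction so that both views reappear
together):

begin;
drop view my_view2;
drop view my_view1;
create view my_view1 as
  select col1, col1||col2 as col12, col2 from my_table;  -- new column in the middle
create view my_view2 as
  select col1, col2 from my_view1;  -- restored verbatim
commit;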

I wonder how you deal with this in a professional way. It sounds like some
type of "make" (the UNIX tool that deals with dependencies, e.g. when
programming in C) would be helpful...

So, in an environment of rapid prototyping, where you develop the database
design and the views on its tables, and you then need to make changes to a
base table that affect all the views using it, there should be another way
than doing all this manually...?!

Thx for your pointers!
I.




Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11

2021-07-16 Thread Tom Dearman
Hi,

We upgraded to 13 a couple of months ago on production but are still having an
issue with bloated partial indexes, which has an impact on queries using the
partial index, especially towards the end of a quarter when our
quarterly-partitioned tables are getting big.  I have built 14 (beta 2, on
macOS Catalina) and run a test, but the index grows fairly large even though
vacuums are running and the table is still relatively small - I put in
2 million inserts, each followed by one update of the column that makes up
the partial index. The table is:

 Table "public.buyer"
Column |  Type  | Collation | Nullable | Default | 
Storage  | Compression | Stats target | Description 
---++---+--+-+--+-+--+-
 buyer_id  | integer|   | not null | | 
plain| |  | 
 first_name| character varying(35)  |   | not null | | 
extended | |  | 
 last_name | character varying(35)  |   | not null | | 
extended | |  | 
 email_address | character varying(50)  |   |  | | 
extended | |  | 
 status| character varying(256) |   | not null | | 
extended | |  | 
Indexes:
"buyer_pkey" PRIMARY KEY, btree (buyer_id)
"idex_buyer_inactive" btree (first_name) WHERE status::text = 
'IN_PROGRESS'::text
Access method: heap


I run a loop that inserts, commits, updates, and commits one row at a time, as
an emulation of what a similar table would experience in production.  The
index never has many rows with status = 'IN_PROGRESS', as each row is set to
CANCEL in the update.  If the index is reindexed it shrinks to 1 page as
expected, but without the reindexing it keeps growing, currently reaching
3MB - this is with 2 million inserts and updates, but our production will see
about 300 million inserts and > 300 million updates on the partial index in
the quarter.  Should we have seen more of an improvement in 14?  Is it valid
to look at the size of the index (\di+) as a measure of whether the latest
bottom-up index deletion change has helped?
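
Roughly, the test loop is the following (a sketch; it needs Postgres 11+ so
that COMMIT works inside a DO block, and it must be run outside an explicit
transaction):

do $$
begin
  for i in 1..2000000 loop
    -- each row is inserted as IN_PROGRESS and committed ...
    insert into buyer (buyer_id, first_name, last_name, status)
    values (i, 'first', 'last', 'IN_PROGRESS');
    commit;
    -- ... then immediately updated to a completed status and committed,
    -- so the partial index sees constant churn but has few live entries
    update buyer set status = 'CANCEL' where buyer_id = i;
    commit;
  end loop;
end
$$;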

Thanks,

Tom

> On 18 Mar 2021, at 16:30, Peter Geoghegan  wrote:
> 
> On Thu, Mar 18, 2021 at 6:51 AM Tom Dearman  wrote:
>> Is this a known issue, are they any ways around it, and if it is an
>> issue is there a plan to fix it if a fix is possible?
> 
> It's not exactly a known issue per se, but I think the problem here is
> related to the fact that you have lots of duplicates, which did
> perform rather badly prior to Postgres 12. I bet that you'd benefit
> from upgrading to Postgres 12, or especially to Postgres 13. The
> B-Tree space management is a lot better now. (Actually, it'll be
> better again in Postgres 14.)
> 
> -- 
> Peter Geoghegan



Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11

2021-07-16 Thread Tom Dearman
We have changed autovacuum so that it runs more frequently
(autovacuum_vacuum_scale_factor = 0.035). The reason we have a partial index
on the status is that in a table of 300 million entries only about 100 or so
would have status = 'IN_PROGRESS', so we think this should be a nice small
index, and many of our queries want to look up rows with a WHERE clause on
status = 'IN_PROGRESS'.  In theory it works well, but we get a lot of index
bloat as there is a lot of churn on the status value, i.e. each row starts as
IN_PROGRESS and then goes to one of 4 possible completed statuses.

> On 16 Jul 2021, at 15:49, Michael Lewis  wrote:
> 
> Have you tried setting autovacuum to run quite aggressively, perhaps just on 
> this table? Have you tried an index on the status column only, rather than 
> partial?



Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11

2021-07-16 Thread Michael Lewis
Have you tried setting autovacuum to run quite aggressively, perhaps just
on this table? Have you tried an index on the status column only, rather
than partial?
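
For reference, per-table autovacuum settings can be applied like this (a
sketch, with illustrative values, for the buyer table from this thread):

ALTER TABLE buyer SET (
    autovacuum_vacuum_scale_factor = 0,      -- ignore relative table size ...
    autovacuum_vacuum_threshold    = 1000    -- ... vacuum after ~1000 dead tuples
);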


Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11

2021-07-16 Thread Tom Lane
Tom Dearman  writes:
> We have changed autovacuum so that it runs more frequently
> (autovacuum_vacuum_scale_factor = 0.035). The reason we have a partial index
> on the status is that in a table of 300 million entries only about 100 or so
> would have status = 'IN_PROGRESS', so we think this should be a nice small
> index, and many of our queries want to look up rows with a WHERE clause on
> status = 'IN_PROGRESS'.  In theory it works well, but we get a lot of index
> bloat as there is a lot of churn on the status value, i.e. each row starts as
> IN_PROGRESS and then goes to one of 4 possible completed statuses.

Is it really the case that only this index is bloating?  In principle, an
update on a row of the table should result in new entries in every index
of the table.  A partial index, whose filter may exclude a new entry
altogether, should in theory have less bloat than other indexes.

If that's not what you're seeing, there must be something about the data
being stored in that index (not the partial-index filter condition) that
results in a lot of low-occupancy index pages over time.  You didn't say
anything about what the data payload is.  But we've seen bloat problems in
indexes where, say, every tenth or hundredth value in the index ordering
would persist for a long time while the ones in between get deleted
quickly.  That leads to low-density indexes that VACUUM can't do anything
about.
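
For example, something like this (a sketch using the standard statistics
views) lets you compare the sizes of all the table's indexes over time:

SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'buyer';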

regards, tom lane




Re: ERROR: cannot freeze committed xmax

2021-07-16 Thread Sasha Aliashkevich
Hi Alvaro, glad to hear from you!

This database is relatively young: it was initdb'ed about a year ago or so,
and it was initially at 10.x. I don't know the exact minor version, but the
major version was 10 for sure.

The problematic row is actually visible:

SELECT COUNT(*) FROM pg_proc WHERE ctid = '(75,19)';

 count 
-------
     1
(1 row)

And it refers to a known custom stored function.

Also I'm not sure if I can really skip freezing at this point because this 
table is already over autovacuum_freeze_max_age (correct me if I'm wrong):

SELECT age(c.relfrozenxid), s.setting autovacuum_freeze_max_age 
FROM pg_class c, pg_settings s 
WHERE c.relname = 'pg_proc' AND s.name = 'autovacuum_freeze_max_age';

    age    | autovacuum_freeze_max_age 
-----------+---------------------------
 213791108 | 200000000
(1 row)

This is a production database which I cannot restart at any point,
unfortunately, so I can't change the global settings at the moment.

We can sacrifice that function and recreate it, but unfortunately when I 
execute DROP it fails with the error:

ERROR:  could not find tuple for rule 16396

I guess it's because of the broken pages in pg_depend.
Do you think it is safe to execute DELETE on the corresponding records in
pg_proc and pg_depend? Would that update the broken xmax?

I feel like I'm dealing here with some sort of data corruption.
I recall that a few months ago we already had issues with the same function
and a view based on it. At that point, for some reason, it just stopped
working, so we had to DROP and RECREATE it.
That fixed the issue, but apparently the data inconsistency stayed on disk,
and now we've got to the situation where AUTOVACUUM has stopped working,
which makes me really nervous.



> On 15 Jul 2021, at 15:41, Alvaro Herrera  wrote:
> 
> One thing I forgot is that these XIDs are fairly old, perhaps dating
> back to when this database was freshly initdb'd if there has been no XID
> wraparound.  In that case you were probably running a version much older
> than 10.14 when they were written.  Do you happen to know when you
> initdb'd this, with what version, and when you upgraded it to 10.14?
> That may help search the commit log for bugfixes that might explain the
> bug.  I just remembered this one as my favorite candidate:
> 
> Author: Alvaro Herrera 
> Branch: master Release: REL_11_BR [d2599ecfc] 2018-05-04 18:24:45 -0300
> Branch: REL_10_STABLE Release: REL_10_4 [e1d634758] 2018-05-04 18:23:58 -0300
> Branch: REL9_6_STABLE Release: REL9_6_9 [3a11485a5] 2018-05-04 18:23:30 -0300
> 
>Don't mark pages all-visible spuriously
> 
>Dan Wood diagnosed a long-standing problem that pages containing tuples
>that are locked by multixacts containing live lockers may spuriously end
>up as candidates for getting their all-visible flag set.  This has the
>long-term effect that multixacts remain unfrozen; this may previously
>pass undetected, but since commit XYZ it would be reported as
>  "ERROR: found multixact 134100944 from before relminmxid 192042633"
>because when a later vacuum tries to freeze the page it detects that a
>multixact that should have gotten frozen, wasn't.
> 
>Dan proposed a (correct) patch that simply sets a variable to its
>correct value, after a bogus initialization.  But, per discussion, it
>seems better coding to avoid the bogus initializations altogether, since
>they could give rise to more bugs later.  Therefore this fix rewrites
>the logic a little bit to avoid depending on the bogus initializations.
> 
>This bug was part of a family introduced in 9.6 by commit a892234f830e;
>later, commit 38e9f90a227d fixed most of them, but this one was
>unnoticed.
> 
>Authors: Dan Wood, Pavan Deolasee, Álvaro Herrera
>Reviewed-by: Masahiko Sawada, Pavan Deolasee, Álvaro Herrera
>Discussion: 
> https://postgr.es/m/84ebac55-f06d-4fbe-a3f3-8bda093ce...@amazon.com
> 
> 
> -- 
> Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
> "El número de instalaciones de UNIX se ha elevado a 10,
> y se espera que este número aumente" (UPM, 1972)





Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11

2021-07-16 Thread Francisco Olarte
Tom-Tom:

On Fri, Jul 16, 2021 at 5:43 PM Tom Lane  wrote:
> Tom Dearman  writes:
> > We have changed autovacuum so that it runs more frequently
> > (autovacuum_vacuum_scale_factor = 0.035). The reason we have a partial index
> > on the status is that in a table of 300 million entries only about 100 or so
> > would have status = 'IN_PROGRESS', so we think this should be a nice small
> > index, and many of our queries want to look up rows with a WHERE clause on
> > status = 'IN_PROGRESS'.  In theory it works well, but we get a lot of index
> > bloat as there is a lot of churn on the status value, i.e. each row starts as
> > IN_PROGRESS and then goes to one of 4 possible completed statuses.
> Is it really the case that only this index is bloating?  In principle, an
> update on a row of the table should result in new entries in every index
> of the table.  A partial index, whose filter may exclude a new entry
> altogether, should in theory have less bloat than other indexes.

Maybe not in relative terms. If I understand correctly, let's say you
start unbloated with 300M entries, 100 of them in progress. You insert, in
small batches, 10k "in_progress" rows and update 10k "in_progress" rows
to, let's say, "done" (maybe including the 100 original ones); the
description seems to fit this kind of flow, something like a task queue.
Then you will have 10k dead tuples bloating an index with 100 live
entries, an absolutely small but relatively large bloat, while a full pk
index will have 10k dead tuples for 300M.

I may be misunderstanding something, and HOT and similar things may help
here, but it seems like a plausible explanation for apparent bloat on a
small index. I had similar problems; for task-like things I use a pending
table and a done table, and the pending table bloated a lot in the
before-autovacuum days: a small table of nearly all dead tuples. That was
not a problem once it was vacuumed via cron every few minutes. In this
case, where only the index is bloated, autovacuum may not fire frequently
enough, as the table would have only 10k/300M ~ 33 ppm bloat.
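
(A sketch of that pending/done split, with assumed table names: completed
rows are moved out, so "pending" stays tiny and is cheap to vacuum from cron
every few minutes.)

with moved as (
    delete from pending where task_id = 42 returning *
)
insert into done
select * from moved;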

Francisco Olarte




Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11

2021-07-16 Thread Tom Dearman
Other indexes do bloat, but the percentage bloat is a lot less, presumably
because this is a partial index where the indexed column has a high degree of
churn: maybe 100 genuinely 'live' rows in a table of 300 million, where every
row has gone through a state in which it would have been in the index.  In
some of our partitions we might have 2000 old rows that do hang around for a
long time, plus another 100 or so 'real' partial index entries, so 2200 in
total; but the table has 300 million rows, so that is a lot less than 1%.


> On 16 Jul 2021, at 16:43, Tom Lane  wrote:
> 
> Tom Dearman  writes:
>> We have changed autovacuum so that it runs more frequently
>> (autovacuum_vacuum_scale_factor = 0.035). The reason we have a partial index
>> on the status is that in a table of 300 million entries only about 100 or so
>> would have status = 'IN_PROGRESS', so we think this should be a nice small
>> index, and many of our queries want to look up rows with a WHERE clause on
>> status = 'IN_PROGRESS'.  In theory it works well, but we get a lot of index
>> bloat as there is a lot of churn on the status value, i.e. each row starts as
>> IN_PROGRESS and then goes to one of 4 possible completed statuses.
> 
> Is it really the case that only this index is bloating?  In principle, an
> update on a row of the table should result in new entries in every index
> of the table.  A partial index, whose filter may exclude a new entry
> altogether, should in theory have less bloat than other indexes.
> 
> If that's not what you're seeing, there must be something about the data
> being stored in that index (not the partial-index filter condition) that
> results in a lot of low-occupancy index pages over time.  You didn't say
> anything about what the data payload is.  But we've seen bloat problems in
> indexes where, say, every tenth or hundredth value in the index ordering
> would persist for a long time while the ones in between get deleted
> quickly.  That leads to low-density indexes that VACUUM can't do anything
> about.
> 
>   regards, tom lane





Re: dealing with dependencies

2021-07-16 Thread Laurenz Albe
On Fri, 2021-07-16 at 14:42 +0200, Markhof, Ingolf wrote:
> In Oracle, I can easily delete a table or view that is used by existing
> views. The system marks the affected views. I can then re-define the
> deleted table or view and have all dependent views easily re-compiled. Done.

... or you end up with "invalid views" which give you an error when used.
Sorry, but we don't like inconsistent data, however convenient it might be.

> PostgreSQL instead prevents inconsistency: it simply refuses to drop a
> view or table that is referenced by other views. Consequently, I need to
> drop all dependent views first, re-define the one I want to change, and
> then re-create all the dependent views I dropped before - which is much
> more difficult to handle.
> 
> I wonder how you deal with this in a professional way. It sounds like some
> type of "make" (the UNIX tool that deals with dependencies, e.g. when
> programming in C) would be helpful...

You have your view definitions stored in a source control system, and/or
you employ a version management tool like Liquibase.
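
For the toy example from the original post, that boils down to keeping an
idempotent script under version control and re-applying it after every
base-table change (a sketch):

-- views.sql
drop view if exists my_view2;
drop view if exists my_view1;
create view my_view1 as
  select col1, col1||col2 as col12, col2 from my_table;
create view my_view2 as
  select col1, col2 from my_view1;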

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Getting "ERROR: upper bound of FOR loop cannot be null" when trying to use a stored proc in a check constraint def.

2021-07-16 Thread David Gauthier
This stored procedure ...

create or replace function validate_proj_csv (proj_csv varchar)
returns int
language plpgsql
as
$$

-- This function is used in a check constraint in the public.projects table
-- to ensure that all projects in column sibling_project_csv are valid projects.

DECLARE
  proj_arr varchar[];
  see_prj int;

BEGIN

  proj_arr := regexp_split_to_array(proj_csv,',');

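  -- Note: array_upper() returns NULL for a NULL array, which is what makes
  -- the FOR loop below fail when proj_csv is NULL (see the replies).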
  for x in 1 .. array_upper(proj_arr,1)
  loop
    select 1 into see_prj from public.projects where project = proj_arr[x];
    if (see_prj is null) then
      raise notice 'Project "%" in project csv "%" is not a valid project.',
        proj_arr[x], proj_csv;
      return 0;
    end if;
  end loop;

  return 1;

END;
$$ ;

... works fine...

dvdb=# select validate_proj_csv('sabin,strix2,ipu1.0');
 validate_proj_csv
-------------------
                 1
(1 row)

dvdb=# select validate_proj_csv('sabin,strix2,ipu1.00');
NOTICE:  Project "ipu1.00" in project csv "sabin,strix2,ipu1.00" is not a valid project.
 validate_proj_csv
-------------------
                 0
(1 row)


But when I try to use it in a check constraint

dvdb=# alter table projects add constraint validate_sibling_project_csv
check (validate_proj_csv(sibling_project_csv) = 0);
ERROR:  upper bound of FOR loop cannot be null
CONTEXT:  PL/pgSQL function validate_proj_csv(character varying) line 14 at
FOR with integer loop variable

What's going on ?
How to get this to work ?


Re: Getting "ERROR: upper bound of FOR loop cannot be null" when trying to use a stored proc in a check constraint def.

2021-07-16 Thread Adrian Klaver

On 7/16/21 3:26 PM, David Gauthier wrote:

This stored procedure ...


dvdb=# alter table projects add constraint validate_sibling_project_csv 
check (validate_proj_csv(sibling_project_csv) = 0);

ERROR:  upper bound of FOR loop cannot be null
CONTEXT:  PL/pgSQL function validate_proj_csv(character varying) line 14 
at FOR with integer loop variable


What's going on ?


I'm going to say you have a NULL value in sibling_project_csv in the table.
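
If so, a minimal guard at the top of the function avoids the error (a
sketch; whether a NULL csv should count as valid is your design decision):

  if proj_csv is null then
    return 1;  -- treat a missing csv as valid
  end if;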


How to get this to work ?




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11

2021-07-16 Thread Peter Geoghegan
On Fri, Jul 16, 2021 at 9:19 AM Tom Dearman  wrote:
> Other indexes do bloat, but the percentage bloat is a lot less

I have to imagine that the remaining problems have a lot to do with
the fact that this is a partial index -- the partial index naturally
gets vacuumed much less frequently than what would generally be
considered ideal for the index itself. In general VACUUM scheduling
makes the naive assumption that the indexes have the same needs as the
table, which is far from the case with this partial index, for your
workload. It's all of the specifics, taken together.

It sounds like this is a case where bottom-up index deletion won't
help -- it will only trigger in those indexes that are not "logically
modified" by updates. But you're logically modifying these values. Or
you're causing them to not need to be in the index anymore, by
modifying the predicate. But that won't trigger bottom-up deletion.
It's a bit like a delete, as far as the physical index structure is
concerned -- the index won't be eagerly modified by the executor.

The overall picture is that you cycle through all of the values in the
table, and no cleanup can take place other than plain VACUUM (at least
not to any significant extent). Although only a few hundred values are
logically required to be indexed by the partial index at any one time,
in practice no cleanup can run for long stretches of time (autovacuum
just doesn't know about cases like this). This is why the partial
index inevitably exceeds its theoretical pristine/high watermark size,
which is actually more than 1 page/8KB, but still probably a lot less
than what you actually see -- the partial index "falls through the
cracks", even with recent enhancements that made cleanup more eager
and more dynamic in certain other cases.

I am afraid that I don't have a good suggestion right now. I can think
of incremental improvements that would address this case, but for now
they're just ideas. Fundamentally, we need to get to the partial index
much more frequently than the other indexes, either within VACUUM or
within some other mechanism. For example a conservative implementation
of retail index tuple deletion might totally fix the issue. It does
very much look like a problem in these partial indexes in particular
-- it's quite possible that the other indexes won't grow at all due to
garbage index tuples, especially on Postgres 14.
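
One operational stopgap, given the observation upthread that reindexing
returns this index to its pristine size, is a periodic REINDEX INDEX
CONCURRENTLY (available since Postgres 12); a sketch, using the index name
from this thread:

REINDEX INDEX CONCURRENTLY idex_buyer_inactive;  -- rebuilds without blocking reads/writes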

--
Peter Geoghegan




Re: Getting "ERROR: upper bound of FOR loop cannot be null" when trying to use a stored proc in a check constraint def.

2021-07-16 Thread David G. Johnston
On Fri, Jul 16, 2021 at 3:26 PM David Gauthier 
wrote:

> This stored procedure ...
>

> create or replace function validate_proj_csv (proj_csv varchar)
>

It is a function - I don't think you can use stored procedures in check
constraints...

>
> dvdb=# alter table projects add constraint validate_sibling_project_csv
> check (validate_proj_csv(sibling_project_csv) = 0);
> ERROR:  upper bound of FOR loop cannot be null
> CONTEXT:  PL/pgSQL function validate_proj_csv(character varying) line 14
> at FOR with integer loop variable

I'm actually surprised this alter command worked at all, since you are
violating a requirement for check constraints - namely, that the expression
be immutable.  Your function, regardless of its declaration (which defaults
to volatile), is not immutable.

> How to get this to work ?

You really need to re-write this as a trigger function.
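
A minimal sketch of that approach (names taken from this thread; treating
NULL as acceptable, and noting that unlike a check constraint a trigger may
freely call a volatile function):

create or replace function check_sibling_project_csv() returns trigger
language plpgsql as
$$
begin
  if new.sibling_project_csv is not null
     and validate_proj_csv(new.sibling_project_csv) = 0 then
    raise exception 'Invalid project csv: %', new.sibling_project_csv;
  end if;
  return new;
end;
$$;

create trigger validate_sibling_project_csv
  before insert or update on projects
  for each row
  execute function check_sibling_project_csv();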

David J.