dealing with dependencies
Hi! I recently switched from Oracle SQL to PostgreSQL. In Oracle, I can easily delete a table or view that is used by existing views. The system marks the affected views as invalid. I can then re-define the deleted table or view and have all dependent views easily re-compiled. Done.

PostgreSQL instead prevents inconsistency: it simply refuses to delete a view or table that is referenced by other views. Consequently, I need to delete all dependent views first, re-define the one I want to change, and then re-create all the dependent views I deleted before - which is much more difficult to handle.

What I especially dislike is that you cannot even insert a column into an existing view if that view is used by some other views. E.g.:

  create table my_table (col1 text, col2 text);
  create view my_view1 as select col1, col2 from my_table;
  create view my_view2 as select col1, col2 from my_view1;
  create or replace view my_view1 as select col1, col1||col2, col2 from my_table;
  --> ERROR: cannot change name of view column "col2" to ...

The create or replace of my_view1 fails. That much is clear; the manual states about create or replace view: "the new query must generate the same columns that were generated by the existing view query (that is, the same column names in the same order and with the same data types), but it may add additional columns to the end of the list." Obviously, the columns are internally referenced by index, not by name. But if I want my new column between two existing ones, I need to delete my_view2 first...

I wonder how you deal with this in a professional way. It sounds like some type of "make" (that UNIX tool dealing with dependencies in the context of, e.g., programming in C) would be helpful...

So, in an environment of rapid prototyping, where you develop the database design and the views for tables etc., and you then need to make changes to a base table that affect all the views using it, there should be another way than doing all this manually...?!

Thx for your pointers!
I.
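For what it's worth, the dependency information PostgreSQL enforces is visible in the catalogs, so at least the "which views block me?" step need not be manual. A sketch of a query using the standard pg_depend/pg_rewrite plumbing ('my_view1' is just the example view from above):

  -- list views that directly depend on my_view1
  SELECT DISTINCT dep.relname AS dependent_view
  FROM pg_depend d
  JOIN pg_rewrite r  ON r.oid = d.objid      -- views reference relations via their rewrite rule
  JOIN pg_class dep  ON dep.oid = r.ev_class -- the view owning that rule
  WHERE d.classid    = 'pg_rewrite'::regclass
    AND d.refclassid = 'pg_class'::regclass
    AND d.refobjid   = 'my_view1'::regclass
    AND dep.oid     <> d.refobjid;           -- a view's rule also "depends" on the view itself

Alternatively, a plain DROP VIEW without CASCADE already lists the dependent objects in its error detail, which can serve as a quick dependency report.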
Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11
Hi,

We upgraded to 13 a couple of months ago on production but are still having an issue with bloated partial indexes, which have an impact on our partial-index queries, especially towards the end of a quarter when our quarterly-partitioned tables are getting big.

I have built 14 (on macOS Catalina, 14 beta 2) and run a test, but the index grows fairly large (even though vacuums are running, as the table is still relatively small - I put in 2 million inserts, each having one update of the column that makes up the partial index). The table is:

                                           Table "public.buyer"
    Column     |          Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
---------------+------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 buyer_id      | integer                |           | not null |         | plain    |             |              |
 first_name    | character varying(35)  |           | not null |         | extended |             |              |
 last_name     | character varying(35)  |           | not null |         | extended |             |              |
 email_address | character varying(50)  |           |          |         | extended |             |              |
 status        | character varying(256) |           | not null |         | extended |             |              |
Indexes:
    "buyer_pkey" PRIMARY KEY, btree (buyer_id)
    "idex_buyer_inactive" btree (first_name) WHERE status::text = 'IN_PROGRESS'::text
Access method: heap

I run a loop to insert, commit, update, commit one row at a time, as this is an emulation of what a similar table would experience in production. The index never has many rows with status=‘IN_PROGRESS’ as each row is set to CANCEL in the update. If the index is reindexed it takes 1 page as expected, but without the reindexing it keeps growing, currently reaching 3MB - this is with 2 million inserts and updates, but our production will have about 300 million inserts and > 300 million updates on the partial index in the quarter.

Should we have seen more of an improvement in 14? Is it valid to look at the size of the index (\di+) as a measure of whether this latest change to bottom-up index deletion has helped?

Thanks,

Tom

> On 18 Mar 2021, at 16:30, Peter Geoghegan wrote:
>
> On Thu, Mar 18, 2021 at 6:51 AM Tom Dearman wrote:
>> Is this a known issue, are there any ways around it, and if it is an
>> issue is there a plan to fix it if a fix is possible?
>
> It's not exactly a known issue per se, but I think the problem here is
> related to the fact that you have lots of duplicates, which did
> perform rather badly prior to Postgres 12. I bet that you'd benefit
> from upgrading to Postgres 12, or especially to Postgres 13. The
> B-Tree space management is a lot better now. (Actually, it'll be
> better again in Postgres 14.)
>
> --
> Peter Geoghegan
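The insert/commit/update/commit loop described above could be emulated server-side along these lines - a rough reconstruction, not the author's actual test harness (a procedure is used so that each iteration can commit on its own, which works on PostgreSQL 11 and later):

  create procedure churn_buyer(n int)
  language plpgsql as $$
  BEGIN
    for i in 1 .. n loop
      insert into buyer (buyer_id, first_name, last_name, status)
      values (i, 'fn' || i, 'ln' || i, 'IN_PROGRESS');
      commit;
      -- the row leaves the partial index almost immediately
      update buyer set status = 'CANCEL' where buyer_id = i;
      commit;
    end loop;
  END;
  $$;

  call churn_buyer(2000000);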
Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11
We have changed autovacuum so that it runs more frequently (autovacuum_vacuum_scale_factor=0.035). The reason we have a partial index on the status is that in a table of 300 million entries, only about 100 or so would have status=‘IN_PROGRESS’, so we think this should be a nice small index, and many of our queries want to look up with a where clause status=‘IN_PROGRESS’. In theory it works well, but we get a lot of index bloat as there is a lot of churn on the status value, i.e. each row starts as IN_PROGRESS and then goes to one of 4 possible completed statuses.

> On 16 Jul 2021, at 15:49, Michael Lewis wrote:
>
> Have you tried setting autovacuum to run quite aggressively, perhaps just on
> this table? Have you tried an index on the status column only, rather than
> partial?
Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11
Have you tried setting autovacuum to run quite aggressively, perhaps just on this table? Have you tried an index on the status column only, rather than partial?
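For reference, per-table autovacuum tuning of the kind suggested here might look like the following; the numbers are purely illustrative, not recommendations:

  ALTER TABLE buyer SET (
      autovacuum_vacuum_scale_factor = 0,     -- don't scale the trigger point with table size
      autovacuum_vacuum_threshold    = 1000   -- vacuum after ~1000 dead tuples instead
  );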
Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11
Tom Dearman writes:
> We have changed autovacuum so that it runs more frequently
> (autovacuum_vacuum_scale_factor=0.035). The reason we have a partial index on
> the status is that in a table of 300 million entries, only about 100 or so
> would have status=‘IN_PROGRESS’, so we think this should be a nice small index,
> and many of our queries want to look up with a where clause
> status=‘IN_PROGRESS’. In theory it works well, but we get a lot of index
> bloat as there is a lot of churn on the status value, i.e. each row starts as
> IN_PROGRESS and then goes to one of 4 possible completed statuses.

Is it really the case that only this index is bloating? In principle, an update on a row of the table should result in new entries in every index of the table. A partial index, whose filter may exclude entries entirely, should in theory have less bloat than other indexes.

If that's not what you're seeing, there must be something about the data being stored in that index (not the partial-index filter condition) that results in a lot of low-occupancy index pages over time. You didn't say anything about what the data payload is. But we've seen bloat problems in indexes where, say, every tenth or hundredth value in the index ordering would persist for a long time while the ones in between get deleted quickly. That leads to low-density indexes that VACUUM can't do anything about.

regards, tom lane
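One way to quantify the low-density effect described here is the pgstattuple extension, which reports leaf-page density for an index directly (the index name is taken from the earlier \d+ output):

  CREATE EXTENSION IF NOT EXISTS pgstattuple;

  -- a bloated-but-sparse index shows many leaf pages with low average density
  SELECT leaf_pages, deleted_pages, avg_leaf_density
  FROM pgstatindex('idex_buyer_inactive');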
Re: ERROR: cannot freeze committed xmax
Hi Alvaro, glad to hear from you!

This database is relatively young; it was initdb'ed about a year ago or so, and it was initially at 10.x. I don't know the exact minor version, but the major version was 10 for sure.

The problematic row is actually visible:

SELECT COUNT(*) FROM pg_proc WHERE ctid = '(75,19)';

 count
-------
     1
(1 row)

And it refers to a known custom stored function.

Also I'm not sure if I can really skip freezing at this point, because this table is already over autovacuum_freeze_max_age (correct me if I'm wrong):

SELECT age(c.relfrozenxid), s.setting AS autovacuum_freeze_max_age
FROM pg_class c, pg_settings s
WHERE c.relname = 'pg_proc' AND s.name = 'autovacuum_freeze_max_age';

    age    | autovacuum_freeze_max_age
-----------+---------------------------
 213791108 | 200000000
(1 row)

This is a production database which I cannot restart at any point, unfortunately, so I can't change the global settings atm.

We can sacrifice that function and recreate it, but unfortunately when I execute DROP it fails with the error:

ERROR: could not find tuple for rule 16396

I guess it's because of the broken pages in pg_depend. Do you think it is safe to execute DELETE on the corresponding records in pg_proc and pg_depend? Would it update the broken xmax?

I feel like I'm dealing here with some sort of data corruption. I recall that a few months ago we already had issues with the same function and a view based on it. At that point, for some reason, it just stopped working, so we had to DROP and RECREATE it. That fixed the issue, but apparently the data inconsistency stayed on disk, and now we've got to the situation where AUTOVACUUM has stopped working, which makes me really nervous.

> On 15 Jul 2021, at 15:41, Alvaro Herrera wrote:
>
> One thing I forgot is that these XIDs are fairly old, perhaps dating
> back to when this database was freshly initdb'd if there has been no XID
> wraparound. In that case you were probably running a version much older
> than 10.14 when they were written. Do you happen to know when did you
> initdb this, with what version, when did you upgrade this to 10.14?
> That may help search the commit log for bugfixes that might explain the
> bug. I just remembered this one as my favorite candidate:
>
> Author: Alvaro Herrera
> Branch: master Release: REL_11_BR [d2599ecfc] 2018-05-04 18:24:45 -0300
> Branch: REL_10_STABLE Release: REL_10_4 [e1d634758] 2018-05-04 18:23:58 -0300
> Branch: REL9_6_STABLE Release: REL9_6_9 [3a11485a5] 2018-05-04 18:23:30 -0300
>
>     Don't mark pages all-visible spuriously
>
>     Dan Wood diagnosed a long-standing problem that pages containing tuples
>     that are locked by multixacts containing live lockers may spuriously end
>     up as candidates for getting their all-visible flag set. This has the
>     long-term effect that multixacts remain unfrozen; this may previously
>     pass undetected, but since commit XYZ it would be reported as
>         "ERROR: found multixact 134100944 from before relminmxid 192042633"
>     because when a later vacuum tries to freeze the page it detects that a
>     multixact that should have gotten frozen, wasn't.
>
>     Dan proposed a (correct) patch that simply sets a variable to its
>     correct value, after a bogus initialization. But, per discussion, it
>     seems better coding to avoid the bogus initializations altogether, since
>     they could give rise to more bugs later. Therefore this fix rewrites
>     the logic a little bit to avoid depending on the bogus initializations.
>
>     This bug was part of a family introduced in 9.6 by commit a892234f830e;
>     later, commit 38e9f90a227d fixed most of them, but this one was
>     unnoticed.
>
>     Authors: Dan Wood, Pavan Deolasee, Álvaro Herrera
>     Reviewed-by: Masahiko Sawada, Pavan Deolasee, Álvaro Herrera
>     Discussion:
>     https://postgr.es/m/84ebac55-f06d-4fbe-a3f3-8bda093ce...@amazon.com
>
> --
> Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
> "El número de instalaciones de UNIX se ha elevado a 10,
> y se espera que este número aumente" (UPM, 1972)
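As an aside on spotting this kind of situation early: a generic catalog query (not specific to this thread's corruption) shows which tables are closest to the freeze horizon, so a stuck relfrozenxid stands out before autovacuum grinds to a halt:

  SELECT relname, age(relfrozenxid) AS xid_age
  FROM pg_class
  WHERE relkind = 'r'
  ORDER BY age(relfrozenxid) DESC
  LIMIT 10;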
Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11
Tom-Tom:

On Fri, Jul 16, 2021 at 5:43 PM Tom Lane wrote:
> Tom Dearman writes:
> > We have changed autovacuum so that it runs more frequently
> > (autovacuum_vacuum_scale_factor=0.035). The reason we have a partial index on
> > the status is that in a table of 300 million entries, only about 100 or so
> > would have status=‘IN_PROGRESS’, so we think this should be a nice small
> > index, and many of our queries want to look up with a where clause
> > status=‘IN_PROGRESS’. In theory it works well, but we get a lot of index
> > bloat as there is a lot of churn on the status value, i.e. each row starts as
> > IN_PROGRESS and then goes to one of 4 possible completed statuses.
>
> Is it really the case that only this index is bloating? In principle, an
> update on a row of the table should result in new entries in every index
> of the table. A partial index, whose filter may exclude entries entirely,
> should in theory have less bloat than other indexes.

Maybe not in relative terms. If I understand correctly, let's say you start unbloated with 300M entries, 100 of them in progress. You insert, in small batches, 10k rows "in_progress" and update 10k "in_progress" rows to, let's say, "done" (maybe including the 100 original ones); the description seems to fit this kind of flow, something like a task queue. Then you will have 10k dead tuples bloating an index with 100 live entries - an absolutely small but relatively large bloat - while the full pk index will have 10k dead out of 300M.

I may be misunderstanding something, and HOT and similar things may help here, but it seems like a plausible explanation for apparent bloat on a small index. I had similar things; for task-like workloads I use a pending and a done table, and the pending table bloated a lot in the before-autovacuum times: a small table that was nearly all dead tuples. It was not a problem to vacuum it via cron every few minutes. In this case, where only the index is bloated, autovacuum may not fire frequently enough for it, as the table would have only 10k/300M ≈ 33 ppm bloat.

Francisco Olarte
Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11
Other indexes do bloat, but the percentage bloat is a lot less, presumably because this is a partial index where the partial column has a high degree of change, i.e. maybe 100 genuinely ‘live’ rows in a table of 300 million where every row has gone through a state in which it would have been in the index. In some of our partitions we might have 2000 old rows that do hang around for a long time and another 100 or so ‘real’ partial index entries, so 2200 in total, but the number of rows would be 300 million, so it is a lot less than 1%.

> On 16 Jul 2021, at 16:43, Tom Lane wrote:
>
> Tom Dearman writes:
>> We have changed autovacuum so that it runs more frequently
>> (autovacuum_vacuum_scale_factor=0.035). The reason we have a partial index on
>> the status is that in a table of 300 million entries, only about 100 or so
>> would have status=‘IN_PROGRESS’, so we think this should be a nice small
>> index, and many of our queries want to look up with a where clause
>> status=‘IN_PROGRESS’. In theory it works well, but we get a lot of index
>> bloat as there is a lot of churn on the status value, i.e. each row starts as
>> IN_PROGRESS and then goes to one of 4 possible completed statuses.
>
> Is it really the case that only this index is bloating? In principle, an
> update on a row of the table should result in new entries in every index
> of the table. A partial index, whose filter may exclude entries entirely,
> should in theory have less bloat than other indexes.
>
> If that's not what you're seeing, there must be something about the data
> being stored in that index (not the partial-index filter condition) that
> results in a lot of low-occupancy index pages over time. You didn't say
> anything about what the data payload is. But we've seen bloat problems in
> indexes where, say, every tenth or hundredth value in the index ordering
> would persist for a long time while the ones in between get deleted
> quickly. That leads to low-density indexes that VACUUM can't do anything
> about.
>
> regards, tom lane
Re: dealing with dependencies
On Fri, 2021-07-16 at 14:42 +0200, Markhof, Ingolf wrote:
> In Oracle, I can easily delete a table or view that is used by existing views.
> The system marks the affected views. I can then re-define the deleted table or
> view and have all dependent views easily re-compiled. Done.

... or you end up with "invalid views" which give you an error when used. Sorry, but we don't like inconsistent data, however convenient they might be.

> PostgreSQL instead is preventing inconsistency. It simply refuses to delete a
> view or table that is referenced by other views. Consequently, I need to
> delete all dependent views first, re-define the one I want to change and
> then create all dependent views deleted before... - Which is much more difficult
> to handle.
>
> I wonder how you deal with it in a professional way. Sounds like some type of
> "make" (that UNIX tool dealing with dependencies in the context of e.g.
> programming in C) would be helpful...

You have your view definitions stored in a source control system, and/or you employ a version management tool like Liquibase.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
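To sketch what the source-control approach can look like in practice (file names and layout here are hypothetical): each view definition lives in its own file, and one psql script rebuilds the stack in dependency order, which gives a crude but effective "make" for views:

  -- rebuild_views.sql  (run with: psql -f rebuild_views.sql)
  BEGIN;
  DROP VIEW IF EXISTS my_view2;    -- dependents first
  DROP VIEW IF EXISTS my_view1;
  \i views/my_view1.sql            -- then re-create in dependency order
  \i views/my_view2.sql
  COMMIT;

Because the whole script runs in one transaction, a typo in any view definition rolls back the entire rebuild rather than leaving the schema half-changed.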
Getting "ERROR: upper bound of FOR loop cannot be null" when trying to use a stored proc in a check constraint def.
This stored procedure ...

create or replace function validate_proj_csv (proj_csv varchar)
returns int
language plpgsql
as $$
-- This function is used in a check constraint in the public.projects table to ensure that
-- all projects in column sibling_project_csv are valid projects.
DECLARE
  proj_arr varchar[];
  see_prj int;
BEGIN
  proj_arr := regexp_split_to_array(proj_csv,',');
  for x in 1 .. array_upper(proj_arr,1) loop
    select 1 into see_prj from public.projects where project = proj_arr[x];
    if (see_prj is null) then
      raise notice 'Project "%" in project csv "%" is not a valid project.', proj_arr[x], proj_csv;
      return 0;
    end if;
  end loop;
  return 1;
END;
$$;

... works fine ...

dvdb=# select validate_proj_csv('sabin,strix2,ipu1.0');
 validate_proj_csv
-------------------
                 1
(1 row)

dvdb=# select validate_proj_csv('sabin,strix2,ipu1.00');
NOTICE:  Project "ipu1.00" in project csv "sabin,strix2,ipu1.00" is not a valid project.
 validate_proj_csv
-------------------
                 0
(1 row)

But when I try to use it in a check constraint:

dvdb=# alter table projects add constraint validate_sibling_project_csv check (validate_proj_csv(sibling_project_csv) = 0);
ERROR:  upper bound of FOR loop cannot be null
CONTEXT:  PL/pgSQL function validate_proj_csv(character varying) line 14 at FOR with integer loop variable

What's going on? How do I get this to work?
Re: Getting "ERROR: upper bound of FOR loop cannot be null" when trying to use a stored proc in a check constraint def.
On 7/16/21 3:26 PM, David Gauthier wrote:
> This stored procedure ...
>
> dvdb=# alter table projects add constraint validate_sibling_project_csv
> check (validate_proj_csv(sibling_project_csv) = 0);
> ERROR:  upper bound of FOR loop cannot be null
> CONTEXT:  PL/pgSQL function validate_proj_csv(character varying) line 14
> at FOR with integer loop variable
>
> What's going on ?

I'm going to say you have a NULL value in sibling_project_csv in the table.

> How to get this to work ?

--
Adrian Klaver
adrian.kla...@aklaver.com
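Following that diagnosis, a NULL-safe variant of the function might look like this - a sketch that assumes a NULL csv should count as valid, which is a policy choice the original author would have to make:

  create or replace function validate_proj_csv (proj_csv varchar)
  returns int language plpgsql as $$
  DECLARE
    proj_arr varchar[];
    see_prj int;
  BEGIN
    if proj_csv is null then
      return 1;   -- assumption: NULL means "no siblings" and is treated as valid
    end if;
    proj_arr := regexp_split_to_array(proj_csv, ',');
    -- coalesce guards the loop bound against a NULL array upper bound
    for x in 1 .. coalesce(array_upper(proj_arr, 1), 0) loop
      select 1 into see_prj from public.projects where project = proj_arr[x];
      if see_prj is null then
        raise notice 'Project "%" in project csv "%" is not a valid project.', proj_arr[x], proj_csv;
        return 0;
      end if;
    end loop;
    return 1;
  END;
  $$;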
Re: Frequently updated partial index leads to bloat on index for PostgreSQL 11
On Fri, Jul 16, 2021 at 9:19 AM Tom Dearman wrote:
> Other indexes do bloat, but the percentage bloat is a lot less

I have to imagine that the remaining problems have a lot to do with the fact that this is a partial index -- the partial index naturally gets vacuumed much less frequently than what would generally be considered ideal for the index itself. In general VACUUM scheduling makes the naive assumption that the indexes have the same needs as the table, which is far from the case with this partial index, for your workload. It's all of the specifics, taken together.

It sounds like this is a case where bottom-up index deletion won't help -- it will only trigger in those indexes that are not "logically modified" by updates. But you're logically modifying these values. Or you're causing them to not need to be in the index anymore, by modifying the predicate. But that won't trigger bottom-up deletion. It's a bit like a delete, as far as the physical index structure is concerned -- the index won't be eagerly modified by the executor.

The overall picture is that you cycle through all of the values in the table, and no cleanup can take place other than plain VACUUM (at least not to any significant extent). Although only a few hundred values are logically required to be indexed by the partial index at any one time, in practice no cleanup can run for long stretches of time (autovacuum just doesn't know about cases like this). This is why the partial index inevitably exceeds its theoretical pristine/high watermark size, which is actually more than 1 page/8KB, but still probably a lot less than what you actually see -- the partial index "falls through the cracks", even with recent enhancements that made cleanup more eager and more dynamic in certain other cases.

I am afraid that I don't have a good suggestion right now. I can think of incremental improvements that would address this case, but for now they're just ideas. Fundamentally, we need to get to the partial index much more frequently than the other indexes, either within VACUUM or within some other mechanism. For example a conservative implementation of retail index tuple deletion might totally fix the issue. It does very much look like a problem in these partial indexes in particular -- it's quite possible that the other indexes won't grow at all due to garbage index tuples, especially on Postgres 14.

--
Peter Geoghegan
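One pragmatic stopgap, offered here as a workaround rather than the fix Peter is describing: since the pristine index is tiny, it can be rebuilt periodically (e.g. from cron) without blocking writes on PostgreSQL 12 and later, which keeps the bloat bounded between rebuilds:

  REINDEX INDEX CONCURRENTLY idex_buyer_inactive;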
Re: Getting "ERROR: upper bound of FOR loop cannot be null" when trying to use a stored proc in a check constraint def.
On Fri, Jul 16, 2021 at 3:26 PM David Gauthier wrote:
> This stored procedure ...
>
> create or replace function validate_proj_csv (proj_csv varchar)

It is a function - I don't think you can use stored procedures in check constraints...

> dvdb=# alter table projects add constraint validate_sibling_project_csv
> check (validate_proj_csv(sibling_project_csv) = 0);
> ERROR:  upper bound of FOR loop cannot be null
> CONTEXT:  PL/pgSQL function validate_proj_csv(character varying) line 14
> at FOR with integer loop variable

I'm actually surprised this alter command worked at all, since you are violating a requirement for check constraints - namely that the expression be immutable. Your function, regardless of its declaration (which is default volatile), is not immutable.

> How to get this to work ?

You really need to re-write this as a trigger function.

David J.
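A sketch of the trigger rewrite being suggested (function and trigger names are illustrative; note also that the original constraint's "= 0" test would have accepted only *invalid* csv values, so the trigger below raises an error when validation returns 0):

  create or replace function projects_validate_siblings() returns trigger
  language plpgsql as $$
  BEGIN
    if NEW.sibling_project_csv is not null
       and validate_proj_csv(NEW.sibling_project_csv) = 0 then
      raise exception 'invalid sibling_project_csv: %', NEW.sibling_project_csv;
    end if;
    return NEW;
  END;
  $$;

  create trigger validate_sibling_project_csv
    before insert or update on projects
    for each row execute function projects_validate_siblings();

Unlike the check constraint, a trigger is allowed to run a volatile function that queries other rows, which is exactly what this validation needs.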