Re: Enforcing uniqueness on [real estate/postal] addresses
> On May 11, 2020, at 12:55 PM, Peter Devoy wrote: > > Hi list > > I need to store addresses for properties (as in real estate) so in my > naivety I created a unique constraint like this: > > ALTER TABLE properties >ADD CONSTRAINT is_unique_address >UNIQUE ( >description, --e.g. Land north of Foo Cottage >address_identifier_general, >street, >postcode >); > > Of course, if any of the fields are NULL (which they often are) I end > up with duplicates. > > One solution may be to add NOT NULL constraints and use empty strings > instead of NULL values but, until asking around today, I thought this was > generally considered bad practice. > > Please can anyone recommend a way of approaching this? Perhaps empty strings > are pragmatic in this situation? Hi Peter, I wouldn’t use empty strings in place of NULL. It’s possible to define a partial unique index that has more or less the same effect as a constraint. Have you looked into them? Cheers Philip
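For anyone finding this in the archives, here is a sketch of one index-based approach. It uses a unique expression index over COALESCE'd columns rather than a partial index, so two rows that are blank in the same fields collide; the column names come from Peter's constraint, and the empty-string sentinels are an assumption (pick sentinels that can't occur in real data).

-- After dropping the original constraint:
CREATE UNIQUE INDEX ix_unique_address ON properties (
    (COALESCE(description, '')),
    (COALESCE(address_identifier_general, '')),
    (COALESCE(street, '')),
    (COALESCE(postcode, ''))
);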
Re: Same query taking less time in low configuration machine
> On Jul 14, 2020, at 5:27 AM, Vishwa Kalyankar > wrote: > > Hi, > > I have two machines - one with 8GB RAM & 4core CPU and the other with 64GB > Ram & 24 core CPU. Both machines have the same DB (Postgres 12 + Postgis > 2.5.3). Same query is taking less time in low end machine whereas more time > in high end machine. Any thoughts on where to look? I have tuned the db in > both machines according to https://pgtune.leopard.in.ua/#/ > > > Below I am pasting the output of query explain in both the machines. > > -bash-4.2$ psql -p 5434 > psql (12.3) > Type "help" for help. > > postgres=# \c IPDS_KSEB; > You are now connected to database "IPDS_KSEB" as user "postgres". > IPDS_KSEB=# explain analyze select * from > kseb_geometry_trace_with_barrier_partition(5,'kottarakara_version',437,'htline',2) > ; >QUERY > PLAN > - > Function Scan on kseb_geometry_trace_with_barrier_partition > (cost=0.25..10.25 rows=1000 width=169) (actual time=11626.548..11626.568 > rows=254 loops=1) > Planning Time: 0.212 ms > Execution Time: 11628.590 ms > > > -bash-4.2$ psql -p 5422 > psql (12.3) > Type "help" for help. > > postgres=# \c IPDS_KSEB; > You are now connected to database "IPDS_KSEB" as user "postgres". > IPDS_KSEB=# explain analyze select * from > kseb_geometry_trace_with_barrier_partition(5,'kottarakara_version',437,'htline',2) > ; >QUERY > PLAN > - > Function Scan on kseb_geometry_trace_with_barrier_partition > (cost=0.25..10.25 rows=1000 width=169) (actual time=22304.425..22304.448 > rows=254 loops=1) > Planning Time: 0.219 ms > Execution Time: 22352.219 ms > (3 rows) > Hi Vishwa, Is it possible that your data is in the cache on the low end machine but not on the high end machine? There’s both the Postgres cache and the OS disk cache to consider. You can see what’s in the Postgres cache with an extension like pg_buffercache. I don’t know of a way to see what’s in the OS cache; maybe others do. Cheers Philip
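A minimal sketch of the pg_buffercache check mentioned above, run on each machine (nothing here is specific to your schema):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Relations with the most 8kB buffers currently resident in shared_buffers
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 20;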
Re: is JIT available
> On Jul 25, 2020, at 8:21 AM, Pavel Stehule wrote: > > > > so 25. 7. 2020 v 14:04 odesílatel Scott Ribe > napsal: > > On Jul 24, 2020, at 9:55 PM, Pavel Stehule wrote: > > > > SELECT * FROM pg_config; > > That doesn't tell me whether or not it can actually be used. > > It shows if Postgres was compiled with JIT support. > > When you run EXPLAIN ANALYZE SELECT ... then you can see info about JIT > overhead. If you don't see notices about JIT in EXPLAIN, then JIT was not > used. I like Pavel’s 'EXPLAIN ANALYZE SELECT’ suggestion a lot. I think setting jit=on and jit_above_cost=1 and then running 'EXPLAIN ANALYZE SELECT’ is a very effective way to see whether jit is available in practice. On installations where jit isn’t available (like on my Mac or on AWS Aurora), you can still set jit=on in a session and Postgres doesn’t complain, but that doesn’t mean it’s actually enabled. Cheers Philip
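A sketch of that check, assuming a scratch session where it's safe to change settings:

SET jit = on;
SET jit_above_cost = 1;   -- low enough that almost any plan qualifies
EXPLAIN (ANALYZE) SELECT sum(i) FROM generate_series(1, 100000) AS t(i);
-- If JIT is genuinely available, the plan output ends with a "JIT:" section
-- showing functions, inlining, optimization, and emission timings. If that
-- section never appears, the SET succeeded but JIT isn't really in play.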
Re: Implement a new data type
> On Aug 11, 2020, at 8:01 PM, raf wrote: > > On Tue, Aug 11, 2020 at 06:38:39AM -0700, Miles Elam > wrote: > >> Also of note: PostgreSQL already has a money type ( >> https://www.postgresql.org/docs/current/datatype-money.html) >> But you shouldn't use it ( >> https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_money). >> >> I only bring it up so that you can know to make your money type a slightly >> different name to avoid a conflict. Money is deceptively hard to implement >> correctly. I'd recommend reading the second link if you have not already to >> avoid previously known issues. > > I use decimal(10,2) for whole cents, and decimal(12,6) > for sub-cents. Single currency only. I didn't know > there was a money type originally, but it wouldn't be > usable for me anyway without the ability to specify the > scale and precision. > > I recommend considering passing values to the database > as "decimal '1.23'" rather than bare numeric literals, > just so there's no chance of the value being > interpreted as a float at any stage by postgres. Maybe > that's being too paranoid but that's a good idea when > it comes to money. :-) Yes, I agree, this is also important (and easy to overlook) if you’re accessing the database via a non-SQL language. We use Python which, like most (all?) languages that rely on the underlying C library for floating point support, is vulnerable to floating point noise. Python has a fixed precision type, and like Postgres it also accepts character and float input. The float input can give surprising results. >>> decimal.Decimal('1.79') # This is OK Decimal('1.79') >>> decimal.Decimal(1.79) # This will not end well! Decimal('1.79003552713678800500929355621337890625') >>> In the case of a Postgres column like numeric(10,2), input like 1.79003552713678800500929355621337890625 will get rounded to 1.79 anyway and no harm will be done. But like you said, raf, it’s a good idea to be too paranoid. :-) Cheers Philip
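For the archives, the SQL-side version of raf's suggestion looks like this (the table and column are hypothetical; DECIMAL and NUMERIC are the same type in Postgres):

-- Writing the literal as a quoted numeric keeps it out of float territory entirely
INSERT INTO payments (amount) VALUES (numeric '1.79');
-- versus a bare 1.79, whose type is left to be inferred from context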
Understanding EXPLAIN ANALYZE estimates when loops != 1
Hi all, I could use some help interpreting EXPLAIN ANALYZE output.

-> Index Scan using ix_foo on t1 (cost=0.69..68.57 rows=3283 width=105) (actual time=0.006..0.918 rows=3760 loops=94)

The actual rows returned by this plan node ~= 3760 * 94 = 353,440. Did postgres expect (estimate) 3283 rows from this join, or 3283 * 94 = 308,602? Same question for this node.

-> Parallel Index Scan using pk_xyz on xyz (cost=0.29..2354.67 rows=54285 width=25) (actual time=0.049..6.326 rows=14864 loops=5)

Actual rows ~= 14864 * 5 = 74,320, estimate = 54285 or 54285 * 5?

Thanks
Philip
Re: Understanding EXPLAIN ANALYZE estimates when loops != 1
> On Aug 19, 2020, at 6:24 PM, David Rowley wrote: > > On Thu, 20 Aug 2020 at 09:55, Philip Semanchuk > wrote: >> I could use some help interpreting EXPLAIN ANALYZE output. >> >> -> Index Scan using ix_foo on t1 (cost=0.69..68.57 rows=3283 width=105) >> (actual time=0.006..0.918 rows=3760 loops=94) >> >> The actual rows returned by this plan node ~= 3760 * 94 = 353,440. > > Yes. It's total rows / loops rounded to the nearest integer number. > >> Did postgres expect (estimate) 3283 rows from this join, or 3283 * 94 = >> 308,602? > > Yes, that's the case at least when the node is not a Parallel node. > If this index scan was part of a parameterized nested loop, then > you'll see the estimate of the number of expected loops from the outer > side of the join. Thanks, I was wondering where the 94 came from. > Same question for this node. >> >> -> Parallel Index Scan using pk_xyz on xyz (cost=0.29..2354.67 rows=54285 >> width=25) (actual time=0.049..6.326 rows=14864 loops=5) >> >> Actual rows ~= 14864 * 5 = 74,320, estimate = 54285 or 54285 * 5? > > So parallel plans are a bit more complex. The row estimates are the > total estimated rows / the amount of workers we expect to do useful > work. You might expect the divisor there to be an integer number > since you can't really have 0.5 workers. However, it's more complex > than that since the leader has other tasks to take care of such as > pulling tuples from workers, it's not dedicated to helping out. Sounds like it help to set max_parallel_workers = 1 before running EXPLAIN ANALYZE in order to simplify the numbers, yes? Or is there a possibility that doing so would send the planner down an entirely different path? > > If you're into reading C code, then there's more information in > https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/costsize.c#L5699 > , if you hunt around for usages of that function then you'll see the > estimated row counts are divided by the return value of that function. Yes, I’ve spent some time reading that file and its relatives. It’s been helpful. Much appreciated Philip
Re: PKEY getting corrupted
> On Sep 10, 2020, at 6:43 AM, Abraham, Danny wrote: > > Hi, > > We have seen several times a situation where a PKEY is compromised and > duplicate values are created within a table. > > This has happened so far on PG 928 on Linux and Windows, and also on PG955 on > AIX. > > We ran massive test on PG10.4 but it has not recreated the problem. > > Has anyone faced the same issue? Any known bug? In my experience, this is usually due to an application writing PK values rather than allowing a Postgres sequence to generate them, or due to the application misusing sequences. Postgres sequences are monotonically increasing, so even if you insert sequence values “out of order”, they won’t overlap. That’s not to say a bug in Postgres is impossible, but every time I’ve seen this problem, it’s been my application, not Postgres. Postgres 10 added new syntax for PK columns "GENERATED ALWAYS AS IDENTITY” (https://www.postgresql.org/docs/10/sql-createtable.html). Postgres will raise an exception if an application tries to insert a specific value into a column defined this way. When we upgraded from 9.6 to 11 we converted all of our PK columns to use this syntax which added a nice safety net for us. You’re probably already aware that 9.5.23 is the oldest supported version in the 9.x series, so the sooner you can upgrade those unsupported 9.x versions the better. Maybe you’re in the middle of an upgrade already which is why this came up. :-) Cheers Philip
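A small sketch of that safety net (the table here is made up):

CREATE TABLE widgets (
    id   int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

INSERT INTO widgets (name) VALUES ('ok');           -- id comes from the identity sequence
INSERT INTO widgets (id, name) VALUES (42, 'bad');  -- raises an error: id is GENERATED ALWAYS
-- A deliberate override is still possible with INSERT ... OVERRIDING SYSTEM VALUE.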
Re: postgres materialized view refresh performance
> On Oct 22, 2020, at 3:53 PM, Ayub M wrote: > > There is a table t which is used in a mview mv, this is the only table in the > mview definition. > > create table t (c1 int, ..., c10 int > ); > > -- there is a pk on say c1 column > create materialized view mv as select c1, c2...c10 from > t; > > ---there is a unique index on say c5 and bunch of other indexes on the mview. > The reason there is a mview created instead of using table t, is that that > the table gets truncated and reloaded every couple of hours and we don't want > users to see an empty table at any point of time that's why mview is being > used. > > Using "refresh materialized view concurrently", this mview is being used by > APIs and end users. > > Couple of questions I have - Hi Ayub, I’m not an expert on the subject; I hope you’ll get an answer from someone who is. :-) Until then, my answers might help. > • Whenever mview refresh concurrently happens, does pg create another > set of table and indexes and switch it with the orig? If no, then does it > update the existing data? My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE, and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another table and then swap it. > • The mview gets refreshed in a couple of mins sometimes and sometimes > it takes hours. When it runs for longer, there are no locks and no resource > shortage, the number of recs in the base table is 6m (7.5gb) which is not > huge so why does it take so long to refresh the mview? Does the run time correlate with the number of changes being made? > • Does mview need vacuum/analyze/reindex? My understanding is that when CONCURRENTLY is specified, yes it does need vacuuming, because of the aforementioned implementation of REFRESH as a series of INSERT, UPDATE, and DELETE statements. In our situation, we have large views that are refreshed once per week. We want to ensure that the view is in the best possible shape for users, so we create the view with autovacuum_enabled = false and then run an explicit vacuum/analyze step immediately after the refresh rather than leaving it to chance. Cheers Philip
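For concreteness, a sketch of the routine described in that last paragraph, reusing the names from Ayub's example:

CREATE MATERIALIZED VIEW mv
    WITH (autovacuum_enabled = false)
    AS SELECT c1, c2, c5 FROM t;

CREATE UNIQUE INDEX ON mv (c5);   -- REFRESH ... CONCURRENTLY requires a unique index

REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
VACUUM (ANALYZE) mv;              -- explicit, rather than leaving it to autovacuum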
Re: postgres materialized view refresh performance
> On Oct 23, 2020, at 9:52 AM, Ravi Krishna wrote: > >> My understanding is that when CONCURRENTLY is specified, Postgres implements >> the refresh as a series of INSERT, UPDATE, >> and DELETE statements on the existing view. So the answer to your question >> is no, Postgres doesn’t create another table and >> then swap it. > > The INSERTS/UPDATE/DELETE happens only for the difference. PG first creates > a new temp table and then compares it with > the MV and detects the difference. That is why for CONCURRENTLY, a unique > index is required on the MV. Yes, thank you, that’s what I understand too but I expressed it very poorly.
Re: postgres materialized view refresh performance
> On Oct 25, 2020, at 10:52 PM, Ayub M wrote: > > Thank you both. > > As for the mview refresh taking long -- > • The mview gets refreshed in a couple of mins sometimes and sometimes it > takes hours. When it runs for longer, there are no locks and no resource > shortage, the number of recs in the base table is 6m (7.5gb) which is not > huge so why does it take so long to refresh the mview? > > Does the run time correlate with the number of changes being made? > > -- Almost the same number of records are present in the base table (6 million > records). The base table gets truncated and reloaded everytime with almost > the same number of records. > > And the mview is a simple select from this one base table. > > The mview has around 10 indexes, 1 unique and 9 non-unique indexes. > > Population of the base tables takes about 2 mins, using "insert into select > from table", but when the mview is created for the first time it takes 16 > minutes. Even when I remove all but one unique index it takes about 7 > minutes. Any clue as to why it is taking longer than the create of the base > table (which is 2 mins). Do you know if it’s executing a different plan when it takes a long time? auto_explain can help with that. > > On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk > wrote: > > > > On Oct 23, 2020, at 9:52 AM, Ravi Krishna wrote: > > > >> My understanding is that when CONCURRENTLY is specified, Postgres > >> implements the refresh as a series of INSERT, UPDATE, > >> and DELETE statements on the existing view. So the answer to your question > >> is no, Postgres doesn’t create another table and > >> then swap it. > > > > The INSERTS/UPDATE/DELETE happens only for the difference. PG first > > creates a new temp table and then compares it with > > the MV and detects the difference. That is why for CONCURRENTLY, a unique > > index is required on the MV. > > Yes, thank you, that’s what I understand too but I expressed it very poorly. > > > > -- > Regards, > Ayub
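A sketch of an auto_explain setup for catching that, in case it's useful (the threshold is just an example):

-- auto_explain can also be preloaded via shared_preload_libraries or
-- session_preload_libraries in postgresql.conf; LOAD works for a single session.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5min';     -- log the plan of anything slower than this
SET auto_explain.log_analyze = on;              -- include actual rows/timings, not just estimates
SET auto_explain.log_nested_statements = on;    -- catch statements run inside other commands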
Re: postgres materialized view refresh performance
> On Oct 26, 2020, at 10:45 AM, Ayub M wrote: > > It's a simple sequential scan plan of one line, just reading the base table > sequentially. Well, unless I have misunderstood you, the materialized view is basically just "select * from some_other_table”, the number of records in the source table is ~6m and doesn’t change much, there are no locking delays and no resource shortages, but sometimes the refresh takes minutes, and sometimes hours. There’s something missing from the story here. Some things to try or check on — - activity (CPU, disk, memory) during the period when the mat view is refreshing - each time after you refresh the mat view, vacuum it - even better, if you can afford a brief lock on reads, run a vacuum full instead of just regular vacuum - if possible, at the same time as you create the problematic mat view, run a similar process that writes to a different mat view (tmp_throwaway_mat_view) without the CONCURRENTLY keyword and see if it behaves similarly. > > On Mon, Oct 26, 2020, 9:21 AM Philip Semanchuk > wrote: > > > > On Oct 25, 2020, at 10:52 PM, Ayub M wrote: > > > > Thank you both. > > > > As for the mview refresh taking long -- > > • The mview gets refreshed in a couple of mins sometimes and sometimes it > > takes hours. When it runs for longer, there are no locks and no resource > > shortage, the number of recs in the base table is 6m (7.5gb) which is not > > huge so why does it take so long to refresh the mview? > > > > Does the run time correlate with the number of changes being made? > > > > -- Almost the same number of records are present in the base table (6 > > million records). The base table gets truncated and reloaded everytime with > > almost the same number of records. > > > > And the mview is a simple select from this one base table. > > > > The mview has around 10 indexes, 1 unique and 9 non-unique indexes. > > > > Population of the base tables takes about 2 mins, using "insert into select > > from table", but when the mview is created for the first time it takes 16 > > minutes. Even when I remove all but one unique index it takes about 7 > > minutes. Any clue as to why it is taking longer than the create of the base > > table (which is 2 mins). > > Do you know if it’s executing a different plan when it takes a long time? > auto_explain can help with that. > > > > > > > On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk > > wrote: > > > > > > > On Oct 23, 2020, at 9:52 AM, Ravi Krishna wrote: > > > > > >> My understanding is that when CONCURRENTLY is specified, Postgres > > >> implements the refresh as a series of INSERT, UPDATE, > > >> and DELETE statements on the existing view. So the answer to your > > >> question is no, Postgres doesn’t create another table and > > >> then swap it. > > > > > > The INSERTS/UPDATE/DELETE happens only for the difference. PG first > > > creates a new temp table and then compares it with > > > the MV and detects the difference. That is why for CONCURRENTLY, a > > > unique index is required on the MV. > > > > Yes, thank you, that’s what I understand too but I expressed it very > > poorly. > > > > > > > > -- > > Regards, > > Ayub >
Re: precautions/prerequisites to take for specific table
> On Nov 5, 2020, at 8:49 AM, Vasu Madhineni wrote: > > Hi All, > > In my organisation a newly built project application team requirement on > tables like have a column (text type), with size can reach around 3 MB, and > 45 million records annually. > > Are there any specific precautions/prerequisites we have to take from DBA end > to handle this type of table. Hi Vasu, Postgres can handle that just fine. We have a table with two text columns that’s 18Gb and almost 400 million rows, and that’s not a big table by some people’s standards. If you have specific concerns, you’ll need to tell us more about your situation and why you think you won’t be satisfied. Cheers Philip
Re: maintenance_work_mem
> On Nov 19, 2020, at 3:05 AM, Atul Kumar wrote: > > Hi, > > I have below queries: > > 1. How do i check the maintenance_work_mem for current session, before > setting some other value for this parameter for the same session. > > 2. and How do I set maintenance_work_mem for a session only, and how > will it be "rollback" once my maintainance work is done, Do I need to > execute any command for that or just closing the session will rollback > what I set for the session. In addition to Andreas’ helpful references to SHOW and SET, there’s also pg_settings -- https://www.postgresql.org/docs/13/view-pg-settings.html Cheers Philip
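Putting those together, a short sketch for a single session:

SHOW maintenance_work_mem;            -- value currently in effect for this session

SELECT name, setting, unit, source
FROM pg_settings
WHERE name = 'maintenance_work_mem';  -- same information, plus where the value came from

SET maintenance_work_mem = '2GB';     -- session-only; it disappears when the session ends
-- or SET LOCAL maintenance_work_mem = '2GB';  (reverts at the end of the transaction)
RESET maintenance_work_mem;           -- back to the default without reconnecting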
Re: Avoid excessive inlining?
> On Dec 19, 2020, at 12:59 AM, Joel Jacobson wrote: > > Is there a way to avoid excessive inlining when writing pure SQL functions, > without having to use PL/pgSQL? Hi Joel, The rules for inlining are here: https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions According to those rules, if you declared your SQL function as VOLATILE, then Postgres wouldn’t inline it. From your question, I’m not sure if you want to have the same function inlined sometimes and not others. I can’t think of a way to do that offhand. Hope this helps, Philip
Re: Avoid excessive inlining?
> On Dec 22, 2020, at 8:40 AM, Laurenz Albe wrote: > > On Mon, 2020-12-21 at 11:45 -0500, Philip Semanchuk wrote: >>> On Dec 19, 2020, at 12:59 AM, Joel Jacobson wrote: >>> Is there a way to avoid excessive inlining when writing pure SQL functions, >>> without having to use PL/pgSQL? >> >> The rules for inlining are here: >> https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions >> >> According to those rules, if you declared your SQL function as VOLATILE, >> then Postgres wouldn’t >> inline it. From your question, I’m not sure if you want to have the same >> function inlined >> sometimes and not others. I can’t think of a way to do that offhand. > > Where do you see that? As far as I know, VOLATILE is the best choice if you > want the function to be inlined. Ugh, you’re absolutely right, and I’m sorry for spreading misinformation. That’s what I get from quoting from memory rather than reading the link that I posted. > > I would say that the simplest way to prevent a function from being inlined > is to set a parameter on it: > > ALTER FUNCTION f() SET enable_seqscan = on; I appreciate the correction and education. Cheers Philip
Re: plpgsql unit testing and code coverage
> On Dec 31, 2020, at 7:20 AM, Joao Miguel Ferreira > wrote: > > Hello, > > I'm a fan of unit testing and related matters but have used it only on client > applications, not on database implemented logic. I recently joined a project > whit dozens of PL functions and procedures. > > So, it would be great for me to find a way to execute unit tests on those > functions and procedures and, even better, if I can get code coverage results > from it. I really enjoy approaching sw development with these tools. > > I'dd appreciate your feedback Hi Joao, Are you familiar with pgTAP? (https://pgtap.org) It gives you some of what you want, but I don’t think it handles coverage. Cheers Philip
Re: How to check if a materialised view is being updated?
> On Jan 19, 2021, at 6:33 AM, Jayadevan M wrote: > > > > So I’m looking for a way to identify if the refresh process is finished or if > it’s still running - preferably without having to wait for timeout by > querying a locked materialized view. But by e.g. using the system tables or > otherwise. > > > > Can anybody suggest some pointers on how to do this? > > > Maybe pg_stat_activity will have the refresh query? Yes, pg_stat_activity has a query column that could be searched with the regular string matching tools, including regex if necessary. pg_stat_activity also has some other useful columns, like query_start, which can tell you how long the query has been running. Cheers Philip
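Something along these lines, for example:

SELECT pid, state, query_start, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE query ILIKE '%refresh materialized view%'
  AND pid <> pg_backend_pid();   -- don't match this query itself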
Re: How does Postgres decide if to use additional workers?
> On Feb 9, 2021, at 10:52 AM, Thorsten Schöning wrote: > > So, based on which facts does Postgres decide if to use additional > workers or not? Can I see those decisions explained somewhere? I don't > see anything in the query plan. Thanks! Hi Thorsten, This is an interesting topic for me too. There’s a formula for the max number of workers that Postgres will consider for a table. Of course, most queries use more than just one table, and I don’t yet understand how Postgres handles the situation where the formula suggests multiple workers for some tables and a single worker for others. There was some conversation about this on the performance mailing list in June. Here’s a link to the message that contains the formula; there are more items of interest in the whole thread: https://www.postgresql.org/message-id/89423FD3-0F13-447D-8D9E-EB1722150F94%40americanefficient.com Hope this helps Philip
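Paraphrasing the linked message (this summary is mine, so please check it against the thread and the source): a table is considered for one worker once it exceeds min_parallel_table_scan_size, roughly one more worker each time its size triples beyond that, capped by max_parallel_workers_per_gather. A couple of related knobs:

SHOW min_parallel_table_scan_size;      -- 8MB by default
SHOW max_parallel_workers_per_gather;   -- hard cap on workers per Gather node
-- A per-table override that sidesteps the size-based formula (table name is hypothetical):
ALTER TABLE some_table SET (parallel_workers = 4);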
pg_stat_user_tables.n_mod_since_analyze persistence?
Hi all, I saw some unexpected behavior that I'm trying to understand. I suspect it might be a quirk specific to AWS Aurora and I'd like to confirm that. When I restart my local Postgres instance (on my Mac), the values in pg_stat_user_tables.n_mod_since_analyze are preserved. In other words, if table foo had n_mod_since_analyze=33 before the reboot, it still has n_mod_since_analyze=33 after the restart. When I restart an AWS Aurora instance, the values in pg_stat_user_tables.n_mod_since_analyze all seem to be reset to 0. Can anyone confirm (or refute) that the behavior I see on my Mac (preservation of these values through a restart) is common & expected behavior? Thanks Philip
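For reference, a query along these lines shows the values in question:

SELECT relname, n_mod_since_analyze, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC;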
Re: pg_stat_user_tables.n_mod_since_analyze persistence?
> On Feb 15, 2021, at 3:55 PM, Tom Lane wrote: > > Philip Semanchuk writes: >> I saw some unexpected behavior that I'm trying to understand. I suspect it >> might be a quirk specific to AWS Aurora and I'd like to confirm that. > >> When I restart my local Postgres instance (on my Mac), the values in >> pg_stat_user_tables.n_mod_since_analyze are preserved. In other words, if >> table foo had n_mod_since_analyze=33 before the reboot, it still has >> n_mod_since_analyze=33 after the restart. > >> When I restart an AWS Aurora instance, the values in >> pg_stat_user_tables.n_mod_since_analyze all seem to be reset to 0. > >> Can anyone confirm (or refute) that the behavior I see on my Mac >> (preservation of these values through a restart) is common & expected >> behavior? > > Yeah, in PG those stats would be preserved, at least as long as it's > a clean shutdown. Thanks, Tom. A colleague pointed me to a blog post by Michael Vitale that confirms this bug on AWS and contains more detail: https://elephas.io/685-2/ Hope this helps someone else Philip
Leading comments and client applications
Hi, I'm trying to understand a behavior where, with our Postgres client, a leading comment in a SQL script causes the CREATE FUNCTION statement following it to be not executed. I can't figure out if this is a bug somewhere or just a misunderstanding on my part. I would appreciate some help understanding. Here's the contents of foo.sql --

-- this is a comment
CREATE FUNCTION foo(bar text)
RETURNS text
AS $$
SELECT bar
$$
LANGUAGE sql IMMUTABLE PARALLEL SAFE
;

When I feed that to 'psql -f foo.sql', the function is created as I expect. In the Postgres log, the leading comment *doesn't* appear. I see the same behavior if I just copy/paste the function into psql. Our test system uses Python 3.8, SQLAlchemy 1.3.6, and psycopg 2.8.5, and when our test harness reads foo.sql and passes it to SQLAlchemy's execute(), I can see in the Postgres log that the leading comment is *not* stripped, and the function isn't created. The server is Postgres 11. My naive interpretation is that one of the client layers (SQLAlchemy or psycopg2) should be stripping the leading comment but isn't, but that seems like a lot of responsibility to push onto a client application. I figured that would be the responsibility of the Postgres parser. I'd be grateful for any insights about what I'm missing. Thanks Philip
Re: Leading comments and client applications
> On Mar 25, 2022, at 11:59 AM, Tom Lane wrote: > > Philip Semanchuk writes: >> I'm trying to understand a behavior where, with our Postgres client, a >> leading comment in a SQL script causes the CREATE FUNCTION statement >> following it to be not executed. I can't figure out if this is a bug >> somewhere or just a misunderstanding on my part. I would appreciate some >> help understanding. > > Are you certain there's actually a newline after the comment? > The easiest explanation for this would be if something in the > SQLAlchemy code path were munging the newline. I verified that there is a newline after the comment. But yes, thanks to your suggestion and others, I was able to narrow this down to something in SQLAlchemy behavior. In case anyone else comes across this and is wondering -- In addition to accepting a plain string, execute() accepts a number of different SQLAlchemy data types, including TextClause and DDLElement. We used to pass a DDLElement to execute(), but a few months ago we switched to passing a TextClause because DDLElement interprets % signs anywhere in SQL scripts as Python string interpolation markers and that was causing us headaches in some scripts. Something about the way TextClause changes the raw SQL string causes the behavior I’m seeing, although we didn’t notice it at the time of the changeover. I don’t know what exactly it’s doing yet, but when I switch back to passing a DDLElement to execute(), my SQL function is created as I expected. https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.Connection.execute As David J pointed out, execute() is deprecated as of version 1.4. We’re still on 1.3 but we’ll have to move away from this code eventually so maybe this is a good inspiration to move away from execute() now and reduce the number of deprecation warnings we have to deal with in the future. > As far as the comparison behavior goes, psql's parser strips > comments that start with double dashes, for $obscure_reasons. > The server is perfectly capable of ignoring those by itself, > though. (Awhile back I tried to remove that psql behavior, > but it caused too much churn in our regression tests.) Thanks, this is most helpful. I use psql to double check I think SQLAlchemy is doing something odd. It’s good to know that psql's behavior in this case is a choice and not required behavior for clients. Peter J. Holzer’s psycopg2 example could have showed me the same; I wish I had thought of that. I appreciate all the help! Cheers Philip
Re: Leading comments and client applications
> On Mar 28, 2022, at 5:42 AM, Philippe Doussot > wrote: > > >Something about the way TextClause changes the raw SQL string causes the > >behavior I’m seeing, although we didn’t notice it at the time of the > >changeover. > >I don’t know what exactly it’s doing yet, but when I switch back to passing > >a DDLElement to execute(), my SQL function is created as I expected. > > > Alternate option if you want continue to use TextClause: > > use /* comment */ for first prefix comment. > > Comment is logged and query executed (tested on Java ( not on SQLAlchemy )). > We use it to track back the request id executed like that > > query = em.createNativeQuery("/*requete_enregistree_num_" + requete.getId() + > "*/ " + requete.getReqRequete().trim()); Thanks for the suggestion! In my testing, both single line and multiline comment blocks cause the same problem for me. I *was* able to resolve this with a simple change. I was calling SQLAlchemy’s engine.execute(). When I call connection.execute() instead, the problem resolves. This also solves a future deprecation problem for us. engine.execute() is deprecated in SQLAlchemy 1.4, but connection.execute() is not. I didn’t expect this to fix the problem. There’s no difference in the Postgres log that I can see, so I think the SQL that SQLAlchemy sends to postgres is the same. If it’s a commit/transaction problem, it should affect all of our functions equally, not just the ones that start with comments. I clearly don’t understand this problem fully. Although I'm curious about it, I’m eager to move on to other things. I plan to proceed with this fix and not investigate any more. THanks everyone for all the help and suggestions Cheers Philip > > On 25/03/2022 19:05, Philip Semanchuk wrote: >> >>> On Mar 25, 2022, at 11:59 AM, Tom Lane >>> wrote: >>> >>> Philip Semanchuk >>> >>> writes: >>> >>>> I'm trying to understand a behavior where, with our Postgres client, a >>>> leading comment in a SQL script causes the CREATE FUNCTION statement >>>> following it to be not executed. I can't figure out if this is a bug >>>> somewhere or just a misunderstanding on my part. I would appreciate some >>>> help understanding. >>>> >>> Are you certain there's actually a newline after the comment? >>> The easiest explanation for this would be if something in the >>> SQLAlchemy code path were munging the newline. >>> >> I verified that there is a newline after the comment. But yes, thanks to >> your suggestion and others, I was able to narrow this down to something in >> SQLAlchemy behavior. In case anyone else comes across this and is wondering >> -- >> >> In addition to accepting a plain string, execute() accepts a number of >> different SQLAlchemy data types, including TextClause and DDLElement. We >> used to pass a DDLElement to execute(), but a few months ago we switched to >> passing a TextClause because DDLElement interprets % signs anywhere in SQL >> scripts as Python string interpolation markers and that was causing us >> headaches in some scripts. Something about the way TextClause changes the >> raw SQL string causes the behavior I’m seeing, although we didn’t notice it >> at the time of the changeover. I don’t know what exactly it’s doing yet, but >> when I switch back to passing a DDLElement to execute(), my SQL function is >> created as I expected. >> >> >> https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.Connection.execute >> >> >> As David J pointed out, execute() is deprecated as of version 1.4. 
We’re >> still on 1.3 but we’ll have to move away from this code eventually so maybe >> this is a good inspiration to move away from execute() now and reduce the >> number of deprecation warnings we have to deal with in the future. >> >> >> >>> As far as the comparison behavior goes, psql's parser strips >>> comments that start with double dashes, for $obscure_reasons. >>> The server is perfectly capable of ignoring those by itself, >>> though. (Awhile back I tried to remove that psql behavior, >>> but it caused too much churn in our regression tests.) >>> >> >> Thanks, this is most helpful. I use psql to double check I think SQLAlchemy >> is doing something odd. It’s good to know that psql's behavior in this case >> is a choice and not required behavior for clients. Peter J. Holzer’s >> psycopg2 example could have showed me the same; I wish I had thought of that. >> >> >> I appreciate all the help! >> >> Cheers >> Philip >> >> >> >> >> > > > -- > > 📌 Le nom de domaine de nos adresses mails évolue et devient @arche-mc2.fr. > > > arche-mc2.fr > > > > > Philippe DOUSSOT > > ARCHITECTE TECHNIQUE > > DIRECTION DES SOLUTIONS ARCHE MC2 DOMICILE > > philippe.doussot@arche‑mc2.fr >
Why is my function inlined only when STABLE?
Hi all, I have a function that isn't being inlined, and I would appreciate help to understand why that's the case. I'm using PG 11.15. I know that if I declare my function IMMUTABLE and it calls a non-IMMUTABLE function, Postgres won't inline my function. But even when my function calls only substring() (which I understand to be IMMUTABLE based on '\df+ substring'), I still can't get Postgres to inline it. If I re-declare my function as STABLE, then Postgres inlines it. According to the rules I understand (https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions#Inlining_conditions_for_scalar_functions), the IMMUTABLE version of my function should be inlined too. What am I missing? Here's a log of a CLI session showing that the IMMUTABLE version is not inlined, but the STABLE one is. show track_functions +---+ | track_functions | |---| | all | +---+ SHOW me@/tmp:wylan# SELECT * FROM pg_stat_user_functions +--+--++-+--+-+ | funcid | schemaname | funcname | calls | total_time | self_time | |--+--++-+--+-| +--+--++-+--+-+ SELECT 0 Time: 0.021s me@/tmp:wylan# CREATE OR REPLACE FUNCTION f(foo text) RETURNS text AS $$ SELECT substring(foo FROM 1 FOR 2) $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE; CREATE FUNCTION Time: 0.003s me@/tmp:wylan# select f('4242') +-+ | f | |-| | 42 | +-+ SELECT 1 Time: 0.008s me@/tmp:wylan# SELECT * FROM pg_stat_user_functions +--+--++-+--+-+ | funcid | schemaname | funcname | calls | total_time | self_time | |--+--++-+--+-| | 14472085 | public | f | 1 | 0.05 | 0.05| +--+--++-+--+-+ SELECT 1 Time: 0.022s me@/tmp:wylan# DROP FUNCTION f(text) DROP FUNCTION Time: 0.001s me@/tmp:wylan# CREATE OR REPLACE FUNCTION f(foo text) RETURNS text AS $$ SELECT substring(foo FROM 1 FOR 2) $$ LANGUAGE sql STABLE PARALLEL SAFE; CREATE FUNCTION Time: 0.003s me@/tmp:wylan# select pg_stat_reset() +-+ | pg_stat_reset | |-| | | +-+ SELECT 1 Time: 0.008s me@/tmp:wylan# SELECT * FROM pg_stat_user_functions +--+--++-+--+-+ | funcid | schemaname | funcname | calls | total_time | self_time | |--+--++-+--+-| +--+--++-+--+-+ SELECT 0 Time: 0.022s me@/tmp:wylan# select f('4242') +-+ | f | |-| | 42 | +-+ SELECT 1 Time: 0.008s me@/tmp:wylan# SELECT * FROM pg_stat_user_functions +--+--++-+--+-+ | funcid | schemaname | funcname | calls | total_time | self_time | |--+--++-+--+-| +--+--++-+--+-+ SELECT 0 Time: 0.019s me@/tmp:wylan# Thanks Philip
Re: Why is my function inlined only when STABLE?
> On Mar 29, 2022, at 2:24 PM, Tom Lane wrote: > > Philip Semanchuk writes: >> I have a function that isn't being inlined, and I would appreciate help to >> understand why that's the case. > > I think the test methodology you used is faulty, because it does not > distinguish between "inline-able" and "foldable to a constant". > Given an immutable function applied to constant(s), the planner prefers > to fold to a constant by just executing the function. The inline-ing > transformation is considered only when that case doesn't apply. Excellent point, thank you. Now I understand. I was trying to write an inlining demo for my colleagues, and I simplified my example one step too far by using a constant. I really appreciate the help! Cheers Philip
Re: Are stored procedures/triggers common in your industry
> On Apr 20, 2022, at 3:18 PM, Guyren Howe wrote: > > I’ve really only ever worked in web development. 90+% of web developers > regard doing anything at all clever in the database with suspicion. > > I’m considering working on a book about implementing business logic in > Postgres, and I’m curious about how common that actually is. > We have some business logic in Postgres functions, particularly triggers. Our apps are written in Python, and we use pytest to exercise our SQL functions to ensure they're doing what we think they’re doing. It works well for us. FWIW, we’re not a Web dev shop. Cheers Philip
Could Postgres warn about incorrect volatility class?
Hi all, I recently discovered that a custom function that I thought was being inlined was not being inlined because I had declared it IMMUTABLE, but the function body cast an enum value to text which is a STABLE operator. Once I corrected my function's definition to declare it STABLE, Postgres inlined it. Since Postgres can apparently determine during query parsing that my function has a volatility class mismatch, is there a reason that Postgres can't make the same determination and inform me about it when I define the function? In this case a helpful message would have been "I sure hope you know what you're doing..." :-) Thanks Philip
Re: Max sane value for join_collapse_limit?
> On Jun 3, 2022, at 4:19 AM, Andreas Joseph Krogh wrote: > > Hi, I have set join_collapse_limit = 12 in production, but I'm thinking about > raising it to 16. > On modern HW is there a “sane maximum” for this value? > I can easily spare 10ms for extra planning per query on our workload, is 16 > too high? I set ours to 24 (from_collapse_limit=24 and geqo_threshold=25). Most of our queries that involve 10+ relations have a slow execution time (20-30 minutes or more) so reducing planning time isn’t a major concern for us. If the planner takes an extra 20-30 seconds to find a plan that reduces execution time by 5%, we still come out ahead. That said, in our environment the planner can make pretty bad choices once the number of relations gets into the mid teens because we have some difficult-to-estimate join conditions, so we write our canned queries with this in mind, breaking them into two parts if necessary to avoid throwing too much at the planner at once. IOW, we generally don’t come anywhere near 24 relations in a query. Our very high join_collapse_limit might still come into play if a user writes a very complicated ad hoc query. So (IMHO) as is often the case, the answer is “it depends”. :-) Cheers Philip
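If it helps, these can be tried per-session before touching postgresql.conf; the values below just mirror the numbers above:

SET join_collapse_limit = 24;
SET from_collapse_limit = 24;
SET geqo_threshold = 25;   -- keep the genetic optimizer from taking over below that point
-- then EXPLAIN the query in question and compare plans and planning time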
Custom function ROWS hint ignored due to inlining?
Hi, I have a custom function where the ROWS hint is getting ignored. I think it’s because the function is getting inlined, but I’d like a second opinion. Here’s my working (contrived) example. CREATE TABLE my_table ( id int primary key GENERATED ALWAYS AS IDENTITY, base_value int NOT NULL ); INSERT INTO my_table (base_value) VALUES (42); CREATE OR REPLACE FUNCTION fn_get_deltas(base_value int) RETURNS TABLE (delta int, total int) AS $$ SELECT generate_series(1, CASE base_value WHEN 42 THEN 4 ELSE 20 END), base_value + generate_series(1, CASE base_value WHEN 42 THEN 4 ELSE 20 END) $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE ROWS 10; EXPLAIN SELECT base_value, delta, total FROM my_table CROSS JOIN LATERAL (SELECT delta, total FROM fn_get_deltas(base_value)) AS foo +--+ | QUERY PLAN | |--| | Nested Loop (cost=0.00..107427.80 rows=226 width=12)| | -> Seq Scan on my_table (cost=0.00..32.60 rows=2260 width=4) | | -> Result (cost=0.00..27.52 rows=1000 width=8) | | -> ProjectSet (cost=0.00..5.02 rows=1000 width=4) | | -> Result (cost=0.00..0.01 rows=1 width=0) | +--+ The plan estimates 1000 rows from the CROSS JOIN despite the “ROWS 10” hint on my function. I think this is because the planner never sees fn_get_deltas() — it has been inlined by the query preprocessor because fn_get_deltas() meets the criteria for inlining (https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions). Instead of 10 rows, the planner uses its default assumption of 1000 rows. If I change the function to VOLATILE to prevent inlining, I get this plan. +-+ | QUERY PLAN | |-| | Nested Loop (cost=0.25..484.85 rows=22600 width=12)| | -> Seq Scan on my_table (cost=0.00..32.60 rows=2260 width=4)| | -> Function Scan on fn_get_deltas (cost=0.25..0.35 rows=10 width=8) | +-+ I would prefer to have the function inlined for better performance, but I can declare it VOLATILE if that’s necessary to give decent estimates to the planner. Am I correctly reading the situation? If so, is there another solution that allows inlining *and* making the ROWS hint visible to the planner? Thanks a bunch Philip
IMMUTABLE function to cast enum to/from text?
Hi all, I know that Postgres' enum_in()/enum_out() functions have a volatility class of STABLE, and STABLE is required because enum elements can be renamed. We have an enum in our database used in a number of custom functions, all of which require casting the enum to/from text. Since enum_in() and enum_out() are STABLE, that means our functions that rely on those casts must also be STABLE, and as a result we can't use them in generated columns. I have seen conversations that suggest creating a custom IMMUTABLE function to perform the cast, but I can't figure out how to do that except with a CASE statement that enumerates every possible value. Is there a more elegant approach? Thanks Philip
Re: IMMUTABLE function to cast enum to/from text?
> On Nov 10, 2022, at 3:39 PM, Tom Lane wrote: > > Joe Conway writes: >> >> CREATE OR REPLACE FUNCTION mood2text(mood) >> RETURNS text AS >> $$ >> select $1 >> $$ STRICT IMMUTABLE LANGUAGE sql; > > Of course, what this is doing is using a SQL-function wrapper to > lie about the mutability of the expression. Whether you consider > that elegant is up to you ;-) ... but it should work, as long as > you don't break things by renaming the enum's values. Thanks Joe and Tom, I’m comfortable lying to Postgres occasionally — never for evil, only for good of course. :-) Cheers Philip
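For the archives, a sketch of the generated-column usage this unlocks; the enum values and table are made up:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

CREATE FUNCTION mood2text(mood) RETURNS text AS $$
    select $1
$$ STRICT IMMUTABLE LANGUAGE sql;

CREATE TABLE journal (
    current_mood mood NOT NULL,
    -- accepted here only because mood2text() is declared IMMUTABLE
    mood_label   text GENERATED ALWAYS AS (mood2text(current_mood)) STORED
);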
ALTER COLUMN to change GENERATED ALWAYS AS expression?
Hi all, I have a column defined GENERATED ALWAYS AS {my_expression} STORED. I’d like to change the {my_expression} part. After reading the documentation for ALTER TABLE (https://www.postgresql.org/docs/current/sql-altertable.html) and trying a few things that resulted in syntax errors, there doesn’t seem to be a way to alter the column’s GENERATED expression in place. It seems like my only option is to drop and re-add the column. Is that correct? Thanks Philip
Re: ALTER COLUMN to change GENERATED ALWAYS AS expression?
> On Feb 7, 2023, at 3:30 AM, Laurenz Albe wrote: > > On Mon, 2023-02-06 at 12:04 -0500, Philip Semanchuk wrote: >> I have a column defined GENERATED ALWAYS AS {my_expression} STORED. I’d like >> to change the >> {my_expression} part. After reading the documentation for ALTER TABLE >> (https://www.postgresql.org/docs/current/sql-altertable.html) and trying a >> few things that >> resulted in syntax errors, there doesn’t seem to be a way to alter the >> column’s GENERATED >> expression in place. It seems like my only option is to drop and re-add the >> column. >> Is that correct? > > I think that is correct. But changing the expression would mean rewriting > the column > anyway. The only downside is that a dropped column remains in the table, and > not even > a VACUUM (FULL) will get rid of it. Thanks for the confirmation. I hadn’t realized that the column would remain in the table even after a DROP + VACUUM FULL. I’m curious — its presence as a deleted column doesn't affect performance in any meaningful way, does it? In this case we have the option of dropping and re-creating the table entirely, and that's probably what I'll do. Cheers Philip
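In case it saves someone a search, the drop-and-re-add route looks roughly like this (table, column, and expression are made up):

ALTER TABLE t DROP COLUMN derived;
ALTER TABLE t ADD COLUMN derived numeric GENERATED ALWAYS AS (base * 1.1) STORED;
-- Postgres 13+ can also ALTER COLUMN ... DROP EXPRESSION, but that only turns the
-- column into an ordinary one; it doesn't provide a way to swap in a new expression.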
Re: Uppercase version of ß desired
> On Mar 13, 2023, at 5:38 PM, Celia McInnis wrote: > > HI: > > I would be really happy if postgresql had an upper case version of the ß > german character. The wiki page > https://en.wikipedia.org/wiki/%C3%9F > > indicates that the capital (U+1E9E ẞ LATIN CAPITAL LETTER SHARP S) was > encoded by ISO 10646 in 2008. > > BTW the reason that I'd like upper('ß') to give something different than 'ß' > is because I have written a simple substitution puzzle for a large number of > languages where I show the encrypted lower case words in upper case and the > successful letter substitution submissions in lower case - so I need the > upper and lower case versions of each letter to be different! > > Thanks for any assistance! Maybe I can hack what I want in python (which is > what I am using for the puzzle). Hi Celia, I ran into this too back when we were transitioning from Python 2 to 3 (2 behaved differently from 3). While researching it I discovered this Python issue which maybe sheds some additional light on the subject: https://github.com/python/cpython/issues/74993 We ultimately found 90 characters that (under Python 3) grew longer when uppercased.

python -c "print([c for c in range(0x80, 0x22ff) if len(chr(c)) != len(chr(c).upper())])"

I hope this is at least interesting. :-) Cheers Philip
Re: Local postgres manual
> On Nov 3, 2023, at 9:18 AM, Ben Hancock wrote: > > Hi all: > > Does Postgres come with a local, full version of the manual installed > by default anywhere (i.e. akin to what is available on the website, but > in man, info, or plain-text format)? When I invoke `man postgres`, I do > get a very useful - but limited - manual page, which has references to > chapters. For example: > > OPTIONS > postgres accepts the following command-line arguments. For a > detailed discussion of the options consult Chapter 20. ... > > Of course, I can pull up a browser and find the manual, or consult a > local PDF if I have a graphical display. But sometimes it may be convenient > to view the the manual for the version of Postgres that is on the system, > right there. Does one exist? In addition to Bruce Momjian’s suggestion, I’ll add that you can make an HTML version of the manual from the source code. This is what I use and it works great for me. Cheers Philip
Re: Local postgres manual
> On Nov 3, 2023, at 9:45 AM, Bruce Momjian wrote: > > On Fri, Nov 3, 2023 at 09:39:46AM -0400, Philip Semanchuk wrote: >> >> In addition to Bruce Momjian’s suggestion, I’ll add that you can make an >> HTML version of the manual from the source code. This is what I use and it >> works great for me. > > Yes, we can do HTML too as a single file, postgres.html. What I don't > see is man page output for anything but the references pages, and as > separate files. It might be possible to convert the HTML to man format > using something like html2man. Yes, ISTR there’s an all-in-one-page option, but the HTML that I built is in individual pages that mirror the organization on postgresql.org which works great for me. Sorry I don’t remember the exact command I used but it was very straightforward, nothing tricky at all. Maybe as simple as `make install-docs`? Cheers Philip
Re: Parameter value in RDS
> On Jan 16, 2024, at 4:19 PM, David G. Johnston > wrote: > > On Tuesday, January 16, 2024, Atul Kumar wrote: > Hi, > > I am new to RDS postgres, I have version 14 running on it with m7g.large > > I found that lots of parameters has DBInstanceClassMemory written, so what > exactly is the value of this variable ? > > How should I calculate it? > > IIRC it’s the amount of RAM on your instance. You look it up in a table > usually. Or check the web console. Yes, I’m pretty sure that it’s the amount of RAM *in bytes*. Although I like the flexibility of having that expressed in a variable, it wasn’t always as easy to use as I wanted it to be. Specifically, I found that differences in units made it a little difficult to figure out how AWS was configuring things. For example, on an old instance we had, shared_buffers was defined as {DBInstanceClassMemory/10922}. It took me a while to figure out that that translates to “calculate 75% of available memory, and express that value in 8Kb blocks”. How? Well, 10922 ≈ 8 * 1024 * 1.3333. shared_buffers is expressed in 8Kb blocks, so converting from units of bytes (DBInstanceClassMemory) to 8kB blocks (shared_buffers) requires dividing by 8 * 1024. And dividing by 1.3333 is the same as multiplying by 3/4, which is 75%. This may have been explained in AWS documentation but I couldn’t find it at the time and it took some work on my part to figure out the logic behind 10922 and some other config magic numbers. Maybe this will save you some time. Cheers Philip
Re: How different is AWS-RDS postgres?
> On May 26, 2021, at 10:04 PM, Rob Sargent wrote: > > > >> On May 26, 2021, at 4:37 PM, Ian Harding wrote: >> >> >> There is an option to send the logs to cloudwatch which makes it less awful >> to look at them. > I have that but precious little of interest there. Lots of autovac, a > smattering of hints to increase wal size!? I have yet to spot anything which > corresponds to the “I/O failure” which the middle ware gets. > > I don’t have query logging on, but I do see reports from my psql session > fat-fingering. > > As to the logs UI, the search is pretty feeble; I don’t understand why there > are four channels of logs; the graphs are wearing the same rose-coloured as > the logs. > And 24 hours without a peep from AWS support. (I don’t call mailing me what I > sent them “contact”.) > > My guess right now is that the entire tomcat connection pool is in a single > transaction? That’s the only way the tables could disappear. I am making > separate calls to JDBC getConnection () for each doPost. We used Aurora (AWS hosted Postgres) and I agree that Cloudwatch search is pretty limited. I wrote a Python script to download cloudwatch logs to my laptop where I can use proper tools like grep to search them. It’s attached to this email. It’s hacky but not too terrible. I hope you find it useful. Cheers Philip

import pathlib
import operator
import logging
from collections import namedtuple
import subprocess
import datetime
import json

import boto3

DB_IDENTIFIER = 'your-db-name-here'
PATH = './logs'

Config = namedtuple('Config', ['access_key', 'secret_key', 'region', 'db_identifier', 'rds_client'])

boto_session = boto3.session.Session()

config = Config(
    access_key=boto_session._session.get_credentials().access_key,
    secret_key=boto_session._session.get_credentials().secret_key,
    region=boto_session._session.get_config_variable('region'),
    db_identifier=DB_IDENTIFIER,
    rds_client=boto_session.client('rds'),
)


class LogFile:
    def __init__(self, aws_name, timestamp, size):
        self.aws_name = aws_name
        self.last_written = datetime.datetime.fromtimestamp(timestamp / 1000)
        self.size = int(size)
        # typical aws_name = error/postgresql.log.2019-06-21-16
        self.local_path = pathlib.Path(PATH, pathlib.Path(aws_name).name + '.txt')

    def download(self, config):
        # aws rds download-db-log-file-portion \
        #     --db-instance-identifier wylan-sql \
        #     --log-file-name error/postgresql.log.2019-06-24-14 \
        #     --no-paginate --output text
        cmd = [
            'aws', 'rds', 'download-db-log-file-portion',
            '--db-instance-identifier', config.db_identifier,
            '--log-file-name', self.aws_name,
            '--no-paginate',
        ]

        with open(self.local_path, 'wb') as f:
            self._proc = subprocess.Popen(cmd, stdout=f)
            return_code = self._proc.wait()

        if return_code == 0:
            # Great, the data were written. It's actually in JSON format. All of the interesting
            # info is in the LogFileData element. Grab that and replace the file contents with it.
            with open(self.local_path) as f:
                d = json.load(f, encoding='utf-8')

            log_text = d['LogFileData']

            with open(self.local_path, 'w') as f:
                f.write(log_text)
        else:
            # FIXME provide a more helpful exception
            raise ValueError


def _get_log_files(config, root_directory):
    result = config.rds_client.describe_db_log_files(DBInstanceIdentifier=config.db_identifier)

    # FIXME filter out logs where 'Size' == 0?
    rds_logs = [LogFile(d['LogFileName'], d['LastWritten'], d['Size'])
                for d in result['DescribeDBLogFiles']]

    # rds_logs[0].download(config)
    # import pprint; pp=pprint.pprint
    # import pdb; pdb.set_trace()

    rds_logs.sort(key=lambda rds_log: rds_log.aws_name)

    if not rds_logs:
        print('No RDS logs found')
    else:
        for rds_log in rds_logs:
            print(f'downloading {rds_log.aws_name}...')
            rds_log.download(config)


if __name__ == '__main__':
    _get_log_files(config, None)
CREATE/REFRESH MATERIALIZED VIEW planner difference?
Hi all, Should I expect a planner difference between CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW? We have a materialized view that uses 4 workers during CREATE but only one worker during REFRESH, and as a result the refresh takes much longer (~90 minutes vs. 30 minutes for the CREATE). So far this behavior has been 100% consistent. I'm running both the CREATE and REFRESH on the same server (Postgres 11.9 on AWS Aurora). I don't think the refresh is using one worker in response to other things happening on the server because we’ve observed this happening when the server is not busy. We're not using the CONCURRENTLY option for REFRESH. Thanks Philip
Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?
> On Jun 1, 2021, at 2:20 PM, Vijaykumar Jain > wrote: > > if you are not using it concurrently, can you confirm the there are *no > active* queries on the mv. > refresh requires AccessExclusiveLock and will wait, till it gets one. > just asking if you can rule out the extended time is not due to waiting for > lock. I can confirm that it’s not waiting on a lock. In addition, through the AWS CPU utilization monitor I can see that the REFRESH uses one CPU/worker whereas the CREATE uses four. This is consistent with the EXPLAIN ANALYZE for the CREATE which says it uses four workers. > also, can you share the plans where you see the diff. Unless I misunderstand, there is no plan for a REFRESH. EXPLAIN (ANALYZE, BUFFERS) refresh materialized view my_mat_view +---+ | QUERY PLAN| |---| | Utility statements have no plan structure | +---+ Cheers Philip > > On Tue, 1 Jun 2021 at 23:30, Philip Semanchuk > wrote: > Hi all, > Should I expect a planner difference between CREATE MATERIALIZED VIEW and > REFRESH MATERIALIZED VIEW? We have a materialized view that uses 4 workers > during CREATE but only one worker during REFRESH, and as a result the refresh > takes much longer (~90 minutes vs. 30 minutes for the CREATE). So far this > behavior has been 100% consistent. > > I'm running both the CREATE and REFRESH on the same server (Postgres 11.9 on > AWS Aurora). I don't think the refresh is using one worker in response to > other things happening on the server because we’ve observed this happening > when the server is not busy. We're not using the CONCURRENTLY option for > REFRESH. > > THanks > Philip > > > > -- > Thanks, > Vijay > Mumbai, India
Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?
> On Jun 1, 2021, at 3:23 PM, Thomas Munro wrote: > > On Wed, Jun 2, 2021 at 7:15 AM Vijaykumar Jain > wrote: >> i only get workers to create mv, but refresh mv plan does not use workers >> for the same conf params. > > Yeah, this changed in v14: > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9e7ccd9ef64d05e87ceb1985d459bef9031205c0 Thanks, all! It’s great to have a clear explanation. I looked at the change notes for 12 & 13 before I posted. I didn’t occur to me to look at 14. :-) Cheers Philip
Re: Database issues when adding GUI
> On Jun 7, 2021, at 11:07 AM, Rob Sargent wrote:
>
> On 6/7/21 9:02 AM, Rich Shepard wrote:
>> This is an unusual message and I hope someone(s) here can offer insights
>> into the cause of the problem I've been wrestling with for the past couple
>> of weeks.
>>
>> Context: For the past decade I've run my business tracking database from the
>> psql shell (currently with postgresql-12.2) and decided it's time to add a
>> frontend so it's a stand-alone desktop application. I'm learning to use
>> PyQt5 as the widget set and application framework.
>>
>> The database contains three lookup tables: activitytypes, industrytypes, and
>> statustypes, all have a single column and few rows. So I've started with
>> these to get python3 and PyQt5-5.13.2 to run the SELECT query and display
>> the results in a QTableView.
>>
>> The problem with all three is that my code produces an empty window and
>> hangs. There's no python error displayed and the application reports finding
>> the database but not the tables. For example, the activitytypes debugging
>> log contains:
>> INFO:root:found database
>> DEBUG:root:Defining model/view
>> DEBUG:root:model error: Unable to find table activitytypes
>> DEBUG:root:about to execute select query
>> DEBUG:root:End of Program
>>
>> I, and others on the python and pyqt mail lists and stackoverflow, can find
>> nothing wrong with the python code. This suggests it's something with the
>> database itself. But I have no idea where to look. The database structure
>> is:
>> bustrac=# \d
>>                        List of relations
>>  Schema |           Name            |   Type   |  Owner
>> --------+---------------------------+----------+----------
>>  public | activities                | table    | rshepard
>>  public | activitytypes             | table    | rshepard
>>  public | industrytypes             | table    | rshepard
>>  public | locations                 | table    | rshepard
>>  public | organizations             | table    | rshepard
>>  public | organizations_org_nbr_seq | sequence | rshepard
>>  public | people                    | table    | rshepard
>>  public | people_person_nbr_seq     | sequence | rshepard
>>  public | projects                  | table    | rshepard
>>  public | statustypes               | table    | rshepard
>> (10 rows)
>>
>> What might stop a front-end application from finding a table that has been
>> readily accessed from the psql shell?
>>
>> All suggestions and recommendations are needed.
>>
>> TIA,
>>
>> Rich
>>
> This looks like a permissions problem, as though you are connecting as a
> role/user without permission to select from the tables.

I agree with Rob. Permissions, or you’re connecting as the wrong user.

You might find it helpful to simplify and not use a GUI application until you get your connection issues sorted out. A simple command line Python app that connects to the database and prints the result of “select * from activitytypes limit 1” will enable quick(er) debugging of your connection issues.

Cheers
Philip
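A few quick sanity checks along those lines, run from psql with the exact same connection parameters the PyQt app uses (just a sketch; the table name is taken from the \d listing above):

-- Who and where am I actually connected as?
SELECT current_user, current_database();

-- Does this role have SELECT privilege on the lookup table?
SELECT has_table_privilege(current_user, 'activitytypes', 'SELECT');

-- And can it actually be read?
SELECT * FROM activitytypes LIMIT 1;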
Re: Low cache hit ratio
> On Jul 29, 2021, at 3:09 AM, Lucas wrote:
>
> Hello,
>
> I have recently deployed a new Slave (streaming replication) and have been
> monitoring its cache hit ratio.
>
> At the moment, read-only queries are being sent to this slave but only 10% of
> the traffic.
> The cache hit ratio is now at 82%. This database is around 1.4TB and is
> running on EC2 instances in AWS. PostgreSQL version is 9.2.24.
>
> The instance is a r4.2xlarge, so it's got 8 vCPUs and 61GB of RAM.

Hi Lucas,

Have you tried the pg_buffercache extension? It gives you some visibility into what’s in the cache so you can understand what’s staying resident and not leaving room for other things.

I wrote a view atop pg_buffercache that I use for this purpose. It’s pasted below; I hope you find it helpful. My only caveat is that I run this under Postgres 11. I *think* I’ve used it under Postgres 9.6 but I’m not sure. It definitely hasn’t been tested on 9.2.

Hope this helps,
Philip

/*
A view of pg_buffercache which shows what's in the Postgres cache.

Access to pg_buffercache requires membership in the group pg_monitor.

It's OK to query this ad hoc, but don't query it aggressively (e.g. in a polling loop).
The Postgres doc says --

    > When the pg_buffercache view is accessed, internal buffer manager locks are taken for
    > long enough to copy all the buffer state data that the view will display. This ensures
    > that the view produces a consistent set of results, while not blocking normal buffer
    > activity longer than necessary. Nonetheless there could be some impact on database
    > performance if this view is read often.

https://www.postgresql.org/docs/11/pgbuffercache.html
*/
CREATE OR REPLACE VIEW vw_postgres_cache AS
    SELECT
        c.relname,
        sum(usagecount) AS usage_count,
        /* count(*) is the number of buffers in use. Each buffer is 8Kb. (8Kb is the page size
           a.k.a. block size configured at compile time, confirm in psql with the command
           `show block_size`). I cast the count to float to introduce a float into calculations
           that are otherwise all int which would produce a result rounded to the nearest int.
        */
        -- cache_% shows the portion of the cache that this entity occupies
        ((count(*)::float / pg_settings.setting::int) * 100)::numeric(3, 1) AS "cache_%",
        -- entity_% shows the portion of this entity that's in cache
        -- pg_table_size() is in bytes, multiply buffer count by (8 * 1024) so units match
        (((count(*)::float * 8192) / pg_table_size(c.relname::text)) * 100)::numeric(4, 1) AS "entity_%",
        (count(*)::float * 8 / (1024 * 1024))::numeric(20, 1) AS Gb,
        (count(*)::float * 8 / 1024)::numeric(20, 1) AS Mb
    FROM pg_buffercache b
    CROSS JOIN pg_settings
    INNER JOIN pg_class c
        ON b.relfilenode = pg_relation_filenode(c.oid)
        AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
    WHERE pg_settings.name = 'shared_buffers'
        -- If this is run on a system where shared_buffers is expressed in something other than 8kB
        -- blocks, it will calculate incorrect kB/Mb/Gb values. Asserting the value of the unit here
        -- ensures no results at all will be returned in that case.
        AND pg_settings.unit = '8kB'
    GROUP BY c.relname, pg_settings.setting::int
    HAVING
        -- Only include entries that occupy at least 0.1% of the cache
        ((count(*)::float / pg_settings.setting::int) * 100) >= 0.1
    ORDER BY 6 DESC
;
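If it helps, here’s a minimal sketch of how the view gets used. pg_buffercache ships with Postgres as a contrib module; on older releases without the pg_monitor role you’d need a superuser, and the output shown by the query is of course specific to your server:

-- One-time setup in the database you want to inspect:
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Top consumers of shared_buffers right now:
SELECT * FROM vw_postgres_cache LIMIT 10;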
Re: Help with my MacOS PostgreSQL 12 installation
> On Aug 17, 2021, at 12:21 PM, Michael White wrote:
>
> This is my first post so if I’m in the wrong place please help me get to the
> correct list.
>
> Issue:
>
> I installed PostgreSQL on my Mac over a year ago and since upgraded my disk
> drive and OS to Big Sur. I am a retired Software Engineer and still like to
> program and am learning PostgreSQL and have it working on Linux and FreeBSD.
> In any case I had put it on my Mac laptop and want to use it again but it
> seems that the upgrades I did might have rearranged file locations or they
> may have been lost. I have been trying to understand what is where and come
> to a point where I need help getting PostgreSQL running again. I have a file
> with notes on the installation and here it is: …
> 3. Don’t remember how I installed PostgreSQL but looks like with EnterpriseDB
> because of the installation directory. (I got this from postgresapp.com)

Hi Michael,

I use PostgresApp on my Mac, and it has worked very nicely for me. It’s a very no-fuss installation. You should have an app in your Applications folder called Postgres. Start that, and the Postgres server will be running. There’s a little elephant in my menu bar that I can click on that shows me what databases are available. I can set it to start the server automatically when I log in.

If you don’t have an app called Postgres or it doesn’t run when you click it, I’d say your installation is damaged. If you can nuke it and start over, that’d be what I’d try. Start by dragging the app to the trash and reinstalling. That might fix things.

FYI I modified my .zshrc to include this line to add Postgres’ client apps to my path:

export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/11/bin

That way I have easy access to psql, pg_config, etc.

Hope this helps,
Philip
psql syntax for array of strings in a variable?
Hi,

I would appreciate help with the syntax for querying an array of strings declared as a psql variable. Here's an example.

\set important_days ARRAY['monday', 'friday']

select 1 where 'monday' = ANY(:important_days);
ERROR:  42703: column "monday" does not exist
LINE 1: select 1 where 'monday' = ANY(ARRAY[monday,friday]);

select 1 where 'monday' = ANY(:"important_days");
ERROR:  42703: column "ARRAY[monday,friday]" does not exist
LINE 1: select 1 where 'monday' = ANY("ARRAY[monday,friday]");

I'm doing something wrong but I can't figure out what.

My real-world use case is that I have a psql script that will execute several queries on a long list of strings, and rather than repeat those strings over and over in the script, I'd like to declare them once at the top of the script and then refer to the variable after that. Bonus points if there's a way to do a multiline declaration like --

\set important_days ARRAY['monday',
                          'friday']

Thanks for reading
Philip
Re: psql syntax for array of strings in a variable?
> On Oct 29, 2021, at 2:05 PM, Tom Lane wrote:
>
> "David G. Johnston" writes:
>> On Friday, October 29, 2021, Philip Semanchuk wrote:
>>> I would appreciate help with the syntax for querying an array of strings
>>> declared as a psql variable. Here's an example.
>>>
>>> \set important_days ARRAY['monday', 'friday']
>
>> Not sure why the single quotes are getting stripped out but that is the
>> issue. Maybe double them up to escape them like in a normal text literal?
>
> Yeah, that's just the way that \set works (and most other psql backslash
> commands, I believe). You've likely got an issue with whitespace
> disappearing, too, though that might be harmless in this specific example.
>
> regression=# \set foo 'bar baz'
> regression=# \echo :foo
> bar baz
> regression=# \set foo 'bar ''baz'
> regression=# \echo :foo
> bar 'baz
> regression=# \set foo bar ''baz
> regression=# \echo :foo
> barbaz
>
> Not sure offhand how well-documented this is.

Thanks, all. Glad to know I wasn’t missing something obvious.

> On Oct 29, 2021, at 1:52 PM, Pavel Stehule wrote:
>
> psql variables can hold only text. There is not any type - all is just text.

^^^ This was especially helpful; I’d never considered that before.

Cheers
Philip
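Following that advice, here’s a sketch of the doubled-quotes approach that seems to do what I was originally after — the whole value is wrapped in single quotes and each embedded quote is doubled, so the variable ends up holding the literal text ARRAY['monday', 'friday']:

\set important_days 'ARRAY[''monday'', ''friday'']'
\echo :important_days
ARRAY['monday', 'friday']

select 1 where 'monday' = ANY(:important_days);
 ?column?
----------
        1
(1 row)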
Re: Disk is filling up with large files. How can I clean?
> On Oct 9, 2024, at 5:52 AM, Torsten Förtsch wrote: > > Filenames like 16665, 16665.1, 16665.2 etc all represent the same table (or > similar). The number 16665 is called the file node. > > To get a list of file nodes for a specific database you can run: > > SELECT oid::regclass::text, relfilenode FROM pg_class; > > The /16384/ in the path represents the database. To decipher that you can run: > > SELECT datname, oid FROM pg_database; > > Once you have all that information, you know which database to connect to and > which tables are big. Then you can DROP/DELETE/TRUNCATE or so. Mikael, if you’re unaware of VACUUM FULL (as opposed to just VACUUM), you should read about that too. Hope that helps, Philip
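To make that concrete, here’s a rough sketch of going from big data files to relation names and then reclaiming space; the table name in the VACUUM is only a placeholder for whatever turns out to be bloated:

-- Largest relations in the current database, with their file nodes.
SELECT oid::regclass AS relation,
       relfilenode,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind IN ('r', 'm', 'i', 't')  -- tables, matviews, indexes, TOAST
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;

-- VACUUM FULL rewrites the table and returns space to the OS, but it takes an
-- ACCESS EXCLUSIVE lock and needs enough free disk to hold the rewritten copy.
VACUUM FULL VERBOSE some_bloated_table;  -- placeholder name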
Re: Suggestion for memory parameters
> On Sep 26, 2024, at 7:03 AM, yudhi s wrote: > > In a RDS postgres ... > Is it fine to let it use "FreeLocalStorage" unless it goes till zero? Hi Yudhi, FreeLocalStorage and some of the other things you ask about are specific to AWS RDS, so you might have better luck getting answers on an RDS-specific mailing list. We also use RDS-hosted Postgres and so I completely understand how Postgres and RDS are intertwined. We have had runaway queries exhaust FreeLocalStorage. It has been quite a while since that happened, so my memories are hazy, but I’m pretty sure that when we used all of FreeLocalStorage, the result was that Postgres restarted. It might be equivalent to using all memory and disk space on a standalone system. Once there’s no storage left, behavior is unpredictable but we can’t be surprised if things crash. Usually our runaway queries got killed before FreeLocalStorage filled up, but not always. I second Veem’s suggestion to set work_mem on a per-session basis. Also note that the doc for work_mem says, “the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value." https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM Cheers Philip
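To make the per-session idea concrete, here’s a sketch. The 256MB figure is purely illustrative — size it to what your instance can afford, remembering that each sort or hash node in each backend can use up to work_mem:

-- For the rest of this session only:
SET work_mem = '256MB';
-- ... run the memory-hungry query ...
RESET work_mem;

-- Or scope it to one transaction; SET LOCAL reverts automatically at COMMIT/ROLLBACK:
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the query ...
COMMIT;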