Re: how to slow down parts of Pg
On Tue, 2020-04-21 at 20:30 +, Kevin Brannen wrote:
> I have an unusual need: I need Pg to slow down. I know, we all want our DB
> to go faster, but in this case its speed is working against me in 1 area.
>
> We have systems that are geo-redundant for HA, with the redundancy being
> handled by DRBD to keep the disks in sync, which it does at the block
> level. For normal operations, it actually works out fairly well.
[...]
> The part that hurts so bad is when we do maintenance operations that are
> DB heavy, like deleting really old records out of archives (weekly),
> moving older records from current tables to archive tables plus an analyze
> (every night), running pg_backup (every night), other archiving (weekly),
> and vacuum full to remove bloat (once a quarter).
> All of this generates a lot of disk writes, to state the obvious.
>
> The local server can handle it all just fine, but the network can't handle
> it as it tries to sync to the other server.

The obvious and best answer is: get a faster network, or choose a different
storage solution.

Other than that, you can try to make the maintenance operations less
resource-intensive:

- partition the tables so that you can get rid of old data with DROP TABLE
  (a minimal sketch follows below). The ANALYZE won't hurt, if you treat
  only the required tables.
- use "pg_basebackup" with the "--max-rate" option

About VACUUM, you may have a problem. Avoid the need for VACUUM (FULL) at
any price. That usually requires tuning autovacuum to be faster, which means
using more I/O. If you cannot find a sweet spot there, you have no
alternative but getting better I/O (which, as I said in the beginning, would
be the correct solution anyway).

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
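To make the partitioning suggestion concrete, here is a minimal sketch
(table, column, and partition names are hypothetical):

    -- Range-partition the archive by time, so that removing old data
    -- becomes DROP TABLE (a cheap metadata change) instead of a row-by-row
    -- DELETE plus VACUUM (lots of block writes for DRBD to ship).
    CREATE TABLE archive (
        id         bigint      NOT NULL,
        created_at timestamptz NOT NULL,
        payload    text
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE archive_2020_01 PARTITION OF archive
        FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');
    CREATE TABLE archive_2020_02 PARTITION OF archive
        FOR VALUES FROM ('2020-02-01') TO ('2020-03-01');

    -- The periodic cleanup then reduces to:
    DROP TABLE archive_2020_01;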
Using GIN index to retrieve distinct values
Hi all,

My understanding of a GIN index is that it maps each unique key to the set
of items containing that key. That suggests to me that enumerating the
distinct keys efficiently using the index should be possible in some cases.
I understand that, given the nature of the GIN framework, the original key
can't always be retrieved. But in certain cases, like an int array for
example, would it be possible?

While on the topic: what's the canonical way to do that kind of query?

CREATE TABLE tagged_items (
    id serial PRIMARY KEY,
    tag_ids integer[]
);

select array_agg(tag_id)
from (select distinct unnest(tag_ids) as tag_id from tagged_items) t;

Thanks!
Alex
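For reference, a runnable version of the setup and query (a sketch; note
that, as far as I know, current PostgreSQL will not use the GIN index to
enumerate distinct keys and will scan the heap instead):

    CREATE INDEX tagged_items_tags_gin ON tagged_items USING gin (tag_ids);

    -- List the distinct tags (unnest runs as an implicit LATERAL):
    SELECT DISTINCT tag AS tag_id
    FROM tagged_items, unnest(tag_ids) AS tag;

    -- Or collapse them back into a single array, as in the question:
    SELECT array_agg(DISTINCT tag)
    FROM tagged_items, unnest(tag_ids) AS tag;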
How do terciles, percentiles & the percentile_cont() function work?
Hello,

From a table x(c1) containing 30 lines with integer values (column c1) from
1 to 30:

SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
-----------------
            10.7
(1 ligne)

SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
-----------------
            20.3
(1 ligne)

If ordering, the first 10 lines (c1 from 1 to 10) have
ntile(3) OVER (ORDER BY c1) = 1, the following 10 lines (c1 from 11 to 20)
have ntile(3) OVER (ORDER BY c1) = 2, and the last 10 lines (c1 from 21 to
30) have ntile(3) OVER (ORDER BY c1) = 3.

So, I thought it should be:
percentile_cont(1./3) = 10.5 (the arithmetic average between 10 and 11) and
not 10.7
percentile_cont(2./3) = 20.5 (the arithmetic average between 20 and 21) and
not 20.3

Thanks in advance for the explanation
- Météo-France -
PALAYRET JACQUES
DCSC/MBD
jacques.palay...@meteo.fr
Fixe : +33 561078319
Re: How do terciles, percentiles & the percentile_cont() function work?
Of course, the same result with this table x(c1), for example:

SELECT c1, ntile(3) OVER (ORDER BY c1) FROM x ORDER BY c1 ;
 c1 | ntile
----+-------
  1 |     1
  2 |     1
  3 |     1
  4 |     1
  5 |     1
  6 |     1
  7 |     1
  8 |     1
  9 |     1
 10 |     1
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 20 |     2
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
(30 lignes)

SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
-----------------
            10.7
(1 ligne)

SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
-----------------
            20.3
(1 ligne)
Re: How do terciles, percentiles & the percentile_cont() function work?
With LibreOffice Calc, the same result as with PostgreSQL:

=CENTILE({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29;30}; 0.33)

gives 10.57 (and NOT 10.5). That is not what I expected.
Re: How do terciles, percentiles & the percentile_cont() function work?
PALAYRET Jacques writes:
> SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
>  percentile_cont
> -----------------
>             10.7
> (1 ligne)

> So, I thought it should be:
> percentile_cont(1./3) = 10.5 (the arithmetic average between 10 and 11)
> and not 10.7

As far as I can see from looking at the SQL spec, this is correct; at
least, it's clear that we're *not* supposed to just interpolate halfway
between the nearest rows. The result is defined to be

    T0.Y + FACTOR * (T1.Y - T0.Y)

where T0.Y and T1.Y are the nearest row values, and FACTOR is

    NVE * (N - 1) - floor(NVE * (N - 1))

where NVE is the argument of percentile_cont and N is the number of rows.
(In SQL:2003, see 10.9 general rule 7)h)i).)

regards, tom lane
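As a quick arithmetic check of that rule against the original example
(x(c1) holding 1 to 30): NVE = 1/3 and N = 30 give NVE*(N-1) = 9.667, so
T0.Y is the 10th value (10), T1.Y the 11th (11), and FACTOR = 0.667:

    -- The spec formula, spelled out with the concrete numbers:
    SELECT 10 + (29/3.0 - floor(29/3.0)) * (11 - 10) AS spec_result;
    -- => 10.666..., matching percentile_cont(1./3), not 10.5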
RE: how to slow down parts of Pg
>From: Laurenz Albe
>
>> On Tue, 2020-04-21 at 20:30 +, Kevin Brannen wrote:
>> I have an unusual need: I need Pg to slow down. I know, we all want
>> our DB to go faster, but in this case its speed is working against me
>> in 1 area.
>>
>> [...] The part that hurts so bad is when we do maintenance operations
>> that are DB heavy, like deleting really old records out of archives
>> (weekly), moving older records from current tables to archive tables
>> plus an analyze (every night), running pg_backup (every night), other
>> archiving (weekly), and vacuum full to remove bloat (once a quarter).
>> All of this generates a lot of disk writes, to state the obvious.
>>
>> The local server can handle it all just fine, but the network can't
>> handle it as it tries to sync to the other server.
>
> The obvious and best answer is: get a faster network, or choose a
> different storage solution.

I believe I mentioned originally that the network is controlled by others
(the customer). I've pointed out the results of their choice repeatedly,
but their reply is always "budgets", and I reply as politely as I can,
"faster network or live with the slowness, as I've done all I can for now".
It's a somewhat frustrating conversation, as you can imagine.

> Other than that, you can try to make the maintenance operations less
> resource-intensive:
>
> - partition the tables so that you can get rid of old data with DROP
>   TABLE. The ANALYZE won't hurt, if you treat only the required tables.
> - use "pg_basebackup" with the "--max-rate" option

Yes, this was the heart of the post: how to use fewer resources. I'd always
thought of partitioning the larger tables as an optimization for running
reports, but my eyes have been opened that it has other benefits too.

I'm not sure changing the backup program will help when it's sitting on top
of DRBD, but I can limit DRBD's rate to create the same effect. Still, it
doesn't hurt to spend a little time researching this. OTOH, you did just
prompt an idea, so that's helpful too.

> About VACUUM, you may have a problem. Avoid the need for VACUUM (FULL)
> at any price. That usually requires tuning autovacuum to be faster,
> which means using more I/O.

OK, I've never really liked doing a "full", but I perceived it as helpful
to us. I'll see about making autovacuum more aggressive.

Thanks!
Kevin
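As a starting point for the autovacuum tuning mentioned above, a hedged
sketch (the table name is hypothetical and the numbers are initial values
to experiment with, not recommendations):

    -- Vacuum this table earlier and let autovacuum work faster on it:
    ALTER TABLE archive_events SET (
        autovacuum_vacuum_scale_factor = 0.02,  -- trigger at ~2% dead rows
        autovacuum_vacuum_cost_delay   = 2,     -- ms between cost cycles
        autovacuum_vacuum_cost_limit   = 1000   -- work done per cycle
    );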
Re: how to slow down parts of Pg
On 4/21/20 7:43 PM, Virendra Kumar wrote:
> Hi Adrian,
> Here is a test case. Basically, when autovacuum ran it did release space
> to disk, since there may have been contiguous blocks that could be
> returned to disk, but the space used by the index is still being held
> until I run a REINDEX on the table (I assume REINDEX for an individual
> index would work as well). A subsequent insert statement did not utilize
> the blocks in the index segment, as we can see below, so index bloat is
> still not addressed; or maybe I am doing something wrong.

Well, I learned something. I replicated your commands on my 12 instance
and got the exact same results. I should have read this:

https://www.postgresql.org/docs/12/routine-reindex.html

before.

--
Adrian Klaver
adrian.kla...@aklaver.com
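For the index-bloat side specifically, PostgreSQL 12 adds a REINDEX variant
that rebuilds indexes without blocking writes (the table name below is a
stand-in for the test table from the thread):

    -- Rebuild all indexes on the table; readers and writers are not
    -- blocked for the bulk of the operation (PG 12+):
    REINDEX TABLE CONCURRENTLY bloat_test;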
PG based ETL tools
Hi,

What ETL tools are there based on PostgreSQL? I am looking for PG-based
MPP + ETL open-source databases.

Thanks
Braj
Re: how to slow down parts of Pg
What you need is async replication instead of synchronous replication.

On 4/21/20 3:30 PM, Kevin Brannen wrote:
> I have an unusual need: I need Pg to slow down. I know, we all want our
> DB to go faster, but in this case its speed is working against me in 1
> area.
>
> We have systems that are geo-redundant for HA, with the redundancy being
> handled by DRBD to keep the disks in sync, which it does at the block
> level. For normal operations, it actually works out fairly well.
>
> That said, we recognize that what we really need to do is one of the
> forms of streaming (ch 26 of the manual), which I believe would help this
> problem a lot if not solve it -- but we don't have the time to do that at
> the moment. I plan and hope to get there by the end of the year.
>
> The part that hurts so bad is when we do maintenance operations that are
> DB heavy, like deleting really old records out of archives (weekly),
> moving older records from current tables to archive tables plus an
> analyze (every night), running pg_backup (every night), other archiving
> (weekly), and vacuum full to remove bloat (once a quarter). All of this
> generates a lot of disk writes, to state the obvious.
>
> The local server can handle it all just fine, but the network can't
> handle it as it tries to sync to the other server. Sometimes we can add
> network bandwidth, many times we can't as it depends on others. To borrow
> a phrase from the current times, we need to flatten the curve. 😊
>
> A few parts of our maintenance process I've tamed by doing "nice -20" on
> the process (e.g. log rotation); but I can't really do that for Pg
> because the work gets handed off to a background process that's not a
> direct child process … and I don't want to slow the DB as a whole because
> other work is going on (like handling incoming data).
>
> Part of the process I've slowed down by doing the work in chunks of 10K
> rows at a time with a pause between each chunk to allow the network to
> catch up (instead of an entire table in 1 statement). This sort of works,
> but some work/SQL is between hard and next-to-impossible to break up like
> that. That also produces some hard spikes, but that's better than the
> alternative (next sentence). Still, large portions of the process are
> hard to control and just punch the network to full capacity and hold it
> there for far too long.
>
> So, do I have any other options to help slow down some of the Pg
> operations? Or maybe some other short-term mitigations we can do with Pg
> configurations? Or is this a case where we've already done all we can do
> and the only answer is to move to WAL streaming as fast as possible?
>
> If it matters, this is being run on Linux servers. Pg 12.2 is in final
> testing and will be rolled out to production soon -- so feel free to
> offer suggestions that only apply to 12.x.
>
> Thanks,
> Kevin

--
Angular momentum makes the world go 'round.
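The chunked-delete approach Kevin describes might look like the following
sketch (table, column, and batch size are hypothetical; the pause between
batches would live in the driving script):

    -- Delete at most 10,000 old rows per call, so the network/DRBD
    -- gets a chance to catch up between batches:
    DELETE FROM archive_log
    WHERE ctid IN (
        SELECT ctid
        FROM archive_log
        WHERE created_at < now() - interval '2 years'
        LIMIT 10000
    );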
Can I tell libpq to connect to the primary?
Hi

I'm setting up a new environment with a primary/hot standby replication
pair. For read-only clients, I have found the host=host1,host2 connection
string [1], which allows a connection when any one of the servers is up.

However, I'm unsure how to achieve something similar for read/write
clients. If I provide this list, the client will connect to the first
host, which might have become the standby in the meantime.

I see that in pgjdbc there are additional options for targetServerType =
any, primary, secondary, preferSlave and preferSecondary [2]. However,
this seems to be Java-specific and not implemented in libpq. Is there a
way to get this behaviour in PHP/Psycopg/Perl DBI?

If not, what is the best alternative to achieve this? My primary & standby
are not in the same L3 network, so moving an IP address around with Linux
OS clustering is not an option. I'm tending towards scripting the REST API
of our DNS service to point a CNAME at the new primary during promotion,
but maybe there is an easier way I haven't found yet? Maybe something in
pgpool/pgbouncer etc.?

Cheers
Christian

[1] https://www.postgresql.org/docs/current/libpq-connect.html
[2] https://jdbc.postgresql.org/documentation/head/connect.html
Re: how to slow down parts of Pg
On 2020-04-21 21:16:57 +, Kevin Brannen wrote:
> From: Michael Loftis
> > drbdsetup allows you to control the sync rates.
>
> I was hoping not to have to do that, but the more I think about this I'm
> realizing that it won't hurt because the network cap is effectively
> limiting me anyway. :)

Alternatively, you might consider traffic shaping. DRBD can only set a
fixed limit (because it knows only about its own traffic). Traffic shaping
can adjust the limit depending on other traffic (it can also prioritize
traffic, etc.).

However, to be effective, it needs to run on a router as close to the
bottleneck as possible; typically that means either the border router or
the firewall. So it is something the customer's network guy should set up.

hp

--
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"Story must make more sense than reality."
    -- Charles Stross, "Creative writing challenge!"
Re: How do terciles, percentiles & the percentile_cont() function work?
On 2020-04-22 10:00:49 +0200, PALAYRET Jacques wrote:
> From a table x(c1) containing 30 lines with integer values (column c1)
> from 1 to 30:
>
> SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
>  percentile_cont
> -----------------
>             10.7
> (1 ligne)
>
> SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
>  percentile_cont
> -----------------
>             20.3
> (1 ligne)

Think of the 30 values as points in a graph: the x values run from 0 to 29,
the y values are your values. The points are connected with straight lines.
Since your y values just increase by 1, you have a straight line from
(0, 1) to (29, 30).

Then percentile_cont(1./3) is the y value 1/3 of the way along that line,
and percentile_cont(2./3) is the y value 2/3 of the way along it. So you
need to find the y values corresponding to x = 29 * (1/3) = 9.667 and
x = 29 * (2/3) = 19.333. These are obviously 10.667 and 20.333,
respectively.

hp

--
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"Story must make more sense than reality."
    -- Charles Stross, "Creative writing challenge!"
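This is easy to reproduce without creating a table, using generate_series
in place of x:

    SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) AS p33,
           percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) AS p67
    FROM generate_series(1, 30) AS t(c1);
    -- p33 = 10.666..., p67 = 20.333...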
Re: Can I tell libpq to connect to the primary?
On 22.04.20 21:10, Christian Ramseyer wrote:
>
> I see that in pgjdbc there are additional options for targetServerType =
> any, primary, secondary, preferSlave and preferSecondary [2]. However,
> this seems to be Java-specific and not implemented in libpq. Is there a
> way to get this behaviour in PHP/Psycopg/Perl DBI?

Never mind; after RTFM'ing to the very end of
https://www.postgresql.org/docs/current/libpq-connect.html I have
discovered

    target_session_attrs=read-write|any

which seems to do exactly what I want.

Cheers
Christian
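For anyone finding this thread later, a minimal read/write connection
string of that sort (host names are hypothetical) looks like:

    host=pg1.example.com,pg2.example.com dbname=app target_session_attrs=read-write

As I understand it, libpq tries the hosts in order and, on these server
versions, keeps only a connection where SHOW transaction_read_only reports
"off", so after a failover clients end up on the new primary. Since the
PHP drivers, Psycopg and Perl's DBD::Pg all sit on top of libpq, the same
string works from those languages too.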