Extract transactions from wals ??
Hi, due to a set of bugs and a wrong manipulation, an inappropriate update has been applied to a production DB. Since then, quite a long series of valuable inserts and updates has been done and needs to be kept. Obviously, restoring a backup and applying PITR will get us to just before the offending update. Now we need a way of extracting, either from the ex-prod DB or from the WALs, the "good" transactions so that we can re-apply them. This already happened once on a production Oracle DB, and recovery was possible with a: select * from table_name AS OF TIMESTAMP TO_TIMESTAMP('09052019 0900','DDMMYYYY HH24MI'); to get back most of what was done after the problem. As we are currently moving away from Oracle, we must prove to the business people that our new Postgres environment is fine. So... any idea? thanks, Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: Extract transactions from wals ??
Hi Laurenz, you say "extract the data you need". That is exactly the point of my question, as the PITR step was obvious. How do I figure out "what the data I need" is?? The timestamp feature in Oracle provided exactly that: get all modifications of a given table that occurred within a given timeframe. Quite clearly, an option for the future would be to modify ALL tables, adding a timestamp column and a trigger to fill/update it. A tad boring to do... This is why I was wondering whether another possibility exists, like getting, from the WALs, a list of modified objects. So?? regards, Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Thu, Nov 21, 2019 at 3:54 PM Laurenz Albe wrote: > On Thu, 2019-11-21 at 14:50 +0100, Marc Millas wrote: > > due to a set of bugs and wrong manip, an inappropriate update have been > done into a production DB. > > After that, quite a long set of valuables inserts and updates have been > done and needs to be kept. > > Obviously getting a backup and applying pitr will get us just before the > offending update. > > Now, we need to find a way of extracting, either from the ex prod db, or > from the wals, the "good" transactions to be able to re-apply them. > > > > This did already happen on a Prod Oracle DB, and recovering was > possible with a : > > select * from table_name AS OF TIMESTAMP TO_TIMESTAMP('09052019 > 0900','MMDD HH24MI'); > > to get most things done after the problem. > > As we are currently moving out of Oracle, we must prove to the business > people that our new postgres env is fine. > > So, ... any idea ? > > Sure. > > Restore a backup and perform point-in-time-recovery. > Then extract the data you need. > > Yours, > Laurenz Albe > -- > Cybertec | https://www.cybertec-postgresql.com > >
Re: Extract transaction from WAL
Hi, funny enough, this pb looks similar to mine. the point is: how to guess: what is the "data I need" ?? Looks like we are looking for a way to ask postgres: which transactions have occurred between this and that. Obviously, if we can have, online, both the db after the PITR and the db "corrupted" we can try to create a dblink from one to the other and, then try to extract the "differences". but this is not always possible. hence the question about wals. or ? regards, Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Thu, Nov 21, 2019 at 3:24 PM Michael Loftis wrote: > > > On Thu, Nov 21, 2019 at 04:56 Jill Jade wrote: > >> Hello everyone, >> >> I am new to Postgres and I have a query. >> >> I have updated a table which I should not have. >> >> Is there a way to extract the transactions from the WAL and get back the >> previous data? >> >> Is there a tool that can help to get back the transactions? >> > > The normal way is to use a backup along with point in time recovery. But > this requires you’ve setup backups and are archiving WALs F/ex with > pgbackrest. You restore the last full backup from before the incident and > play back to a time stamp or transaction ID. Either to the original server > or elsewhere...in this case I would probably restore elsewhere and extract > the data I needed using tools like pg_dump to restore the selected data. > > I’m personally unaware of other methods which may exist. > >> >> Thanks in advance. >> >> Regards, >> Jill >> >> >> -- > > "Genius might be described as a supreme capacity for getting its possessors > into trouble of all kinds." > -- Samuel Butler >
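For reference, the PITR step Michael describes comes down to restoring a base backup and giving recovery a target; a minimal sketch for PostgreSQL 12+ (the archive path and the timestamp below are placeholders):

# postgresql.conf of the restored copy
restore_command = 'cp /path/to/wal_archive/%f %p'
recovery_target_time = '2019-11-20 09:00:00'
recovery_target_action = 'pause'
# and create an empty recovery.signal file in the data directory before starting

On releases before 12 the same settings go into recovery.conf instead. Once recovery pauses at the target, the needed rows can be dumped (pg_dump, COPY) and re-applied to the production database.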
Re: Extract transactions from wals ??
Hi Laurenz, I was writing select from "table" as a template; we have to do this for a bunch of tables. So, to my understanding, what you suggest is to PITR up to the first timestamp, extract all meaningful tables, and then PITR to the second timestamp, so as to be able to script a kind of "diff" between the two and get what I want. Yes ? Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Thu, Nov 21, 2019 at 5:16 PM Laurenz Albe wrote: > On Thu, 2019-11-21 at 17:07 +0100, Marc Millas wrote: > > you say "extract the data you need" > > That is exactly the point of my question, as the PITR step was obvious. > > How to guess "what is the data" I need ?? > > Well, you asked for the contents of a table AS OF TIMESTAMP . > > That means you know which table and which timestamp. > > So restore the PostgreSQL cluster to that timestamp, connect > and SELECT from the table. > > Yours, > Laurenz Albe > -- > Cybertec | https://www.cybertec-postgresql.com > >
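For illustration, a minimal sketch of the kind of "diff" described here, assuming the two restored states of a given table have been brought into one scratch database as two schemas (snap_t1 and snap_t2 are hypothetical names, loaded e.g. via pg_dump/pg_restore or postgres_fdw):

-- rows present after the incident but not before (inserts and new row versions):
SELECT * FROM snap_t2.matable
EXCEPT
SELECT * FROM snap_t1.matable;

-- rows that existed before but were changed or deleted afterwards:
SELECT * FROM snap_t1.matable
EXCEPT
SELECT * FROM snap_t2.matable;

EXCEPT compares whole rows, so this needs adapting for columns without an equality operator (json, for instance) and says nothing about which transaction made each change; it is a starting point rather than a general tool.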
Re: Extract transactions from wals ??
Yes! We are looking for something providing a functionality similar to Oracle's :-) Through PITR, or a tool or extension around the WALs. Still, as the WALs contain enough information for replication to work, it should be possible to extract from them a list of objects that have been written to, and some elements about what was written. The OS files written and the positions within them should be enough to provide this. Something ? thanks, Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Fri, Nov 22, 2019 at 11:02 AM Ganesh Korde wrote: > I think he wants to see data from different tables at different timestamp > (like flashback query in Oracle). As per my understanding question here is > can PITR be done for specific table and for specific timestamp. > > On Fri, Nov 22, 2019 at 2:37 PM Laurenz Albe > wrote: > >> On Thu, 2019-11-21 at 17:35 +0100, Marc Millas wrote: >> > I was writing select from ""table"" as a template. We have to do this >> for a bunch of tables. >> > So, to my understanding, what you suggest is to PITR up to the first >> timestamp, >> > extract all meaningfull tables, and then pitr to the second timestamp >> > so as to be able to script a kind of "diff" between the 2 to get what I >> want. >> >> Sure, you can do that. >> >> The description of what you wanted to do was rather unclear, all I could >> make out is that you want to query AS OF TIMESTAMP. >> >> Yours, >> Laurenz Albe >> -- >> Cybertec | https://www.cybertec-postgresql.com >> >> >> >>
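One possible way to get the rough "list of objects that have been written" asked for here is pg_waldump (pg_xlogdump before version 10), which decodes WAL segment files into readable records; a sketch, with hypothetical segment names, assuming the relevant WAL files are still available:

pg_waldump 000000010000000000000010 000000010000000000000020
# each record shows its resource manager, transaction id and the relation
# as tablespace/database/relfilenode; pg_filenode_relation() maps that back
# to a table name from inside the database
pg_waldump --stats 000000010000000000000010 000000010000000000000020

This tells which relations and transactions wrote what and where, but not the row contents; recovering actual row data from WAL generally needs logical decoding, which requires wal_level = logical to have been set before the incident.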
install postgres
Hi, I may be overlooking something obvious... I am just looking, on the download pages of postgresql.org, for a way to download the RPMs (for RedHat 6 and RedHat 7 x64 machines): NOT the noarch repo package, but the full distribution. The reason is that I have to install Postgres on a network with NO internet access. Thanks, Marc MILLAS Senior Architect +33607850334 www.mokadb.com
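One common approach (a sketch, not an official procedure): on an internet-connected machine of the same OS release, add the PGDG yum repository and download the packages plus their dependencies without installing, then copy them to the isolated network. The package names below are examples:

# RHEL/CentOS 7, using yum-utils
yumdownloader --resolve --destdir=/tmp/pgdg postgresql12-server postgresql12-contrib
# or: yum install --downloadonly --downloaddir=/tmp/pgdg postgresql12-server
# then, on the offline host:
yum localinstall /tmp/pgdg/*.rpm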
table name
Sorry if my question is too simple :-) I got a shapefile from the French government. I imported it with the PostGIS 3.0.1 utility. Fine! The table created by this utility is named regions-20180101, with the dash in the middle. I see that table name in pg_class and also in the geometry_columns view. Obviously, if I ask: select * from regions-20180101; I get a syntax error. If I try select * from $$regions_20180101$$; I get another syntax error. If I try to rename that table, same thing. If I try a CTE, same thing. What should I do ? thanks, Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: table name
damn.. thanks Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Thu, Jun 11, 2020 at 8:55 PM Paul Ramsey wrote: > ALTER TABLE "regions-20180101" rename to regions_20180101; > > > > On Jun 11, 2020, at 11:54 AM, Marc Millas > wrote: > > > > sorry if my question is tooo simple :-) > > > > I got a shapefile from the french gov. > > I import it with postgis 3.01 utility. > > fine ! > > the table created by this utility is named regions-20180101 > > with the dash in the middle. > > I see that table name in pg_class, and, also, in the geometry_columns > view. > > > > > > obviously if I ask: > > select * from regions-20180101; > > I get a syntax error. > > if I try select * from $$regions_20180101$$; > > I get another syntax error. > > If I try to rename that table, same thing. > > if I try a cte, same thing. > > > > What should I do ? > > > > thanks, > > > > > > Marc MILLAS > > Senior Architect > > +33607850334 > > www.mokadb.com > > > >
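For the record, why the earlier attempts failed: a name containing a dash has to be written as a quoted identifier in double quotes; dollar quoting ($$...$$) creates string literals, never identifiers. So before the rename the table could be queried as:

select * from "regions-20180101";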
some random() clarification needed
Hi, when, in psql, on a Postgres 12.3, I write: select ceiling(random()*2582); it does provide the expected answer, i.e. a number between 1 and 2582, inclusive. Always. When I decide to use this to get a random row within a table prenoms having 2 columns, an id serial and a prenom varchar, with exactly 2582 rows and no gaps, I write: select id, prenom from prenoms where id=ceiling(random()*2582); expecting to always get one row. But it's not the case. Around 15% of the time I get 0 rows, which is already quite strange to me. And 10% of the time, I get a random number of rows, so far up to 4. Even weirder (to me!). So, can someone please clarify ? thanks, regards, Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: some random() clarification needed
Ok, thanks for the clarification. Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jul 14, 2020 at 5:24 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Tue, Jul 14, 2020 at 8:15 AM Marc Millas > wrote: > >> select id, prenom from prenoms where id=ceiling(random()*2582); >> >> expecting to get, allways, one line. >> But its not the case. >> around 15% of time I get 0 lines which is already quite strange to me. >> but 10% of time, I get a random number of lines, until now up to 4. >> even weirder (to me !) >> >> so, can someone please clarify ? >> >> > You are basically asking: > > For each row in my table compare the id to some random number and if they > match return that row, otherwise skip it. The random number being compared > to is different for each row because random() is volatile and thus > evaluated for each row. > > David J. >
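A quick way to see the per-row evaluation David describes, using only built-ins:

select random() from generate_series(1, 3);            -- three different values
select (select random()) from generate_series(1, 3);   -- the same value three times

The uncorrelated sub-select is evaluated once and its result reused for every row, which is the behaviour the original query was expecting.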
Re: some random() clarification needed
Hi, your answer helps me understand my first problem. So I rewrote a simple loop so as to avoid the "volatile" behaviour (at least I thought I did... looks like I was wrong!). Step-by-step loop:

DO $$
BEGIN
  FOR counter IN 1..1000 LOOP
    begin
      declare
        id1 integer = ceiling(random()*2582);
        id3 date = '2000-01-01';
        id2 date;
        pren varchar;
      begin
        id2 = id3 + (random()*7200)::integer;
        SELECT prenom FROM prenoms WHERE id = id1 into pren;
        INSERT INTO testparttransac (datenaissance, prenoms) values (id2, pren);
      end;
    end;
  END LOOP;
END;
$$;

I truncated the table, executed the loop with no errors, and expected that a select count(*) would answer 1000. No: it varies, from less than 1000 (much less, something like 900) to more than 1000 (up to 1094). So... what is "volatile" in the loop ? BTW the testparttransac table is partitioned on datenaissance, with a default partition. thanks, regards, Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jul 14, 2020 at 5:24 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Tue, Jul 14, 2020 at 8:15 AM Marc Millas > wrote: > >> select id, prenom from prenoms where id=ceiling(random()*2582); >> >> expecting to get, allways, one line. >> But its not the case. >> around 15% of time I get 0 lines which is already quite strange to me. >> but 10% of time, I get a random number of lines, until now up to 4. >> even weirder (to me !) >> >> so, can someone please clarify ? >> >> > You are basically asking: > > For each row in my table compare the id to some random number and if they > match return that row, otherwise skip it. The random number being compared > to is different for each row because random() is volatile and thus > evaluated for each row. > > David J. >
how to "explain" some ddl
Hi, We would like to understand where an ALTER TABLE ATTACH PARTITION spends its time; to my understanding, EXPLAIN doesn't cover this. For a BI job we have a partitioned table with 1800+ partitions. The feeding process of this table leads to detaching and attaching partitions, and attaching does take time, something like 12 seconds per partition. We need to understand where that time is spent (check constraints, or the check within the default partition, or...). So, how to ? thanks, regards, Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: how to "explain" some ddl
Hi, thanks for the answer. The problem is that the fact table does get modifications for "old" data, so the current scheme implies truncating partitions, recreating them, copying from the ODS to the DM, etc., which is still better than (tens of) millions of deletes plus vacuuming. That is why the partitioning scheme is based on one day's data, hence the 1800+. The other problem we have is the very long planning time for most requests: it was 120 sec on version 11, down to 60 sec on 12, versus an execution time of around 4 sec. It looks like the number of indexes has a paramount impact. Can you point me to any doc about optimizing the index scheme for a fact table with 40 dimensions ? thanks regards, Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jul 14, 2020 at 7:05 PM Tom Lane wrote: > Marc Millas writes: > > We would like to understand where an alter table attach partition spend > its > > time. > > to my understanding, explain doesnt do this. > > Nope :-(. As our DDL commands have gotten more complicated, there's > been some discussion of adding that, but nothing's really been done > yet. > > There is some progress-monitoring support for some DDL commands now, > but that's not quite the same thing. > > > for a BI job we have a partitionned table with 1800+ partitions. > > TBH I'd recommend scaling that down by at least a factor of ten. > We are not at a point where you can expect that all operations will > be cheap even with thousands of partitions. We may never be at that > point, although people continue to chip away at the bottlenecks. > > regards, tom lane >
Re: how to "explain" some ddl
Hi Tom, a few tests later. Looks like when you add a partition as default, all tupples of it are read, even if there is an index on the column that is the partition key. this do explain our attach time. We are going to clean the default partition... regards, Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jul 14, 2020 at 7:05 PM Tom Lane wrote: > Marc Millas writes: > > We would like to understand where an alter table attach partition spend > its > > time. > > to my understanding, explain doesnt do this. > > Nope :-(. As our DDL commands have gotten more complicated, there's > been some discussion of adding that, but nothing's really been done > yet. > > There is some progress-monitoring support for some DDL commands now, > but that's not quite the same thing. > > > for a BI job we have a partitionned table with 1800+ partitions. > > TBH I'd recommend scaling that down by at least a factor of ten. > We are not at a point where you can expect that all operations will > be cheap even with thousands of partitions. We may never be at that > point, although people continue to chip away at the bottlenecks. > > regards, tom lane >
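For reference, the documented way to avoid these scans when attaching (the names below are hypothetical): give the table being attached a CHECK constraint that already matches the future partition bound, so the validation scan of that table can be skipped; and when a default partition exists, either emptying it (as planned above) or giving it a constraint that excludes the new range avoids the scan of the default partition as well.

-- on the partition about to be attached:
ALTER TABLE dm_20200714 ADD CONSTRAINT dm_20200714_chk
  CHECK (datejour >= DATE '2020-07-14' AND datejour < DATE '2020-07-15');

ALTER TABLE facts ATTACH PARTITION dm_20200714
  FOR VALUES FROM ('2020-07-14') TO ('2020-07-15');

-- the helper constraint can be dropped once the partition is attached:
ALTER TABLE dm_20200714 DROP CONSTRAINT dm_20200714_chk;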
Re: some random() clarification needed
Thanks! makes it clearer :-) its not that obvious to guess the consequences of the "volatile" behaviour. regards, Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Wed, Jul 15, 2020 at 1:53 AM David Rowley wrote: > On Wed, 15 Jul 2020 at 04:01, Marc Millas wrote: > > your answer helps me understand my first problem. > > so, I rewrote a simple loop so as to avoid the "volatile" behaviour. > > Not sure what you're trying to do with the plpgsql, but you can just > escape the multiple evaluations by putting the volatile function in a > sub-select with no FROM clause. > > SELECT ... FROM ... WHERE id = (SELECT ceiling(random()*2582)); > > Or the more traditional way to get a random row is: > > SELECT ... FROM ... WHERE id BETWEEN 0 AND 2585 ORDER BY random() LIMIT 1; > > David >
ransomware
Hi, I have been asked the following question: is there any way, from within Postgres, to detect any "abnormal" disk writing activity ? The obvious goal would be to alert if... It is quite clear that the underlying OS is the place to do such checks, but still: --to my understanding, a simple script can check various internal counters, but that only helps if the "undesired" software uses Postgres itself to do the encrypting (any experience with this ???) --another approach would be based on the fact that, if anything changes any Postgres file (data, current WAL, ...), Postgres should somehow "hang". There are various ways to do those checks, but I was wondering if any "standard" solution exists within the Postgres ecosystem, or whether someone has any feedback on the topic. thanks for your help Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: ransomware
Hi, I know it's quite general. It is, because I don't know what approaches may exist. The requirement is extremely simple: is there any way, from a running Postgres standpoint, to become aware that ransomware is currently encrypting your data ? The answer can be as simple as: when Postgres crashes. Something else ? Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Feb 2, 2021 at 2:37 AM Michael Paquier wrote: > On Mon, Feb 01, 2021 at 03:38:35PM +0100, Marc Millas wrote: > > there are various ways to do those checks but I was wandering if any > > ""standard''" solution exist within postgres ecosystem, or someone do > have > > any feedback on the topic. > > It seems to me that you should first write down on a sheet of paper a > list of all the requirements you are trying to satisfy. What you are > describing here is a rather general problem line, so nobody can help > without knowing what you are trying to achieve, precisely. > -- > Michael >
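One thing that does exist inside Postgres and is relevant here (a sketch, assuming data checksums were enabled when the cluster was initialized): on PostgreSQL 12+, pages rewritten by an external process tend to show up as checksum failures the next time they are read, and those failures are counted per database:

SELECT datname, checksum_failures, checksum_last_failure
FROM pg_stat_database
WHERE checksum_failures > 0;

Monitoring that view (and the server log for checksum errors) catches silent corruption of data files, though only at read time; it is not a real-time write detector.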
prepare in a do loop
Hi, in psql, with a Postgres 12.5 DB on a CentOS 7 Intel box: I create a function named randname() returning a varchar, and a table matable with a column prenoms varchar(50). Then:

postgres=# prepare moninsert(varchar) as
postgres-# insert into matable(prenoms) values($1);
PREPARE

I test it:

postgres=# execute moninsert(randname());
INSERT 0 1

Up to now, everything is fine. Then:

do $$ begin for counter in 1..100 loop execute moninsert(randname()); end loop; end; $$;
ERREUR: la fonction moninsert(character varying) n'existe pas
LIGNE 1 : SELECT moninsert(randname())
(i.e. the function moninsert(character varying) does not exist)

Can someone explain ? thanks (it's a French DB, so the error messages are in French :-) Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: prepare in a do loop
Hi Tom, I did read the doc, and I understand the caching behaviour of plpgsql. If in psql I write:

begin; execute moninsert(randname()); execute moninsert(randname()); end;

it does work. And if I put this (begin execute end) inside a do loop, it doesn't work anymore. OK, the begin execute end is "pure" SQL, and the same thing within a do loop is plpgsql. So:

postgres=# create function testexec() returns void as $$
postgres$# execute moninsert(randname());
postgres$# end;
postgres$# $$ language plpgsql;
ERREUR: erreur de syntaxe sur ou près de « execute »
LIGNE 2 : execute moninsert(randname());
(syntax error at or near « execute »)

Fine, quite coherent. Then:

postgres=# create function testexec() returns void as $$ execute moninsert(randname()); end; $$ language sql;
CREATE FUNCTION

As SQL, legal syntax... OK, but:

postgres=# select testexec();
ERREUR: COMMIT n'est pas autorisé dans une fonction SQL
CONTEXTE : fonction SQL « testexec » lors du lancement
(COMMIT is not allowed in a SQL function; context: SQL function « testexec » during startup)

A bit more difficult to understand as such (where is the commit ??). So... the prepare/execute thing can only be used in embedded SQL, not in any plpgsql or SQL function. The doc states: "The SQL standard includes a PREPARE statement, but it is only for use in embedded SQL. This version of the PREPARE statement also uses a somewhat different syntax." ??? Where is the difference for the prepare-context thing (I don't mean the different-syntax part) ?? thanks for clarification Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Mon, Feb 15, 2021 at 5:27 PM Tom Lane wrote: > Marc Millas writes: > > in psql, with a postgres 12.5 db on a centos 7 intel: > > I do create a function named randname() returning a varchar, and a table > > matable with a column prenom varchar(50). then > > postgres=# prepare moninsert(varchar) as > > postgres-# insert into matable(prenoms) values($1); > > PREPARE > > > I test it: > > postgres=# execute moninsert(randname()); > > INSERT 0 1 > > > up to now, everything fine. then: > > do $$ begin for counter in 1..100 loop execute > > moninsert(randname());end loop;end;$$; > > ERREUR: la fonction moninsert(character varying) n'existe pas > > LIGNE 1 : SELECT moninsert(randname()) > > > someone can explain ? > > EXECUTE in plpgsql has nothing whatever to do with the SQL EXECUTE > command. See the respective documentation. > > You don't actually need to use SQL PREPARE/EXECUTE in plpgsql. > If you just write "insert into ..." as a command in a > plpgsql function, it's automatically prepared behind the scenes. > Indeed, one of the common uses for plpgsql's EXECUTE is to stop > a prepared plan from being used when you don't want that ... so > far from being the same thing, they're more nearly opposites. > Perhaps a different name should have been chosen, but we're > stuck now. > > regards, tom lane >
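To restate Tom's point with the names used in this thread: inside plpgsql there is no need for the SQL-level PREPARE at all, since a plain statement is prepared behind the scenes on first execution; a minimal sketch:

do $$
begin
  for counter in 1..100 loop
    -- this INSERT is planned once and its plan cached for the session
    insert into matable(prenoms) values (randname());
  end loop;
end;
$$;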
postgis
Hi, I would like to install PostGIS 3.0.4 on a Debian 11. Digging into various web sites, I didn't find the name of that package. Can someone help ? thanks Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: postgis
Hi, postgres 12. I may use whatever repo. I install postgres from postgresql.org... Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Mon, Jul 18, 2022 at 7:15 PM Adrian Klaver wrote: > On 7/18/22 10:08, Marc Millas wrote: > > Hi, > > > > I would like to install postgis 3.04 on a debian 11. > > > > digging into various web sites, I didnt found the name of that packet. > > > > can someone help ? > > Should have added to previous response, what version of Postgres? > > > > thanks > > > > Marc MILLAS > > Senior Architect > > +33607850334 > > www.mokadb.com <http://www.mokadb.com> > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: postgis
Thanks Adrian still, I see a bunch of files, and nothing that can be installed via apt-get install. The only "things" big enough to contain a postgis distrib, like https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgis/postgis_3.0.3%2Bdfsg.orig.tar.xz can obviously not be installed via apt. what am I missing ?? Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jul 19, 2022 at 12:09 AM Adrian Klaver wrote: > On 7/18/22 11:48, Marc Millas wrote: > > Hi, > > postgres 12. > > I may use whatever repo. > > > > I install postgres from postgresql.org... > > You pretty much need to install from the same repo for PostGIS. > Otherwise you will most likely run into compatibility issues. > > From here: > > https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgis/ > > I only see 3.0.1 and 3.0.3 as options. > > A more complete answer will need to come from one of the maintainers. > > > > > > > > > Marc MILLAS > > Senior Architect > > +33607850334 > > www.mokadb.com <http://www.mokadb.com> > > > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: postgis
just... nothing ! Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jul 19, 2022 at 6:36 PM Adrian Klaver wrote: > On 7/19/22 9:01 AM, Marc Millas wrote: > > Thanks Adrian > > > > still, I see a bunch of files, and nothing that can be installed via > > apt-get install. > > The only "things" big enough to contain a postgis distrib, like > > > https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgis/postgis_3.0.3%2Bdfsg.orig.tar.xz > > < > https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgis/postgis_3.0.3%2Bdfsg.orig.tar.xz > > > > > > can obviously not be installed via apt. > > > > > what am I missing ?? > > What does: > > apt-cache search postgresql-12-postgis > > return? > > > > > > > Marc MILLAS > > Senior Architect > > +33607850334 > > www.mokadb.com <http://www.mokadb.com> > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: postgis
It is. I do begin with the postgres site script: sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' so... Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jul 19, 2022 at 6:54 PM Adrian Klaver wrote: > On 7/19/22 9:51 AM, Marc Millas wrote: > > just... nothing ! > > I thought you said you used the Postgres community repo to install > Postgres. > > Is that not the case? > > > > > > Marc MILLAS > > Senior Architect > > +33607850334 > > www.mokadb.com <http://www.mokadb.com> > > > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: postgis
Hi, from your message, I understand that for debian 11, I can NOT get any 3.0.x version. right ? Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jul 19, 2022 at 7:20 PM Imre Samu wrote: > > I would like to install postgis 3.04 on a debian 11 > > postgres 12. > > I may use whatever repo. > > I install postgres from postgresql.org... > > As I see - from the official postgresql.org debian11 repo, > you can only install the "postgresql-12-postgis-3" package ( now: > Postgis=3.2.1 ) > > docker run -it --rm postgres:12 bash > apt update && apt search postgis | grep postgresql-12-postgis-3 >---> *postgresql-12-postgis-3/bullseye-pgdg 3.2.1+dfsg-1.pgdg110+1 > amd64* > cat /etc/os-release | grep VERSION >---> VERSION="11 (bullseye) > > Now the latest 3.0 version is http://postgis.net/2022/02/02/postgis-3.0.5/ > > regards, > Imre > > Marc Millas ezt írta (időpont: 2022. júl. 18., > H, 20:48): > >> Hi, >> postgres 12. >> I may use whatever repo. >> >> I install postgres from postgresql.org... >> >> >> >> Marc MILLAS >> Senior Architect >> +33607850334 >> www.mokadb.com >> >> >> >> On Mon, Jul 18, 2022 at 7:15 PM Adrian Klaver >> wrote: >> >>> On 7/18/22 10:08, Marc Millas wrote: >>> > Hi, >>> > >>> > I would like to install postgis 3.04 on a debian 11. >>> > >>> > digging into various web sites, I didnt found the name of that packet. >>> > >>> > can someone help ? >>> >>> Should have added to previous response, what version of Postgres? >>> > >>> > thanks >>> > >>> > Marc MILLAS >>> > Senior Architect >>> > +33607850334 >>> > www.mokadb.com <http://www.mokadb.com> >>> > >>> >>> >>> -- >>> Adrian Klaver >>> adrian.kla...@aklaver.com >>> >>
Re: postgis
I did run each step of the script and did install a postgres 12.11. then destroyed the instance created by the script, and, then pg_createcluster a new one, which is running fine. Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jul 19, 2022 at 8:42 PM Adrian Klaver wrote: > On 7/19/22 11:19, Marc Millas wrote: > > It is. I do begin with the postgres site script: > > > > sudo sh -c 'echo "debhttp://apt.postgresql.org/pub/repos/apt < > http://apt.postgresql.org/pub/repos/apt> $(lsb_release -cs)-pgdg main" > > /etc/apt/sources.list.d/pgdg.list' > > > > > > so... > > Have you run?: > > sudo apt update > > > > > > > > Marc MILLAS > > Senior Architect > > +33607850334 > > www.mokadb.com <http://www.mokadb.com> > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: postgis
Postgres installed, but not postgis.. which is why I need some help... Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jul 19, 2022 at 11:43 PM Adrian Klaver wrote: > On 7/19/22 2:09 PM, Marc Millas wrote: > > I did run each step of the script and did install a postgres 12.11. > > then destroyed the instance created by the script, and, then > > pg_createcluster a new one, which is running fine. > > Does this mean you have PostGIS installed now? > > > Marc MILLAS > > Senior Architect > > +33607850334 > > www.mokadb.com <http://www.mokadb.com> > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: postgis
??? I did describe precisely what I did: On 7/19/22 2:09 PM, Marc Millas wrote: > > I did run each step of the script and did install a postgres 12.11. > > then destroyed the instance created by the script, and, then > > pg_createcluster a new one, which is running fine. no error messages.. Postgres is fine. My question i(from the beginning) s about the availability of a *postgis *3.0.x distro for postgres 12 on debian 11. and, if yes, the name of the package, and the @ of the repo. To my understanding, such a *postgis* distro does not exist in the postgresql.org repos so through this mail list, I was trying to know if it may exist somewhere. I know that I am supposed to post such a question on the postgis mail list. But.. I register thru the postgis web site, and didnt get any answer. ... Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Wed, Jul 20, 2022 at 10:25 AM Ron wrote: > You've never shown us *exactly what you did*, along with any *error > messages*. > > On 7/19/22 22:07, Marc Millas wrote: > > Postgres installed, but not postgis.. which is why I need some help... > > Marc MILLAS > Senior Architect > +33607850334 > www.mokadb.com > > > > On Tue, Jul 19, 2022 at 11:43 PM Adrian Klaver > wrote: > >> On 7/19/22 2:09 PM, Marc Millas wrote: >> > I did run each step of the script and did install a postgres 12.11. >> > then destroyed the instance created by the script, and, then >> > pg_createcluster a new one, which is running fine. >> >> Does this mean you have PostGIS installed now? >> >> > Marc MILLAS >> > Senior Architect >> > +33607850334 >> > www.mokadb.com <http://www.mokadb.com> >> >> >> -- >> Adrian Klaver >> adrian.kla...@aklaver.com >> > > -- > Angular momentum makes the world go 'round. >
Re: postgis
??? I didnt get any error, as I dont know the name of the package to be installed !!! my question was, and still is: "Hi, I would like to install postgis 3.04 on a debian 11. digging into various web sites, I didnt found the name of that packet. can someone help ?" so.. the only info on the debian postgis page I was able to find is the name of a package: postgres-12-postgis-3 which do install a postgis 3.2.1 not a postgis 3.0.x Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Wed, Jul 20, 2022 at 3:52 PM Ron wrote: > This long drama is about *POSTGIS*, not Postgresql. What error do you > get when trying to install *POSTGIS*? > > On 7/20/22 08:26, Marc Millas wrote: > > ??? > > I did describe precisely what I did: > > On 7/19/22 2:09 PM, Marc Millas wrote: >> > I did run each step of the script and did install a postgres 12.11. >> > then destroyed the instance created by the script, and, then >> > pg_createcluster a new one, which is running fine. > > no error messages.. Postgres is fine. > > My question i(from the beginning) s about the availability of a *postgis > *3.0.x > distro for postgres 12 on debian 11. > and, if yes, the name of the package, and the @ of the repo. > > To my understanding, such a *postgis* distro does not exist in the > postgresql.org repos > so through this mail list, I was trying to know if it may exist somewhere. > > I know that I am supposed to post such a question on the postgis mail list. > But.. I register thru the postgis web site, and didnt get any answer. > ... > > Marc MILLAS > Senior Architect > +33607850334 > www.mokadb.com > > > > On Wed, Jul 20, 2022 at 10:25 AM Ron wrote: > >> You've never shown us *exactly what you did*, along with any *error >> messages*. >> >> On 7/19/22 22:07, Marc Millas wrote: >> >> Postgres installed, but not postgis.. which is why I need some help... >> >> Marc MILLAS >> Senior Architect >> +33607850334 >> www.mokadb.com >> >> >> >> On Tue, Jul 19, 2022 at 11:43 PM Adrian Klaver >> wrote: >> >>> On 7/19/22 2:09 PM, Marc Millas wrote: >>> > I did run each step of the script and did install a postgres 12.11. >>> > then destroyed the instance created by the script, and, then >>> > pg_createcluster a new one, which is running fine. >>> >>> Does this mean you have PostGIS installed now? >>> >>> > Marc MILLAS >>> > Senior Architect >>> > +33607850334 >>> > www.mokadb.com <http://www.mokadb.com> >>> >>> >>> -- >>> Adrian Klaver >>> adrian.kla...@aklaver.com >>> >> >> -- >> Angular momentum makes the world go 'round. >> > > -- > Angular momentum makes the world go 'round. >
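For completeness, a quick way to see exactly which versions a configured apt repository offers for that package (same tooling as the apt-cache command used earlier in the thread):

apt-cache madison postgresql-12-postgis-3
# or
apt list -a postgresql-12-postgis-3

On Debian 11 with the pgdg repository this lists only the 3.2.x build, which matches what is described above.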
Re: postgis
Thanks for your answer. I would like to avoid compiling as much as possible. I know that postgis 3.2.1 is available and does install without pb. but.. That db run an app which is very long to test, so I need to stick to a postgis 3.0.x regards, Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Wed, Jul 20, 2022 at 4:16 PM jian he wrote: > > Can you try compiling from source: https://postgis.net/source/ > > postgis 3.2.1 is OK. postgresql & postgis version info: > https://trac.osgeo.org/postgis/wiki/UsersWikiPostgreSQLPostGIS > > if you already installed check postgis version: > https://postgis.net/docs/PostGIS_Version.html > > On Wed, Jul 20, 2022 at 7:37 PM Marc Millas > wrote: > >> ??? >> I didnt get any error, as I dont know the name of the package to be >> installed !!! >> >> my question was, and still is: >> "Hi, >> I would like to install postgis 3.04 on a debian 11. >> >> digging into various web sites, I didnt found the name of that packet. >> >> can someone help ?" >> >> so.. >> the only info on the debian postgis page I was able to find is the name >> of a package: >> postgres-12-postgis-3 which do install a postgis 3.2.1 >> not a postgis 3.0.x >> >> >> >> >> >> >> Marc MILLAS >> Senior Architect >> +33607850334 >> www.mokadb.com >> >> >> >> On Wed, Jul 20, 2022 at 3:52 PM Ron wrote: >> >>> This long drama is about *POSTGIS*, not Postgresql. What error do you >>> get when trying to install *POSTGIS*? >>> >>> On 7/20/22 08:26, Marc Millas wrote: >>> >>> ??? >>> >>> I did describe precisely what I did: >>> >>> On 7/19/22 2:09 PM, Marc Millas wrote: >>>> > I did run each step of the script and did install a postgres 12.11. >>>> > then destroyed the instance created by the script, and, then >>>> > pg_createcluster a new one, which is running fine. >>> >>> no error messages.. Postgres is fine. >>> >>> My question i(from the beginning) s about the availability of a *postgis >>> *3.0.x distro for postgres 12 on debian 11. >>> and, if yes, the name of the package, and the @ of the repo. >>> >>> To my understanding, such a *postgis* distro does not exist in the >>> postgresql.org repos >>> so through this mail list, I was trying to know if it may exist >>> somewhere. >>> >>> I know that I am supposed to post such a question on the postgis mail >>> list. >>> But.. I register thru the postgis web site, and didnt get any answer. >>> ... >>> >>> Marc MILLAS >>> Senior Architect >>> +33607850334 >>> www.mokadb.com >>> >>> >>> >>> On Wed, Jul 20, 2022 at 10:25 AM Ron wrote: >>> >>>> You've never shown us *exactly what you did*, along with any *error >>>> messages*. >>>> >>>> On 7/19/22 22:07, Marc Millas wrote: >>>> >>>> Postgres installed, but not postgis.. which is why I need some help... >>>> >>>> Marc MILLAS >>>> Senior Architect >>>> +33607850334 >>>> www.mokadb.com >>>> >>>> >>>> >>>> On Tue, Jul 19, 2022 at 11:43 PM Adrian Klaver < >>>> adrian.kla...@aklaver.com> wrote: >>>> >>>>> On 7/19/22 2:09 PM, Marc Millas wrote: >>>>> > I did run each step of the script and did install a postgres 12.11. >>>>> > then destroyed the instance created by the script, and, then >>>>> > pg_createcluster a new one, which is running fine. >>>>> >>>>> Does this mean you have PostGIS installed now? >>>>> >>>>> > Marc MILLAS >>>>> > Senior Architect >>>>> > +33607850334 >>>>> > www.mokadb.com <http://www.mokadb.com> >>>>> >>>>> >>>>> -- >>>>> Adrian Klaver >>>>> adrian.kla...@aklaver.com >>>>> >>>> >>>> -- >>>> Angular momentum makes the world go 'round. 
Re: postgis
right. so I scratch the debian vm, install a centos 7 and within minutes I have a postgres 12 with postgis 3.0.4 running. so easy. regards. Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Wed, Jul 20, 2022 at 7:27 PM Imre Samu wrote: > > I would expect the 35 packages implied by the version policies of those > two projects. > > Based on my docker-postgis support - the "geos" is also important. > Now Bullseye(Debian11) geos version is 3.9 - and this is likely to > continue until the end of the cycle ( so no upgrade expected to 3.10,3.11) > > And the (next) Postgis 3.3.0 Release is not enabling all new features > with the current Bullseye - Geos version: > https://git.osgeo.org/gitea/postgis/postgis/raw/tag/3.3.0beta2/NEWS > > *"This version requires PostgreSQL 11 or higher, GEOS 3.6 or higher, and > Proj 5.2+.* > > *Additional features are enabled if you are running GEOS 3.9+ST_MakeValid > enhancements with 3.10+, * > *numerouse additional enhancements with GEOS 3.11+. * > *Requires SFCGAL 1.4.1+ for ST_AlphaShape and ST_OptimalAlphaShape.* > *"* > > And Postgis 3.2 also has some enhancements working only with geos 3.10+ ( > ST_MakeValid enhancements ) > And "Bookworm" Debian12 expected >= mid-2023. > so not easy ... > > Imre > > > David G. Johnston ezt írta (időpont: 2022. > júl. 20., Sze, 18:31): > >> On Wed, Jul 20, 2022 at 9:21 AM Imre Samu wrote: >> >>> > My general impression is that the packaging, at least for Debian, >>> > doesn’t actually understand how the PostGIS project handles versioning >>> support. >>> > But i may be missing something >>> >>> "PostGIS Pre-built Binary Distributions for various OS" >>> ---> https://trac.osgeo.org/postgis/wiki/UsersWikiPackages >>> >>> Debian is a conservative Linux. >>> >>> IMHO: >>> Packaging is not so easy, [ >>> https://trac.osgeo.org/postgis/wiki/UsersWikiPostgreSQLPostGIS ] >>> - there are [n.=7] Postgres version [9.6,10,11,12,13,14,15 ] [ now: >>> all supported in bullseye ] >>> - there are [g.=9 ] Geos version >>> [3.3,3.4,3.5,3.6,3.7,3.8,3.9,3.10,3.11] [ now: bullsey= 3.9.0 ] >>> - there are [p.=7 ] Proj version [ 4.8,4.9,5.x,6.x,7.x,8.x,9.x ][ >>> now: bullseye = 7.2.1 ] >>> - there are [d.= 7 ] Gdal version [ 2.4,3.0,3.1,3.2,3.3,3.4,3.5][ >>> now: bullseye = 3.2.2 ] >>> - there are [m.=5] Postgis version [2.4,2.5,3.0,3.1,3.2,3.3] [now: >>> bullseye= 3.2.1 ] >>> >>> And there are also projects based on PostGIS. >>> - Pgrouting [r.=7 ] [2.3,2.4,2.5,2.6,3.0,3.1,3.2,3.3] [ now: >>> bullseye= 3.3.0 ; postgresql-12-pgrouting ] >>> >>> So the ideal "end user" combination = n*g*p*d*m*r = 7*9*7*7*5*7 = >>> 108045 >>> >>> // disclaimer: I am a Postgis user and a >>> https://github.com/postgis/docker-postgis contributor >>> >>>> >>>> >> Yes, my expectation may be naive, but as the package name is >> "postgresql-[version]-postgis-[version]" I would expect the 35 packages >> implied by the version policies of those two projects. So that one can >> choose their combination and focus on patch releases within those two named >> projects. The OP seems to as well. Or maybe a functional subset so that >> some number less than 35 may exist but, say, you cannot combine v14 and 3.0 >> since 3.0 since 3.2 was the most recent release of PostGIS when PostgreSQL >> v14 came out. >> >> In any case it does sound like the request by the OP is not something the >> community has chosen to provide. Which means a choice on their part - move >> up PostGIS or compile from source. >> >> David J. >> >> >>
impact join syntax ?? and gist index ??
Hi, Postgres 12, PostGIS 3.0. I have a small table A, 11 rows, with a varchar column x and a geometry column y, and a gist index on the geometry column; the geometries are multipolygons (regions on a map). I have a second table B, same structure, around 420 000 rows, no index; its geometries are points. All geometries are in SRID 4326. If I ask to count the points in each multipolygon: select A.x, count(B.x) from A, B where st_within(B.y, A.y) group by A.x; it takes 11 seconds (everything in shared buffers). If I do the very same thing as: select A.x, count(B.x) from A left join B on st_within(B.y, A.y) group by A.x; same result, but 85 seconds (everything in shared buffers, again). If I rerun both with explain (analyze, buffers), the plans are very different. If I create a gist index on the geometry column of the big table, both syntaxes take 21 seconds. I get the feeling I am missing something (at least 2 things...). Can someone shed some light ?? thanks Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: impact join syntax ?? and gist index ??
Yes, I know the 2 syntax provide a different result: one provides the 6 meaningful lines, the left join do add 5 lines with a count of 0... ... Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Sat, Jan 7, 2023 at 8:46 PM Marc Millas wrote: > Hi, > > postgres 12, postgis 3.0 > > I have a small table A, 11 rows with a varchar column x and a geometry > column y. > gist index on the geometry column. > the geometry do contains multipolygons (regions on a map) > I have a second table B , same structure, around 420 000 rows. > no index, > the geometry do contains points. > all geometries are on 4326 srid. > > If i ask to count points in each multipolygons: > > select A.x, count(B.x) from A, B where st_within(B.y, A.y) group by A.x; > it takes 11 seconds (everything in shared buffers). > If I do the very same thing as: > select A.x, count(B.x) from A left join B on st_within(B.y, A.y) group by > A.x; > same result, but 85 seconds (every thing in shared buffers, again) > if I redo asking with explain analyze, buffers, the plan is very different. > > > if I do create a gist index on geometry column of the big table, > both syntax takes 21 seconds. > > I get the feeling I am missing something.. (at least 2 things...) > can someone shed some light ?? > > thanks > > > Marc MILLAS > Senior Architect > +33607850334 > www.mokadb.com > >
Re: impact join syntax ?? and gist index ??
on postgres 15 and postgis 3.3, with the very same dataset, without gist index on the 420k rows table, the syntax with the left join takes 25 seconds, and without 770 ms. so to get 5 empty lines its 30 times slower. if I add the gist index, both syntaxes takes 770 ms... at least, this close the discussion about the versions my project will use :-) Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Sat, Jan 7, 2023 at 8:46 PM Marc Millas wrote: > Hi, > > postgres 12, postgis 3.0 > > I have a small table A, 11 rows with a varchar column x and a geometry > column y. > gist index on the geometry column. > the geometry do contains multipolygons (regions on a map) > I have a second table B , same structure, around 420 000 rows. > no index, > the geometry do contains points. > all geometries are on 4326 srid. > > If i ask to count points in each multipolygons: > > select A.x, count(B.x) from A, B where st_within(B.y, A.y) group by A.x; > it takes 11 seconds (everything in shared buffers). > If I do the very same thing as: > select A.x, count(B.x) from A left join B on st_within(B.y, A.y) group by > A.x; > same result, but 85 seconds (every thing in shared buffers, again) > if I redo asking with explain analyze, buffers, the plan is very different. > > > if I do create a gist index on geometry column of the big table, > both syntax takes 21 seconds. > > I get the feeling I am missing something.. (at least 2 things...) > can someone shed some light ?? > > thanks > > > Marc MILLAS > Senior Architect > +33607850334 > www.mokadb.com > >
Re: impact join syntax ?? and gist index ??
Here they are (I replaced the column and table names). I also posted two more remarks, one on the left join and one on the test I did on Postgres 15 / PostGIS 3.3...

2023=# explain (analyze, buffers) select A.x, count(B.x) from A left join B on st_within(B.geom, A.geom) group by A.x;
                              QUERY PLAN
------------------------------------------------------------------
 GroupAggregate  (cost=212638398.98..212701792.16 rows=20 width=16) (actual time=86717.857..86757.820 rows=11 loops=1)
   Group Key: A.x
   Buffers: shared hit=4243867
   ->  Sort  (cost=212638398.98..212659529.97 rows=8452398 width=16) (actual time=86717.851..86727.334 rows=421307 loops=1)
         Sort Key: A.x
         Sort Method: quicksort  Memory: 37963kB
         Buffers: shared hit=4243867
         ->  Nested Loop Left Join  (cost=0.00..211521459.31 rows=8452398 width=16) (actual time=17.473..86642.332 rows=421307 loops=1)
               Join Filter: st_within(B.geom, A.geom)
               Rows Removed by Join Filter: 4229377
               Buffers: shared hit=4243867
               ->  Seq Scan on A  (cost=0.00..9.20 rows=20 width=17752) (actual time=0.009..0.043 rows=11 loops=1)
                     Buffers: shared hit=9
               ->  Materialize  (cost=0.00..22309.83 rows=422789 width=40) (actual time=0.001..23.392 rows=422789 loops=11)
                     Buffers: shared hit=15968
                     ->  Seq Scan on B  (cost=0.00..20195.89 rows=422789 width=40) (actual time=0.006..57.651 rows=422789 loops=1)
                           Buffers: shared hit=15968
 Planning Time: 0.693 ms
 Execution Time: 86763.087 ms
(19 lignes)

2023=# explain (analyze, buffers) select A.x, count(B.x) from A, B where st_within(B.geom, A.geom) group by A.x;
                              QUERY PLAN
------------------------------------------------------------------
 Finalize GroupAggregate  (cost=6301606.00..6301608.60 rows=20 width=16) (actual time=11857.363..11863.212 rows=6 loops=1)
   Group Key: A.x
   Buffers: shared hit=2128836
   ->  Gather Merge  (cost=6301606.00..6301608.30 rows=20 width=16) (actual time=11857.359..11863.207 rows=12 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         Buffers: shared hit=2128836
         ->  Sort  (cost=6300605.99..6300606.04 rows=20 width=16) (actual time=11840.355..11840.356 rows=6 loops=2)
               Sort Key: A.x
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: quicksort  Memory: 25kB
               Buffers: shared hit=2128836
               ->  Partial HashAggregate  (cost=6300605.36..6300605.56 rows=20 width=16) (actual time=11840.331..11840.332 rows=6 loops=2)
                     Group Key: A.x
                     Buffers: shared hit=2128825
                     ->  Nested Loop  (cost=0.13..6275745.36 rows=4971999 width=16) (actual time=0.505..11781.817 rows=210651 loops=2)
                           Buffers: shared hit=2128825
                           ->  Parallel Seq Scan on B  (cost=0.00..18454.99 rows=248699 width=40) (actual time=0.005..22.859 rows=211395 loops=2)
                                 Buffers: shared hit=15968
                           ->  Index Scan using A_geom_idx on A  (cost=0.13..25.15 rows=1 width=17752) (actual time=0.054..0.055 rows=1 loops=422789)
                                 Index Cond: (geom ~ B.geom)
                                 Filter: st_within(B.geom, geom)
                                 Rows Removed by Filter: 0
                                 Buffers: shared hit=2112857
 Planning Time: 0.252 ms
 Execution Time: 11863.357 ms
(26 lignes)

Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Sat, Jan 7, 2023 at 9:40 PM Erik Wienhold wrote: > > On 07/01/2023 20:46 CET Marc Millas wrote: > > > > Hi, > > > > postgres 12, postgis 3.0 > > > > I have a small table A, 11 rows with a varchar column x and a geometry > column y. > > gist index on the geometry column. > > the geometry do contains multipolygons (regions on a map) > > I have a second table B , same structure, around 420 000 rows. > > no index, > > the geometry do contains points. > > all geometries are on 4326 srid. > > > > If i ask to count points in each multipolygons: > > > > select A.x, count(B.x) from A, B where st_within(B.y, A.y) group by A.x; > > it takes 11 seconds (everything in shared buffers).
> > If I do the very same thing as: > > select A.x, count(B.x) from A left join B on st_within(B.y, A.y) group > by A.x; > > same result, but 85 seconds (every thing in shared buffers, again) > > if I redo asking with explain analyze, buffers, the pl
Re: impact join syntax ?? and gist index ??
I read your answer, Tom, but I cannot connect it to my measurements: why adding the index did slow the request twice ?? Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Sat, Jan 7, 2023 at 10:33 PM Tom Lane wrote: > Marc Millas writes: > > 2023=# explain (analyze, buffers) select A.x, count(B.x) from A left > join B > > on st_within(B.geom, A.geom) group by A.x; > > So the problem with this is that the only decently-performant way to > do the join is like > > > -> Nested Loop (cost=0.13..6275745.36 > rows=4971999 width=16) (actual time=0.505..11781.817 rows=210651 loops=2) > >-> Parallel Seq Scan on B > (cost=0.00..18454.99 rows=248699 width=40) (actual time=0.005..22.859 > rows=211395 loops=2) > >-> Index Scan using A_geom_idx on A > (cost=0.13..25.15 rows=1 width=17752) (actual time=0.054..0.055 rows=1 > loops=422789) > > Index Cond: (geom ~ B.geom) > > Filter: st_within(B.geom, geom) > > (Ignore the parallelism, it's not very relevant here.) There's no > chance for merge or hash join because those require simple equality > join conditions. The only way to avoid a stupid > compare-every-row-of-A-to-every-row-of-B nestloop is to use a > parameterized inner indexscan, as this plan does. But that only works > if the join is inner or has the indexed table on the nullable side. > We have no support for nestloop right join, which is what would be > needed to make things run fast with no index on B. > > regards, tom lane >
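Building on Tom's explanation, one way (a sketch, assuming the gist index on B.geom is in place) to keep the fast parameterized index scan and still get the zero-count regions is to move the aggregation into a LATERAL subquery, so the small table stays on the outer side and no left/right join over st_within is needed:

SELECT a.x, b.cnt
FROM A a
CROSS JOIN LATERAL (
    SELECT count(*) AS cnt
    FROM B b
    WHERE st_within(b.geom, a.geom)
) b;

Each of the 11 regions then probes B through its index once, and regions containing no points simply come back with cnt = 0.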
alter table impact on view
Hello, to my understanding, if I do an ALTER TABLE RENAME COLUMN, Postgres changes the name of the said column and adjusts the views using that table so that they keep working (NOT the Oracle behaviour..). Fine. But if I ALTER TABLE to change a column that is a varchar(20) into a varchar(21), Postgres refuses, saying that it cannot because the _RETURN rule of a view uses said column. Why ?? As the view is not a materialized object, the length of a column of an underlying table clearly changes the description of the view, but I don't see where the difficulty is "hidden". Can someone enlighten me? thanks, Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: alter table impact on view
A bit sad Thanks.. Le lun. 30 janv. 2023 à 13:53, David Rowley a écrit : > On Tue, 31 Jan 2023 at 01:14, Marc Millas wrote: > > But if I alter table to change a column that is a varchar 20 into a > varchar 21 > > postgres refuse saying that it cannot due to the return rule... using > said column > > > > why ?? as the view is not a materialized object, the impact of the > length of a column of an underlying table do change the description of the > view, clearly, but I dont see where the difficulty is "hidden". Can someone > enlighten me? > > Primarily because nobody has written the required code. > > In [1], which is now quite old, there was some discussion about > various aspects of making this better. Perhaps changing the typmod is > easier than changing the type completely, but we still don't have any > code for it. So for now, you're just stuck manually dropping and > recreating your views. > > David > > [1] > https://www.postgresql.org/message-id/603c8f070807291912x37412373q7cd7dc36dd55a...@mail.gmail.com >
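For the record, the manual drop-and-recreate workaround David refers to, done in one transaction so concurrent sessions only ever see the old or the new definition (table, view and column names here are hypothetical):

BEGIN;
DROP VIEW v_clients;
ALTER TABLE clients ALTER COLUMN nom TYPE varchar(21);
CREATE VIEW v_clients AS
    SELECT id, nom FROM clients;
COMMIT;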
Re: Best Open Source OS for Postgresql
Sorry for the inappropriate "reply". If you check the Debian PostGIS repo, you'll find that it is NOT possible to choose a PostGIS version. It is possible for PostGIS 2.4 and 2.5, but then ALL 3.x versions are inaccessible except one, which changed from time to time (you MUST ask for "postgis 3" without being able to ask for 3.0 or 3.1 or...; it's like asking for Postgres 9 without being able to say .5 or .6). Two of my customers reverted to an RH-family Linux because they were unable to install the requested Postgres/PostGIS versions on Debian. When I asked the team, the reply was: we cannot package all cross-combinations (i.e. 5 Postgres x 6 PostGIS, minus some impossibilities according to the PostGIS support matrix). so... Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, Jan 31, 2023 at 1:23 PM hubert depesz lubaczewski wrote: > On Tue, Jan 31, 2023 at 01:18:08PM +0100, Marc Millas wrote: > > Did you check postgis debian repo? ?? > > Not sure why: > 1. you ask me that > 2. you ask me that off list > > but no, i haven't. > > depesz >
Re: Best Open Source OS for Postgresql
Hello, What about postgis : 3.0, 3.3, 3.4 ?? Le mer. 1 févr. 2023 à 07:20, Tony Shelver a écrit : > > Copied to the list > On Wed, 1 Feb 2023 at 08:18, Tony Shelver wrote: > >> >> >> On Wed, 1 Feb 2023 at 08:04, Tony Shelver wrote: >> >>> >>> On Tue, 31 Jan 2023 at 15:10, Marc Millas >>> wrote: >>> >>>> Sorry for inappropriate "reply". >>>> >>>> if you do check the debian postgis repo, you ll find that its NOT >>>> possible to choose a postgis version. >>>> its possible for postgis 2.4 and 2.5, then ALL 3.x versions are >>>> inaccessible but one, that did change from time to time. >>>> (you MUST ask for postgis 3 without being able to ask for 3.0 or 3.1 >>>> or... its like asking for postgres 9 without .5 or .6) >>>> 2 of my customers reverse to a RH family linux because they have been >>>> unable to install the requested postgres/postgis version on debian. >>>> when I did ask the team, the reply was: we cannot package for all cross >>>> possibilities (ie. 5 postgres x 6 postgis, less some impossibilities >>>> according to postgis matrix) >>>> >>>> >> Maybe I am not understanding this, but have you checked the UbuntuGIS >> source? I know not pure Debian, but... >> >> Sent before complete: >> >> UbuntuGIS stable shows PostGIS 2.4 and 2.5 available, as well as 3.1 and >> 3.2 : >> https://launchpad.net/%7Eubuntugis/+archive/ubuntu/ppa/+index?batch=75&memo=75&start=75 >> Got this link from https://wiki.ubuntu.com/UbuntuGIS via >> https://postgis.net/install/ >> Also note that UbuntuGIS is based off the DebianGIS project: >> https://wiki.debian.org/DebianGis >> >> We run both Ubuntu and Centos servers. The change to Centos licensing >> has led us to support it only for one specific application that is embedded >> in Centos that we are trying to end-of-life ASAP. At least Ubuntu server >> has a 5 year support window, and Ubuntu has now announced a limited 'free' >> Pro option for smaller businesses that provides a 10 year window. >> >> >> Regards >> >> >>>
Re: Best Open Source OS for Postgresql
If I remember well, I did try all repo I ever heard off, and basic Google search Till I ask someone thru a postgis mailing list. If you find something, like how to install a postgres 12 with a postgis 3.0 on any debian based distro.. Pls tell, I will read with interest. Also, pls, not recompiling the whole thing with all associated libs ... Thanks Which was my customer requirelent Le mar. 31 janv. 2023 à 17:38, Adrian Klaver a écrit : > On 1/31/23 05:09, Marc Millas wrote: > > Sorry for inappropriate "reply". > > > > if you do check the debian postgis repo, you ll find that its NOT > > possible to choose a postgis version. > > Are you talking about the repo hosted by Debian or the Postgres > community repo's here: > > https://www.postgresql.org/download/linux/debian/ > > > its possible for postgis 2.4 and 2.5, then ALL 3.x versions are > > inaccessible but one, that did change from time to time. > > (you MUST ask for postgis 3 without being able to ask for 3.0 or 3.1 > > or... its like asking for postgres 9 without .5 or .6) > > 2 of my customers reverse to a RH family linux because they have been > > unable to install the requested postgres/postgis version on debian. > > when I did ask the team, the reply was: we cannot package for all cross > > possibilities (ie. 5 postgres x 6 postgis, less some impossibilities > > according to postgis matrix) > > > > so... > > > > > > Marc MILLAS > > Senior Architect > > +33607850334 > > www.mokadb.com <http://www.mokadb.com> > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > >
Re: Postgres undeterministically uses a bad plan, how to convince it otherwise?
If I may.. this answer looks more "philosophical" than "practical". On Oracle (maybe 10, I don't remember well) was introduced the possibility to explicitly store an execution plan, so that a given query use THAT plan ie. dont go thru planner job. OK if someone do stupid things, one may get stupid results...it was an "expert only" functionality :-) Still, in some cases, it was very useful to manage the rare cases where the planner cannot, for whatever reason do a good job. OK its not the way postgres do behave. Still, in some cases... Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Thu, Feb 16, 2023 at 5:08 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Thu, Feb 16, 2023 at 8:48 AM cen wrote: > >> >> - does the planner take previous runs of the same query and it's >> execution time into account? If not, why? >> > > No, because that isn't how it works. And while I'm no planner expert I'm > not imagining any particularly compelling argument for why it would even > make sense to try. The book-keeping would be expensive and dealing with > supposedly an ever-changing dataset would in many cases make any such > comparisons be meaningless. > > >> - assuming the query to be immutable, would it be possible for the >> planner to microbenchmark a few different plans instead of trying to >> estimate the cost? >> As in, actually executing the query with different plans and caching the >> best one. >> > > No, the planner may not cause execution. While I could imagine extending > EXPLAIN to somehow retrieve and maybe even try alternative plans that have > been fully constructed today I'm not holding my breath. > > There is little reason for the project to give any real weight to > "assuming the query to be immutable". We do want to fix the planner to > behave better if it is mis-behaving, otherwise you do have access to cost > parameters, and potentially other planner toggles if you've truly run into > an intractable problem. > > David J. > >
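As a concrete illustration of the "planner toggles" David mentions, the enable_* settings can be flipped for a single transaction to test whether a different plan shape actually wins, without touching anything globally (the table and query below are hypothetical):

BEGIN;
SET LOCAL enable_seqscan = off;   -- or enable_nestloop, enable_hashjoin, ...
EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM orders WHERE customer_id = 42;
ROLLBACK;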
pg_reload_conf()
Hi, maybe I am missing something... with superuser rights, on a postgres 14 and a postgres 15 cluster:

select setting, boot_val, reset_val from pg_settings where name='log_connections';
 off | off | off

alter system set log_connections to 'on';

select setting, boot_val, reset_val from pg_settings where name='log_connections';
 off | off | off
... strange

select pg_reload_conf();
 t

select setting, boot_val, reset_val from pg_settings where name='log_connections';
 off | off | off
very strange.

cat postgresql.auto.conf
log_connections = 'on'

Obviously, if I stop and restart the DB:
select setting, boot_val, reset_val from pg_settings where name='log_connections';
 on | off | on

So, I am perplexed: what is pg_reload_conf() doing, or not doing?

thanks

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Re: pg_reload_conf()
Thanks Adrian, the fact that the changes are visible only to new backends, and NOT to the backend which made the change, was what I had missed.

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Mon, Feb 20, 2023 at 6:01 PM Adrian Klaver wrote:

> On 2/20/23 08:57, Marc Millas wrote:
> > select setting, boot_val, reset_val from pg_settings where
> > name='log_connections';
>
> # select setting, boot_val, reset_val, context from pg_settings where
> name='log_connections';
>  setting | boot_val | reset_val |      context
> ---------+----------+-----------+--------------------
>  on      | off      | on        | superuser-backend
>
>
> https://www.postgresql.org/docs/15/view-pg-settings.html
>
> There are several possible values of context. In order of decreasing
> difficulty of changing the setting, they are:
>
> ...
>
> superuser-backend
>
> Changes to these settings can be made in postgresql.conf without
> restarting the server. They can also be set for a particular session in
> the connection request packet (for example, via libpq's PGOPTIONS
> environment variable), but only if the connecting user is a superuser or
> has been granted the appropriate SET privilege. However, these settings
> never change in a session after it is started. If you change them in
> postgresql.conf, send a SIGHUP signal to the postmaster to cause it to
> re-read postgresql.conf. The new values will only affect
> subsequently-launched sessions.
>
> ...
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
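For anyone hitting the same surprise, a minimal sketch of the check (assuming a superuser psql session; the behaviour follows the superuser-backend context described in the quoted doc excerpt):

alter system set log_connections to 'on';
select pg_reload_conf();
show log_connections;   -- still 'off' in THIS session: superuser-backend settings
                        -- never change in a backend that is already running
\c                      -- reconnect: psql starts a new backend
show log_connections;   -- 'on' in the freshly started backend

The setting and reset_val columns of pg_settings show the values of the backend you are querying from, not the values a brand-new backend would receive.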
Re: Oracle to PostgreSQL Migration
EDB does have a replication server which can be used to transfer real-time data from Oracle to Postgres. I don't know if it can be used to get to "no downtime". BTW, what do you call "no downtime", as a switchover, as fast as it may be, does take "some" time?

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Mon, Mar 20, 2023 at 1:58 PM Inzamam Shafiq wrote:

> Hi,
>
> Hope everyone is fine.
>
> Can someone help or guide regarding Open Source tools for Oracle to
> PostgreSQL migration with real time CDC. along with this is there any
> possibility to change the structure of the database? Let me explain a
> little more,
>
> We have an Oracle DB which is around 1TB and we want to migrate to
> PostgreSQL that have a new table structure, so we want to perform data
> transformation and real time CDC from Oracle to PostgreSQL. Do we have any
> good open source tool to achieve this with No Coding involved.??
>
> Thanks.
>
> Regards,
>
> *Inzamam Shafiq*
> *Sr. DBA*
>
Size 0 in pg_stat_file, pg_stat_progress_copy
Hi, on a debian machine, with a postgres 14.2 server, the logs go to a dedicated directory (not log). When logged in as superuser, I get:
--pg_stat_file for the current logfile says size 0,
--pg_ls_logdir answers 0 files,
--pg_ls_dir for the log directory provides postgres.csv, postgres.json,... the list of files,
--and when running a copy from the current logfile with a where clause to keep only the current day's logs, the bytes_total column is 0.
What can cause such behaviour?

thanks,

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Re: Size 0 in pg_stat_file, pg_stat_progress_copy
Sorry... someone had set up the log as a named pipe...

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Fri, Apr 14, 2023 at 4:26 PM Marc Millas wrote:

> Hi,
>
> on a debian machine, with a postgres 14,2 server
>
> logs in a dedicated directory (not log)
> when logged as superuser, I get:
> --pg_stat_file for the current logfile says size 0,
> --pg_ls_logdir answers 0 files,
> --pg_ls_dir, for the log directory provides postgres.csv,
> postgres.json,... list of files
> --and when running a copy from the current logfile with a where clause to
> choose only the current day logs, the bytes total column is 0
>
> what can provides such a behaviour ?
>
> thanks,
>
>
> Marc MILLAS
> Senior Architect
> +33607850334
> www.mokadb.com
>
missing something about json syntax
Hi, postgres 15

Looks like I am missing something, maybe obvious :-( In a table with a json column (_data), if I ask psql to select _data from mytable with a where clause to get only one line, I get something beginning with
{"time":"2023-04-19T16:28:01.19780551+02:00","stream":"stderr","_p":"F","log":"{\"level\":\"info\",\"ts\":\"2023-04-19T14:28:01Z\",\"logger\":\"_audit\",\"msg\":\"record\",\"logging_pod\":\"cluster-pgsql\",\"record\":{\"log_time\":\"2023-04-19 14:28:01.197 UTC\",\ etc...

If I create table anothertable as select _data as _data from mytable, it creates and feeds that new table with all the appropriate data, and when I ask psql \d anothertable it says that it's a table with a json column named _data. Fine!

Now, if I select json_object_keys(_data) from mytable, I get a list of tags: time, stream, _p, log. Fine.
Now, if I select json_object_keys(_data) from anothertable, I get an error: cannot call json_object_keys on a scalar.

???
Both columns are fed and of type json, and postgres didn't throw any error feeding them.
If I create a table with a jsonb column and feed it with the anothertable json column, same thing... still unusable. And unusable with all the other ways I tried, like simply select _data->'log'->>'level' from mytable, or select _data->'level' from anothertable.

Sure, if I look at the json field, one is shown as { "tag": "value", ... and the other is shown as "{\"tag\":\"value\", ... not the very same.

So, 2 questions:
1) how can postgres feed a json or jsonb column and then NOT be able to use the values in it??
2) how to "transform" the inappropriate json into a usable one?

Of course, if what I am missing is very obvious, I apologize...

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Re: missing something about json syntax
Thanks for your input.

select (_data->>'log')::json->'level' from mytable; does work.

But it doesn't explain how postgres is able to put a scalar in a json or jsonb column without any problem: I don't understand how this ('"{\"t\"}"') can be considered valid enough JSON to be inserted in a json column and at the same time invalid for all other json uses. It is just as if postgres allowed inserting things that are not of the column type. It's the first time I have encountered this kind of behaviour from postgres.

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Thu, Apr 20, 2023 at 7:47 PM Erik Wienhold wrote:

> > On 20/04/2023 18:35 CEST Marc Millas wrote:
> >
> > Hi,
> >
> > postgres 15
> >
> > looks Iike I am missing something, maybe obvious :-(
> > In a table with a json column (_data) if I ask psql to select _data from
> > mytable with a where clause to get only one line,, I get something
> beginning
> > by
> > {"time":"2023-04-19T16:28:01.19780551+02:00","stream":"stderr","_p":"F","log":"{\"level\":\"info\",\"ts\":\"2023-04-19T14:28:01Z\",\"logger\":\"_audit\",\"msg\":\"record\",\"logging_pod\":\"cluster-pgsql\",\"record\":{\"log_time\":\"2023-04-19 14:28:01.197 UTC\",\
> > etc...
>
> The value of property "log" is a string, not an object. Notice the escaped
> double quotes (\").
>
> > if I create table anothertable as select _data as _data from mytable, it
> > creates and feed that new table with all the appropriate data, and when
> I ask
> > psql \d anothertable it says that its a table with a json column.named
> _data.
> > fine !
> >
> > now if I select json_object_keys(_data) from mytable, I get a list of
> tags.
> > time, stream, _p, log, fine.
> > now, if i select json_object_keys(_data) from anothettable, I get an
> error:
> > cannot call json_objet_keys on a scalar..
> >
> > ???
> > both columns are fed and of type json. and postgres didn't throw any
> error
> > feeding them.
> > if I create a table with a jsonb column and feed it with the
> anothertable json
> > column, same, fine... but still unusable.
> >
> > and unusable with all the other ways I did try, like simply
> > select _data->'log'->>'level' from mytable, or
> > select _data->'level' from anothertable
> >
> > sure if I look at the json field one is showed { "tag": "value", ...
> > and the other is showed "{\"tag\":\"value\", ...
>
> You executed
>
> create table anothertable as select _data->'log' as _data from
> mytable;
>
> and not
>
> create table anothertable as select _data as _data from mytable;
>
> So you end up with the scalar value of property "log" in
> anothertable._data.
>
> > not the very same
> >
> > so 2 questions:
> > 1) how postgres can feed a json or jsonb column and CANNOT use the
> values in
> >it ??
> > 2) how to "transform" the inappropriate json into a usable one ?
> >
> > of course, if what I am missing is very obvious, I apologize...
>
> Get the log value with operator ->> and cast the returned text to json:
>
> select (_data->>'log')::json->'level' from mytable;
>
> --
> Erik
>
Re: missing something about json syntax
Ok, thanks.

On Thu, Apr 20, 2023 at 10:42 PM Tom Lane wrote:

> Marc Millas writes:
> > but it doesnt explain how postgres is able to put a scalar in a json or
> > jsonb column without pb:
> > I don't understand how this ('"{\"t\"}"') can be considered a valid
> enough
> > json to be inserted in a json column
> > and at the same time invalid for all other json uses.
>
> That's a bare string (it's not an object). That's valid JSON per
> RFC 7159:
>
>    JSON can represent four primitive types (strings, numbers, booleans,
>    and null) and two structured types (objects and arrays).
>    ...
>    A JSON text is a serialized value. Note that certain previous
>    specifications of JSON constrained a JSON text to be an object or an
>    array.
>
> However, there certainly are some operations that require the top-level
> value to be an object or array.
>
> regards, tom lane
>
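To make the distinction concrete, a minimal reproduction (table and column names are made up, not the ones from the thread):

create table j (d json);
insert into j values ('{"log": "{\"level\": \"info\"}"}');   -- "log" holds a JSON *string*, not an object

select json_object_keys(d) from j;             -- works: the top level is an object, returns log
select json_object_keys(d->'log') from j;      -- ERROR: cannot call json_object_keys on a scalar
select (d->>'log')::json->>'level' from j;     -- works: ->> yields text, which is then re-parsed as json

So storing a scalar (here a string that merely looks like serialized JSON) is perfectly legal, but object functions and further -> navigation only work once that string has been explicitly re-parsed with a cast.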
Death postgres
Hi,

postgres 14.2 on Linux Red Hat.

temp_file_limit set around 210 GB.

A select with 2 left joins crashed the server (OOM killer) after the postgres disk occupation grew from 15 TB to 16 TB.

What are the cases where postgres may grow without caring about temp_file_limit?

thanks,

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Re: Death postgres
On Sat, May 6, 2023 at 9:46 AM Peter J. Holzer wrote:

> On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
> > postgres 14.2 on Linux redhat
> >
> > temp_file_limit set around 210 GB.
> >
> > a select request with 2 left join have crashed the server (oom killer)
> after
> > the postgres disk occupation did grow from 15TB to 16 TB.
>
> temp_file_limit limits the space a process may use on disk while the OOM
> killer gets activated when the system runs out of RAM. So these seem to
> be unrelated.
>
> hp
>

It's clear that the OOM killer is triggered by RAM and temp_file_limit is a disk thing... But the sudden growth of disk space usage and of RAM happened at exactly the same time, with only one user connected and only one query running...

>
> --
>_ | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | | | h...@hjp.at |-- Charles Stross, "Creative writing
> __/ | http://www.hjp.at/ | challenge!"
>
Re: Death postgres
On Sat, May 6, 2023 at 6:18 AM Adrian Klaver wrote:

> On 5/5/23 18:14, Marc Millas wrote:
> > Hi,
> >
> > postgres 14.2 on Linux redhat
> >
> > temp_file_limit set around 210 GB.
> >
> > a select request with 2 left join have crashed the server (oom killer)
> > after the postgres disk occupation did grow from 15TB to 16 TB.
>
> The result of EXPLAIN would be helpful.
>

Sure! But one of the tables looks "inaccessible" since then. I.e. even an explain select * from the_table doesn't answer and must be killed with control-C.

>
> > What are the cases where postgres may grow without caring about
> > temp_file_limit ?
> >
> > thanks,
> >
> >
> > Marc MILLAS
> > Senior Architect
> > +33607850334
> > www.mokadb.com <http://www.mokadb.com>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
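Not an answer to the temp_file_limit question, but when a bare explain on a single table hangs like that, it is usually waiting on a lock. A sketch of what can be checked from another session (the table name is a placeholder):

select pid, state, wait_event_type, wait_event, left(query, 60)
from pg_stat_activity
where wait_event_type = 'Lock';

select l.pid, l.mode, l.granted, a.query
from pg_locks l
join pg_stat_activity a on a.pid = l.pid
where l.relation = 'myschema.the_table'::regclass;

If a leftover backend from the crashed query (or an orphaned prepared transaction) still holds a conflicting lock on that table, it shows up here as a granted lock facing ungranted requests.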
Re: Death postgres
On Sat, May 6, 2023 at 3:15 PM Ron wrote:

> On 5/6/23 07:19, Marc Millas wrote:
>
> On Sat, May 6, 2023 at 9:46 AM Peter J. Holzer wrote:
>
>> On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
>> > postgres 14.2 on Linux redhat
>> >
>> > temp_file_limit set around 210 GB.
>> >
>> > a select request with 2 left join have crashed the server (oom killer)
>> after
>> > the postgres disk occupation did grow from 15TB to 16 TB.
>>
>
> "15TB" and "16TB" are pretty low-resolution. For example, 15.4TB rounds
> *down* to 15TB, while 15.6TB rounds *up* to 16TB, while they are in fact
> only 200GB apart.
>
> Heck, even 15.4TB and 15.6TB are low-resolution. temp_file_limit may
> actually be working.
>

It was 15.2 TB and became 16.3 TB...

>
>> temp_file_limit limits the space a process may use on disk while the OOM
>> killer gets activated when the system runs out of RAM. So these seem to
>> be unrelated.
>>
>> hp
>>
> Its clear that oom killer is triggered by RAM and temp_file is a disk
> thing...
> But the sudden growth of disk space usage and RAM did happen exactly at
> the very same time, with only one user connected, and only one query
> running...
>
>
> If your question is about temp_file_limit, don't distract us with OOM
> issues.
>
> --
> Born in Arizona, moved to Babylonia.
>
Re: Death postgres
On Sat, May 6, 2023 at 3:15 PM Ron wrote:

> On 5/6/23 07:19, Marc Millas wrote:
>
> On Sat, May 6, 2023 at 9:46 AM Peter J. Holzer wrote:
>
>> On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
>> > postgres 14.2 on Linux redhat
>> >
>> > temp_file_limit set around 210 GB.
>> >
>> > a select request with 2 left join have crashed the server (oom killer)
>> after
>> > the postgres disk occupation did grow from 15TB to 16 TB.
>>
>
> "15TB" and "16TB" are pretty low-resolution. For example, 15.4TB rounds
> *down* to 15TB, while 15.6TB rounds *up* to 16TB, while they are in fact
> only 200GB apart.
>
> Heck, even 15.4TB and 15.6TB are low-resolution. temp_file_limit may
> actually be working.
>
>
>> temp_file_limit limits the space a process may use on disk while the OOM
>> killer gets activated when the system runs out of RAM. So these seem to
>> be unrelated.
>>
>> hp
>>
> Its clear that oom killer is triggered by RAM and temp_file is a disk
> thing...
> But the sudden growth of disk space usage and RAM did happen exactly at
> the very same time, with only one user connected, and only one query
> running...
>
>
> If your question is about temp_file_limit, don't distract us with OOM
> issues.
>

My question is how postgres can use disk space without caring about temp_file_limit. The OOM info is a kind of hint about the context since, as said, one single select generated both things.

>
> --
> Born in Arizona, moved to Babylonia.
>
Re: Death postgres
On Sat, May 6, 2023 at 6:11 PM Adrian Klaver wrote:

> On 5/6/23 05:25, Marc Millas wrote:
> >
> > On Sat, May 6, 2023 at 6:18 AM Adrian Klaver
> <mailto:adrian.kla...@aklaver.com>> wrote:
> >
> > On 5/5/23 18:14, Marc Millas wrote:
> > > Hi,
> > >
> > > postgres 14.2 on Linux redhat
> > >
> > > temp_file_limit set around 210 GB.
> > >
> > > a select request with 2 left join have crashed the server (oom
> > killer)
> > > after the postgres disk occupation did grow from 15TB to 16 TB.
> >
> > The result of EXPLAIN would be helpful.
> > Sure!
> >
> > But. One of the table looks "inaccessible" since.
> > Ie. Even explain select * from the_table didnt answer and must be killed
> > by control c
>
> When you restarted the server where there any warnings shown?
>

Sadly, I cannot. It will be done next Tuesday.

>
> Using psql can you \d ?
>

Yes, and no problem checking pg_statistic,...

>
> Can you select from any other table in the database?
>

Yes

>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
Re: Death postgres
On Sat, May 6, 2023 at 9:46 PM Adrian Klaver wrote:

> On 5/6/23 10:13, Marc Millas wrote:
> >
> > When you restarted the server where there any warnings shown?
> >
> > Sadly, I cannot. Will be done next tuesday.
>
> Cannot do what:
>
> 1) Get to the log to see if there are warnings?
>
> 2) Restart the server?
>
> Your original post said the server crashed.
>
> If that was the case how can you do any of the below without restarting it?
>

The infrastructure is managed by another company. The db is inside a container. So, if I run a pg_ctl restart, the container... vanishes. So for each major operation, I must ask... by mail..

>
> > Using psql can you \d ?
> >
> > Yes, and no pb to check pg_statistic,...
> >
> > Can you select from any other table in the database?
> >
> > Yes
> >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
Re: Death postgres
On Sun, May 7, 2023 at 8:42 PM Laurenz Albe wrote:

> On Sat, 2023-05-06 at 03:14 +0200, Marc Millas wrote:
> > postgres 14.2 on Linux redhat
> >
> > temp_file_limit set around 210 GB.
> >
> > a select request with 2 left join have crashed the server (oom killer)
> after the postgres
> > disk occupation did grow from 15TB to 16 TB.
> >
> > What are the cases where postgres may grow without caring about
> temp_file_limit ?
>
> That's too little information for a decent answer.
> One obvious answer is: if it is not writing temporary files.
>
> Yours,
> Laurenz Albe
>

Logical... so here is a little more info:

db=# analyze myschema.table_a;
ANALYZE
db=# with ta as(select 'myschema' as s, 'table_a' as t),
p as (select * from information_schema.columns, ta where table_schema=ta.s and table_name=ta.t),
tableid as(select relid from pg_stat_user_tables, ta where schemaname=ta.s and relname=ta.t)
select staattnum, column_name, stanullfrac, stadistinct
from tableid, pg_statistic join p on p.ordinal_position=staattnum
where starelid=tableid.relid order by staattnum;
 staattnum | column_name | stanullfrac | stadistinct
-----------+-------------+-------------+-------------
         1 | col_ne      |           0 |  -0.6100224
         2 | col_brg     |      0.0208 |           6
         3 | col_ano     |           0 |      447302
         4 | col_ine     |           0 |  -0.5341927
         5 | col_cha     |           0 |          11
(5 rows)

db=# select count(*) from myschema.table_a;
  count
----------
 13080776
(1 row)

db=# select count(distinct col_ano) from myschema.table_a;
  count
----------
 10149937
(1 row)

// the stats look somewhat absurd, as analyze guesses 20 times fewer distinct values than a select count(distinct) finds on column col_ano...

db=# analyze myschema.table_b;
ANALYZE
db=# with ta as(select 'myschema' as s, 'table_b' as t),
p as (select * from information_schema.columns, ta where table_schema=ta.s and table_name=ta.t),
tableid as(select relid from pg_stat_user_tables, ta where schemaname=ta.s and relname=ta.t)
select staattnum, column_name, stanullfrac, stadistinct
from tableid, pg_statistic join p on p.ordinal_position=staattnum
where starelid=tableid.relid order by staattnum;
 staattnum | column_name | stanullfrac | stadistinct
-----------+-------------+-------------+--------------
         1 | col_nerg    |           0 |       161828
         2 | col_ibi     |           0 |       362161
         3 | col_imi     |           0 |       381023
         4 | col_ipi     |           0 |       391915
         5 | col_ne      |           0 |  -0.53864235
         6 | col_ano     |           0 |       482225
(6 rows)

db=# select count(*) from myschema.table_b;
  count
----------
 14811485
(1 row)

db=# select count(distinct col_ano) from myschema.table_b;
  count
----------
 10149937
(1 row)

// same remark

db=# explain select distinct t1.col_ine, case when t2.col_ibi is null then t3.col_ibi else t2.col_ibi end
from myschema.table_a t1
left join myschema.table_b t2 on t1.col_ano=t2.col_ano
left join myschema.table_b t3 on t1.NUM_ENQ=t3.NUM_ENQ;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
   ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 width=97)
         Workers Planned: 5
         ->  Sort  (cost=72377462162.94..72888223577.87 rows=204304565973 width=97)
               Sort Key: t1.col_ine, (CASE WHEN (t2.col_ibi IS NULL) THEN t3.col_ibi ELSE t2.col_ibi END)
               ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 rows=204304565973 width=97)
                     Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
                     ->  Parallel Hash Left Join  (cost=300803.38..582295.38 rows=4857277 width=52)
                           Hash Cond: ((t1.col_ne)::text = (t3.col_ne)::text)
                           ->  Parallel Seq Scan on table_a t1  (cost=0.00..184052.35 rows=2616335 width=53)
                           ->  Parallel Hash  (cost=243466.06..243466.06 rows=2965306 width=31)
                                 ->  Parallel Seq Scan on table_b t3  (cost=0.00..243466.06 rows=2965306 width=31)
                     ->  Parallel Hash  (cost=243466.06..243466.06 rows=2965306 width=34)
                           ->  Parallel Seq Scan on table_b t2  (cost=0.00..243466.06 rows=2965306 width=34)
 JIT:
   Functions: 19
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(17 rows)

// so... the planner guesses that those 2 joins will generate 1000 billion rows...
// so, I try to change the stats
db=# alter table myschema.table_a alter column col_ano
Re: Death postgres
On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer wrote:

> On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
> > Unique (cost=72377463163.02..201012533981.80 rows=1021522829864
> width=97)
> >    -> Gather Merge (cost=72377463163.02..195904919832.48
> rows=1021522829864 width=97)
> ...
> >    -> Parallel Hash Left Join
> (cost=604502.76..1276224253.51 rows=204304565973 width=97)
> >       Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
> ...
> >
> > //so.. the planner guess that those 2 join will generate 1000 billions
> rows...
>
> Are some of the col_ano values very frequent? If say the value 42 occurs
> 1 million times in both table_a and table_b, the join will create 1
> trillion rows for that value alone. That doesn't explain the crash or the
> disk usage, but it would explain the crazy cost (and would probably be a
> hint that this query is unlikely to finish in any reasonable time).
>
> hp
>

Good guess, even if a bit surprising: there is one (and only one) "value" which fits your supposition: NULL, 75 in each table, which perfectly fits the planner's row estimate.

One question: what does postgres do when it plans to hash 1000 billion rows? Does postgres create an appropriate "space" to handle those 1000 billion hash values?

thanks,

MM

> --
>_ | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | | | h...@hjp.at |-- Charles Stross, "Creative writing
> __/ | http://www.hjp.at/ | challenge!"
>
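A quick way to check this hypothesis directly on the data, a sketch using the column names visible in the plan above:

select col_ano, count(*) from myschema.table_a group by col_ano order by count(*) desc limit 5;
select col_ano, count(*) from myschema.table_b group by col_ano order by count(*) desc limit 5;
-- if the top value appears n_a times in table_a and n_b times in table_b,
-- that single value alone contributes n_a * n_b rows to the join output

The planner's view of the same information is in pg_stats.most_common_vals / most_common_freqs for those columns.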
Re: Death postgres
On Thu, May 11, 2023 at 1:56 AM Peter J. Holzer wrote: > On 2023-05-10 22:52:47 +0200, Marc Millas wrote: > > On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer > wrote: > > > > On 2023-05-10 16:35:04 +0200, Marc Millas wrote: > > > Unique (cost=72377463163.02..201012533981.80 rows=1021522829864 > width= > > 97) > > >-> Gather Merge (cost=72377463163.02..195904919832.48 rows= > > 1021522829864 width=97) > > ... > > >-> Parallel Hash Left Join (cost= > > 604502.76..1276224253.51 rows=204304565973 width=97) > > > Hash Cond: ((t1.col_ano)::text = > (t2.col_ano)::text) > > ... > > > > > > //so.. the planner guess that those 2 join will generate 1000 > billions > > rows... > > > > Are some of the col_ano values very frequent? If say the value 42 > occurs > > 1 million times in both table_a and table_b, the join will create 1 > > trillion rows for that value alone. That doesn't explain the crash > or the > > disk usage, but it would explain the crazy cost (and would probably > be a > > hint that this query is unlikely to finish in any reasonable time). > > > > > > good guess, even if a bit surprising: there is one (and only one) > "value" which > > fit your supposition: NULL > > But NULL doesn't equal NULL, so that would result in only one row in the > left join. So that's not it. > if so... how ??? > > hp > > -- >_ | Peter J. Holzer| Story must make more sense than reality. > |_|_) || > | | | h...@hjp.at |-- Charles Stross, "Creative writing > __/ | http://www.hjp.at/ | challenge!" >
order by
Hi,

I keep investigating the "death postgres" subject but am opening a new thread as I don't know if it's related to my problem.

I have 2 different clusters, on 2 different machines; one is prod, the second test. Same data volumes.

On prod, if I do
select col_a, count(col_a) from table_a group by col_a order by col_a desc,
I get the number of NULLs on top.
To get the number of NULLs on top on the test db, I have to
select col_a, count(col_a) from table_a group by col_a order by col_a asc.

So it looks like there is something different within the b-tree operator class of varchar (?!?) between those 2 clusters.

What can I check to explain this difference, as, to my understanding, it's not a postgresql.conf parameter?

thanks

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
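In case it helps anyone comparing two clusters like this: the sort order of varchar/text is driven by the collation, which is fixed per database (or per column/expression), not by anything in postgresql.conf, and the default placement of NULLs depends only on ASC/DESC. A sketch of what could be compared on both machines (table and column names as above):

select datname, datcollate, datctype from pg_database;

-- and to take both the locale and the NULL placement out of the equation:
select col_a, count(col_a)
from table_a
group by col_a
order by col_a collate "C" desc nulls first;

Two machines with different glibc or ICU locales (or even different glibc versions) can legitimately order the same strings differently.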
gather merge
Hi,

Another new thread related to "death postgres": how to stop a Gather Merge from going parallel? I.e. other than forcing parallelism down to one worker by limiting max_parallel_workers(_per_gather).

thanks,

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Re: order by
Thanks, I do know about index options. that table have NO (zero) indexes. Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Thu, May 11, 2023 at 4:48 PM Adam Scott wrote: > Check the index creation has NULLS FIRST (or LAST) on both indexes that > are used. Use explain to see what indexes are used > > See docs for create index: > https://www.postgresql.org/docs/current/sql-createindex.html > > On Thu, May 11, 2023, 7:30 AM Marc Millas wrote: > >> Hi, >> >> I keep on investigating on the "death postgres" subject >> but open a new thread as I don't know if it's related to my pb. >> >> I have 2 different clusters, on 2 different machines, one is prod, the >> second test. >> Same data volumes. >> >> On prod if I do >> select col_a, count(col_a) from table_a group by col_a order by col_a >> desc, >> I get the numbers of NULL on top. >> To get the number of NULL on top on the test db, I have to >> select col_a, count(col_a) from table_a group by col_a order by col_a asc. >> >> so, it looks like there is something different within the b-tree operator >> class of varchar (?!?) >> between those 2 clusters. >> >> What can I check to to explain this difference as, to my understanding, >> it's not a postgresql.conf parameter. >> >> thanks >> >> Marc MILLAS >> Senior Architect >> +33607850334 >> www.mokadb.com >> >>
Re: order by
On Thu, May 11, 2023 at 4:43 PM Adrian Klaver wrote: > On 5/11/23 07:29, Marc Millas wrote: > > Hi, > > > > I keep on investigating on the "death postgres" subject > > but open a new thread as I don't know if it's related to my pb. > > > > I have 2 different clusters, on 2 different machines, one is prod, the > > second test. > > Same data volumes. > > How can they be sharing the same data 'volume'? > roughly: one table is 1308 lines and the second is 1310 lines, the data comes from yet another DB. > those 2 tables have no indexes. they are used to build kind of aggregates > thru multiple left joins. > > Do you mean you are doing dump/restore between them? > no > > Postgres version for each cluster is? > 14.2 > > > > > On prod if I do > > select col_a, count(col_a) from table_a group by col_a order by col_a > desc, > > I get the numbers of NULL on top. > > To get the number of NULL on top on the test db, I have to > > select col_a, count(col_a) from table_a group by col_a order by col_a > asc. > > > > so, it looks like there is something different within the b-tree > > operator class of varchar (?!?) > > between those 2 clusters. > > > > What can I check to to explain this difference as, to my understanding, > > it's not a postgresql.conf parameter. > > > > thanks > > > > Marc MILLAS > > Senior Architect > > +33607850334 > > www.mokadb.com <http://www.mokadb.com> > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > >
Re: order by
On Thu, May 11, 2023 at 5:23 PM Adrian Klaver wrote: > On 5/11/23 08:00, Marc Millas wrote: > > > > On Thu, May 11, 2023 at 4:43 PM Adrian Klaver > <mailto:adrian.kla...@aklaver.com>> wrote: > > > > On 5/11/23 07:29, Marc Millas wrote: > > > Hi, > > > > > > I keep on investigating on the "death postgres" subject > > > but open a new thread as I don't know if it's related to my pb. > > > > > > I have 2 different clusters, on 2 different machines, one is > > prod, the > > > second test. > > > Same data volumes. > > > > How can they be sharing the same data 'volume'? > > > > roughly: one table is 1308 lines and the second is 1310 > > lines, the data comes from yet another DB. > > > > those 2 tables have no indexes. they are used to build kind of > > aggregates thru multiple left joins. > > > > Do you mean you are doing dump/restore between them? > > > > no > > So how is the data getting from the third database to the prod and test > clusters? > > For the machines hosting the third db, the prod and test clusters what > are?: > should I understand that you suggest that the way the data is inserted Do change the behaviour of the ORDER BY clause ?? > > OS > > OS version > > locale > > > > > > > > Postgres version for each cluster is? > > 14.2 > > FYI, 14.8 has just been released so the clusters are behind by 6 bug fix > releases. > Sadly.. I know. > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > >
Re: gather merge
So, I set max_parallel_workers_per_gather to 0, and it does work: no more parallel execution.

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Thu, May 11, 2023 at 4:38 PM Marc Millas wrote:

> Hi,
>
> Another new thread related to "death postgres": how to stop a Gather Merge
> from going parallel? I.e. other than forcing parallelism down to one worker
> by limiting max_parallel_workers(_per_gather).
>
> thanks,
>
>
> Marc MILLAS
> Senior Architect
> +33607850334
> www.mokadb.com
>
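For the record, a sketch of the different scopes at which this can be set, so that the whole cluster does not have to lose parallelism (the database name is a placeholder):

set max_parallel_workers_per_gather = 0;                        -- current session only

begin;
set local max_parallel_workers_per_gather = 0;                  -- current transaction only
-- run the problematic query here
commit;

alter database mydb set max_parallel_workers_per_gather = 0;    -- default for new sessions on that database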
Re: Death postgres
On Thu, May 11, 2023 at 1:56 AM Peter J. Holzer wrote:

> On 2023-05-10 22:52:47 +0200, Marc Millas wrote:
> > On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer
> wrote:
> >
> > On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
> > > Unique (cost=72377463163.02..201012533981.80 rows=1021522829864
> width=97)
> > >    -> Gather Merge (cost=72377463163.02..195904919832.48
> rows=1021522829864 width=97)
> > ...
> > >    -> Parallel Hash Left Join
> (cost=604502.76..1276224253.51 rows=204304565973 width=97)
> > >       Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
> > ...
> > >
> > > //so.. the planner guess that those 2 join will generate 1000
> billions rows...
> >
> > Are some of the col_ano values very frequent? If say the value 42
> occurs
> > 1 million times in both table_a and table_b, the join will create 1
> > trillion rows for that value alone. That doesn't explain the crash
> or the
> > disk usage, but it would explain the crazy cost (and would probably
> be a
> > hint that this query is unlikely to finish in any reasonable time).
> >
> >
> > good guess, even if a bit surprising: there is one (and only one)
> "value" which
> > fit your supposition: NULL
>
> But NULL doesn't equal NULL, so that would result in only one row in the
> left join. So that's not it.
>

So, apologies... the 75 lines in each table are not NULLs but '' empty varchars, which, obviously, is not the same thing, and which perfectly generates 500 billion lines for the left join. So, no planner or statistics problem; apologies for the time wasted.

Back to the initial problem: if, with temp_file_limit set to 210 GB, I try to run the select * from table_a left join table_b on col_a (which contains the 75 '' in both tables), then postgres does crash, killed by the OOM killer, after having taken 1.1 TB of additional disk space. The explain plan guesses 512 planned partitions. (Obviously, I cannot provide an explain analyze...)

To my understanding, before postgres 13, hash aggregate could eat RAM without limit in such circumstances. But in 14.2?? (I know, 14.8 is out...)

> hp
>
> --
>_ | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | | | h...@hjp.at |-- Charles Stross, "Creative writing
> __/ | http://www.hjp.at/ | challenge!"
>

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
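Since the blow-up comes from the '' keys matching each other, one option (a sketch only, to be validated against the business meaning of those empty values; the join column name follows the message above) is to turn them into NULLs, which never match in an equality join:

select count(*) from table_a where col_a = '';
select count(*) from table_b where col_a = '';

select *
from table_a a
left join table_b b
  on nullif(a.col_a, '') = nullif(b.col_a, '');

With nullif(), the rows whose key is '' still come out of the left join (once each, with the table_b columns NULL) instead of being multiplied by every '' row on the other side.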
Re: Death postgres
On Fri, May 12, 2023 at 8:31 AM Peter J. Holzer wrote:

> On 2023-05-11 21:27:57 +0200, Marc Millas wrote:
> > the 75 lines in each tables are not NULLs but '' empty varchar,
> which,
> > obviously is not the same thing.
> > and which perfectly generates 500 billions lines for the left join.
> > So, no planner or statistics pbs. apologies for the time wasted.
>
> No problem. Glad to have solved that puzzle.
>
> > Back to the initial pb: if, with temp_file_limit positioned to 210 GB,
> > I try to run the select * from table_a left join table_b on the col_a
> > (which contains the 75 '' on both tables)
> > then postgres do crash, killed by oom, after having taken 1.1 TB of
> additional
> > disk space.
>
> My guess is that the amount of parallelism is the problem.
>
> work_mem is a per-node limit. Even a single process can use a multiple of
> work_mem if the query contains nested nodes (which almost every query
> does, but most nodes don't need much memory). With 5 parallel workers,
> the total consumption will be 5 times that. So to prevent the OOM
> condition you would need to reduce work_mem or max_parallel_workers (at
> least for this query).
>

We have more than 100 GB of RAM and only 1 user, with one query running. work_mem is set to 10 MB. For the OOM killer to strike because of work_mem, it would mean that, for one query with 2 left joins, postgres needs more than 10,000 work_mem buffers. That seems difficult to believe. But that postgres may need that amount of RAM for hashing or some other similar purpose seems more probable, no?

>
> The description temp_file_limit says "...the maximum amount of disk
> space that *a process* can use...". So with 5 workers that's 210*5 =
> 1050 GB total. Again, you may want to reduce either temp_file_limit or
> max_parallel_workers.
>

Yes, but if so, we should have had the query canceled due to temp_file_limit, which was not the case.

>
> > to my understanding, before postgres 13, hash aggregate did eat RAM
> limitless
> > in such circumstances.
> > but in 14.2 ??
> > (I know, 14.8 is up...)
>
> Maybe the older version of postgres didn't use as many workers for that
> query (or maybe not parallelize it at all)?
>
> hp
>
> --
>_ | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | | | h...@hjp.at |-- Charles Stross, "Creative writing
> __/ | http://www.hjp.at/ | challenge!"
>
Re: order by
On Thu, May 11, 2023 at 11:08 PM Ron wrote:

> On 5/11/23 09:55, Marc Millas wrote:
>
> Thanks,
> I do know about index options.
> that table have NO (zero) indexes.
>
>
> If the table has no indices, then why did you write "it looks like there
> is something different within the *b-tree operator* class of varchar"?
> After all, you only care about b-trees when you have b-tree indices.
>

To my understanding, the b-tree operator class is the default operator class used for any sort, such as an ORDER BY, on varchar, text, ... types.

> --
> Born in Arizona, moved to Babylonia.
>
Re: Would PostgreSQL 16 native transparent data encryption support database level encryption?
Oracle has one (1) DB per instance (in Oracle it's not named a cluster as... there is only one!). So ...

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Thu, May 18, 2023 at 9:30 PM Adrian Klaver wrote:

> On 5/18/23 11:56, Ron wrote:
> > On 5/18/23 12:54, Rob Sargent wrote:
> >> On 5/18/23 11:49, Ron wrote:
>
> > We need to keep costs down, too.
> >
> > Oracle (I think) does it at the DB level, and so does SQL Server. Upper
> > Management hears us say "sorry, no can do" and wonders what bunch of
> > amateurs are developing PostgreSQL.
>
> Looks like you will be migrating to Oracle or SQL Server.
>
> Good luck on keeping costs down.
>
> >
> > --
> > Born in Arizona, moved to Babylonia.
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
Re: Would PostgreSQL 16 native transparent data encryption support database level encryption?
?? the sqlite thing ?? Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Sun, May 21, 2023 at 7:15 PM Theodore M Rolle, Jr. wrote: > What about sqlcipher? > > On Sun, May 21, 2023, 07:16 Marc Millas wrote: > >> Oracle have One (1) DB per instance (in Oracle its not named a cluster >> as...there is only one !). So ... >> >> Marc MILLAS >> Senior Architect >> +33607850334 >> www.mokadb.com >> >> >> >> On Thu, May 18, 2023 at 9:30 PM Adrian Klaver >> wrote: >> >>> On 5/18/23 11:56, Ron wrote: >>> > On 5/18/23 12:54, Rob Sargent wrote: >>> >> On 5/18/23 11:49, Ron wrote: >>> >>> > We need to keep costs down, too. >>> > >>> > Oracle (I think) does it at the DB level, and so does SQL Server. >>> Upper >>> > Management hears us say "sorry, no can do" and wonders what bunch of >>> > amateurs are developing PostgreSQL. >>> >>> Looks like you will be migrating to Oracle or SQL Server. >>> >>> Good luck on keeping costs down. >>> >>> > >>> > -- >>> > Born in Arizona, moved to Babylonia. >>> >>> -- >>> Adrian Klaver >>> adrian.kla...@aklaver.com >>> >>> >>> >>>
Re: Would PostgreSQL 16 native transparent data encryption support database level encryption?
Hi Thomas, to my understanding, there is ONE master key for TDE in 12c. Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Sun, May 21, 2023 at 9:47 PM Thomas Kellerer wrote: > > Marc Millas schrieb am 21.05.2023 um 13:13: > > Oracle have One (1) DB per instance (in Oracle its not named a cluster > as...there is only one !). So ... > > Oracle can have multiple (pluggable) databases per instance since Oracle 12 > > > >
syntax pb
Hi,

I have always had difficulties understanding this syntax. So... if I have:

create table t1 (t text);
create table t2 (a text, b text, c text, d numeric);
insert into t1 values('azerty');
INSERT 0 1

Fine! So now, if I write:

Select distinct test1.t, 'abc' as b, NULL as c, NULL as d
From t1 test1;
   t    |  b  | c | d
--------+-----+---+---
 azerty | abc |   |
(1 row)

ok. And now, if I want to insert that:

Insert into t2 (a, b, c, d)
Select distinct test1.t, 'abc' as b, NULL as c, NULL as d
From t1 test1;

I get:
ERROR: column "d" is of type numeric but expression is of type text
LINE 2: Select distinct test1.t, 'abc' as b, NULL as c, NULL as d
HINT: You will need to rewrite or cast the expression.

Can someone give a short SQL syntax hint?

thanks,

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Re: syntax pb
> The above should have been:
>
> Insert into t2 (a, b, c, d)
> Select distinct test1.t, 'abc' as b, NULL::text, NULL::numeric
> From t1 test1;
>
> which results in:
>
> select * from t2;
>    a    |  b  |  c   |  d
> --------+-----+------+------
>  azerty | abc | NULL | NULL
>

Thanks Adrian, but if the query becomes more complex, for example with a few more joins, then even casting doesn't work. This comes from a prod environment, and even casting the NULLs (which is more than strange, BTW) generates absurd errors. To my understanding, it looks like the parser does not parse the select distinct the way we think it does.

> >> Marc MILLAS
> >> Senior Architect
> >> +33607850334
> >> www.mokadb.com <http://www.mokadb.com>
> >>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
Re: syntax pb
Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Tue, May 30, 2023 at 3:51 PM David G. Johnston < david.g.johns...@gmail.com> wrote:

> On Tuesday, May 30, 2023, Marc Millas wrote:
>
>>
>> I get:
>> ERROR: column "d" is of type numeric but expression is of type text
>> LINE 2: Select distinct test1.t, 'abc' as b, NULL as c, NULL as d
>>
>> HINT: You will need to rewrite or cast the expression.
>>
>> Can someone give a short SQL syntax hint ?
>>
>
> https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS
>

I plainly agree with that... but it's NOT what's happening. The doc you point to states: "An explicit type cast can usually be omitted if there is no ambiguity as to the type that a value expression must produce (for example, when it is assigned to a table column);" In the SQL I provided, I ask to put a NULL into a numeric column. Can you tell me where the ambiguity is?

> David J.
>
Re: syntax pb
Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, May 30, 2023 at 7:12 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Tue, May 30, 2023 at 8:53 AM Marc Millas wrote > > >> This comes from a prod environment and even casting NULLs (which is more >> than strange, BTW) generates absurd errors. >> > > If you want an input to be anything other than plain text (numbers > partially exempted) you need to cast it. Sure, some limited cases allow > for other parts of a query to infer untyped literals, but literals defined > at the top-level of a SELECT is not one of those places. > > Too my understanding it looks like the parser did not parse the select >> distinct as we think he does. >> > > The DISTINCT clause doesn't really come into play here at all, so if you > think it does you indeed have a misunderstanding. > Inputting literal NULLs, and using DISTINCT, are both, IMO, considered > code smells and seldom used. You still need to be able to interpret error > messages but if you are running actual queries with these things you may > have larger model design and query writing concerns to deal with in > addition to being able to identify the problems specific error messages are > pointing out and trying to fix them. > Hi David, my guess about the distinct syntax was just because if I take the distinct OUT, the SQL works fine. nothing more, nothing less... > > David J. > >
Re: syntax pb
Thanks for the explanation. Crystal clear, thanks Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Tue, May 30, 2023 at 7:31 PM Tom Lane wrote: > "David G. Johnston" writes: > > On Tue, May 30, 2023 at 8:53 AM Marc Millas > wrote > >> Too my understanding it looks like the parser did not parse the select > >> distinct as we think he does. > > > The DISTINCT clause doesn't really come into play here at all, so if you > > think it does you indeed have a misunderstanding. > > No, he's correct: > > postgres=# create table z (f1 int); > CREATE TABLE > postgres=# insert into z values(null); > INSERT 0 1 > postgres=# insert into z select null; > INSERT 0 1 > postgres=# insert into z select distinct null; > ERROR: column "f1" is of type integer but expression is of type text > LINE 1: insert into z select distinct null; > ^ > HINT: You will need to rewrite or cast the expression. > > The first two INSERTs are accepted because there's nothing > "between" the untyped NULL and the INSERT, so we can resolve > the NULL as being of type int. But use of DISTINCT requires > resolving the type of the value (else how do you know what's > distinct from what?) and by default we'll resolve to text, > and then that doesn't match what the INSERT needs. > > regards, tom lane >
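Following Tom's explanation, the fix on the insert side is simply to give the NULLs a concrete type before the DISTINCT has to compare them, as Adrian had suggested earlier; a minimal sketch against the little test tables from the start of the thread:

insert into t2 (a, b, c, d)
select distinct test1.t, 'abc' as b, NULL::text as c, NULL::numeric as d
from t1 test1;

With the casts in place, the DISTINCT has well-defined types to work with and the INSERT no longer complains about text vs numeric.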
Re: Hash Index on Partitioned Table
Hi Peter,

In postgres 13, create index is, by default, parallelized, so except for specific values of the parallelization parameters in postgresql.conf, your machine should use more than one core while creating the indexes. Also, you can set the maintenance_work_mem parameter to the max for such a job, as you have plenty of RAM.

In my own experience of indexing big partitioned tables, I created a few scripts to build each index (there is one index per partition) and, when those are finished, create the 'global' index, which, as the work is already done, is fast (check the ONLY keyword in the create index doc). Doing this, it was easy to somewhat optimize the process according to the number of available cores, RAM and storage.

hf

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Wed, May 31, 2023 at 7:53 PM peter.boris...@kartographia.com < peter.boris...@kartographia.com> wrote:

> Hi Tom,
>
> Thanks so much for your quick response. As luck would have it, the
> index FINALLY finished about an hour ago. For a size comparison:
>
> BTree: 6,433 GB
>
> Hash: 8,647 GB
>
> Although I don't have a proper benchmark to compare performance, I can say
> the hash is working as good as if not faster than the BTree for my use case
> (web application).
>
> I guess I was getting a little nervous waiting for the index to complete
> and seeing such a huge difference in file size but I'm ok now :-)
>
> Thanks again,
>
> Peter
>
>
> -Original Message-
> From: "Tom Lane"
> Sent: Wednesday, May 31, 2023 10:07am
> To: "peter.boris...@kartographia.com"
> Cc: pgsql-general@lists.postgresql.org
> Subject: Re: Hash Index on Partitioned Table
>
> "peter.boris...@kartographia.com" > writes:
> > I have a rather large database with ~250 billion records in a
> partitioned table. The database has been running and ingesting data
> continuously for about 3 years.
>
> > I had a "regular" BTree index on one of the fields (a unique bigint
> column) but it was getting too big for the disk it was on. The index was
> consuming 6.4 TB of disk space.
>
> That's ... really about par for the course. Each row requires an 8-byte
> index entry, plus 12 bytes index overhead. If I'm doing the math right
> then the index is physically about 78% full which is typical to good for
> a b-tree. Reindexing would remove most of the extra space, but only
> temporarily.
>
> > After doing some research I decided to try to create a hash index
> instead of a BTree. For my purposes, the index is only used to find
> specific numbers ("=" and "IN" queries). From what I read, the hash index
> should run a little faster than btree for my use case and should use less
> disk space.
>
> I'm skeptical. The thing to bear in mind is that btree is the mainstream
> use-case and has been refined and optimized far more than the hash index
> logic.
>
> > (1) Why is the hash index consuming more disk space than the btree
> index? Is it because the hash of the bigint values larger than the storing
> the bigints in the btree?
>
> From memory, the index entries will be the same size in this case,
> but hash might have more wasted space.
>
> > (4) Is there any way to estimate when the index process will complete?
>
> An index on a partitioned table isn't a single object, it's one index per
> partition. So you should be able to look at how many partitions have
> indexes so far. You might have to drill down to the point of counting how
> many files in the database's directory, if the individual indexes aren't
> showing up as committed catalog entries yet.
> > regards, tom lane >
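A sketch of the per-partition approach described above, using the ONLY / ATTACH PARTITION mechanism from the CREATE INDEX and ALTER INDEX pages (table and index names are made up):

-- 1. create the parent index without touching the partitions (it starts out invalid)
create index big_idx on only big_table (id);

-- 2. index each partition separately; these can be run from separate sessions/scripts
create index big_p1_idx on big_table_p1 (id);
create index big_p2_idx on big_table_p2 (id);

-- 3. attach them; once every partition has an attached index, the parent index becomes valid
alter index big_idx attach partition big_p1_idx;
alter index big_idx attach partition big_p2_idx;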
date format
Hi,

I would like to load data from a file via file_fdw or COPY... it's a postgres 14 cluster.

But one date (timestamp) column is written in French order and another column is written in English order. The data comes from a state-owned entity, so asking for normalization may take ages.

Obviously I could load them as char and then apply an appropriate transformation, no problem. But is there a direct way to do this?

thanks

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Re: date format
On Wed, Jun 14, 2023 at 7:27 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Wed, Jun 14, 2023 at 9:42 AM Marc Millas > wrote: > >> Hi, >> >> I would like to load data from a file via file_fdw or COPY.. its a >> postgres 14 cluster >> >> but.. One date (timestamp) column is written french order and another >> column is written english order. Data comes from a state owned entity so >> asking for a normalization may take ages. >> >> obviously I could load as char and then apply an appropriate >> transformation. no pb. >> But is there a direct way to do this ? >> > > Probably no - casting formats via locale cannot be specified at that scope > when using copy. Either the cast for a given single setting produces the > correct result or it doesn't. If you need a custom cast like this you have > to get away from COPY first. Usually that is best done after importing > data to a temporary table as text. > > David J. > So, creating a foreign table with varchar type, and then doing the insert as select with the appropriate format.. clear. somewhat sad as it was a one step process with the former oracle db we get rid off. Marc
Re: date format
On Wed, Jun 14, 2023 at 8:15 PM Ron wrote: > On 6/14/23 13:02, Marc Millas wrote: > > > On Wed, Jun 14, 2023 at 7:27 PM David G. Johnston < > david.g.johns...@gmail.com> wrote: > >> On Wed, Jun 14, 2023 at 9:42 AM Marc Millas >> wrote: >> >>> Hi, >>> >>> I would like to load data from a file via file_fdw or COPY.. its a >>> postgres 14 cluster >>> >>> but.. One date (timestamp) column is written french order and another >>> column is written english order. Data comes from a state owned entity so >>> asking for a normalization may take ages. >>> >>> obviously I could load as char and then apply an appropriate >>> transformation. no pb. >>> But is there a direct way to do this ? >>> >> >> Probably no - casting formats via locale cannot be specified at that >> scope when using copy. Either the cast for a given single setting produces >> the correct result or it doesn't. If you need a custom cast like this you >> have to get away from COPY first. Usually that is best done after >> importing data to a temporary table as text. >> >> David J. >> > > So, creating a foreign table with varchar type, and then doing the insert > as select with the appropriate format.. clear. > somewhat sad as it was a one step process with the former oracle db we get > rid off. > > > How did Oracle know what format the date was in? > when you describe the external file you describe the field format one by one. > > In Postgresql, could you write a simple anonymous procedure that reads the > file_fdw table records, does the conversion and then inserts into the > destination table? > -> one sql line insert as select using the to_date() function with > appropriate format. > > -- > Born in Arizona, moved to Babylonia. >
Re: date format
On Wed, Jun 14, 2023 at 8:23 PM Marc Millas wrote: > > > > > On Wed, Jun 14, 2023 at 8:15 PM Ron wrote: > >> On 6/14/23 13:02, Marc Millas wrote: >> >> >> On Wed, Jun 14, 2023 at 7:27 PM David G. Johnston < >> david.g.johns...@gmail.com> wrote: >> >>> On Wed, Jun 14, 2023 at 9:42 AM Marc Millas >>> wrote: >>> >>>> Hi, >>>> >>>> I would like to load data from a file via file_fdw or COPY.. its a >>>> postgres 14 cluster >>>> >>>> but.. One date (timestamp) column is written french order and another >>>> column is written english order. Data comes from a state owned entity so >>>> asking for a normalization may take ages. >>>> >>>> obviously I could load as char and then apply an appropriate >>>> transformation. no pb. >>>> But is there a direct way to do this ? >>>> >>> >>> Probably no - casting formats via locale cannot be specified at that >>> scope when using copy. Either the cast for a given single setting produces >>> the correct result or it doesn't. If you need a custom cast like this you >>> have to get away from COPY first. Usually that is best done after >>> importing data to a temporary table as text. >>> >>> David J. >>> >> >> So, creating a foreign table with varchar type, and then doing the insert >> as select with the appropriate format.. clear. >> somewhat sad as it was a one step process with the former oracle db we >> get rid off. >> >> >> How did Oracle know what format the date was in? >> > when you describe the external file you describe the field format one by > one. > >> >> In Postgresql, could you write a simple anonymous procedure that reads >> the file_fdw table records, does the conversion and then inserts into the >> destination table? >> -> one sql line insert as select using the to_date() function with >> appropriate format. >> https://www.postgresql.org/docs/current/functions-formatting.html >> > I do know what s in the file, I dont have to guess... > -- >> Born in Arizona, moved to Babylonia. >> >
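For completeness, a sketch of the two-step load discussed above (all names, the file path and the format strings are placeholders; the real formats depend on the file):

create extension if not exists file_fdw;
create server fileserver foreign data wrapper file_fdw;

create foreign table staging_raw (
    id      text,
    date_fr text,   -- e.g. 31/12/2023 17:45:00
    date_en text    -- e.g. 12/31/2023 17:45:00
) server fileserver
  options (filename '/path/to/data.csv', format 'csv', header 'true');

insert into target_table (id, ts_fr, ts_en)
select id,
       to_timestamp(date_fr, 'DD/MM/YYYY HH24:MI:SS'),
       to_timestamp(date_en, 'MM/DD/YYYY HH24:MI:SS')
from staging_raw;

The point being that to_timestamp()/to_date() take a format per expression, which a single datestyle setting applied to a whole COPY cannot do.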
pb with join plan
Hi,

I have a postgres 14 on linux with a 15 TB db, 20 cores and 150 GB RAM, all NVMe SSD. Currently one user :-)

A join between 2 big tables and then another join with a smaller third one takes less than 1 minute and returns 15 M lines. Fine.

If I add a third join, with a 30k-line table, with a simple equality as join clause, it works almost as fast: explain analyze indicates 45 sec. Same if I do the very same thing with another equality clause: explain analyze indicates 140 sec.

But if I do the same with clause one OR clause two, I have to kill the request after an hour, with the filesystem showing more than 140 Mb of increased usage.

Looking at the explain plans with one clause or with the 2 ORed, there are changes in the plan (of course): with the fastest clause the estimated cost is 3 700 000, with the slightly slower one 3 900 000, and with both ORed the estimated cost is 16 000 000.

To me this sounds a bit strange, as ORing the join clauses should roughly add the times, but not much more.

So, before providing the 3 explain plans (I must anonymize everything, so somewhat boring), I would like to know if there is some obvious thing I am missing.

All tables have been vacuum analyzed.

thanks

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com
Re: pb with join plan
On Tue, Jun 20, 2023 at 10:14 PM David Rowley wrote: > On Wed, 21 Jun 2023 at 07:42, Marc Millas wrote: > > But if I do the same with clause one OR clause 2, I have to kill the > request after an hour, seeing the filesystem showing more than 140 Mb of > increased usage. > > > So, before providing the 3 explain plans (I must anonymize everything, > so somewhat boring) I would like to know if there is some obvious thing I > am missing. > > all tables have been vacuum analyzed. > > I believe you can anonymise the explain with https://explain.depesz.com/ > > link to the anonymized plan of the req with one clause : https://explain.depesz.com/s/TWp4 It's pretty hard to say until we see the query, but having an OR in > the join condition makes it impossible to Hash or Merge join, so > perhaps it's slow due to Nested Loop join. > > You could consider rewriting the query to use a UNION or a UNION ALL > separating out each branch of the OR into a UNION of its own. That > would allow Hash and Merge join to work again. However, that's all > speculation until you provide more details. > > https://wiki.postgresql.org/wiki/Guide_to_reporting_problems > > David > Marc MILLAS
Re: pb with join plan
On Tue, Jun 20, 2023 at 11:19 PM David Rowley wrote: > On Wed, 21 Jun 2023 at 08:34, Marc Millas wrote: > > > > On Tue, Jun 20, 2023 at 10:14 PM David Rowley > wrote: > >> > >> On Wed, 21 Jun 2023 at 07:42, Marc Millas > wrote: > >> > But if I do the same with clause one OR clause 2, I have to kill the > request after an hour, seeing the filesystem showing more than 140 Mb of > increased usage. > >> > >> > > link to the anonymized plan of the req with one clause : > https://explain.depesz.com/s/TWp4 link to the plan with the second clause alone: https://explain.depesz.com/s/byW5 link to the plan with both clauses ORed (the one not finishing) https://explain.depesz.com/s/jHO2 > > > It's quite difficult to know what the problem is you want to fix here. > Your initial post indicated it was the query with the OR condition > that was causing you the problems, but the plan you've posted has no > OR condition?! > > You're more likely to get help here if you take time to properly > explain the situation and post the information that's actually > relevant to the problem you're having, or state the problem more > clearly, as there's a mismatch somewhere. > > It might also be worth having a look at > https://wiki.postgresql.org/wiki/Slow_Query_Questions . EXPLAIN is not > going to tell us what part of the query is slow. I'll let the wiki > page guide you into what to do instead. > I know that page. obviously, as I have to kill the request, I cannot provide a explain analyze... > > David >
Re: pb with join plan
On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra wrote: > On 6/21/23 00:26, Marc Millas wrote: > > > > > > On Tue, Jun 20, 2023 at 11:19 PM David Rowley > <mailto:dgrowle...@gmail.com>> wrote: > > > > On Wed, 21 Jun 2023 at 08:34, Marc Millas > <mailto:marc.mil...@mokadb.com>> wrote: > > > > > > On Tue, Jun 20, 2023 at 10:14 PM David Rowley > > mailto:dgrowle...@gmail.com>> wrote: > > >> > > >> On Wed, 21 Jun 2023 at 07:42, Marc Millas > <mailto:marc.mil...@mokadb.com>> wrote: > > >> > But if I do the same with clause one OR clause 2, I have to > > kill the request after an hour, seeing the filesystem showing more > > than 140 Mb of increased usage. > > >> > > >> > > > link to the anonymized plan of the req with one clause : > > https://explain.depesz.com/s/TWp4 <https://explain.depesz.com/s/TWp4 > > > > > > link to the plan with the second > > clause alone: https://explain.depesz.com/s/byW5 > > <https://explain.depesz.com/s/byW5> > > link to the plan with both clauses ORed (the one not > > finishing) https://explain.depesz.com/s/jHO2 > > <https://explain.depesz.com/s/jHO2> > > > > > > > > It's quite difficult to know what the problem is you want to fix > here. > > Your initial post indicated it was the query with the OR condition > > that was causing you the problems, but the plan you've posted has no > > OR condition?! > > > > You're more likely to get help here if you take time to properly > > explain the situation and post the information that's actually > > relevant to the problem you're having, or state the problem more > > clearly, as there's a mismatch somewhere. > > > > It might also be worth having a look at > > https://wiki.postgresql.org/wiki/Slow_Query_Questions > > <https://wiki.postgresql.org/wiki/Slow_Query_Questions> . EXPLAIN > is not > > going to tell us what part of the query is slow. I'll let the wiki > > page guide you into what to do instead. > > > > > > I know that page. obviously, as I have to kill the request, I cannot > > provide a explain analyze... > > > > It's a bit weird the "victor" table is joined seemingly without any join > conditions, leading to a cross join (which massively inflates the cost > for joins above it). Maybe the anonymized plan mangles it somehow. > the query does: select blabla from table1 join table2 on (list of 9 fields ANDed and corresponding to the index of both table1 and table2) join table3 on table1.a=table3.a and table1.b=table3.b join table4 on (list of 2 clauses table2.d=table4.e and one clause substr(table2.f)=table4.g all ORed) table1 and table2 are big (15M and 60M lines), table3 and table4 are small (30k lines) basically, if I rewrites the query beginning by the join between table2 and table4, then join table1 and then table3, postgres generates the same plan, which doesnt end. if instead of the 3 clauses of the last join I keep one equality clause, the explain plan looks the same, but executes in 45 secondes. > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Marc MILLAS
Re: pb with join plan
Marc MILLAS On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra wrote: > On 6/21/23 00:26, Marc Millas wrote: > > > > > > On Tue, Jun 20, 2023 at 11:19 PM David Rowley > <mailto:dgrowle...@gmail.com>> wrote: > > > > On Wed, 21 Jun 2023 at 08:34, Marc Millas > <mailto:marc.mil...@mokadb.com>> wrote: > > > > > > On Tue, Jun 20, 2023 at 10:14 PM David Rowley > > mailto:dgrowle...@gmail.com>> wrote: > > >> > > >> On Wed, 21 Jun 2023 at 07:42, Marc Millas > <mailto:marc.mil...@mokadb.com>> wrote: > > >> > But if I do the same with clause one OR clause 2, I have to > > kill the request after an hour, seeing the filesystem showing more > > than 140 Mb of increased usage. > > It's a bit weird the "victor" table is joined seemingly without any join > conditions, leading to a cross join (which massively inflates the cost > for joins above it). Maybe the anonymized plan mangles it somehow. > So I did try to simplify my pb. I create a table with the result of the first 3 joins. That table do have 15M lines. all tables have been vacuum analyze Now if I do an explain analyze of a simple join between that table and my original table 4 using a simple = clause, I get a result in one second (around). and the planner guesses for rows seems in line with the observed values . if I use a substr(table1.a)= table2.b, the explain analyze get a result in 21 seconds and the planner estimates a 65M rows result set while the observed is 330 k rows so here its 20 times slower and the discrepency between planner rows guess and reality is a 200 ratio. Now, if I try an explain analyze with join on a=b or substr(c)=d or e=f then... I kill the query after a quarter an hour without any answer. if I try to just explain the query, the planner rows guess becomes more than 2 Billions the extremely simple query and plan are here, without automatic obfuscation https://explain.depesz.com/s/b8Ll > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Marc MILLAS
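A compact sketch of the three simplified tests described above, using placeholder names (ttt stands for the 15M-row pre-joined table, table4 for the 30k-row table; the column names and substr() arguments are invented):

-- 1) plain equality join: completes in about a second, row estimates are sane
EXPLAIN ANALYZE SELECT * FROM ttt JOIN table4 ON ttt.a = table4.b;

-- 2) equality on an expression: about 21 seconds, row estimate off by a factor of ~200
EXPLAIN ANALYZE SELECT * FROM ttt JOIN table4 ON substr(ttt.c, 1, 4) = table4.d;

-- 3) the three conditions OR'ed: does not finish; a plain EXPLAIN estimates over 2 billion rows
EXPLAIN SELECT * FROM ttt JOIN table4
  ON ttt.a = table4.b OR substr(ttt.c, 1, 4) = table4.d OR ttt.e = table4.f;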
bug or lacking doc hint
Hi, I have had a serious performance problem with a join plan (cf. the "pb with join plan" thread). I simplified the case down to a single join between a 15M-row table and a 30k-row table. If the ON clause contains something like table1.a = table2.b, Postgres does the job in around 1 second. If the ON clause is table1.a = table2.b OR substr(table1.c, x, x) = table2.d, then I have to cancel the request after 20 minutes, with disk usage growing and growing. When I asked this list, David Rowley suggested rewriting the SQL, replacing the OR by a UNION. Fine, this does work, even if it is a bit complex, as the original SQL was a set of intricate joins. So, either this behaviour (Postgres unable to find an appropriate plan for a join with OR clauses) is a genuine bug waiting to be corrected, or the documentation is missing a chapter explaining that Postgres does NOT handle that syntax well as soon as there is some data volume. So, now I am writing some kind of recipe book for the users of that DB. What should I write ? Marc MILLAS Senior Architect +33607850334 www.mokadb.com
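For readers of the archive, a minimal sketch of the rewrite David Rowley suggested, with placeholder names (table1/table2 and the columns stand in for the real schema, and the substr() arguments are invented). The OR'ed join condition is split into one join per condition and the branches are combined with UNION; note the caveat discussed further down the thread that UNION also removes genuinely duplicate result rows, so it is only equivalent when the result has no legitimate duplicates (UNION ALL keeps them, but then a row matched by both conditions appears twice):

-- original shape, the one that never finishes on large tables
SELECT t1.*, t2.*
FROM table1 t1
JOIN table2 t2
  ON t1.a = t2.b
  OR substr(t1.c, 1, 4) = t2.d;

-- rewritten shape: one branch per OR'ed condition
SELECT t1.*, t2.*
FROM table1 t1
JOIN table2 t2 ON t1.a = t2.b
UNION
SELECT t1.*, t2.*
FROM table1 t1
JOIN table2 t2 ON substr(t1.c, 1, 4) = t2.d;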
Re: bug or lacking doc hint
On Sun, Jun 25, 2023 at 9:35 PM David Rowley wrote: > On Mon, 26 Jun 2023 at 03:02, Marc Millas wrote: > > When I ask this list, David Rowley suggest to rewrite the SQL, replacing > the OR by a union. > > > > Fine, this do work, even if a bit complex as the original SQL was a set > of intricate joins. > > > > > > So, either this behaviour ( postgres unable to find an appropriate plan > for join with OR clauses) > > is a true bug which is waiting to be corrected, either the doc is > lacking a chapter on which one can read that Postgres does NOT support that > syntax as soon as there is some data volumes. > > The problem is that out of the 3 methods PostgreSQL uses to join > tables, only 1 of them supports join conditions with an OR clause. > Merge Join cannot do this because results can only be ordered one way > at a time. Hash Join technically could do this, but it would require > that it built multiple hash tables. Currently, it only builds one > table. That leaves Nested Loop as the join method to implement joins > with OR clauses. Unfortunately, nested loops are quadratic and the > join condition must be evaluated once per each cartesian product row. > That does not scale well. > > Tom Lane did start some work [1] to allow the planner to convert some > queries to use UNION instead of evaluating OR clauses, but, if I > remember correctly, it didn't handle ORs in join conditions, though > perhaps having it do that would be a natural phase 2. I don't recall > why the work stopped. > > > So, now I am writing some kind of recipe book for the users of that DB. > > > > What should I write ? > > You're asking a public mailing list to write private documentation for > the company you work for? hmm. > looks like some kind of misunderstanding: what I am asking is: should I , or am I allowed to write that, according to the community, Postgres is unable to do something ? > David > > [1] > https://www.postgresql.org/message-id/flat/20180904015910.GA1797012%40rfd.leadboat.com#0ddccc114c74d1ddaedcb07b64530967 Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: bug or lacking doc hint
On Sun, Jun 25, 2023 at 11:48 PM Tom Lane wrote: > David Rowley writes: > > The problem is that out of the 3 methods PostgreSQL uses to join > > tables, only 1 of them supports join conditions with an OR clause. > > Merge Join cannot do this because results can only be ordered one way > > at a time. Hash Join technically could do this, but it would require > > that it built multiple hash tables. Currently, it only builds one > > table. That leaves Nested Loop as the join method to implement joins > > with OR clauses. Unfortunately, nested loops are quadratic and the > > join condition must be evaluated once per each cartesian product row. > > We can do better than that if the OR'd conditions are each amenable > to an index scan on one of the tables: then it can be a nestloop with > a bitmap-OR'd inner index scan. I thought the upthread advice to > convert the substr() condition into something that could be indexed > was on-point. > ok. but one of the tables within the join(s) tables is 10 billions rows, splitted in 120 partitions. Creating something like 20 more indexes to fulfill that condition do have its own problems. > > > Tom Lane did start some work [1] to allow the planner to convert some > > queries to use UNION instead of evaluating OR clauses, but, if I > > remember correctly, it didn't handle ORs in join conditions, though > > perhaps having it do that would be a natural phase 2. I don't recall > > why the work stopped. > > As I recall, I was having difficulty convincing myself that > de-duplication of results (for cases where the same row satisfies > more than one of the OR'd conditions) would work correctly. > You can't just blindly make it a UNION because that might remove > identical rows that *should* appear more than once in the result. > I did rewrite the query using a cte and union(s). For that query, no dedup point. But my pb is that that DB will be used by a bunch of people writing raw SQL queries, and I cannot let them write queries that are going to go on for ages, and eventually crash over temp_file_limit after hours every now and then. So, my understanding of the above is that I must inform the users NOT to use OR clauses into joins. which maybe a pb by itself. regards Marc > regards, tom lane > Marc MILLAS
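As a concrete illustration of the bitmap-OR suggestion above, a hedged sketch with placeholder names and invented substr() arguments (the expression in the index has to match the expression in the join condition exactly). Which table needs the indexes depends on which relation ends up as the inner side of the nested loop:

-- if the small table (table4 in the earlier description) is the inner side,
-- plain indexes on the columns referenced by each OR'ed condition are enough
CREATE INDEX ON table4 (e);
CREATE INDEX ON table4 (g);

-- if the big table (table2) is the inner side, the substr() branch needs an
-- expression index matching the call exactly, plus ordinary indexes on the other columns
CREATE INDEX ON table2 (substr(f, 1, 4));
CREATE INDEX ON table2 (d);

On a partitioned table each CREATE INDEX cascades to every partition, which is where the size concern above comes from.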
Re: bug or lacking doc hint
On Mon, Jun 26, 2023 at 5:47 AM Avin Kavish wrote: > Sounds like the problem you are having is, the server is running out of > temporary resources for the operation that users are trying to do. So > according to Tom, on the postgres side, the operation cannot be optimized > further. > > I think you have few choices here, > - See if increasing the resources of the server will allow them to run > the operation > - Ask users not to do that operation > - Use a extension like citus to scale horizontally > > But I'm thinking why a massively inefficient join is needed in the first > place. Shouldn't joins be for following keyed relationships. So ideally a > unique indexed column, but at the very least an indexed column. Why is a > join required on a dynamically calculated substring? Can it be made into a > static computed value and indexed? Substring sounds like an op that should > be in the filter stage. > > Can you describe your data model? Maybe we can give some specific advice. > There is a set of big tables containing fine grain health data. The DB is accessed by people doing research on various kind of sicknesses. So, by nature, all columns (I mean ALL) can be used for every kind of SQL including, obviously, lots of intricated joins. >From time to time, People write requests that may take more than 2 days to complete. So the idea of indexing 'appropriate' columns translate in indexing all columns, of all tables, including the big ones with 30+ columns. with only main keys indexes, the DB is already 15TB+. and my own experience of putting 30 indexes on one table is not very positive. so... BTW rewriting the original request using cte and union does complete in 134 seconds, doing 3 merge join. And I have one more question: the explain analyze plan shows that Postgres decided to do external sorts using around 2 GB of disk space. I did a set work_mem to '4GB' to try to have those sorts in memory. No effect. How can I tell the planner to do those sort in memory ?? thanks > Regards, > Avin > > On Mon, Jun 26, 2023 at 3:57 AM Marc Millas > wrote: > >> >> On Sun, Jun 25, 2023 at 11:48 PM Tom Lane wrote: >> >>> David Rowley writes: >>> > The problem is that out of the 3 methods PostgreSQL uses to join >>> > tables, only 1 of them supports join conditions with an OR clause. >>> > Merge Join cannot do this because results can only be ordered one way >>> > at a time. Hash Join technically could do this, but it would require >>> > that it built multiple hash tables. Currently, it only builds one >>> > table. That leaves Nested Loop as the join method to implement joins >>> > with OR clauses. Unfortunately, nested loops are quadratic and the >>> > join condition must be evaluated once per each cartesian product row. >>> >>> We can do better than that if the OR'd conditions are each amenable >>> to an index scan on one of the tables: then it can be a nestloop with >>> a bitmap-OR'd inner index scan. I thought the upthread advice to >>> convert the substr() condition into something that could be indexed >>> was on-point. >>> >> ok. but one of the tables within the join(s) tables is 10 billions rows, >> splitted in 120 partitions. Creating something like 20 more indexes to >> fulfill that condition do have its own problems. >> >>> >>> > Tom Lane did start some work [1] to allow the planner to convert some >>> > queries to use UNION instead of evaluating OR clauses, but, if I >>> > remember correctly, it didn't handle ORs in join conditions, though >>> > perhaps having it do that would be a natural phase 2. 
I don't recall >>> > why the work stopped. >>> >>> As I recall, I was having difficulty convincing myself that >>> de-duplication of results (for cases where the same row satisfies >>> more than one of the OR'd conditions) would work correctly. >>> You can't just blindly make it a UNION because that might remove >>> identical rows that *should* appear more than once in the result. >>> >> >> I did rewrite the query using a cte and union(s). For that query, no >> dedup point. >> But my pb is that that DB will be used by a bunch of people writing raw >> SQL queries, and I cannot let them write queries that are going to go on >> for ages, and eventually crash over temp_file_limit after hours every now >> and then. >> So, my understanding of the above is that I must inform the users NOT to >> use OR clauses into joins. >> which maybe a pb by itself. >> regards >> Marc >> >> >>> regards, tom lane >>> >> >> Marc MILLAS >> >
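Regarding the work_mem question above, a hedged sketch of how to check that the setting actually reaches the sort (work_mem is applied per sort/hash operation and per session, so it has to be in effect in the session that runs the statement; the query text is a placeholder):

-- in the same session as the big query
SET work_mem = '4GB';
SHOW work_mem;                              -- confirm the value is really in effect
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;      -- look for "Sort Method: quicksort" vs "external merge  Disk: ..."

-- or cluster-wide (superuser), followed by a reload
ALTER SYSTEM SET work_mem = '4GB';
SELECT pg_reload_conf();

Parallel workers each get their own work_mem budget, and hash joins and aggregates have their own spill behaviour, so a higher setting does not always remove the temporary files.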
Re: bug or lacking doc hint
On Mon, Jun 26, 2023 at 4:05 PM Ron wrote: > On 6/26/23 07:22, Marc Millas wrote: > > > > On Mon, Jun 26, 2023 at 5:47 AM Avin Kavish wrote: > >> Sounds like the problem you are having is, the server is running out of >> temporary resources for the operation that users are trying to do. So >> according to Tom, on the postgres side, the operation cannot be optimized >> further. >> >> I think you have few choices here, >> - See if increasing the resources of the server will allow them to run >> the operation >> - Ask users not to do that operation >> - Use a extension like citus to scale horizontally >> >> But I'm thinking why a massively inefficient join is needed in the first >> place. Shouldn't joins be for following keyed relationships. So ideally a >> unique indexed column, but at the very least an indexed column. Why is a >> join required on a dynamically calculated substring? Can it be made into a >> static computed value and indexed? Substring sounds like an op that should >> be in the filter stage. >> >> Can you describe your data model? Maybe we can give some specific advice. >> > > There is a set of big tables containing fine grain health data. The DB is > accessed by people doing research on various kind of sicknesses. So, by > nature, all columns (I mean ALL) can be used for every kind of SQL > including, obviously, lots of intricated joins. > > > This sounds like the kind of problem normally solved by data warehouses. > Is your schema designed like a DW, or is it in 3NF? > it's, indeed, some kind of dwh. but it's neither a star nor a snowflake .at least not used like those standard schemas. in one of the big tables (10 billions+ rows), there is around 60 columns, describing one event: some guy have had a given sickness, got a given medoc etc The pb is that its not one simple event with a set of dimensions, the people using that db are NOT looking for an event according to various criterias, they are looking for correlations between each of the 60+ columns. As a consequence very few indexes are used as most requests end in some kind of huge sequential reads. The machine was built for this and perform well, but some requests are posing pb and we must find solutions/workaround. one of the users did rewrite the request using a select distinct matched with left join(s) and table.a is not null set of conditions. looks crazy, but does work. I'll get the request tomorrow. > -- > Born in Arizona, moved to Babylonia. > Marc MILLAS
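Purely as a guess at the pattern described in the last paragraph (the actual query was never posted in the thread, so every name here is invented): the SELECT DISTINCT / LEFT JOIN / IS NOT NULL trick replaces the OR'ed inner join with one outer join per condition and keeps the rows where at least one of them matched:

SELECT DISTINCT t1.*
FROM table1 t1
LEFT JOIN table4 t4a ON t1.d = t4a.e
LEFT JOIN table4 t4b ON substr(t1.f, 1, 4) = t4b.g
WHERE t4a.e IS NOT NULL
   OR t4b.g IS NOT NULL;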
Re: pb with join plan
On Tue, Jun 27, 2023 at 8:12 PM Kirk Wolak wrote: > On Wed, Jun 21, 2023 at 12:10 PM Marc Millas > wrote: > >> Marc MILLAS >> On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra < >> tomas.von...@enterprisedb.com> wrote: >> >>> On 6/21/23 00:26, Marc Millas wrote: >>> > >>> > >>> > On Tue, Jun 20, 2023 at 11:19 PM David Rowley >> > <mailto:dgrowle...@gmail.com>> wrote: >>> > >>> > On Wed, 21 Jun 2023 at 08:34, Marc Millas >> > <mailto:marc.mil...@mokadb.com>> wrote: >>> > > >>> > > On Tue, Jun 20, 2023 at 10:14 PM David Rowley >>> > mailto:dgrowle...@gmail.com>> wrote: >>> > >> >>> > >> On Wed, 21 Jun 2023 at 07:42, Marc Millas < >>> marc.mil...@mokadb.com >>> > <mailto:marc.mil...@mokadb.com>> wrote: >>> > >> > But if I do the same with clause one OR clause 2, I have to >>> > kill the request after an hour, seeing the filesystem showing more >>> > than 140 Mb of increased usage. >>> >>> It's a bit weird the "victor" table is joined seemingly without any join >>> conditions, leading to a cross join (which massively inflates the cost >>> for joins above it). Maybe the anonymized plan mangles it somehow. >>> >> >> So I did try to simplify my pb. >> I create a table with the result of the first 3 joins. >> That table do have 15M lines. all tables have been vacuum analyze >> >> Now if I do an explain analyze of a simple join between that table and my >> original table 4 >> using a simple = clause, I get a result in one second (around). and the >> planner guesses for rows seems in line with the observed values . >> if I use a substr(table1.a)= table2.b, the explain analyze get a result >> in 21 seconds and the planner estimates a 65M rows result set while the >> observed is 330 k rows >> so here its 20 times slower and the discrepency between planner rows >> guess and reality is a 200 ratio. >> >> Now, if I try an explain analyze with join on a=b or substr(c)=d or e=f >> then... I kill the query after a quarter an hour without any answer. >> if I try to just explain the query, the planner rows guess becomes more >> than 2 Billions >> the extremely simple query and plan are here, without automatic >> obfuscation >> https://explain.depesz.com/s/b8Ll >> > > First, I am not sure why you cannot send us the explain analyze. But > moving on... > Kirk, the explain analyze, with the SQL query is directly accessible on the explain.depesz link . > > substr() is a function that mutilates a value such that the index becomes > useless... > If you are looking for the LEFT() of the value, then an INDEX can be used. > I have COLLATION "C" and when I query: > WHERE fld like fld_b||"%" > there are NO indexes on those columns. One of the reasons is that the simplest index on one column is 380 GB on disk So to put indexes on each criteria, I must add around 1 TB of disk just for ttt the full scan is not a problem. Its fast.. The problem is the nested loop which do compare each of the 15M lines of ttt to each of the 30K lines of inc_pha_r. its an operation done 450 000 000 000 times. so if each comparison is 1 microsecond long, the nested loop is 125 hours long. And I am not sure that the comparison is done in 1 microsecond... > > The optimizer constructs a query that uses the index on "fld"... > But when I try: > > WHERE fld like CONCAT_WS("", fld_b,"%") > It doesn't use the index version. (because the function call is too > complicated to see through) > > When using functions in where clauses, indexes either have to be made on > those functions, or often times the index cannot be used. > > BTW, I noted the COLLATION. 
That turned out to be important, because my > first DB test did NOT use that collation, and the result > of the LIKE was the non-indexed version... > > I hope you find something useful in here. > Thanks for trying > > Also, WHERE fld <> 72... (unless you have a heavily skewed set of > statistics, I read that as. SCAN everything, and check later, > because this should filter very few rows), whereas fld = 72 will be > blazingly fast. > > Kirk >
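One hedged observation on the arithmetic above: the 450 billion comparisons come from the inner side of the nested loop being scanned in full for every outer row. Giving the 30k-row side an index (cheap at that size, unlike the 380 GB indexes mentioned for the big tables) lets the inner side become a parameterized index scan, one probe per outer row; the column names below are placeholders:

-- equality probes from the nested loop can use a plain btree index on the inner table
CREATE INDEX ON inc_pha_r (b);

-- for the OR'ed condition, each branch needs its own indexable column on the inner
-- side before a bitmap-OR inner scan (as described earlier by Tom Lane) is possible
CREATE INDEX ON inc_pha_r (d);
CREATE INDEX ON inc_pha_r (f);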
need explanation about an explain plan
Hi, Postgres 14.2 (for one more month) The explain plan and request is here: https://explain.depesz.com/s/Opk0 The big table is split in around 130 partitions, one by month. the ladate column is the partition key, and it does have 1 value for each partition. there is a Marc MILLAS Senior Architect +33607850334 www.mokadb.com
Re: need explanation about an explain plan
Sorry, gmail sent an incomplete message. Hi, Postgres 14.2 (for one more month) The explain plan and request are here: https://explain.depesz.com/s/Opk0 The big table (10 billion rows) is split into around 130 partitions, one per month. The ladate column is the partition key, and it has exactly one value per partition. There is an index on the numfic column; the distribution of values for that column may differ before and after January 2019. The request is executed in a loop over all ten years of data: years 1,2,3, then years 2,3,4, etc. That request is also executed for columns other than cod, one column at a time (there are 107 columns...). I don't understand why the planner uses a filter on the date, as it is already handled by the partition structure. Thanks, Marc MILLAS On Wed, Jun 28, 2023 at 5:22 PM Marc Millas wrote: > Hi, > > Postgres 14.2 (for one more month) > The explain plan and request is here: > https://explain.depesz.com/s/Opk0 > > The big table is split in around 130 partitions, one by month. > the ladate column is the partition key, and it does have 1 value for each > partition. > there is a > > > > > Marc MILLAS > Senior Architect > +33607850334 > www.mokadb.com > >
Re: need explanation about an explain plan
On Wed, Jun 28, 2023 at 6:48 PM Laurenz Albe wrote: > On Wed, 2023-06-28 at 17:29 +0200, Marc Millas wrote: > > https://explain.depesz.com/s/Opk0 > > > > The big table (10 billions raws) is split in around 130 partitions, one > by month. > > the ladate column is the partition key, and it does have 1 value for > each partition. > > there is an index on the numfic column. the distribution of values for > that column may differ before and after 2019 january. > > > > The request is executed in a loop for all ten years of data, year 1,2,3, > then year2,3,4 etc > > that request is also executed for columns other than cod, one column at > a time (there are 107 columns...) > > > > I dont understand why the planner use a filter on the date, as its > already within the partition structure. > > If the grouping column is the partitioning key, try to set > enable_partitionwise_join = on. > > Otherwise, your best bet is to create an index that covers both WHERE > conditions, > or a covering index, which will get you the best result: > > CREATE INDEX ON table1 (numfic, ladate) INCLUDE (cod, nb_obs); > Hi Laurenz, as said, in each partition there is only one value for ladate. I don't understand the point of creating an index over the tens of millions of rows of each partition, an index which would contain the very same single value throughout. I did set enable_partitionwise_join = 'on' and retried, but it doesn't seem to change the plan. > Yours, > Laurenz Albe >
Re: need explanation about an explain plan
On Wed, Jun 28, 2023 at 10:46 PM Laurenz Albe wrote: > On Wed, 2023-06-28 at 19:25 +0200, Marc Millas wrote: > > Hi Laurenz, as said, in each partition there is only one value for > ladate. > > The planner doesn't seem to take that into account. > Indeed. I did check the values in pg_statistic and reran ANALYZE. No change... > > Yours, > Laurenz Albe >
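A hedged sketch of how those per-partition statistics can be inspected through the pg_stats view (a readable wrapper around pg_statistic; the partition name is a placeholder):

SELECT tablename, attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'ma_table_2019_01'    -- one leaf partition, hypothetical name
  AND attname = 'ladate';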
analyze partition
Hi, the documentation, in the caveats of chapter 5.11.3.3, says that a manual VACUUM or ANALYZE on the root table only processes that root table. To my understanding, with declarative partitioning the root table does not contain any data itself, so vacuuming or analyzing it should be 'fast'. Yet if I run VACUUM ANALYZE ma_table on my big partitioned table (130+ partitions), it works for quite a long time. Can someone clarify ? thanks Marc MILLAS Senior Architect +33607850334 www.mokadb.com
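One detail worth noting, hedged because the exact wording varies between releases: that caveat belongs to the section on the old inheritance-based partitioning. With declarative partitioning, a manual VACUUM or ANALYZE on the partitioned table also processes its leaf partitions, which would explain the long runtime on 130+ partitions. A sketch, assuming monthly partitions named like ma_table_2019_01:

-- processes the partitioned parent and recurses into every leaf partition
VACUUM ANALYZE ma_table;

-- target a single leaf partition when only one month of data changed
VACUUM ANALYZE ma_table_2019_01;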
Re: suggestion about time based partitioning and hibernate
On Tue, Jul 18, 2023 at 8:18 AM Luca Ferrari wrote: > Dear all, > I'm looking for ideas here, and it could be someone already stepped > into declarative partitioning of an existing database where Hibernate > (a Java ORM) handles the tables. > The situation is as follows: > > create table foo( id primary key, a_date date, ... ); > > Now, the trivial way to partition this would be on a range based on > a_date, so that the primary key of the tables shifts from id to (id, > a_date). One thing that frightens me is that Hibernate does a lot of > per-row lookups by means of the id, so while the partitioning is > probably going to make things more manageable and even faster in some > scenarios, could lead to drawbacks when Hibernate queries by id. > Moreover, hibernate will think id is unique while it is not anymore. > Last but not least, referencing foreign keys are made by Hibernate > thru the id column, and it means that incoming foreign keys to foo > will not be in place anymore. > > Now, I know that I can define a composite key in hibernate, in order > to match the effective new data structure, but this requires a huge > rewrite of the application code. > And after all, we are talking about a non-PostgreSQL related piece, so > the problem is not on the PostgreSQL side. > > Anyone has already done a partitioning in such scenario? > > I am thinking that partitioning on an hash of id could be the only way > to go without having to touch the hibernate side, even if this would > bring up a less balanced partitioned structure. In such case, I mean > partitioning by hash, having a table with 60 millions rows per 50 GB > in size, what would be the rule of thumb to select the number of > partitions (i.e., a suggested modulus)? > > I will appreciate any suggestion. > > Thanks, > Luca > Hello, quite often, in such cases, the id is a sequence or something like that, so partitioning on an id range or on an a_date range is similar from a maintenance point of view. And, obviously, if you partition by id range, there is no primary key problem. The difference is the SQL plan used to access the data: to prune partitions, Postgres needs the partition key in the WHERE clause. So, if your code accesses the data through id... then partition by id. > > Marc MILLAS > >
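A hedged sketch of the range-by-id layout suggested above (names and boundaries are purely illustrative): when the partition key is id alone, the primary key can stay on id, and Hibernate's by-id lookups still prune to a single partition:

CREATE TABLE foo (
    id      bigint PRIMARY KEY,    -- partition key = primary key, no composite key needed
    a_date  date
    -- ... other columns
) PARTITION BY RANGE (id);

CREATE TABLE foo_p01 PARTITION OF foo FOR VALUES FROM (1) TO (10000001);
CREATE TABLE foo_p02 PARTITION OF foo FOR VALUES FROM (10000001) TO (20000001);

-- a lookup like  SELECT * FROM foo WHERE id = 123456  touches only foo_p01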
Re: DB Server slow down & hang during Peak hours of Usage
Hello, in the postgresql.conf joined, 2 things (at least) look strange: 1) the values for background writer are the default values, fit for a server with a limited writes throughput. you may want to increase those, like: bgwriter_delay = 50ms bgwriter_lru_maxpages = 400 bgwriter_lru_multiplier = 4.0 and check the checkpoint log to see if there are still backend processes writes. 2) work_mem is set to 2 GB. so, if 50 simultaneous requests use at least one buffer for sorting, joining, ..., you will consume 100 GB of RAM this value seems huge for the kind of config/usage you describe. You may try to set work_mem to 100 MB and check what's happening. Also check the logs, postgres tells his life there... Marc MILLAS Senior Architect +33607850334 www.mokadb.com On Mon, Aug 7, 2023 at 3:36 PM KK CHN wrote: > List , > > *Description:* > > Maintaining a DB Server Postgres and with a lot of read writes to this > Server( virtual machine running on ESXi 7 with CentOS 7) . > > ( I am not sure how to get the read / write counts or required IOPS or any > other parameters for you. If you point our I can execute those commands > and get the data. ) > > Peak hours say 19:00 Hrs to 21:00 hrs it hangs ( The application is an > Emergency call response system writing many Emergency Response vehicles > locations coordinates to the DB every 30 Seconds and every emergency call > metadata (username, phone number, location info and address of the caller > to the DB for each call) > > During these hours the system hangs and the Application ( which shows > the location of the vehicles on a GIS map hangs ) and the CAD machines > which connects to the system hangs as those machines can't connect to the > DB and get data for displaying the caller information to the call taking > persons working on them. ) > > *Issue : * > How to trace out what makes this DB hangs and make it slow and how to > fix it.. > > *Resource poured on the system :* > > *64 vCPUs allocate ( Out of a host machine comprised of 2 processor slots > of 20 cores each with Hyper Threading, intel xeon 2nd Gen, CPU usage show > 50 % in vCentre Console), and RAM 64 GB allocated ( buy usage always > showing around 33 GB only ) * > > *Query :* > > How to rectify the issues that makes the DB server underperforming and > find a permanent fix for this slow down issue*. * > > *Attached the Postgres.conf file here for reference .* > > *Any more information required I can share for analysis to fix the issue. * > > > *Krishane * >
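A hedged sketch of how to check whether backends are still writing buffers themselves, and of applying the work_mem change cluster-wide (the pg_stat_bgwriter columns below exist up to PostgreSQL 16; newer releases moved some of them to other views):

-- if buffers_backend stays high compared to buffers_clean, the background writer is not keeping up
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend, maxwritten_clean
FROM pg_stat_bgwriter;

-- lower work_mem for everyone, then reload the configuration
ALTER SYSTEM SET work_mem = '100MB';
SELECT pg_reload_conf();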