Performance issue after migration from 9.4 to 15
Hi list,

We have recently migrated from our old PostgreSQL 9.4 server to a PostgreSQL 15 server. Even though the new machine has more resources, we see a considerable decrease in the performance of some of our heavier queries, and I have no idea where I should start tuning.

Old database: PostgreSQL 9.4.26 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit
New database: PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

I tested the same query against the old and the new database.

Old: https://explain.dalibo.com/plan/b7d7gab73f6c7274#plan/node/21
New: https://explain.dalibo.com/plan/84h16cg1f19266be#plan/node/34

Particularly interesting are the sequential scans. In the old plan, we have node #21, which took 32 seconds. Almost all of the time goes into actual I/O. In the new plan, the corresponding node is #34. It took 55 seconds, but less than 1 second was actual I/O (because most of the data was already in memory). Why did this step still take about twice the time?

There is another Seq Scan of the same table. Node #10 in the old plan took 3 seconds, whereas the corresponding node #21 in the new plan took more than 2 minutes (of which less than 2 seconds was actual I/O).

Am I misreading the plans? If not, I have no idea why the sequential scans take so much longer in the new database, even though the I/O is even faster than before. The configuration was left almost unchanged, with only some adjustments due to changes between the versions. As far as I can tell, none of these changes is performance-related.

Can anybody give me a hint into which direction I should investigate further?

Thanks,
Christian
Re: Performance issue after migration from 9.4 to 15
On Tue, 11 Apr 2023 at 23:03, Christian Schröder wrote:
> We have recently migrated from our old PostgreSQL 9.4 server to a
> PostgreSQL 15 server. Even though the new machine has more resources, we
> see a considerable decrease in the performance of some of our heavier
> queries, and I have no idea where I should start tuning.

Using pg_upgrade? Did you run ANALYZE? If not then you may be suffering from lack of statistics leading to bad plans.

> Old: https://explain.dalibo.com/plan/b7d7gab73f6c7274#plan/node/21
> New: https://explain.dalibo.com/plan/84h16cg1f19266be#plan/node/34

I'm not well versed in looking at JSON output plans. You might get a better overall response posting to https://explain.depesz.com/. Personally, I always look for raw output, and at least with depesz, that's text-based.

> Particularly interesting are the sequential scans. In the old plan, we have
> node #21, which took 32 seconds. Almost all of the time goes into actual
> I/O. In the new plan, the corresponding node is #34. It took 55 seconds,
> but less than 1 second was actual I/O (because most of the data was
> already in memory). Why did this step still take about twice the time?

Perhaps your 15 server is under more load than 9.4 due to all concurrent plans being slower from bad statistics? Load averages might be a good indicator. (I assume the server is busy due to the "Workers Launched": 0)

> Am I misreading the plans? If not, I have no idea why the sequential scans
> take so much longer in the new database, even though the I/O is even
> faster than before.

Looks that way to me too.

> Can anybody give me a hint into which direction I should investigate further?

Probably just run ANALYZE on the database in question.

David
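A minimal sketch of that ANALYZE suggestion (the table name in the second statement is illustrative; from the shell, vacuumdb --all --analyze-in-stages is the cluster-wide equivalent and is the usual follow-up to pg_upgrade):

-- inside psql, on the migrated database:
ANALYZE;

-- or refresh statistics for one problem table at a time:
ANALYZE VERBOSE mytable;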
Re: Performance issue after migration from 9.4 to 15
David Rowley writes:
> On Tue, 11 Apr 2023 at 23:03, Christian Schröder wrote:
>> Particularly interesting are the sequential scans. In the old plan, we have
>> node #21, which took 32 seconds. Almost all of the time goes into actual
>> I/O. In the new plan, the corresponding node is #34. It took 55 seconds, but
>> less than 1 second was actual I/O (because most of the data was already in
>> memory). Why did this step still take about twice the time?

> Perhaps your 15 server is under more load than 9.4 due to all
> concurrent plans being slower from bad statistics? Load averages might
> be a good indicator. (I assume the server is busy due to the "Workers
> Launched": 0)

I think the extra time is due to useless overhead from trying and failing to parallelize: the leader has to do all the work, but there's probably overhead added anyway. 9.4 of course knew nothing of parallelism.

My guess is that the OP is trying to run with a large number of backends and has not raised the max number of parallel workers to match. It does look like the stats might need updating (since 9.4's rowcount estimate is OK and 15's less so), but that is not why we see "Workers planned: 2, Workers launched: 0". Either provision enough parallel workers to fix that, or disable parallelism.

regards, tom lane
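A sketch of the two remedies described above (the values are illustrative, not recommendations):

-- what is currently provisioned:
SHOW max_worker_processes;
SHOW max_parallel_workers;
SHOW max_parallel_workers_per_gather;

-- option 1: provision more parallel workers (takes effect on reload;
-- raising max_worker_processes itself would need a restart):
ALTER SYSTEM SET max_parallel_workers = 16;
SELECT pg_reload_conf();

-- option 2: take parallelism out of the picture for a session:
SET max_parallel_workers_per_gather = 0;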
Re: "PANIC: could not open critical system index 2662" - twice
> No idea about the former, but bad hardware is a good enough explanation.
> As to keeping it from happening: use good hardware.

Alright, thanks, I'll just keep my fingers crossed that it doesn't happen again then!

> Also: Use checksums. PostgreSQL offers data checksums[1]. Some filesystems also offer checksums.

We have data_checksums=on. (It must be on by default, since I cannot find that in our config files anywhere.) However, the docs say "Only data pages are protected by checksums; internal data structures and temporary files are not.", so I guess pg_class_oid_index might be an "internal data structure"?

We also have checksum=on for the ZFS dataset on which the data is stored (also the default - we didn't change it). ZFS did detect problems (zpool status reported read, write and checksum errors for one of the old disks), but it also said "errors: No known data errors". I understood that to mean that it recovered from the errors, i.e. wrote the data to different disk blocks or read it from another disk in the pool.
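Both layers can be checked from SQL; a small sketch (the pg_stat_database counters shown here exist on PostgreSQL 12 and later):

SHOW data_checksums;

SELECT datname, checksum_failures, checksum_last_failure
FROM pg_stat_database;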
TEXT column > 1Gb
Hello,

I’ve recently encountered the issue of trying to insert more than 1 Gb into a TEXT column. While the docs say TEXT is unlimited length, I had been unaware of the 1Gb buffer size limitations. We can debate whether or not saving something this big in a single column is a good idea (spoiler: it isn’t. But not my design and, in fairness, was not anticipated when the schema was designed.) Still, I’d like to implement something that is not a major disruption and try to keep the mods on the server side. My first idea is to have a chunked associated table (in pseudo code)

CREATE TABLE associated(
    key_id integer references main_table(key_id),
    chunk integer,
    text_start integer,
    text_end integer,
    text_chunk TEXT);

And define functions for inserting and selecting by dividing into 1Mb chunks

CREATE FUNCTION insertText(INTEGER, TEXT) RETURNS INTEGER AS $$
DECLARE
    chunk INTEGER := 0;
    key_id ALIAS FOR $1;
    the_text ALIAS FOR $2;
    text_chunk TEXT;
BEGIN
    LOOP
        text_chunk := substr(the_text, chunk*100, 100);
        IF length(text_chunk) = 0 THEN
            EXIT;
        END IF;
        INSERT INTO associated(key_id, chunk, text_start, text_end, text_chunk)
        VALUES (key_id, chunk, chunk*100, (chunk*100 + length(text_chunk)), text_chunk);
        chunk := chunk + 1;
    END LOOP;
    RETURN chunk;
END;
$$ LANGUAGE plpgsql;

This apparently runs into the same issues of buffer size: I get an ‘invalid message length’ in the log file and the insert fails. I can see from adding notices in the code that I never enter the LOOP; I assume having function arguments > 1Gb is also a bad thing.

I’d like to continue to keep the modifications on the server side. And I’d like to believe someone else has had this problem before. Any suggestions other than have the client do the chunking? Can I use a different language binding and get around the argument length limitations?

Thanks
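The select side of such a design can stay in SQL as well; a sketch of reassembly over the associated table above (the key_id value is illustrative, and the assembled result is itself still subject to the 1Gb field limit):

SELECT string_agg(text_chunk, '' ORDER BY chunk)
FROM associated
WHERE key_id = 42;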
Re: TEXT column > 1Gb
On 4/11/23 11:41, Joe Carlson wrote:
> I’ve recently encountered the issue of trying to insert more than 1 Gb
> into a TEXT column. While the docs say TEXT is unlimited length, I had
> been unaware of the 1Gb buffer size limitations.
> [...]
> Any suggestions other than have the client do the chunking? Can I use a
> different language binding and get around the argument length limitations?

I've hit this same limitation in Java (with write to db). What is your stack in this case? Not sure my solution applies.
Re: TEXT column > 1Gb
Hi

út 11. 4. 2023 v 19:42 odesílatel Joe Carlson napsal:
> Hello,
>
> I’ve recently encountered the issue of trying to insert more than 1 Gb
> into a TEXT column. While the docs say TEXT is unlimited length, I had
> been unaware of the 1Gb buffer size limitations.

I think this is some misunderstanding, see https://www.postgresql.org/docs/current/datatype-character.html

>>> The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for *n* in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.) <<<

My note: nothing is unlimited in this world :-) 1GB is a theoretical limit, but depending on usage, it can sometimes be too much - and the safe limit is about 500MB. Long strings can take too much RAM in some cases.

For longer data you can use large objects, the LO API: https://www.postgresql.org/docs/current/largeobjects.html

It is much better than extra long strings, and the implementation is +/- similar to what you proposed.

Regards

Pavel
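A minimal sketch of that large-object route using the server-side SQL functions (the documents table and the UTF8 encoding are illustrative assumptions):

CREATE TABLE documents (key_id integer PRIMARY KEY, body oid);

-- create an empty large object and store its OID
INSERT INTO documents VALUES (1, lo_create(0));

-- append in chunks: lo_put(oid, byte_offset, bytea) lets the client
-- issue many small writes instead of one > 1Gb argument
SELECT lo_put(body, 0, convert_to('first chunk ...', 'UTF8'))
FROM documents WHERE key_id = 1;

-- read a slice back without materializing the whole value
-- (careful with multibyte characters at slice boundaries)
SELECT convert_from(lo_get(body, 0, 1024), 'UTF8')
FROM documents WHERE key_id = 1;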
Guidance on INSERT RETURNING order
Hello list,

I have a few clarification questions regarding using insert with returning. The use case is SQLAlchemy development, where the orm wants to insert a list of rows, get back the generated ids, defaults, etc, and match the returned values with the original list of orm objects. The following assumes a table like this:

CREATE TABLE t(
    id SERIAL,
    data TEXT -- type here can be anything
)

On PostgreSQL, since SQLAlchemy version 1.4 (Mar 2021) this operation has used

INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING id

but we were recently made aware that there is no guarantee on the order of the returned rows. Looking at the documentation there is no mention of the order of the RETURNING clause, but searching past discussions there are multiple indications that the order is not guaranteed, like https://www.postgresql.org/message-id/19445.1350482182%40sss.pgh.pa.us . I think the docs should mention this, similar to what the sqlite docs do at https://www.sqlite.org/lang_returning.html#limitations_and_caveats

Searching the archive it seems that using the INSERT SELECT ORDER BY form should be a better solution, so the above insert should be rewritten as

INSERT INTO t(data)
SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
ORDER BY num
RETURNING id

to ensure that the ids are created in the order specified by num. The returned ids can again be in arbitrary order, but sorting them should enable correctly matching the orm objects so that they can be properly updated. Is this correct? The documentation does not say anything about this, and looking at the archive it seems that it's mostly correct but not 100% guaranteed, as stated here https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us . The MSSQL docs, for example, clearly state that this is the case https://learn.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-ver16#limitations-and-restrictions , so it would be helpful if something similar were mentioned in the PostgreSQL docs.

The above insert form (INSERT SELECT ORDER BY) can be used when the primary key is an auto-incrementing value; in case it isn't (such as when it's a UUID), another solution must be used. Since there does not seem to be any way of getting the position of the original row inside the VALUES clause with RETURNING, the solution SQLAlchemy is implementing is to either degrade to inserts with a single value or to optionally allow the user to add a "sentinel" column to the table, so that a sequential value can be inserted into it and then returned, allowing the ordering of the RETURNING clause rows:

ALTER TABLE t ADD COLUMN sentinel SMALLINT

INSERT INTO t(data, sentinel) VALUES ('a', 1), ('b', 2), ('c', 3) RETURNING id, sentinel

Is there any better solution to achieve this? (For reference this feature is tracked in SQLAlchemy by https://github.com/sqlalchemy/sqlalchemy/issues/9618)

From an ORM standpoint it would be very useful having a way of forcing the order of RETURNING to be the same as the one in VALUES, maybe with an additional keyword. Alternatively, having a system column or other function that can be placed into the returning clause to return the output row position wrt the input values list, similar to what the sentinel column above does.
At the very least I think the documentation could do a better job at mentioning that RETURNING order is arbitrary, and documenting that INSERT SELECT ORDER BY processes the defaults in select order (if that's indeed the case).

Sorry for the long email,

Thanks

Federico
Re: Guidance on INSERT RETURNING order
On 4/11/23 12:47, Federico wrote:
> On PostgreSQL, since SQLAlchemy version 1.4 (Mar 2021) this operation has used
>
> INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING id
>
> but we were recently made aware that there is no guarantee on the order
> of the returned rows.

1) Because returned data in SQL is inherently unordered.

2) What would you order by, id or data or both?

-- 
Adrian Klaver
adrian.kla...@aklaver.com
Re: Guidance on INSERT RETURNING order
On 4/11/23 12:47, Federico wrote:
> Searching the archive it seems that using the INSERT SELECT ORDER BY
> form should be a better solution, so the above insert should be rewritten as
>
> INSERT INTO t(data)
> SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
> ORDER BY num
> RETURNING id

Or:

with i as (INSERT INTO t(data) VALUES ('a'), ('b'), ('c') returning id)
select i.id from i order by id;

-- 
Adrian Klaver
adrian.kla...@aklaver.com
Re: Guidance on INSERT RETURNING order
On 11/4/23 23:06, Adrian Klaver wrote:
> Or:
>
> with i as (INSERT INTO t(data) VALUES ('a'), ('b'), ('c') returning id)
> select i.id from i order by id;

+1 for this version!

-- 
Achilleas Mantzios
IT DEV - HEAD
IT DEPT
Dynacom Tankers Mgmt
Re: Guidance on INSERT RETURNING order
Federico writes:
> Searching the archive it seems that using the INSERT SELECT ORDER BY
> form should be a better solution, so the above insert should be rewritten as
>
> INSERT INTO t(data)
> SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
> ORDER BY num
> RETURNING id
>
> to ensure that the ids are created in the order specified by num. The
> returned ids can again be in arbitrary order, but sorting them should
> enable correctly matching the orm objects so that they can be properly
> updated. Is this correct?

No. Sadly, adding that ORDER BY is just voodoo programming, because it applies to the result of the SELECT while promising nothing about the order in which INSERT/RETURNING will act on those rows.

Re-reading that 2012 thread, the main new observation I'd make today is that parallel operation is a thing now, and it's not hard to foresee that sometime soon we'll want to parallelize INSERTs. Which'd make it *really* hard to promise anything about the order of RETURNING output.

I think if you want to use RETURNING with multi-row inserts, the thing to do is more like

INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id

and then explicitly match up the returned "data" values rather than presuming they appear in the same order you wrote them in in VALUES. Admittedly this might be problematic if some of the VALUES rows are identical, but how much should you care?

regards, tom lane
Re: Guidance on INSERT RETURNING order
Thanks for the answers.

> 2) What would you order by, id or data or both?

By values order (that incidentally seems to be what PG does).

> with i as (INSERT INTO t(data) VALUES ('a'), ('b'), ('c') returning id)
> select i.id from i order by id;

The problem here is not having the auto increment id in a particular order, it's that there is apparently no correlation between the position of an element in the values clause and the id generated. That's the reason for using the sentinel column in the general solution in the previous message.

To expand on the use case, SQLAlchemy has 3 objects T that have T(data='a'), T(data='b'), T(data='c') but no value for the id column. The objective is to insert the 3 data values, get back the ids and correctly match them with the correct 3 objects.

> No. Sadly, adding that ORDER BY is just voodoo programming, because
> it applies to the result of the SELECT while promising nothing about
> the order in which INSERT/RETURNING will act on those rows.

I probably wasn't clear: it's fine if the INSERT/RETURNING order is arbitrary, what matters is that the auto-incrementing values are generated in the same order as the select, like mentioned in this previous message https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us

Is that not the case?

> Re-reading that 2012 thread, the main new observation I'd make today
> is that parallel operation is a thing now, and it's not hard to foresee
> that sometime soon we'll want to parallelize INSERTs. Which'd make it
> *really* hard to promise anything about the order of RETURNING output.

I think it's fine not promising anything about the order of RETURNING, but it would be very helpful having a way of tracking what input row generated a particular output row. Basically the sentinel case in the original post, without actually having to insert the sentinel into the table.

> I think if you want to use RETURNING with multi-row inserts, the
> thing to do is more like
>
> INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id
>
> and then explicitly match up the returned "data" values rather than
> presuming they appear in the same order you wrote them in in VALUES.
> Admittedly this might be problematic if some of the VALUES rows
> are identical, but how much should you care?

Well, the example is very easy, but it's hard to generalize when inserting multiple columns with possible complex values in them, since it would mean matching on possibly large json values, arrays, etc. So definitely not ideal.

Thanks,
Federico
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023, Federico wrote:

> The problem here is not having the auto increment id in a particular

The id might not even be auto-increment but UUID or something…
(I am surprised you would even try to insert multiple rows at once.)

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)
Re: Guidance on INSERT RETURNING order
On Tue, Apr 11, 2023, at 4:22 PM, Tom Lane wrote:
> No. Sadly, adding that ORDER BY is just voodoo programming, because
> it applies to the result of the SELECT while promising nothing about
> the order in which INSERT/RETURNING will act on those rows.
>
> Re-reading that 2012 thread, the main new observation I'd make today
> is that parallel operation is a thing now, and it's not hard to foresee
> that sometime soon we'll want to parallelize INSERTs. Which'd make it
> *really* hard to promise anything about the order of RETURNING output.

Let me state this without RETURNING getting in the way, because we know RETURNING is not ordered. Start with this table:

CREATE TABLE mytable (
    id SERIAL PRIMARY KEY,
    a INT,
    b INT
)

Then insert two rows, where id SERIAL fires implicitly; assume the next value the sequence will give us is N1, and the value after that is N2. It doesn't matter what N1 and N2 are (they don't need to be consecutive) but we want N2 > N1, that is, increasing.

INSERT INTO mytable (a, b)
SELECT p1, p2
FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num)
ORDER BY num

Then SELECT with ORDER BY:

SELECT id, a, b FROM mytable ORDER BY id

We want the results to be:

(N1, 10, 11)
(N2, 12, 13)

and we don't want them to *ever* be:

(N1, 12, 13)
(N2, 10, 11)

that is, we want the SERIAL column (or an IDENTITY also) to be lined up with the VALUES.

From what you wrote in https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us , that seems to be exactly what you've stated, where this statement:

INSERT INTO mytable (a, b)
SELECT p1, p2
FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num)
ORDER BY num

is organized by the query planner to essentially be equivalent to this, where the nextval() is part of the SELECTed data:

INSERT INTO mytable (id, a, b)
SELECT nextval('mytable_id_seq'), p1, p2
FROM (SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num) as _x

In practice, we add "RETURNING id" and expect those "id" cols to be in increasing order, so we sort to match it up with the input rows.

> I think if you want to use RETURNING with multi-row inserts, the
> thing to do is more like
>
> INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id
>
> and then explicitly match up the returned "data" values rather than
> presuming they appear in the same order you wrote them in in VALUES.

We're going to do that also when the table has something like a uuid for a primary key or otherwise.

> Admittedly this might be problematic if some of the VALUES rows
> are identical, but how much should you care?

We only do any of this if the rows have something unique in them we can hook onto.

Thanks so much for replying!
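Putting the pieces above together, the pattern being relied on looks like this (observed behavior on current releases rather than a documented guarantee; mytable as defined earlier in the message):

INSERT INTO mytable (a, b)
SELECT p1, p2
FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num)
ORDER BY num
RETURNING id;
-- client side: sort the returned ids ascending and pair them with the
-- input rows in num order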
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023 at 22:59, Thorsten Glaser wrote:
> On Tue, 11 Apr 2023, Federico wrote:
>> The problem here is not having the auto increment id in a particular
> The id might not even be auto-increment but UUID or something…
> (I am surprised you would even try to insert multiple rows at once.)

Well, the documentation makes no mention of any limitation on returning, and the observed behaviour has consistently been that returning is in values order. Again, that was SQLAlchemy's fault for assuming this (but the docs surely did not help).

Also, re-reading my reply, I made a typo there, sorry. What it should have read is: the problem here is not having the returned ids in a particular order, it's that there is apparently no correlation between the position of an element in the values clause and the id generated.

Of course sorting the returned ids is only viable when using a serial or identity column; that's why in the general case I've mentioned the insert with a sentinel column, to ask if there are better or alternative solutions.

Thanks for the reply, best
Federico
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023, Federico wrote:

> Of course sorting the returned ids is only viable when using a serial

Yes, which is why I pointed out it doesn’t have to be.

> or identity column, that's why in the general case I've mentioned the
> insert with sentinel column

But it was pointed out that that’s not guaranteed either, unless you add that sentinel column to the table itself…

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)
Re: Guidance on INSERT RETURNING order
On 4/11/23 14:37, Federico wrote:
> Well, the example is very easy, but it's hard to generalize when
> inserting multiple columns with possible complex values in them, since
> it would mean matching on possibly large json values, arrays, etc. So
> definitely not ideal

Can your client retain a hashmap of md5,data pairings, allowing the lookup on the way back using the returned data and supplied id?
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023 at 23:22, Thorsten Glaser wrote:
> But it was pointed out that that’s not guaranteed either, unless you add
> that sentinel column to the table itself…

I was under the impression that when using INSERT SELECT ORDER BY the sequence ids were generated using the select order. That has been mentioned in multiple other previous messages, like https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us

The above does not cover all cases, but in practice serial or identity columns are very common, so it would be nice if at least in these cases a sentinel were not needed.

Thanks,
Federico
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023 at 23:31, Rob Sargent wrote:
> Can your client retain a hashmap of md5,data pairings, allowing the
> lookup on the way back using the returned data and supplied id?

When using unique columns or similar, that's something that is done, but if there are no unique columns in the value no match can be done reliably with the source data, since sqlalchemy is a library that allows arbitrary schemas to be generated.

Thanks for the reply,
Federico
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023, Federico wrote:

> I was under the impression that when using INSERT SELECT ORDER BY the
> sequence ids were generated using the select order.

But someone said that’s not guaranteed, especially when INSERT will be parallelised later.

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)
Re: Guidance on INSERT RETURNING order
>> Can your client retain a hashmap of md5,data pairings, allowing the
>> lookup on the way back using the returned data and supplied id?
>
> When using unique columns or similar, that's something that is done,
> but if there are no unique columns in the value no match can be done
> reliably with the source data, since sqlalchemy is a library that
> allows arbitrary schemas to be generated.

So your returned data is not what was sent to the server? Otherwise it should generate the same md5, as I understand it. Identical data would of course be a problem.
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023 at 23:44, Thorsten Glaser wrote:
> On Tue, 11 Apr 2023, Federico wrote:
>> I was under the impression that when using INSERT SELECT ORDER BY the
>> sequence ids were generated using the select order.
> But someone said that’s not guaranteed, especially when INSERT will
> be parallelised later.

It was Tom Lane's message that said

> Re-reading that 2012 thread, the main new observation I'd make today
> is that parallel operation is a thing now, and it's not hard to foresee
> that sometime soon we'll want to parallelize INSERTs. Which'd make it
> *really* hard to promise anything about the order of RETURNING output.

My reading of it is that we are talking about RETURNING, not about the order in which the serial ids are generated. My understanding was that they are generated in select order, then the rows are inserted in any arbitrary order the planner may choose and returned again in any arbitrary order.

If my understanding is incorrect, would this alternative guarantee the above (that nextval is called in the order set by ORDER BY), again re-using the table in the original message?

INSERT INTO t(id, data)
SELECT nextval(pg_get_serial_sequence('t', 'id')), data
FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
ORDER BY num
RETURNING id

best,
Federico
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023 at 23:46, Rob Sargent wrote:
>>> Can your client retain a hashmap of md5,data pairings, allowing the
>>> lookup on the way back using the returned data and supplied id?
>>
>> When using unique columns or similar, that's something that is done,
>> but if there are no unique columns in the value no match can be done
>> reliably with the source data, since sqlalchemy is a library that
>> allows arbitrary schemas to be generated.
>
> So your returned data is not what was sent to the server? Otherwise it
> should generate the same md5, as I understand it. Identical data would
> of course be a problem.

That should be the case, yes. If a table has a non-nullable unique key, it should be possible to use a hashmap and perform that lookup. We are planning on implementing something like this to cover the cases where it can be used.

Thanks for the reply,
Federico
Re: Guidance on INSERT RETURNING order
On Tue, 11 Apr 2023, Federico wrote:

> My understanding was that they are generated in select order

But are they? (I don’t know, but I’d not assume that.)

> If my understanding is incorrect, would this alternative guarantee the above
>
> INSERT INTO t(id, data)
> SELECT nextval(pg_get_serial_sequence('t', 'id')), data
> FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data, num)
> ORDER BY num
> RETURNING id

Wouldn’t it, at that point, be better to just send multiple individual INSERT statements? The overhead (on both sides) for all the mentioned… workarounds… surely is larger than that?

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)
Re: Guidance on INSERT RETURNING order
I'm not getting every reply in the list but I want to point this out from the archive version of this thread:

>> I was under the impression that when using INSERT SELECT ORDER BY the
>> sequence ids were generated using the select order.
> But someone said that’s not guaranteed, especially when INSERT will
> be parallelised later.

This should not matter. As in my other message, if

INSERT INTO table SELECT a, b FROM (VALUES () () ()) ORDER BY ...

takes the SERIAL or IDENTITY column of the table, and creates a parse tree that looks like

INSERT INTO table (SELECT nextval(table.id), a, b FROM (SELECT a, b FROM (VALUES () () ()) ORDER BY ...))

then the INSERT can put the rows in any way it wants. We don't care what INSERT does, we care about the generated sequence value, the nextval(table.id) part, which the SELECT should be emitting in order and which occurs outside of the purview of the INSERT, according to other emails I have read on these lists (see my previous post).
Transaction Rollback errors
Hi All,

When my application (Node.js) receives a class 40 error:

Class 40 — Transaction Rollback
40000 transaction_rollback
40002 transaction_integrity_constraint_violation
40001 serialization_failure
40003 statement_completion_unknown
40P01 deadlock_detected

then does it mean PG has already rolled back the tx and therefore I should not attempt to roll it back again?

Thanks,
S.
Re: "PANIC: could not open critical system index 2662" - twice
On Tue, Apr 11, 2023 at 04:44:54PM +, Evgeny Morozov wrote:
> We have data_checksums=on. (It must be on by default, since I cannot
> find that in our config files anywhere.)

initdb does not enable checksums by default, requiring -k/--data-checksums, so likely this addition comes from your environment.

> However, the docs say "Only
> data pages are protected by checksums; internal data structures and
> temporary files are not.", so I guess pg_class_oid_index might be an
> "internal data structure"?

pg_class_oid_index is a btree index that relies on 8k on-disk pages (default size), so it is subject to the same rules as normal relations regarding checksums for the pages flushed to disk, even if it is on a catalog.

--
Michael
Re: Transaction Rollback errors
Siddharth Jain writes:
> when my application (Node.js) receives a class 40 error:
>
> Class 40 — Transaction Rollback
> 40000 transaction_rollback
> 40002 transaction_integrity_constraint_violation
> 40001 serialization_failure
> 40003 statement_completion_unknown
> 40P01 deadlock_detected
>
> then does it mean PG has already rolled back the tx and therefore I
> should not attempt to roll it back again?

No, treat these the same as any other error.

regards, tom lane
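In practice that answer means the client still ends the transaction itself: after a class 40 error the transaction block is left in an aborted state rather than rolled back for you. A minimal sketch of the session flow (the UPDATE is an illustrative placeholder):

BEGIN;
UPDATE accounts SET balance = balance - 1 WHERE id = 1;
-- suppose this fails with SQLSTATE 40001 (serialization_failure);
-- the transaction block is now aborted and further statements will
-- be rejected until it is ended
ROLLBACK;  -- safe and required before retrying with a new BEGIN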