Looking for an index that supports top-n searches by enforcing a max-n automatically
Looking for an index to support top-n searches, where n has a fixed maximum

Recently, I've been looking at strategies to handle top-n queries in Postgres. In my current cases, we've got definition tables, and very large related tables. Here's a stripped-down example; the real tables are much wider.

CREATE TABLE IF NOT EXISTS data.inventory (
    id     uuid NOT NULL DEFAULT NULL PRIMARY KEY,
    inv_id uuid NOT NULL DEFAULT NULL
);

CREATE TABLE IF NOT EXISTS data.scan (
    id           uuid NOT NULL DEFAULT NULL PRIMARY KEY,
    inv_id       uuid NOT NULL DEFAULT NULL,
    scan_dts_utc timestamp NOT NULL DEFAULT NOW()  -- We run our servers on UTC
);

Every item in inventory is scanned when it passes through various steps in a clean-dispatch-arrive-use-clean sort of a work cycle. The ratio between inventory and scan is 1:0-n, where n can be virtually any number. In another table pair like this, the average is 1:1,000. In the inventory example, it ranges from 0 to roughly 200,000. The distribution of related row counts is all over the map. The reasons behind these ratios sometimes map to valid processes, and sometimes are a consequence of some low-quality data leaking into the system. In the case of inventory, the results make sense. In our case:

* The goal value for n is often 1, and never more than 25.

* Depending on the tables, the % of rows that are discarded because they're past the 25th most recent record is 97% or more.

* Partial indexes do not work as well on huge tables as I hoped. The same data copied via a STATEMENT trigger into a thin, subsetted table is much faster. I think this has to do with the increase in random disk access required for a table and/or index with more pages spread around on the disk.

* We can't filter in advance by *any* reasonable date range. Could get 25 scans on one item in an hour, or a year, or five years, or never.

We're finding that we need the top-n records more and more often, returned quickly. This gets harder to do as the table(s) grow.

SELECT id, scan_dts_utc
  FROM data.scan
 WHERE inv_id = 'b7db5d06-8275-224d-a38a-ac263dc1c767'
 ORDER BY scan_dts_utc DESC
 LIMIT 25; -- Full search product might be 0, 200K, or anything in-between. Not on a bell curve.

A compound index works really well to optimize these kinds of searches:

CREATE INDEX scan_inv_id_scan_dts_utc_idx
    ON data.scan (inv_id, scan_dts_utc DESC);

What I'm wondering is if there is some index option, likely not a B-tree, that can *automatically* enforce a maximum-length list of top values, based on a defined sort:

CREATE INDEX scan_inv_id_scan_dts_utc_idx
    ON data.scan (inv_id, scan_dts_utc DESC) -- This defines the ordering
    LIMIT 25;                                -- This sets the hard max for n

The goal is to have an automatically maintained list of the top values *in* the index itself. In the right situations (like ours), this reduces the index size by 20x or more. Smaller index, faster results. And, since the index is on the source table, the row references are already there. (Something I lose when maintaining this by hand in a side/shadow/top table.)

I've looked at a ton of plans, and Postgres *clearly* goes to a lot of effort to recognize and optimize top-n searches already. That's encouraging, as it suggests that the planner takes LIMIT into account. (I've picked up already that maintaining the purity of the planner and executor abstractions is a core value to the project.)

And, for sure, I can build and maintain my own custom, ordered list in various ways.
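For concreteness, here's roughly what one of those hand-maintained variants looks like (a row-trigger sketch with invented names, rather than the statement trigger we actually use, and untested here):

-- Thin side table holding at most the 25 newest scans per inventory item.
CREATE TABLE data.scan_top25 (
    id           uuid NOT NULL PRIMARY KEY,
    inv_id       uuid NOT NULL,
    scan_dts_utc timestamp NOT NULL
);

CREATE INDEX scan_top25_inv_id_scan_dts_utc_idx
    ON data.scan_top25 (inv_id, scan_dts_utc DESC);

CREATE FUNCTION data.scan_top25_trim()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO data.scan_top25 (id, inv_id, scan_dts_utc)
    VALUES (NEW.id, NEW.inv_id, NEW.scan_dts_utc)
    ON CONFLICT (id) DO NOTHING;

    -- Trim anything past the 25 newest scans for this item.
    DELETE FROM data.scan_top25 AS t
    WHERE t.inv_id = NEW.inv_id
      AND t.id NOT IN (SELECT s.id
                       FROM data.scan_top25 AS s
                       WHERE s.inv_id = NEW.inv_id
                       ORDER BY s.scan_dts_utc DESC
                       LIMIT 25);

    RETURN NULL;
END;
$$;

CREATE TRIGGER scan_top25_trim
AFTER INSERT ON data.scan
FOR EACH ROW EXECUTE FUNCTION data.scan_top25_trim();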
None of them seem like they can possibly rival the trimming behavior being handled by an index. I'm out over my skis here, but I'm intuiting that this might be a job for one of the multi-value/inverted index types/frameworks. I tried some experiments, but only got worse results. Hope that reads as understandable...grateful for any suggestions.
Re: Looking for an index that supports top-n searches by enforcing a max-n automatically
Just about as soon as I sent the above, I realized that it's unlikely to make sense in the real world in a row-store. If the goal is to keep the top-25 results and trim the rest, what happens when values are added/modified/deleted? You now *have to go look at all of the data you aren't caching in the index.* Unless you can *guarantee* that data is entered in perfect order, and/or never changes, I don't think what I'm looking for is likely to make sense.
Are operations on real values IMMUTABLE or STABLE?
I've got a small question about marking functions working with decimal number types as either IMMUTABLE or STABLE. Below are a pair of trivial functions that show what I'm guessing. An int8/int8[] seems like it's going to be immutable forever. However, decimal types aren't quite so crisp and consistent. Does this mean that I need to mark such a function as STABLE instead of IMMUTABLE, like below?

I'm a bit hazy on exactly when some operations shift from IMMUTABLE to STABLE. For example, it seems fair that many time/date operations are not IMMUTABLE because they vary based on the current time zone. Likewise, I think that text operations are generally not IMMUTABLE since collations vary across versions and platforms.

Any clarification would be appreciated. I've been googling around and checking the archives, but haven't found these specific details addressed so far. Ah, and I have no clue how much difference it even makes to mark a function as IMMUTABLE instead of STABLE. If the difference is more theoretical than practical, I can feel comfortable using STABLE when unclear.

Thank you!

---
-- array_sum(int8[]) : int8
---
CREATE OR REPLACE FUNCTION tools.array_sum(array_in int8[])
RETURNS int8
AS $BODY$

    SELECT SUM(element)::int8 AS result
    FROM UNNEST(array_in) AS element;

$BODY$
LANGUAGE sql
IMMUTABLE;

-- Add a comment to describe the function
COMMENT ON FUNCTION tools.array_sum(int8[]) IS
    'Sum an int8[] array.';

-- Set the function's owner to USER_BENDER
ALTER FUNCTION tools.array_sum(int8[]) OWNER TO user_bender;

---
-- array_sum(real[]) : real
---
CREATE OR REPLACE FUNCTION tools.array_sum(array_in real[])
RETURNS real
AS $BODY$

    SELECT SUM(element) AS result
    FROM UNNEST(array_in) AS element;

$BODY$
LANGUAGE sql
STABLE; -- Decimal number types seem to change across versions and chips?

-- Add a comment to describe the function
COMMENT ON FUNCTION tools.array_sum(real[]) IS
    'Sum a real[] array.';

-- Set the function's owner to USER_BENDER
ALTER FUNCTION tools.array_sum(real[]) OWNER TO user_bender;
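For a bit of context on why I care: my understanding is that the marking matters for things like expression indexes (which require IMMUTABLE functions) and plan-time constant folding. A rough sketch, with a made-up table:

-- Hypothetical table, only to illustrate where the marking bites:
CREATE TABLE tools.array_sum_demo (
    id       int8   NOT NULL PRIMARY KEY,
    readings int8[] NOT NULL
);

-- Index expressions must use IMMUTABLE functions, so this is only accepted
-- if array_sum(int8[]) is marked IMMUTABLE:
CREATE INDEX array_sum_demo_readings_sum_idx
    ON tools.array_sum_demo (tools.array_sum(readings));

-- With IMMUTABLE, the planner can also pre-evaluate constant calls at plan time:
SELECT id
FROM tools.array_sum_demo
WHERE tools.array_sum(readings) > tools.array_sum(ARRAY[1, 2, 3]::int8[]);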
Re: Are operations on real values IMMUTABLE or STABLE?
> I think you're overthinking it.

*Moi*? Never happens ;-)

Fantastic answer, thanks very much for giving me all of these details. Coming from you, I'll take it as authoritative and run with it.
System column support for partitioned tables using heap
I've run into an existing behavior where xmax(), and various other system columns, return an error when included in the RETURNING list on a partitioned table:

ERROR: cannot retrieve a system column in this context

This issue got a fair airing back in 2020:

AW: posgres 12 bug (partitioned table)
https://www.postgresql.org/message-id/flat/GVAP278MB006939B1D7DFDD650E383FBFEACE0%40GVAP278MB0069.CHEP278.PROD.OUTLOOK.COM#908f2604081699e7f41fa20d352e1b79

I'm using 14.4, and just ran into this behavior today. I'm wondering if there has been any new work on this subject, or anything to take into account moving forward?

I'm not a C coder, and do not know the Postgres internals, but here's what I gleaned from the thread:

* Available system columns depend on the underlying table access method, and may/will vary across AMs. For example, the columns implemented by heap are what the docs describe; an FDW could expose anything, and Postgres has no control over what, if any, system-column-like attributes it supports; and future or hypothetical AMs may have different sets.

* Rather than return garbage results, or a default of 0, etc., the system throws the error I ran into. I'd be happier working with a NULL result than garbage, ambiguous results, or errors...but an error is the current behavior. And agreed, I'd rather an error than a bad/meaningless result. Postgres' consistent emphasis on correctness is easily one of its greatest qualities.

In my case, I'm upgrading a lot of existing code to try and capture a more complete profile of what an UPSERT did. Right now, I grab a count(*) of the rows and return that. Works fine. A revised snippet looks a bit like this:

    ...UPSERT code
    returning xmax as inserted_transaction_id),

    status_data AS (
        select count(*) FILTER (where inserted_transaction_id  = 0) AS insert_count,
               count(*) FILTER (where inserted_transaction_id != 0) AS estimated_update_count,
               pg_current_xact_id_if_assigned()::text               AS transaction_id
        from inserted_rows),

    ...custom logging code

    -- Final output/result.
    select insert_count,
           estimated_update_count,
           transaction_id
    from status_data;

This fails on a partitioned table because xmax() may not exist. In fact, it does exist in all of those tables, but the system doesn't know how to guarantee that. I know which tables are partitioned, and can downgrade the result on partitioned tables to the count(*) I've been using to date. But now I'm wondering if working with xmax() like this is a poor idea going forward. I don't want to lean on a feature/behavior that's likely to change. For example, I noticed the other day that MERGE does not support RETURNING.

I'd appreciate any insight or advice you can offer.
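In case a minimal repro is useful, this is roughly the shape of it (hypothetical table names):

CREATE TABLE returning_demo (
    id  int  NOT NULL PRIMARY KEY,
    val text NOT NULL
) PARTITION BY RANGE (id);

CREATE TABLE returning_demo_1
    PARTITION OF returning_demo
    FOR VALUES FROM (1) TO (1000);

-- Fine on a plain heap table, but on the partitioned parent this raises
-- "ERROR: cannot retrieve a system column in this context":
INSERT INTO returning_demo (id, val)
VALUES (1, 'x')
ON CONFLICT (id) DO UPDATE
    SET val = EXCLUDED.val
RETURNING xmax AS inserted_transaction_id;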
Re: System column support for partitioned tables using heap
> What is motivating you to want to see the xmax value here? It's not an
> unreasonable thing to want to do, IMHO, but it's a little bit niche so
> I'm just curious what the motivation is.

Yeah, I figured it was niche when I saw so little mention of the issue.

My reason for xmax() in the result is to break down the affected rows count into an insert count, and a modified estimate. Not super critical, but helpful. I've built out some simple custom logging tables in our system for this kind of detail, and folks have been wanting to break down rows submitted, rows inserted, and rows updated a bit better. Rows submitted is easy and rows inserted is too...update is an estimate as I'm not using anything fancy with xmax() to sort out what exactly happened.

For clarification, we're not using an ORM, and may need to support straggling clients, so our push cycle works like this:

* Create a view with the fields expected in the insert. I figured I'd use CREATE VIEW instead of CREATE TYPE as then I can quickly check out the details against live data, and I still get a custom compound type.

* Write a function that accepts an array of view_name_type. I *love* Postgres' typing system; it has spoiled me forever. Can't submit badly formatted objects from the client, they're rejected automatically.

* Write a client-side routine to package data as an array and push it into the insert handling function. The function unnests the array, and then the actual insert code draws from the unpacked values. If I need to extend the table, I can add a new function that knows about the revised fields, and revise (when necessary) earlier supported formats to map to new types/columns/defaults.

There are a few CTEs in there, including one that does the main insert and returns the xmax(). That lets me distinguish xmax = 0 (insert) from xmax <> 0 (not an insert).

> I do agree with you that it would be nice if this worked better than
> it does, but I don't really know exactly how to make that happen. The
> column list for a partitioned table must be fixed at the time it is
> created, but we do not know what partitions might be added in the
> future, and thus we don't know whether they will have an xmax column.
> I guess we could have tried to work things out so that a 0 value would
> be passed up from children that lack an xmax column, and that would
> allow the parent to have such a column, but I don't feel too bad that
> we didn't do that ... should I?

You should never feel bad about anything ;-) You and others on that thread contribute so much that I'm getting value out of.

I had it in mind that it would be nice to have some kind of catalog/abstraction that would make it possible to interrogate what system columns are available on a table/partition based on access method. In my vague notion, that might make some of the other ideas from that thread, such as index-oriented stores with quite different physical layouts, easier to implement. But, it's all free when you aren't the one who can write the code.

I've switched the partition-based tables back to returning * on the insert CTE, and then aggregating that to add to a log table and the client result. It's fine. A rich result summary would be very nice. As in rows added/modified/deleted on whatever table(s). If anyone ever decides to implement such a structure for MERGE, it would be nice to see it retrofitted to the other data modification commands where RETURNING works.
Re: System column support for partitioned tables using heap
> The column list for a partitioned table must be fixed at the time it is
> created, but we do not know what partitions might be added in the
> future, and thus we don't know whether they will have an xmax column.

Right, seeing what you're meaning there. It's fantastic that a partition might be an FDW to a system that has no concept at all of anything like a "system column", or something with an alternative AM to heap that has a different set of system columns. That flexibility in partitions is super valuable. I'd love to be able to convert old partitions into column stores, for example. (I think that Citus offers that feature now.)

I guess if anyone ever felt it was worth the effort, maybe whatever checks are done at attach-partition time for the column list could also enforce meta/system columns. If missing, a shimming mechanism would be pretty necessary. Sounds like a lot of work for not much gain, at least in this narrow case.

Thanks again for answering.
Re: System column support for partitioned tables using heap
On Tue, Jul 19, 2022 at 10:38 PM Robert Haas wrote:

> For MERGE itself, I wonder if some information about this should be
> included in the command tag. It looks like MERGE already includes some
> sort of row count in the command tag, but I guess perhaps it doesn't
> distinguish between inserts and updates. I don't know why we couldn't
> expose multiple values this way, though.

It would be great to get some sort of feedback from MERGE accessible through SQL results, even if that doesn't come in the form of a RETURNING list.

> I wonder whether you could just have the CTEs bubble up 1 or 0 and
> then sum them at some stage, instead of relying on xmax. Presumably
> your UPSERT simulation knows which thing it did in each case.

It might help if I show a sample insert handling function. The issue is with the line at the end of the top CTE, inserted_rows:

    returning xmax as inserted_transaction_id),

That's what fails on partitions. Is there an alternative way to test what happened to the row(s)?

Here's the full function. I wrote a code generator, so I don't have to hand-code all of these bits for each table+version:

-- Create a function to accept an array of rows formatted as item_type_v1 for UPSERT into item_type.
DROP FUNCTION IF EXISTS types_plus.insert_item_type_v1 (types_plus.item_type_v1[]);

CREATE OR REPLACE FUNCTION types_plus.insert_item_type_v1 (data_in types_plus.item_type_v1[])

RETURNS TABLE (
    insert_count           integer,
    estimated_update_count integer,
    transaction_id         text)

LANGUAGE SQL

BEGIN ATOMIC

-- The CTE below is a roundabout way of returning an insertion count from a pure SQL function in Postgres.
WITH
inserted_rows as (

    INSERT INTO item_type (
        id,
        marked_for_deletion,
        name_)

    SELECT
        rows_in.id,
        rows_in.marked_for_deletion,
        rows_in.name_

    FROM unnest(data_in) as rows_in

    ON CONFLICT(id) DO UPDATE SET
        marked_for_deletion = EXCLUDED.marked_for_deletion,
        name_               = EXCLUDED.name_

    returning xmax as inserted_transaction_id),

status_data AS (

    select
        count(*) FILTER (where inserted_transaction_id  = 0) AS insert_count,
        count(*) FILTER (where inserted_transaction_id != 0) AS estimated_update_count,
        pg_current_xact_id_if_assigned()::text               AS transaction_id

    from inserted_rows),

insert_log_entry AS (

    INSERT INTO insert_log (
        data_file_id,
        ib_version,
        job_run_id,
        schema_name,
        table_name,
        records_submitted,
        insert_count,
        estimated_update_count)

    SELECT
        coalesce_session_variable('data_file_id', '')::uuid,
        coalesce_session_variable('ib_version'),            -- Default result is ''
        coalesce_session_variable('job_run_id', '')::uuid,
        'ascendco',
        'item_type',
        (select cardinality(data_in)),
        insert_count,
        estimated_update_count

    FROM status_data
)

-- Final output/result.
select insert_count,
       estimated_update_count,
       transaction_id
from status_data;

END;
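Following your bubble-up idea, the closest xmax-free variant I can sketch (untested here, and the update count stays an estimate, since a row committed by a concurrent session between the snapshot and the conflict check would be misclassified) replaces the CTEs above with something like this, leaving the logging CTE as-is:

WITH
incoming AS (
    SELECT rows_in.id
    FROM unnest(data_in) AS rows_in),

-- Snapshot of which keys already exist before this statement's inserts.
pre_existing AS (
    SELECT it.id
    FROM item_type AS it
    JOIN incoming USING (id)),

inserted_rows AS (
    INSERT INTO item_type (
        id,
        marked_for_deletion,
        name_)
    SELECT
        rows_in.id,
        rows_in.marked_for_deletion,
        rows_in.name_
    FROM unnest(data_in) AS rows_in
    ON CONFLICT (id) DO UPDATE SET
        marked_for_deletion = EXCLUDED.marked_for_deletion,
        name_               = EXCLUDED.name_
    RETURNING id),

status_data AS (
    SELECT
        count(*) FILTER (WHERE p.id IS NULL)     AS insert_count,
        count(*) FILTER (WHERE p.id IS NOT NULL) AS estimated_update_count,
        pg_current_xact_id_if_assigned()::text   AS transaction_id
    FROM inserted_rows AS i
    LEFT JOIN pre_existing AS p USING (id))

SELECT insert_count,
       estimated_update_count,
       transaction_id
FROM status_data;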
Re: [BUG] pg_dump does not properly deal with BEGIN ATOMIC function
Thanks to Imseih and Sami at AWS for reporting this. The original case comes from an upgrade I've been trying to complete for a couple of months now, since RDS started supporting 15 with a 15.2 release. The process has been slow and painful because, originally, there was a bug on the RDS side that stopped any of the upgrade logs from appearing in RDS or CloudWatch. Now, minimal log errors are shown, but not much detail.

I think that BEGIN ATOMIC is the sleeper feature of Postgres 14. It is a *fantastic* addition to the dependency-tracking system. However, it does not seem to work. I just found this thread now, looking for the string "warning: could not resolve dependency loop among these items".

I got that far by setting up a new database, and a simple test case that reproduces the problem. I called the test database ba for Begin Atomic:

-- New database
CREATE DATABASE ba;

-- Connect to new database...

-- Setup schemas
CREATE SCHEMA data;        -- I don't have public running, so create a schema.

-- All of my UPSERT view and function plumbing is tucked away here:
CREATE SCHEMA types_plus;

-- Define table
DROP TABLE IF EXISTS data.test_event;

CREATE TABLE IF NOT EXISTS data.test_event (
    id     uuid      NOT NULL DEFAULT NULL PRIMARY KEY,
    ts_dts timestamp NOT NULL DEFAULT 'epoch',
    who    text      NOT NULL DEFAULT NULL,
    what   text      NOT NULL DEFAULT NULL
);

-- PK is created by default as test_event_pkey, used in ON CONFLICT later.

-- Create view, get type for free
CREATE VIEW types_plus.test_event_v1 AS
    SELECT id, ts_dts, who, what
    FROM data.test_event;

-- Create a function to accept an array of rows formatted as test_event_v1 for UPSERT into test_event.
DROP FUNCTION IF EXISTS types_plus.insert_test_event_v1 (types_plus.test_event_v1[]);

CREATE OR REPLACE FUNCTION types_plus.insert_test_event_v1 (data_in types_plus.test_event_v1[])

RETURNS void

LANGUAGE SQL

BEGIN ATOMIC

    INSERT INTO data.test_event (
        id,
        ts_dts,
        who,
        what)

    SELECT
        rows_in.id,
        rows_in.ts_dts,
        rows_in.who,
        rows_in.what

    FROM unnest(data_in) as rows_in

    ON CONFLICT ON CONSTRAINT test_event_pkey DO UPDATE SET
        id     = EXCLUDED.id,
        ts_dts = EXCLUDED.ts_dts,
        who    = EXCLUDED.who,
        what   = EXCLUDED.what;

END;

I've tested pg_dump with the plain, custom, directory, and tar options. All report the same problem. Note that I'm using Postgres and pg_dump 14.8 locally; RDS is still at 14.7 of Postgres and presumably 14.7 of pg_dump.

pg_dump: warning: could not resolve dependency loop among these items:
pg_dump: FUNCTION insert_test_event_v1 (ID 224 OID 1061258)
pg_dump: CONSTRAINT test_event_pkey (ID 3441 OID 1061253)
pg_dump: POST-DATA BOUNDARY (ID 3584)
pg_dump: TABLE DATA test_event (ID 3582 OID 1061246)
pg_dump: PRE-DATA BOUNDARY (ID 3583)

Hunting around earlier, I found a thread here from 2020 that mentioned that BEGIN ATOMIC was going to make dependency resolution tougher for pg_dump. Makes sense, it can become circular or ambiguous in a hurry. However, in my case, I don't see that the dependencies are any kind of crazy spaghetti. I have hundreds of tables with the same pattern of dependencies for UPSERT work:

1. CREATE TABLE foo
2. CREATE PK foo_pk and other constraints.
3. CREATE VIEW foo_v1 (I could use CREATE TYPE here, for my purposes, but prefer CREATE VIEW.)
4. CREATE FUNCTION insert_foo_v1 (foo_v1[])

The example I listed earlier is a simplified version of this. I didn't even check that the new database works, that's not important; I am only trying to check out pg_dump/pg_restore.

Can anyone suggest a path forward for me with the upgrade to PG 15?
I'm waiting on that, as we need to use MERGE and I'd like other PG 15 improvements, like the sort optimizations. As far as I can see, my best bet is to:

1. Delete all of my routines with BEGIN ATOMIC. That's roughly 250 routines.
2. Upgrade.
3. Add back in the routines in PG 15.

That likely would work for me as my dependencies are shallow and not circular. They simply require a specific order. I avoid chaining views off views and functions off functions as a deliberate practice in Postgres.

Down the track, does my sort of dependency problem seem resolvable by pg_dump? I've got my own build-the-system-from-scratch system that I use for local testing out of the source files, and I had to resort to hinting files to inject some things in the correct order. So, I'm not assuming that it *is* possible for pg_dump to resolve all sequences. Then again, all of this could go
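In case it helps anyone staging the same workaround, the routines with SQL-standard bodies can be listed from the catalog (their parsed body is stored in pg_proc.prosqlbody), roughly like this; a sketch, not battle-tested:

-- List every BEGIN ATOMIC / SQL-standard-body routine, with its definition,
-- so the definitions can be saved off before dropping and re-creating them.
SELECT n.nspname                 AS schema_name,
       p.proname                 AS function_name,
       pg_get_functiondef(p.oid) AS definition
FROM pg_proc AS p
JOIN pg_namespace AS n ON n.oid = p.pronamespace
WHERE p.prosqlbody IS NOT NULL
ORDER BY 1, 2;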
Re: [BUG] pg_dump does not properly deal with BEGIN ATOMIC function
Edit error above: I said that dependency tracking "does not seem to work." Not what I meant; it works great...it just does not seem to work for me with any of the upgrade options.
Re: [BUG] pg_dump does not properly deal with BEGIN ATOMIC function
Well *that* was quick. Thank you!

Imseih, Sami: what are the chances of getting RDS to apply this patch? Postgres 15 was released nearly 8 months ago, and it would be great to get onto it.

Thanks

On Mon, Jun 5, 2023 at 3:01 PM Tom Lane wrote:

> Morris de Oryx writes:
> > Can anyone suggest a path forward for me with the upgrade to PG 15?
>
> Apply this patch:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ca9e79274938d8ede07d9990c2f6f5107553b524
>
> or more likely, pester RDS to do so sooner than the next quarterly
> releases.
>
> regards, tom lane
Re: [BUG] pg_dump does not properly deal with BEGIN ATOMIC function
Reminds me to say a *big* *thank you* to everyone involved in and contributing to Postgres development for making error messages which are so good. For a programmer, error text is a primary UI. Most Postgres errors and log messages are clear and sufficient. Even when they're a bit obscure, they always seem to be *on topic*, and enough to get you on the right track. I assume that we've all used programs and operating systems that emit more runic...errors.
Re: [BUG] pg_dump does not properly deal with BEGIN ATOMIC function
Another suggestion for AWS/RDS: Expose *all of the logs in the upgrade tool chain*. If I'd had all of the logs at the start of this, I'd have been able to track down the issue myself quite quickly. Setting up that simple case database took me less than an hour today. Without the logs, it's been impossible (until the RDS patch a month ago) and difficult (now) to get a sense of what's happening.

Thank you

On Mon, Jun 5, 2023 at 5:19 PM Kirk Wolak wrote:

> On Sun, Jun 4, 2023 at 1:41 PM Tom Lane wrote:
>
>> Kirk Wolak writes:
>> .. to strings of other lengths. So the new output (before 016107478
>> fixed it) is
>>
>> pg_dump: warning: could not resolve dependency loop among these items:
>> pg_dump: detail: FUNCTION a_f (ID 216 OID 40532)
>> pg_dump: detail: CONSTRAINT a_pkey (ID 3466 OID 40531)
>> pg_dump: detail: POST-DATA BOUNDARY (ID 3612)
>> pg_dump: detail: TABLE DATA a (ID 3610 OID 40525)
>> pg_dump: detail: PRE-DATA BOUNDARY (ID 3611)
>>
>> regards, tom lane
>
> +1
Re: [BUG] pg_dump does not properly deal with BEGIN ATOMIC function
Quick follow-up: I've heard back from AWS regarding applying Tom Lane's patch. Nope. RDS releases numbered versions, nothing else. As Postgres is now at 14.8/15.3 in the wild and 14.7/15.2 on RDS, I'm guessing that the patch won't be available until 14.9/15.4. Am I right in thinking that this patch will be integrated into 14.9/15.4, if they are released?

Thank you
Re: [BUG] pg_dump does not properly deal with BEGIN ATOMIC function
Thanks for the confirmation, and here's hoping no ICBMs!
Re: crosstab/repivot...any interest?
Hello, I'm not a C coder and can't help, but I *love* cross-tab/pivot tables. They're the best, and just fantastic for preparing data to feed into various analysis tools. The tablefunc module is helpful, but a bit awkward to use (as I remember it.)

From a user's point of view, a high-performance cross-tab generator would be just fantastic. As I understand it, this is what's involved in a pivot:

1. Identify rows that should be *grouped* (consolidated.)
2. Distinguish the value that identifies each *derived* column.
3. Distinguish the value that *identifies* each row-column value.
4. *Collapse* the rows, *build* the columns, and *populate* the 'cells' with data.

In an ideal world, you would be able to perform different grouping operations. Such as count, sum, avg, etc. If there's a way to do this in a system-wide and standards-pointing way, so much the better. (A rough sketch of the plain-SQL version follows the quoted thread below.)

*Apologies* if I'm violating list etiquette by jumping in here. I've been lurking on several Postgres lists for a bit and picking up interesting details every day. If I've been Unintentionally and Cluelessly Off, I'm fine with being told.

On Sat, Jan 26, 2019 at 10:49 AM David Fetter wrote:

> On Fri, Jan 25, 2019 at 04:31:00PM -0600, Merlin Moncure wrote:
> > On Fri, Jan 25, 2019 at 3:16 PM David Fetter wrote:
> > >
> > > On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote:
> > > > Hackers,
> > > >
> > > > We have a strong need to make a variant to the crosstab interface so
> > > > that data that is pivoted one way would be sent through a crosstab
> > > > like function so that it would be pivoted another way. For example,
> > > > if you had
> > > >
> > > > row 0: a1, a2, a3, k1, c1, c2, ...
> > > > row 1: a1, a2, a3, k2, c1, c2, ...
> > > > row 2: a1, a2, a3, k3, c1, c2, ...
> > > > ...
> > > >
> > > > where 'a' columns are uninteresting attribute columns, 'k' is the
> > > > dimension we want to pivot on, and c1->cN would be stacked vertically,
> > > > so that we'd end up with,
> > > >
> > > > row 0: a1, a2, a3, c1, k1, k2, ...
> > > > row 1: a1, a2, a3, c2, k1, k2, ...
> > > > row 2: a1, a2, a3, c3, k1, k2, ...
> > > >
> > > > There are various SQL level approaches to this but they tend to be
> > > > imperformant with large datasets so that I think a crosstab-like C
> > > > implementation ought to be able to do better (or at least I hope so)
> > > > since you have to cross product rows and columns in such a way that
> > > > you can get a clean join. Cribbing from tablefunc.c I don't think
> > > > this is a terrible challenge to do in hash table style.
> > > >
> > > > Questions on the table:
> > > > *) Has anyone done anything like this or know of any current implementations?
> > > > *) Would there be any interest in expanding tablefunc along these lines?
> > >
> > > There's something in SQL:2016 that I read as crosstabs, or at least as
> > > enabling crosstabs.
> > > https://www.iso.org/standard/69776.html
> > >
> > > If we're going to put work into crosstabs, it seems to me that the
> > > "we" needs to be the project as a whole, and the work should be, to
> > > the extent reasonable, toward standard compliance.
> >
> > Interesting. Do you see that the spec (it makes my brain hurt) can
> > handle that kind of repivoting?
>
> I believe the constructs can nest and/or refer to each other, so yes.
>
> Best,
> David.
> --
> David Fetter http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
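For what it's worth, the plain-SQL approach I have in mind looks roughly like this, with made-up names and the k values hard-coded (the part a real crosstab feature would handle dynamically):

SELECT a1, a2, a3,                                 -- 1. the grouping columns
       sum(c1) FILTER (WHERE k = 'k1') AS k1_c1,   -- 2./3. one derived column per k value
       sum(c1) FILTER (WHERE k = 'k2') AS k2_c1,
       sum(c1) FILTER (WHERE k = 'k3') AS k3_c1
FROM source_rows                                   -- hypothetical source table
GROUP BY a1, a2, a3;                               -- 4. collapse the rows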
Detailed questions about pg_xact_commit_timestamp
I have some specific questions about pg_xact_commit_timestamp, and am hoping that this is the right place to ask. I read a lot of the commentary about the original patch, and the contributors seem to be here. If I'm asking in the wrong place, just let me know.

I'm working on a design for a concurrency-safe incremental aggregate rollup system, and pg_xact_commit_timestamp sounds perfect. But I've found very little commentary on it generally, and couldn't figure out how it works in detail from the source code. Hopefully, someone knows the answers to a few questions:

* Is it possible for pg_xact_commit_timestamp to produce times out of order? What I'm after is a way to identify records that have been changed since a specific time so that I can get any later changes for processing. I don't need them in commit order, so overlapping timestamps aren't a problem.

* How many bytes are added to each row in the final implementation? The discussions I saw seemed to be ranging from 12-24 bytes. There was discussion of adding in extra bytes for "just in case." This is pre 9.5, so a world ago.

* Are the timestamps indexed internally? With a B-tree? I ask for capacity-planning reasons.

* I've seen on StackOverflow and the design discussions that the timestamps are not kept indefinitely, but can't find the details on exactly how long they are stored.

* Any rules of thumb on the performance impact of enabling pg_xact_commit_timestamp? I don't need the data on all tables but, where I do, it sounds like it might work perfectly.

Many thanks for any assistance!
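For reference, the usage I have in mind is roughly this (hypothetical table name, and assuming the behavior I'm hoping for):

-- postgresql.conf (needs a restart):
--     track_commit_timestamp = on

-- Pull everything whose transaction committed after the last rollup point:
SELECT *
FROM rollup_source
WHERE pg_xact_commit_timestamp(xmin) > '2019-07-01 00:00:00+00';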
Re: Detailed questions about pg_xact_commit_timestamp
Adrien, thanks very much for answering my question. Just a couple of follow-up points, if you don't mind.

In your answer, you offer an example of pg_xact_commit_timestamp showing out-of-sequence commit times:

Session  xid       pg_xact_commit_timestamp
A        34386826  2019-07-11 09:32:38.994440+00   Started earlier, committed later
B        34386827  2019-07-11 09:32:29.806183+00

I may not have asked my question clearly, or I may not understand the answer properly. Or both ;-) If I understand it correctly, an xid is assigned when a transaction starts. One transaction might take a second, another might take ten minutes. So, the xid sequence doesn't imply anything at all about commit sequence. What I'm trying to figure out is if it is possible for the commit timestamps to somehow be out of order. What I'm looking for is a way of finding changes committed since a specific moment. When the transaction started doesn't matter in my case. Is pg_xact_commit_timestamp suitable for this? I'm getting the impression that it isn't. But I don't understand quite how. And if it isn't suited to this purpose, does anyone know what pg_xact_commit_timestamp is for? What I'm after is something like a "xcommitserial" that increases reliably and monotonically on transaction commit. That's how I'm hoping that pg_xact_commit_timestamp functions.

Thanks also for making me understand that pg_xact_commit_timestamp applies to a *transaction*, not to each row. That makes it a lot lighter in the database. I was thinking 12 bytes+ per row, which is completely off for my case. (I tend to insert thousands of rows in a transaction.)

> Yes timestamp are stored in pg_commit_ts directory. Old timestamp are removed after freeze has explained in
> https://www.postgresql.org/docs/current/routine-vacuuming.html

Thanks for the answer, and for kindly pointing me to the right section of the documentation. It's easy to get impatient with new(er) users. I'm _not_ lazy about reading manuals and researching but, well, the Postgres documentation is over 3,000 pages long when you download it. So, I may have missed a detail or two. If I read that correctly, the ~4 billion number range is made into an endless circle by keeping ~2 billion numbers in the past, and ~2 billion in the future. If that's right, I'm never going to be so out of date that the ~2 billion number window is too small.
Re: Detailed questions about pg_xact_commit_timestamp
Adrien, thanks a lot for taking the time to try and explain all of these details to me. I'm looking at incremental rollups, and thinking through various alternative designs. It sounds like pg_xact_commit_timestamp just isn't the right tool for my purposes, so I'll go in another direction. All the same, I've learned a _lot_ of important points about Postgres from trying to sort all of this out. Your messages have been a real help.

On Tue, Jul 16, 2019 at 7:03 PM Adrien Nayrat wrote:

> On 7/12/19 2:50 PM, Morris de Oryx wrote:
> > Adrien, thanks very much for answering my question. Just a couple of follow-up
> > points, if you don't mind.
> >
> > In our answer, you offer an example of pg_xact_commit_timestamp showing
> > out-of-sequence commit times:
> >
> > Session  xid       pg_xact_commit_timestamp
> > A        34386826  2019-07-11 09:32:38.994440+00   Started earlier, committed later
> > B        34386827  2019-07-11 09:32:29.806183+00
> >
> > I may not have asked my question clearly, or I may not understand the answer
> > properly. Or both ;-) If I understand it correctly, an xid is assigned when a
> > transaction starts.
>
> It is a little bit more complicated :) When a transaction start, a *virtual* xid
> is assigned. It is when the transaction change the state of the database, an xid
> is assigned:
>
>     Throughout running a transaction, a server process holds an exclusive
>     lock on the transaction's virtual transaction ID. If a permanent ID is
>     assigned to the transaction (which normally happens only if the transaction
>     changes the state of the database), it also holds an exclusive lock on the
>     transaction's permanent transaction ID until it ends.
>
> https://www.postgresql.org/docs/current/view-pg-locks.html
>
> (It shouldn't change anything for you)
>
> > One transaction might take a second, another might take ten
> > minutes. So, the xid sequence doesn't imply anything at all about commit
> > sequence. What I'm trying to figure out is if it is possible for the commit
> > timestamps to somehow be out of order.
>
> I am sorry but I don't understand what you mean by "commit timestamps to somehow
> be out of order"?
>
> > What I'm looking for is a way of finding
> > changes committed since a specific moment. When the transaction started doesn't
> > matter in my case.
>
> Yes, the commit timestamp is the time when the transaction is committed:
>
> postgres=# begin;
> BEGIN
> postgres=# select now();
>               now
> ------------------------------
>  2019-07-16 08:46:59.64712+00
> (1 row)
>
> postgres=# select txid_current();
>  txid_current
> --------------
>      34386830
> (1 row)
>
> postgres=# commit;
> COMMIT
> postgres=# select pg_xact_commit_timestamp('34386830'::xid);
>    pg_xact_commit_timestamp
> -------------------------------
>  2019-07-16 08:47:30.238746+00
> (1 row)
>
> > Is pg_xact_commit_timestamp suitable for this? I'm getting the impression that
> > it isn't. But I don't understand quite how. And if it isn't suited to this
> > purpose, does anyone know what pg_xact_commit_timestamp is for? What I'm after
> > is something like a "xcommitserial" that increases reliably, and monotonically
> > on transaction commit. That's how I'm hoping that pg_xact_commit_timestamp
> > functions.
>
> I don't think so. pg_xact_commit_timestamp returns the timestamp. If you want
> some kind of ordering you have to fetch all commit timestamps (with their
> respective xid) and order them.
>
> You also can implement this tracking by yourself with triggers which insert a
> row containing xid and timestamp in a tracking table. You can add an index on
> the timestamp column. With this approach you don't have to worry about vacuum
> freeze which removes old timestamps. As you add more writes, it could be more
> expensive than track_commit_timestamp.
>
> > Thanks also for making me understand that pg_xact_commit_timestamp applies to a
> > *transaction*, not to each row. That makes it a lot lighter in the database. I
> > was thinking 12 bytes+ per row, which is completely off for my case. (I tend to
> > insert thousands of rows in a transaction.)
> >
> >> Yes timestamp are stored in pg_commit_ts directory. Old timestamp are removed
> >> after freeze has explained in
> >> https://www.postgresql.org/docs/current/routine-vacuuming.html