Re: Proposal for disk quota feature
Thanks Chapman.
@Pavel, could you please explain more about your second suggestion, "implement some quotas on storage level"?

We will not keep long-lived processes attached to all databases (just like you mentioned, there are servers with thousands of databases). And you are right, we could share ideas with the autovacuum process and fork worker processes as needed. "autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility." The diskquota process is similar to autovacuum in caring about inserts, but the difference is that it also cares about vacuum full, truncate and drop, while update and delete may not be of interest since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a fully native feature, in the initial version I would prefer to implement the following:
1 Fork a diskquota launcher process under the Postmaster server loop, which is long-lived.
2 The diskquota launcher process is responsible for creating a diskquota worker process for every database.
3 The diskquota setting is stored in a separate catalog table for each database.
4 Initialization stage: the diskquota launcher process creates diskquota worker processes for all the databases (traversing them like autovacuum). A worker process calculates the disk usage of db objects and reads their diskquota settings. If any db object exceeds its quota limit, it is put into the blacklist in shared memory, which will later be used by the enforcement operator. The worker process exits when its work is done.
5 Running stage: the diskquota launcher process creates a diskquota worker process for a database with a large number of insert, copy, truncate, drop etc., or a create disk quota statement. The worker process updates the file size for db objects containing the result relation and compares it with the diskquota setting. Again, if an object exceeds its quota limit, it is put into the blacklist, and removed from the blacklist otherwise. The worker process exits when its work is done, and a GUC could control how frequently a worker process is restarted for a specific database. As you know, this GUC also controls the delay in enforcement.
6 Enforcement: when a postgres backend executes queries, it checks the blacklist in shared memory to determine whether the query is allowed (before execution) or needs to be rolled back (while executing).

If we implemented disk quota as an extension, we could just use a background worker to start the diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes from the launcher process, as suggested by @Chapman. The diskquota setting could be stored in a user table in a separate schema for each database (schema and table created by the CREATE EXTENSION statement), just like what Heikki has done in the pg_quota project. But in this case, we need to create the extension in each database before a diskquota worker process can be set up for that database.

Any comments on the above design, and which is preferred as the POC, native feature or extension?

-- Hubert

On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule wrote:

> 2018-08-30 16:22 GMT+02:00 Chapman Flack :
>
>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>
>> > 2 Keep one worker process for each database. But using a parent/global
>> > quota worker process to manage the lifecycle of database level worker
>> > processes. It could handle the newly created database(avoid restart
>> > database) and save resource when a database is not used. But this needs to
>> > change worker process to be hierarchical. Postmaster becomes the grandfather
>> > of database level worker processes in this case.
>>
>> I am using background workers this way in 9.5 at $work.
>>
>> In my case, one worker lives forever, wakes up on a set period, and
>> starts a short-lived worker for every database, waiting for each
>> one before starting the next.
>>
>> It was straightforward to implement. Looking back over the code,
>> I see the global worker assigns its own PID to worker.bgw_notify_pid
>> of each of its children, and also obtains a handle for each child
>> from RegisterDynamicBackgroundWorker().
>>
>> I imagine the global quota worker would prefer to start workers
>> for every database and then just wait for notifications from any
>> of them, but that seems equally straightforward at first glance.
>>
>
> There are servers with thousands databases. Worker per database is not
> good idea.
>
> It should to share ideas, code with autovacuum process.
>
> Not sure, how to effective implementation based on bg workers can be. On
> servers with large set of databases, large set of tables it can identify
> too big table too late.
>
> Isn't better to implement some quotas on storage level?
>
> Regards
>
> Pavel
>
>> -Chap

-- Thanks Hubert Zhang
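For reference, starting a per-database worker from a launcher with the dynamic background worker API looks roughly like the sketch below. This is a minimal illustration of the extension approach, not code from any posted patch; the library name "diskquota" and the entry point diskquota_worker_main are assumed names.

#include "postgres.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"

/*
 * Launch a short-lived worker for one database.  Returns true if the
 * postmaster accepted the registration request.
 */
static bool
launch_diskquota_worker(Oid databaseid)
{
	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "diskquota");	/* assumed */
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "diskquota_worker_main");	/* assumed */
	snprintf(worker.bgw_name, BGW_MAXLEN, "diskquota worker %u", databaseid);
	worker.bgw_main_arg = ObjectIdGetDatum(databaseid);

	/* Have the postmaster notify the launcher when the worker starts/exits. */
	worker.bgw_notify_pid = MyProcPid;

	return RegisterDynamicBackgroundWorker(&worker, &handle);
}

The launcher would then typically wait on the returned handle (e.g. WaitForBackgroundWorkerStartup()) or simply wait for the notification signal, much as Chapman describes doing at $work.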
Re: Proposal for disk quota feature
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang :

> Thanks Chapman.
> @Pavel, could you please explain more about your second suggestion, "implement
> some quotas on storage level"?
>

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel
Re[3]: Adding a note to protocol.sgml regarding CopyData
Hello Bradley & Tatsuo-san,

>> ... references to the protocol version lacks homogeneity.
>> ... I'd suggest to keep "the vX.0 protocol" for a short version,
>> and "the version X.0 protocol" for long ...
>
> I agree. Change made.

Patch applies cleanly. Doc build ok.

One part talks about "terminating line", the other of "termination line". I wonder whether one is better than the other?

--
Fabien.
pg_constraint.conincluding is useless
Hi Already mentioned this in https://postgr.es/m/20180831205020.nxhw6ypysgshjtnl@alvherre.pgsql While trying to add support for foreign keys to partitioned tables, I noticed that commit 8224de4f42cc ("Indexes with INCLUDE columns and their support in B-tree") added a column to pg_constraint that appears to be there only to enable ruleutils.c to print out the list of columns in a PRIMARY KEY or UNIQUE constraint that uses included columns. However, this is pretty easy to obtain from pg_index.conkey instead, so I claim that that column is useless. In fact, here's a patch to remove it. This requires a catversion bump, for which it may seem a bit late; however I think it's better to release pg11 without a useless catalog column only to remove it in pg12 ... Thoughts? -- Álvaro Herrera >From 5e9c74c4dc59afb348b8b3ea5233b7cbf43c0616 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Sun, 2 Sep 2018 13:35:46 -0300 Subject: [PATCH] remove conincluding --- doc/src/sgml/catalogs.sgml| 8 src/backend/catalog/pg_constraint.c | 21 -- src/backend/utils/adt/ruleutils.c | 57 +-- src/include/catalog/pg_constraint.h | 6 --- src/test/regress/expected/index_including.out | 40 +-- src/test/regress/sql/index_including.sql | 10 ++--- 6 files changed, 70 insertions(+), 72 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 07e8b3325f..0179deea2e 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -2374,14 +2374,6 @@ SCRAM-SHA-256$:&l - conincluding - int2[] - pg_attribute.attnum - List of the non-constrained columns which are included into -the same index as the constrained columns - - - confkey int2[] pg_attribute.attnum diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c index 7a6d158f89..ea84441360 100644 --- a/src/backend/catalog/pg_constraint.c +++ b/src/backend/catalog/pg_constraint.c @@ -85,7 +85,6 @@ CreateConstraintEntry(const char *constraintName, boolnulls[Natts_pg_constraint]; Datum values[Natts_pg_constraint]; ArrayType *conkeyArray; - ArrayType *conincludingArray; ArrayType *confkeyArray; ArrayType *conpfeqopArray; ArrayType *conppeqopArray; @@ -116,21 +115,6 @@ CreateConstraintEntry(const char *constraintName, else conkeyArray = NULL; - if (constraintNTotalKeys > constraintNKeys) - { - Datum *conincluding; - int j = 0; - int constraintNIncludedKeys = constraintNTotalKeys - constraintNKeys; - - conincluding = (Datum *) palloc(constraintNIncludedKeys * sizeof(Datum)); - for (i = constraintNKeys; i < constraintNTotalKeys; i++) - conincluding[j++] = Int16GetDatum(constraintKey[i]); - conincludingArray = construct_array(conincluding, constraintNIncludedKeys, - INT2OID, 2, true, 's'); - } - else - conincludingArray = NULL; - if (foreignNKeys > 0) { Datum *fkdatums; @@ -204,11 +188,6 @@ CreateConstraintEntry(const char *constraintName, else nulls[Anum_pg_constraint_conkey - 1] = true; - if (conincludingArray) - values[Anum_pg_constraint_conincluding - 1] = PointerGetDatum(conincludingArray); - else - nulls[Anum_pg_constraint_conincluding - 1] = true; - if (confkeyArray) values[Anum_pg_constraint_confkey - 1] = PointerGetDatum(confkeyArray); else diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 03e9a28a63..bd0792e13e 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -315,7 +315,7 @@ static char *deparse_expression_pretty(Node *expr, List *dpcontext, static char *pg_get_viewdef_worker(Oid viewoid, int prettyFlags, int wrapColumn); 
static char *pg_get_triggerdef_worker(Oid trigid, bool pretty); -static void decompile_column_index_array(Datum column_index_array, Oid relId, +static int decompile_column_index_array(Datum column_index_array, Oid relId, StringInfo buf); static char *pg_get_ruledef_worker(Oid ruleoid, int prettyFlags); static char *pg_get_indexdef_worker(Oid indexrelid, int colno, @@ -2055,6 +2055,8 @@ pg_get_constraintdef_worker(Oid constraintId, bool fullCommand, Datum val; boolisnull;
Re: pg_constraint.conincluding is useless
Alvaro Herrera writes: > This requires a catversion bump, for which it may seem a bit late; > however I think it's better to release pg11 without a useless catalog > column only to remove it in pg12 ... Catversion bumps during beta are routine. If we had put out rc1 I'd say it was too late, but we have not. If we do do a bump for beta4, I'd be strongly tempted to address the lack of a unique index for pg_constraint as well, cf https://www.postgresql.org/message-id/10110.1535907...@sss.pgh.pa.us regards, tom lane
Re: pg_constraint.conincluding is useless
On Sun, Sep 02, 2018 at 01:27:25PM -0400, Tom Lane wrote:
> Alvaro Herrera writes:
> > This requires a catversion bump, for which it may seem a bit late;
> > however I think it's better to release pg11 without a useless catalog
> > column only to remove it in pg12 ...
>
> Catversion bumps during beta are routine.  If we had put out rc1
> I'd say it was too late, but we have not.

At the same time, covering indexes are a new thing, so if the timing allows, let's move on with having a cleaner catalog layer from the start; that's fewer compatibility breakages to justify later on. Hopefully.

> If we do do a bump for beta4, I'd be strongly tempted to address the
> lack of a unique index for pg_constraint as well, cf
> https://www.postgresql.org/message-id/10110.1535907...@sss.pgh.pa.us

Yeah... I looked at the thread.
--
Michael

signature.asc Description: PGP signature
Re: pg_constraint.conincluding is useless
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Alvaro Herrera writes: > > This requires a catversion bump, for which it may seem a bit late; > > however I think it's better to release pg11 without a useless catalog > > column only to remove it in pg12 ... Yeah, I really don't think we want to have another catalog column that we end up wanting to remove later on, if we can avoid it.. > Catversion bumps during beta are routine. If we had put out rc1 > I'd say it was too late, but we have not. I agree that rc1 would be too late. On the flip side, I don't think we should really consider catversion bumps during beta to be 'routine'. > If we do do a bump for beta4, I'd be strongly tempted to address the > lack of a unique index for pg_constraint as well, cf > https://www.postgresql.org/message-id/10110.1535907...@sss.pgh.pa.us All that said, given that we've got two pretty good reasons for a catversion bump, and one is to remove a useless column before it ever gets in a release, I'm +1 for making both of these changes. Thanks! Stephen signature.asc Description: PGP signature
Re: Bug in slot.c and are replication slots ever used at Window?
On Fri, Aug 31, 2018 at 11:46:47AM -0700, Michael Paquier wrote: > The checks will not be included in the final fix, still they look useful > so I am planning to start a new thread on the matter as perhaps other > folks have more and/or better ideas. Pushed the fix down to 9.4, without the extra sanity checks. -- Michael signature.asc Description: PGP signature
Incorrect use of errcode_for_file_access in backend code
Hi all, While looking at another patch for slot.c, I have noticed what looks like incorrect use of errcode_for_file_access: - errcode_for_file_access() is used with rmtree(), which makes no sense as this comes from common/rmtree.c, and a warning already shows up using %m. - ERRCODE_DATA_CORRUPTED is not used to mention corrupted data, and instead errcode_for_file_access() gets called. Wouldn't something like the attached provide more adapted error handling? That's mostly error handling beautification, so I would be incline to just fix HEAD. (I have noticed some inconsistent error string format in autoprewarm.c on the way.) Thoughts? -- Michael diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c index cc5e2dd89c..03bf90ce2d 100644 --- a/contrib/pg_prewarm/autoprewarm.c +++ b/contrib/pg_prewarm/autoprewarm.c @@ -641,7 +641,7 @@ apw_dump_now(bool is_bgworker, bool dump_unlogged) errno = save_errno; ereport(ERROR, (errcode_for_file_access(), - errmsg("could not write to file \"%s\" : %m", + errmsg("could not write to file \"%s\": %m", transient_dump_file_path))); } @@ -664,7 +664,7 @@ apw_dump_now(bool is_bgworker, bool dump_unlogged) errno = save_errno; ereport(ERROR, (errcode_for_file_access(), - errmsg("could not write to file \"%s\" : %m", + errmsg("could not write to file \"%s\": %m", transient_dump_file_path))); } } @@ -684,7 +684,7 @@ apw_dump_now(bool is_bgworker, bool dump_unlogged) errno = save_errno; ereport(ERROR, (errcode_for_file_access(), - errmsg("could not close file \"%s\" : %m", + errmsg("could not close file \"%s\": %m", transient_dump_file_path))); } diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index cb781e6e9a..800ca14488 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -628,8 +628,7 @@ ReplicationSlotDropPtr(ReplicationSlot *slot) */ if (!rmtree(tmppath, true)) ereport(WARNING, -(errcode_for_file_access(), - errmsg("could not remove directory \"%s\"", tmppath))); +(errmsg("could not remove directory \"%s\"", tmppath))); /* * We release this at the very end, so that nobody starts trying to create @@ -1132,8 +1131,8 @@ StartupReplicationSlots(void) if (!rmtree(path, true)) { ereport(WARNING, - (errcode_for_file_access(), - errmsg("could not remove directory \"%s\"", path))); + (errmsg("could not remove directory \"%s\"", +path))); continue; } fsync_fname("pg_replslot", true); @@ -1432,21 +1431,21 @@ RestoreSlotFromDisk(const char *name) /* verify magic */ if (cp.magic != SLOT_MAGIC) ereport(PANIC, -(errcode_for_file_access(), +(errcode(ERRCODE_DATA_CORRUPTED), errmsg("replication slot file \"%s\" has wrong magic number: %u instead of %u", path, cp.magic, SLOT_MAGIC))); /* verify version */ if (cp.version != SLOT_VERSION) ereport(PANIC, -(errcode_for_file_access(), +(errcode(ERRCODE_DATA_CORRUPTED), errmsg("replication slot file \"%s\" has unsupported version %u", path, cp.version))); /* boundary check on length */ if (cp.length != ReplicationSlotOnDiskV2Size) ereport(PANIC, -(errcode_for_file_access(), +(errcode(ERRCODE_DATA_CORRUPTED), errmsg("replication slot file \"%s\" has corrupted length %u", path, cp.length))); @@ -1496,8 +1495,8 @@ RestoreSlotFromDisk(const char *name) if (!rmtree(slotdir, true)) { ereport(WARNING, - (errcode_for_file_access(), - errmsg("could not remove directory \"%s\"", slotdir))); + (errmsg("could not remove directory \"%s\"", + slotdir))); } fsync_fname("pg_replslot", true); return; signature.asc Description: PGP signature
Re: Stored procedures and out parameters
On Thu, Aug 30, 2018 at 4:14 PM, Dave Cramer wrote: > Reading this from the (JDBC) drivers perspective, which is probably a fairly > popular one, > We now have a standard that we can't really support. Either the driver will > have to support > the new PROCEDURE with the {call } mechanism or stay with the existing > FUNCTIONS. > This puts the drivers in a no win situation. I understand and I agree. > Undoubtedly.. surely the opportunity to do something about this has not > passed as this has not been > officially released ? I agree with that, too, but I can't make other people do things they don't want to do, and then there's the question of time. On the basis of previous experience, there is going to be little appetite for holding up the release to address this issue no matter how badly busted it is. Ultimately, it's the RMT that must decide what to do in cases like this. I have confidence that they are watching this thread, but I don't know what they will decide to do about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Stored procedures and out parameters
On Thu, Aug 30, 2018 at 7:45 PM, Chapman Flack wrote: > On 08/30/18 15:35, Robert Haas wrote: >> On Tue, Aug 28, 2018 at 6:30 AM, Peter Eisentraut >> wrote: >>> CALL compatible with the SQL standard. For example, if you have a >>> function f1(IN a int, OUT b int), you would call it as SELECT f1(x) >>> and the "b" would somehow be the return value. But a procedure call >>> would be CALL p1(x, y), where x and y could be, say, PL/pgSQL >>> variables. > > I suppose the key question for most driver writers is going to be, > what does that difference look like at the fe-be protocol level? I don't think the issue is so much the FE-BE protocol level as the SQL syntax. > PL/pgSQL might be an unrepresentative example for that question, > as it lives in the backend and could have some other way of retrieving > b to store in y. For any remote client, the result still needs to get > back there before the client can apply any "this result gets assigned > to my y variable" semantics, and is there any material difference between > the protocol message sequences that return these results > > select foo(1,2); > select * from foo(1,2); > call bar(1,2); > > to the client? And, in the parallel universe where functions got > implemented according to the standard, what in that picture would > be different? You may (or may not) be missing the point here. Your first two examples do not obviously involve OUT parameters, although in theory they could, since whatever OUT parameters exist are going to show up in the third one. The third one definitely does not, since CALL apparently would require a variable that could be set to be passed as an argument. I'm not actually sure how it's supposed to work, actually, because the documentation for CALL addresses neither the proper usage nor the incompatibility as compared with functions: https://www.postgresql.org/docs/11/static/sql-call.html -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Why hash OIDs?
On Tue, Aug 28, 2018 at 8:02 PM, Tom Lane wrote: > I think this argument is a red herring TBH. The example Robert shows is > of *zero* interest for dynahash or catcache, unless it's taking only the > low order 3 bits of the OID for the bucket number. But actually we'll > increase the table size proportionally to the number of entries, so > that you can't have say 1000 table entries without at least 10 bits > being used for the bucket number. That means that you'd only have > trouble if those 1000 tables all had OIDs exactly 1K (or some multiple > of that) apart. Such a case sounds quite contrived from here. Hmm. I was thinking that it was a problem if the number of OIDs consumed per table was a FACTOR of 1000, not just if it was a POWER of 1000. I mean, if it's, say, 4, that means three-quarters of your hash table buckets are unused, which seems poor. But maybe it's not really a big enough problem in practice for us to care? Dunno. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
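To make the numbers concrete, here is a small standalone sketch of the effect under discussion (the stride of 4 OIDs per table and the 1024-bucket table are assumptions chosen for illustration): with an identity hash, tables spaced a fixed number of OIDs apart land in only a fraction of a power-of-two bucket array.

#include <stdio.h>

int
main(void)
{
	const unsigned nbuckets = 1024;	/* power-of-two hash table size */
	const unsigned stride = 4;		/* OIDs consumed per table (assumption) */
	unsigned	used[1024] = {0};
	unsigned	oid;
	unsigned	nused = 0;
	unsigned	i;

	/* Identity "hash": bucket = oid & (nbuckets - 1). */
	for (i = 0, oid = 16384; i < 1000; i++, oid += stride)
		used[oid & (nbuckets - 1)] = 1;

	for (i = 0; i < nbuckets; i++)
		nused += used[i];

	/* With a stride of 4, only nbuckets/4 = 256 buckets are ever hit. */
	printf("%u of %u buckets used\n", nused, nbuckets);
	return 0;
}

Run against these assumptions it prints "256 of 1024 buckets used", i.e. three-quarters of the buckets stay empty, which is the scenario being weighed against how contrived such OID spacing is in practice.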
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Mon, Aug 27, 2018 at 11:38 AM, Alexander Korotkov wrote: > The aspect I'm more concerned here about is whether we miss ability > for detecting some of IO errors, if we don't distinguish new pages > from pages whose tuples were removed by vacuum. My main concern is correctness. If we ever have valid-looking buffers in shared_buffers after the corresponding data has been truncated away on disk, we've got to make sure that nobody ever confuses one of them with an actually-valid buffer. Reading over your algorithm, I can't convince myself that you have that case nailed down tightly enough. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: typcache.c typos
On Tue, Aug 28, 2018 at 9:39 AM, Tom Lane wrote: >> Opps. I mistakenly attached the incorrect patch. The correct one is >> attached to this email. > > I don't see any typos here; it looks to me more like a difference of > opinion about whether "information" is singular or plural. I don't > really feel a need to change the existing wording. +1. "To speed up lookup" would be OK but I don't think "to speed lookup" is wrong, and "to speedup lookup" does sound wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: logical decoding: ABI break in 10.5 et al
On Tue, Aug 28, 2018 at 11:38 AM, Alvaro Herrera wrote: > [TL;DR: I propose to set up abidiff.postgresql.org.] Nice idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_dump --load-via-partition-root vs. parallel restore
On Tue, Aug 28, 2018 at 3:53 PM, Tom Lane wrote: > Parallel pg_restore generally assumes that the archive file is telling it > the truth about data dependencies; in particular, that a TABLE DATA item > naming a particular target table is loading data into exactly that table. > --load-via-partition-root creates a significant probability that that > assumption is wrong, at least in scenarios where the data really does get > redirected into other partitions than the original one. This can result > in inefficiencies (e.g., index rebuild started before a table's data is > really all loaded) or outright failures (foreign keys or RLS policies > applied before the data is all loaded). I suspect that deadlock failures > during restore are also possible, since identify_locking_dependencies > is not going to be nearly close to the truth about which operations > might hold which locks. Hmm. I had the idea that this wasn't a problem because I thought we had all of pg_dump strictly separated into pre-data, data, and post-data; therefore, I thought it would be the case that none of that other stuff would happen until all table data was loaded. From what you are saying here, I guess that's not the case? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: typcache.c typos
On 3 September 2018 at 10:24, Robert Haas wrote: > On Tue, Aug 28, 2018 at 9:39 AM, Tom Lane wrote: >> I don't see any typos here; it looks to me more like a difference of >> opinion about whether "information" is singular or plural. I don't >> really feel a need to change the existing wording. > > +1. "To speed up lookup" would be OK but I don't think "to speed > lookup" is wrong, and "to speedup lookup" does sound wrong. Not sure where changing it to "to speedup lookup" was proposed. In any case, it seems most people are happy the way it is. That's fine for me. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pg_dump --load-via-partition-root vs. parallel restore
Robert Haas writes: > On Tue, Aug 28, 2018 at 3:53 PM, Tom Lane wrote: >> Parallel pg_restore generally assumes that the archive file is telling it >> the truth about data dependencies; in particular, that a TABLE DATA item >> naming a particular target table is loading data into exactly that table. >> --load-via-partition-root creates a significant probability that that >> assumption is wrong, at least in scenarios where the data really does get >> redirected into other partitions than the original one. This can result >> in inefficiencies (e.g., index rebuild started before a table's data is >> really all loaded) or outright failures (foreign keys or RLS policies >> applied before the data is all loaded). I suspect that deadlock failures >> during restore are also possible, since identify_locking_dependencies >> is not going to be nearly close to the truth about which operations >> might hold which locks. > Hmm. I had the idea that this wasn't a problem because I thought we > had all of pg_dump strictly separated into pre-data, data, and > post-data; therefore, I thought it would be the case that none of that > other stuff would happen until all table data was loaded. From what > you are saying here, I guess that's not the case? We don't run restore operations in parallel during the pre-data phase, mainly because we lack faith that the dependencies are really completely represented for those objects (particularly in older archives). Once we get to parallel operation, though, everything is schedulable as soon as its dependencies are satisfied; so it's possible for post-data objects to run before supposedly-unrelated data objects are done. There's no explicit representation of the data/post-data boundary in the archive's dependencies. regards, tom lane
Re: libpq debug log
On Fri, Aug 24, 2018 at 9:48 AM, Tom Lane wrote:
> PQtrace() is utterly useless for anything except debugging libpq
> internals, and it's not tremendously useful even for that.  Don't
> bother with that part.

I think that improving the output format could help with that a lot. What it currently produces is almost unreadable; adjusting it to emit one line per protocol message would, I think, help a lot. There are probably other improvements that could be made at the same time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: libpq debug log
Robert Haas writes: > On Fri, Aug 24, 2018 at 9:48 AM, Tom Lane wrote: >> PQtrace() is utterly useless for anything except debugging libpq >> internals, and it's not tremendously useful even for that. Don't >> bother with that part. > I think that improving the output format could help with that a lot. > What it current produces is almost unreadable; adjusting it to emit > one line per protocol message would, I think, help a lot. There are > probably other improvements that could be made at the same time. I wouldn't mind throwing it out and reimplementing it ;-) ... tracing at the logical message level rather than the byte level would help. But still, I'm not really convinced that it has very much to do with what you'd want in a user-level debug log. Part of what's really bad about the PQtrace output is that in v2 protocol the same output can be repeated several times as we try to parse a message and conclude we don't have it all yet. I believe that problem is gone in v3, but it may be hard to do a consistent redesign until we nuke libpq's v2 support. Still, it might be past time for the latter, seeing that we no longer promise pre-7.4 compatibility in either psql or pg_dump. regards, tom lane
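For context, this is how a client application currently turns that tracing on (a minimal sketch; the connection string is a placeholder), which is also where any reworked, one-line-per-message output would surface:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");	/* placeholder conninfo */
	FILE	   *trace;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/* Dump the raw protocol traffic for this connection to a file. */
	trace = fopen("libpq.trace", "w");
	if (trace)
		PQtrace(conn, trace);

	PQclear(PQexec(conn, "SELECT 1"));

	if (trace)
	{
		PQuntrace(conn);		/* stop tracing before closing the stream */
		fclose(trace);
	}
	PQfinish(conn);
	return 0;
}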
Re: memory leak when serializing TRUNCATE in reorderbuffer
I've pushed this, with some minor tweaks, and backpatched to 11 (which is where TRUNCATE decoding was introduced).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Improve geometric types
On 08/17/2018 11:23 PM, Tom Lane wrote:
> Tomas Vondra writes:
>> Hmm, yeah. Based on past experience, the powerpc machines are likely to
>> stumble on this.
>
>> FWIW my understanding is that these failures actually happen in new
>> tests, it's not an issue introduced by this patch series.
>
> Yeah, we've definitely hit such problems before.  The geometric logic
> seems particularly prone to it because it's doing more and subtler
> float arithmetic than the rest of the system ... but it's not the sole
> source of minus-zero issues.

It's not entirely clear to me what the best way forward with this is. We can go through the patch and fix places with obvious -0 risks, but then what? I don't have access to any powerpc machines, so I can't check if there are any other failures. So the only thing I could do is commit and fix based on buildfarm failures ...

Emre, any better ideas?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
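For anyone who wants to see the underlying issue without access to such a machine, the core of it is just IEEE minus zero comparing equal to zero while printing differently (a standalone sketch; the exact expressions that produce -0 in the geometric code will of course differ):

#include <stdio.h>

int
main(void)
{
	double		pz = 0.0;
	double		nz = -0.0;

	/* -0.0 compares equal to 0.0 ... */
	printf("equal: %d\n", pz == nz);	/* prints 1 */

	/* ... but prints differently, which is what breaks regression output. */
	printf("%g vs %g\n", pz, nz);		/* "0 vs -0" */

	/*
	 * Typical source of minus zero in geometric code: a product or
	 * subtraction that is exactly zero but carries a negative sign.
	 */
	printf("%g\n", -1.0 * 0.0);			/* "-0" on IEEE platforms */
	return 0;
}

Whether a given test produces the -0 form depends on compiler and hardware, which is why the buildfarm is the practical testbed here.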
Re: Proposal for disk quota feature
Thanks Pavel.
Your patch does enforcement at the storage level (md.c, or we could also use smgr_extend). It's straightforward. But I prefer to implement disk quota as a feature with the following objectives:
1 Set/alter disk quota settings on different database objects, e.g. user, database, schema etc. - not only a general GUC, but the ability to set a separate quota limit for a specific object.
2 The enforcement operator should work at two positions: before a query runs and while a query is running. The latter's implementation may be similar to your patch.

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule wrote:

> Hi
>
> See attached patch - it is very simple - and good enough for our purposes.
>
> Regards
>
> Pavel
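A minimal sketch of the "before a query runs" enforcement point, written as an extension hook; diskquota_query_blacklisted() stands in for the shared-memory blacklist lookup and is an assumed name, not an existing function:

#include "postgres.h"
#include "executor/executor.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

static ExecutorStart_hook_type prev_ExecutorStart = NULL;

/* Assumed helper: true if any result relation of the plan is over quota. */
extern bool diskquota_query_blacklisted(QueryDesc *queryDesc);

static void
diskquota_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	/* Refuse to start queries whose target relations are over quota. */
	if (diskquota_query_blacklisted(queryDesc))
		ereport(ERROR,
				(errcode(ERRCODE_DISK_FULL),
				 errmsg("disk quota exceeded for target relation")));

	if (prev_ExecutorStart)
		prev_ExecutorStart(queryDesc, eflags);
	else
		standard_ExecutorStart(queryDesc, eflags);
}

void
_PG_init(void)
{
	prev_ExecutorStart = ExecutorStart_hook;
	ExecutorStart_hook = diskquota_ExecutorStart;
}

The "while a query is running" case would need a different mechanism (closer to Pavel's storage-level check at relation extension time), since by then the executor has already started.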
Re: [PATCH] Improve geometric types
Tomas Vondra writes:
> On 08/17/2018 11:23 PM, Tom Lane wrote:
>> Yeah, we've definitely hit such problems before.  The geometric logic
>> seems particularly prone to it because it's doing more and subtler
>> float arithmetic than the rest of the system ... but it's not the sole
>> source of minus-zero issues.

> We can go through the patch and fix places with obvious -0 risks, but
> then what? I don't have access to any powerpc machines, so I can't check
> if there are any other failures. So the only thing I could do is commit
> and fix based on buildfarm failures ...

That's the usual solution.  I don't see anything particularly wrong with it.  If the buildfarm results suggest a whole mess of problems, it might be appropriate to revert and rethink; but you shouldn't be afraid to use the buildfarm as a testbed.

			regards, tom lane
Re: pg_verify_checksums failure with hash indexes
On Sat, Sep 1, 2018 at 10:28 AM Dilip Kumar wrote:
> On Sat, Sep 1, 2018 at 8:22 AM, Robert Haas wrote:
> > On Thu, Aug 30, 2018 at 7:27 AM, Amit Kapila wrote:
> >
> > I wouldn't bother bumping HASH_VERSION.  First, the fix needs to be
> > back-patched, and you certainly can't back-patch a HASH_VERSION bump.
> > Second, you should just pick a formula that gives the same answer as
> > now for the cases where the overrun doesn't occur, and some other
> > sufficiently-value for the cases where an overrun currently does
> > occur.  If you do that, you're not changing the behavior in any case
> > that currently works, so there's really no reason for a version bump.
> > It just becomes a bug fix at that point.
>
> makes sense.
>
> I think if we compute with below formula which I suggested upthread
>
> #define HASH_MAX_BITMAPS Min(BLCKSZ / 8, 1024)
>
> then for BLCKSZ 8K and bigger, it will remain the same value where it
> does not overrun.  And, for the small BLCKSZ, I think it will give
> sufficient space for the hash map.  If the BLCKSZ is 1K then the sizeof
> (HashMetaPageData) + sizeof (HashPageOpaque) = 968 which is very close
> to the BLCKSZ.

Yeah, so at 1K the value of HASH_MAX_BITMAPS will be 128 as per the above formula, which is what its value was prior to commit 620b49a1. I think it will be better if you add a comment in your patch indicating the importance/advantage of such a formula.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
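To see what the proposed formula gives for different block sizes, a quick standalone check (the Min macro mirrors the one in c.h; the struct sizes quoted above are taken from the discussion, not recomputed here):

#include <stdio.h>

#define Min(x, y) ((x) < (y) ? (x) : (y))

int
main(void)
{
	int			blcksz[] = {1024, 2048, 4096, 8192, 16384, 32768};
	int			i;

	/* Proposed: #define HASH_MAX_BITMAPS Min(BLCKSZ / 8, 1024) */
	for (i = 0; i < 6; i++)
		printf("BLCKSZ = %5d -> HASH_MAX_BITMAPS = %d\n",
			   blcksz[i], Min(blcksz[i] / 8, 1024));

	/*
	 * For BLCKSZ >= 8K the value stays 1024 (behaviour unchanged, so no
	 * HASH_VERSION bump needed); for 1K it drops back to 128, the value
	 * prior to commit 620b49a1.
	 */
	return 0;
}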
Re: Proposal for disk quota feature
2018-09-03 3:49 GMT+02:00 Hubert Zhang :

> Thanks Pavel.
> Your patch does enforcement at the storage level (md.c, or we could also
> use smgr_extend). It's straightforward. But I prefer to implement disk
> quota as a feature with the following objectives:
> 1 Set/alter disk quota settings on different database objects, e.g. user,
> database, schema etc. - not only a general GUC, but the ability to set a
> separate quota limit for a specific object.
> 2 The enforcement operator should work at two positions: before a query
> runs and while a query is running. The latter's implementation may be
> similar to your patch.
>

The patch was just an example. The resource quotas should be more complex - per partition, table, schema, database, user - so GUCs are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We needed to solve the limits problem immediately. The implementation is simple, so I did it. The same implementation at the database level, or schema level, needs some more locks, so it will not be too effective.

Resource management can be very complex, and I expect it will be hard work.

Regards

Pavel
Re: pg_verify_checksums failure with hash indexes
On Tue, Aug 28, 2018 at 6:43 PM Michael Paquier wrote:
> On Tue, Aug 28, 2018 at 11:21:34AM +0200, Peter Eisentraut wrote:
> > The files in question correspond to
> >
> > hash_i4_index
> > hash_name_index
> > hash_txt_index
>
> The hash index code has been largely refactored in v10, so I would
> imagine that you can see the problem as well there.
>
> [... digging digging ...]
>
> And indeed I can see the problem in 10 as well with my own pg_checksums,
> and I can see hash_f8_index with a problem on top of what Peter has
> reported.

AFAICS, this problem exists in 9.6 and prior branches as well, although I can't test it. I am not sure whether we need to backpatch this beyond 10 (where hash indexes are WAL-logged), as prior to that hash indexes are not reliable anyway. What's your opinion?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: Hint to set owner for tablespace directory
On Fri, Aug 31, 2018 at 4:11 PM, Maksim Milyutin wrote: > On 08/31/2018 01:12 PM, Rafia Sabih wrote: > > > > On Fri, Aug 31, 2018 at 3:30 PM, Maksim Milyutin > wrote: > >> On 08/31/2018 11:55 AM, Rafia Sabih wrote: >> >> >> Adding to that this patch needs a rebase. And, please don't forget to run >> 'git diff --check', as it has some white-space issues. >> >> >> git apply /home/edb/Desktop/patches/others/owner_tbsp_hint.patch > /home/edb/Desktop/patches/others/owner_tbsp_hint.patch:9: trailing > whitespace. > { > /home/edb/Desktop/patches/others/owner_tbsp_hint.patch:10: trailing > whitespace. > bool wrong_owner = (errno == EPERM); > > > This is hex+ASCII display (from *hexdump -C owner_tbsp_hint.patch* output) > of code fragment above: > > 01a0 0a 20 09 09 65 6c 73 65 0a 2b 09 09 7b 0a 2b 09 |. > ..else.+..{.+.| > 01b0 09 09 62 6f 6f 6c 20 77 72 6f 6e 67 5f 6f 77 6e |..bool > wrong_own| > 01c0 65 72 20 3d 20 28 65 72 72 6e 6f 20 3d 3d 20 45 |er = (errno > == E| > 01d0 50 45 52 4d 29 3b 0a 2b 0a 20 09 09 09 65 72 65 |PERM);.+. > ...ere| > > Problem lines don't have trailing whitespaces, only line feed (0x0a) > symbols > > This is what I got. Please correct me if I am missng anything. I am using > centos 7, just an FYI. > > > My git version is 2.7.4. > > I am using version 2.12.1, that might be the reason for the discrepancy. -- Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/
Re: Adding a note to protocol.sgml regarding CopyData
> Hello Bradley & Tatsuo-san,
>
>>> ... references to the protocol version lacks homogeneity.
>>> ... I'd suggest to keep "the vX.0 protocol" for a short version,
>>> and "the version X.0 protocol" for long ...
>>
>> I agree. Change made.
>
> Patch applies cleanly. Doc build ok.
>
> One part talks about "terminating line", the other of "termination
> line". I wonder whether one is better than the other?

I am not a native English speaker so I may be wrong... to me, the current usage of both "terminating line" and "termination line" looks correct. The former refers to a concrete example and thus uses "the", while the latter refers to the more general case and thus uses "an".

BTW, I think the patch should apply to master and REL_11_STABLE branches at least. But should this be applied to other back branches as well?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Re: speeding up planning with partitions
Thank you Kato-san for testing.

On 2018/08/31 19:48, Kato, Sho wrote:
> Hi, Amit
>
> Great!
> With the total number of records being 6400, I benchmarked while increasing
> the number of partitions from 100 to 6400.
> Applying all three patches, 20% performance improved for 100 partitions.
>
> I have the same number of records for each partition, do you do the same?

I didn't load any data into the tables when running the tests, because these patches are meant for reducing planner latency. More specifically, they're meant to fix the current planner behavior whereby latency increases with an increasing number of partitions, with focus on the common case where only a single partition will need to be scanned by a given query.

I'd try to avoid using a benchmark whose results are affected by anything other than the planning latency. It would be a good idea to leave the tables empty and just vary the number of partitions and nothing else.

Also, we're interested in planning latency, so using just SELECT and UPDATE in your benchmark script will be enough, because their planning time is affected by the number of partitions. No need to try to measure the INSERT latency, because its planning latency is not affected by the number of partitions. Moreover, I'd rather suggest you take the INSERT statement out of the benchmark for now, because its execution time does vary unfavorably with an increasing number of partitions. Sure, there are other patches to address that, but it's better not to mix the patches, to avoid confusion.

> Also, in my case, performance was better when not prepare.

Patches in this thread do nothing for the execution, so there is no need to compare prepared vs. non-prepared. In fact, just measure non-prepared tps and latency, because we're only interested in planning time here.

> I think these patches do not improve execute case, so we need faster runtime
> pruning patch[1], right?

We already have run-time pruning code (that is the code in the patch you linked) committed into the tree in PG 11, so the master tree also has it. But since we're not interested in execution time, no need to worry about run-time pruning.

> Details of measurement conditions and results are as follows.
> - base source
>   master(@777e6ddf17) + Speeding up Insert v8 patch[1]
>
> - table definition(e.g. 100 partition)
>   create table test.accounts(aid serial, abalance int) partition by range(aid);
>   create table test.accounts_history(id serial, aid int, delta int, mtime timestamp without time zone) partition by range(aid);
>
> - command option
>   pgbench -d testdb -f benchmark.pgbench -T 180 -r -n -M prepare
>   pgbench -d testdb -f benchmark.pgbench -T 180 -r -n
>
> - results
> base source no prepared
>  part_num |   tps_ex   | update_latency | select_latency | insert_latency
> ----------+------------+----------------+----------------+----------------
>       100 | 662.414805 |          0.357 |          0.265 |          0.421
>       200 | 494.478431 |          0.439 |          0.349 |          0.579
>       400 | 307.982089 |          0.651 |          0.558 |          0.927
>       800 | 191.360676 |          0.979 |          0.876 |          1.548
>      1600 |  75.344947 |          2.253 |          2.003 |          4.301
>      3200 |  30.643902 |          5.716 |          4.955 |         10.118
>      6400 |  16.726056 |         12.512 |          8.582 |         18.054
>
> 0001 no prepared
>  part_num |   tps_ex   | update_latency | select_latency | insert_latency
> ----------+------------+----------------+----------------+----------------
>       100 | 429.816329 |          0.811 |           0.75 |          0.365
>       200 | 275.211531 |          1.333 |          1.248 |          0.501
>       400 | 160.499833 |          2.384 |          2.252 |          0.754
>       800 |  79.387776 |          4.935 |          4.698 |          1.468
>      1600 |  24.787377 |         16.593 |         15.954 |          4.302
>      3200 |   9.846421 |          42.96 |         42.139 |          8.848
>      6400 |   4.919772 |          87.43 |         83.109 |          16.56

Hmm, since 0001 is meant to improve update planning time, it seems odd that you'd get poorer results compared to the base source. But it seems the base source is actually master + the patch to improve the execution time of update, so maybe that patch is playing a part here, although I'm not sure why even that's making this much of a difference. I suggest that you use un-patched master as the base source, that is, leave out any patches to improve execution time.

[ ... ]

> 0001 + 0002 no prepared
>  part_num |   tps_ex   | update_latency | select_latency | insert_latency
> ----------+------------+----------------+----------------+----------------
>       100 |  682.53091 |          0.388 |           0.35 |           0.35
>       200 | 469.906601 |          0.543 |          0.496 |
Re: Undo logs
On Sun, Sep 2, 2018 at 12:18 AM, Thomas Munro wrote: > On Fri, Aug 31, 2018 at 10:24 PM Dilip Kumar > wrote: >> On Fri, Aug 31, 2018 at 3:08 PM, Dilip Kumar wrote: >> > As Thomas has already mentioned upthread that we are working on an >> > undo-log based storage and he has posted the patch sets for the lowest >> > layer called undo-log-storage. >> > >> > This is the next layer which sits on top of the undo log storage, >> > which will provide an interface for prepare, insert, or fetch the undo >> > records. This layer will use undo-log-storage to reserve the space for >> > the undo records and buffer management routine to write and read the >> > undo records. > > I have also pushed a new WIP version of the lower level undo log > storage layer patch set to a public branch[1]. I'll leave the earlier > branch[2] there because the record-level patch posted by Dilip depends > on it for now. Rebased undo_interface patches on top of the new branch of undo-log-storage[1]. [1] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage-v2 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com undo_interface_v2.patch Description: Binary data undo_interface_test_v2.patch Description: Binary data
RE: libpq debug log
> "Iwata, Aya" writes: > > I'm going to propose libpq debug log for analysis of queries on the > > application > side. > > I think that it is useful to determine whether the cause is on the > > application > side or the server side when a slow query occurs. > > Hm, how will you tell that really? And what's the advantage over the existing > server-side query logging capability? The log I would like to propose is used when the performance issue happen, system administrator knows the process of application internally and check if there is any problem. "debug" is not the correct description of the feature. The correct one should be "trace". Should I create another thread? > > The provided information is "date and time" at which execution of processing > is started, "query", "application side processing", which is picked up > information from PQtrace(), and "connection id", which is for uniquely > identifying the connection. > > PQtrace() is utterly useless for anything except debugging libpq internals, > and it's not tremendously useful even for that. Don't bother with that part. My initial intention was to get only useful information from PQTrace () since it acquires a lot of information. Is there another way to obtain application side information besides PQTrace() ? > Where will you get a "unique connection id" from? When collecting trace log file in the application side, I think it is necessary to identify different connection. In order to do this, when new empty PQconn structure is created, new connection id is also created. Then we output it in the trace log file for one application. > How will you deal with asynchronously-executed queries --- either the > PQgetResult style, or the single-row-at-a-time style? In my understanding, PQgetResult style outputs logs of multiple result queries at once, While the single-row-at-a-time style outputs log for each query. Is this correct? I think PQgetResult style is better, because this style traces the internal process of the application. Regards, Aya Iwata