Re: Allow to_date() and to_timestamp() to accept localized names
On 2020-01-24 19:01, Peter Eisentraut wrote:

postgres=# select to_char(now(),'TMmonth');
  to_char
------------
 ιανουαρίου
(1 row)

which is the genitive of ιανουάριος. You use the genitive form for a date (24th of January) but the nominative otherwise. But the reverse mapping can only take one of these forms. So here

select to_date('Ιανουαριος', 'TMMonth');

fails, which is bad.

For the record, the correct form of that would appear to be

select to_date('Ιανουάριος', 'TMMonth');

with the accent. I had tried different variations of that and they all failed.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Parallel grouping sets
On Sat, Jan 25, 2020 at 4:22 AM Jesse Zhang wrote:
>
> On Thu, Jan 23, 2020 at 2:47 AM Amit Kapila wrote:
> >
> > On Sun, Jan 19, 2020 at 2:23 PM Richard Guo wrote:
> > >
> > > I realized that there are two patches in this thread that are
> > > implemented according to different methods, which causes confusion.
> >
> > The two ideas seem to be different. Is the second approach [1]
> > inferior in any case compared to the first approach? Can we keep
> > both approaches for parallel grouping sets, and if so, how? If not,
> > won't the code from the first approach be useless once we commit the
> > second approach?
> >
> > [1] - https://www.postgresql.org/message-id/CAN_9JTwtTTnxhbr5AHuqVcriz3HxvPpx1JWE--DCSdJYuHrLtA%40mail.gmail.com
>
> I glanced over both patches. Just the opposite: I have a hunch that v3
> is always better than v5.

This is what I also understood after reading this thread. So my question is: why not just review v3 and commit something along those lines, even though it would take a bit more time? It is possible that if we decide to go with v5 we can make it happen earlier, but later, when we try to get v3 in, the code committed as part of v5 might not be of any use, or if it is useful, then in which cases?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR
> On 19 Jan 2020, at 22:30, Dent John wrote:
>
> I have not yet made steps towards documentation, nor yet rebased, so the
> Makefile chunk will probably still fail.

The attached patch addresses these points, so it should now apply cleanly against dev. I also changed the OID assigned to ROWS_IN and its support function.

In passing, I noticed there is one existing function that can consume and make good use of ROWS_IN's result when used in the target list, which is row_to_json. This is good, as it makes ROWS_IN useful even outside of a change to allow results in the FROM clause to be pipelined. I've called out row_to_json specifically in the documentation change.

denty.

unnest-refcursor-v5a.patch
Description: Binary data
Re: Greatest Common Divisor
On Mon, 20 Jan 2020 at 08:04, Vik Fearing wrote:
>
> On 20/01/2020 08:44, Dean Rasheed wrote:
> >
> > I see this has been marked RFC. I'll take it,
>

Committed with some adjustments, mostly cosmetic but a couple more substantive:

The code to guard against a floating point exception with inputs of (INT_MIN, -1) wasn't quite right, because it actually just moved the problem so that it would fall over with inputs of (INT_MIN, +1).

The convention in numeric.c is that the xxx_var() functions take *pointers* to their NumericVar arguments rather than copies, and they do not modify their inputs, as indicated by the use of "const". You might just have gotten away with what you were doing, but I think it was bad style and potentially unsafe -- for example, someone calling gcd_var() with a NumericVar that came from some other computation and having a non-null buf would risk having the buf freed in the copy, leaving the original NumericVar with a buf pointing to freed memory.

Regards,
Dean
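[Editor's illustration of the first point -- explicitly not the committed PostgreSQL code; the function name and error handling below are made up for the example. Running Euclid's algorithm on unsigned magnitudes sidesteps the INT_MIN % -1 hardware trap entirely, and only the final result needs a range check, because gcd(INT_MIN, 0) is 2^31, which does not fit in int32.]

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative sketch only: gcd of two int32 values computed on unsigned
 * magnitudes.  This avoids the trap that INT32_MIN % -1 can raise and
 * avoids intermediate overflow; the one remaining hazard is that
 * gcd(INT32_MIN, 0) == 2^31 must be reported as out of range for int32.
 */
static int32_t
gcd_int32(int32_t a, int32_t b, int *overflow)
{
	uint32_t	ua = (a < 0) ? 0 - (uint32_t) a : (uint32_t) a;
	uint32_t	ub = (b < 0) ? 0 - (uint32_t) b : (uint32_t) b;

	while (ub != 0)
	{
		uint32_t	r = ua % ub;

		ua = ub;
		ub = r;
	}

	if (ua > (uint32_t) INT32_MAX)
	{
		*overflow = 1;			/* gcd(INT32_MIN, 0): caller must raise an error */
		return 0;
	}
	*overflow = 0;
	return (int32_t) ua;
}

int
main(void)
{
	int			overflow;

	/* the two inputs discussed above: neither should crash */
	printf("%d\n", gcd_int32(INT32_MIN, -1, &overflow));	/* prints 1 */
	printf("%d\n", gcd_int32(INT32_MIN, +1, &overflow));	/* prints 1 */
	return 0;
}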
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Fri, Jan 24, 2020 at 8:52 PM Andres Freund wrote:
>
> Hi,
>
> Currently pg_stat_bgwriter.buffers_backend is pretty useless to gauge
> whether backends are doing writes they shouldn't do. That's because it
> counts things that are either unavoidably or unlikely doable by other
> parts of the system (checkpointer, bgwriter).
> In particular extending the file can not currently be done by any
> other type of process, yet is counted. When using a buffer access
> strategy it is also very likely that writes have to be done by the
> 'dirtying' backend itself, as the buffer will be reused soon after (when
> not previously in s_b that is).

Yeah. That's quite annoying.

> Additionally pg_stat_bgwriter.buffers_backend also counts writes done by
> autovacuum et al.
>
> I think it'd make sense to at least split buffers_backend into
> buffers_backend_extend,
> buffers_backend_write,
> buffers_backend_write_strat
>
> but it could also be worthwhile to expand it into
> buffers_backend_extend,
> buffers_{backend,checkpoint,bgwriter,autovacuum}_write
> buffers_{backend,autovacuum}_write_strat

Given that these are individual global counters, I don't really see any reason not to expand it to the bigger set of counters. It's easy enough to add them up together later if needed.

> Possibly by internally, in contrast to the SQL level, having just counter
> arrays indexed by backend types.
>
> It's also noteworthy that buffers_backend is accounted in an absurd
> manner. One might think that writes are accounted from backend -> shared
> memory or such. But instead it works like this:
>
> 1) backend flushes buffer in bufmgr.c, accounts for backend *write time*
> 2) mdwrite writes and registers a sync request, which forwards the sync
>    request to checkpointer
> 3) ForwardSyncRequest(), when not called by bgwriter, increments
>    CheckpointerShmem->num_backend_writes
> 4) checkpointer, whenever doing AbsorbSyncRequests(), moves
>    CheckpointerShmem->num_backend_writes to
>    BgWriterStats.m_buf_written_backend (local memory!)
> 5) Occasionally it calls pgstat_send_bgwriter(), which sends the data to
>    pgstat (which bgwriter also does)
> 6) Which then updates the shared memory used by the display functions
>
> Worthwhile to note that backend buffer read/write *time* is accounted
> differently. That's done via pgstat_send_tabstat().
>
> I think there's very little excuse for the indirection via checkpointer:
> besides being architecturally weird, it actually requires that we
> continue to wake up checkpointer over and over instead of optimizing how
> and when we submit fsync requests.
>
> As far as I can tell we're also simply not accounting at all for writes
> done outside of shared buffers. All writes done directly through
> smgrwrite()/extend() aren't accounted anywhere as far as I can tell.
>
> I think we also count things as writes that aren't writes: mdtruncate()
> is AFAICT counted as one backend write for each segment. Which seems
> weird to me.

It's at least slightly weird :) Might it be worth counting truncate events separately?

> Lastly, I don't understand what the point of sending fixed size stats,
> like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While
> I don't like its architecture, we obviously need something like pgstat
> to handle variable amounts of stats (database, table level etc
> stats). But that doesn't at all apply to these types of global stats.

That part has annoyed me as well a few times. +1 for just moving that into global shared memory. Given that we don't really care about things being in sync between those different counters *or* if we lose a bit of data (which the stats collector is designed to do), we could even do that without a lock?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Re: [Proposal] Global temporary tables
Thank you for reviewing the patch.

> On 24 Jan 2020, at 4:20 PM, Konstantin Knizhnik wrote:
>
> On 23.01.2020 19:28, 曾文旌(义从) wrote:
>>
>> I'm trying to improve this part of the implementation in
>> global_temporary_table_v7-pg13.patch
>> Please check my patch and give me feedback.
>>
>> Thanks
>>
>> Wenjing
>
> Below is my short review of the patch:
>
> +/*
> + * For global temp table only
> + * use AccessExclusiveLock for ensure safety
> + */
> +{
> +{
> +"on_commit_delete_rows",
> +"global temp table on commit options",
> +RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
> +ShareUpdateExclusiveLock
> +},
> +true
> +},
>
> The comment seems to be confusing: it says about AccessExclusiveLock but
> actually uses ShareUpdateExclusiveLock.

There is a problem with the comment description; I will fix it.

> -Assert(TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
> -Assert(MultiXactIdIsValid(onerel->rd_rel->relminmxid));
> +Assert((RELATION_IS_GLOBAL_TEMP(onerel) && onerel->rd_rel->relfrozenxid == InvalidTransactionId) ||
> +(!RELATION_IS_GLOBAL_TEMP(onerel) && TransactionIdIsNormal(onerel->rd_rel->relfrozenxid)));
> +Assert((RELATION_IS_GLOBAL_TEMP(onerel) && onerel->rd_rel->relminmxid == InvalidMultiXactId) ||
> +(!RELATION_IS_GLOBAL_TEMP(onerel) && MultiXactIdIsValid(onerel->rd_rel->relminmxid)));
>
> It is actually equivalent to:
>
> Assert(RELATION_IS_GLOBAL_TEMP(onerel) ^ TransactionIdIsNormal(onerel->rd_rel->relfrozenxid);
> Assert(RELATION_IS_GLOBAL_TEMP(onerel) ^ MultiXactIdIsValid(onerel->rd_rel->relminmxid));

Yes, thank you for pointing that out. It's simpler.

> +/* clean temp relation files */
> +if (max_active_gtt > 0)
> +RemovePgTempFiles();
> +
> /*
>
> I wonder why do we need some special check for GTT here.
> From my point of view cleanup at startup of local storage of temp tables
> should be performed in the same way for local and global temp tables.

After an OOM kill, autovacuum cleans up isolated local temp tables the same way it cleans up orphan temporary tables: the definition of a local temp table is deleted together with its storage file. A GTT cannot be handled that way, which is why my patch has this implementation. If you have another solution, please let me know.

> -new_rel_reltup->relfrozenxid = relfrozenxid;
> -new_rel_reltup->relminmxid = relminmxid;
> +/* global temp table not remember transaction info in catalog */
> +if (relpersistence == RELPERSISTENCE_GLOBAL_TEMP)
> +{
> +new_rel_reltup->relfrozenxid = InvalidTransactionId;
> +new_rel_reltup->relminmxid = InvalidMultiXactId;
> +}
> +else
> +{
> +new_rel_reltup->relfrozenxid = relfrozenxid;
> +new_rel_reltup->relminmxid = relminmxid;
> +}
> +
>
> Why do we need to do it for GTT?
> Did you check that there will be no problems with GTT in case of XID
> wraparound?
> Right now if you create temp table and keep session open, then it will block
> XID wraparound.

In my design:
1. Because different sessions have different transaction information, I chose to store the transaction information of a GTT in MyProc, not in the catalog.
2. As for the XID wraparound problem, the cause is the design of temp table storage (both local and global temp tables), which prevents autovacuum from vacuuming them. It should be solved completely at the storage level.

>
> +/* We allow to drop global temp table only this session use it */
> +if (RELATION_IS_GLOBAL_TEMP(rel))
> +{
> +if (is_other_backend_use_gtt(rel->rd_node))
> +elog(ERROR, "can not drop relation when other backend attached this global temp table");
> +}
> +
>
> Here we once again introduce incompatibility with normal (permanent) tables.
> Assume that DBA or programmer need to change format of GTT. But there are
> some active sessions which have used this GTT sometime in the past.
> We will not be able to drop this GTT until all this sessions are terminated.
> I do not think that it is acceptable behaviour.

In fact, the DBA can still complete DDL on a GTT. I've provided a set of functions for this case. If the DBA needs to alter a GTT A (or drop it, or create an index on it), he needs to:
1. Use the pg_gtt_attached_pids view to list the pids of the sessions that are using GTT A.
2. Use pg_terminate_backend(pid) to terminate all of them except his own session.
3. Run the ALTER on GTT A.

>
> +LOCKMODE lockmode = AccessExclusiveLock;
> +
> +/* truncate global temp table only need RowExclusiveLock */
> +if (get_rel_persistence(rid) == RELPERSISTENCE_GLOBAL_TEMP)
> +lockmode = RowExclusiveLock;
>
> What are the reasons of using RowExclusiveLock for GTT instead of
> AccessExclusiveLock?
> Yes, GTT data is access only by one backend so no locking here seems to be
> needed at all.
> B
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Hi,

I took a look at this patch. With some additions I think the feature itself is useful, but the patch needs more work. It also doesn't have any of its own automated tests yet, so the testing below was done manually.

The attached file, kms_v2.patch, is a rebased version of the kms_v1.patch that fixes some bit rot. It sorts some of the Makefile additions but otherwise is the original patch. This version applies cleanly on master and passes make check.

I don't have a Windows machine to test it, but I think the Windows build files for these changes are missing. The updated src/common/Makefile has a comment to coordinate updates to Mkvcbuild.pm, but I don't see kmgr_utils.c or cipher_openssl.c referenced anywhere in there.

The patch adds "pg_kmgr" to the list of files to skip in pg_checksums.c, but there's no additional "pg_kmgr" file written to the data directory. Perhaps that's from a prior version that saved data to its own file?

The constant AES128_KEY_LEN is defined in cipher.c but it's not used anywhere. Regarding AES-128, I'm not sure of the value of even supporting it for this feature (vs. just supporting AES-256). Unlike something like table data encryption, I'd expect a KMS to be used much less frequently, so any performance boost of AES-128 vs AES-256 would be meaningless.

The functions pg_cipher_encrypt(...), pg_cipher_decrypt(...), and pg_compute_HMAC(...) return true if OpenSSL is not configured. Should that be false? The ctx init functions all return false when not configured. I don't think that code path would ever be reached, as you would not have a valid context, but it seems more consistent to have them all return false.

There's a comment referring to "Encryption keys (TDEK and WDEK) length" but this feature is only for a KMS, so that should be renamed.

The passphrase is hashed to split it into two 32-byte keys but the min length is only 8 bytes:

#define KMGR_MIN_PASSPHRASE_LEN 8

... that should be at least 64 bytes to reflect how it's being used downstream. Depending on the format of the passphrase command's output it should be even longer (ex: binary data in hex should really be double that). The overall min should be 64 bytes, but maybe add a note to the docs to explain how the output will be used and the expected amount of entropy.

In pg_kmgr_wrap(...) it checks that the input is a multiple of 8 bytes:

if (datalen % 8 != 0)
    ereport(ERROR,
            (errmsg("input data must be multiple of 8 bytes")));

...but after testing it, the OpenSSL key wrap functions it invokes require a multiple of 16 bytes (the block size of AES). Otherwise you get a generic error:

# SELECT pg_kmgr_wrap('abcd1234'::bytea);
ERROR: could not wrap the given secret

In ossl_compute_HMAC(...) it refers to AES256_KEY_LEN. That should be SHA256_HMAC_KEY_LEN (they're both 32 bytes but the naming is wrong):

return HMAC(EVP_sha256(), key, AES256_KEY_LEN, data,
            (uint32) data_size, result, (uint32 *) result_size);

In pg_rotate_encryption_key(...) the error message for short passphrases should be "at least %d bytes":

if (passlen < KMGR_MIN_PASSPHRASE_LEN)
    ereport(ERROR,
            (errmsg("passphrase must be more than %d bytes",
                    KMGR_MIN_PASSPHRASE_LEN)));

Rotating the passphrase via "SELECT pg_rotate_encryption_key()" and restarting the server worked (good). Having the server attempt to start with invalid output from the command gives an error "FATAL: cluster passphrase does not match expected passphrase" (good).

Round tripping via wrap/unwrap works (good!):

# SELECT convert_from(pg_kmgr_unwrap(pg_kmgr_wrap('abcd1234abcd1234'::bytea)), 'utf8');
   convert_from
------------------
 abcd1234abcd1234
(1 row)

Trying to unwrap gibberish fails (also good!):

# SELECT pg_kmgr_unwrap('\x123456789012345678901234567890123456789012345678');
ERROR: could not unwrap the given secret

The pg_kmgr_wrap/unwrap functions use EVP_aes_256_wrap()[1], which implements RFC 5649[2] with the default IVs, so they always return the same value for the same input:

# SELECT x, pg_kmgr_wrap('abcd1234abcd1234abcd1234') FROM generate_series(1,5) x;
 x |                           pg_kmgr_wrap
---+--------------------------------------------------------------------
 1 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 2 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 3 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 4 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 5 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
(5 rows)

The IVs should be randomized so that repeated wrap operations give distinct results. To do that, the output format needs to include the randomized IV. It need not be secret, but it needs to be included in the wrapped output. Related, IIUC, the wrapping mechanism of RFC 5649 does provide some integrity checking, but it's only 64 bits (vs. say 256 bits for a full HMAC-SHA-256). Rather than use EVP_ae
Re: [PATCH] ltree, lquery, and ltxtquery binary protocol support
Attached is the new patch rebased onto master.

Best regards,
Nino Floris

On Thu, Jan 16, 2020 at 11:00 PM Tomas Vondra wrote:
>
> On Fri, Nov 29, 2019 at 11:29:03AM +0900, Michael Paquier wrote:
> >On Mon, Nov 11, 2019 at 03:44:54PM +0100, Nino Floris wrote:
> >> Alright, as usual life got in the way. I've attached a new version of
> >> the patch with pgindent changes.
> >>
> >> > What do you think about writing patch for ALTER TYPE?
> >> I'd rather not :$
> >>
> >> > 1) Write migration script, which directly updates pg_type.
> >> This sounds like the best option right now, if we don't want people to
> >> do manual migrations to ltree 1.2.
> >> How would I best go at this?
> >>
> >> > Travis has a small complaint:
> >> Should be fixed!
> >
> >The latest patch provided fails to apply for me on HEAD. Please
> >provide a rebase. For now I am bumping this patch to next CF with
> >"waiting on author" as status.
>
> Nino, any plans to submit a rebased/fixed patch, so that people can
> review it? Not sure if this needs a simple rebase or something more
> complex, all I know is cputube can't apply it.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

0003-Implements-ltree-lquery-and-ltxtquery-binary-protoco.patch
Description: Binary data
Re: Strange coding in _mdfd_openseg()
On Thu, Apr 04, 2019 at 12:15:52PM +0900, Kyotaro HORIGUCHI wrote:
> At Wed, 3 Apr 2019 13:47:46 -0700, Andres Freund wrote
> in <20190403204746.2yumq7c2mirmo...@alap3.anarazel.de>
> > On 2019-04-04 09:24:49 +1300, Thomas Munro wrote:
> > > On Wed, Apr 3, 2019 at 5:34 PM Kyotaro HORIGUCHI
> > > wrote:
> > > > I may be missing something, but it seems possible that
> > > > _mdfd_getseg calls it with segno > opensegs.
> > > >
> > > > | for (nextsegno = reln->md_num_open_segs[forknum];
> > >
> > > Here nextsegno starts out equal to opensegs.
> > >
> > > > | nextsegno <= targetseg; nextsegno++)
> > >
> > > Here we add one to nextsegno...
> > >
> > > > | ...
> > > > | v = _mdfd_openseg(reln, forknum, nextsegno, flags);
> > >
> > > ... after adding one to opensegs. So they're always equal. This fits
> > > a general programming pattern when appending to an array: the next
> > > element's index is the same as the number of elements. But I claim
> > > the coding is weird, because _mdfd_openseg *looks* like it can
> > > handle opening segments in any order, except that the author
> > > accidentally wrote "<=" instead of ">=". In fact it can't open them
> > > in any order, because we don't support "holes" in the array. So I
> > > think it should really be "==", and it should be an assertion, not a
> > > condition.
> >
> > Yea, I totally agree it's weird. I'm not sure if I'd go for an assertion
> > of equality, or just invert the >= (which I agree I probably just
> > screwed up and didn't notice when reviewing the patch because it looked
> > close enough to correct and it didn't have a measurable effect).
>
> I looked there and agreed. _mdfd_openseg is always called just to
> add one new segment after the last opened segment by the two
> callers. So I think == is better.

Agreed. The rest of md.c won't cope with a hole in this array, so allowing less-than-or-equal here is futile. The patch in the original post looks fine.
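[Editor's note: a tiny self-contained illustration of the invariant being discussed -- this is not md.c itself, just a sketch with made-up names. Because the caller's loop starts nextsegno at the current count of open segments and each call appends exactly one entry, the callee can assert equality instead of accepting arbitrary segment numbers.]

#include <assert.h>

#define MAX_SEGS 16

/* toy stand-ins for md.c state: an append-only array of open segments */
static int	num_open_segs = 0;
static int	open_segs[MAX_SEGS];

/* stand-in for _mdfd_openseg(): only appending is supported, so assert == */
static void
open_seg(int segno)
{
	assert(segno == num_open_segs);
	open_segs[num_open_segs++] = segno;
}

int
main(void)
{
	int			targetseg = 5;

	/* mirrors the caller's loop shape: nextsegno starts at the open count */
	for (int nextsegno = num_open_segs; nextsegno <= targetseg; nextsegno++)
		open_seg(nextsegno);

	return 0;
}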
Re: Duplicate Workers entries in some EXPLAIN plans
Maciek Sakrejda writes: > For what it's worth, this version makes sense to me. Thanks for looking. Here's a version that deals with the JIT instrumentation. As Andres noted far upthread, that was also really bogusly done before. Not only could you get multiple "JIT" subnodes on a Gather node, but we failed to print the info at all if the parallelism was expressed as Gather Merge rather than Gather. A side effect of this change is that per-worker JIT info is now printed one plan level further down: before it was printed on the Gather node, but now it's attached to the Gather's child, because that's where we print other per-worker data. I don't see anything particularly wrong with that, but it's another change from the behavior today. It's still really unclear to me how we could exercise any of this behavior meaningfully in a regression test. I thought for a little bit about using the TAP infrastructure instead of a traditional-style test, but it seems like that doesn't buy anything except for a bias towards ignoring details instead of overspecifying them. Which is not much of an improvement. regards, tom lane diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index d189b8d..fe3c83b 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -57,6 +57,8 @@ static void ExplainOneQuery(Query *query, int cursorOptions, IntoClause *into, ExplainState *es, const char *queryString, ParamListInfo params, QueryEnvironment *queryEnv); +static void ExplainPrintJIT(ExplainState *es, int jit_flags, + JitInstrumentation *ji); static void report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es); static double elapsed_time(instr_time *starttime); @@ -123,11 +125,20 @@ static void ExplainSubPlans(List *plans, List *ancestors, const char *relationship, ExplainState *es); static void ExplainCustomChildren(CustomScanState *css, List *ancestors, ExplainState *es); +static ExplainWorkersState *ExplainCreateWorkersState(int num_workers); +static void ExplainOpenWorker(int n, ExplainState *es); +static void ExplainCloseWorker(int n, ExplainState *es); +static void ExplainFlushWorkersState(ExplainState *es); static void ExplainProperty(const char *qlabel, const char *unit, const char *value, bool numeric, ExplainState *es); +static void ExplainOpenSetAsideGroup(const char *objtype, const char *labelname, + bool labeled, int depth, ExplainState *es); +static void ExplainSaveGroup(ExplainState *es, int depth, int *state_save); +static void ExplainRestoreGroup(ExplainState *es, int depth, int *state_save); static void ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es); static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es); +static void ExplainIndentText(ExplainState *es); static void ExplainJSONLineEnding(ExplainState *es); static void ExplainYAMLLineStarting(ExplainState *es); static void escape_yaml(StringInfo buf, const char *str); @@ -790,31 +801,22 @@ ExplainPrintJITSummary(ExplainState *es, QueryDesc *queryDesc) if (queryDesc->estate->es_jit_worker_instr) InstrJitAgg(&ji, queryDesc->estate->es_jit_worker_instr); - ExplainPrintJIT(es, queryDesc->estate->es_jit_flags, &ji, -1); + ExplainPrintJIT(es, queryDesc->estate->es_jit_flags, &ji); } /* * ExplainPrintJIT - * Append information about JITing to es->str. - * - * Can be used to print the JIT instrumentation of the backend (worker_num = - * -1) or that of a specific worker (worker_num = ...). 
*/ -void -ExplainPrintJIT(ExplainState *es, int jit_flags, -JitInstrumentation *ji, int worker_num) +static void +ExplainPrintJIT(ExplainState *es, int jit_flags, JitInstrumentation *ji) { instr_time total_time; - bool for_workers = (worker_num >= 0); /* don't print information if no JITing happened */ if (!ji || ji->created_functions == 0) return; - /* don't print per-worker info if we're supposed to hide that */ - if (for_workers && es->hide_workers) - return; - /* calculate total time */ INSTR_TIME_SET_ZERO(total_time); INSTR_TIME_ADD(total_time, ji->generation_counter); @@ -827,16 +829,13 @@ ExplainPrintJIT(ExplainState *es, int jit_flags, /* for higher density, open code the text output format */ if (es->format == EXPLAIN_FORMAT_TEXT) { - appendStringInfoSpaces(es->str, es->indent * 2); - if (for_workers) - appendStringInfo(es->str, "JIT for worker %u:\n", worker_num); - else - appendStringInfoString(es->str, "JIT:\n"); - es->indent += 1; + ExplainIndentText(es); + appendStringInfoString(es->str, "JIT:\n"); + es->indent++; ExplainPropertyInteger("Functions", NULL, ji->created_functions, es); - appendStringInfoSpaces(es->str, es->indent * 2); + ExplainIndentText(es); appendStringInfo(es->str, "Options: %s %s, %s %s, %s %s, %s %s\n", "Inlining", jit_flags & PGJIT_I
Re: Duplicate Workers entries in some EXPLAIN plans
I wrote:
> It's still really unclear to me how we could exercise any of
> this behavior meaningfully in a regression test.  I thought
> for a little bit about using the TAP infrastructure instead
> of a traditional-style test, but it seems like that doesn't
> buy anything except for a bias towards ignoring details instead
> of overspecifying them.  Which is not much of an improvement.

After further thought, I decided that about the best we can do is suppress the "Workers" field in the regression test's expected output. This still gives us code coverage of the relevant code, and we can check that the output is valid JSON before we strip it, so it's not a dead loss.

I rewrote the test script a bit to add some coverage of XML and YAML output formats, since we had exactly none before, and pushed it.

			regards, tom lane
Re: Duplicate Workers entries in some EXPLAIN plans
Hi,

On 2020-01-25 14:23:50 -0500, Tom Lane wrote:
> A side effect of this change is that per-worker JIT info is now
> printed one plan level further down: before it was printed on
> the Gather node, but now it's attached to the Gather's child,
> because that's where we print other per-worker data.  I don't
> see anything particularly wrong with that, but it's another
> change from the behavior today.

Yea, I don't see any need to be bothered by that.

> It's still really unclear to me how we could exercise any of
> this behavior meaningfully in a regression test.  I thought
> for a little bit about using the TAP infrastructure instead
> of a traditional-style test, but it seems like that doesn't
> buy anything except for a bias towards ignoring details instead
> of overspecifying them.  Which is not much of an improvement.

Hm. I'd like to avoid needing TAP for this kind of thing; it'll just make it much more expensive in just about all ways.

Testing JIT explain is "easy" enough, I think. I've posted a patch in another thread which just skips over the region of the test if JIT is not available. See e.g. 0008 in https://www.postgresql.org/message-id/20191029000229.fkjmuld3g7f2jq7i%40alap3.anarazel.de (that's a thread I'd love your input in, btw).

It's harder for parallel query though. If parallel query were able to reuse workers, we could "just" check at the beginning of the test whether we are able to get the workers we need, and then skip the rest of the tests if not. But as things stand that doesn't guarantee anything.

I wonder if we could introduce a debug GUC that makes parallel worker acquisition just retry in a loop, for a time determined by the GUC. That obviously would be a bad idea in a production setup, but it could be good enough for regression tests? There are some deadlock dangers, but I'm not sure they really matter for the tests.

> +			/* prepare per-worker general execution details */
> +			if (es->workers_state && es->verbose)
> +			{
> +				WorkerInstrumentation *w = planstate->worker_instrument;
> +
> +				for (int n = 0; n < w->num_workers; n++)
> +				{
> +					Instrumentation *instrument = &w->instrument[n];
> +					double		nloops = instrument->nloops;
> +					double		startup_ms;
> +					double		total_ms;
> +					double		rows;
> +
> +					if (nloops <= 0)
> +						continue;
> +					startup_ms = 1000.0 * instrument->startup / nloops;
> +					total_ms = 1000.0 * instrument->total / nloops;
> +					rows = instrument->ntuples / nloops;
> +
> +					ExplainOpenWorker(n, es);
> +
> +					if (es->format == EXPLAIN_FORMAT_TEXT)
> +					{
> +						ExplainIndentText(es);
> +						if (es->timing)
> +							appendStringInfo(es->str,
> +											 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
> +											 startup_ms, total_ms, rows, nloops);
> +						else
> +							appendStringInfo(es->str,
> +											 "actual rows=%.0f loops=%.0f\n",
> +											 rows, nloops);
> +					}
> +					else
> +					{
> +						if (es->timing)
> +						{
> +							ExplainPropertyFloat("Actual Startup Time", "ms",
> +												 startup_ms, 3, es);
> +							ExplainPropertyFloat("Actual Total Time", "ms",
> +												 total_ms, 3, es);
> +						}
> +						ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
> +						ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
> +					}
> +
> +					ExplainCloseWorker(n, es);
> +				}
> +			}

I'd personally move this into a separate function, given the patch moves the code around already. ExplainNode() is already hard enough to navigate...

It probably also makes sense to move everything but the nloops <= 0 check and the ExplainOpenWorker/ExplainCloseWorker calls into its own function. As far as I can tell it should now be identical to the non-parallel case?

> +/*
> + * Begin or resume output into the set-aside group for worker N.
> + */
> +static void

Would it make sense to make these functions non-static? It seems plausible that code for a custom node or such would want to add its own information?
Re: Duplicate Workers entries in some EXPLAIN plans
Andres Freund writes:
> I wonder if we could introduce a debug GUC that makes parallel worker
> acquisition just retry in a loop, for a time determined by the GUC. That
> obviously would be a bad idea to do in a production setup, but it could
> be good enough for regression tests? There are some deadlock dangers,
> but I'm not sure they really matter for the tests.

Hmmm ... might work. Seems like a better idea than "run it by itself" as we have to do now.

> I'd personally move this into a separate function, given the patch
> moves the code around already. ExplainNode() is already hard enough to
> navigate...

Well, it was already inline in ExplainNode, so this just moved the code a few lines. I'm not that excited about moving little bits of that function out-of-line.

>> +/*
>> + * Begin or resume output into the set-aside group for worker N.
>> + */
>> +static void

> Would it make sense to make these functions non-static? It seems
> plausible that code for a custom node or such would want to add
> its own information?

Maybe, but it seems to me that there'd be a whole lot of other infrastructure needed to track additional per-worker state. I'd just as soon not expose this stuff until (a) there's a concrete, not hypothetical, use case and (b) it's been around long enough to feel comfortable that there's nothing wrong with the design.

>> +/*
>> + * In TEXT format, prefix the first output line for this worker with
>> + * "Worker N:".  Then, any additional lines should be indented one more
>> + * stop than the "Worker N" line is.
>> + */

> I don't quite get the Worker %d bit. Why are we not outputting that in
> the !worker_inited block?

We might strip it off again in ExplainCloseWorker, and then potentially add it back again in a later ExplainOpenWorker. That could only happen if an earlier ExplainOpen/CloseWorker fragment decides not to emit any text and then a later one wants to do so. Which I'm pretty sure is unreachable right now, but I don't think this code should assume that.

>> +				appendStringInfoString(es->str, wstate->worker_str[i].data);

> Probably never matters, but given we do have the string length already,
> we could use appendBinaryStringInfo().

Ah, I was thinking about making that change but then forgot.

>> +		ExplainCloseGroup("Worker", NULL, true, es);

> Not related to this patch: I never got why we maintain a grouping stack
> for some things, but don't do it for the group properties
> themselves.

Right now it'd just be extra overhead. If we ever have a case where it's not convenient for an ExplainCloseGroup caller to provide the same data as for ExplainOpenGroup, then I'd be on board with storing that info.

> Hm. It might be worthwhile to rename ExplainOpenSetAsideGroup and use it
> from ExplainOpenGroup()? Seems we could just call it after
> ExplainOpenGroup()'s switch, and remove all the indent/grouping_stack
> related code from ExplainOpenGroup().

Hmm. It seemed easier to me to keep them separate, but ... I did consider a design in which, instead of ExplainOpenSetAsideGroup, there was some function that would initialize the "state_save" area and then you'd call the "restore" function to make that state active. It seemed like that would be too dissimilar from ExplainOpenGroup --- but conceivably, we could reimplement ExplainOpenGroup as calling the initialize function and then the restore function (along with doing some output). Not really sure that'd be an improvement though: it'd involve less duplicate code, but the functions would individually be harder to wrap your brain around.

			regards, tom lane
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Hi,

On 2020-01-25 15:43:41 +0100, Magnus Hagander wrote:
> On Fri, Jan 24, 2020 at 8:52 PM Andres Freund wrote:
> > Additionally pg_stat_bgwriter.buffers_backend also counts writes done by
> > autovacuum et al.
> > I think it'd make sense to at least split buffers_backend into
> > buffers_backend_extend,
> > buffers_backend_write,
> > buffers_backend_write_strat
> >
> > but it could also be worthwhile to expand it into
> > buffers_backend_extend,
> > buffers_{backend,checkpoint,bgwriter,autovacuum}_write
> > buffers_{backend,autovacuum}_write_strat
>
> Given that these are individual global counters, I don't really see
> any reason not to expand it to the bigger set of counters. It's easy
> enough to add them up together later if needed.

Are you agreeing to buffers_{backend,checkpoint,bgwriter,autovacuum}_write, or are you suggesting further ones?

> > I think we also count things as writes that aren't writes: mdtruncate()
> > is AFAICT counted as one backend write for each segment. Which seems
> > weird to me.
>
> It's at least slightly weird :) Might it be worth counting truncate
> events separately?

Is that really something interesting? It feels like it'd have to be done at a higher level to be useful. E.g. the truncate done by TRUNCATE (when in the same xact as creation) and by VACUUM are quite different. I think it'd be better to just not include it.

> > Lastly, I don't understand what the point of sending fixed size stats,
> > like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While
> > I don't like its architecture, we obviously need something like pgstat
> > to handle variable amounts of stats (database, table level etc
> > stats). But that doesn't at all apply to these types of global stats.
>
> That part has annoyed me as well a few times. +1 for just moving that
> into a global shared memory. Given that we don't really care about
> things being in sync between those different counters *or* if we lose
> a bit of data (which the stats collector is designed to do), we could
> even do that without a lock?

I don't think we'd quite want to do it without any (single counter) synchronization -- high concurrency setups would be pretty likely to lose values that way. I suspect the best would be to have a struct in shared memory that contains the potential counters for each potential process, and then sum them up when actually wanting the concrete value. That way we avoid unnecessary contention, in contrast to having a single shared memory value for each (which would just ping-pong between different sockets and store buffers). There are a few details, like how exactly to implement resetting the counters, but ...

Thanks,

Andres Freund
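[Editor's illustration of the scheme described above -- a minimal user-space sketch in plain C11, with made-up names; it stands in for PostgreSQL's shared memory and pg_atomic_* primitives and is not from any committed patch. Each process increments only its own slot, so updates never contend, and a reader sums the slots, accepting a slightly stale total just as the stats collector does today.]

#include <stdatomic.h>
#include <stdint.h>

/* counters discussed in the thread; names are illustrative only */
typedef enum IoCounter
{
	IO_EXTEND,
	IO_WRITE,
	IO_WRITE_STRATEGY,
	NUM_IO_COUNTERS
} IoCounter;

#define MAX_PROCS 128			/* stand-in for MaxBackends plus aux processes */

/* one slot per process; each process only ever writes its own slot */
typedef struct ProcIoStats
{
	_Atomic uint64_t counters[NUM_IO_COUNTERS];
} ProcIoStats;

/* in PostgreSQL this array would live in shared memory */
static ProcIoStats stats_array[MAX_PROCS];

/* writer side: cheap, uncontended update of the caller's own slot */
static void
count_io(int my_proc_index, IoCounter which)
{
	atomic_fetch_add_explicit(&stats_array[my_proc_index].counters[which],
							  1, memory_order_relaxed);
}

/* reader side (e.g. a pg_stat_* view): sum over all slots; a slightly
 * stale answer is acceptable here */
static uint64_t
read_io_total(IoCounter which)
{
	uint64_t	total = 0;

	for (int i = 0; i < MAX_PROCS; i++)
		total += atomic_load_explicit(&stats_array[i].counters[which],
									  memory_order_relaxed);
	return total;
}

int
main(void)
{
	/* pretend we are process 3 doing one extend and one strategy write */
	count_io(3, IO_EXTEND);
	count_io(3, IO_WRITE_STRATEGY);
	return (int) read_io_total(IO_EXTEND);	/* 1 */
}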
t/010_pg_basebackup.pl checksum verify fails with RELSEG_SIZE 1
Hi,

Sometimes I set RELSEG_SIZE to 1 as a way to get the various in-tree tests to give the relation segment code a good workout. That's outside the range that configure --with-segsize would allow, and therefore not really a supported size (it's set in GB), but it's very useful for exercising that code on small databases like the check-world ones. At some point I think that worked, but now it says:

t/010_pg_basebackup.pl ... 100/106
#   Failed test 'pg_basebackup does not report more than 5 checksum mismatches stderr /(?^s:^WARNING.*further.*failures.*will.not.be.reported)/'
#   at t/010_pg_basebackup.pl line 526.
#          'WARNING: checksum verification failed in file "./base/13759/16396", block 0: calculated 49B8 but expected A91E
# WARNING: could not verify checksum in file "./base/13759/16396", block 4: read buffer size 8225 and page size 8192 differ
# pg_basebackup: error: checksum error occurred
# '
#     doesn't match '(?^s:^WARNING.*further.*failures.*will.not.be.reported)'

I haven't quite figured out why it does that yet (I don't see any files of size other than 0 or 8192, as expected), but it'd be nice to sort that out. I was even thinking of running a bf animal that way.
Re: t/010_pg_basebackup.pl checksum verify fails with RELSEG_SIZE 1
Thomas Munro writes:
> Sometimes I set RELSEG_SIZE to 1, as a way to get the various in-tree
> tests to give the relation segment code a good workout. That's
> outside the range that configure --with-segsize would allow and
> therefore not really a supported size (it's set in GB), but it's very
> useful for giving the relation segment code a good workout on small
> databases like the check-world ones. At some point I think that
> worked, but now it says:

> t/010_pg_basebackup.pl ... 100/106
> #   Failed test 'pg_basebackup does not report more than 5 checksum
> mismatches stderr

So ... presumably, the problem is that this supposes that whatever damage it did is spread across less than 5 relation segments, and with a sufficiently small segment size, that assumption is wrong. I'd say this is a poorly designed test.

			regards, tom lane
Re: vacuum verbose detail logs are unclear; log at *start* of each stage
On Wed, Jan 22, 2020 at 02:34:57PM +0900, Michael Paquier wrote:
> From patch 0003:
> /*
> +* Indent multi-line DETAIL if being sent to client (verbose)
> +* We don't know if it's sent to the client (client_min_messages);
> +* Also, that affects output to the logfile, too; assume that it's more
> +* important to format messages requested by the client than to make
> +* verbose logs pretty when also sent to the logfile.
> +*/
> + msgprefix = elevel==INFO ? "!\t" : "";
> Such stuff gets a -1 from me.  This is not project-like, and you make
> the translation of those messages much harder than they should be.

I don't see why it's harder to translate?  Do you mean because it changes the strings by adding %s?

- appendStringInfo(&sbuf, ngettext("%u page is entirely empty.\n",
- "%u pages are entirely empty.\n",
+ appendStringInfo(&sbuf, ngettext("%s%u page is entirely empty.\n",
+ "%s%u pages are entirely empty.\n",
...

I did raise two questions regarding translation:

I'm not sure why this one doesn't use ngettext()?  It seems to have been missed at a8d585c0.
|appendStringInfo(&buf, _("There were %.0f unused item identifiers.\n"),

Or why this one does use _/gettext()?  (580ddcec suggests that I'm missing something, but I just experimented, and it really seems to do nothing, since "%s" shouldn't be translated.)
|appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));

Also, I realized it's possible to write different strings to the log vs the client (with and without a prefix) by calling errdetail_internal() and errdetail_log().

Here's a version rebased on top of f942dfb9, and making use of errdetail_log. I'm not sure if it addresses your concern about translation, but it doesn't change strings. I think it's not needed or desirable to change what's written to the logfile, since CSV logs have a separate "detail" field, and text logs are indented. The server log is unchanged:

> 2020-01-25 23:08:40.451 CST [13971] INFO: "t": removed 0, found 160
> nonremovable row versions in 1 out of 888 pages
> 2020-01-25 23:08:40.451 CST [13971] DETAIL: 0 dead row versions cannot be
> removed yet, oldest xmin: 781
> There were 0 unused item identifiers.
> Skipped 0 pages due to buffer pins, 444 frozen pages.
> 0 pages are entirely empty.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s.

If VERBOSE, then the client log has ! prefixes, with the style borrowed from ShowUsage:

> INFO: "t": removed 0, found 160 nonremovable row versions in 1 out of 888
> pages
> DETAIL: ! 0 dead row versions cannot be removed yet, oldest xmin: 781
> ! There were 0 unused item identifiers.
> ! Skipped 0 pages due to buffer pins, 444 frozen pages.
> ! 0 pages are entirely empty.
> ! CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s.

I mentioned before that maybe the client's messages with newlines should be indented similarly to how they're done in the text logfile. I looked; that's append_with_tabs() in elog.c. So that's a different possible implementation, which would apply to any message with newlines (or possibly just DETAIL).

I'll also fork the allvisible/frozen/hintbits patches to a separate thread.
Thanks, Justin >From a3d0b41435655615ab13f808ec7c30e53e596e50 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 25 Jan 2020 21:25:37 -0600 Subject: [PATCH v3 1/4] Remove gettext erronously readded at 580ddce --- src/backend/access/heap/vacuumlazy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 8ce5011..8e8ea9d 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -1690,7 +1690,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, "%u pages are entirely empty.\n", empty_pages), empty_pages); - appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0)); + appendStringInfo(&buf, "%s.", pg_rusage_show(&ru0)); ereport(elevel, (errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages", -- 2.7.4 >From 2db7c4e3482120b2a83cda74603f2454da7eaa03 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 25 Jan 2020 22:50:46 -0600 Subject: [PATCH v3 2/4] vacuum verbose: use ngettext() everywhere possible --- src/backend/access/heap/vacuumlazy.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 8e8ea9d..eb903d5 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -1673,10 +1673,13 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, * individual parts of the
Re: Greatest Common Divisor
On 25/01/2020 15:18, Dean Rasheed wrote:
>
> Committed with some adjustments, mostly cosmetic but a couple more
> substantive:

Thanks!

> The code to guard against a floating point exception with inputs of
> (INT_MIN, -1) wasn't quite right because it actually just moved the
> problem so that it would fall over with inputs of (INT_MIN, +1).

Good catch.

> The convention in numeric.c is that the xxx_var() functions take
> *pointers* to their NumericVar arguments rather than copies, and they
> do not modify their inputs, as indicated by the use of "const". You
> might just have gotten away with what you were doing, but I think it
> was bad style and potentially unsafe -- for example, someone calling
> gcd_var() with a NumericVar that came from some other computation and
> having a non-null buf would risk having the buf freed in the copy,
> leaving the original NumericVar with a buf pointing to freed memory.

Thank you for taking the time to look closely at this. This was my first time dealing with "numeric" so I was bound to make some mistakes.

--
Vik Fearing