Re: Allow to_date() and to_timestamp() to accept localized names
On 2020-01-24 19:01, Peter Eisentraut wrote:

postgres=# select to_char(now(),'TMmonth');
  to_char
------------
 ιανουαρίου
(1 row)

which is the genitive of ιανουάριος. You use the genitive form for a date (24th of January) but the nominative otherwise. But the reverse mapping can only take one of these forms. So here

select to_date('Ιανουαριος', 'TMMonth');

fails, which is bad.

For the record, the correct form of that would appear to be

select to_date('Ιανουάριος', 'TMMonth');

with the accent. I had tried different variations of that and they all failed.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Parallel grouping sets
On Sat, Jan 25, 2020 at 4:22 AM Jesse Zhang wrote:
>
> On Thu, Jan 23, 2020 at 2:47 AM Amit Kapila wrote:
> >
> > On Sun, Jan 19, 2020 at 2:23 PM Richard Guo wrote:
> > >
> > > I realized that there are two patches in this thread that are
> > > implemented according to different methods, which causes confusion.
> >
> > The two ideas seem to be different. Is the second approach [1]
> > inferior in any case compared to the first approach? Can we keep
> > both approaches for parallel grouping sets, and if so, how? If not,
> > won't the code from the first approach be useless once we commit the
> > second approach?
> >
> > [1] - https://www.postgresql.org/message-id/CAN_9JTwtTTnxhbr5AHuqVcriz3HxvPpx1JWE--DCSdJYuHrLtA%40mail.gmail.com
>
> I glanced over both patches. Just the opposite: I have a hunch that v3
> is always better than v5.

This is what I also understood after reading this thread. So my question is: why not just review v3 and commit something along those lines, even though it would take a bit more time? It is possible that if we decide to go with v5 we can make it happen earlier, but later, when we try to get v3 in, the code committed as part of v5 might not be of any use, or if it is useful, then in which cases?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR
> On 19 Jan 2020, at 22:30, Dent John wrote:
>
> I have not yet made steps towards documentation, nor yet rebased, so the
> Makefile chunk will probably still fail.

The attached patch addresses these points, so it should now apply cleanly against dev. I also changed the OID assigned to ROWS_IN and its support function.

In passing, I noticed there is one existing function that can consume and make good use of ROWS_IN's result when used in the target list, which is row_to_json. This is good, as it makes ROWS_IN useful even outside of a change to allow results in the FROM clause to be pipelined. I've called out row_to_json specifically in the documentation change.

denty.

unnest-refcursor-v5a.patch
Description: Binary data
Re: Greatest Common Divisor
On Mon, 20 Jan 2020 at 08:04, Vik Fearing wrote:
>
> On 20/01/2020 08:44, Dean Rasheed wrote:
> >
> > I see this has been marked RFC. I'll take it,
>

Committed with some adjustments, mostly cosmetic but a couple more substantive:

The code to guard against a floating point exception with inputs of (INT_MIN, -1) wasn't quite right, because it actually just moved the problem so that it would fall over with inputs of (INT_MIN, +1).

The convention in numeric.c is that the xxx_var() functions take *pointers* to their NumericVar arguments rather than copies, and they do not modify their inputs, as indicated by the use of "const". You might just have gotten away with what you were doing, but I think it was bad style and potentially unsafe -- for example, someone calling gcd_var() with a NumericVar that came from some other computation and having a non-null buf would risk having the buf freed in the copy, leaving the original NumericVar with a buf pointing to freed memory.

Regards,
Dean
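[Editor's illustration of the first point -- explicitly not the committed PostgreSQL code; the function name and error handling below are made up for the example. Running Euclid's algorithm on unsigned magnitudes sidesteps the INT_MIN % -1 hardware trap entirely, and only the final result needs a range check, because gcd(INT_MIN, 0) is 2^31, which does not fit in int32.]

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative sketch only: gcd of two int32 values computed on unsigned
 * magnitudes.  This avoids the trap that INT32_MIN % -1 can raise and
 * avoids intermediate overflow; the one remaining hazard is that
 * gcd(INT32_MIN, 0) == 2^31 must be reported as out of range for int32.
 */
static int32_t
gcd_int32(int32_t a, int32_t b, int *overflow)
{
	uint32_t	ua = (a < 0) ? 0 - (uint32_t) a : (uint32_t) a;
	uint32_t	ub = (b < 0) ? 0 - (uint32_t) b : (uint32_t) b;

	while (ub != 0)
	{
		uint32_t	r = ua % ub;

		ua = ub;
		ub = r;
	}

	if (ua > (uint32_t) INT32_MAX)
	{
		*overflow = 1;			/* gcd(INT32_MIN, 0): caller must raise an error */
		return 0;
	}
	*overflow = 0;
	return (int32_t) ua;
}

int
main(void)
{
	int			overflow;

	/* the two inputs discussed above: neither should crash */
	printf("%d\n", gcd_int32(INT32_MIN, -1, &overflow));	/* prints 1 */
	printf("%d\n", gcd_int32(INT32_MIN, +1, &overflow));	/* prints 1 */
	return 0;
}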
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Fri, Jan 24, 2020 at 8:52 PM Andres Freund wrote:
>
> Hi,
>
> Currently pg_stat_bgwriter.buffers_backend is pretty useless to gauge
> whether backends are doing writes they shouldn't do. That's because it
> counts things that are either unavoidably or unlikely doable by other
> parts of the system (checkpointer, bgwriter).
> In particular extending the file can not currently be done by any
> other type of process, yet is counted. When using a buffer access
> strategy it is also very likely that writes have to be done by the
> 'dirtying' backend itself, as the buffer will be reused soon after (when
> not previously in s_b that is).

Yeah. That's quite annoying.

> Additionally pg_stat_bgwriter.buffers_backend also counts writes done by
> autovacuum et al.
>
> I think it'd make sense to at least split buffers_backend into
> buffers_backend_extend,
> buffers_backend_write,
> buffers_backend_write_strat
>
> but it could also be worthwhile to expand it into
> buffers_backend_extend,
> buffers_{backend,checkpoint,bgwriter,autovacuum}_write
> buffers_{backend,autovacuum}_write_strat

Given that these are individual global counters, I don't really see any reason not to expand it to the bigger set of counters. It's easy enough to add them up together later if needed.

> Possibly by internally, in contrast to the SQL level, having just counter
> arrays indexed by backend types.
>
> It's also noteworthy that buffers_backend is accounted in an absurd
> manner. One might think that writes are accounted from backend -> shared
> memory or such. But instead it works like this:
>
> 1) backend flushes buffer in bufmgr.c, accounts for backend *write time*
> 2) mdwrite writes and registers a sync request, which forwards the sync
>    request to checkpointer
> 3) ForwardSyncRequest(), when not called by bgwriter, increments
>    CheckpointerShmem->num_backend_writes
> 4) checkpointer, whenever doing AbsorbSyncRequests(), moves
>    CheckpointerShmem->num_backend_writes to
>    BgWriterStats.m_buf_written_backend (local memory!)
> 5) Occasionally it calls pgstat_send_bgwriter(), which sends the data to
>    pgstat (which bgwriter also does)
> 6) Which then updates the shared memory used by the display functions
>
> Worthwhile to note that backend buffer read/write *time* is accounted
> differently. That's done via pgstat_send_tabstat().
>
> I think there's very little excuse for the indirection via checkpointer:
> besides being architecturally weird, it actually requires that we
> continue to wake up checkpointer over and over instead of optimizing how
> and when we submit fsync requests.
>
> As far as I can tell we're also simply not accounting at all for writes
> done outside of shared buffers. All writes done directly through
> smgrwrite()/extend() aren't accounted anywhere as far as I can tell.
>
> I think we also count things as writes that aren't writes: mdtruncate()
> is AFAICT counted as one backend write for each segment. Which seems
> weird to me.

It's at least slightly weird :) Might it be worth counting truncate events separately?

> Lastly, I don't understand what the point of sending fixed size stats,
> like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While
> I don't like its architecture, we obviously need something like pgstat
> to handle variable amounts of stats (database, table level etc
> stats). But that doesn't at all apply to these types of global stats.

That part has annoyed me as well a few times. +1 for just moving that into global shared memory. Given that we don't really care about things being in sync between those different counters *or* if we lose a bit of data (which the stats collector is designed to do), we could even do that without a lock?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Re: [Proposal] Global temporary tables
Thank you for reviewing the patch.

> On 24 Jan 2020, at 4:20 PM, Konstantin Knizhnik wrote:
>
> On 23.01.2020 19:28, 曾文旌(义从) wrote:
>>
>> I'm trying to improve this part of the implementation in
>> global_temporary_table_v7-pg13.patch
>> Please check my patch and give me feedback.
>>
>> Thanks
>>
>> Wenjing
>
> Below is my short review of the patch:
>
> +/*
> + * For global temp table only
> + * use AccessExclusiveLock for ensure safety
> + */
> +{
> +{
> +"on_commit_delete_rows",
> +"global temp table on commit options",
> +RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
> +ShareUpdateExclusiveLock
> +},
> +true
> +},
>
> The comment seems to be confusing: it says about AccessExclusiveLock but
> actually uses ShareUpdateExclusiveLock.

There is a problem with the comment description; I will fix it.

> -Assert(TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
> -Assert(MultiXactIdIsValid(onerel->rd_rel->relminmxid));
> +Assert((RELATION_IS_GLOBAL_TEMP(onerel) && onerel->rd_rel->relfrozenxid == InvalidTransactionId) ||
> +(!RELATION_IS_GLOBAL_TEMP(onerel) && TransactionIdIsNormal(onerel->rd_rel->relfrozenxid)));
> +Assert((RELATION_IS_GLOBAL_TEMP(onerel) && onerel->rd_rel->relminmxid == InvalidMultiXactId) ||
> +(!RELATION_IS_GLOBAL_TEMP(onerel) && MultiXactIdIsValid(onerel->rd_rel->relminmxid)));
>
> It is actually equivalent to:
>
> Assert(RELATION_IS_GLOBAL_TEMP(onerel) ^ TransactionIdIsNormal(onerel->rd_rel->relfrozenxid);
> Assert(RELATION_IS_GLOBAL_TEMP(onerel) ^ MultiXactIdIsValid(onerel->rd_rel->relminmxid));

Yes, thank you for pointing that out. It's simpler.

> +/* clean temp relation files */
> +if (max_active_gtt > 0)
> +RemovePgTempFiles();
> +
> /*
>
> I wonder why do we need some special check for GTT here.
> From my point of view cleanup at startup of local storage of temp tables
> should be performed in the same way for local and global temp tables.

After an OOM kill, autovacuum cleans up isolated local temp tables the same way it cleans up orphan temporary tables: the definition of a local temp table is deleted together with its storage file. A GTT cannot be handled that way, which is why my patch has this implementation. If you have another solution, please let me know.

> -new_rel_reltup->relfrozenxid = relfrozenxid;
> -new_rel_reltup->relminmxid = relminmxid;
> +/* global temp table not remember transaction info in catalog */
> +if (relpersistence == RELPERSISTENCE_GLOBAL_TEMP)
> +{
> +new_rel_reltup->relfrozenxid = InvalidTransactionId;
> +new_rel_reltup->relminmxid = InvalidMultiXactId;
> +}
> +else
> +{
> +new_rel_reltup->relfrozenxid = relfrozenxid;
> +new_rel_reltup->relminmxid = relminmxid;
> +}
> +
>
> Why do we need to do it for GTT?
> Did you check that there will be no problems with GTT in case of XID
> wraparound?
> Right now if you create temp table and keep session open, then it will block
> XID wraparound.

In my design:
1. Because different sessions have different transaction information, I chose to store the transaction information of a GTT in MyProc, not in the catalog.
2. As for the XID wraparound problem, the cause is the design of temp table storage (both local and global temp tables), which prevents autovacuum from vacuuming them. It should be solved completely at the storage level.

>
> +/* We allow to drop global temp table only this session use it */
> +if (RELATION_IS_GLOBAL_TEMP(rel))
> +{
> +if (is_other_backend_use_gtt(rel->rd_node))
> +elog(ERROR, "can not drop relation when other backend attached this global temp table");
> +}
> +
>
> Here we once again introduce incompatibility with normal (permanent) tables.
> Assume that DBA or programmer need to change format of GTT. But there are
> some active sessions which have used this GTT sometime in the past.
> We will not be able to drop this GTT until all this sessions are terminated.
> I do not think that it is acceptable behaviour.

In fact, the DBA can still complete DDL on a GTT. I've provided a set of functions for this case. If the DBA needs to alter a GTT A (or drop it, or create an index on it), he needs to:
1. Use the pg_gtt_attached_pids view to list the pids of the sessions that are using GTT A.
2. Use pg_terminate_backend(pid) to terminate all of them except his own session.
3. Run the ALTER on GTT A.

>
> +LOCKMODE lockmode = AccessExclusiveLock;
> +
> +/* truncate global temp table only need RowExclusiveLock */
> +if (get_rel_persistence(rid) == RELPERSISTENCE_GLOBAL_TEMP)
> +lockmode = RowExclusiveLock;
>
> What are the reasons of using RowExclusiveLock for GTT instead of
> AccessExclusiveLock?
> Yes, GTT data is access only by one backend so no locking here seems to be
> needed at all.
> B
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Hi,

I took a look at this patch. With some additions I think the feature itself is useful, but the patch needs more work. It also doesn't have any of its own automated tests yet, so the testing below was done manually.

The attached file, kms_v2.patch, is a rebased version of the kms_v1.patch that fixes some bit rot. It sorts some of the Makefile additions but otherwise is the original patch. This version applies cleanly on master and passes make check.

I don't have a Windows machine to test it, but I think the Windows build files for these changes are missing. The updated src/common/Makefile has a comment to coordinate updates to Mkvcbuild.pm, but I don't see kmgr_utils.c or cipher_openssl.c referenced anywhere in there.

The patch adds "pg_kmgr" to the list of files to skip in pg_checksums.c, but there's no additional "pg_kmgr" file written to the data directory. Perhaps that's from a prior version that saved data to its own file?

The constant AES128_KEY_LEN is defined in cipher.c but it's not used anywhere. Regarding AES-128, I'm not sure of the value of even supporting it for this feature (vs. just supporting AES-256). Unlike something like table data encryption, I'd expect a KMS to be used much less frequently, so any performance boost of AES-128 vs AES-256 would be meaningless.

The functions pg_cipher_encrypt(...), pg_cipher_decrypt(...), and pg_compute_HMAC(...) return true if OpenSSL is not configured. Should that be false? The ctx init functions all return false when not configured. I don't think that code path would ever be reached, as you would not have a valid context, but it seems more consistent to have them all return false.

There's a comment referring to "Encryption keys (TDEK and WDEK) length" but this feature is only for a KMS, so that should be renamed.

The passphrase is hashed to split it into two 32-byte keys but the min length is only 8 bytes:

#define KMGR_MIN_PASSPHRASE_LEN 8

... that should be at least 64 bytes to reflect how it's being used downstream. Depending on the format of the passphrase command's output it should be even longer (ex: binary data in hex should really be double that). The overall min should be 64 bytes, but maybe add a note to the docs to explain how the output will be used and the expected amount of entropy.

In pg_kmgr_wrap(...) it checks that the input is a multiple of 8 bytes:

if (datalen % 8 != 0)
    ereport(ERROR,
            (errmsg("input data must be multiple of 8 bytes")));

...but after testing it, the OpenSSL key wrap functions it invokes require a multiple of 16 bytes (the block size of AES). Otherwise you get a generic error:

# SELECT pg_kmgr_wrap('abcd1234'::bytea);
ERROR: could not wrap the given secret

In ossl_compute_HMAC(...) it refers to AES256_KEY_LEN. That should be SHA256_HMAC_KEY_LEN (they're both 32 bytes but the naming is wrong):

return HMAC(EVP_sha256(), key, AES256_KEY_LEN, data,
            (uint32) data_size, result, (uint32 *) result_size);

In pg_rotate_encryption_key(...) the error message for short passphrases should be "at least %d bytes":

if (passlen < KMGR_MIN_PASSPHRASE_LEN)
    ereport(ERROR,
            (errmsg("passphrase must be more than %d bytes",
                    KMGR_MIN_PASSPHRASE_LEN)));

Rotating the passphrase via "SELECT pg_rotate_encryption_key()" and restarting the server worked (good). Having the server attempt to start with invalid output from the command gives an error "FATAL: cluster passphrase does not match expected passphrase" (good).

Round tripping via wrap/unwrap works (good!):

# SELECT convert_from(pg_kmgr_unwrap(pg_kmgr_wrap('abcd1234abcd1234'::bytea)), 'utf8');
   convert_from
------------------
 abcd1234abcd1234
(1 row)

Trying to unwrap gibberish fails (also good!):

# SELECT pg_kmgr_unwrap('\x123456789012345678901234567890123456789012345678');
ERROR: could not unwrap the given secret

The pg_kmgr_wrap/unwrap functions use EVP_aes_256_wrap()[1], which implements RFC 5649[2] with the default IVs, so they always return the same value for the same input:

# SELECT x, pg_kmgr_wrap('abcd1234abcd1234abcd1234') FROM generate_series(1,5) x;
 x |                           pg_kmgr_wrap
---+--------------------------------------------------------------------
 1 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 2 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 3 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 4 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
 5 | \x51041d1fe52916fd15f456c2b67108473d9bf536795e2b6d4db81c065c8cd688
(5 rows)

The IVs should be randomized so that repeated wrap operations give distinct results. To do that, the output format needs to include the randomized IV. It need not be secret, but it needs to be included in the wrapped output. Related, IIUC, the wrapping mechanism of RFC 5649 does provide some integrity checking, but it's only 64 bits (vs. say 256 bits for a full HMAC-SHA-256). Rather than use EVP_ae
Re: [PATCH] ltree, lquery, and ltxtquery binary protocol support
Attached is the new patch rebased onto master.

Best regards,
Nino Floris

On Thu, Jan 16, 2020 at 11:00 PM Tomas Vondra wrote:
>
> On Fri, Nov 29, 2019 at 11:29:03AM +0900, Michael Paquier wrote:
> >On Mon, Nov 11, 2019 at 03:44:54PM +0100, Nino Floris wrote:
> >> Alright, as usual life got in the way. I've attached a new version of
> >> the patch with pgindent changes.
> >>
> >> > What do you think about writing patch for ALTER TYPE?
> >> I'd rather not :$
> >>
> >> > 1) Write migration script, which directly updates pg_type.
> >> This sounds like the best option right now, if we don't want people to
> >> do manual migrations to ltree 1.2.
> >> How would I best go at this?
> >>
> >> > Travis has a small complaint:
> >> Should be fixed!
> >
> >The latest patch provided fails to apply for me on HEAD. Please
> >provide a rebase. For now I am bumping this patch to next CF with
> >"waiting on author" as status.
>
> Nino, any plans to submit a rebased/fixed patch, so that people can
> review it? Not sure if this needs a simple rebase or something more
> complex, all I know is cputube can't apply it.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

0003-Implements-ltree-lquery-and-ltxtquery-binary-protoco.patch
Description: Binary data
Re: Strange coding in _mdfd_openseg()
On Thu, Apr 04, 2019 at 12:15:52PM +0900, Kyotaro HORIGUCHI wrote:
> At Wed, 3 Apr 2019 13:47:46 -0700, Andres Freund wrote
> in <20190403204746.2yumq7c2mirmo...@alap3.anarazel.de>
> > On 2019-04-04 09:24:49 +1300, Thomas Munro wrote:
> > > On Wed, Apr 3, 2019 at 5:34 PM Kyotaro HORIGUCHI
> > > wrote:
> > > > I may be missing something, but it seems possible that
> > > > _mdfd_getseg calls it with segno > opensegs.
> > > >
> > > > | for (nextsegno = reln->md_num_open_segs[forknum];
> > >
> > > Here nextsegno starts out equal to opensegs.
> > >
> > > > | nextsegno <= targetseg; nextsegno++)
> > >
> > > Here we add one to nextsegno...
> > >
> > > > | ...
> > > > | v = _mdfd_openseg(reln, forknum, nextsegno, flags);
> > >
> > > ... after adding one to opensegs. So they're always equal. This fits
> > > a general programming pattern when appending to an array: the next
> > > element's index is the same as the number of elements. But I claim
> > > the coding is weird, because _mdfd_openseg *looks* like it can
> > > handle opening segments in any order, except that the author
> > > accidentally wrote "<=" instead of ">=". In fact it can't open them
> > > in any order, because we don't support "holes" in the array. So I
> > > think it should really be "==", and it should be an assertion, not a
> > > condition.
> >
> > Yea, I totally agree it's weird. I'm not sure if I'd go for an assertion
> > of equality, or just invert the >= (which I agree I probably just
> > screwed up and didn't notice when reviewing the patch because it looked
> > close enough to correct and it didn't have a measurable effect).
>
> I looked there and agreed. _mdfd_openseg is always called just to
> add one new segment after the last opened segment by the two
> callers. So I think == is better.

Agreed. The rest of md.c won't cope with a hole in this array, so allowing less-than-or-equal here is futile. The patch in the original post looks fine.
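[Editor's note: a tiny self-contained illustration of the invariant being discussed -- this is not md.c itself, just a sketch with made-up names. Because the caller's loop starts nextsegno at the current count of open segments and each call appends exactly one entry, the callee can assert equality instead of accepting arbitrary segment numbers.]

#include <assert.h>

#define MAX_SEGS 16

/* toy stand-ins for md.c state: an append-only array of open segments */
static int	num_open_segs = 0;
static int	open_segs[MAX_SEGS];

/* stand-in for _mdfd_openseg(): only appending is supported, so assert == */
static void
open_seg(int segno)
{
	assert(segno == num_open_segs);
	open_segs[num_open_segs++] = segno;
}

int
main(void)
{
	int			targetseg = 5;

	/* mirrors the caller's loop shape: nextsegno starts at the open count */
	for (int nextsegno = num_open_segs; nextsegno <= targetseg; nextsegno++)
		open_seg(nextsegno);

	return 0;
}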
Re: Duplicate Workers entries in some EXPLAIN plans
Maciek Sakrejda writes: > For what it's worth, this version makes sense to me. Thanks for looking. Here's a version that deals with the JIT instrumentation. As Andres noted far upthread, that was also really bogusly done before. Not only could you get multiple "JIT" subnodes on a Gather node, but we failed to print the info at all if the parallelism was expressed as Gather Merge rather than Gather. A side effect of this change is that per-worker JIT info is now printed one plan level further down: before it was printed on the Gather node, but now it's attached to the Gather's child, because that's where we print other per-worker data. I don't see anything particularly wrong with that, but it's another change from the behavior today. It's still really unclear to me how we could exercise any of this behavior meaningfully in a regression test. I thought for a little bit about using the TAP infrastructure instead of a traditional-style test, but it seems like that doesn't buy anything except for a bias towards ignoring details instead of overspecifying them. Which is not much of an improvement. regards, tom lane diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index d189b8d..fe3c83b 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -57,6 +57,8 @@ static void ExplainOneQuery(Query *query, int cursorOptions, IntoClause *into, ExplainState *es, const char *queryString, ParamListInfo params, QueryEnvironment *queryEnv); +static void ExplainPrintJIT(ExplainState *es, int jit_flags, + JitInstrumentation *ji); static void report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es); static double elapsed_time(instr_time *starttime); @@ -123,11 +125,20 @@ static void ExplainSubPlans(List *plans, List *ancestors, const char *relationship, ExplainState *es); static void ExplainCustomChildren(CustomScanState *css, List *ancestors, ExplainState *es); +static ExplainWorkersState *ExplainCreateWorkersState(int num_workers); +static void ExplainOpenWorker(int n, ExplainState *es); +static void ExplainCloseWorker(int n, ExplainState *es); +static void ExplainFlushWorkersState(ExplainState *es); static void ExplainProperty(const char *qlabel, const char *unit, const char *value, bool numeric, ExplainState *es); +static void ExplainOpenSetAsideGroup(const char *objtype, const char *labelname, + bool labeled, int depth, ExplainState *es); +static void ExplainSaveGroup(ExplainState *es, int depth, int *state_save); +static void ExplainRestoreGroup(ExplainState *es, int depth, int *state_save); static void ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es); static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es); +static void ExplainIndentText(ExplainState *es); static void ExplainJSONLineEnding(ExplainState *es); static void ExplainYAMLLineStarting(ExplainState *es); static void escape_yaml(StringInfo buf, const char *str); @@ -790,31 +801,22 @@ ExplainPrintJITSummary(ExplainState *es, QueryDesc *queryDesc) if (queryDesc->estate->es_jit_worker_instr) InstrJitAgg(&ji, queryDesc->estate->es_jit_worker_instr); - ExplainPrintJIT(es, queryDesc->estate->es_jit_flags, &ji, -1); + ExplainPrintJIT(es, queryDesc->estate->es_jit_flags, &ji); } /* * ExplainPrintJIT - * Append information about JITing to es->str. - * - * Can be used to print the JIT instrumentation of the backend (worker_num = - * -1) or that of a specific worker (worker_num = ...). 
*/ -void -ExplainPrintJIT(ExplainState *es, int jit_flags, -JitInstrumentation *ji, int worker_num) +static void +ExplainPrintJIT(ExplainState *es, int jit_flags, JitInstrumentation *ji) { instr_time total_time; - bool for_workers = (worker_num >= 0); /* don't print information if no JITing happened */ if (!ji || ji->created_functions == 0) return; - /* don't print per-worker info if we're supposed to hide that */ - if (for_workers && es->hide_workers) - return; - /* calculate total time */ INSTR_TIME_SET_ZERO(total_time); INSTR_TIME_ADD(total_time, ji->generation_counter); @@ -827,16 +829,13 @@ ExplainPrintJIT(ExplainState *es, int jit_flags, /* for higher density, open code the text output format */ if (es->format == EXPLAIN_FORMAT_TEXT) { - appendStringInfoSpaces(es->str, es->indent * 2); - if (for_workers) - appendStringInfo(es->str, "JIT for worker %u:\n", worker_num); - else - appendStringInfoString(es->str, "JIT:\n"); - es->indent += 1; + ExplainIndentText(es); + appendStringInfoString(es->str, "JIT:\n"); + es->indent++; ExplainPropertyInteger("Functions", NULL, ji->created_functions, es); - appendStringInfoSpaces(es->str, es->indent * 2); + ExplainIndentText(es); appendStringInfo(es->str, "Options: %s %s, %s %s, %s %s, %s %s\n", "Inlining", jit_flags & PGJIT_I
Re: Duplicate Workers entries in some EXPLAIN plans
I wrote:
> It's still really unclear to me how we could exercise any of
> this behavior meaningfully in a regression test.  I thought
> for a little bit about using the TAP infrastructure instead
> of a traditional-style test, but it seems like that doesn't
> buy anything except for a bias towards ignoring details instead
> of overspecifying them.  Which is not much of an improvement.

After further thought, I decided that about the best we can do is suppress the "Workers" field in the regression test's expected output. This still gives us code coverage of the relevant code, and we can check that the output is valid JSON before we strip it, so it's not a dead loss.

I rewrote the test script a bit to add some coverage of XML and YAML output formats, since we had exactly none before, and pushed it.

			regards, tom lane
Re: Duplicate Workers entries in some EXPLAIN plans
Hi,

On 2020-01-25 14:23:50 -0500, Tom Lane wrote:
> A side effect of this change is that per-worker JIT info is now
> printed one plan level further down: before it was printed on
> the Gather node, but now it's attached to the Gather's child,
> because that's where we print other per-worker data.  I don't
> see anything particularly wrong with that, but it's another
> change from the behavior today.

Yea, I don't see any need to be bothered by that.

> It's still really unclear to me how we could exercise any of
> this behavior meaningfully in a regression test.  I thought
> for a little bit about using the TAP infrastructure instead
> of a traditional-style test, but it seems like that doesn't
> buy anything except for a bias towards ignoring details instead
> of overspecifying them.  Which is not much of an improvement.

Hm. I'd like to avoid needing TAP for this kind of thing; it'll just make it much more expensive in just about all ways.

Testing JIT explain is "easy" enough, I think. I've posted a patch in another thread which just skips over the region of the test if JIT is not available. See e.g. 0008 in https://www.postgresql.org/message-id/20191029000229.fkjmuld3g7f2jq7i%40alap3.anarazel.de (that's a thread I'd love your input in, btw).

It's harder for parallel query though. If parallel query were able to reuse workers, we could "just" check at the beginning of the test whether we are able to get the workers we need, and then skip the rest of the tests if not. But as things stand that doesn't guarantee anything.

I wonder if we could introduce a debug GUC that makes parallel worker acquisition just retry in a loop, for a time determined by the GUC. That obviously would be a bad idea in a production setup, but it could be good enough for regression tests? There are some deadlock dangers, but I'm not sure they really matter for the tests.

> +			/* prepare per-worker general execution details */
> +			if (es->workers_state && es->verbose)
> +			{
> +				WorkerInstrumentation *w = planstate->worker_instrument;
> +
> +				for (int n = 0; n < w->num_workers; n++)
> +				{
> +					Instrumentation *instrument = &w->instrument[n];
> +					double		nloops = instrument->nloops;
> +					double		startup_ms;
> +					double		total_ms;
> +					double		rows;
> +
> +					if (nloops <= 0)
> +						continue;
> +					startup_ms = 1000.0 * instrument->startup / nloops;
> +					total_ms = 1000.0 * instrument->total / nloops;
> +					rows = instrument->ntuples / nloops;
> +
> +					ExplainOpenWorker(n, es);
> +
> +					if (es->format == EXPLAIN_FORMAT_TEXT)
> +					{
> +						ExplainIndentText(es);
> +						if (es->timing)
> +							appendStringInfo(es->str,
> +											 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
> +											 startup_ms, total_ms, rows, nloops);
> +						else
> +							appendStringInfo(es->str,
> +											 "actual rows=%.0f loops=%.0f\n",
> +											 rows, nloops);
> +					}
> +					else
> +					{
> +						if (es->timing)
> +						{
> +							ExplainPropertyFloat("Actual Startup Time", "ms",
> +												 startup_ms, 3, es);
> +							ExplainPropertyFloat("Actual Total Time", "ms",
> +												 total_ms, 3, es);
> +						}
> +						ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
> +						ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
> +					}
> +
> +					ExplainCloseWorker(n, es);
> +				}
> +			}

I'd personally move this into a separate function, given the patch moves the code around already. ExplainNode() is already hard enough to navigate...

It probably also makes sense to move everything but the nloops <= 0 check and the ExplainOpenWorker/ExplainCloseWorker calls into its own function. As far as I can tell it should now be identical to the non-parallel case?

> +/*
> + * Begin or resume output into the set-aside group for worker N.
> + */
> +static void

Would it make sense to make these functions non-static? It seems plausible that code for a custom node or such would want to add its own information?
Re: Duplicate Workers entries in some EXPLAIN plans
Andres Freund writes:
> I wonder if we could introduce a debug GUC that makes parallel worker
> acquisition just retry in a loop, for a time determined by the GUC. That
> obviously would be a bad idea to do in a production setup, but it could
> be good enough for regression tests? There are some deadlock dangers,
> but I'm not sure they really matter for the tests.

Hmmm ... might work. Seems like a better idea than "run it by itself" as we have to do now.

> I'd personally move this into a separate function, given the patch
> moves the code around already. ExplainNode() is already hard enough to
> navigate...

Well, it was already inline in ExplainNode, so this just moved the code a few lines. I'm not that excited about moving little bits of that function out-of-line.

>> +/*
>> + * Begin or resume output into the set-aside group for worker N.
>> + */
>> +static void

> Would it make sense to make these functions non-static? It seems
> plausible that code for a custom node or such would want to add
> its own information?

Maybe, but it seems to me that there'd be a whole lot of other infrastructure needed to track additional per-worker state. I'd just as soon not expose this stuff until (a) there's a concrete, not hypothetical, use case and (b) it's been around long enough to feel comfortable that there's nothing wrong with the design.

>> +/*
>> + * In TEXT format, prefix the first output line for this worker with
>> + * "Worker N:".  Then, any additional lines should be indented one more
>> + * stop than the "Worker N" line is.
>> + */

> I don't quite get the Worker %d bit. Why are we not outputting that in
> the !worker_inited block?

We might strip it off again in ExplainCloseWorker, and then potentially add it back again in a later ExplainOpenWorker. That could only happen if an earlier ExplainOpen/CloseWorker fragment decides not to emit any text and then a later one wants to do so. Which I'm pretty sure is unreachable right now, but I don't think this code should assume that.

>> +				appendStringInfoString(es->str, wstate->worker_str[i].data);

> Probably never matters, but given we do have the string length already,
> we could use appendBinaryStringInfo().

Ah, I was thinking about making that change but then forgot.

>> +		ExplainCloseGroup("Worker", NULL, true, es);

> Not related to this patch: I never got why we maintain a grouping stack
> for some things, but don't do it for the group properties
> themselves.

Right now it'd just be extra overhead. If we ever have a case where it's not convenient for an ExplainCloseGroup caller to provide the same data as for ExplainOpenGroup, then I'd be on board with storing that info.

> Hm. It might be worthwhile to rename ExplainOpenSetAsideGroup and use it
> from ExplainOpenGroup()? Seems we could just call it after
> ExplainOpenGroup()'s switch, and remove all the indent/grouping_stack
> related code from ExplainOpenGroup().

Hmm. It seemed easier to me to keep them separate, but ... I did consider a design in which, instead of ExplainOpenSetAsideGroup, there was some function that would initialize the "state_save" area and then you'd call the "restore" function to make that state active. It seemed like that would be too dissimilar from ExplainOpenGroup --- but conceivably, we could reimplement ExplainOpenGroup as calling the initialize function and then the restore function (along with doing some output). Not really sure that'd be an improvement though: it'd involve less duplicate code, but the functions would individually be harder to wrap your brain around.

			regards, tom lane
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Hi,

On 2020-01-25 15:43:41 +0100, Magnus Hagander wrote:
> On Fri, Jan 24, 2020 at 8:52 PM Andres Freund wrote:
> > Additionally pg_stat_bgwriter.buffers_backend also counts writes done by
> > autovacuum et al.
> > I think it'd make sense to at least split buffers_backend into
> > buffers_backend_extend,
> > buffers_backend_write,
> > buffers_backend_write_strat
> >
> > but it could also be worthwhile to expand it into
> > buffers_backend_extend,
> > buffers_{backend,checkpoint,bgwriter,autovacuum}_write
> > buffers_{backend,autovacuum}_write_strat
>
> Given that these are individual global counters, I don't really see
> any reason not to expand it to the bigger set of counters. It's easy
> enough to add them up together later if needed.

Are you agreeing to buffers_{backend,checkpoint,bgwriter,autovacuum}_write, or are you suggesting further ones?

> > I think we also count things as writes that aren't writes: mdtruncate()
> > is AFAICT counted as one backend write for each segment. Which seems
> > weird to me.
>
> It's at least slightly weird :) Might it be worth counting truncate
> events separately?

Is that really something interesting? It feels like it'd have to be done at a higher level to be useful. E.g. the truncate done by TRUNCATE (when in the same xact as creation) and by VACUUM are quite different. I think it'd be better to just not include it.

> > Lastly, I don't understand what the point of sending fixed size stats,
> > like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While
> > I don't like its architecture, we obviously need something like pgstat
> > to handle variable amounts of stats (database, table level etc
> > stats). But that doesn't at all apply to these types of global stats.
>
> That part has annoyed me as well a few times. +1 for just moving that
> into a global shared memory. Given that we don't really care about
> things being in sync between those different counters *or* if we lose
> a bit of data (which the stats collector is designed to do), we could
> even do that without a lock?

I don't think we'd quite want to do it without any (single counter) synchronization -- high concurrency setups would be pretty likely to lose values that way. I suspect the best would be to have a struct in shared memory that contains the potential counters for each potential process, and then sum them up when actually wanting the concrete value. That way we avoid unnecessary contention, in contrast to having a single shared memory value for each (which would just ping-pong between different sockets and store buffers). There are a few details, like how exactly to implement resetting the counters, but ...

Thanks,

Andres Freund
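[Editor's illustration of the scheme described above -- a minimal user-space sketch in plain C11, with made-up names; it stands in for PostgreSQL's shared memory and pg_atomic_* primitives and is not from any committed patch. Each process increments only its own slot, so updates never contend, and a reader sums the slots, accepting a slightly stale total just as the stats collector does today.]

#include <stdatomic.h>
#include <stdint.h>

/* counters discussed in the thread; names are illustrative only */
typedef enum IoCounter
{
	IO_EXTEND,
	IO_WRITE,
	IO_WRITE_STRATEGY,
	NUM_IO_COUNTERS
} IoCounter;

#define MAX_PROCS 128			/* stand-in for MaxBackends plus aux processes */

/* one slot per process; each process only ever writes its own slot */
typedef struct ProcIoStats
{
	_Atomic uint64_t counters[NUM_IO_COUNTERS];
} ProcIoStats;

/* in PostgreSQL this array would live in shared memory */
static ProcIoStats stats_array[MAX_PROCS];

/* writer side: cheap, uncontended update of the caller's own slot */
static void
count_io(int my_proc_index, IoCounter which)
{
	atomic_fetch_add_explicit(&stats_array[my_proc_index].counters[which],
							  1, memory_order_relaxed);
}

/* reader side (e.g. a pg_stat_* view): sum over all slots; a slightly
 * stale answer is acceptable here */
static uint64_t
read_io_total(IoCounter which)
{
	uint64_t	total = 0;

	for (int i = 0; i < MAX_PROCS; i++)
		total += atomic_load_explicit(&stats_array[i].counters[which],
									  memory_order_relaxed);
	return total;
}

int
main(void)
{
	/* pretend we are process 3 doing one extend and one strategy write */
	count_io(3, IO_EXTEND);
	count_io(3, IO_WRITE_STRATEGY);
	return (int) read_io_total(IO_EXTEND);	/* 1 */
}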
t/010_pg_basebackup.pl checksum verify fails with RELSEG_SIZE 1
Hi,

Sometimes I set RELSEG_SIZE to 1 as a way to get the various in-tree tests to give the relation segment code a good workout. That's outside the range that configure --with-segsize would allow, and therefore not really a supported size (it's set in GB), but it's very useful for exercising that code on small databases like the check-world ones. At some point I think that worked, but now it says:

t/010_pg_basebackup.pl ... 100/106
#   Failed test 'pg_basebackup does not report more than 5 checksum mismatches stderr /(?^s:^WARNING.*further.*failures.*will.not.be.reported)/'
#   at t/010_pg_basebackup.pl line 526.
#          'WARNING: checksum verification failed in file "./base/13759/16396", block 0: calculated 49B8 but expected A91E
# WARNING: could not verify checksum in file "./base/13759/16396", block 4: read buffer size 8225 and page size 8192 differ
# pg_basebackup: error: checksum error occurred
# '
#     doesn't match '(?^s:^WARNING.*further.*failures.*will.not.be.reported)'

I haven't quite figured out why it does that yet (I don't see any files of size other than 0 or 8192, as expected), but it'd be nice to sort that out. I was even thinking of running a bf animal that way.
Re: t/010_pg_basebackup.pl checksum verify fails with RELSEG_SIZE 1
Thomas Munro writes:
> Sometimes I set RELSEG_SIZE to 1, as a way to get the various in-tree
> tests to give the relation segment code a good workout. That's
> outside the range that configure --with-segsize would allow and
> therefore not really a supported size (it's set in GB), but it's very
> useful for giving the relation segment code a good workout on small
> databases like the check-world ones. At some point I think that
> worked, but now it says:

> t/010_pg_basebackup.pl ... 100/106
> #   Failed test 'pg_basebackup does not report more than 5 checksum
> mismatches stderr

So ... presumably, the problem is that this supposes that whatever damage it did is spread across less than 5 relation segments, and with a sufficiently small segment size, that assumption is wrong. I'd say this is a poorly designed test.

			regards, tom lane
Re: vacuum verbose detail logs are unclear; log at *start* of each stage
On Wed, Jan 22, 2020 at 02:34:57PM +0900, Michael Paquier wrote:
> From patch 0003:
> /*
> +* Indent multi-line DETAIL if being sent to client (verbose)
> +* We don't know if it's sent to the client (client_min_messages);
> +* Also, that affects output to the logfile, too; assume that it's more
> +* important to format messages requested by the client than to make
> +* verbose logs pretty when also sent to the logfile.
> +*/
> + msgprefix = elevel==INFO ? "!\t" : "";
> Such stuff gets a -1 from me.  This is not project-like, and you make
> the translation of those messages much harder than they should be.

I don't see why it's harder to translate?  Do you mean because it changes the strings by adding %s?

- appendStringInfo(&sbuf, ngettext("%u page is entirely empty.\n",
- "%u pages are entirely empty.\n",
+ appendStringInfo(&sbuf, ngettext("%s%u page is entirely empty.\n",
+ "%s%u pages are entirely empty.\n",
...

I did raise two questions regarding translation:

I'm not sure why this one doesn't use ngettext()?  It seems to have been missed at a8d585c0.
|appendStringInfo(&buf, _("There were %.0f unused item identifiers.\n"),

Or why this one does use _/gettext()?  (580ddcec suggests that I'm missing something, but I just experimented, and it really seems to do nothing, since "%s" shouldn't be translated.)
|appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));

Also, I realized it's possible to write different strings to the log vs the client (with and without a prefix) by calling errdetail_internal() and errdetail_log().

Here's a version rebased on top of f942dfb9, and making use of errdetail_log. I'm not sure if it addresses your concern about translation, but it doesn't change strings. I think it's not needed or desirable to change what's written to the logfile, since CSV logs have a separate "detail" field, and text logs are indented. The server log is unchanged:

> 2020-01-25 23:08:40.451 CST [13971] INFO: "t": removed 0, found 160
> nonremovable row versions in 1 out of 888 pages
> 2020-01-25 23:08:40.451 CST [13971] DETAIL: 0 dead row versions cannot be
> removed yet, oldest xmin: 781
> There were 0 unused item identifiers.
> Skipped 0 pages due to buffer pins, 444 frozen pages.
> 0 pages are entirely empty.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s.

If VERBOSE, then the client log has ! prefixes, with the style borrowed from ShowUsage:

> INFO: "t": removed 0, found 160 nonremovable row versions in 1 out of 888
> pages
> DETAIL: ! 0 dead row versions cannot be removed yet, oldest xmin: 781
> ! There were 0 unused item identifiers.
> ! Skipped 0 pages due to buffer pins, 444 frozen pages.
> ! 0 pages are entirely empty.
> ! CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s.

I mentioned before that maybe the client's messages with newlines should be indented similarly to how they're done in the text logfile. I looked; that's append_with_tabs() in elog.c. So that's a different possible implementation, which would apply to any message with newlines (or possibly just DETAIL).

I'll also fork the allvisible/frozen/hintbits patches to a separate thread.
Thanks, Justin >From a3d0b41435655615ab13f808ec7c30e53e596e50 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 25 Jan 2020 21:25:37 -0600 Subject: [PATCH v3 1/4] Remove gettext erronously readded at 580ddce --- src/backend/access/heap/vacuumlazy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 8ce5011..8e8ea9d 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -1690,7 +1690,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, "%u pages are entirely empty.\n", empty_pages), empty_pages); - appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0)); + appendStringInfo(&buf, "%s.", pg_rusage_show(&ru0)); ereport(elevel, (errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages", -- 2.7.4 >From 2db7c4e3482120b2a83cda74603f2454da7eaa03 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 25 Jan 2020 22:50:46 -0600 Subject: [PATCH v3 2/4] vacuum verbose: use ngettext() everywhere possible --- src/backend/access/heap/vacuumlazy.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 8e8ea9d..eb903d5 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -1673,10 +1673,13 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, * individual parts of the
Re: Greatest Common Divisor
On 25/01/2020 15:18, Dean Rasheed wrote:
>
> Committed with some adjustments, mostly cosmetic but a couple more
> substantive:

Thanks!

> The code to guard against a floating point exception with inputs of
> (INT_MIN, -1) wasn't quite right because it actually just moved the
> problem so that it would fall over with inputs of (INT_MIN, +1).

Good catch.

> The convention in numeric.c is that the xxx_var() functions take
> *pointers* to their NumericVar arguments rather than copies, and they
> do not modify their inputs, as indicated by the use of "const". You
> might just have gotten away with what you were doing, but I think it
> was bad style and potentially unsafe -- for example, someone calling
> gcd_var() with a NumericVar that came from some other computation and
> having a non-null buf would risk having the buf freed in the copy,
> leaving the original NumericVar with a buf pointing to freed memory.

Thank you for taking the time to look closely at this. This was my first time dealing with "numeric" so I was bound to make some mistakes.

--
Vik Fearing