Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR
Hi folks,

I’ve made a revision of this patch. The significant change is to access the Portal using the Portal APIs rather than going through SPI. It seems the state that must persist across calls, so that the SRF can retrieve a row at a time, just isn’t a good fit for SPI.

It turned out there was upstream machinery in the FunctionScan node that prevented Postgres from pipelining SRFs, even when they return ValuePerCall. So, in practice, this patch is of limited benefit without another patch that changes that behaviour (see [1]). Nevertheless, the code is independent, so I’m submitting the two changes separately. I’ll push this into the January commitfest.

denty.

[1] https://commitfest.postgresql.org/26/2372/

unnest-refcursor-v3.patch
Description: Binary data
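For readers who haven't followed the thread, the intended usage is roughly as follows. This is a sketch based on the patch's subject line; the exact signature of unnest(refcursor) and the need for a column definition list are assumptions, not confirmed against the patch:

    BEGIN;
    -- Open a cursor over an arbitrary query.
    DECLARE c CURSOR FOR SELECT i FROM generate_series(1, 1000000) AS s(i);
    -- Consume the cursor's rows from a plain SELECT, a row at a time.
    SELECT count(*) FROM unnest('c'::refcursor) AS t(i int);
    COMMIT;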
Re: psql's \watch is broken
> I've not dug into the code itself, I just bisected it.

Thanks for the report. I'll look into it.

Looked at it already. Ah, the magic of timezones! And yes, I can see the difference. This comes from the switch from cancel_pressed to CancelRequested in psql, specifically in PSQLexecWatch() in this case.

And actually, now that I look at it, I think that we should simply get rid of cancel_pressed in psql completely and replace it with CancelRequested. This also removes the need to have cancel_pressed defined in print.c, which was not really wanted originally. Attached is a patch which addresses the issue for me, and cleans up the code while at it. Fabien, Jeff, can you confirm please?

Yep. Patch applies cleanly, compiles, works for me as well.

-- 
Fabien.
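For anyone reproducing the report: \watch re-runs the current query buffer at a fixed interval until interrupted, and a Ctrl-C during the loop is what now goes through the CancelRequested path. A minimal session (the query itself is arbitrary):

    -- In psql: re-execute the query every 2 seconds until cancelled.
    SELECT now() AS server_time;
    \watch 2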
Re: Memory-Bounded Hash Aggregation
On Wed, Nov 27, 2019 at 02:58:04PM -0800, Jeff Davis wrote:
> On Wed, 2019-08-28 at 12:52 -0700, Taylor Vesely wrote:
>> Right now the patch always initializes 32 spill partitions. Have you
>> given any thought into how to intelligently pick an optimal number of
>> partitions yet?
>
> Attached a new patch that addresses this.
>
> 1. Divide hash table memory used by the number of groups in the hash
>    table to get the average memory used per group.
> 2. Multiply by the number of groups spilled -- which I pessimistically
>    estimate as the number of tuples spilled -- to get the total amount
>    of memory that we'd like to have to process all spilled tuples at
>    once.
> 3. Divide the desired amount of memory by work_mem to get the number
>    of partitions we'd like to have such that each partition can be
>    processed in work_mem without spilling.
> 4. Apply a few sanity checks, fudge factors, and limits.
>
> Using this runtime information should be substantially better than
> using estimates and projections.
>
> Additionally, I removed some branches from the common path. I think I
> still have more work to do there. I also rebased of course, and fixed
> a few other things.

I've done a bit more testing on this, after resolving a couple of minor
conflicts due to recent commits (rebased version attached).

In particular, I've made a comparison with different dataset sizes,
group sizes, GUC settings etc. The script and results from two
different machines are available here:

* https://bitbucket.org/tvondra/hashagg-tests/src/master/

The script essentially runs a simple grouping query with different
numbers of rows, groups, work_mem and parallelism settings. There's
nothing particularly magical about it. I ran it on both master and
patched code, allowing us to compare results and assess the impact of
the patch.

Overall, the changes are expected and either neutral or beneficial,
i.e. the timings are the same or faster. The number of cases that
regressed is fairly small, but sometimes the regressions are annoyingly
large - up to 2x in some cases. Consider for example this trivial
example with 100M rows:

CREATE TABLE t AS SELECT (1 * random())::int AS a
  FROM generate_series(1,1) s(i);

On master, the plan with default work_mem (i.e. 4MB) and
max_parallel_workers_per_gather = 8 looks like this:

EXPLAIN SELECT * FROM (SELECT a, count(*) FROM t GROUP BY a OFFSET 10) foo;

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=16037474.49..16037474.49 rows=1 width=12)
   ->  Finalize GroupAggregate  (cost=2383745.73..16037474.49 rows=60001208 width=12)
         Group Key: t.a
         ->  Gather Merge  (cost=2383745.73..14937462.25 rows=10032 width=12)
               Workers Planned: 8
               ->  Partial GroupAggregate  (cost=2382745.59..2601495.66 rows=1254 width=12)
                     Group Key: t.a
                     ->  Sort  (cost=2382745.59..2413995.60 rows=1254 width=4)
                           Sort Key: t.a
                           ->  Parallel Seq Scan on t  (cost=0.00..567478.04 rows=1254 width=4)
(10 rows)

Which kinda makes sense - we can't do hash aggregate, because there are
100M distinct values, and that won't fit into 4MB of memory (and the
planner knows about that). And it completes in about 108381 ms, give or
take.

With the patch, the plan changes like this:

EXPLAIN SELECT * FROM (SELECT a, count(*) FROM t GROUP BY a OFFSET 10) foo;

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=2371037.74..2371037.74 rows=1 width=12)
   ->  HashAggregate  (cost=1942478.48..2371037.74 rows=42855926 width=12)
         Group Key: t.a
         ->  Seq Scan on t  (cost=0.00..1442478.32 rows=10032 width=4)
(4 rows)

i.e. it's way cheaper than the master plan and it's not parallel, but
when executed it takes much longer (about 147442 ms).
After forcing a parallel query (by setting parallel_setup_cost = 0) the
plan changes to a parallel one without a partial aggregate, but it's
even slower. The explain analyze for the non-parallel plan looks like
this:

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=2371037.74..2371037.74 rows=1 width=12) (actual time=160180.718..160180.718 rows=0 loops=1)
   ->  HashAggregate  (cost=1942478.48..2371037.74 rows=42855926 width=12) (actual time=54462.728..157594.756 rows=63215980 loops=1)
         Group Key: t.a
         Memory Usage: 4096kB  Batches: 8320  Disk Usage: 4529172kB
         ->  Seq Scan on t  (cost=0.00..1442478.32 rows=10032 width=4) (actual time=0.014..12198.044 rows=1 loops=1)
 Planning Time: 0.110 ms
 Execution Time: 160183
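To make Jeff's four-step partition estimate above concrete, here is a
small arithmetic sketch with entirely made-up numbers (none of these
figures come from the tests in this thread):

    -- Step 1: 4MB of hash table / 65536 groups => 64 bytes per group.
    -- Step 2: 10M spilled tuples (pessimistically treated as 10M
    --         spilled groups) * 64 bytes => ~640MB needed to process
    --         all spilled tuples at once.
    -- Step 3: ~640MB / 4MB work_mem => ~153 partitions.
    -- Step 4 would then apply fudge factors and clamp to sane limits.
    SELECT ceil(10000000 * ((4*1024*1024)::numeric / 65536)
                / (4*1024*1024)) AS desired_partitions;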
Re: tuplesort test coverage
Andres Freund writes:
> On 2019-12-12 09:27:04 -0500, Tom Lane wrote:
>> What seems like a simpler and more reliable fix is to make
>> test_mark_restore a temp table, thus keeping autovac away from it.
>> Is there a reason in terms of the test's goals not to do that?

> I can't see any reason. The sorting code shouldn't care about the
> source of tuples. I guess there could at some point be tests for
> parallel sorting, but that'd just use a different table.

OK, done that way.

>> Also ... why in the world does the script drop its tables at the end
>> with IF EXISTS? They'd better exist at that point. I object to the
>> DROP IF EXISTS up at the top, too. The regression tests do not need
>> to be designed to deal with an unpredictable start state, and coding
>> them to do so can have no effect other than possibly masking
>> problems.

> Well, it makes it a heck of a lot easier to run tests in isolation
> while evolving them. While I personally think it's good to leave
> cleanup for partial states in for cases where it was helpful during
> development, I also don't care about it strongly.

As far as that goes, making the tables temp is an even better solution.

			regards, tom lane
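A side note on why the temp-table fix works: autovacuum never processes
temporary tables (their pages live in the owning backend's local
buffers), so it cannot update stats or visibility information behind the
test's back. A minimal sketch; the column list is hypothetical, not the
one used by the actual test:

    -- Being temporary, this table is invisible to autovacuum for its
    -- whole lifetime, so the sort input stays stable during the test.
    CREATE TEMP TABLE test_mark_restore AS
        SELECT i % 100 AS a, i AS b FROM generate_series(1, 10000) AS s(i);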
Re: [HACKERS] proposal: schema variables
On Mon, Nov 18, 2019 at 7:47 PM Pavel Stehule wrote:
> On Sun, Nov 3, 2019 at 5:27 PM Pavel Stehule wrote:
>> On Thu, Oct 10, 2019 at 11:41 AM Pavel Stehule wrote:
>>> Hi
>>>
>>> minor change - replace heap_tuple_fetch_attr by detoast_external_attr.
>>
>> similar update - heap_open, heap_close was replaced by table_open,
>> table_close
>
> fresh rebase

only rebase

Regards

Pavel

schema-variables-20191214.patch.gz
Description: application/gzip
Re: pg_ls_tmpdir to show shared filesets
On Thu, Dec 12, 2019 at 11:39:31PM -0600, Justin Pryzby wrote:
> I suggested that pg_ls_tmpdir should show shared filesets like
>
> 169347 5492 -rw-r- 1 postgres postgres 5619712 Dec 7 01:35
> /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
..
> Actually, my suggestion would be to make pg_ls_tmpdir expose "isdir", same as
> pg_stat_file.

Done like that.

From 83740ad5897f8724b073e110ff3f477875fcaeaf Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 14 Dec 2019 16:22:15 -0600
Subject: [PATCH v1] pg_ls_tmpdir to show directories

See also 9cd92d1a33699f86aa53d44ab04cc3eb50c18d11
---
 doc/src/sgml/func.sgml           | 12 +++-
 src/backend/utils/adt/genfile.c  | 29 ++---
 src/include/catalog/catversion.h |  2 +-
 src/include/catalog/pg_proc.dat  |  8
 4 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5a98158..83e567e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21922,12 +21922,14 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 setof record
-List the name, size, and last modification time of files in the
-temporary directory for tablespace. If
+For files in the temporary directory for
+tablespace, list the name, size, last modification time,
+and boolean indicating if it's a directory. Directories are used for temporary files
+used by parallel processes. If
 tablespace is not provided, the
-pg_default tablespace is used. Access is granted
-to members of the pg_monitor role and may be
-granted to other non-superuser roles.
+pg_default tablespace is used. Access is granted to
+members of the pg_monitor role and may be granted to
+other non-superuser roles.
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 5d4f26a..671fc60 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -522,9 +522,9 @@ pg_ls_dir_1arg(PG_FUNCTION_ARGS)
 	return pg_ls_dir(fcinfo);
 }

-/* Generic function to return a directory listing of files */
+/* Generic function to return a directory listing of files (and optionally dirs) */
 static Datum
-pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
+pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok, bool dir_ok) // could be bitmap of "flags"
 {
 	FuncCallContext *funcctx;
 	struct dirent *de;
@@ -540,13 +540,17 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 		fctx = palloc(sizeof(directory_fctx));

-		tupdesc = CreateTemplateTupleDesc(3);
+		tupdesc = CreateTemplateTupleDesc(dir_ok ? 4 : 3);
 		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "name",
 						   TEXTOID, -1, 0);
 		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "size",
 						   INT8OID, -1, 0);
 		TupleDescInitEntry(tupdesc, (AttrNumber) 3, "modification",
 						   TIMESTAMPTZOID, -1, 0);
+		if (dir_ok)
+			TupleDescInitEntry(tupdesc, (AttrNumber) 4, "isdir",
+							   BOOLOID, -1, 0);
+
 		funcctx->tuple_desc = BlessTupleDesc(tupdesc);

 		fctx->location = pstrdup(dir);
@@ -575,8 +579,8 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 	while ((de = ReadDir(fctx->dirdesc, fctx->location)) != NULL)
 	{
-		Datum		values[3];
-		bool		nulls[3];
+		Datum		values[4];
+		bool		nulls[4];
 		char		path[MAXPGPATH * 2];
 		struct stat attrib;
 		HeapTuple	tuple;
@@ -593,14 +597,17 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 					 errmsg("could not stat directory \"%s\": %m", dir)));

 		/* Ignore anything but regular files */
-		if (!S_ISREG(attrib.st_mode))
+		if (!S_ISREG(attrib.st_mode) &&
+			(!dir_ok && S_ISDIR(attrib.st_mode)))
 			continue;

 		values[0] = CStringGetTextDatum(de->d_name);
 		values[1] = Int64GetDatum((int64) attrib.st_size);
 		values[2] = TimestampTzGetDatum(time_t_to_timestamptz(attrib.st_mtime));
-		memset(nulls, 0, sizeof(nulls));
+		if (dir_ok)
+			values[3] = BoolGetDatum(S_ISDIR(attrib.st_mode));
+
+		memset(nulls, 0, sizeof(nulls));
 		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
 		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
 	}
@@ -613,14 +620,14 @@
 Datum
 pg_ls_logdir(PG_FUNCTION_ARGS)
 {
-	return pg_ls_dir_files(fcinfo, Log_directory, false);
+	return pg_ls_dir_files(fcinfo, Log_directory, false, false);
 }

 /* Function to return the list of files in the WAL directory */
 Datum
 pg_ls_waldir(PG_FUNCTION_ARGS)
 {
-	return pg_ls_dir_files(fcinfo, XLOGDIR, false);
+	return pg_ls_dir_files(fcinfo, XLOGDIR, false, false);
 }

 /*
@@ -638,7 +645,7 @@ pg_ls_tmpdir(FunctionCallInfo fcinfo, Oid tblspc)
 						tblspc)));
 	TempTablespacePath(path, tbl
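With the patch applied, a usage sketch (the isdir column comes straight
from the patch's tuple descriptor; the ORDER BY is just for
readability):

    -- Shared filesets now show up as directories flagged by "isdir",
    -- rather than being silently skipped.
    SELECT name, size, modification, isdir
      FROM pg_ls_tmpdir()
     ORDER BY isdir DESC, name;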
Re: Wrong assert in TransactionGroupUpdateXidStatus
On Thu, Dec 12, 2019 at 9:23 PM Amit Kapila wrote:
> Do you think we need such an Assert after having StaticAssert for
> (THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS) and then
> an if statement containing (nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT)
> just before this Assert? Sure, we can keep this for extra safety, but
> I don't see the need for it.

I don't have strong feelings about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: allowing broader use of simplehash
On Thu, Dec 12, 2019 at 2:33 PM Andres Freund wrote:
> I was basically just thinking that we could pass the context to use via
> CurrentMemoryContext, instead of explicitly passing it in.

I thought about that, but as a general rule, replacing a function
parameter with a global variable is the wrong direction. One could
argue this particular case is a counterexample, and I won't fight tooth
and nail if you want to take that position, but I don't think I believe
it myself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: more backtraces
On Fri, Dec 13, 2019 at 7:26 AM Peter Eisentraut wrote:
> On 2019-12-04 22:34, Tom Lane wrote:
> > Andres Freund writes:
> >> It'd be bad if the addition of backtraces for SEGV/BUS suddenly made
> >> it harder to attach a debugger and get useful results.
> >
> > Yeah. TBH, I'm not sure I want this, at least not in debug builds.
>
> I understand that the SEGV/BUS thing can be a bit scary. We can skip it.
>
> Are people interested in backtraces on abort()? That was asked for in
> an earlier thread.

I mean, I think backtraces are great, and we should have more of them.
It's possible that trying to do it in certain cases will cause
problems, but we could back off those cases as we find them, or maybe
try to work around them using sigaltstack(), or maybe back it off in
debug builds. It would make life a lot easier for me if I never had to
explain to a customer (1) how to install gdb or (2) that they needed to
get $BOSS to approve installation of development tools on production
systems. I would hate to see us shy away from improvements that might
reduce the need for such conversations on the theory that bad stuff
*might* happen.

In my experience, the importance of having a stack trace in the log is
greatest for a segmentation fault, because otherwise you have no
indication whatsoever of where the problem happened. Having the query
text has been a boon, but it's still not a lot to go on unless the same
query crashes every time. In other situations, like a PANIC, an
assertion failure, or (and this is a big one) a non-descriptive error
message ("cache lookup failed for thingy %u"), a backtrace is sometimes
really helpful as well. You don't *always* need it, but you *often*
need it.

It is absolutely important that we don't break debuggability in the
service of getting more stack traces. At the same time, there are a lot
more PostgreSQL users out there than there are PostgreSQL developers,
and a lot of those people are running non-cassert, non-debug builds.
Being able to get debugging information from failures that happen on
those installations, so that we can fix things without a time-consuming
process of guesswork and attempted reproduction, is really valuable. A
stack trace can turn a lengthy nightmare into a quick fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company