Re: PoC Refactor AM analyse API
> I suggest a refactoring of the analyze AM API as it is too heap-specific at the moment. The problem was inspired by Greenplum’s analyze improvement for > append-optimized row and column AMs with variable-size compressed blocks. > Currently we do analyze in two steps. > > 1. Sample fixed-size blocks with algorithm S from Knuth (the BlockSampler functions). > 2. Collect tuples into a reservoir with algorithm Z from Vitter. > > So this doesn’t work, for example, for AMs using variable-sized physical blocks. They need weighted random sampling (WRS) algorithms like A-Chao, or > logical blocks to feed Knuth's algorithm S (and they have a problem with > RelationGetNumberOfBlocks() estimating the physical number of blocks). Another > problem is with columns: they are not passed to the analyze begin-scan and can’t > benefit from column storage during ANALYZE TABLE (COL). > > The suggestion is to replace table_scan_analyze_next_block() and > table_scan_analyze_next_tuple() with a single function: > table_acquire_sample_rows(). The AM implementation of > table_acquire_sample_rows() can use the BlockSampler functions if it wants > to, but if the AM is not block-oriented, it could do something else. This > suggestion also passes VacAttrStats to table_acquire_sample_rows() for > column-oriented AMs and removes the PROGRESS_ANALYZE_BLOCKS_TOTAL and > PROGRESS_ANALYZE_BLOCKS_DONE definitions, as not all AMs are block-oriented. > > Best regards, > Denis Smirnov | Developer > s...@arenadata.io > Arenadata | Godovikova 9-17, Moscow 129085 Russia >
Re: PoC Refactor AM analyse API
Andrey, thanks for your feedback! I agree that AMs with fixed-size blocks can share much of the code in acquire_sample_rows() (though it is not a rule). But there are several points about the current master sampling. * It is not perfect - AM developers may want to improve it with other sampling algorithms. * It is designed under a big influence of the heap AM - for example, RelationGetNumberOfBlocks() returns uint32, while other AMs can have a bigger number of blocks. * heapam_acquire_sample_rows() is a small function - I don't think it would be much trouble for any AM developer to write something similar. * Some AMs may use single-level sampling (only algorithm Z from Vitter, for example) - why not? As a result we get a single and clear method to acquire rows for statistics. If we don’t modify but rather extend the current API (for example, in the manner it is done for FDWs), the code becomes more complicated and difficult to understand. > On 8 Dec 2020, at 18:42, Andrey Borodin wrote: > > Hi Denis! > >> On 7 Dec 2020, at 18:23, Denis Smirnov wrote: >> >> I suggest a refactoring of the analyze AM API as it is too heap-specific at >> the moment. The problem was inspired by Greenplum’s analyze improvement for >> append-optimized row and column AMs with variable-size compressed blocks. >> Currently we do analyze in two steps. >> >> 1. Sample fixed-size blocks with algorithm S from Knuth (the BlockSampler functions). >> 2. Collect tuples into a reservoir with algorithm Z from Vitter. >> >> So this doesn’t work, for example, for AMs using variable-sized physical blocks. >> They need weighted random sampling (WRS) algorithms like A-Chao, or >> logical blocks to feed Knuth's algorithm S (and they have a problem with >> RelationGetNumberOfBlocks() estimating the physical number of blocks). Another >> problem is with columns: they are not passed to the analyze begin-scan and can’t >> benefit from column storage during ANALYZE TABLE (COL). 
>> >> The suggestion is to replace table_scan_analyze_next_block() and >> table_scan_analyze_next_tuple() with a single function: >> table_acquire_sample_rows(). The AM implementation of >> table_acquire_sample_rows() can use the BlockSampler functions if it wants >> to, but if the AM is not block-oriented, it could do something else. This >> suggestion also passes VacAttrStats to table_acquire_sample_rows() for >> column-oriented AMs and removes the PROGRESS_ANALYZE_BLOCKS_TOTAL and >> PROGRESS_ANALYZE_BLOCKS_DONE definitions, as not all AMs are >> block-oriented. > > Just a few random notes about the idea. > Heap pages are not of a fixed size when measured in tuple count. And a comment > in the code describes it. > * Although every row has an equal chance of ending up in the final > * sample, this sampling method is not perfect: not every possible > * sample has an equal chance of being selected. For large relations > * the number of different blocks represented by the sample tends to be > * too small. We can live with that for now. Improvements are welcome. > > The current implementation provides a framework with shared code. Though this > framework is only suitable for block-of-tuples AMs, and it has statistical > downsides when the tuple count varies too much. > Maybe we can just provide a richer API? To have both: avoid copying code and > provide flexibility. > > Best regards, Andrey Borodin. >
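For reference, the A-Chao weighted random sampling mentioned in this thread can be sketched in a few lines of C. This is an illustrative, self-contained toy - the names, the fixed-size reservoir, and the use of rand() are assumptions for the sketch, not part of any proposed table AM API:

```c
#include <stdlib.h>
#include <assert.h>

#define RESERVOIR_SIZE 8

/*
 * Minimal sketch of A-Chao weighted reservoir sampling: each item i
 * (e.g. a variable-sized block) carries a weight w_i and ends up in
 * the reservoir with probability roughly proportional to its weight.
 */
typedef struct WeightedReservoir
{
	int		items[RESERVOIR_SIZE];
	int		nitems;			/* slots filled so far */
	double	total_weight;	/* running sum of all weights seen */
} WeightedReservoir;

static void
reservoir_init(WeightedReservoir *r, unsigned seed)
{
	r->nitems = 0;
	r->total_weight = 0.0;
	srand(seed);
}

static void
reservoir_add(WeightedReservoir *r, int item, double weight)
{
	r->total_weight += weight;
	if (r->nitems < RESERVOIR_SIZE)
		r->items[r->nitems++] = item;	/* fill phase */
	else
	{
		/* inclusion probability k * w_i / W (implicitly capped at 1) */
		double		p = RESERVOIR_SIZE * weight / r->total_weight;

		if ((double) rand() / RAND_MAX < p)
			r->items[rand() % RESERVOIR_SIZE] = item;	/* evict a random victim */
	}
}
```

With equal weights this degenerates into a plain uniform reservoir sample, which is why fixed-size-block AMs are just a special case of the weighted scheme.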
Re: libpq compression
Hello all, I’ve finally read the whole thread (it was huge). It is extremely sad that this patch has hung without progress for such a long time. It seems that the main problem in the discussion is that everyone has their own view of what problems should be solved with this patch. Here are some of the positions (not all of them): 1. Add compression for networks with bad bandwidth (and make the patch as simple and maintainable as possible) - the author’s position. 2. Don’t change the current network protocol and related code much. 3. Refactor the compression API (and network compression as well). 4. Solve cloud providers’ problems: trade network bandwidth for CPU utilisation (and vice versa) on demand. All of these requirements have a different nature and sometimes conflict with each other. Without clearly formed requirements this patch will never be released. Anyway, I have rebased it onto the current master branch, applied pgindent, tested on macOS and fixed a macOS-specific problem with strcpy in build_compressors_list(): it has undefined behaviour when the source and destination strings overlap. - *client_compressors = src = dst = strdup(value); + *client_compressors = src = strdup(value); + dst = strdup(value); According to my very simple tests with randomly generated data, zstd gives about 3x compression (zlib has a slightly worse compression ratio and slightly higher CPU utilisation). This seems to be a normal ratio for any streaming data - Greenplum also uses zstd/zlib to compress append-optimized tables, and the compression ratio is usually about 3-5x. Also, from my Greenplum experience, the most commonly used zstd level is 1, while for zlib it is usually in the range of 1-5. CPU and execution time were not affected much compared to uncompressed data (but my tests were very simple and should not be treated as reliable). 
From e33b9c99a6fdac54df40aa0c14eb43b6b1f3709d Mon Sep 17 00:00:00 2001 From: Denis Smirnov Date: Wed, 16 Dec 2020 22:02:49 +1000 Subject: [PATCH] Rebase patch-27 to actual master and fix strcpy 1. Patch-27 had conflicts with actual master. They were fixed. 2. On MacOS psql failed with EXC_BAD_INSTRUCTION on "compression" option. The problem was in strcpy OS-specific implementation: "The source and destination strings should not overlap, as the behavior is undefined." 3. pgindent was applied to the code. --- configure | 102 +++- configure.ac | 21 + .../postgres_fdw/expected/postgres_fdw.out| 2 +- doc/src/sgml/config.sgml | 18 + doc/src/sgml/libpq.sgml | 16 + doc/src/sgml/protocol.sgml| 97 +++ src/Makefile.global.in| 1 + src/backend/Makefile | 8 + src/backend/catalog/system_views.sql | 9 + src/backend/libpq/pqcomm.c| 245 ++-- src/backend/postmaster/pgstat.c | 30 + src/backend/postmaster/postmaster.c | 13 +- src/backend/utils/adt/pgstatfuncs.c | 50 +- src/backend/utils/misc/guc.c | 10 + src/common/Makefile | 3 +- src/common/zpq_stream.c | 578 ++ src/include/catalog/pg_proc.dat | 18 +- src/include/common/zpq_stream.h | 36 ++ src/include/libpq/libpq-be.h | 3 + src/include/libpq/libpq.h | 1 + src/include/libpq/pqcomm.h| 1 + src/include/pg_config.h.in| 3 + src/include/pgstat.h | 7 + src/interfaces/libpq/Makefile | 14 + src/interfaces/libpq/fe-connect.c | 83 ++- src/interfaces/libpq/fe-exec.c| 4 +- src/interfaces/libpq/fe-misc.c| 54 +- src/interfaces/libpq/fe-protocol3.c | 162 - src/interfaces/libpq/libpq-int.h | 20 + src/test/regress/expected/rules.out | 14 +- src/tools/msvc/Mkvcbuild.pm | 2 +- 31 files changed, 1532 insertions(+), 93 deletions(-) create mode 100644 src/common/zpq_stream.c create mode 100644 src/include/common/zpq_stream.h diff --git a/configure b/configure index 11a4284e5b..a64125f3ec 100755 --- a/configure +++ b/configure @@ -699,6 +699,7 @@ LD LDFLAGS_SL LDFLAGS_EX with_zlib +with_zstd with_system_tzdata with_libxslt XML2_LIBS @@ -866,6 +867,7 @@ 
with_libxml with_libxslt with_system_tzdata with_zlib +with_zstd with_gnu_ld enable_largefile ' @@ -8573,40 +8575,120 @@ fi # -# Zlib +# Assignments # +CPPFLAGS="$CPPFLAGS $INCLUDES" +LDFLAGS="$LDFLAGS $LIBDIRS" -# Check whether --with-zlib was given. -if test "${with_zlib+set}" = set; then : - withval=$with_zlib; + + +# +# ZStd +# + + + +# Check whether --with-zstd was given. +if test
Re: PoC Refactor AM analyse API
> But why do you pass int natts and VacAttrStats **stats to > acquire_sample_rows()? Is it of any use? It seems to break abstraction too. Yes, it is really a kludge that should be discussed. The main problem is that we don’t pass projection information to the analyze scan (the analyze begin-scan relies only on relation information during initialization). As a result we can’t benefit from column AMs during «analyze t(col)» and consume data only from the target columns. These parameters were added to solve this problem. Best regards, Denis Smirnov | Developer s...@arenadata.io Arenadata | Godovikova 9-17, Moscow 129085 Russia
Re: PoC Refactor AM analyse API
Thanks for your review, Heikki. I have made the changes you requested. 1. All modifications connected with column projection were reverted (they should be added in https://commitfest.postgresql.org/31/2922 if the current patch is merged one day). 2. I have returned the PROGRESS_ANALYZE_* states. 3. qsort() was moved into heapam_acquire_sample_rows(). Also a comment was added that the acquire_sample_rows() AM function must return the tuples in physical table order. 4. table_beginscan_analyze() was removed as a redundant function. 5. The acquire_sample_rows() comment about the reservoir was changed. From a5104b15683524a375aab1beb9615f690de31882 Mon Sep 17 00:00:00 2001 From: Denis Smirnov Date: Sat, 5 Dec 2020 23:25:29 +1000 Subject: [PATCH v2] Refactor AM analyze (acquire rows method) The analyze AM functions were implemented for fixed-size block AMs (heap influence). Other AMs with variable-size blocks are not able to use the current two-stage block sampling (Knuth's algorithm S and Vitter's algorithm Z). This refactoring gives more freedom to AM developers to implement analyze for non-fixed-size AMs. 
Discussion: https://www.postgresql.org/message-id/flat/C7CFE16B-F192-4124-BEBB-7864285E0FF7%40arenadata.io --- src/backend/access/heap/heapam_handler.c | 199 ++- src/backend/access/table/tableamapi.c| 3 +- src/backend/commands/analyze.c | 189 + src/include/access/tableam.h | 99 +++ 4 files changed, 227 insertions(+), 263 deletions(-) diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c index 4a70e20a14..6bf0c887be 100644 --- a/src/backend/access/heap/heapam_handler.c +++ b/src/backend/access/heap/heapam_handler.c @@ -33,9 +33,11 @@ #include "catalog/storage.h" #include "catalog/storage_xlog.h" #include "commands/progress.h" +#include "commands/vacuum.h" #include "executor/executor.h" #include "miscadmin.h" #include "pgstat.h" +#include "port.h" #include "storage/bufmgr.h" #include "storage/bufpage.h" #include "storage/lmgr.h" @@ -44,6 +46,7 @@ #include "storage/smgr.h" #include "utils/builtins.h" #include "utils/rel.h" +#include "utils/sampling.h" static void reform_and_rewrite_tuple(HeapTuple tuple, Relation OldHeap, Relation NewHeap, @@ -53,6 +56,8 @@ static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer, HeapTuple tuple, OffsetNumber tupoffset); +static int compare_rows(const void *a, const void *b); + static BlockNumber heapam_scan_get_blocks_done(HeapScanDesc hscan); static const TableAmRoutine heapam_methods; @@ -1154,6 +1159,173 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin, return false; } +/* + * As of May 2004 we use a new two-stage method: Stage one selects up + * to targrows random blocks (or all blocks, if there aren't so many). + * Stage two scans these blocks and uses the Vitter algorithm to create + * a random sample of targrows rows (or less, if there are less in the + * sample of blocks). 
The two stages are executed simultaneously: each + * block is processed as soon as stage one returns its number and while + * the rows are read stage two controls which ones are to be inserted + * into the sample. + * + * Although every row has an equal chance of ending up in the final + * sample, this sampling method is not perfect: not every possible + * sample has an equal chance of being selected. For large relations + * the number of different blocks represented by the sample tends to be + * too small. We can live with that for now. Improvements are welcome. + * + * An important property of this sampling method is that because we do + * look at a statistically unbiased set of blocks, we should get + * unbiased estimates of the average numbers of live and dead rows per + * block. The previous sampling method put too much credence in the row + * density near the start of the table. + */ +static int +heapam_acquire_sample_rows(Relation rel, int elevel, + BufferAccessStrategy bstrategy, + HeapTuple *rows, int targrows, + double *totalrows, double *totaldeadrows) +{ + int numrows = 0;/* # rows now in reservoir */ + double samplerows = 0; /* total # rows collected */ + double liverows = 0; /* # live rows seen */ + double deadrows = 0; /* # dead rows seen */ + double rowstosk
Re: PoC Refactor AM analyse API
Hello, Zhihong. Thanks for your comments. 1. I am afraid that the claimed equivalence of "floor(val + 0.5)" to "ceil(val)" is incorrect: "floor(2.1 + 0.5)" returns 2 while "ceil(2.1)" returns 3. So we can’t use the replacement you have suggested. 2. >> For compare_rows(), it seems the computation of oa and ob can be delayed to when ba == bb (after the first two if statements). I have checked some examples of ASM code generated by different compilers with the -O2/-O3 flags on https://gcc.godbolt.org and didn’t see any big benefit in the resulting CPU instructions. You can check the attached example below yourself. simplified_compare_rows.c Description: Binary data Best regards, Denis Smirnov | Developer s...@arenadata.io Arenadata | Godovikova 9-17, Moscow 129085 Russia
Re: Use virtual tuple slot for Unique node
I have taken a look at this discussion and at the code, and I am confused about how we choose the tuple table slot (TTS) type in PG. Maybe you can clarify this topic for me.

1. Brief intro. There are four types of TTS.

Plan tree «leaves»:
- buffer heap (produced by index and table scans; has system columns and keeps shared buffer pins)
- heap (produced by FDWs; has system columns, but doesn’t keep any pins)
- minimal (produced by values and materialization nodes like sort, agg, etc.)

Plan «branches»:
- virtual (keeps datum references to the columns of the tuples in the child nodes)

A virtual TTS is cheaper to copy within the plan (as we copy datum references), but more expensive to materialize (we have to construct a tuple from pieces). Leaves are cheaper to materialize (as we make a memcpy under the hood), but very expensive to copy (we copy the value, not the datum reference).

2. If we take a look at the materializing nodes in the plan, they produce different result TTS types.
- Outer child's TTS type: gather, gather merge, lock rows, limit;
- Minimal: material, sort, incremental sort, memoize, unique, hash, setop (can be heap as well);
- Virtual: group, agg, window agg.

From my point of view, a materializing node should preserve the incoming TTS type. For the sort node (which materializes incoming tuples as minimal) it is ok to output a minimal result as well. It looks like unique should use the outer child’s TTS type (instead of the hardcoded minimal). But can anyone explain to me why group, agg and window agg return virtual slots instead of the same TTS type as their outer child? Do we expect that the parent node exists and requires exactly virtual tuples (but what if the parent node is a sort and benefits from a minimal TTS)? So, it looks like we need to take a look not only at unique, but also inspect all the materializing nodes.
Re: Use virtual tuple slot for Unique node
It looks like my patch was not analyzed by the hackers mailing list due to an incorrect MIME type, so I duplicate it here. commit 2852a3f2fab8e723f208d81c1ad1eb6a6a377b09 Author: Denis Smirnov Date: Thu Aug 31 08:51:14 2023 +0700 Change tuple table slot for Unique node to "virtual" The Unique node does a very simple thing in the plan - it processes the incoming sorted tuple stream and adds the unique tuples to the resulting tuple table slot. Previously the Unique node palloc'ed the results as "minimal" tuples. That is redundant, and now we simply collect references to the child node's tuples with "virtual" tuples. diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c index 45035d74fa..c859add6e0 100644 --- a/src/backend/executor/nodeUnique.c +++ b/src/backend/executor/nodeUnique.c @@ -141,7 +141,7 @@ ExecInitUnique(Unique *node, EState *estate, int eflags) * Initialize result slot and type. Unique nodes do no projections, so * initialize projection info for this node appropriately. */ - ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsMinimalTuple); + ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsVirtual); uniquestate->ps.ps_ProjInfo = NULL; /* > On 31 Aug 2023, at 10:06, Denis Smirnov wrote: > >
Re: Use virtual tuple slot for Unique node
I have done a little research and found out that though the patch itself is correct (i.e. we can benefit from replacing TTSOpsMinimalTuple with TTSOpsVirtual for the Unique node), my explanation of WHY was wrong.

1. We always materialize the new unique tuple in the slot, no matter what type of tuple table slot we use.
2. But the virtual tuple materialization (tts_virtual_copyslot) has performance benefits over the minimal tuple one (tts_minimal_copyslot):
- tts_minimal_copyslot() always allocates zeroed memory with palloc0 (expensive according to the flame graph);
- tts_virtual_copyslot() doesn’t allocate additional memory if the tuples are constructed only from columns passed by value (though for variable-size columns we still need a memory allocation);
- if tts_virtual_copyslot() needs an allocation, it doesn’t need to zero the memory.

So as a result we seriously benefit from a virtual TTS when a Unique node in the plan processes tuples constructed from fixed-size columns.

commit 148642d81f046b7d72b3a40182c165e42a8ab6d7 Author: Denis Smirnov Date: Thu Aug 31 08:51:14 2023 +0700 Change tuple table slot for Unique node to "virtual" The Unique node uses the minimal TTS implementation to copy the unique tuples from the sorted stream into the resulting tuple slot. But if we replace the minimal TTS with the virtual TTS copy method, the performance improves. 1. The minimal TTS always allocates zeroed memory for the materialized tuple. 2. The virtual TTS doesn't allocate additional memory for tuples whose columns are passed by value. For columns with external memory we don't need to zero the bytes but can simply take the memory chunk from the free list "as is". diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c index 45035d74fa..c859add6e0 100644 --- a/src/backend/executor/nodeUnique.c +++ b/src/backend/executor/nodeUnique.c @@ -141,7 +141,7 @@ ExecInitUnique(Unique *node, EState *estate, int eflags) * Initialize result slot and type. 
Unique nodes do no projections, so * initialize projection info for this node appropriately. */ - ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsMinimalTuple); + ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsVirtual); uniquestate->ps.ps_ProjInfo = NULL; /* > On 31 Aug 2023, at 10:28, Denis Smirnov wrote: It looks like my patch was not analyzed by the hackers mailing list due to an incorrect MIME type, so I duplicate it here. > On 31 Aug 2023, at 10:06, Denis Smirnov wrote:
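A toy model of the allocation difference described above (not PostgreSQL code, just the pattern): copying a row of by-value columns is plain assignment with no heap allocation at all, while always materializing into a fresh zeroed chunk, as the mail attributes to palloc0() in tts_minimal_copyslot(), pays for allocation and zeroing on every copy:

```c
#include <stdlib.h>
#include <string.h>

typedef struct Row
{
	long		a;
	long		b;
} Row;

/* "virtual"-style copy of by-value columns: plain assignment, no allocation */
static Row
copy_by_value(const Row *src)
{
	return *src;
}

/*
 * "minimal"-style copy: always materialize into fresh zeroed memory
 * (calloc here stands in for palloc0).  Correct, but strictly more work.
 */
static Row *
copy_materialized(const Row *src)
{
	Row		   *dst = calloc(1, sizeof(Row));	/* zeroed, like palloc0 */

	memcpy(dst, src, sizeof(Row));
	return dst;					/* caller frees */
}
```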
Re: Async-unsafe functions in signal handlers
As far as I understand, the main problem with backtrace_symbols() is the internal malloc() call. A backend can lock up forever if malloc() is interrupted by a signal and then called again inside the signal handler. At the moment Greenplum uses "addr2line -s -e" (on Linux) and "atos -o" (on macOS) for each stack address instead of backtrace_symbols(). Neither of these utilities uses malloc() under the hood, although there is no guarantee that their implementations never change in the future. It seems to be a safer approach, but it looks like a dirty hack. > On 26 Aug 2021, at 08:52, Andrey Borodin wrote: > > > >> On 25 Aug 2021, at 19:22, Denis Smirnov wrote: >> >> I am going to refactor Greenplum backtraces for error messages and want to >> make them more compatible with PostgreSQL code. Backtraces in PostgreSQL were >> introduced by commit 71a8a4f6e36547bb060dbcc961ea9b57420f7190 (original >> discussion >> https://www.postgresql.org/message-id/CAMsr+YGL+yfWE=jvbubnpwtrrzney7hj07+zt4byjdvp4sz...@mail.gmail.com >> ) and rely on the backtrace() and backtrace_symbols() functions. They are used >> inside errfinish(), which is wrapped by the ereport() macros. ereport() is invoked >> inside the bgworker_die() and FloatExceptionHandler() signal handlers. I am >> confused by this fact - both backtrace functions are async-unsafe: >> backtrace_symbols() always, and backtrace() on its first call due to >> dlopen. I wonder why PostgreSQL uses async-unsafe functions in signal >> handlers? > > In my view the backtrace_functions GUC is expected to be used for debug purposes, > not to be enabled on a production server for bgworker_die() or > FloatExceptionHandler(). > Is there any way to call backtrace_symbols() without touching > backtrace_functions? > > Best regards, Andrey Borodin. > Best regards, Denis Smirnov | Developer s...@arenadata.io Arenadata | Godovikova 9-17, Moscow 129085 Russia
Re: Async-unsafe functions in signal handlers
> On 26 Aug 2021, at 23:39, Tom Lane wrote: > > (BTW, I think it's pretty silly to imagine that adding backtrace() > calls inside ereport is making things any more dangerous. ereport > has pretty much always carried a likelihood of calling malloc(), > for example.) I have looked through the signal handlers and found out that many of them use malloc() via ereport() and elog(). Here is the list:

SIGUSR1:
- procsignal_sigusr1_handler(): autoprewarm, autovacuum, bgworker, bgwriter, checkpointer, pgarch, startup, walwriter, walreceiver, walsender
- sigusr1_handler(): postmaster

SIGFPE:
- FloatExceptionHandler(): autovacuum, bgworker, postgres, plperl

SIGHUP:
- SIGHUP_handler(): postmaster

SIGCHLD:
- reaper(): postmaster

SIGQUIT:
- quickdie(): postgres

SIGTERM:
- bgworker_die(): bgworker

SIGALRM:
- handle_sig_alarm(): autovacuum, bgworker, postmaster, startup, walsender, postgres

I suspect there are lots of potential ways to deadlock on malloc() inside any of these handlers. An interesting question is why there is still no evidence of such deadlocks. Best regards, Denis Smirnov | Developer s...@arenadata.io Arenadata | Godovikova 9-17, Moscow 129085 Russia
Re: Async-unsafe functions in signal handlers
> On 28 Aug 2021, at 07:05, Andres Freund wrote: > > However, we have a > bandaid that deals with possible hangs, by SIGKILLing when processes don't > shut down (at that point things have already gone quite south, so that's not > an issue). Thanks for the explanation. I can see that the child-process SIGKILL machinery was introduced by commit 82233ce7ea42d6ba519aaec63008aff49da6c7af to fix a malloc() deadlock in the quickdie() signal handler. As a result, all child processes that take too long to die are killed with SIGKILL in ServerLoop(). But bgworker_die() is a problem, as we initialize bgworkers right before ServerLoop(). So we can face a malloc() deadlock on postmaster startup (before ServerLoop() has started). Maybe we should simply use write() and exit() instead of ereport() in bgworker_die()? Best regards, Denis Smirnov | Developer s...@arenadata.io Arenadata | Godovikova 9-17, Moscow 129085 Russia
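A minimal sketch of that idea (the message text, exit code, and helper name are placeholders, not a proposed patch): restrict the handler to calls on the POSIX async-signal-safe list, i.e. write() plus _exit(), instead of ereport(), which may malloc():

```c
#include <unistd.h>

/*
 * Sketch of an async-signal-safe bgworker_die() replacement.  write()
 * and _exit() are on the POSIX async-signal-safe list; ereport() is
 * not, because it can call malloc() under the hood.
 */
static ssize_t
write_death_message(int fd)
{
	static const char msg[] =
		"FATAL: terminating background worker due to SIGTERM\n";

	return write(fd, msg, sizeof(msg) - 1);
}

static void
bgworker_die_safe(int signo)
{
	(void) signo;
	(void) write_death_message(STDERR_FILENO);
	_exit(1);					/* skips atexit() handlers; no cleanup runs */
}
```

The obvious cost, as noted in the follow-up message, is that _exit() skips the normal shared-memory cleanup path.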
Re: Async-unsafe functions in signal handlers
Honestly, I don’t know what to do with bgworker_die(). At the moment it produces ereport(FATAL), with the async-unsafe proc_exit_prepare() and exit() under the hood. I can see three solutions:

1. Leave the code as is. Then SIGTERM can produce deadlocks in the bgworker's signal handler. The locked process can be terminated with an immediate shutdown <https://github.com/postgres/postgres/commit/82233ce7ea42d6ba519aaec63008aff49da6c7af> of the cluster. Maybe this is ok, as we don’t expect SIGTERM to be sent to a bgworker too often.

2. Use the async-safe _exit() in the signal handler instead of proc_exit_prepare() and exit(). In this case we’ll have to go through cluster recovery, as the bgworker doesn't properly clean up its shared memory. This solution is even worse than an immediate shutdown, as we recover for every SIGTERM that has been sent to a bgworker.

3. Set a signal flag inside the handler (something like the miscadmin.h XXX_INTERRUPTS() macros). Then it becomes the extension developer's responsibility to properly handle this flag in the bgworker’s code. This approach breaks backward compatibility.

Maybe I've missed a good solution - do you see any? Best regards, Denis Smirnov | Developer s...@arenadata.io Arenadata | Godovikova 9-17, Moscow 129085 Russia
Async-unsafe functions in signal handlers
Hello all, I am going to refactor Greenplum backtraces for error messages and want to make them more compatible with PostgreSQL code. Backtraces in PostgreSQL were introduced by commit 71a8a4f6e36547bb060dbcc961ea9b57420f7190 (original discussion https://www.postgresql.org/message-id/CAMsr+YGL+yfWE=jvbubnpwtrrzney7hj07+zt4byjdvp4sz...@mail.gmail.com ) and rely on the backtrace() and backtrace_symbols() functions. They are used inside errfinish(), which is wrapped by the ereport() macros. ereport() is invoked inside the bgworker_die() and FloatExceptionHandler() signal handlers. I am confused by this fact - both backtrace functions are async-unsafe: backtrace_symbols() always, and backtrace() on its first call due to dlopen. I wonder why PostgreSQL uses async-unsafe functions in signal handlers? Best regards, Denis Smirnov | Developer s...@arenadata.io Arenadata | Godovikova 9-17, Moscow 129085 Russia
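For reference, a minimal glibc-specific sketch of the API in question, used from normal (non-handler) context. Note that backtrace_symbols() returns malloc()'ed memory that the caller must free, which is exactly why it is async-signal-unsafe (this assumes a glibc platform providing execinfo.h):

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Capture and print the current call stack.  Fine here, but unsafe in
 * a signal handler: backtrace_symbols() malloc()s its result, and the
 * first backtrace() call may dlopen() the unwinder library.
 */
static int
print_backtrace(void)
{
	void	   *frames[32];
	int			nframes = backtrace(frames, 32);
	char	  **symbols = backtrace_symbols(frames, nframes);	/* malloc'ed */

	if (symbols)
	{
		for (int i = 0; i < nframes; i++)
			printf("%s\n", symbols[i]);
		free(symbols);
	}
	return nframes;
}
```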